Server Admin Log/Archive 29

From Wikitech

2016-05-31

  • 23:57 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set collation to uca-it for it.wikipedia (T136647) (duration: 00m 25s)
  • 23:54 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 23s)
  • 23:47 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 24s)
  • 23:45 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 22s)
  • 23:43 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 23s)
  • 23:37 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 23s)
  • 23:33 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/ui/mw.echo.ui.NotificationItemWidget.js: Adjust styling for Special:Notification items (T136572, 2/2) (duration: 00m 24s)
  • 23:33 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/styles/: Adjust styling for Special:Notification items (T136572, 1/2) (duration: 00m 30s)
  • 23:28 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 30s)
  • 22:38 tgr: running "mwscript sql.php --wiki=zerowiki /srv/mediawiki/php-1.28.0-wmf.4/maintenance/archives/patch-bot_passwords.sql" for T135074
  • 21:47 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
  • 21:43 logmsgbot: thcipriani@tin Synchronized wmf-config/abusefilter.php: Workaround for T136644 (duration: 00m 30s)
  • 21:34 awight: Enable PayPal Express Checkout in paymentswiki config.
  • 21:33 urandom: Bouncing restbase on restbase1015.eqiad.wmnet
  • 21:32 urandom: Bouncing restbase on restbase1013.eqiad.wmnet
  • 21:32 urandom: Bouncing restbase on restbase1012.eqiad.wmnet
  • 21:30 urandom: Bouncing restbase on restbase1009.eqiad.wmnet
  • 21:29 urandom: Bouncing restbase on restbase1008.eqiad.wmnet
  • 21:28 urandom: Bouncing restbase on restbase1010.eqiad
  • 20:58 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 back to 1.28.0-wmf.3
  • 20:55 thcipriani: rolling back group0 wmf.4 for T136644 too much log spam
  • 20:39 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
  • 20:35 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: Update Wikibase (duration: 00m 23s)
  • 20:33 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.4/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: Update Wikibase (duration: 00m 30s)
  • 20:28 bblack: depooled reboot of cp3043 - T126062
  • 20:23 bblack: depooled reboot of cp3042 - T126062
  • 20:13 logmsgbot: thcipriani@tin Finished scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache (duration: 50m 09s)
  • 20:10 urandom: disabling cql binary transport on restbase1007-c
  • 20:09 bblack: depooled reboot of cp3041 - T126062
  • 20:02 bblack: depooled reboot of cp3033 (not 3032) - T126062
  • 20:02 bblack: depooled reboot of cp3032 - T126062
  • 19:57 bblack: depooled reboot of cp3031 - T126062
  • 19:50 bblack: depooled reboot of cp3030 - T126062
  • 19:45 bblack: restarting cp* nginxes for config update
  • 19:23 logmsgbot: thcipriani@tin Started scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache
  • 19:14 urandom: Restart restbase1007-c.eqiad.wmnet because reasons
  • 18:13 Krinkle: mwscript deleteEqualMessages.php --wiki hrwikibooks (T45917)
  • 18:07 ejegg: updated payments-wiki from e6807395d7687d521070b83d159b77b242e5c04f to 5bb160e9898224e1d7d0a5c57fe408edb998a262
  • 17:26 jynus: restarting mysqls at sanitarium, some transitional lag on labs
  • 17:17 bblack: depooled reboot of cp3040 - T126062
  • 17:03 thcipriani: starting branch-cut for mediawiki and extensions for version 1.28.0-wmf.4
  • 16:52 elukey: disabling puppet on mc10* hosts as prep step for https://gerrit.wikimedia.org/r/#/c/291916. Memcached 1.4.25 will be deployed to mc1010 as part of a perf. test (T129963)
  • 16:15 urandom: Upgrade of restbase1007-{a,b,c} complete : T126629
  • 16:10 urandom: Upgrading Cassandra to 2.2.6 on restbase1007.eqiad.wmnet : T126629
  • 16:07 bblack: depooling cp3032 to investigate T126062
  • 16:06 urandom: Stopping restbase1007-{a,b,c} in preparation for upgrade : T126629
  • 15:51 urandom: Disabling puppet in preparation for upgrade on restbase1007, 1010, and 1011 : T126629
  • 15:10 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/EventBus/EventBus.hooks.php: SWAT: Replace wfUrlEncode with rawurlencode (duration: 00m 27s)
  • 15:07 moritzm: rebooting mendelevium (ticket.wikimedia.org) for update to Linux 4.4
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Math: Enable MathML everywhere but private wikis (duration: 00m 34s)
  • 15:03 bblack: restarting all frontend caches for new memory params (randomized order, ~1-min spacing, ~2h to completion) - T135384
  • 14:54 moritzm: rolling reboot of scb systems in codfw for Linux 4.4 upgrade
  • 12:21 moritzm: continue rolling reboot of mc2* systems for Linux 4.4 upgrade
  • 08:50 mobrovac: restbase deploy end of fcd62e1
  • 08:34 mobrovac: restbase deploy start of fcd62e1
  • 08:04 jynus: db1049> megacli -PDOnline -PhysDrv '[32:4]' -a0
  • 07:46 mobrovac: change-prop deploying 980f65c
  • 07:15 jynus: db1049> megacli -PDOffline -PhysDrv '[32:4]' -a0
  • 06:20 moritzm: upgrading hhvm in eqiad (also picking up updated versions of icu and lcms)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue May 31 02:31:36 UTC 2016 (duration 5m 54s)
  • 02:25 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 09m 30s)

2016-05-30

  • 23:04 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Update BetaFeatures whitelist (duration: 00m 32s)
  • 18:50 Krinkle: mwscript deleteEqualMessages.php --wiki nvwiki (T45917)
  • 18:15 logmsgbot: jzerebecki@tin Synchronized php-1.28.0-wmf.3/extensions/Wikidata/vendor/wikibase/javascript-api/WikibaseJavaScriptApi.php: Wikidata WikibaseJavaScriptApi: Fix getLocationAgnosticMwApi behavior in Internet Explorer b6ae82c71af3d9361cfb9e8d4e6e45bcd5ee9b26 2 of 2 T136543 (duration: 00m 24s)
  • 18:14 logmsgbot: jzerebecki@tin Synchronized php-1.28.0-wmf.3/extensions/Wikidata/vendor/wikibase/javascript-api/src/getLocationAgnosticMwApi.js: Wikidata WikibaseJavaScriptApi: Fix getLocationAgnosticMwApi behavior in Internet Explorer b6ae82c71af3d9361cfb9e8d4e6e45bcd5ee9b26 1 of 2 T136543 (duration: 00m 26s)
  • 17:41 ori: Synced composer.{json,lock} and multiversion for I5ac86f190b
  • 17:27 logmsgbot: krinkle@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiQueryRevisions.php: T136375 (duration: 00m 52s)
  • 15:37 logmsgbot: dcausse@tin Synchronized wmf-config/CommonSettings.php: Make VE RB URLs domain-relative (duration: 00m 26s)
  • 15:17 logmsgbot: dcausse@tin Synchronized wmf-config/InitialiseSettings.php: Changetags should be granted only to sysops and bots in ruwiki (duration: 00m 26s)
  • 15:10 logmsgbot: dcausse@tin Synchronized wmf-config: Send wmf.4 search and ttmserver traffic to codfw (duration: 00m 33s)
  • 14:07 moritzm: rolling reboot of mc2* to Linux 4.4
  • 13:53 logmsgbot: jmm@tin Synchronized wmf-config/CommonSettings.php: firejail security hardening for image scalers (duration: 00m 38s)
  • 13:52 moritzm: enable firejail on image scalers
  • 12:59 gehel: disabling warmers elasticsearch codfw cluster (T133125)
  • 12:53 hashar: Upgrading Zuul 1cc37f7..66c8e52 T128569
  • 12:35 gehel: re-enabling puppet on elasticsearch codfw cluster (T133125)
  • 12:07 gehel: nginx restarted on elasticsearch codfw cluster (T133125)
  • 11:42 moritzm: upgrading hhvm in codfw (also picking up updated versions of icu and lcms)
  • 10:57 moritzm: upgrading hhvm on remaining canaries (also picking up updated versions of icu and lcms)
  • 10:35 hashar: Restarted Zuul.
  • 10:31 moritzm: upgrading hhvm on mw1017 (also picking up updated versions of icu and lcms)
  • 10:30 hashar: Zuul deadlocked :(
  • 10:12 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Reduce db1071 load (duration: 00m 48s)
  • 09:57 moritzm: installing libidn security updates on jessie systems
  • 09:09 gehel: shutting down elasticsearch on codfw for upgrade (T133125)
  • 08:44 volans: Align thread_pool_max_threads to my.cnf value on 1 slave/shard in eqiad (db1065,db1076,db1078,db1040,db1026,db1061,db1039) T133333
  • 08:27 gehel: starting elasticsearch upgrade on codfw (T133125)
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon May 30 02:30:46 UTC 2016 (duration 5m 52s)
  • 02:24 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 09m 04s)

2016-05-29

  • 02:58 urandom: Bootstrapping restbase2004-b.codfw.wmnet : T134016
  • 02:27 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun May 29 02:27:27 UTC 2016 (duration 5m 27s)
  • 02:22 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 08m 35s)

2016-05-28

  • 19:47 hoo: Updated Wikidata's property suggester with data from Monday's json dump and removed the external identifiers as a workaround for T132839
  • 13:59 urandom: Bootstrapping restbase1011-c.eqiad.wmnet : T134016
  • 02:26 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat May 28 02:26:10 UTC 2016 (duration 5m 47s)
  • 02:20 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 08m 20s)

2016-05-27

  • 23:58 ori: Forcing a Puppet run on logstash*
  • 23:44 legoktm: ran extensions/GlobalBlocking/fixGlobalBlockWhitelist.php for T56496
  • 23:15 logmsgbot: legoktm@tin Synchronized php-1.28.0-wmf.3/includes/actions/RollbackAction.php: RollbackAction: Don't return true, causes '1' to be output (duration: 00m 34s)
  • 22:50 legoktm: mwscript initSiteStats.php --wiki=lrcwiki --update for T109635
  • 19:49 logmsgbot: krinkle@tin Synchronized php-1.28.0-wmf.3/includes/Linker.php: Iba17ce55ff9 (duration: 00m 25s)
  • 19:49 logmsgbot: krinkle@tin Synchronized php-1.28.0-wmf.3/includes/actions/RollbackAction.php: Iba17ce55ff9 (duration: 00m 31s)
  • 19:06 paravoid: un-draining esams
  • 18:15 thcipriani: restarted zuul due to deadlock issue
  • 17:10 volans: Stop slave, stop mysql and shutdown es2017 and es2019 for hardware maintenance T130702
  • 16:35 dcausse: elasticsearch in codfw: creating jamwiki index
  • 16:34 volans: Align runtime MySQL max_connections on codfw masters with the my.cnf ones T133333
  • 16:04 paravoid: shutting down ms-fe3001/ms-fe3002
  • 15:26 paravoid: draining esams for network maintenance
  • 15:02 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/CentralNotice/resources/subscribing: rv due to T136387 (duration: 00m 36s)
  • 14:36 paravoid: restarting pybal on lvs3003/lvs3004
  • 14:19 mobrovac: change-prop deployed 3747ebd
  • 13:54 urandom: Bootstrapping restbase2003-b.codfw.wmnet : T134016
  • 12:27 hashar: CI/Zuul deadlocked quickly due to a dependency set on a repository not known to Zuul
  • 12:22 volans: Align runtime MySQL configurations on codfw slaves with the my.cnf ones T133333
  • 11:27 elukey: restarted jmxtrans on kafka10* hosts
  • 10:11 volans: restarting MySQL on db2038 to test change 286858 - T133333
  • 10:04 moritzm: disabled firejail test on mw1153; all went well, but reverting since it's Friday, will enable it along with the other image scalers on Monday
  • 09:14 moritzm: enable firejail for image scaling on mw1153 as a canary
  • 08:57 volans: Set sync_binlog=1 on db2011 (m2) and monitoring it. T133333
  • 08:00 elukey: restarted memcached on mc1009 to collect metrics for T129963
  • 07:04 _joe_: updating HHVM on the remaining hosts (mira, wasat, snapshot1*)
  • 02:33 ejegg: rolled back payments from de0398a244094f2bad6bc70eefce8388a616e575 to e6807395d7687d521070b83d159b77b242e5c04f
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri May 27 02:31:01 UTC 2016 (duration 9m 12s)
  • 02:21 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 07m 50s)
  • 01:18 ejegg: updated payments from e6807395d7687d521070b83d159b77b242e5c04f to de0398a244094f2bad6bc70eefce8388a616e575
  • 00:48 mutante: purging pk.wikimedia.org from varnish, cache_text non-eqiad backends, then frontends
  • 00:48 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTargetLoader.js: https://gerrit.wikimedia.org/r/#/c/291145/ (duration: 00m 23s)
  • 00:30 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaEvents/extension.json: https://gerrit.wikimedia.org/r/#/c/291143/1 (duration: 00m 28s)
  • 00:28 mutante: purging pk.wikimedia.org from varnish, cache_text eqiad backends

2016-05-26

  • 23:56 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/CentralNotice/resources/subscribing: https://gerrit.wikimedia.org/r/#/c/291120/1 (duration: 00m 24s)
  • 23:54 ejegg: updated SmashPig from f0bf4385afac65a27f99c5f657c3d0931c991fa8 to 90757321a3bfa1045202e06e3dd1960a0043493a
  • 23:40 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/dm/models/ve.dm.MWTransclusionModel.js: https://gerrit.wikimedia.org/r/#/c/290994/ (duration: 00m 25s)
  • 23:36 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/Math/modules/ve-math/ve.ui.MWMathContextItem.js: touch (duration: 00m 27s)
  • 23:33 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/Math/modules/ve-math/ve.ui.MWMathContextItem.js: https://gerrit.wikimedia.org/r/#/c/290971/ (duration: 00m 28s)
  • 23:14 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.3/extensions/OATHAuth/special/SpecialOATHEnable.php: https://gerrit.wikimedia.org/r/#/c/291007/ (duration: 00m 39s)
  • 22:30 urandom: Bootstrapping restbase2007-c.codfw.wmnet : T134016
  • 21:59 logmsgbot: ori@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: 8521b7b069: Send edit stash metrics for cache attempts (duration: 00m 25s)
  • 21:50 logmsgbot: legoktm@tin Synchronized php-1.28.0-wmf.2/includes/api/ApiStashEdit.php: Bail out in ApiStashEdit for bots for sanity (duration: 00m 24s)
  • 21:49 logmsgbot: legoktm@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: Bail out in ApiStashEdit for bots for sanity (duration: 00m 25s)
  • 21:43 logmsgbot: legoktm@tin Synchronized php-1.28.0-wmf.3/includes/title/MediaWikiTitleCodec.php: TitleParser: In formatTitle(), don't throw exceptions on bad namespaces - T136352, T136356 (duration: 00m 28s)
  • 21:12 mutante: running update-ubuntu-mirror on carbon to check for T136307
  • 20:38 logmsgbot: twentyafterfour@tin Synchronized php-1.28.0-wmf.3/includes/specials/SpecialSearch.php: deploy hotfix for itwiki search T136356 (duration: 00m 23s)
  • 19:41 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.3
  • 19:31 logmsgbot: aaron@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: 9a9ec26d25 (duration: 00m 24s)
  • 19:28 Dereckson: mwscript initSiteStats.php --wiki fowiki --update (T136353)
  • 17:56 mdholloway: mobileapps finished deploying 5ce4f31 (n.b. last deployment, on 23 May, appears to have re-deployed b8c396a)
  • 17:37 mdholloway: starting mobileapps deployment
  • 15:45 jynus: applying schema change to s3 hosts echo wikis T135699
  • 15:31 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.3/resources/src/mediawiki.special/mediawiki.special.search.css: SWAT: [[gerrit:290710|Fix regression: text color in .mw-search-result-data]] (duration: 00m 27s)
  • 15:27 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: CommonSettings: cleanup temp cache file if rename fails (duration: 00m 30s)
  • 15:20 kart_: Update cxserver to b431aef
  • 15:12 urandom: Starting cleanup of restbase1012-a.eqiad.wmnet
  • 15:07 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/EventBus: SWAT: Use getPrefixedURL and getPrefixedDBkey instead of getText (duration: 00m 35s)
  • 15:06 urandom: Bootstrapping restbase1014-c.eqiad.wmnet : T134016
  • 13:52 jynus: restarting es2017 for kernel upgrade
  • 12:39 jynus: powercycling es2017
  • 12:31 jynus: updating user table on labswiki to fix incorrect encoding T131630
  • 12:07 moritzm: rolling reboot of restbase-test cluster
  • 11:55 moritzm: rebooting mx2001 for kernel update to Linux 4.4
  • 09:58 _joe_: starting updateCollations.php forced run on all wikis with uca category collation
  • 09:28 _joe_: all traffic serving appservers are now running with libicu52 (T86096)
  • 09:07 jynus: converting user table on labswiki to utf8
  • 09:05 mobrovac: restbase deployment end of bd38b1b
  • 08:58 logmsgbot: dcausse@tin Synchronized wmf-config/CirrusSearch-labs.php: Cirrus: disable the safeifier in labs (duration: 00m 25s)
  • 08:55 mobrovac: restbase deployment start of bd38b1b
  • 08:53 logmsgbot: dcausse@tin Synchronized wmf-config/CirrusSearch-labs.php: Cirrus: disable the safeifier in labs (duration: 02m 36s)
  • 08:45 moritzm: powercycling snapshot1004 (stuck after reboot)
  • 08:36 moritzm: installing openssh security updates on trusty systems
  • 08:15 _joe_: upgrading hhvm on eqiad's appserver cluster, (T86096)
  • 07:50 _joe_: upgrading hhvm on eqiad's api cluster, (T86096)
  • 07:43 mobrovac: restbase starting partial mobile-sections dump of enwiki for T135571 on restbase1009
  • 07:36 _joe_: upgrading hhvm on eqiad jobrunners, tin + terbium (T86096)
  • 07:31 dcausse: elastic: updating cirrussearch warmers on eqiad and codfw
  • 07:29 _joe_: upgrading hhvm on the eqiad imagescalers, T86096
  • 06:41 _joe_: upgrading hhvm on the eqiad canaries, T86096
  • 05:50 _joe_: starting upgrades of hhvm to newer libicu in codfw (T86096)
  • 03:05 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu May 26 03:05:45 UTC 2016 (duration 9m 27s)
  • 02:56 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 15m 49s)
  • 02:24 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 33s)
  • 01:47 mutante: mw1249 - restart hhvm
  • 00:57 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaMessages/i18n/wikimedia: Add i18n messages for new Support and Safety group (duration: 00m 26s)
  • 00:52 logmsgbot: aaron@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiMain.php: 01e68e966413c (duration: 00m 29s)
  • 00:27 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu May 26 00:27:50 UTC 2016 (duration 10m 15s)
  • 00:17 logmsgbot: dereckson@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 16m 15s)

2016-05-25

  • 23:46 logmsgbot: dereckson@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 12s)
  • 23:44 urandom: Bootstrapping restbase2005-b.codfw.wmnet : T134016
  • 22:05 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: wikidata back to 1.28.0-wmf.3
  • 22:04 twentyafterfour: re-re-verting wikidata back to wmf.3
  • 22:00 logmsgbot: twentyafterfour@tin Synchronized php-1.28.0-wmf.3: (no message) (duration: 08m 18s)
  • 21:52 twentyafterfour: syncing https://gerrit.wikimedia.org/r/#/c/290789/ and https://gerrit.wikimedia.org/r/#/c/290799/
  • 19:31 urandom: Start cleanup of restbase2001-b.codfw.wmnet : T134016
  • 19:14 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: roll back wikidata to wmf.2
  • 19:07 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.3
  • 18:58 mobrovac: restbase running a partial mobile-sections dump of eswiki for T135571 on restbase1009
  • 18:25 bblack: restarting varnish-frontend on cp3048 with lg_dirty_multi:6 unpuppetized - T135384
  • 18:04 logmsgbot: aaron@tin Synchronized wmf-config/db-eqiad.php: Fix slave lag wait calls for read-only ES clusters (duration: 00m 27s)
  • 18:03 logmsgbot: aaron@tin Synchronized wmf-config/db-codfw.php: Fix slave lag wait calls for read-only ES clusters (duration: 00m 23s)
  • 18:02 logmsgbot: aaron@tin scap aborted: file wmf-config/db-codfw.php Fix slave lag wait calls for read-only ES clusters (duration: 02m 53s)
  • 18:00 logmsgbot: aaron@tin Started scap: file wmf-config/db-codfw.php Fix slave lag wait calls for read-only ES clusters
  • 17:56 bblack: rolling restart of global text frontend memory caches for upsizing - reduce spacing to 5 mins, will finish notably faster - T135384
  • 17:25 logmsgbot: dpatrick@tin Synchronized wmf-config/InitialiseSettings.php: Enabling OATHAuth on CentralAuth wikis (duration: 00m 24s)
  • 16:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1056 after hardware maintenance (duration: 00m 23s)
  • 16:13 jynus: reloading haproxy config to repool db1047
  • 16:13 volans: kill very long query on db1047 (ID 89274525, client disconnected) T136214
  • 15:54 hashar: puppet fails on gallium (Precise) due to E: Unable to locate package firejail
  • 15:33 bblack: rolling restart of global text frontend memory caches for upsizing - 15 min spacing, ~8H to completion - T135384
  • 15:28 jynus: applying third schema change on x1 hosts T135699
  • 15:27 _joe_: imaging mw1261, with debian jessie
  • 15:25 logmsgbot: thcipriani@tin Synchronized portals: SWAT: T136019 updating portal stats. (duration: 00m 26s)
  • 15:24 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: SWAT: T136019 updating portal stats. (duration: 00m 23s)
  • 15:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Math: Enable MathML on all wikibooks (duration: 00m 29s)
  • 15:01 mobrovac: restbase cassandra truncated local_group_globaldomain_T_mathoid_svg.data
  • 15:00 urandom: Bootstrap restbase1012-c.eqiad.wmnet : T134016
  • 14:50 moritzm: restarted hhvm on mw1116 and mw1117 (got stuck, output of hhvm-dump-debug available)
  • 14:25 jynus: shutting down db1056 for hardware maintenance
  • 14:24 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1056 for hardware maintenance (duration: 00m 24s)
  • 14:23 urandom: Starting cleanup on restbase1010-a.eqiad.wmnet : T134016
  • 13:31 logmsgbot: oblivian@tin Synchronized wmf-config/InitialiseSettings.php: revert uca-ta collation (duration: 00m 31s)
  • 11:12 mobrovac: restbase deploy end of 8f39fa4
  • 11:02 mobrovac: restbase deploy start of 8f39fa4
  • 09:42 elukey: restarted gmetad on uranium to test if new memcached metrics would be picked up (T129963)
  • 09:26 akosiaris: mw1114's logs and ganglia indicate OOM error.
  • 09:19 akosiaris: powercycling mw1114
  • 09:02 volans: Apply grant for tendril on labservices1002 T106303
  • 09:02 akosiaris: upload php5_5.3.10-1ubuntu3.23+wmf1 on apt.wikimedia.org/precise-wikimedia
  • 08:39 jynus: perform second schema change on x1 wikis for T135699
  • 08:27 moritzm: restarting apache on uranium
  • 08:08 moritzm: installing php5 security updates on trusty systems
  • 07:41 jynus: rm runJobs.log-20160[1-3]*.gz on fluorine archive log
  • 02:00 mutante: reboot unresponsive mw1140
  • 01:48 logmsgbot: dereckson@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 15m 24s)
  • 00:22 logmsgbot: dereckson@tin Synchronized /srv/mediawiki-staging/php-1.28.0-wmf.3/extensions/Flow/maintenance/FlowRemoveOldTopics.php: Don't assume workflows/revisions are inserted in chronological order (T119509) (duration: 00m 23s)
  • 00:16 logmsgbot: dereckson@tin Synchronized /srv/mediawiki-staging/php-1.28.0-wmf.2/extensions/Flow/maintenance/FlowRemoveOldTopics.php: Don't assume workflows/revisions are inserted in chronological order (T119509) (duration: 00m 28s)

2016-05-24

  • 23:58 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.2/extensions/MobileFrontend/tests/phpunit/MobileFormatterTest.php: "Pi" article on mobile en.wp throws a 503 fatal (no-op, Gerrit:290587) (duration: 00m 23s)
  • 23:58 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.2/extensions/MobileFrontend/includes/MobileFormatter.php: "Pi" article on mobile en.wp throws a 503 fatal (T135923, Gerrit:290587 + Gerrit:290598) (duration: 00m 24s)
  • 23:46 Dereckson: scap pull on mw1140 (duration: 02m 42s)
  • 23:20 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set Tamil projects to use uca-ta collation (T75453) (duration: 02m 18s)
  • 23:07 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Adding WMF Support and Safety user groups to meta (T136046) (duration: 00m 26s)
  • 22:29 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.3
  • 22:25 logmsgbot: twentyafterfour@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 00m 37s)
  • 22:21 logmsgbot: twentyafterfour@tin Finished scap: sync testwiki to wmf/1.28.0-wmf.3 (duration: 41m 48s)
  • 21:40 logmsgbot: twentyafterfour@tin Started scap: sync testwiki to wmf/1.28.0-wmf.3
  • 21:24 volans: set innodb_flush_log_at_trx_commit=0 on db1056 that is lagging behind as a temporary measure
  • 21:13 Krinkle: mwscript deleteEqualMessages.php --wiki mrwiki (T45917)
  • 21:08 mutante: mw1137,mw1146 restarted hhvm service
  • 20:34 mutante: ocg1003 - reinstalled, replaced puppet cert, salt key..re-added
  • 20:07 Krinkle: mwscript deleteEqualMessages.php --wiki diqwiki (T45917)
  • 19:55 Krinkle: mwscript deleteEqualMessages.php --wiki cywiktionary (T45917)
  • 19:50 mutante: mw1133 - Could not allocate memory, restarted hhvm service, ran puppet
  • 19:48 mutante: mw1142 - Could not allocate memory on puppet run, restarted hhvm service, that fixed it
  • 19:44 Krinkle: mwscript deleteEqualMessages.php --wiki warwiki (T45917)
  • 19:14 mutante: ocg1003, powercycle for reinstall, scheduled downtime
  • 19:06 urandom: Performing cleanup on restbase2001-a.codfw.wmnet
  • 19:05 urandom: Performing cleanup on restbase2003-a.codfw.wmnet
  • 18:59 urandom: Performing cleanup on restbase2008-a.codfw.wmnet
  • 18:55 urandom: Performing cleanups on restbase2004.codfw.wmnet
  • 18:28 urandom: Starting bootstrap of restbase1010-c.eqiad.wmnet : T134016
  • 18:08 urandom: Starting bootstrap of restbase2006-b.codfw.wmnet : T95253
  • 17:38 elukey: soft rebooted mw1134 due to unresponsiveness (no root login, no ssh login, memory exhaustion from server-board)
  • 17:22 yurik: deployed & restarted kartotherian & tilerator. https://gerrit.wikimedia.org/r/#/c/290494/ https://gerrit.wikimedia.org/r/#/c/290497/
  • 16:11 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Compact Language Links as default in Beta PART II (duration: 00m 25s)
  • 16:10 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Enable Compact Language Links as default in Beta PART I (duration: 00m 23s)
  • 16:05 Krinkle: mwscript deleteEqualMessages.php --wiki fiwikinews (T45917)
  • 15:57 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Enable Compact Language Links as default in Beta PART II (duration: 00m 25s)
  • 15:57 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Compact Language Links as default in Beta PART I (duration: 00m 23s)
  • 15:26 _joe_: shutting down mw2001-2060
  • 15:11 logmsgbot: thcipriani@tin Synchronized wmf-config/Wikibase.php: SWAT: Make entityNamespaces setting available to Wikibase Client wikis (duration: 00m 29s)
  • 15:01 _joe_: revoking certs, puppet facts, salt keys for mw2001-60
  • 14:01 jynus: dropping labswiki.bounce_records on db1-BETA
  • 13:29 elukey: restarted hhvm on mw1145 (ran hhvm-dump-debug, output available in hhvm.14155.bt)
  • 13:23 moritzm: reverted net.netfilter.nf_conntrack_tcp_timeout_time_wait on kafka1013 back to 65 (as it should have been set by sysctl.d)
  • 12:49 elukey: stopping kafka on kafka1013 and rebooting the host for kernel upgrade
  • 12:33 jynus: deploying GTID to all codfw db production hosts
  • 12:10 _joe_: cleaning puppet facts, salt keys for mw2001-2060
  • 11:53 mobrovac: restbase restarting nodes to pick up https://gerrit.wikimedia.org/r/#/c/290264/
  • 11:22 jynus: enabling GTID on s1 codfw db servers
  • 11:21 moritzm: rolling restart of elasticsearch in logstash cluster to pick up openjdk security update
  • 10:50 _joe_: disabling puppet on mw2001-60 (minus 2017) for decommissioning
  • 10:32 jynus: creating backup of beta database just in case T119567
  • 10:05 Nemo_bis: Phabricator search by keyword (aka field "contains words") has been down about 20 min, looks ok now.
  • 09:51 godog: reenable puppet on graphite1001 T135385
  • 09:51 jynus: applying schema change to x1 hosts T135699
  • 09:18 godog: reboot restbase2003 for multi-instance conversion T113714
  • 08:45 godog: reboot restbase2006 for multi-instance conversion T113714
  • 08:12 moritzm: enabled base::firewall on potassium (pool counter)
  • 07:56 mobrovac: restbase deploy end of a5d00d1
  • 07:33 mobrovac: restbase deploy start of a5d00d1
  • 07:17 mobrovac: change-prop deploying 20eda89
  • 02:25 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 11m 38s)

2016-05-23

  • 23:36 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on frwikivoyage (T135702) (duration: 00m 25s)
  • 23:26 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.2/extensions/Flow/maintenance/FlowRemoveOldTopics.php: More reliable post sorting (T119509, 2/2) (duration: 00m 34s)
  • 23:24 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.2/extensions/Flow/includes/Data/Index/BoardHistoryIndex.php: More reliable post sorting (T119509, 1/2) (duration: 00m 26s)
  • 21:36 andrewbogott: reimaging holmium to labservices1002.
  • 21:14 ejegg: updated payments from 76b5c559e08b340fb3d66254b27f624a9c0a4b95 to e6807395d7687d521070b83d159b77b242e5c04f
  • 21:11 ejegg: rolled back payments to 76b5c559e08b340fb3d66254b27f624a9c0a4b95
  • 21:09 ejegg: updated payments from 76b5c559e08b340fb3d66254b27f624a9c0a4b95 to e6807395d7687d521070b83d159b77b242e5c04f
  • 20:52 awight: Update SmashPig from 5cadcf3abcfcda4552b068c783337d82b743e2e5 to f0bf4385afac65a27f99c5f657c3d0931c991fa8
  • 20:36 bearND: mobileapps deployed cd76f5a
  • 20:18 bearND: starting mobileapps deploy
  • 19:55 gehel: putting wdqs1001 out of maintenance
  • 19:54 gehel: deployed latest WDQS version
  • 19:35 hashar: killed all mysqld processes on Trusty CI slaves
  • 19:32 gehel: putting wdqs1001 in maintenance to fix deployment issues
  • 19:11 legoktm: manually kicking stuck global renames (T135656)
  • 19:01 awight: rollback SmashPig from f0bf4385afac65a27f99c5f657c3d0931c991fa8 to 5cadcf3abcfcda4552b068c783337d82b743e2e5
  • 19:01 awight: update SmashPig from 5cadcf3abcfcda4552b068c783337d82b743e2e5 to f0bf4385afac65a27f99c5f657c3d0931c991fa8
  • 18:50 awight: Rollback SmashPig from aa1614afa845358669208c2f6c4cd62e83a98f4c to 5cadcf3abcfcda4552b068c783337d82b743e2e5
  • 18:49 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/290290 (duration: 00m 38s)
  • 18:39 awight: Disable Adyen SmashPig runner and just stop spamming the server admin log.
  • 18:39 awight: Reenable Adyen SmashPig job runner
  • 18:38 awight: Paused Adyen SmashPig job runner
  • 18:34 awight: update SmashPig from 5cadcf3abcfcda4552b068c783337d82b743e2e5 to aa1614afa845358669208c2f6c4cd62e83a98f4c
  • 18:25 andrewbogott: temporarily turning off pdns and recursor on holmium (https://phabricator.wikimedia.org/T106303)
  • 18:10 gehel: removing maintenance from wdqs1001
  • 17:35 gehel: putting wdqs1001 in maintenance to fix deployment issues
  • 17:13 andrewbogott: rebooting labvirt1003
  • 17:10 elukey: restarting Yarn Resource manager (master node) on analytics1001 to apply a new Spark configuration. The service will automatically failover to analytics1002
  • 16:45 bd808: Stashbot back online. Will continue to monitor for a while to see if ES cluster is happier.
  • 16:32 bd808: Stashbot down due to backing elasticsearch cluster instability. Investigating.
  • 16:28 urandom: Bouncing RESTBase on restbase-test200[1-3].codfw.wmnet
  • 16:12 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.2/extensions/Wikidata/extensions/Wikibase/client/includes/Hooks/DataUpdateHookHandlers.php: Update Wikidata - fix file deletion issue on commons (duration: 00m 29s)
  • 16:02 elukey: restarting yarn on analytics10* hosts to pick up the new Spark shuffler process
  • 16:02 volans: testing thread_pool_max_threads=2000 on db1072 (s1) [instead of db1076 (s2)] T133333
  • 15:47 volans: testing thread_pool_max_threads=2000 on db1076 (s2) T133333
  • 15:42 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Final Commons configuration for $wgUploadDialog (duration: 00m 28s)
  • 15:32 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: revert Final Commons configuration for $wgUploadDialog (duration: 00m 28s)
  • 15:29 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Final Commons configuration for $wgUploadDialog (duration: 00m 30s)
  • 15:21 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set interwiki sorting order for West Frisian Wikibooks (duration: 00m 25s)
  • 15:13 jynus: performing schema change on s3 T130692
  • 15:11 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Creation of page mover userright on enwiki (duration: 00m 30s)
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Adjust groups permissions on fa.wikipedia (duration: 00m 41s)
  • 14:44 godog: reboot restbase2005 in single user mode for T113714
  • 14:43 logmsgbot: filippo@palladium conftool action : set/pooled=no; selector: restbase2005.codfw.wmnet
  • 12:57 moritzm: rolling restart of cassandra on maps-test cluster for openjdk security update
  • 12:44 mobrovac: restbase restarting to apply https://gerrit.wikimedia.org/r/#/c/289092/
  • 12:33 jynus: stopping, backing up and reimaging db1016 T135973 (it will also affect db2010 lag)
  • 12:15 logmsgbot: filippo@palladium conftool action : set/pooled=yes; selector: restbase2009.codfw.wmnet
  • 12:15 logmsgbot: filippo@palladium conftool action : set/pooled=yes; selector: restbase2008.codfw.wmnet
  • 12:15 logmsgbot: filippo@palladium conftool action : set/pooled=yes; selector: restbase2007.codfw.wmnet
  • 11:42 mobrovac: restbase deploying 75a94ee to restbase2009
  • 11:24 godog: run puppet and roll-restart cassandra-metrics-collector on restbase codfw/eqiad
  • 11:20 godog: deploy new version of cassandra-metrics-collector T135385
  • 10:57 moritzm: restarting hhvm on app servers in codfw for librsvg update
  • 10:32 _joe_: running updateCollations.php --force on ptwiki, T58041
  • 10:24 moritzm: rolling restart of restbase1* for openjdk-8 update
  • 10:00 moritzm: reverted net.netfilter.nf_conntrack_tcp_timeout_time_wait on kafka1013 back to 65 (as set by default by puppet)
  • 08:57 jynus: stopping, backing up and reimaging db2010 T135973
  • 08:44 moritzm: rolling restart of restbase2* for openjdk-8 update
  • 08:20 moritzm: install restarting hhvm on canary systems for librsvg update
  • 08:01 jynus: performing schema change on s7 T130692
  • 07:00 _joe_: installed the new hhvm package on mw2017, T86096
  • 06:55 _joe_: uploaded a new hhvm package for trusty linked to libicu52, T86096
  • 06:42 elukey: Removed Kafka temp. override for webrequest_upload retention.ms after freeing some disk space.
  • 06:31 elukey: Set kafka retention.ms=172800000 for the topic webrequest_upload to free some disk space on kafka1022
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon May 23 02:33:00 UTC 2016 (duration 8m 38s)
  • 02:24 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 41s)

2016-05-22

  • 17:16 jynus: defragmenting db1028
  • 14:38 jynus: trying to restart kraz and planet2001 (both service and console unresponsive)
  • 14:02 jynus: performing schema change on s6 T130692
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun May 22 02:31:13 UTC 2016 (duration 8m 45s)
  • 02:22 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 09m 57s)

2016-05-21

  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat May 21 02:32:13 UTC 2016 (duration 8m 44s)
  • 02:23 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 22s)
  • 01:24 ejegg: updated payments from da453121d4086d6c1218d603149e4f568db321b5 to 76b5c559e08b340fb3d66254b27f624a9c0a4b95
  • 00:34 ejegg: updated payments-wiki from 0b2d6807f68e7ccfc36a431e90d563b4d5a9e966 to da453121d4086d6c1218d603149e4f568db321b5

2016-05-20

  • 20:36 chasemp: restart rabbitmq on labcontrol1001
  • 17:05 moritzm: uploaded librsvg 2.40.2-1+wm2 for trusty-wikimedia to carbon (backported patches from librsvg DSA to our custom trusty build)
  • 16:54 bd808: Cleaned up /tmp/mw-cache-1.27.0-wmf.2* cache files on tin
  • 16:50 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri May 20 16:50:25 UTC 2016 (duration 9m 38s)
  • 16:40 logmsgbot: bd808@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 06s)
  • 16:18 bd808: kicking off manual l10nupdate run on tin
  • 16:11 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 20990m 58s)
  • 15:52 jynus: performing schema change on s5 T130692
  • 15:16 godog: roll-restart cassandra-metrics-collector in eqiad for T135385
  • 15:05 gehel: starting cluster rejoining for cassandra on maps-test2001
  • 15:03 godog: roll-restart cassandra-metrics-collector in codfw for T135385
  • 15:02 moritzm: uploaded librsvg 2.40.5-1+deb8u2+wmf1 for jessie-wikimedia to carbon (rebase of locally patched package on top of latest security update)
  • 14:50 gehel: shutting down kartotherian on maps-test2001 (accidental data deletion)
  • 14:22 gehel: cassandra downgraded on maps2*.codfw.wmnet
  • 14:04 _joe_: removing libicu48 from trusty archives, kept a copy of the packages in my homedir on carbon
  • 13:45 mark: Enabled cr2-codfw et-0/* interfaces, reenabling OSPF/OSPF3
  • 13:38 mark: Bringing cr2-codfw FPC 0 back up
  • 13:37 moritzm: upgraded java on xenon/praseodymium/cerium and restbase2001 to latest openjdk-8 release (along with restarts of Cassandra)
  • 13:37 mark: Offlining cr2-codfw FPC 0
  • 13:35 jynus: changing dbstore1001 to be a direct slave of db1075
  • 13:31 mark: Disabling cr2-codfw et-0/* interfaces
  • 13:24 mark: Disabling OSPF on all cr2-codfw row subnets to drain FPC0
  • 13:14 mark: Lowering VRRP priority to 50 on all VRRP groups on cr2-codfw to drain FPC0
  • 13:07 mark: Performing acupuncture on cr2-codfw:ae4.2020 (Lowered VRRP priority from 100 to 50, inet/inet6)
  • 13:05 jynus: freeing up space on db1038 by defragmenting its tables
  • 12:31 moritzm: rolling restart of nginx in eqiad to pick up expat update
  • 12:22 moritzm: rolling restart of nginx in codfw to pick up expat update
  • 12:16 elukey: restarting cassandra on aqs100[123] for Java upgrades
  • 12:02 moritzm: rolling restart of nginx in esams to pick up expat update
  • 11:49 moritzm: rolling restart of nginx in ulsfo to pick up expat update
  • 10:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Increase db1029 and db1033 weight back to normal after upgrade (duration: 01m 52s)
  • 09:00 jynus: altering db1040 commonswiki.categorylinks
  • 08:50 elukey: upgrading cassandra from 2.1.12 to 2.1.13 on aqs1003.eqiad.wmnet
  • 08:45 awight: Update payments fraud config
  • 08:11 elukey: upgrading cassandra from 2.1.12 to 2.1.13 on aqs1002.eqiad.wmnet
  • 08:08 mobrovac: mathoid deploying 243a530
  • 07:36 jynus: testing metadata lock detection on db1069
  • 06:11 bblack: restarted gdnsd on eeden.esams (with new config, esams marked down)
  • 05:58 bblack: gdnsd stopped on eeden.esams, puppet disabled
  • 05:56 volans: Killed transaction 3262258 on db1040 (alter table stuck in "Waiting for table metadata lock" blocking the replica) T130692
  • 04:24 moritzm: restarted hhvm on mw1014 (got stuck, output of hhvm-dump-debug available)
  • 00:52 mutante: stashbot test (T122690)
  • 00:07 logmsgbot: dereckson@tin Finished scap: Revert "Convert Special:WhatLinksHere from XML form to OOUI form" (Gerrit:289772, T135773) (duration: 34m 12s)

2016-05-19

  • 23:33 logmsgbot: dereckson@tin Started scap: Revert "Convert Special:WhatLinksHere from XML form to OOUI form" (Gerrit:289772, T135773)
  • 23:29 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Drop already-enabled VisualEditorNewAccountEnableProportion wikis (duration: 00m 27s)
  • 23:25 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Move VisualEditor to secondary status on English Wikipedia (T132806) (duration: 00m 29s)
  • 23:13 logmsgbot: dereckson@tin Synchronized wmf-config/throttle.php: Add Account throttle exception for SF Edit-a-thon (T135777) (duration: 00m 27s)
  • 23:10 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings.php: Enable experimental Video.js player on test2wiki (2/2) (duration: 00m 27s)
  • 23:10 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable experimental Video.js player on test2wiki (1/2) (duration: 00m 26s)
  • 23:09 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable experimental Video.js player on test2wiki (1/2) (duration: 00m 30s)
  • 22:13 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: rv for now, have another idea for later (duration: 00m 40s)
  • 21:28 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: more debug logs (duration: 00m 25s)
  • 21:02 yurik_: deployed tilerator service update https://gerrit.wikimedia.org/r/#/c/289736/
  • 21:01 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: (no message) (duration: 00m 27s)
  • 20:50 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: (no message) (duration: 00m 25s)
  • 20:49 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: update to the debug logging to try to find other params (duration: 00m 28s)
  • 20:45 urandom: RESTBase staging Cassandra upgrade complete : T126629
  • 20:44 urandom: restbase-test2003.codfw.wmnet upgraded and online : T126629
  • 20:42 urandom: Upgrading Cassandra to 2.2.6 on restbase-test2003.codfw.wmnet : T126629
  • 20:42 urandom: Stopping Cassandra on restbase-test2003.codfw.wmnet : T126629
  • 20:38 urandom: restbase-test2002.codfw.wmnet upgraded and online : T126629
  • 20:36 logmsgbot: krenair@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: slight update to the debug logging to try to find the callers (duration: 00m 31s)
  • 20:36 urandom: Upgrading Cassandra to 2.2.6 on restbase-test2002.codfw.wmnet : T126629
  • 20:35 urandom: Stopping Cassandra on restbase-test2002.codfw.wmnet : T126629
  • 20:35 chasemp: iridium: temporarily blocking a vandal by IP, so disabled puppet and edited /etc/apache2/phabbanlist.conf
  • 20:31 urandom: restbase-test2001.codfw.wmnet upgraded and online : T126629
  • 20:29 urandom: Upgrading Cassandra to 2.2.6 on restbase-test2001.codfw.wmnet : T126629
  • 20:28 urandom: Stopping Cassandra on restbase-test2001.codfw.wmnet : T126629
  • 20:19 urandom: praseodymium.eqiad.wmnet upgraded and online : T126629
  • 20:16 urandom: Upgrading Cassandra to 2.2.6 on praseodymium.eqiad.wmnet : T126629
  • 20:16 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Lowered to 5 (duration: 00m 29s)
  • 20:15 urandom: Stopping Cassandra on praseodymium.eqiad.wmnet : T126629
  • 20:00 urandom: Upgrading Cassandra on cerium.eqiad.wmnet, and forcing puppet run : T126629
  • 19:59 urandom: Stopping Cassandra on cerium.eqiad.wmnet for Cassandra 2.2.6 upgrade : T126629
  • 19:32 urandom: Disabling puppet in RESTBase staging : T126629
  • 19:26 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.2
  • 18:38 cscott: updated Parsoid to version 67816adf (T100681, T130638)
  • 18:27 urandom: Enabling puppet on xenon.eqiad.wmnet and forcing run : T126629
  • 18:26 urandom: Upgrading Cassandra on xenon.eqiad.wmnet to 2.2.6 : T126629
  • 18:26 cscott: synced; restarted Parsoid on wtp1001.eqiad as a canary
  • 18:25 urandom: Stopping Cassandra on xenon.eqiad.wmnet : T126629
  • 18:23 cscott: re-attempting parsoid deploy of 67816adf
  • 18:22 cscott: to cleanup Parsoid repos ori ran: salt 'wtp*' cmd.run "sed -i -e '/106801025/d' /srv/deployment/parsoid/deploy/src/lib/api/routes.js"
  • 18:13 cscott: parsoid deploy reverted to parsoid/deploy-sync-20160504-200410 tag (b0d015fa); 21 repos still dirty
  • 18:09 cscott: starting to revert Parsoid deploy due to unresolved dirty repos
  • 18:05 cscott: git-deploy of Parsoid failed with "21/44 minions completed checkout" due to dirty repos; root had applied a patch during the restbase/changeprop/parsoid outage.
  • 17:39 cscott: starting Parsoid deploy (of 67816adf)
  • 17:25 elukey: execute sysctl -w net.netfilter.nf_conntrack_max=512000 on kafka1013 as temporary measure (investigating why conntrack count is higher after leader election) - T135557
  • 17:13 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1033 after maintenance with low weight; increase db1029 weight (duration: 00m 29s)
  • 16:31 volans: Set runtime value for max_allowed_packet, innodb_buffer_pool_dump_at_shutdown, innodb_buffer_pool_load_at_startup to their configured values for s1-s7, es1-es3, x1 T133333
  • 16:31 urandom: Disabling puppet on xenon.eqiad.wmnet in preparation for Cassandra upgrade : T126629
  • 16:29 elukey: upgrading cassandra from 2.1.12 to 2.1.13 on aqs1001.eqiad.wmnet
  • 15:48 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/extensions/VisualEditor/ApiVisualEditor.php: SWAT: Debug log strange-looking ETags being sent to RB (duration: 00m 29s)
  • 15:41 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.2/extensions/VisualEditor/ApiVisualEditor.php: SWAT: Debug log strange-looking ETags being sent to RB (duration: 00m 44s)
  • 15:16 hashar: Restarted zuul-merger daemons on both gallium and scandium : file descriptors leaked
  • 14:20 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool db1029 (x1) with low weight - T112079 (duration: 00m 40s)
  • 14:18 akosiaris: enable puppet on maps-test200{2,3,4}.
  • 14:02 akosiaris: enabled and ran puppet on maps-test2001
  • 13:58 akosiaris: disable puppet on maps-test200{1,2,3,4} for enabling cassandra metrics collection selectively
  • 13:31 chasemp: reboot labstore1003 for kernel upgrade
  • 12:54 godog: bounce carbon-c-relay on graphite1001, run with debug version
  • 12:45 elukey: restarted oozie on analytics1003 for security upgrades
  • 12:28 elukey: restarted hue on analytics1027 for security upgrades
  • 11:26 moritzm: restarting salt-master on neodymium
  • 11:09 ori: dropped negative values from mc_get_hits_rate ganglia metrics for eqiad memcached hosts by running https://phabricator.wikimedia.org/P3138
  • 10:49 volans: db1029 stop, backup and reimage T112079
  • 10:48 jynus: db1033 stop, backup and reimage T134555
  • 10:41 volans: Disable puppet on db1029 for reimaging T112079
  • 10:03 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1033 (s7 old master) & db1029 (x1-slave) for maintenance (duration: 02m 05s)
  • 09:51 moritzm: restarting apache2 on palladium (will impose a few temporary puppet failures)
  • 09:46 moritzm: restarting apache2 on strontium (will impose a few temporary puppet failures)
  • 09:39 hashar: Restarting Jenkins
  • 09:23 kart_: updated cxserver to 4aaec58
  • 09:22 moritzm: restarting apache on neon (hosting icinga) for security update
  • 09:08 moritzm: restarting apache on silver (hosting wikitech) for security update
  • 08:35 hashar: gallium: purging old Linux kernel packages (~2.2Gbytes)
  • 08:27 moritzm: restarting apache on ytterbium (hosting gerrit.wikimedia.org) for security update
  • 08:06 moritzm: rolling restart of hhvm on mediawiki in eqiad to pick up expat security update
  • 07:17 jynus: performing schema change on s4 T130692
  • 07:04 moritzm: installed chromium security updates on osmium
  • 06:11 gehel: completed rolling restart of Elasticsearch codfw for Java update (T135499)
  • 03:10 ejegg|away: updated fundraising tools from 220afdeaa36bc3feaaff1f781e7761d7878c4ee8 to b2425aef2154d6b689900f4848cca02880321230
  • 02:47 ejegg|away: updated misc fundraising tools from e2978024e6f6b6881d087ac5d07e4c40f7374709 to 220afdeaa36bc3feaaff1f781e7761d7878c4ee8
  • 02:06 ejegg: enabled banner history queue consumer
  • 02:02 ejegg: updated civicrm from 7952ba43a012cb6a2e8d16af19bb13ed520bd56f to b7b46740d701942507dca0a98a75f3f87b6b31b1
  • 01:22 twentyafterfour: Phabricator upgrade completed and service restored.
  • 01:15 twentyafterfour: Phabricator deployment T134443 starting momentarily. Downtime should be minimal but there will be a short interruption while the service restarts.

2016-05-18

  • 23:28 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.2/extensions/Echo/: SWAT: fix URLs in notification emails (T135625) (duration: 00m 35s)
  • 23:14 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on outreachwiki (T135582) (duration: 00m 31s)
  • 22:23 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: second attempt: group1 to 1.28.0-wmf.2
  • 22:19 logmsgbot: twentyafterfour@tin Synchronized php-1.28.0-wmf.2/includes/deferred/LinksUpdate.php: deploy https://gerrit.wikimedia.org/r/#/c/289569/ (duration: 00m 29s)
  • 22:00 akosiaris: restart cassandra on maps-test200{2,3,4} to get logstash changes working
  • 21:39 akosiaris: enable puppet on aqs1001 aqs1002 aqs1003 aqs1004 aqs1005 aqs1006 maps-test2001 maps-test2002 maps-test2003 maps-test2004 maps2001 maps2002 maps2003 maps2004
  • 21:32 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: roll back group1 to 1.28.0-wmf.1
  • 21:25 ejegg: updated DjangoBannerStats from 8172c4176ea2b78df0dccbfa052064f2739c64bd to 220f80ece976aecbfbfbe1cb476d79910eb48ff4
  • 20:41 urandom: Re-enabling puppet on restbase production cluster : T126629
  • 20:27 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.2
  • 20:26 yurik: git deployed tilerator, will restart one by one manually. https://gerrit.wikimedia.org/r/#/c/289542/
  • 20:19 urandom: Rolling restart of restbase-test200[1-3].codfw.wmnet : T126629
  • 20:17 urandom: restarting Cassandra on cerium.eqiad.wmnet : T126629
  • 20:16 urandom: restarting Cassandra on praseodymium.eqiad.wmnet : T126629
  • 20:14 twentyafterfour: Deployment of 1.28.0-wmf.2 [T134450] is no longer blocked. Preparing to deploy to group1 wikis.
  • 20:02 urandom: Restarting Cassandra on xenon.eqiad.wmnet post-puppet-run : T126629
  • 19:55 akosiaris: disable puppet on restbase1010 restbase1011 restbase1012 restbase1013 restbase1014 restbase1015 as well
  • 19:51 akosiaris: disabled puppet on aqs1001 aqs1002 aqs1003 aqs1004 aqs1005 aqs1006 maps-test2001 maps-test2002 maps-test2003 maps-test2004 maps2001 maps2002 maps2003 maps2004 restbase1007 restbase1008 restbase1009 restbase2001 restbase2002 restbase2003 restbase2004 restbase2005 restbase2006 restbase2007 restbase2008 restbase2009 for cassandra upgrade
  • 18:51 csteipp: added oathauth-enable right to Staff group on testwiki.
  • 18:40 urandom: Starting bootstrap of restbase2008-b.codfw.wmnet : T95253
  • 18:37 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings.php: Enable OATH on test wikis (duration: 00m 28s)
  • 18:36 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 34s)
  • 18:35 urandom: Starting restbase2008-a.codfw.wmnet : T95253
  • 18:32 urandom: Stopping restbase2008-a.codfw.wmnet and downgrading Cassandra to 2.1.13 : T95253
  • 18:31 urandom: Stopping failed bootstrap of restbase2008-b.codfw.wmnet : T95253
  • 18:14 csteipp: created oathauth_users in centralauth db
  • 17:45 logmsgbot: twentyafterfour@tin Synchronized php-1.28.0-wmf.2/extensions/Renameuser/RenameuserSQL.php: (no message) (duration: 00m 38s)
  • 17:33 moritzm: rolling restart of hhvm on mediawiki canaries in eqiad to pick up expat security update
  • 16:01 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.2/includes/DefaultSettings.php: SWAT: Increase BotPasswordSessionProvider default priority (duration: 00m 26s)
  • 15:52 bblack: restarting varnish frontends on cache_misc + cache_maps to clear cached X-Cache entries
  • 15:43 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.2/extensions/CentralAuth/includes/CentralAuthHooks.php: SWAT: Fix central logout (duration: 00m 26s)
  • 15:32 hashar: Deleted Nodepool snapshot images created around 14:30; apparently they haven't been provisioned properly. Started nodepool again. Poke T135631
  • 15:29 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgDisableAuthManager = true (duration: 00m 25s)
  • 15:18 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets/js/index-1a803501de.js: SWAT: manual js sync for portals (duration: 00m 28s)
  • 15:14 logmsgbot: thcipriani@tin Synchronized portals: (no message) (duration: 00m 31s)
  • 15:14 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 30s)
  • 15:12 hashar: Nodepool / labs instance issue filed as https://phabricator.wikimedia.org/T135631
  • 15:11 logmsgbot: thcipriani@tin Synchronized portals: (no message) (duration: 00m 32s)
  • 15:10 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 33s)
  • 15:10 thcipriani: SWAT: running sync-portals for Updating footer on www.wikipedia.org portal
  • 15:09 hashar: Stopping Nodepool on labnodepool1001 , it can't spawn instances
  • 15:02 hashar: Zuul/Nodepool is out of instances. Looking
  • 14:15 moritzm: rolling restart of hhvm in codfw to pick up expat security update
  • 13:57 chasemp: downtime for dataset1001 puppet runs as T134896 causes failure (temporary for resize)
  • 13:55 jynus: disabling puppet on all database masters to test replication monitoring change T133337
  • 13:29 chasemp: resize volume for nfs dumps per T134896
  • 12:45 gehel: elasticsearch eqiad - reducing high watermark to rebalance disk space
  • 11:41 gehel: starting rolling restart of Elasticsearch codfw for Java update (T135499)
  • 11:41 gehel: rolling restart of Elasticsearch eqiad for Java update completed (T135499)
  • 11:17 moritzm: install expat security updates
  • 10:41 kart_: Updated cxserver to 700dac2
  • 10:39 mobrovac: zotero restarted on sca100, was at 50% mem
  • 10:38 mobrovac: zotero restarted on sca1002, was at 50% mem and 100% cpu
  • 10:37 kart_: Updating cxserver to 700dac2
  • 09:35 moritzm: installing jansson security updates
  • 09:01 moritzm: installing libarchive security updates
  • 08:43 moritzm: installing xerces-c security updates
  • 08:36 jynus: performing schema change on s2 T130692
  • 08:10 gehel: starting rolling restart of Elasticsearch eqiad for Java update (T135499)
  • 07:04 mobrovac: restbase deploy end of 75a94ee
  • 06:54 mobrovac: restbase deploy start of 75a94ee
  • 06:38 moritzm: restarted hhvm on mw1207
  • 03:09 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed May 18 03:09:11 UTC 2016 (duration 9m 40s)
  • 02:59 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 11m 15s)
  • 02:31 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 11m 05s)
  • 01:51 logmsgbot: dereckson@tin Finished scap: Echo: Bring back Echo email messages (Gerrit:289346, for wmf1) (duration: 26m 30s)
  • 01:25 logmsgbot: dereckson@tin Started scap: Echo: Bring back Echo email messages (Gerrit:289346, for wmf1)
  • 01:03 logmsgbot: dereckson@tin Finished scap: Echo: Bring back Echo email messages (duration: 50m 15s)
  • 00:12 logmsgbot: dereckson@tin Started scap: Echo: Bring back Echo email messages
  • 00:09 Dereckson: mwscript namespaceDupes.php jamwiki --fix: 4 pages to fix, 4 were resolvable. (T135479)
  • 00:07 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add Puotal: namespace to jam.wikipedia (T135479) (duration: 00m 27s)
  • 00:01 logmsgbot: dereckson@tin Synchronized wmf-config/: Undeploy Gather extension (T128568) (duration: 00m 33s)

2016-05-17

  • 23:52 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add tasnimnews.com & khamenei.ir to wgCopyUploadsDomains (duration: 00m 36s)
  • 23:24 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.2/extensions/Echo/includes/formatters/EchoHtmlEmailFormatter.php: HTML email footer shows raw HTML (Gerrit:289270) (duration: 00m 31s)
  • 23:14 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings.php: Disable $wgCentralAuthCheckSULMigration functionality (T127887) (duration: 00m 25s)
  • 23:10 logmsgbot: dereckson@tin Synchronized portals: Turning off survey banner on www.wikipedia.org (T135235) (duration: 00m 26s)
  • 23:09 logmsgbot: dereckson@tin Synchronized portals/prod/wikipedia.org/assets: Turning off survey banner on www.wikipedia.org (T135235) (duration: 00m 25s)
  • 22:40 logmsgbot: ori@tin Synchronized php-1.28.0-wmf.1/includes/api/ApiStashEdit.php: Id1e0808c: Improve edit stash hit rate for logged-out users (duration: 00m 32s)
  • 21:45 gehel: stopping rolling restart of elasticsearch cluster for the night (T135499)
  • 21:21 logmsgbot: ori@tin Synchronized php-1.28.0-wmf.1/extensions/NavigationTiming: I62e20087c1: Expand coverage of conformance test (duration: 00m 28s)
  • 21:15 twentyafterfour: finished deploying wmf/1.28.0-wmf.2
  • 21:15 logmsgbot: ori@tin Synchronized php-1.28.0-wmf.2/includes/api/ApiStashEdit.php: Id1e0808c: Improve edit stash hit rate for logged-out users (duration: 00m 35s)
  • 20:53 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.2
  • 20:52 logmsgbot: twentyafterfour@tin Purged l10n cache for 1.27.0-wmf.22
  • 20:51 logmsgbot: twentyafterfour@tin Purged l10n cache for 1.27.0-wmf.21
  • 20:51 logmsgbot: twentyafterfour@tin Purged l10n cache for 1.27.0-wmf.20
  • 20:41 logmsgbot: twentyafterfour@tin Finished scap: (no message) (duration: 27m 42s)
  • 20:27 bblack: rebooting kafka1013 from racadm
  • 20:14 twentyafterfour: syncing testwiki to wmf/1.28.0-wmf.2
  • 20:13 logmsgbot: twentyafterfour@tin Started scap: (no message)
  • 20:09 bblack: starting upgrade of varnish4 packages on cache_maps
  • 20:09 bblack: finished upgrade of varnish4 packages on cache_misc
  • 20:03 ejegg: enabled banner impression loader
  • 20:02 ejegg: updated DjangoBannerStats from 30396a6df5bb9eadd5d4485a08fc4f6cb4096bd9 to 8172c4176ea2b78df0dccbfa052064f2739c64bd
  • 20:00 ejegg: disabled banner impression loader
  • 19:58 bblack: starting upgrade of varnish4 packages on cache_misc
  • 19:01 twentyafterfour: preparing to deploy wmf/1.28.0-wmf.2
  • 18:37 matt_flaschen: Ran manual SQL write in production to work around T122262; see task for query.
  • 18:36 urandom: Restarting (failed) bootstrap of restbase2008-b.codfw.wmnet : T95253
  • 18:34 urandom: Restart restbase2008-a.codfw.wmnet; Hail Mary pass for failed 2008-b bootstraps : T95253
  • 17:22 ejegg: updated CiviCRM from 6e3581692bb81c2c507612e7cd66dcdf78ef3cc0 to 7952ba43a012cb6a2e8d16af19bb13ed520bd56f
  • 16:23 godog: disable mod_deflate and restart apache2 on graphite1001 T135515
  • 16:01 twentyafterfour: branching wmf/1.28.0-wmf.2
  • 15:12 logmsgbot: thcipriani@tin Synchronized portals: (no message) (duration: 00m 25s)
  • 15:12 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 25s)
  • 15:11 thcipriani: running /srv/mediawiki-staging/portals/sync-portals for gerrit:289218
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor for IP users on the Japanese Wikipedia gerrit:287646 (duration: 00m 35s)
  • 14:59 elukey: disabled puppet on aqs* nodes as prep step to bootstrap the new testing cluster (https://gerrit.wikimedia.org/r/#/c/288373/5)
  • 14:46 gehel: taking elastic1001 down for investigation (T135509)
  • 14:37 mobrovac: change-prop deploying 5b5a07a3
  • 14:16 gehel: restarting elastic1001 - high load (T135509)
  • 13:32 hashar: upgrading Jenkins T133737
  • 13:25 godog: bounce carbon/frontend-relay on graphite1001 to increase queue size T135385
  • 13:22 elukey: memcached restarted on mc1009 with -o slab_reassign,slab_automove,lru_crawler,lru_maintainer as part of a perf experiment (T129963)
  • 12:21 gehel: starting rolling restart of Elasticsearch eqiad for Java update (T135499)
  • 12:16 mobrovac: mathoid deployed 10c7cb8
  • 11:04 elukey: updated facts on puppet compiler following https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs
  • 10:59 jynus: disabling puppet and starting decom process of db1027
  • 10:30 jynus: running schema change on s1 T130692
  • 09:59 logmsgbot: filippo@palladium conftool action : set/pooled=yes; selector: mw2050.codfw.wmnet
  • 09:14 ema: upgrading ulsfo cache_misc to varnish 4 (T131501, T134989)
  • 09:02 ema: upgrading esams cache_misc to varnish 4 (T126206, T134989)
  • 08:54 jynus: increased pool stall limit to 500 on db1049
  • 08:53 logmsgbot: filippo@palladium conftool action : set/pooled=no; selector: mw2050.codfw.wmnet
  • 08:31 ema: upgrading codfw cache_misc to varnish 4 (T126206, T134989)
  • 08:15 jynus: reducing durability and enabling GTID on db1026 T135100
  • 08:14 jynus: reducing durability and enabling GTID on db1026
  • 08:13 ema: upgrading eqiad cache_misc to varnish 4 (T126206, T134989)
  • 08:11 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Set db1026 back as rc node; move roles around (duration: 00m 31s)
  • 07:08 _joe_: restarted hhvm on mw1255, stuck in a deadlock on HPHP::Treadmill::getAgeOldestRequest
  • 07:02 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Reduce db1026 load (duration: 00m 37s)
  • 03:44 ori: Upgraded Grafana from 3.0.0-pre1 to 3.0.2
  • 02:37 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue May 17 02:37:35 UTC 2016 (duration 8m 46s)
  • 02:28 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 10m 33s)
  • 01:13 logmsgbot: krenair@tin Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/289118/ (duration: 00m 27s)
  • 00:54 YuviPanda: restarted rabbitmq on labcontrol1001
  • 00:52 YuviPanda: restarted nova-scheduler, let's see if this fixes nodepool
  • 00:52 YuviPanda: restarted nova-conductor a few mins ago, no help for nodepool
  • 00:08 logmsgbot: ebernhardson@tin Finished scap: Full scap to sync out WikimediaMessages update (duration: 25m 31s)
  • 00:04 mutante: created planet2001 ganeti VM on ganeti2001
  • 00:03 mutante: mw1230: restarted hhvm

2016-05-16

  • 23:42 logmsgbot: ebernhardson@tin Started scap: Full scap to sync out WikimediaMessages update
  • 23:38 logmsgbot: ebernhardson@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 00m 34s)
  • 23:37 logmsgbot: ebernhardson@tin Synchronized php-1.28.0-wmf.1/extensions/WikimediaMessages/WikimediaMessages.php: re-add interwiki search result messages (duration: 00m 25s)
  • 23:26 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/includes/EmailFormatter.php: Fix unsubstituted message in emails (duration: 00m 25s)
  • 23:18 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/WikimediaEvents: SWAT (duration: 00m 26s)
  • 23:17 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/UploadWizard: SWAT (duration: 00m 28s)
  • 23:13 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: SWAT (duration: 00m 25s)
  • 23:13 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: SWAT (duration: 00m 25s)
  • 23:09 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: SWAT (duration: 00m 36s)
  • 23:09 logmsgbot: catrope@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT (duration: 00m 25s)
  • 17:06 ejegg: updated payments from 1af7208b4c7387a701f9ce83e33483d1e19213cb to 0b2d6807f68e7ccfc36a431e90d563b4d5a9e966
  • 17:00 godog: bounce cassandra-b on restbase2008-b to restart bootstrap
  • 16:59 ottomata: restarting kafka broker on kafka1022
  • 16:54 ottomata: restarted kafka broker on kafka1020 (about 5 mins ago)
  • 16:36 ottomata: restarting kafka broker on kafka1018
  • 16:23 ottomata: restarting kafka broker on kafka1014
  • 16:15 ottomata: restarting broker on kafka1013
  • 15:42 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add an alias "ΒΠ" for Project namespace in elwiki gerrit:288965 (duration: 00m 26s)
  • 15:36 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Enable MathML rendering by default on test, wikidata and dewikibooks PART II gerrit:286180 (duration: 00m 24s)
  • 15:35 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable MathML rendering by default on test, wikidata and dewikibooks PART I gerrit:286180 (duration: 00m 26s)
  • 15:31 bblack: starting slow cache_upload frontend restarts (wipes) for cache size upgrades (~10H process)
  • 15:29 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add "deletedtext" right to "eliminator" user group on fawiki gerrit:288956 (duration: 00m 26s)
  • 15:25 logmsgbot: thcipriani@tin Synchronized wmf-config/abusefilter.php: SWAT: Enable blocking feature of AbuseFilter on ptwiktionary gerrit:288810 (duration: 00m 24s)
  • 15:14 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set meta namespace names for jamwiki gerrit:288628 (duration: 00m 27s)
  • 15:09 bblack: varnish package upgrade on cache_maps done
  • 15:08 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: wgCopyUploadProxy: Vary per datacenter gerrit:287095 (duration: 00m 26s)
  • 15:01 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase.php: Update Wikidata property blacklist (duration: 00m 25s)
  • 14:55 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Remove all mentions to db1027, db2008 and db2009 (duration: 00m 33s)
  • 14:54 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Remove all mentions to db1027, db2008 and db2009 (duration: 00m 34s)
  • 14:45 mobrovac: change-propagation deployed ef5f6ff55
  • 14:45 bblack: starting varnish package upgrade on cache_maps
  • 14:39 bblack: general package updates on cache clusters (4.4.0 image, libtasn1-6, libjansson4, libidn11)
  • 11:20 jynus: testing create table on s7-master T130700
  • 10:08 jynus: applying puppet and restarting sanitarium instances to apply new filter (temporary labs lag)
  • 09:37 volans: Slowly re-enabling Puppet on db*, es*, pc*, labsdb*, labservices*, silver, holmium after merged change 288361 T133780
  • 09:31 volans: restarted mysql on db2034 to test merged change 288361 , T133780
  • 09:00 volans: Temporarily disabling Puppet on db*, es*, pc*, labsdb*, labservices*, silver, holmium to merge change 288361 T133780
  • 08:48 volans: Temporarily disabling Puppet on db*, es*, pc*, labsdb*, labservices*, silver, holmium to merge change 288361 T133780
  • 08:45 elukey: memcached on mc101[0123] was restarted because puppet applied gerrit/288880 and gerrit/288886 at the same time (operator's fault, of course)
  • 08:29 elukey: puppet disabled on mc* hosts as prep step for https://gerrit.wikimedia.org/r/#/c/288886 (precaution, not really needed)
  • 08:08 elukey: restarted memcached on mc1007 to ensure that https://gerrit.wikimedia.org/r/288880 was applied and working correctly. Will not do the same thing with the other mc hosts.
  • 08:01 elukey: puppet disabled on mc* hosts as prep step for https://gerrit.wikimedia.org/r/#/c/288880 (precaution, not really needed)
  • 07:07 _joe_: restarted hhvm on mw1161, stuck in HPHP::Treadmill::getAgeOldestRequest
  • 06:57 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Take db1027 out of production (duration: 00m 26s)
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon May 16 02:30:49 UTC 2016 (duration 8m 40s)
  • 02:22 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 09m 44s)

2016-05-15

  • 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun May 15 02:29:07 UTC 2016 (duration 8m 44s)
  • 02:20 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 08m 47s)
  • 02:15 YuviPanda: console stuck on labnet1002, powercycling

2016-05-14

  • 02:28 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat May 14 02:28:07 UTC 2016 (duration 8m 42s)
  • 02:19 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 08m 06s)
  • 01:25 logmsgbot: aaron@tin Synchronized php-1.28.0-wmf.1/includes/api/ApiStashEdit.php: f33ed7ae239 (duration: 00m 36s)
  • 00:07 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/includes/api/ApiMain.php: Fix missing parameter for OutputPageCheckLastModified hook (duration: 00m 32s)

2016-05-13

  • 23:54 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/includes/api/ApiEchoNotifications.php: Do not reuse CentralAuth tokens (duration: 00m 25s)
  • 22:54 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/includes/api/ApiEchoNotifications.php: Logging live hack (duration: 00m 31s)
  • 22:50 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/includes/api/ApiEchoNotifications.php: More logging (duration: 00m 25s)
  • 22:38 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Flow/includes/Notifications/Formatter.php: Fix fatal for old Flow notifications (duration: 00m 26s)
  • 22:34 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/: Log failures to fetch cross-wiki notifications (duration: 00m 41s)
  • 21:55 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/Echo.php: Bump cache version now that cache pollution is hopefully fixed (duration: 00m 25s)
  • 21:52 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/: Fixes for cross-wiki notifications deployment fallout (duration: 00m 38s)
  • 21:17 bblack: cache_misc varnish3 downgrade complete (except 3007 + 1045 - those remain depooled/downtimed/puppet-disabled/etc, do not touch)
  • 21:06 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Enable $wgEchoCrossWikiNotifications on the right wikis (unused for now) (duration: 00m 28s)
  • 20:09 logmsgbot: mattflaschen@tin Synchronized wmf-config: Disable cross-wiki notifications entirely on non-SUL wikis and hide preference (duration: 00m 34s)
  • 20:05 bblack: starting varnish3 downgrade process for most of cache_misc
  • 17:43 godog: powercycle ms-be2008, sdb failed
  • 15:41 godog: bounce graphite-web on graphite1001, debugging carbon-c-relay stalls while sending metrics
  • 13:57 jynus: installing and testing sqlproxy on dbproxy1005
  • 13:44 hashar: Jenkins zmq is all fine
  • 13:43 hashar: Restarted Jenkins, the zmq plugin daemon did not start
  • 13:13 ema: depooling cp3007 to try reproducing T134989
  • 11:28 ema: repooling cp1061
  • 11:09 moritzm: uploaded firejail 0.38-1+wmf1 for trusty-wikimedia to carbon
  • 10:27 ema: depooling cp1061 trying to reproduce T134989
  • 09:48 godog: nodetool cleanup on restbase2006 T132976
  • 09:11 godog: start backfilling cassandra metrics from graphite1001 to graphite1003
  • 05:27 jynus: reimporting x1 on dbstore2002
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri May 13 02:30:49 UTC 2016 (duration 8m 50s)
  • 02:22 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 08m 17s)
  • 01:48 awight|animal: rollback fundraising-tools from 53f6fe635dd8cc451b86788018e53f418b690b00 to e2978024e6f6b6881d087ac5d07e4c40f7374709
  • 00:53 mutante: killing multiple icinga-wm processes on neon, running puppet. Were they started both manually and by puppet?
  • 00:34 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: Remaining wikis to wmf.1
  • 00:17 cwd: updated payments from b16c3cb0e12aacb49b91974d693c401dcaa9cca9 to 1af7208b4c7387a701f9ce83e33483d1e19213cb
  • 00:13 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/includes/api/ApiEchoNotifications.php: attempt to fix notices/fatals in Echo (duration: 00m 29s)
  • 00:11 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Echo/includes/api/ApiEchoNotifications.php: attempt to fix notices/fatals in Echo (duration: 00m 25s)

2016-05-12

  • 23:58 chasemp: restarting slapd instances. There seem to be some LDAP issues in labs, but it may not be the servers; it could be bad merged changes.
  • 23:54 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.1/extensions/CirrusSearch/includes/CirrusSearch.php: Adjust textcat data collection for AB test (T121542) (duration: 00m 26s)
  • 23:43 logmsgbot: dereckson@tin Synchronized wmf-config/CirrusSearch-common.php: Revert "A/B/C test of control vs textcat vs accept-lang + textcat" (Gerrit:288545, 3/3) (duration: 00m 25s)
  • 23:43 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Revert "A/B/C test of control vs textcat vs accept-lang + textcat" (Gerrit:288545, 2/3) (duration: 00m 25s)
  • 23:42 logmsgbot: dereckson@tin Synchronized tests/cirrusTest.php: Revert "A/B/C test of control vs textcat vs accept-lang + textcat" (Gerrit:288545, 1/3) (duration: 00m 24s)
  • 23:33 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: A/B/C test of control vs textcat vs accept-lang + textcat (Gerrit:268048) (duration: 00m 25s)
  • 23:23 logmsgbot: dereckson@tin Synchronized wmf-config/CirrusSearch-common.php: A/B/C test of control vs textcat vs accept-lang + textcat (Gerrit:268048) (duration: 00m 25s)
  • 23:22 logmsgbot: dereckson@tin Synchronized tests/cirrusTest.php: A/B/C test of control vs textcat vs accept-lang + textcat (no-op) (duration: 00m 25s)
  • 23:16 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Allow import from outreach to meta (T134788) (duration: 00m 27s)
  • 23:10 logmsgbot: dereckson@tin Synchronized wmf-config/throttle.php: UK EU edit-a-thon throttle rule (Gerrit:288377, T134902) (duration: 00m 28s)
  • 23:06 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable cross-wiki notifications by default in production (Gerrit:287035, 2/2) (duration: 00m 29s)
  • 23:06 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Enable cross-wiki notifications by default in production (Gerrit:287035, 1/2) (duration: 00m 42s)
  • 22:17 eileen: Updating civicrm from d32032965f2a45d24f1eaeb352eb6bfd6641753c to 6e3581692bb81c2c507612e7cd66dcdf78ef3cc0
  • 21:00 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Update repoUrl setting for Wikibase (duration: 00m 28s)
  • 21:00 logmsgbot: aude@tin Synchronized wmf-config/Wikibase-labs.php: Update repoUrl setting for Wikibase (duration: 00m 32s)
  • 20:58 logmsgbot: aude@tin Synchronized wmf-config/Wikibase-production.php: Update repoUrl setting for Wikibase (duration: 00m 27s)
  • 20:53 awight: update fundraising-tools from b114b7174c3bd9bf53cd44bf55397049a03b96fb to 53f6fe635dd8cc451b86788018e53f418b690b00
  • 19:57 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 to 1.28.0-wmf.1
  • 19:23 chasemp: adding static route to labtest instance vlan on cr(1|2)-codfw through labtestnet
  • 19:03 ottomata: restarting kafka broker on kafka1012 to pick up inter.broker.protocol.version change
  • 18:41 kaldari: ran mwscript maintenance/updateCollation.php --wiki=ruwiktionary --force
  • 17:34 jynus: reverting dbstore1001 to regular replication coords
  • 17:13 jynus: enabling GTID on selected db production hosts to change topology of dbstore1001
  • 17:03 godog: upgrade carbon-c-relay from jessie-backports on graphite1003 / graphite2002
  • 16:46 godog: investigating carbon-c-relay crash with buffer overflow on graphite1001
  • 16:45 akosiaris: restart alsafi for 2.5+dfsg-4~bpo8+1 qemu upgrade
  • 16:43 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/extensions/UploadWizard/resources/controller/uw.controller.Details.js: SWAT: Fix Uncaught TypeError: this.copyMetadataWidget.remove is not a function gerrit:288423 (duration: 00m 24s)
  • 16:38 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta Cluster change (duration: 00m 28s)
  • 16:30 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta Cluster change (duration: 00m 29s)
  • 16:30 jynus: running "megacli -PDOffline -PhysDrv '[32:8]' -aALL" on db1023
  • 16:29 logmsgbot: mattflaschen@tin Synchronized wmf-config/db-labs.php: Beta Cluster change (duration: 00m 27s)
  • 16:24 gehel: removing wdqs1001 from rotation for disk upgrade (T120714)
  • 16:11 gehel: merged and applied https://gerrit.wikimedia.org/r/#/c/288376/ (T134456)
  • 16:10 akosiaris: migrate all ganeti instances except alsafi off ganeti2006
  • 16:10 akosiaris: evacuate ganeti2006.codfw.wmnet from ganeti secondary instances
  • 16:06 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/extensions/Wikidata/extensions/ArticlePlaceholder/includes/SearchHookHandler.php: SWAT: Update ArticlePlaceholder - Fix notability checks and props params gerrit:288396 (duration: 00m 24s)
  • 16:03 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.23/maintenance/updateCollation.php: SWAT: updateCollation.php sql changes backport (duration: 00m 26s)
  • 15:52 ema: testing file storage on misc eqiad+esams T134989
  • 15:46 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.23/extensions/Wikidata/extensions/ArticlePlaceholder/includes/SearchHookHandler.php: SWAT: Update ArticlePlaceholder - Fix notability checks and props params gerrit:288396 (duration: 00m 25s)
  • 15:41 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/maintenance/updateCollation.php: SWAT: updateCollation.php sql updates gerrit:288386 gerrit:288384 gerrit:288385 (duration: 00m 26s)
  • 15:25 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.23/extensions/ContentTranslation/modules/tools/ext.cx.tools.mt.js: SWAT: MT: Use custom labels instead of provider id gerrit:288342 (duration: 00m 26s)
  • 15:20 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/extensions/ContentTranslation/modules/tools/ext.cx.tools.mt.js: SWAT: MT: Use custom labels instead of provider id gerrit:288171 (duration: 00m 25s)
  • 15:13 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump CirrusSearchRequestSet avro schema to rev 121456865906 PART II gerrit:287973 (duration: 00m 25s)
  • 15:12 logmsgbot: thcipriani@tin Synchronized wmf-config/event-schemas: SWAT: Bump CirrusSearchRequestSet avro schema to rev 121456865906 PART I gerrit:287973 (duration: 00m 26s)
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: Remove deprecated settings gerrit:281966 (duration: 00m 34s)
  • 14:55 jynus: executing random CHANGE MASTERS and crashes on db200[89] trying to break them
  • 14:50 godog: downtime flapping redis replication on rdb2006 alert for 10d
  • 14:45 jynus: enabling GTID on x1 shard (except db1029)
  • 13:34 ema: upgrading misc to varnish 4.1.2-1wm4 and wiping caches (T134989)
  • 13:30 elukey: restarted memcached on mc1007 with chunk growth factor set to 1.15 - Part of a perf experiment (T129963)
  • 13:14 jynus: deploying GTID replication on all of es3 shard
  • 13:08 volans: disabling puppet and restarting MySQL on db2040 to test change 288361 (in scheduled downtime on icinga) T133780
  • 12:40 godog: kill duplicate ircecho daemon on neon
  • 12:39 akosiaris: upgraded nodejs on etherpad1001 to nodejs 4.3
  • 12:32 akosiaris: etherpad-lite_1.6.0-1 uploaded on apt.wikimedia.org
  • 12:05 moritzm: added jenkins 1.651.2 for precise-wikimedia to carbon
  • 10:44 jynus: testing dual masters and r/w mode on parsercache nodes
  • 10:20 elukey: restarted oozie, hive-* daemons on analytics1003 for java upgrades
  • 10:09 godog: run puppet on graphite2001 to split cassandra metrics
  • 10:06 godog: reload carbon-c-relay on labmon1001, noop
  • 09:51 volans: Apply at runtime thread_pool_stall_limit=10 for all coredb masters (Gerrit 287394, T133333)
  • 09:47 volans: Apply at runtime thread_pool_max_threads=2000 for all coredb masters (Gerrit 287394, T133333)
  • 08:33 ema: wiping misc caches once again (T134989)
  • 07:51 ema: upgrading misc ulsfo to varnish 4.1.2-1wm3 and wiping caches (T134989)
  • 07:48 jynus: testing schema change T73563 on db1040
  • 07:47 ema: upgrading misc codfw to 4.1.2-1wm3 and wiping caches (T134989)
  • 07:05 ema: running varnish 4.1.2-1wm3 in misc esams (T134989)
  • 06:50 ema: repooling cp3007 (ran puppet after varnish upgrade)
  • 06:42 ema: depooling cp3007
  • 04:56 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/includes/objectcache/RedisBagOStuff.php: Fix unserialization of negative numbers (duration: 00m 32s)
  • 04:37 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/includes/objectcache/RedisBagOStuff.php: Fix unserialization of negative numbers (duration: 00m 31s)
  • 03:54 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Flow/includes/Notifications/MentionPresentationModel.php: Fix Flow fatal (duration: 00m 26s)
  • 03:05 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu May 12 03:05:39 UTC 2016 (duration 9m 30s)
  • 02:56 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 07m 23s)
  • 02:30 logmsgbot: mwdeploy@tin scap sync-l10n completed (1.27.0-wmf.23) (duration: 09m 28s)
  • 00:05 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.23/extensions/ZeroBanner: https://gerrit.wikimedia.org/r/#/c/288236/ (duration: 00m 25s)
  • 00:03 logmsgbot: maxsem@tin Synchronized php-1.28.0-wmf.1/extensions/ZeroBanner: https://gerrit.wikimedia.org/r/#/c/288236/ (duration: 00m 26s)

2016-05-11

  • 23:52 logmsgbot: catrope@tin Finished scap: Updating wmf23 Echo to wmf1 (duration: 26m 45s)
  • 23:26 logmsgbot: catrope@tin Started scap: Updating wmf23 Echo to wmf1
  • 23:13 logmsgbot: catrope@tin Synchronized php-1.28.0-wmf.1/extensions/Echo/: SWAT (duration: 00m 33s)
  • 23:08 logmsgbot: catrope@tin Synchronized portals: (no message) (duration: 00m 25s)
  • 23:08 logmsgbot: catrope@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 31s)
  • 23:06 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on specieswiki (duration: 00m 34s)
  • 21:19 robh: ulsfo maint complete, will push traffic back shortly.
  • 21:15 logmsgbot: robh@palladium conftool action : set/pooled=yes; selector: dc=ulsfo,cluster=cache_text
  • 21:12 logmsgbot: oblivian@palladium conftool action : set/pooled=yes; selector: dc=ulsfo,cluster=cache_upload
  • 20:47 mutante: re-enabled icinga notifications for ipsec
  • 20:23 bblack: wiping cache_misc caches, again...
  • 19:39 mutante: turned off notifications for icinga services that match IPSEC
  • 19:25 mutante: stopped icinga-wm, flood due to ulsfo maintenance
  • 19:08 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings.php: Enable the ArticlePlaceholder on eowiki, orwiki and napwiki (duration: 00m 29s)
  • 18:58 robh: disregard ulsfo cp system icinga spam, onsite work for thermal paste per T134831
  • 18:55 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings.php: Enable the ArticlePlaceholder on htwiki - T134273 (duration: 00m 46s)
  • 18:47 logmsgbot: hoo@tin Finished scap: Update ArticlePlaceholder to master for initial Wikipedia deployment. (duration: 51m 18s)
  • 17:59 moritzm: uploaded new linux package for jessie-wikimedia to carbon (based on 4.4.9)
  • 17:56 logmsgbot: hoo@tin Started scap: Update ArticlePlaceholder to master for initial Wikipedia deployment.
  • 17:53 logmsgbot: hoo@tin Synchronized php-1.28.0-wmf.1/extensions/Wikidata: Update Wikibase (Property ordering related backports) (duration: 01m 52s)
  • 17:51 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.23/extensions/Wikidata: Update Wikibase (Property ordering related backports) (duration: 02m 11s)
  • 17:34 bblack: wiping cache_misc caches again...
  • 16:14 akosiaris: upgrade etherpad software to 1.6.0-1 and evaluate stability. packages NOT yet uploaded on apt.w.o
  • 16:07 bblack: wiping cache_misc caches again...
  • 15:22 thcipriani: first SWAT with scap subcommands (3.2.0-1) complete
  • 15:20 logmsgbot: thcipriani@tin Synchronized php-1.28.0-wmf.1/extensions/Translate/tag/PageTranslationHooks.php: SWAT: On documentation unit deletion, dont attempt to re-render translation page gerrit:288165 (duration: 00m 27s)
  • 15:11 logmsgbot: thcipriani@tin Synchronized portals: SWAT: T128546 updating portal stats gerrit:288192 (duration: 00m 33s)
  • 15:10 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: SWAT: T128546 updating portal stats gerrit:288192 (duration: 00m 43s)
  • 14:36 bblack: wiping caches on cache_misc...
  • 13:33 ottomata: starting kafka 0.9 upgrade of analytics-eqiad cluster
  • 13:31 elukey: camus + puppet disabled on analytics1027 as prep step for kafka cluster migration to 0.9
  • 12:50 elukey: mw1147 powercycled due to unresponsiveness (not able to login as root)
  • 12:50 ema: upgrading cp1061 to varnish 4 (T131501)
  • 12:47 ema: upgrading cp1058 to varnish 4 (T131501)
  • 12:39 ema: upgrading cp1051 to varnish 4 (T131501)
  • 12:33 ema: upgrading cp1045 to varnish 4 (T131501)
  • 12:18 ema: upgrading cp2025 to varnish 4 (T131501)
  • 12:10 ema: upgrading cp2018 to varnish 4 (T131501)
  • 12:09 moritzm: installing libarchive security updates
  • 12:05 ema: upgrading cp2012 to varnish 4 (T131501)
  • 11:55 ema: upgrading cp2006 to varnish 4 (T131501)
  • 11:34 _joe_: restarting just HHVM on mw1142
  • 11:34 _joe_: restarting mw1142
  • 10:52 logmsgbot: elukey@palladium conftool action : set/pooled=yes; selector: kafka1002.eqiad.wmnet
  • 10:50 logmsgbot: elukey@palladium conftool action : set/pooled=no; selector: kafka1002.eqiad.wmnet
  • 10:48 elukey: restarting eventbus on kafka100[12] for security upgrades
  • 10:47 logmsgbot: elukey@palladium conftool action : set/pooled=yes; selector: kafka1001.eqiad.wmnet
  • 10:44 moritzm: gradually disabling unprivileged bpf on Linux 4.4 hosts via sysctl (once completed this will be puppetised, but the sysctl can't be reverted without a reboot, so be careful with the initial activation)
  • 10:43 logmsgbot: elukey@palladium conftool action : set/pooled=no; selector: kafka1001.eqiad.wmnet
  • 10:20 ema: upgrading cp4004 to varnish 4 (T131501)
  • 10:12 ema: upgrading cp4003 to varnish 4 (T131501)
  • 10:03 ema: upgrading cp4002 to varnish 4 (T131501)
  • 09:41 ema: upgrading cp4001 to varnish 4 (T131501)
  • 09:37 moritzm: rolling restart of scb in eqiad to pick up openssl update
  • 09:14 ema: upgrading cp3010 to varnish 4 (T131501)
  • 09:06 ema: upgrading cp3009 to varnish 4 (T131501)
  • 08:55 ema: upgrading cp3008 to varnish 4 (T131501)
  • 08:34 mobrovac: restbase deploy end of beaaa71
  • 08:28 godog: bootstrap restbase2008-a T132976
  • 08:24 mobrovac: restbase deploy start of beaaa71
  • 08:18 elukey: memcached on mc1009 restarted with chunk size growth factor 1.15 (was: 1.05)
  • 08:04 moritzm: restarting apache on palladium for openssl update
  • 08:03 elukey: puppet disabled on mc10XX hosts for https://gerrit.wikimedia.org/r/#/c/287913
  • 02:55 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed May 11 02:55:55 UTC 2016 (duration 9m 47s)
  • 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.28.0-wmf.1) (duration: 05m 54s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 09m 39s)

2016-05-10

  • 23:59 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.23/extensions/Kartographer/: Security patch (duration: 00m 25s)
  • 23:58 logmsgbot: maxsem@tin Synchronized php-1.28.0-wmf.1/extensions/Kartographer/: Security patch (duration: 00m 26s)
  • 23:52 logmsgbot: dereckson@tin Synchronized php-1.28.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Fix 'Uncaught TypeError: this.emit is not a function' (T134794) (duration: 00m 25s)
  • 23:49 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Fix 'Uncaught TypeError: this.emit is not a function' (T134794) (duration: 00m 28s)
  • 23:34 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Centralise feedback for the visual editor at the Hindi Wikipedia (Gerrit:287873, T134789) (duration: 00m 25s)
  • 23:31 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable VisualEditor in single edit mode on ja.wiki (Gerrit:285985, 2/2) (duration: 00m 25s)
  • 23:30 logmsgbot: dereckson@tin Synchronized dblists/visualeditor-default.dblist: Enable VisualEditor in single edit mode on ja.wiki (Gerrit:285985, 1/2) (duration: 00m 25s)
  • 23:21 robh: disabling ulsfo via dns for onsite work tomorrow per T134831
  • 23:18 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Configure cross-wiki uploads from test2wiki to testwiki (Gerrit:285708, 2/2) (duration: 00m 27s)
  • 23:17 logmsgbot: dereckson@tin Synchronized wmf-config/filebackend-production.php: Configure cross-wiki uploads from test2wiki to testwiki (Gerrit:285708, 1/2) (duration: 00m 27s)
  • 23:05 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings.php: Undeploy UploadWizard from test2wiki (Gerrit:287944, 2/2) (duration: 00m 27s)
  • 23:04 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Undeploy UploadWizard from test2wiki (Gerrit:287944, 1/2) (duration: 00m 30s)
  • 22:28 yurik: graphoid was restarted on all scb servers with the new caching configuration. T134542
  • 22:08 mutante: ocg1003 - papaul fixed the install issue
  • 22:08 mutante: ocg1003 - revoked old puppet cert, signed new cert, re-adding after reinstall
  • 21:07 gehel: merging and applying configuration for new maps servers
  • 20:24 mutante: scheduled icinga downtime for ocg1003 and all services on it, rebooting to PXE (T84723)
  • 20:01 bblack: re-enabling puppet on caches
  • 19:53 logmsgbot: demon@tin Finished scap: group0 wikis to 1.28.0-wmf.1 (duration: 26m 49s)
  • 19:26 logmsgbot: demon@tin Started scap: group0 wikis to 1.28.0-wmf.1
  • 19:20 ottomata: upgraded kafka1022 to confluent kafka 0.9.0.1
  • 19:11 ottomata: reenabling camus and puppet on analytics1027
  • 18:38 mutante: deleted /tmp/make-wmf-branch on tin by request
  • 18:20 yurik: finished graphoid deployment & restart. T134575
  • 18:18 bblack: disabling puppet on caches for nginx change observe/deploy...
  • 18:10 jynus: testing GTID replication on es2019 T133385 T130702
  • 18:01 cscott: OCG: script reported "Cleared 0 (of 363141 total) entries from cache in 56.894 seconds" (T120079)
  • 18:01 ottomata: stopping kafka on kafka1022 to upgrade to 0.9
  • 18:00 cscott: OCG: clearing cache for ocg1003.eqiad.wmnet and ocg1003 (T120079)
  • 17:57 ottomata: stopping camus and puppet on analytics1027 during upgrade of one kafka broker
  • 17:56 yurik: about to deploy graphoid
  • 17:41 cscott: updated OCG to version b0c57a1c6890e9fa1f2c3743fc14cb6a7f244fc3 (T120079)
  • 17:38 cscott: starting OCG deploy
  • 16:40 ema: repooling cp3007 running varnish 4
  • 16:04 godog: nodetool cleanup on restbase2005 T132976
  • 16:03 robh: I fubar'd sinistra's grub; it'll be offline for a while longer.
  • 15:52 csteipp: deployed patch for T134863
  • 15:48 godog: collect mysqld metrics with prometheus-metrics-collector 0.8.1 on db2070 for 24h T128185
  • 15:35 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Configure $wgCheckUserCAMultiLock for CentralAuth wikis gerrit:286927 (duration: 00m 26s)
  • 15:27 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Simplify $wgApiFrameOptions configuration (2/2) PART II gerrit:287968 (duration: 00m 27s)
  • 15:26 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Simplify $wgApiFrameOptions configuration (2/2) PART I gerrit:287968 (duration: 00m 26s)
  • 15:23 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Simplify $wgApiFrameOptions configuration (1/2) gerrit:287939 (duration: 00m 26s)
  • 15:22 robh: starting powercycling and troubleshooting of sinistra for disk issue
  • 15:15 moritzm: restarting apache on silver (wikitech host) for openssl update
  • 15:15 godog: graphite-web reload on graphite1001 after merging https://gerrit.wikimedia.org/r/281631
  • 15:10 logmsgbot: thcipriani@tin Synchronized portals: (no message) (duration: 00m 33s)
  • 15:10 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 27s)
  • 15:09 thcipriani: running sync-portals for portals SWAT
  • 15:08 moritzm: restarted apache on californium for openssl update
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Remove misleading comment in UploadWizard configuration gerrit:287948 (duration: 00m 47s)
  • 14:57 jynus: dropped pdns db and associated user accounts on m5-master
  • 14:54 bblack: restarting pybal on lvs3002
  • 14:53 moritzm: restarting ntp on nescio for openssl update
  • 14:52 bblack: restarting pybal on lvs3004
  • 14:45 moritzm: restarting ntp on maerlant for openssl update
  • 14:37 moritzm: restarting ntp on acamar for openssl update
  • 14:32 moritzm: restarting ntp on chromium for openssl update
  • 14:27 ema: upgrading cp3007 (misc) to varnish 4
  • 14:24 moritzm: restarting ntp on hydrogen for openssl update
  • 14:15 moritzm: restarting ntp on achernar for openssl update
  • 14:10 moritzm: restarting apache on neon for openssl update
  • 14:08 moritzm: restarting apache on uranium for openssl update
  • 14:01 ema: depooling and rebooting cp1066 to test mdadm boot workaround T131961
  • 13:58 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Add db1023 db weight; reduce weight on db1035 and db1044 (duration: 00m 26s)
  • 13:55 moritzm: restarting nginx on dataset1001, ms1001 and francium for openssl update
  • 13:30 ema: depooling and rebooting cp1065 to test mdadm boot workaround T131961
  • 13:18 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 28s)
  • 13:17 logmsgbot: krinkle@tin Synchronized wmf-config/ProductionServices.php: (no message) (duration: 00m 28s)
  • 12:47 logmsgbot: hashar@tin Synchronized multiversion/MWMultiVersion.php: Typo fix in a couple comment blocks (duration: 00m 28s)
  • 12:46 logmsgbot: hashar@tin Synchronized typos: (no message) (duration: 01m 55s)
  • 12:33 godog: restart ircecho on kraz
  • 12:28 mobrovac: mobileapps deploying b8c396ae
  • 12:07 moritzm: rolling restart of maps cluster for openssl update
  • 12:03 moritzm: "powercycled" kraz (stuck by qemu bug)
  • 11:36 moritzm: apache restart on bohrium/piwik for openssl update
  • 11:26 jynus: stopping es2017 and es2019 for cloning 17 -> 19 + regular conf/upgrades
  • 11:02 logmsgbot: jmm@palladium conftool action : set/pooled=yes; selector: ms-fe2001.codfw.wmnet
  • 11:02 logmsgbot: jmm@palladium conftool action : set/pooled=no; selector: ms-fe2001.codfw.wmnet
  • 11:01 moritzm: rolling restart of swift frontend servers in codfw and eqiad for openssl update
  • 09:44 moritzm: rolling restart of swift backend servers in codfw and eqiad
  • 09:32 moritzm: powercycling ms-be2016, unresponsive and serial console is dead
  • 09:06 moritzm: restarting etherpad-lite on etherpad1001 for openssl update
  • 08:56 moritzm: restarting exim on fermium/lists.wikimedia.org for openssl update
  • 08:52 moritzm: restarting tor on radium for openssl update
  • 08:29 godog: bootstrap restbase2007-b T132976
  • 06:33 mobrovac: restbase deploy end of 1c890c4
  • 06:24 mobrovac: restbase deploy start of 1c890c4
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue May 10 02:33:17 UTC 2016 (duration 8m 57s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 09m 23s)

2016-05-09

  • 23:15 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: nowiki flow beta (duration: 00m 26s)
  • 23:12 logmsgbot: demon@tin Synchronized wmf-config/: Obsolete graph settings (duration: 00m 29s)
  • 23:08 logmsgbot: demon@tin Synchronized wmf-config/mobile.php: Enable $wgMFStripResponsiveImages (duration: 00m 27s)
  • 23:04 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: lazy load images in MF for bnwiki (duration: 00m 26s)
  • 22:14 gehel: restarting elasticsearch on elastic1026 (high load)
  • 21:56 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: pedias back to wmf.23
  • 21:42 mutante: argon - scheduling eternal downtime, shut down | https://en.wiktionary.org/wiki/good_riddance#Etymology
  • 21:37 mutante: argon - revoke puppet cert, stop salt
  • 21:04 chasemp: clean out old snapshots on labstore1001
  • 20:58 mdholloway: mobileapps deployed f206e94
  • 20:36 mdholloway: starting mobileapps deployment
  • 20:29 gehel: restarting logstash server logstash100[26].eqiad.wmnet (T110236)
  • 20:26 gehel: restarting logstash server logstash1001.eqiad.wmnet (T110236) upgrade
  • 19:58 logmsgbot: demon@tin Synchronized php-1.27.0-wmf.22/extensions/NavigationTiming/: backport firstPaintTime fix (duration: 00m 33s)
  • 18:41 mutante: irc.wm.org - before restarting ircd on old, ~ 199 users on new, after: ~ 293 users on new
  • 18:38 mutante: argon - restarting ircd (this is the old server)
  • 17:37 elukey: camus+puppet re-enabled on analytics1027 after maintenance
  • 17:30 elukey: analytics1002 Yarn+HDFS masters failed over to analytics1001 for Java upgrades (restored original state)
  • 17:22 elukey: analytics1001 Yarn+HDFS masters failed over to analytics1002 for Java upgrades
  • 17:07 elukey: executed authdns-update on ns0.w.o to introduce new aqs records
  • 17:05 gehel: cluster restart completed for eqiad / codfw elasticsearch (T110236)
  • 16:34 gehel: restarting elasticsearch server elastic1031.eqiad.wmnet (T110236), includes JDK upgrade
  • 16:12 gehel: restarting elasticsearch server elastic1030.eqiad.wmnet (T110236), includes JDK upgrade
  • 15:51 elukey: disabled camus+puppet on analytics1027 as prep step for maintenance on the cluster.
  • 15:41 gehel: restarting elasticsearch server elastic1029.eqiad.wmnet (T110236), includes JDK upgrade
  • 15:37 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add museudaimigracao.org.br to wgCopyUploadsDomains gerrit:287636 (duration: 00m 32s)
  • 15:30 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add National Digital Library of Brazil domains to wgCopyUploadsDomains gerrit:287635 (duration: 00m 27s)
  • 15:01 jynus: turning off es2001-es2010
  • 14:51 hashar: Zuul went deadlock. Restarted
  • 14:32 gehel: restarting elasticsearch server elastic1028.eqiad.wmnet (T110236), includes JDK upgrade
  • 14:30 gehel: restarting cassandra.metrics-collector on maps-test200[1234] (T134514) - correction
  • 14:29 gehel: restarting cassandra on maps-test200[1234] (T134514)
  • 14:23 elukey: restarting hadoop java daemons for Java upgrades on analytics104X and analytics105X hosts
  • 14:09 elukey: memcached restarted on mc1009 with only slab_reassign set (T129963)
  • 14:06 gehel: restarting elasticsearch server elastic1027.eqiad.wmnet (T110236), includes JDK upgrade
  • 13:37 gehel: restarting elasticsearch server elastic1026.eqiad.wmnet (T110236), includes JDK upgrade
  • 13:27 moritzm: restarting salt-master on neodymium to pick up openssl update
  • 13:14 gehel: restarting elasticsearch server elastic1025.eqiad.wmnet (T110236), includes JDK upgrade
  • 13:12 elukey: restarting hadoop java daemons for Java upgrades on analytics102X and analytics103X hosts
  • 13:07 moritzm: restarting nginx on carbon/apt.wikimedia.org to pick up openssl update
  • 12:55 hashar: Enabled a setting in Jenkins for T132895
  • 12:53 gehel: restarting elasticsearch server elastic1024.eqiad.wmnet (T110236), includes JDK upgrade
  • 12:43 hashar: Updating Jenkins job operations-puppet-typos to use extended regular expressions when reading /typos ( T133047 )
  • 12:32 gehel: restarting elasticsearch server elastic1023.eqiad.wmnet (T110236), includes JDK upgrade
  • 12:03 Dereckson: mwscript initSiteStats.php --wiki iawiki --update (T134749)
  • 11:32 moritzm: rolling restart of ocg for openssl update
  • 11:27 hashar: restarting Jenkins
  • 11:17 gehel: restarting elasticsearch server elastic1022.eqiad.wmnet (T110236), includes JDK upgrade
  • 11:12 moritzm: restarting archiva on titanium for java update
  • 11:02 moritzm: restarting gitblit for java update
  • 10:54 gehel: restarting elasticsearch server elastic1021.eqiad.wmnet (T110236), includes JDK upgrade
  • 10:28 jynus: general decommission of db1058 (puppet, salt, etc.)
  • 10:21 elukey: restarted eventlogging on eventlogging1001 for security upgrades
  • 09:56 elukey: memcached restarted on mc1009, now running with slab_reassign,maxconns_fast,hash_algorithm=murmur3,slab_automove,lru_crawler,lru_maintainer (T129963, performance experiment)
  • 09:45 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Increase db1070 weight after repooling (duration: 00m 38s)
  • 09:27 gehel: restarting elasticsearch server elastic1020.eqiad.wmnet (T110236), includes JDK upgrade
  • 09:15 godog: bootstrap restbase2007-a T132976
  • 08:18 gehel: restarting elasticsearch server elastic1019.eqiad.wmnet (T110236)
  • 08:16 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1070 with low weight, weight increases, retire db1058 (duration: 00m 30s)
  • 08:13 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Retire db1058 server (duration: 00m 39s)
  • 07:38 gehel: restarting elasticsearch server elastic1018.eqiad.wmnet (T110236)
  • 06:48 jynus: powercycling pc2006 (crashed)
  • 06:11 gehel: restarting elasticsearch server elastic1017.eqiad.wmnet (T110236)
  • 02:52 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon May 9 02:51:58 UTC 2016 (duration 9m 18s)
  • 02:42 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 08m 40s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 53s)

2016-05-08

  • 07:44 gehel: restarting elasticsearch server elastic1016.eqiad.wmnet (T110236)
  • 06:07 gehel: restarting elasticsearch server elastic1015.eqiad.wmnet (T110236)
  • 02:52 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun May 8 02:52:46 UTC 2016 (duration 9m 47s)
  • 02:43 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 08m 19s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 52s)

2016-05-07

  • 20:28 SMalyshev: deploying updated Blazegraph version for WDQS to mitigate deadlock issue
  • 20:13 SMalyshev: restarted blazegraph on wdqs1001 and wdqs1002
  • 14:38 jynus: in-place precise upgrade to trusty on db1069 before labs explodes
  • 12:32 hoo: Restarted blazegraph on wdqs1002 (Unresponsive, even locally: java.io.IOException: Too many open files) T134238
  • 11:55 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.22/includes/api/ApiStashEdit.php: If56084466: Make stashEditFromPreview() call setCacheTime() (duration: 00m 33s)
  • 02:54 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat May 7 02:54:32 UTC 2016 (duration 9m 14s)
  • 02:45 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 05m 38s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 34s)
  • 01:01 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.22/includes/api/ApiStashEdit.php: If56084466: Bump PRESUME_FRESH_TTL_SEC to improve hit rate and avoid link queries (duration: 00m 34s)

2016-05-06

  • 22:37 logmsgbot: krenair@tin Synchronized wmf-config/LabsServices.php: labs-only change: https://gerrit.wikimedia.org/r/#/c/287294/ (duration: 00m 33s)
  • 22:37 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: labs-only change: https://gerrit.wikimedia.org/r/#/c/287294/ (duration: 00m 45s)
  • 19:56 chasemp: launching some test query traffic against labs DNS to test new settings
  • 14:56 logmsgbot: dcausse@tin Synchronized wmf-config/LabsServices.php: Set analytics kafka broker info for labs deployment-prep (duration: 00m 33s)
  • 14:46 moritzm: restarting exim on MX servers to pick up openssl update
  • 14:46 gehel: restarting elasticsearch server elastic1014.eqiad.wmnet (T110236)
  • 14:35 logmsgbot: demon@tin Synchronized php-1.27.0-wmf.22/extensions/CentralAuth: Backporting T134246 (duration: 00m 38s)
  • 13:54 gehel: restarting elasticsearch server elastic1013.eqiad.wmnet (T110236)
  • 13:26 jynus: reimaging db1070
  • 13:06 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1070 for maintenance (duration: 00m 29s)
  • 12:40 moritzm: restarted hhvm on mw1226 (hhvm-dump-debug output available)
  • 12:28 mobrovac: restbase rolling restart in eqiad for openssl update
  • 11:52 moritzm: rolling restart of restbase in codfw for openssl update
  • 11:47 elukey: restarting aqs on aqs100[123] for security upgrades.
  • 11:15 moritzm: restarting slapd on dubnium to pick up openssl update
  • 11:10 hoo: Reverted the property suggester data to data from the 20160411 dump (done testing T132839)
  • 11:02 hoo: Overwrote property suggester data with data from the 20160215 dump (T132839)
  • 10:59 moritzm: restarting slapd on pollux to pick up openssl update
  • 09:17 moritzm: restarted hhvm on mw1148
  • 08:36 gehel: restarting elasticsearch server elastic1012.eqiad.wmnet (T110236)
  • 08:13 godog: delete blacklisted cassandra metrics for restbase meta tables T134016
  • 08:05 gehel: restarting elasticsearch server elastic1011.eqiad.wmnet (T110236)
  • 07:12 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1065 after maintenance; repool db1023 with low weight (duration: 00m 34s)
  • 05:57 gehel: restarting elasticsearch server elastic1010.eqiad.wmnet (T110236)
  • 05:06 kart_: Update cxserver to 155c2d4
  • 04:35 logmsgbot: ori@tin rebuilt wikiversions.php and synchronized wikiversions files: Wikipedias back to wmf.22 due to page load performance regression

2016-05-05

  • 21:37 gehel: restarting elasticsearch server elastic1009.eqiad.wmnet (T110236)
  • 19:30 csteipp: deployed patch for T132874
  • 19:07 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: wikipedias to wmf.23
  • 18:47 csteipp: deployed patch for T133507
  • 18:44 robh: mr1-ulsfo power cable reroute complete
  • 18:43 robh: mr1-ulsfo going to flap as I reroute the power cable
  • 18:23 csteipp: deployed patch for T130947
  • 18:17 csteipp: deployed revert/new patches for core & extension for T129506
  • 17:40 _joe_: restarted ocg after packet capture
  • 17:40 ejegg: updated payments-wiki from 8962b558b5eb5a4dba91be7ed7649b532fa4ee35 to b16c3cb0e12aacb49b91974d693c401dcaa9cca9
  • 17:36 _joe_: temporarily stopping ocg on ocg1003 to better debug cleanup cache
  • 17:23 urandom: Rolling restart of cassandra-metrics-collector in RESTBase test cluster : T134016
  • 17:19 urandom: Rolling restart of cassandra-metrics-collector in RESTBase cluster : T134016
  • 17:09 _joe_: clearing ocg cache entries for ocg1003
  • 17:06 logmsgbot: oblivian@palladium conftool action : set/pooled=no; selector: name=ocg1003.*
  • 16:55 godog: restart cassandra-metrics-collector on restbase2001 to test white/black listing
  • 16:31 mdholloway: mobileapps finished no-op test deploy
  • 16:22 mdholloway: mobileapps starting no-op deploy on scb1001
  • 16:01 jynus: restarting db1023 for reimage to jessie
  • 15:11 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.23/extensions/CentralAuth/: SWAT: Use master CentralAuthUser instances when writing gerrit:287079 (duration: 00m 32s)
  • 15:05 logmsgbot: anomie@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Commons: Restrict changetags userright gerrit:286522 (duration: 00m 29s)
  • 15:03 logmsgbot: anomie@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant groups from flaggedrevs the patrol rights at test2wiki gerrit:287090 (duration: 00m 47s)
  • 15:02 jynus: stopping db1065 for hardware maintenance
  • 14:42 elukey: restarted memcached on mc1008 as part of a performance test (T129963)
  • 14:39 elukey: memcached on mc1009 is running now with new parameters only available for 1.4.25-2~wmf1 - part of a performance test (T129963)
  • 14:28 elukey: memcached 1.4.25-2~wmf1 manually installed on mc1009 as part of a performance test (T129963)
  • 14:02 andrewbogott: stopping nova-api on labnet1002
  • 13:55 andrewbogott: downtimed labservices1001, holmium, labcontrol1001 for one hour, disabled puppet as per https://phabricator.wikimedia.org/T128737
  • 13:14 akosiaris: set ganeti2006 as drained (excluded from allocation operations)
  • 11:32 gehel: restarting blazegraph on wdqs1001
  • 10:40 akosiaris: rebooted labsdb1004
  • 10:40 akosiaris: fix problems with url_downloader created by PEBKAC
  • 10:23 jynus: stopping mysql & backing up db1023 in preparation for reimage
  • 09:59 jynus: SET GLOBAL thread_pool_stall_limit = 10; on db1057
  • 09:56 jynus: SET GLOBAL thread_pool_max_threads = 2000; on db1057
  • 09:54 elukey: installed memcached 1.4.25-2~wmf1 manually on mc2009 as part of T129963
  • 09:49 mobrovac: restbase deploy end of 2a3972a
  • 09:42 jynus: powercycling mw2027
  • 09:30 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Really depool db1023 for maintenance (duration: 02m 27s)
  • 09:26 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1023 for maintenance (duration: 02m 44s)
  • 09:22 _joe_: reloaded the backend varnish config on cp1058
  • 09:22 mobrovac: restbase deploy start of 2a3972a on canary restbase1008
  • 08:57 hoo: Restarted blazegraph on wdqs1002 (unresponsive)
  • 08:12 hoo: Restarted blazegraph on wdqs1002 (unresponsive)
  • 05:52 gehel: restarting elasticsearch server elastic1008.eqiad.wmnet (T110236)
  • 04:55 gehel: restarting elasticsearch server elastic1007.eqiad.wmnet (T110236)
  • 03:27 ori: restarted parsoid on all wtp* hosts to kill any requests from changepropagation service that may have been in flight
  • 03:20 ori: ran: sudo salt 'scb*' cmd.run 'puppet agent --disable ; service changeprop stop'
  • 03:15 ori: and restarted parsoid on wtp1*
  • 03:15 ori: injected an early return to v1Wt2html in routes.js when oldid = 106801025
  • 02:49 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 18m 23s)
  • 00:27 twentyafterfour: phabricator upgrade complete
  • 00:21 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Plumbing for wmgEchoCrossWikiByDefault (unused for now) (duration: 00m 24s)
  • 00:16 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Plumbing for wmgEchoCrossWikiByDefault (unused for now) (duration: 00m 26s)
  • 00:06 logmsgbot: aude@tin rebuilt wikiversions.php and synchronized wikiversions files: Put wikidata on wmf/1.27.0-wmf.23
  • 00:04 twentyafterfour: taking phabricator offline for maintenance

2016-05-04

  • 23:54 logmsgbot: aude@tin Synchronized php-1.27.0-wmf.23/extensions/Wikidata: Fix bug in other languages box: T134432 (duration: 02m 18s)
  • 23:37 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Remove emailuser override for hewiki, no longer needed (duration: 00m 33s)
  • 23:21 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/extensions/VisualEditor: SWAT (duration: 00m 29s)
  • 23:18 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Cite: SWAT (duration: 00m 52s)
  • 21:56 logmsgbot: demon@tin Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 28s)
  • 21:43 cscott: restarting OCG after puppet deploy of https://gerrit.wikimedia.org/r/286068
  • 21:37 ejegg: updated payments-wiki from c502ab2f6b6ff914d67503a664d36076fdc32dcf to 8962b558b5eb5a4dba91be7ed7649b532fa4ee35
  • 21:35 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo/Hooks.php: Debug logging for seemingly unattached users (duration: 00m 25s)
  • 21:25 gehel: deploying new icinga check on response time for WDQS
  • 21:23 logmsgbot: legoktm@tin Synchronized wmf-config/: touch the directory this time (duration: 00m 37s)
  • 21:13 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 27s)
  • 21:11 logmsgbot: legoktm@tin Synchronized wmf-config/CommonSettings.php: Enable UploadsLink on Wikimedia Commons (2/2) - try 2 (duration: 00m 30s)
  • 21:08 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Enable UploadsLink on Wikimedia Commons (1/2) - try 2 (duration: 00m 28s)
  • 20:47 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo/Hooks.php: Fix fatal (T134428) (duration: 00m 32s)
  • 20:32 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Echo/Hooks.php: Fix fatal (T134428) (duration: 00m 33s)
  • 20:28 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: Wikidata back to 1.27.0-wmf.22 due to T134432. Poke T131557.
  • 20:08 subbu: finished deploying parsoid sha b0d015fa (T134017)
  • 20:05 subbu: synced new code; restarted parsoid on wtp1001 as canary
  • 20:05 hashar: Completed group1 wikis to 1.27.0-wmf.23
  • 20:02 subbu: starting deploy of parsoid sha b0d015fa
  • 19:51 logmsgbot: hashar@tin Synchronized wmf-config/: Bump cache epoch for Wikidata - https://gerrit.wikimedia.org/r/#/c/286940/ (duration: 00m 30s)
  • 19:33 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.23
  • 19:30 logmsgbot: hashar@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 25s)
  • 19:21 logmsgbot: hashar@tin Synchronized wmf-config: Reverting https://gerrit.wikimedia.org/r/#/c/286517/ due to wmgUseUploadsLink being undefined. T130018 (duration: 00m 29s)
  • 19:17 mutante: wikimedia.ru looks down - hello Moscow? https://meta.wikimedia.org/wiki/Wikimedia_Russia
  • 19:15 logmsgbot: hashar@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 41s)
  • 19:14 logmsgbot: hashar@tin Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 33s)
  • 18:10 gehel: restarting elasticsearch server elastic1006.eqiad.wmnet (T110236)
  • 17:49 gehel: restarting elasticsearch server elastic1005.eqiad.wmnet (T110236)
  • 17:31 twentyafterfour: livehacked phabricator/src/aphront/response/AphrontFileResponse.php to fix filename with newlines
  • 17:14 gehel: restarting elasticsearch server elastic1004.eqiad.wmnet (T110236)
  • 17:09 logmsgbot: legoktm@tin Synchronized wmf-config/: Enable UploadsLink at Wikimedia Commons - T130018 (duration: 00m 43s)
  • 17:05 logmsgbot: legoktm@tin Finished scap: Build l10n cache for UploadsLink deployment - T130018 (duration: 51m 53s)
  • 16:37 gehel: restarting elasticsearch server elastic1003.eqiad.wmnet (T110236)
  • 16:19 gehel: restarting blazegraph (T134238)
  • 16:13 logmsgbot: legoktm@tin Started scap: Build l10n cache for UploadsLink deployment - T130018
  • 16:03 gehel: restarting elasticsearch server elastic1002.eqiad.wmnet (T110236)
  • 16:02 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage for SUL accounts too on gu.wikiquote gerrit:286877 (duration: 00m 28s)
  • 16:00 bblack: REALLY (from active LVS) removing old mobile IPs from actual production config (no longer in use) - T124482
  • 15:52 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration for gl.wikisource gerrit:286811 (duration: 00m 27s)
  • 15:50 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.23/extensions/ProofreadPage/ProofreadPage.namespaces.php: SWAT: Localize namespaces Page and Index in Galician gerrit:286820 (duration: 00m 26s)
  • 15:45 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Restore Wikipedia: namespace alias on outreach gerrit:286875 (duration: 00m 27s)
  • 15:43 logmsgbot: thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: Shakespeare in London throttle rule gerrit:286807 (duration: 00m 26s)
  • 15:36 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.23/includes/htmlform/HTMLFormField.php: SWAT: Fix HTMLFormField calling Message::setContext with null gerrit:286855 (duration: 00m 25s)
  • 15:34 bblack: removing old mobile IPs from actual production config (no longer in use) - T124482
  • 15:33 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove redundant NS_PROJECT entries from wgNamespacesAliases gerrit:286812 (duration: 00m 34s)
  • 15:28 gehel: restarting elasticsearch server elastic1001.eqiad.wmnet (T110236)
  • 15:24 bblack: changed catchpoint 'Static Assets' checks from (deprecated) https://bits.wikimedia.org/static-current/resources/assets/poweredby_mediawiki_88x31.png to https://meta.wikimedia.org/w/resources/assets/poweredby_mediawiki_88x31.png - T107430
  • 15:11 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Really depool db1065 for maintenance (duration: 00m 26s)
  • 15:09 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Translate: Use Apertium via cxserver Part II gerrit:286632 (duration: 00m 28s)
  • 15:08 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Translate: Use Apertium via cxserver Part I gerrit:286632 (duration: 00m 29s)
  • 14:59 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1065 for maintenance (duration: 00m 35s)
  • 14:30 mobrovac: change-prop: stopping the service until we scale the http proxy service
  • 14:22 jynus: installing sys schema on dbs with performance_schema enabled
  • 14:12 gehel: restarting elasticsearch server elastic2024.codfw.wmnet (T110236)
  • 13:49 logmsgbot: jmm@palladium conftool action : set/pooled=yes; selector: sca1002.eqiad.wmnet
  • 13:49 logmsgbot: jmm@palladium conftool action : set/pooled=no; selector: sca1002.eqiad.wmnet
  • 13:42 logmsgbot: jmm@palladium conftool action : set/pooled=yes; selector: sca1001.eqiad.wmnet
  • 13:41 logmsgbot: jmm@palladium conftool action : set/pooled=no; selector: sca1001.eqiad.wmnet
  • 13:40 moritzm: rolling restart of apertium in sca1* for openssl update
  • 13:24 gehel: restarting elasticsearch server elastic2023.codfw.wmnet (T110236)
  • 12:50 gehel: restarting elasticsearch server elastic2022.codfw.wmnet (T110236)
  • 12:34 gehel: restarting blazegraph (T134238)
  • 12:23 bblack: cache_text HTTP/2 switch process complete
  • 12:12 bblack: starting cache_text HTTP/2 switch process
  • 12:10 bblack: cache_upload HTTP/2 switch process complete
  • 12:08 gehel: restarting elasticsearch server elastic2021.codfw.wmnet (T110236)
  • 12:00 bblack: starting cache_upload HTTP/2 switch process
  • 11:51 gehel: restarting elasticsearch server elastic2020.codfw.wmnet (T110236)
  • 11:16 elukey: updating Pybal/LVS for codfw eventbus on lvs2003
  • 11:09 moritzm: removed obsolete mediawiki-math-texvc/imagemagick from nobelium
  • 11:03 gehel: restarting elasticsearch server elastic2019.codfw.wmnet (T110236)
  • 10:33 gehel: restarting elasticsearch server elastic2018.codfw.wmnet (T110236)
  • 10:29 moritzm: rolling restart of parsoid in eqiad to pick up openssl update
  • 10:25 elukey: updating pybal/LVS with codfw eventbus config on lvs2006
  • 10:23 jynus: restarting db1058 for reimaging to jessie T125028
  • 10:05 gehel: restarting elasticsearch server elastic2017.codfw.wmnet (T110236)
  • 09:59 godog: root@tin:/# lvresize -r -v --size +30G /dev/mapper/tin--vg-root
  • 09:44 kart_: Updated cxserver to 45596ac
  • 09:15 jynus: restarting enwiki-labs reimports (lag could happen temporarily)
  • 08:42 gehel: restarting elasticsearch server elastic2016.codfw.wmnet (T110236)
  • 08:19 jynus: stopping db1058 mysql for backup in preparation for reimage
  • 08:14 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1058 for reimage (duration: 00m 36s)
  • 06:20 gehel: restarting elasticsearch server elastic2015.codfw.wmnet (T110236)
  • 03:06 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed May 4 03:06:41 UTC 2016 (duration 9m 32s)
  • 02:57 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 17m 12s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 19s)
  • 00:23 Dereckson: mwscript namespaceDupes.php aswikisource --merge --fix (T133505)
  • 00:21 kaldari: ran mwscript maintenance/updateCollation.php --wiki=ruwiktionary --force
  • 00:02 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/extension.json: Graph: match modern module loading in core (3/3) (duration: 00m 26s)
  • 00:02 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/lib/topojson-global.js: Graph: match modern module loading in core (2/3) (duration: 00m 26s)
  • 00:01 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/lib/d3-global.js: Graph: match modern module loading in core (1/3) (duration: 00m 26s)

2016-05-03

  • 23:56 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.22/extensions/Graph/extension.json: Graph: match modern module loading in core (3/3) (duration: 00m 25s)
  • 23:55 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.22/extensions/Graph/lib/topojson-global.js: Graph: match modern module loading in core (2/3) (duration: 00m 25s)
  • 23:54 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.22/extensions/Graph/lib/d3-global.js: Graph: match modern module loading in core (1/3) (duration: 00m 26s)
  • 23:40 bblack: slow, depooled, staggered restart of varnish frontends on text and upload clusters commencing
  • 23:23 logmsgbot: dereckson@tin Synchronized wmf-config/CirrusSearch-production.php: Cirrus: only use pooled curl in hhvm / Gerrit:286485 (duration: 00m 34s)
  • 23:09 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings-labs.php: Revert Don't yet allow wikidatasparql graph urls (no op in prod) (duration: 00m 25s)
  • 23:08 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings.php: Revert Don't yet allow wikidatasparql graph urls (T126741) (duration: 00m 26s)
  • 22:47 bblack: upgrading varnish3 package on cache_text ...
  • 22:46 andrewbogott: restarting rabbitmq on labcontrol1001 to pick up a new ulimit
  • 21:55 bblack: stopped varnishkafka on all cache_upload, and wiped out the spammy junk it fills the disk with in /var/cache/varnishkafka/
  • 21:45 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.23/includes/search/SearchEngine.php: T134305 Fix invalid namespace handling in wmf.23 (duration: 00m 38s)
  • 21:37 bblack: cache_upload: rolling varnishd (backend only) restarts for package update
  • 21:31 bblack: upgrading varnish3 package on cache_upload
  • 21:11 gehel: restarting wdqs1002 (T134238)
  • 20:02 hashar: Restarting Jenkins
  • 19:11 hashar: group0 to 1.27.0-wmf.23 is complete.
  • 19:01 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.23 T131557
  • 19:01 gehel: restarting elasticsearch server elastic2013.codfw.wmnet (T110236)
  • 18:26 bblack: cache_misc: rolling varnishd restarts for package update
  • 18:23 bblack: upgrading varnish3 package on cache_misc
  • 18:17 bblack: HTTP/2 enable for cache_misc (nginx upgrade - T96848)
  • 17:59 gehel: restarting elasticsearch server elastic2012.codfw.wmnet (T110236)
  • 17:38 jynus: deployed new grants on new hosts for pdns-labs database
  • 17:18 gehel: restarting blazegraph on wdqs1002 (T134238)
  • 17:13 gehel: restarting elasticsearch server elastic2011.codfw.wmnet (T110236)
  • 17:11 bblack: HTTP/2 enable for cache_maps (nginx upgrade) - T96848
  • 17:03 bblack: nginx on all cp* restarted for openssl-1.0.2h
  • 16:58 godog: bootstrap restbase2009-a T132976
  • 16:43 gehel: restarting elasticsearch server elastic2010.codfw.wmnet (T110236)
  • 16:35 bblack: merging https://gerrit.wikimedia.org/r/282323 for puppetswat
  • 16:32 bblack: merging https://gerrit.wikimedia.org/r/#/c/282322 for puppetswat
  • 16:30 moritzm: uploaded openssl 1.0.2h for jessie-wikimedia to carbon
  • 16:28 bblack: merging https://gerrit.wikimedia.org/r/286598 for puppetswat
  • 16:26 bblack: merging https://gerrit.wikimedia.org/r/286642 for puppetswat
  • 16:25 bblack: merging https://gerrit.wikimedia.org/r/286484 for puppetswat
  • 16:22 bblack: authdns-update for https://gerrit.wikimedia.org/r/#/c/285085/ (yue lang) + workarounds from https://phabricator.wikimedia.org/T97051#1994679
  • 16:17 bblack: merging https://gerrit.wikimedia.org/r/285086 for puppetswat (apache redirects change)
  • 16:11 mobrovac: restbase restarting after https://gerrit.wikimedia.org/r/#/c/286278/
  • 16:04 bblack: merging https://gerrit.wikimedia.org/r/286278 (puppetswat)
  • 15:48 gehel: restarting elasticsearch server elastic2009.codfw.wmnet (T110236)
  • 15:47 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Enable data access for Wikiversity (duration: 00m 38s)
  • 15:44 logmsgbot: aude@tin Synchronized dblists/wikidataclient.dblist: Remove beta.wikiversity as a client. See T54971 (duration: 00m 25s)
  • 15:41 logmsgbot: aude@tin Synchronized dblists/: Remove arbitraryaccess.dblist (duration: 00m 25s)
  • 15:39 logmsgbot: aude@tin Synchronized wmf-config/CommonSettings.php: Remove arbitraryaccess wikitag (duration: 00m 26s)
  • 15:34 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Remove $wmgWikibaseEnableArbitraryAccess setting (duration: 00m 26s)
  • 15:28 paravoid: "systemctl restart ircecho" on kraz
  • 15:27 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Remove use of $wmgWikibaseEnableArbitraryAccess setting (duration: 00m 28s)
  • 15:15 volans: Change runtime semi-synchronous replication on eqiad x1,es2,es3 to match configured value T131753
  • 15:13 logmsgbot: thcipriani@tin Synchronized portals: (no message) (duration: 00m 30s)
  • 15:13 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 37s)
  • 15:08 gehel: restarting elasticsearch server elastic2008.codfw.wmnet (T110236)
  • 15:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage on gu.wikiquote gerrit:286638 (duration: 00m 33s)
  • 14:53 logmsgbot: hashar@tin Synchronized php-1.27.0-wmf.23/extensions/Wikidata: wikidata to .23 https://gerrit.wikimedia.org/r/#/c/286657/ (duration: 02m 17s)
  • 14:36 gehel: restarting elasticsearch server elastic2007.codfw.wmnet (T110236)
  • 14:12 elukey: changed kafkatee's config on oxygen to watch all the webrequest text|upload kafka partitions (analytics doubled them recently)
  • 14:00 jynus: creating sanitarium db filtering (redactatron) for jamwiki
  • 13:51 hashar: Warming up HHVM cache hitting testwiki /w/api.php T131557
  • 13:40 gehel: restarting elasticsearch server elastic2006.codfw.wmnet (T110236)
  • 13:39 hashar: Warming up HHVM cache hitting testwiki T131557
  • 13:38 hashar: Warming up HHVM cache hitting testwiki T131557
  • 13:37 logmsgbot: hashar@tin Finished scap: testwiki to php-1.27.0-wmf.23 and rebuild l10n cache T131557 (duration: 26m 54s)
  • 13:18 jynus: applying schema change on s3-master db T130692
  • 13:10 logmsgbot: hashar@tin Started scap: testwiki to php-1.27.0-wmf.23 and rebuild l10n cache T131557
  • 13:02 gehel: restarting elasticsearch server elastic2005.codfw.wmnet (T110236)
  • 13:01 gehel: restarting wdqs-updater and keeping it under close scrutiny for the moment
  • 13:00 volans: Change runtime semi-synchronous replication on eqiad core DBs (s1-s7) to match configured value T131753
  • 12:31 gehel: restarting elasticsearch server elastic2004.codfw.wmnet (T110236)
  • 12:28 gehel: stopping wdqs-updater as it leaks pipes
  • 12:07 hashar: Cutting 1.27.0-wmf.23 branches T131557
  • 11:35 bblack: cp1068: upgraded varnish to 3.0.6plus-wm9
  • 10:42 moritzm: installing poppler security updates on ocg (and other trusty hosts)
  • 10:24 jynus: applying schema change on eqiad-s3 db servers T130692
  • 10:03 jynus: applying schema change on codfw-s3 db servers T130692
  • 09:55 jynus: testing online index creation on db2018
  • 09:11 gehel: restarting elasticsearch server elastic2003.codfw.wmnet (T110236)
  • 09:10 moritzm: powercycled alsafi (stuck in KVM)
  • 09:08 kart_: Update cxserver to 8a4254e
  • 08:18 gehel: restarting elasticsearch server elastic2002.codfw.wmnet (T110236)
  • 08:05 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1040 after maintenance (duration: 00m 25s)
  • 07:07 jynus: recovering torrus database @ netmon following T87815
  • 02:34 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue May 3 02:34:45 UTC 2016 (duration 8m 36s)
  • 02:26 awight: CentralNotice fundraising campaigns reenabled after CiviCRM maintenance
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 08m 40s)
  • 02:21 mutante: ssh alsafi
  • 02:13 eileen: Updating civicrm from f5e8f98d07a2280118b7153bc342bf52ee67edd5 to d32032965f2a45d24f1eaeb352eb6bfd6641753c
  • 02:05 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Remove obsolete https->http rewrite for IRC notifications (duration: 00m 28s)
  • 02:04 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Remove obsolete ocwiki hack (duration: 00m 37s)
  • 01:54 RoanKattouw: Killed ssh processes on tin that had been hanging for days
  • 01:53 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 23019m 56s)
  • 01:53 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 12941m 41s)
  • 01:53 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 4303m 19s)
  • 01:40 mutante: irc.wm.org - see T123729 if any questions
  • 01:39 mutante: switching irc.wikimedia.org from old server argon to new server kraz. old server still running untouched as argon.wikimedia.org. no clients are kicked. appservers are sending RC to both.
  • 00:50 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.php: https://gerrit.wikimedia.org/r/286552 - interwiki cache update including jamwiki, horizon, and other things (duration: 00m 26s)
  • 00:41 logmsgbot: krenair@tin Synchronized langlist: create jamwiki - https://gerrit.wikimedia.org/r/286258 (duration: 00m 24s)
  • 00:40 logmsgbot: krenair@tin Synchronized static/images/project-logos: create jamwiki - https://gerrit.wikimedia.org/r/286258 (duration: 00m 24s)
  • 00:39 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: create jamwiki - https://gerrit.wikimedia.org/r/286258 (duration: 00m 24s)
  • 00:38 logmsgbot: krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: create jamwiki - https://gerrit.wikimedia.org/r/286258
  • 00:36 logmsgbot: krenair@tin Synchronized dblists: create jamwiki - https://gerrit.wikimedia.org/r/286258 (duration: 00m 27s)
  • 00:12 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/286534/ (duration: 00m 30s)

2016-05-02

  • 23:58 logmsgbot: krenair@tin Finished scap: for https://gerrit.wikimedia.org/r/#/c/286535/ i18n changes (duration: 25m 57s)
  • 23:37 ebernhardson: restart elastic2007, codfw cluster master, to resolve lingering issues after resolving frozen index race condition
  • 23:32 logmsgbot: krenair@tin Started scap: for https://gerrit.wikimedia.org/r/#/c/286535/ i18n changes
  • 23:30 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.22/extensions/Echo: https://gerrit.wikimedia.org/r/#/c/286535/ and https://gerrit.wikimedia.org/r/#/c/286532/ (duration: 00m 30s)
  • 23:10 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.22/extensions/Flow/modules/flow/ui/widgets/editor/mw.flow.ui.EditorSwitcherWidget.js: https://gerrit.wikimedia.org/r/#/c/286527/ (duration: 00m 26s)
  • 22:56 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.22/extensions/CirrusSearch/: Cirrus: Stop auto-creating frozen index (duration: 00m 31s)
  • 22:56 awight: CentralNotice fundraising campaigns disabled for CiviCRM outage
  • 22:52 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: Config setting to stop auto-creating frozen index in cirrus (duration: 00m 33s)
  • 22:18 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/286537/ (duration: 00m 42s)
  • 22:13 cwd: updated crm from b386a6821c71310950ccdcdcf2616add727e1af4 to f5e8f98d07a2280118b7153bc342bf52ee67edd5
  • 22:10 papaul: restbase200[7-9]- signing puppet certs, salt-key, initial run
  • 21:26 cscott: updated OCG to version b775e612520f9cd4acaea42226bcf34df07439f7
  • 21:21 cscott: starting OCG deploy (a little late)
  • 20:23 gehel: restarting elasticsearch server elastic2001.codfw.wmnet (T110236)
  • 20:21 gehel: starting rolling restart of elasticsearch codfw cluster to disable multicast (T110236)
  • 20:15 subbu: finished deploying parsoid version 0a26f3a4
  • 20:09 subbu: synced code + restarted parsoid on wtp1001 as canary
  • 20:05 subbu: starting deploy of parsoid version 0a26f3a4
  • 19:27 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.22/includes/filebackend/FileBackendMultiWrite.php: 63b2d7b2eae (duration: 00m 32s)
  • 19:17 mutante: manually removing 2fa from my own wikitech account, adding it back ..
  • 18:24 gehel: deploying latest WDQS version
  • 17:23 robh: restbase2004 offline for next few hours for comparison work for new systems T132976
  • 16:01 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/286286/ (duration: 00m 26s)
  • 15:53 logmsgbot: krenair@tin Synchronized wikiversions-labs.json: https://gerrit.wikimedia.org/r/#/c/283689/ (duration: 00m 25s)
  • 15:53 logmsgbot: krenair@tin Synchronized dblists/all-labs.dblist: https://gerrit.wikimedia.org/r/#/c/283689/ (duration: 00m 26s)
  • 15:44 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/286287/ (duration: 00m 25s)
  • 15:40 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/286285/ (duration: 00m 25s)
  • 15:32 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.22/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/286434/2 (duration: 02m 02s)
  • 15:29 bblack: re-pooling esams
  • 15:22 jynus: restarting db1040 for reimage
  • 15:21 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.22/extensions/Math/MathRestbaseInterface.php: https://gerrit.wikimedia.org/r/#/c/286412/ (duration: 00m 26s)
  • 15:07 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/285700/ (duration: 00m 42s)
  • 14:52 moritzm: rolling restart of zookeeper to pick up Java update
  • 14:22 bblack: starting gdnsd on esams (esams is marked down there)
  • 14:20 bblack: stopped gdnsd on eeden
  • 13:13 jynus: stopping db1040 mysql for backup before cloning
  • 12:15 elukey: deployed Varnish change to force HTTP 503 for datasets.wikimedia.org, stats.wikimedia.org, metrics.wikimedia.org as prep-step for OS reimage.
  • 12:13 elukey: deployed Varnish cache::misc change to force HTTP 503 for datasets.wikimedia.org, stats.wikimedia.org, metrics.wikimedia.org as prep-step for OS reimage.
  • 12:12 elukey: Merged Varnish cache::misc change to force HTTP 503 for datasets.wikimedia.org, stats.wikimedia.org, metrics.wikimedia.org as prep-step for OS reimage.
  • 11:21 elukey: deployed the last version of Event Logging from tin. Service also restarted.
  • 11:06 moritzm: rolling restart of hhvm in eqiad for pcre security update
  • 10:42 moritzm: rolling restart of hhvm in codfw for pcre security update
  • 09:58 moritzm: uploaded openldap 2.4.41+wmf1 for jessie-wikimedia to carbon (T130593)
  • 08:14 hashar: Restarted stuck Jenkins (due to IRC plugin)
  • 07:44 moritzm: rebooting hasseleh/hassium for kernel upgrade to 4.4
  • 07:10 moritzm: installing poppler security updates
  • 06:46 _joe_: rebooting serpens from ganeti, unreachable
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon May 2 02:30:33 UTC 2016 (duration 9m 18s)
  • 02:21 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 31s)

2016-05-01

  • 19:37 SMalyshev: enabled wdqs1002, put wdqs1001 in maintenance mode for reload
  • 16:20 volans: changing live configuration of db1042 thread_pool_stall_limit to 10 to avoid connection timeout errors
  • 16:18 volans: changing live configuration of db1042 thread_pool_stall_limit back to 100 to test impact on connection timeout
  • 16:08 volans: changing live configuration of db1042 thread_pool_stall_limit to 10 to test impact on connection timeout
  • 15:24 jynus: alter table puppet.fact_values to a bigint unsigned for m1 T107753
  • 15:07 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool db1040 for investigation T134114 (duration: 01m 22s)
  • 14:44 volans: truncated puppet.fact_values table to fix puppet (as documented on wikitech)
  • 10:58 godog: reboot furud.codfw.wmnet, ganeti instance with increasing load and 100% iowait, kvm/ganeti idle instance bug likely T134098

2016-04-30

  • 13:41 elukey: disabled puppet on analytics1047 and scheduled downtime for the host, IO errors in the dmesg for /dev/sdd. Also stopped Hadoop daemons to remove it from the cluster temporarily (not sure how to do it properly, will write docs).
  • 10:45 volans: Reset slave on sanitarium:3311 due to corrupted relay log after skipping query for duplicate key T132416
  • 10:19 volans: restarted slave on dbstore1001 skipping missing database T132837
  • 08:28 gehel: restarting elasticsearch server elastic1031.eqiad.wmnet (T110236)
  • 07:15 gehel: restarting elasticsearch server elastic1030.eqiad.wmnet (T110236)
  • 06:32 gehel: restarting elasticsearch server elastic1029.eqiad.wmnet (T110236)
  • 06:16 gehel: restarting elasticsearch server elastic1028.eqiad.wmnet (T110236)
  • 01:15 aude: applied Ibd302e1 to terbium for debugging broken wikidata rdf dumps

2016-04-29

  • 22:57 mutante: DNS - forced authdns-gen-zones etc from https://phabricator.wikimedia.org/T97051#1994679 on ns0/ns1/ns2 to get new language added
  • 20:59 gehel: restarting elasticsearch server elastic1027.eqiad.wmnet (T110236)
  • 19:56 urandom: (Re)starting cleanup on restbase1009-{a,b}.eqiad.wmnet
  • 19:56 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/CentralNotice/: T133971 (duration: 00m 41s)
  • 19:29 gehel: restarting elasticsearch server elastic1026.eqiad.wmnet (T110236)
  • 19:07 gehel: restarting elasticsearch server elastic1025.eqiad.wmnet (T110236)
  • 18:21 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 28s)
  • 18:20 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 29s)
  • 18:14 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 34s)
  • 18:14 robh: started all slaves via dbstore2001 this time.
  • 18:12 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 44s)
  • 18:07 robh: started all slaves via dbstore2002 per jaime's request
  • 17:45 gehel: restarting elasticsearch server elastic1024.eqiad.wmnet (T110236)
  • 16:56 gehel: restarting elasticsearch server elastic1023.eqiad.wmnet (T110236)
  • 16:22 gehel: restarting elasticsearch server elastic1022.eqiad.wmnet (T110236)
  • 15:29 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2047 and db2068. Depool db2008, db2009. Pool db2033 as the new x1 node. (duration: 00m 27s)
  • 15:17 gehel: restarting elasticsearch server elastic1021.eqiad.wmnet (T110236)
  • 14:56 logmsgbot: oblivian@palladium conftool action : set/pooled=yes; selector: name=mw1153.eqiad.wmnet
  • 14:54 jynus: moving topology of db2033 to be the new x1 master on codfw
  • 14:40 logmsgbot: oblivian@palladium conftool action : set/pooled=no; selector: name=mw1153.eqiad.wmnet
  • 14:32 gehel: restarting elasticsearch server elastic1020.eqiad.wmnet (T110236)
  • 14:26 hashar: Rebased tin:/srv/mediawiki-staging 31886c7..8e2670a . Bring in 3 changes that are solely for beta cluster.
  • 13:54 jynus: stopping mysql db2008 (cloning to db2033)
  • 13:39 jynus: reimaging db2033
  • 13:09 gehel: restarting elasticsearch server elastic1019.eqiad.wmnet (T110236)
  • 12:30 gehel: restarting elasticsearch server elastic1018.eqiad.wmnet (T110236)
  • 11:39 elukey: soft reboot for mw1119 (not responsive to ssh, root login timed out on the console)
  • 09:43 gehel: restarting elasticsearch server elastic1017.eqiad.wmnet (T110236)
  • 09:42 gehel: restarting elasticsearch server elastic1016.eqiad.wmnet (T110236)
  • 09:01 jynus: changing live configuration of db1049 thread_pool_stall_limit to 10 to test impact on connection timeout
  • 08:20 gehel: restarting elasticsearch server elastic1016.eqiad.wmnet (T110236)
  • 07:57 elukey: puppet disabled on new kafka codfw instances due to errors while starting Event Bus (hosts not in service)
  • 07:54 moritzm: enabled base::firewall on stat1002
  • 07:52 gehel: restarting elasticsearch server elastic1015.eqiad.wmnet (T110236)
  • 07:36 godog: stop cleanups on restbase1014-b
  • 06:46 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Reduce normal traffic on s2 API servers (duration: 00m 27s)
  • 06:33 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1038, increase weight of new hardware slaves db107[4-8] (duration: 00m 33s)
  • 05:42 gehel: restarting elasticsearch server elastic1014.eqiad.wmnet (T110236)
  • 05:41 mutante: re: "02:29 Krenair: last deployment was slow because of snapshot1007 being offline" it's back, i don't know why, it was powered down and i just tried switching it on. that helped. the command is literally "power on" on HP
  • 05:39 mutante: snapshot1007 - was powered down, powering it on. (..connect to mgmt.. "damn it's a HP")
  • 05:35 mutante: snapshot1007 - not reachable, duration 10h
  • 04:58 gehel: restarting elasticsearch server elastic1013.eqiad.wmnet (T110236)
  • 02:29 Krenair: last deployment was slow because of snapshot1007 being offline, icinga shows it's been like that for the last 7 hours
  • 02:22 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.22/extensions/EventBus: https://gerrit.wikimedia.org/r/286115 (duration: 02m 27s)
  • 02:20 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 20s)
  • 00:27 mutante: RT - remove libapache2-mod-php5, restart Apache, Perl apps don't need PHP
  • 00:22 logmsgbot: aaron@tin Synchronized wmf-config/filebackend-production.php: Set "autoResync" on for local-multiwrite (duration: 02m 29s)
  • 00:15 cwd: updated civicrm from 777a91b8f9f6003a3eebdb8f2c73e45cc2bfb4a4 to b386a6821c71310950ccdcdcf2616add727e1af4
  • 00:04 Dereckson: Previous deployment: Gerrit:285553 Enable lazy loaded references in beta (T129693)
  • 00:03 Dereckson: Previous deployment: Gerrit:285927 GoogleNewsSitemap configuration (T39608)
  • 00:03 Dereckson: Previous deployment: Gerrit:252627 Revert "Increase abusefilter emergency disable threshold on MediaWiki.org"
  • 00:03 Dereckson: Previous deployment: Gerrit:280865+Gerrit:285989 Allow wmf-config/throttle.php to be lenient on ip/IP typo, clean rules (no-op)
  • 00:02 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 02m 25s)
  • 00:02 Dereckson: Previous deployment: Gerrit:279142 Document FIXME statement in config (no-op)

2016-04-28

  • 23:59 logmsgbot: maxsem@tin Synchronized wmf-config/throttle.php: (no message) (duration: 02m 24s)
  • 23:56 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 02m 25s)
  • 23:26 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.22/extensions/VisualEditor/: (no message) (duration: 02m 30s)
  • 23:17 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.22/extensions/WikidataPageBanner/: https://gerrit.wikimedia.org/r/286018 (duration: 02m 29s)
  • 23:12 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.22/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,285769,n,z (duration: 02m 34s)
  • 23:07 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.22/extensions/UploadWizard/: https://gerrit.wikimedia.org/r/#q,286016,n,z (duration: 02m 34s)
  • 22:01 chasemp: reboot of holmium
  • 21:41 twentyafterfour: added usleep(200000); to slow down the phabricator import even further.
  • 21:32 twentyafterfour: reduced phabricator taskmaster processes to 1
  • 21:08 gehel: restarting elasticsearch server elastic1012.eqiad.wmnet (T110236)
  • 19:47 gehel: restarting elasticsearch server elastic1011.eqiad.wmnet (T110236)
  • 19:15 jynus: manually rotating db1038's error log
  • 19:10 hashar: 1.27.0-wmf.22 deployed. Uneventful.
  • 19:00 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.22
  • 18:42 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo/: Fix fatal T133921 (duration: 00m 32s)
  • 18:19 gehel: restarting elasticsearch server elastic1010.eqiad.wmnet (T110236)
  • 18:08 logmsgbot: mattflaschen@tin Synchronized wmf-config/db-labs.php: Beta Cluster change (duration: 00m 37s)
  • 17:40 yurik: deployed and restarted kartotherian & tilerator
  • 16:57 gehel: restarting elasticsearch server elastic1009.eqiad.wmnet (T110236)
  • 16:41 ejegg: updated payments-wiki from 16ed5af8c8544ea1c8d837ae16585eba4cbbfd4e to c502ab2f6b6ff914d67503a664d36076fdc32dcf
  • 16:26 twentyafterfour: further reduced the queue worker count on phabricator, to relieve stress on mysql m3 db1048
  • 16:17 bblack: starting SPDY stats sample on 8x caches for 24H - T96848
  • 16:15 gehel: restarting elasticsearch server elastic1008.eqiad.wmnet (T110236)
  • 15:35 elukey: installed memcached 1.4.25-2 (Debian sid/testing) in mc2009 as part of performance test (T129963)
  • 15:27 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Math: increase the number of concurrent connections to 150 gerrit:283269 (duration: 00m 35s)
  • 15:27 logmsgbot: gehel@palladium conftool action : get/pooled; selector: elastic1001.eqiad.wmnet
  • 15:23 elukey: puppet disabled on mc2009 as preparation step for https://gerrit.wikimedia.org/r/#/c/284907
  • 15:12 gehel: restarting elasticsearch server elastic1007.eqiad.wmnet (T110236)
  • 15:05 jynus: restarting db1038 for reimage to jessie
  • 14:32 gehel: wdqs-updater started on wdqs1002 (T133566)
  • 14:26 bblack: started SPDY stats sample on 8x caches - T96848#2248582
  • 14:25 elukey: deployed new zookeeper nodes in codfw (conf200[123])
  • 13:59 gehel: restarting elasticsearch server elastic1006.eqiad.wmnet (T110236)
  • 13:23 bblack: rebooting cp1008
  • 12:50 gehel: restarting elasticsearch server elastic1005.eqiad.wmnet (T110236)
  • 12:33 moritzm: upgrade/rolling restart of mediawiki canaries for pcre upgrade
  • 12:31 volans: Increase eqiad masters expire_logs_days (according to available space) T133333
  • 12:31 jynus: restarting sanitarium:s3 instance, query stuck again
  • 12:04 gehel: restarting elasticsearch server elastic1004.eqiad.wmnet (T110236)
  • 11:25 moritzm: uploaded varnish 3.0.6plus-wm9 to carbon for jessie-wikimedia
  • 11:19 volans: cleaning up some space on puppet-compiler host
  • 11:14 moritzm: upgraded varnish on cp1008 to 3.0.7 (except one patch)
  • 11:14 gehel: restarting elasticsearch server elastic1003.eqiad.wmnet (T110236)
  • 11:03 jynus: backing up db1038 data to dbstore1002
  • 10:50 jynus: stopping and restarting db1038 for backup and upgrade T125028
  • 10:41 jynus: running update table on eventlogging database on the master (db1046) T108856
  • 10:39 logmsgbot: elukey@palladium conftool action : set/pooled=yes; selector: aqs1001.eqiad.wmnet
  • 10:32 hoo: Set new email for global user "Sebschlicht" per https://meta.wikimedia.org/w/index.php?oldid=15564713#Sebschlicht2.40global and private communication
  • 10:31 moritzm: installing PHP updates for jessie
  • 09:46 gehel: restarting elasticsearch server elastic1002.eqiad.wmnet (T110236)
  • 09:23 jynus: removing unused mysql-server-5.5 from holmium (keeping database just in case) T128737
  • 09:10 logmsgbot: elukey@palladium conftool action : set/pooled=no; selector: aqs1001.eqiad.wmnet
  • 09:03 moritzm: remove obsolete mysql 5.5 installations from mw1022, mw1023, mw1024, mw1025, mw1114 and mw1163
  • 09:00 gehel: restarting elasticsearch server elastic1001.eqiad.wmnet (T110236)
  • 08:59 gehel: starting rolling restart of elasticsearch cluster in eqiad (T110236)
  • 08:58 logmsgbot: oblivian@palladium conftool action : set/weight=10; selector: name=mw2018.codfw.wmnet
  • 08:57 logmsgbot: oblivian@palladium conftool action : set/weight=12; selector: name=mw2018.codfw.wmnet
  • 08:12 elukey: restarting kafka on kafka{1012,1014,1022,1020,2001,2002} for Java upgrades. Will probably trigger some EventLogging alarms due to a bug (T133779)
  • 07:51 twentyafterfour: applied a hotfix to phabricator repository import job so that autoclose will not apply to unmerged refs/changes
  • 07:50 twentyafterfour: reduced the number of phabricator worker processes to hopefully stop exhausting mysql connections.
  • 05:37 mutante: lvs1012 - puppet fail, tries to upgrade tcpdump package and cannot be authenticated
  • 05:34 mutante: mw1146 - hhvm restart
  • 05:27 mutante: krypton remove RT packages, remnants from testing
  • 03:04 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo: Fix T133817 (originally scheduled for SWAT) (duration: 00m 34s)
  • 03:03 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.21/extensions/Echo: Fix T133817 (originally scheduled for SWAT) (duration: 00m 39s)
  • 02:41 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 24s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 10m 38s)
  • 02:12 twentyafterfour: manually edited crontab on iridium and killed multiple instances of public_task_dump.py (the cronjob was defined as * 2 * * * instead of 0 2 * * *)
  • 00:48 twentyafterfour: Phabricator's back online, everything seems to have gone smoothly.
  • 00:29 twentyafterfour: Preparing to take phabricator offline for maintenance.

2016-04-27

  • 22:18 logmsgbot: mattflaschen@tin Synchronized wmf-config/db-labs.php: Beta Cluster change (duration: 00m 29s)
  • 22:04 bblack: banned req.url ~ "^/w/load.php.*choiceData" on cache_text
  • 22:00 bblack: banned req.url ~ "^/load.php.*choiceData" on cache_text
  • 21:22 cwd: updated civicrm from 15a0086eef78f16110eba358a28ef78b51a385e1 to 777a91b8f9f6003a3eebdb8f2c73e45cc2bfb4a4
  • 21:03 bblack: rebooting cp1065
  • 21:01 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Restore codfw to elasticsearch config T133784 (duration: 00m 31s)
  • 21:00 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: Restore codfw to elasticsearch config T133784 (duration: 00m 37s)
  • 20:48 thcipriani: restarting jenkins after plugin downgrade
  • 20:41 hashar: 1.27.0-wmf.22 to group1 has been completed without incident. Deployment is open!
  • 20:41 ebernhardson: Enabled cirrussearch writes to codfw only on mw1165 w/ live hack
  • 20:32 gehel: switching wdqs1002 to maintenance and reimporting data (T133566)
  • 20:28 cscott: updated OCG to version e39e06570083877d5498da577758cf8d162c1af4
  • 20:20 yurik: deployed kartotherian & tilerator services
  • 20:09 gehel: adding back wdqs1001 to varnish configuration after reinstall (T133566)
  • 19:24 Pchelolo: update restbase to e9fbdfe
  • 19:18 Pchelolo: update restbase to e9fbdfe: canary on restbase1007
  • 19:11 Pchelolo: update restbase to e9fbdfe: staging
  • 19:09 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.22
  • 19:00 dcausse: restarting elastic on elastic2007.codfw.wmnet (master)
  • 18:56 mutante: creating VM ununpentium on ganeti/eqiad (T123713)
  • 18:55 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Drop codfw from elasticsearch config T133784 (duration: 00m 36s)
  • 18:55 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: Drop codfw from elasticsearch config T133784 (duration: 00m 25s)
  • 18:02 jynus: generating new triggers for eventlogging_sync schema T108856
  • 16:58 gehel: increase throttling limit and concurrency on recoveries for elasticsearch codfw cluster (T133784)
  • 16:05 gehel: increasing curl pool size for jobrunners (T133755)
  • 15:46 elukey: restarted kafka1013 for java upgrades
  • 15:30 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage on hiwikiquote gerrit:285639 (duration: 00m 31s)
  • 15:09 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Subject namespace to hiwikibooks gerrit:285008 (duration: 02m 41s)
  • 14:52 moritzm: uploaded pcre 8.31-2ubuntu2.3+wm1 to carbon for trusty-wikimedia (rebuild of latest trusty update with our patch to enable JIT)
  • 14:44 elukey: repooled kafka1001 after upgrades, will do the same procedure to kafka1002
  • 14:40 _joe_: upgraded conftool on palladium
  • 14:40 elukey: restarted kafka on kafka1001
  • 14:38 _joe_: upgrading conftool on all cp servers
  • 14:33 elukey: kafka1001.eqiad.wmnet depooled from eventbus for kafka upgrades (via confctl)
  • 14:14 chasemp: restart phd on iridium as it keeps complaining it lost procs (seems ok now)
  • 13:53 elukey: restarted kafka on kafka1018.eqiad.wmnet for Java upgrades
  • 13:24 gehel: hard restart of codfw elasticsearch cluster
  • 13:23 _joe_: uploading new conftool packages, T128199
  • 12:15 dcausse: restarting elastic on elastic2004.codfw.wmnet
  • 11:43 dcausse: restarting elastic on elastic2003.codfw.wmnet
  • 11:05 dcausse: restarting elastic on elastic2002.codfw.wmnet
  • 10:58 logmsgbot: reedy@tin LocalisationUpdate failed: git pull of core failed
  • 10:49 dcausse: restarting elastic on elastic2001.codfw.wmnet
  • 10:36 logmsgbot: gehel@tin Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 02m 47s)
  • 10:30 gehel: switching elasticsearch morelike traffic from codfw to eqiad
  • 10:01 _joe_: shutting down mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242
  • 09:56 _joe_: clean puppet certs and facts on mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242
  • 09:23 _joe_: restarted hhvm on mw1144, usual deadlock
  • 09:11 _joe_: repooling cp3038
  • 09:09 _joe_: repooling cp3038
  • 08:59 _joe_: hard rebooting cp3038, console unreachable
  • 08:59 jynus: starting mysql on holmium
  • 08:57 _joe_: depooling cp3038 from all live pools
  • 08:40 _joe_: stopping puppet on mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242
  • 06:11 mutante: krypton - for some unrelated reason on every puppet run there is some noise about analytics::burrow stuff
  • 06:09 mutante: krypton - re-enabled puppet after LE + rt.wm.o puppetization issues fixed with gerrit 285586 @bblack #LEftw
  • 03:11 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Apr 27 03:11:11 UTC 2016 (duration 9m 46s)
  • 03:01 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 18m 27s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 11m 46s)

2016-04-26

  • 23:29 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Flow: SWAT (duration: 00m 45s)
  • 23:28 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo: SWAT (duration: 00m 31s)
  • 23:26 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Revert lazy-loaded references (duration: 00m 25s)
  • 23:24 logmsgbot: catrope@tin Synchronized wmf-config/Wikibase.php: Actually enable data access in user language for Commons and testwikidatawiki (duration: 00m 26s)
  • 23:21 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.22/extensions/WikimediaEvents: SWAT (duration: 00m 25s)
  • 23:21 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.21/extensions/WikimediaEvents: SWAT (duration: 00m 28s)
  • 23:16 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable lazy-loaded references in mobile web beta on enwiki (duration: 00m 26s)
  • 23:11 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access in user language on Commons and testwikidatawiki (duration: 00m 33s)
  • 23:09 logmsgbot: catrope@tin Synchronized dblists/arbitraryaccess.dblist: Enable arbitrary access on Commons (duration: 00m 33s)
  • 22:54 ejegg: updated payments-wiki from 3ac3a0d7bb6e2d8443a87fb62bd5412a95f75aa2 to 16ed5af8c8544ea1c8d837ae16585eba4cbbfd4e
  • 21:39 gehel: restarting elasticsearch server elastic2021.codfw.wmnet - activating unicast (T110236)
  • 21:26 logmsgbot: ori@tin Synchronized wmf-config/filebackend-production.php: Ia4434256c3: Set descriptionCacheExpiry for Commons repo (duration: 00m 35s)
  • 21:06 gehel: activating geodns for new varnish maps servers (T131880)
  • 20:47 gehel: restarting elasticsearch server elastic2020.codfw.wmnet - activating unicast (T110236)
  • 20:38 bblack: restarting pybal on lvs3002 to enable new cache configuration for maps (T131880)
  • 20:35 bblack: restarting pybal on lvs3004 to enable new cache configuration for maps (T131880)
  • 20:32 bblack: restarting pybal on lvs2002 to enable new cache configuration for maps (T131880)
  • 20:30 bblack: restarting pybal on lvs2005 to enable new cache configuration for maps (T131880)
  • 20:07 hashar: Killing Zuul entirely due to deadlock T128569
  • 20:02 bblack: restarting pbyal on lvs4002 to enable new cache configuration for maps (T131880)
  • 19:54 gehel: restarting pybal on lvs4004 to enable new cache configuration for maps (T131880)
  • 19:52 gehel: restarting elasticsearch server elastic2019.codfw.wmnet - activating unicast (T110236)
  • 19:14 gehel: restarting elasticsearch server elastic2018.codfw.wmnet - activating unicast (T110236)
  • 19:07 logmsgbot: hashar@tin Purged l10n cache for 1.27.0-wmf.20
  • 19:04 logmsgbot: hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.22
  • 18:41 gehel: restarting HHVM on terbium to enable upgrade to 3.12.1 (T132751)
  • 17:55 gehel: restarting elasticsearch server elastic2017.codfw.wmnet - activating unicast (T110236)
  • 17:38 gehel: restarting elasticsearch server elastic2016.codfw.wmnet - activating unicast (T110236)
  • 16:45 gehel: restarting elasticsearch server elastic2015.codfw.wmnet - activating unicast (T110236)
  • 16:43 godog: start restbase1014-[ab] cleanup
  • 16:15 gehel: restarting elasticsearch server elastic2014.codfw.wmnet - activating unicast (T110236)
  • 16:07 ejegg: updated payments from f09297028acace67588c2de845b754e2ace75c97 to 3ac3a0d7bb6e2d8443a87fb62bd5412a95f75aa2
  • 15:34 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean InitialiseSettings.php gerrit:285381 (duration: 00m 27s)
  • 15:27 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable action=credits on test and beta gerrit:285374 (duration: 00m 29s)
  • 15:25 papaul: OS installation on graphite2002
  • 15:24 gehel: restarting elasticsearch server elastic2013.codfw.wmnet - activating unicast (T110236)
  • 15:24 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Reformatted comment gerrit:285373 (duration: 00m 28s)
  • 15:22 hashar: 1.27.0-wmf.22 HHVM cache is warmed up | T131556
  • 15:20 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Switch to short syntax array gerrit:285361 (duration: 00m 36s)
  • 15:17 logmsgbot: hashar@tin Synchronized php-1.27.0-wmf.22/includes/DefaultSettings.php: Set = 1.27.0-wmf.22 (duration: 01m 04s)
  • 14:55 logmsgbot: hashar@tin Finished scap: testwiki to php-1.27.0-wmf.22 and rebuild l10n cache (duration: 29m 28s)
  • 14:38 gehel: restarting elasticsearch server elastic2012.codfw.wmnet - activating unicast (T110236)
  • 14:26 logmsgbot: hashar@tin Started scap: testwiki to php-1.27.0-wmf.22 and rebuild l10n cache
  • 13:33 hashar: Checking out MediaWiki 1.27.0-wmf.22 on tin | T131556
  • 13:11 gehel: restarting elasticsearch server elastic2011.codfw.wmnet - activating unicast (T110236)
  • 12:27 gehel: restarting elasticsearch server elastic2010.codfw.wmnet - activating unicast (T110236)
  • 12:09 elukey: mw1145 powercycled (root login timeout on the console)
  • 11:43 gehel: restarting elasticsearch server elastic2009.codfw.wmnet - activating unicast (T110236)
  • 10:31 jynus: restarting mw1143 as it is no longer accessible
  • 09:51 jynus: restarting db2068 for upgrade before returning from maintenance
  • 09:50 gehel: starting reinstall of wdqs1001 (T133566)
  • 09:47 gehel: restarting elasticsearch server elastic2008.codfw.wmnet - activating unicast (T110236)
  • 09:41 moritzm: uploaded apache 2.4.10-10+deb8u4+wmf1 to carbon
  • 09:12 _joe_: restarted hhvm on mw1133, almost OOM
  • 09:11 hashar: CI is back up!
  • 09:09 mobrovac: citoid deployed 36c2bf02
  • 09:03 jynus: restarting db1069 s1 instance: one query is "stuck", creating lag on labs
  • 08:56 gehel: depooling wdqs1001 from varnish for reinstall
  • 08:52 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1052 (old s1-master) with low weight (duration: 00m 31s)
  • 08:50 YuviPanda: restarted nova-conductor & scheduler on labcontrol1001 for T133654
  • 08:45 hashar: Most of CI is down / deadlocked due to wmflabs being unresponsive T133654
  • 08:44 gehel: restarting elasticsearch server elastic2007.codfw.wmnet - activating unicast (T110236)
  • 08:27 jynus: stopping db2068 for cloning to db2047
  • 08:06 hashar: CI jobs deadlocked due to castor being unavailable | https://phabricator.wikimedia.org/T133652
  • 06:52 jynus: restarting db2047 for reimaging
  • 02:46 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Apr 26 02:46:52 UTC 2016 (duration 9m 1s)
  • 02:37 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 19m 17s)
  • 01:09 paravoid: reinstalling bast4001 w/ jessie

2016-04-25

  • 23:51 logmsgbot: dereckson@tin Synchronized docroot/noc/conf/: noc.wikimedia.org update (Gerrit:285061, Gerrit:285062, Gerrit:281977) (duration: 00m 26s)
  • 23:48 logmsgbot: dereckson@tin Synchronized docroot/noc/createTxtFileSymlinks.sh: noc: PoolCounterSettings-eqiad.php → PoolCounterSettings.php (Gerrit:285062, no-op) (duration: 00m 30s)
  • 23:21 Krenair: Deployed patch for T132653
  • 23:10 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/284831/ - Disable Echo survey on French wikis
  • 22:30 eileen: Updating civicrm
  • 21:02 gehel: adding cp10(46|47|59|60)\.eqiad\.wmnet to maps caching cluster (T109162)
  • 21:02 cscott: updated OCG to version 58a720508deb368abfb7652e6a8c7225f95402d2
  • 20:46 csteipp: deployed updated patch for T98313 to include Vary headers
  • 20:35 gehel: restarting elasticsearch server elastic2006.codfw.wmnet - activating unicast (T110236)
  • 20:07 subbu: finished deploying parsoid version d5363193
  • 20:04 subbu: synced new code and restarted parsoid on wtp1001 as a canary
  • 20:01 subbu: deploying parsoid version d5363193
  • 19:49 gehel: restarting elasticsearch server elastic2005.codfw.wmnet - activating unicast (T110236)
  • 19:39 hashar: Nodepool now booting 2 trusty instances for CI needs. T133203
  • 19:27 gehel: disabling puppet on cp1043/cp1044 while reconfiguring Maps caching servers
  • 19:27 gehel: start configuration of new maps caching servers (T109162)
  • 19:08 gehel: restarting elasticsearch server elastic2004.codfw.wmnet - activating unicast (T110236)
  • 18:41 logmsgbot: legoktm@tin Synchronized wmf-config: ⓛⓐⓑⓢ-ⓞⓝⓛⓨ, ⓝⓞ-ⓞⓟ (duration: 00m 40s)
  • 18:38 bblack: switching mobile hostnames to text IPs in DNS (10 min TTL)
  • 18:21 Pchelolo: finished update RESTBase to c1d5193
  • 18:09 Pchelolo: update RESTBase to c1d5193
  • 17:55 Pchelolo: update RESTBase to c1d5193, canary on restbase1007
  • 17:53 jynus: wiping db1052 tendril's monitoring data due to 5.5 -> 5.6 incompatibility
  • 17:48 jynus: db1052 now in jessie
  • 17:47 Pchelolo: update RESTBase to c1d5193, canary on restbase1005
  • 17:26 Pchelolo: update RESTBase to c1d5193, staging
  • 17:22 mutante: Ⓐ
  • 17:20 ori: 😊
  • 17:01 gehel: restarting elasticsearch server elastic2003.codfw.wmnet - activating unicast (T110236)
  • 16:34 logmsgbot: jzerebecki@tin Synchronized wmf-config/CommonSettings.php: config 3bd39b8b9ac9fb252ff8a3e9a93b14c1defd45fc T132820 : Use notify-type-availability due to Echo change (duration: 00m 26s)
  • 16:24 jzerebecki: T132914: @tin:/srv/mediawiki-staging (master %=)$ mwscript extensions/Flow/maintenance/FlowUpdateBetaFeaturePreference.php frwikisource
  • 16:15 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 1f8fea8e53b2e95caf0f559e0ad6e2cb1d51c967 T132914 : Enable Flow opt-in beta feature on frwikisource (duration: 00m 28s)
  • 16:10 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 1b156b46d42f2482c1941427dc4ec8aeb427f0b3 T133286 : Add *.asc-test.nl to wgCopyUploadsDomains (duration: 00m 27s)
  • 15:59 YuviPanda: restart pdns recursor & pdns on holmium
  • 15:58 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 0d0d7acf79891ad36d48cd30887bb2299d43f264 T132792 3 of 3 ; and config 7144b0139c85b565af76a993b38209b4c0989dec T132748 (duration: 00m 25s)
  • 15:56 gehel: restarting elasticsearch server elastic2002.codfw.wmnet - activating unicast (T110236)
  • 15:56 logmsgbot: jzerebecki@tin Synchronized static/images/project-logos/dewiki-2x.png: config 0d0d7acf79891ad36d48cd30887bb2299d43f264 T132792 2 of 3 : Add HD versions of logo for dewiki (duration: 00m 25s)
  • 15:55 logmsgbot: jzerebecki@tin Synchronized static/images/project-logos/dewiki-1.5x.png: config 0d0d7acf79891ad36d48cd30887bb2299d43f264 T132792 1 of 3 : Add HD versions of logo for dewiki (duration: 00m 25s)
  • 15:53 godog: test reboot for mw2212 - T129196
  • 15:50 chasemp: restart pdns to see if that helps a labs issue w/ tools proxy
  • 15:50 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 2497480f39ea2dad27bab554b7d6e3a996560f46 T131685 : Disable MoodBar on testwiki (duration: 00m 27s)
  • 15:43 jynus: recovering backed-up db1052 data from dbstore1002
  • 15:42 jzerebecki: T132746 another run needed @tin:/srv/mediawiki-staging (master %=)$ mwscript namespaceDupes.php foundationwiki --move-talk --merge --fix >try2-out-fix2.txt
  • 15:40 jzerebecki: T132746 @tin:/srv/mediawiki-staging (master %=)$ mwscript namespaceDupes.php foundationwiki --move-talk --merge --fix >try2-out-fix.txt
  • 15:35 jzerebecki: err T132746 instead of T132868
  • 15:35 jzerebecki: T132868 db error: Query: UPDATE `page` SET page_namespace = '100',page_title = '2008-09_Budget' WHERE page_id = '21741' Function: NamespaceConflictChecker::movePage Error: 1062 Duplicate entry '100-2008-09_Budget' for key 'name_title' (10.64.0.205)
  • 15:31 jzerebecki: T132868 @tin:/srv/mediawiki-staging (master %=)$ mwscript namespaceDupes.php foundationwiki --move-talk --fix >out-fix.txt
  • 15:31 jzerebecki: T132868 3508 links to fix, 3508 were resolvable. Looks good!
  • 15:28 jzerebecki: T132746 @tin:/srv/mediawiki-staging (master=)$ mwscript namespaceDupes.php foundationwiki --move-talk
  • 15:26 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 70ee0fd97c26da02ed8ea511b2e21b47f374101b T132746 : Add Resolution: namespace to foundationwiki (duration: 00m 31s)
  • 15:22 jzerebecki: T131605 @tin:/srv/mediawiki-staging (master=)$ echo 'https://en.wikipedia.org/static/images/project-logos/cswiki.png' | mwscript purgeList.php
  • 15:22 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 5c153e934a88f7b5e16a2d4b244ed76a9dfc1cdc T132868 : Enable RC patrol on ta.wikiquote (duration: 00m 28s)
  • 15:20 jzerebecki: T131605 @tin:/srv/mediawiki-staging (master=)$ echo 'https://cs.wikipedia.org/static/images/project-logos/cswiki.png' | mwscript purgeList.php
  • 15:14 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: config 6c5214dcb3f148dab12fe4aa6ae0921cff05038a T131605 2 of 2 : Revert "350K articles celebration logo on cs.wikipedia" (duration: 00m 30s)
  • 15:12 logmsgbot: jzerebecki@tin Synchronized static/images/project-logos/cswiki.png: config 6c5214dcb3f148dab12fe4aa6ae0921cff05038a T131605 1 of 2 : Revert "350K articles celebration logo on cs.wikipedia" (duration: 00m 26s)
  • 15:07 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.21/includes/specials/SpecialRunJobs.php: wmf21 e411ad62516bfae31f8d482a22d378912fe5979f T89169 : SpecialRunJobs: delegate error handling to MWExceptionHandler (duration: 00m 37s)
  • 14:47 gehel: restarting elasticsearch server elastic2001.codfw.wmnet - activating unicast (T110236)
  • 14:46 gehel: restarting elasticsearch server elastic1001.codfw.wmnet - activating unicast (T110236)
  • 14:33 jzerebecki: that file is the same diff as the HEAD commit @tin:/srv/mediawiki-staging/php-1.27.0-wmf.21/extensions/CentralAuth ((398f6d4...) %)$ rm test.patch
  • 13:55 gehel: lowering disk high watermark on elasticsearch eqiad to rebalance the cluster
  • 13:07 jynus: restarting and reimaging db1052 (old enwiki master)
  • 12:08 mobrovac: restbase re-deployed 7f69f86ee9 to restbase1015 after reimaging
  • 10:54 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2068 for cloning to db2047 (duration: 01m 51s)
  • 10:11 jynus: stopping db1052 for backup to dbstore1002
  • 09:38 jynus: deleting imported logs on pc1004 and pc1005
  • 08:23 moritzm: restarted HHVM on mw1116 (output of hhvm-dump-debug available)
  • 08:19 jynus: dropping old imported logs from pc1006

2016-04-24

  • 20:45 Reedy: ran namespaceDupes.php on aswikisource
  • 20:04 ori: Deployed change Ib7e248ccf to statsv (commit id 5323cece2b3; task T132770)

2016-04-23

  • 19:09 _joe_: rebooted meitnerium
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 10m 27s)
  • 00:07 Jamesofur: deactivate phabricator account for ktr101 per global ban

2016-04-22

  • 23:33 mutante: powercycled mw1141
  • 23:26 mutante: reboot bast4001
  • 21:21 mutante: short gaps in codfw ganglia expected due to install2001 being an aggregator
  • 21:10 mutante: install2001 - reinstalled, re-adding to puppet etc
  • 20:36 mutante: reboot install2001 to PXE for OS upgrade
  • 19:09 Krinkle: mwscript deleteEqualMessages.php --wiki thwikibooks
  • 15:54 bblack: testing https redir on ganglia.wm.o (uranium)
  • 15:47 moritzm: uploaded gerrit 2.12.2 for jessie-wikimedia to carbon
  • 15:27 godog: add new librenms template bound to 'port utilization over threshold' alert
  • 15:24 moritzm: uploaded gerrit 2.12.2 to carbon
  • 15:22 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool es2019 (duration: 00m 52s)
  • 14:14 mobrovac: restbase initial deploy of 7f69f86ee9 to restbase1015
  • 13:57 mobrovac: restbase bringing the service back up on restbase1007
  • 13:43 godog: bootstrap restbase1015-a T128107
  • 13:34 mobrovac: restbase stopping restbase on rb1007 to manually inspect why it is flapping
  • 13:27 akosiaris: upload 5.3.10-1ubuntu3.22+wmf1 on apt.wikimedia.org
  • 13:03 godog: depool restbase1007, 400s from restbase self check
  • 09:46 moritzm: installing PHP security updates
  • 09:25 moritzm: installing irqbalance bugfix updates (preventing massive logspam on some systems)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Apr 22 02:31:45 UTC 2016 (duration 9m 15s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 10m 12s)

2016-04-21

  • 20:34 gehel: lowering elasticsearch high watermark to rebalance disk space across cluster
  • 20:07 gehel: reindex elasticsearch updates for the duration of the switch back from codfw, just in case
  • 18:57 mutante: rutherfordium - on 4.4.0-1-amd64 now (T131928)
  • 18:56 mutante: people.wm.org / rutherfordium , very short downtime, reboot
  • 18:43 mutante: rutherfordium (people.wm.org) - installing package upgrades
  • 18:29 mutante: bromine - on 4.4.0-1-amd64 now (T131928)
  • 18:25 mutante: bromine (annualreport,bz-static,transparency,releases) reboot for kernel upgrade
  • 18:15 mutante: bromine - apt-get dist-upgrade
  • 18:12 mutante: planet1001 - on 4.4.0-1-amd64 now (T131928)
  • 18:10 mutante: planet1001 - reboot for upgrade
  • 17:44 mutante: planet1001 - apt-get dist-upgrade (libc6, apache, ..)
  • 16:10 logmsgbot: marktraceur@tin Synchronized php-1.27.0-wmf.21/resources/src/mediawiki/api/upload.js: Unbreak finishing stash uploads in upload dialog (duration: 00m 27s)
  • 16:10 jynus: disabling event scheduler on db1041
  • 15:34 _joe_: services switchover done
  • 15:30 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Adjust weights for shard s7 - T133205 (duration: 00m 32s)
  • 15:28 _joe_: [switchback services] moving traffic for restbase/citoid/cxserver back to eqiad
  • 15:23 ostriches: ytterbium: puppet re-enabled for gerrit host
  • 15:23 logmsgbot: filippo@tin Synchronized wmf-config/filebackend-production.php: [switchover swift #1] async writes to codfw (duration: 00m 28s)
  • 15:15 godog: [switchover swift #2] upload backends from codfw to eqiad
  • 15:09 godog: [switchover swift #3] upload codfw to eqiad
  • 15:05 godog: [switchover swift #4] repool upload eqiad in dns
  • 15:04 gehel: starting reindex of lost elasticsearch updates during activation of SSL (T132762)
  • 15:01 godog: [switchover swift #5] upload esams to eqiad
  • 14:57 godog: [switchover swift #6] upload eqiad to 'direct'
  • 14:48 logmsgbot: gehel@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 29s)
  • 14:45 gehel: reverting elasticsearch traffic back to eqiad
  • 14:39 jynus: enabling replication lag alerts for all dbs
  • 14:28 paravoid: traffic/mediawiki codfw->eqiad switchover is done
  • 14:26 jynus: deployed dns updated records for new eqiad masters
  • 14:25 _joe_: [switchover #8/#3] Running rebuildEntityPerPage.php
  • 14:25 _joe_: [switchover #8/#2] Enabling crons in eqiad
  • 14:24 ori: [switchover #8/#1] Starting the jobqueue in eqiad
  • 14:23 volans: [switchover #8/#3] Re-enable puppet on all eqiad and codfw database masters
  • 14:21 paravoid: wikis are read-write again
  • 14:21 logmsgbot: ori@tin Synchronized wmf-config/db-eqiad.php: [switchover #7/#1] Iac92c8bc6b: Put eqiad in read-write mode for datacenter switchover to eqiad (duration: 00m 39s)
  • 14:20 volans: [switchover #6/#1] Set database masters RW in eqiad for s1-7, es2-3 and x1
  • 14:19 bblack: [switchover #5/#6] Switch Varnish MW backends to eqiad - DONE (confirmed)
  • 14:18 Krinkle: apache-fast-test warmup finished
  • 14:18 godog: [switchover #5/#7] finished swift-proxy roll restart
  • 14:15 bblack: [switchover #5/#6] Switch Varnish MW backends to eqiad - starting
  • 14:14 mobrovac: [switchover #5/2] restbase Yes Done
  • 14:11 subbu: parsoid deploy and restarts done
  • 14:10 godog: [switchover #5/#7] roll-restart swift-proxy in eqiad and codfw
  • 14:09 subbu: restarting parsoid on all nodes
  • 14:09 akosiaris: [switchover #5/#3] Misc services cluster (for the action API endpoint)
  • 14:09 mobrovac: [switchover #5/2] restbase puppet agent -tv && systemctl restart restbase
  • 14:08 Krinkle: krinkle@tin: bin/apache-fast-test wiki-urls-warmup1000.txt eqiad
  • 14:08 subbu: syncing parsoid code
  • 14:07 volans: [switchover #5/#5] Switch parsercache RO/RW
  • 14:07 _joe_: [switchover #5/1] switching redis replication manually
  • 14:07 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: [switchover #4/#2] I0e85c3d20: Switch wmfMasterDatacenter to eqiad (duration: 00m 26s)
  • 14:06 _joe_: [switchover #4/1] puppet merged
  • 14:03 volans: [switchover #3/#1] Set active site's databases (masters) in read-only mode except parsercache ones.
  • 14:03 _joe_: [switchover #3/2] wipe memcacheds
  • 14:02 paravoid: wikis now in planned read-only mode, cf. http://blog.wikimedia.org/2016/04/18/wikimedia-server-switch/
  • 14:02 logmsgbot: ori@tin Synchronized wmf-config/db-codfw.php: [switchover #2/#1] Id8b2e7a05: Set codfw databases to read-only mode (duration: 00m 24s)
  • 14:00 jynus: disabled all db lag alerts
  • 13:55 volans: [switchover #1/#6] Switch pt-heartbeat from active site (codfw) to new site (eqiad) masters
  • 13:54 ori: [switchover #1/2] stopping jobrunners in codfw
  • 13:54 _joe_: [switchover #1/3] stopping crons on wasat
  • 13:52 volans: [switchover #1/#5] Set final $master status for databases in advance
  • 13:50 volans: [switchover #1/#4] Disable puppet on all eqiad and codfw database masters
  • 13:50 paravoid: commencing codfw->eqiad datacenter switchover
  • 13:39 logmsgbot: ori@tin Synchronized wmf-config/InitialiseSettings.php: I2171f6b1: Enable MessageCacheError log channel (duration: 00m 25s)
  • 13:37 bblack: [traffic codfw switch revert #3] - DNS TTL done, bulk of end-user traffic rebalanced, graphs starting to level off at new normals, as done as it gets from our end
  • 13:31 bblack: [traffic codfw switch revert #4] - done & confirmed
  • 13:28 bblack: [traffic codfw switch revert #4] - merge -> start salted puppet
  • 13:27 bblack: [traffic codfw switch revert #2] - done & confirmed
  • 13:25 bblack: [traffic codfw switch revert #3] - merge -> authdns-update
  • 13:24 bblack: [traffic codfw switch revert #2] - merge -> start salted puppet
  • 13:23 bblack: [traffic codfw switch revert #1] - done & confirmed
  • 13:23 bblack: [traffic codfw switch revert #1] - merge -> start salted puppet (@13:20, late log)
  • 13:21 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes: Ie9799f5ea: Make MessageCache handle lock timeouts better (duration: 01m 18s)
  • 13:12 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Temporarily increase es1* master weight to add connection capacity (duration: 00m 37s)
  • 09:57 elukey: removed apache2 logrotate config manually from argon as temp patch to remove cronspam from root@ (T132896)
  • 08:36 jynus: restarting db1031 to apply new mysql config
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Apr 21 02:31:04 UTC 2016 (duration 8m 37s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 09m 48s)
  • 01:49 mutante: git pull on strontium, ops/puppet
  • 01:48 mutante: belated log: restarted slapd on seaborgium
  • 01:29 ori: installed python-progressbar on terbium for warmup script, will be puppetized later

2016-04-20

  • 22:18 mutante: creating ganeti VM install1001 on eqiad cluster
  • 19:03 AaronSchulz: Cleared out 'enqueue' job queues to see if corruption comes back
  • 18:17 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Promote db1031 as the new x1 eqiad local master (duration: 00m 28s)
  • 18:16 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/extensions/Translate/messagegroups/WikiPageMessageGroup.php: I331bd93b: Avoid more master queries on page views (duration: 00m 31s)
  • 18:16 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/jobqueue/JobQueueGroup.php: Ie9799f5ea: Catch errors in pushLazyJobs() and log them (duration: 00m 36s)
  • 18:00 jynus: changing database topology to set db1031 as the master of x1 on eqiad
  • 17:58 volans: Upgrading db1065 and fixing overheating problems T132515
  • 17:30 volans: Upgrading db1070 and fixing overheating problems T132515
  • 17:19 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.21/includes/jobqueue/JobQueueRedis.php: 86d185a4bbf52d (duration: 00m 39s)
  • 17:15 volans: Upgrading db1071 and fixing overheating problems T132515
  • 17:03 akosiaris: aptitude purge php5-xhprof on uranium
  • 16:54 elukey: replaced "#" with ";" manually in uranium's /etc/php5/cli/conf.d/20-xhprof.ini and /etc/php5/apache2/php.ini to avoid cronspam (didn't find puppet/package trails)
  • 15:43 ebernhardson: delete apifeatureusage-2016.01.20 from codfw elasticsearch cluster. Index should never have existed in this cluster (and is beyond retention).
  • 15:42 ebernhardson: delete apifeatureusage-2016-01-(02,09,10) from eqiad elasticsearch cluster. We only keep 30 days of apifeatureusage logs
  • 15:37 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Tweak DB weights for better latency, avoiding peaks on QPS (duration: 00m 32s)
  • 15:18 ottomata: enabling puppet on analytics1015
  • 15:17 andrewbogott: re-imaging labtestvirt2001 and labtestneutron2001
  • 14:56 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Change eqiad masters for s1,s3-s7 - T105135 (duration: 00m 28s)
  • 14:55 ottomata: started puppet on analytics1003
  • 14:52 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool es2019 (duration: 00m 38s)
  • 14:37 ottomata: stopping puppet on analytics1015 and analytics1003 in prep for migration
  • 13:54 elukey: puppet disabled on analytics1027 to stop Camus
  • 13:50 _joe_: rolling restart of ocg servers
  • 13:21 moritzm: rebooting rdb1002,rdb1003,rdb1004,rdb1006,rdb1007,rdb1008 for upgrade to Linux 4.4
  • 13:17 jynus: [switchover-maintenance] Changing DB slave topology for shard s1 on eqiad T111654
  • 12:59 volans: [switchover-maintenance] Changing DB slave topology for shard s4 on eqiad T111654
  • 12:54 volans: [switchover-maintenance] Changing DB slave topology for shard s5 on eqiad T111654
  • 12:48 jynus: [switchover-maintenance] Changing DB slave topology for shard s3 on eqiad T111654
  • 12:37 volans: [switchover-maintenance] Changing DB slave topology for shard s6 on eqiad T111654
  • 12:17 volans: [switchover-maintenance] Changing DB slave topology for shard s7 on eqiad T111654
  • 10:51 godog: pool restbase1014
  • 10:42 volans: [switchover-maintenance] Restarting db1028 (s7)
  • 10:33 jynus: [switchover-maintenance] Restarting db1018
  • 10:31 volans: [switchover-maintenance] Upgrading TLS for shard s7 on eqiad databases
  • 09:13 jynus: backfilling recentchanges on enwiki API servers
  • 09:12 godog: stop compactions on restbase1014-[ab]
  • 08:54 elukey: deployed the new puppet compiler - version 0.1.4 (hosts sorted in the HTML output, minor change)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Apr 20 02:31:20 UTC 2016 (duration 8m 46s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 09m 37s)

2016-04-19

  • 22:42 ori: killing rc insert query on db1065 and db1066
  • 21:27 ori: running rebuildrecentchanges.php --from=20160419144741 --to=20160419151018 on all wikis
  • 21:12 paravoid: clearing the exim4 retry database on mx2001
  • 20:44 ori: on all wikis, deleting from recentchanges where rc_timestamp > 20160419144741 and rc_timestamp < 20160419151018
  • 20:09 ori: ran `mwscript rebuildrecentchanges.php --wiki=testwiki --from=20160419144741 --to=20160419151018`
  • 19:53 paravoid: staggered varnish bans for 'obj.http.server ~ "^mw2.+"' as a workaround for T133069
  • 19:51 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/maintenance/rebuildrecentchanges.php: Ie9799f5ea: rebuildrecentchanges: Allow rebuilding specified time range only (duration: 00m 28s)
  • 19:43 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Revert "Depool one db server from each shard as a backup" (duration: 00m 27s)
  • 19:12 AaronSchulz: Cleared enwiki 'enqueue' queue (T133089)
  • 19:06 legoktm: purging sidebar cache across all wikis (T133069)
  • 18:27 mutante: kraz.codfw, reinstalling as kraz.wikimedia
  • 18:00 _joe_: running rebuildEntityPerPage.php on wikidata, T133048
  • 17:49 jynus: setting binlog_format=ROW on old x1-master at eqiad (db1029) to reenable replication
  • 17:25 logmsgbot: demon@tin Synchronized php-1.27.0-wmf.21/extensions/CentralAuth: forgot something (duration: 00m 42s)
  • 17:17 volans: Deleting pc1003* and pc1006* binlog from pc2006 to make some space
  • 17:12 volans: Deleting pc1005* binlog from pc2005 to make some space
  • 17:08 volans: Deleting pc1002* old binlog from pc2005 to make some space
  • 17:01 ostriches: ytterbium: stopped puppet for a bit, testing host key mess.
  • 16:55 ostriches: restarting gerrit to pick up furud's rsa key
  • 15:49 bblack: [traffic codfw switch #4] - puppet change complete - done
  • 15:49 bblack: [traffic codfw switch #2] - confirmed bulk of traffic moved after ~10min for DNS TTL, rates levelling out on eqiad+codfw front network stats
  • 15:46 bblack: [traffic codfw switch #4] - salting puppet change
  • 15:45 bblack: [traffic codfw switch #4] - puppet merging eqiad text -> codfw
  • 15:41 bblack: [traffic codfw switch #3] - puppet change complete - done
  • 15:39 bblack: [traffic codfw switch #3] - salting puppet change
  • 15:38 bblack: [traffic codfw switch #3] - puppet merging esams text -> codfw
  • 15:35 bblack: [traffic codfw switch #2] - authdns-update complete, user traffic to eqiad frontends should start dropping off now
  • 15:34 bblack: [traffic codfw switch #1] - puppet change complete - done
  • 15:31 bblack: [traffic codfw switch #1] - salting puppet change
  • 15:30 bblack: [traffic codfw switch #1] - puppet merging text caches -> direct
  • 15:25 logmsgbot: ori@tin Synchronized wmf-config/ProductionServices.php: Iee2e08df5: Fix codfw redis hostnames [no-op, already synced as live hack] (duration: 00m 36s)
  • 15:11 andrewbogott: testing the log by logging a test
  • 15:10 logmsgbot: ori@tin Synchronized wmf-config/ProductionServices.php: live-hack fix for rdb2*.eqiad (duration: 00m 34s)
  • 15:02 _joe_: [switchover #13] starting maintenance jobs
  • 15:00 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: db weight tweaking to better process the load (duration: 00m 28s)
  • 15:00 jynus: applying database weight changes
  • 14:48 paravoid: sites are read-write again
  • 14:48 logmsgbot: ori@tin Synchronized wmf-config/db-codfw.php: [switchover #12] I5e9635b8f4: Set codfw databases in read-write (duration: 00m 35s)
  • 14:43 jynus: [switchover #11-2] Set and confirmed codfw master dbs in read-write
  • 14:37 godog: [switchover #10] puppet and swift reload finished
  • 14:34 jynus: [switchover #11] applying $master change for codfw masters
  • 14:32 godog: [switchover #10] running puppet on ms-fe and reload swift
  • 14:25 mobrovac: [switchover #7] restbase now uses MW from codfw
  • 14:25 bblack: [switchover #9] varnish - puppet runs complete - done
  • 14:20 bblack: [switchover #9] varnish - change merged, puppet runs starting
  • 14:19 _joe_: [switchover #7] memcached redises are now masters in codfw, running puppet on eqiad to start replicating
  • 14:19 akosiaris: [switchover #8] restarting parsoid on all wtp nodes
  • 14:19 subbu: manually restarted parsoid on wtp1001 and confirmed html identical before/after switchover on enwiki:Hospet
  • 14:16 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: Idbfb0184d: Switch wmfMasterDatacenter to codfw (duration: 00m 30s)
  • 14:16 akosiaris: [switchover #7] puppet agent -t -v on SCA, SCB cluster
  • 14:15 mobrovac: [switchover #7] puppet agent -tv && restbase restart
  • 14:15 ori: [switchover #7] Switch wmfMasterDatacenter to codfw (https://gerrit.wikimedia.org/r/#/c/282897/)
  • 14:14 _joe_: [switchover #7] running puppet on mc* hosts in codfw
  • 14:14 paravoid: [switchover #7] setting mediawiki master datacenter to codfw in puppet
  • 14:10 _joe_: [switchover #6] wiped memcached
  • 14:10 jynus: [switchover #5] DB Masters on eqiad set as read-only, and confirmed it
  • 14:09 _joe_: [switchover #6] disabled puppet on all redis hosts as a safety measure before inverting replication after the puppet change
  • 14:03 paravoid: sites in planned read-only mode, cf. http://blog.wikimedia.org/2016/04/18/wikimedia-server-switch/
  • 14:02 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Set mediawiki-eqiad in read-only mode for datacenter switchover to codfw (duration: 00m 35s)
  • 14:01 jynus: [switchover #4] Set mediawiki-eqiad in read-only mode for datacenter switchover to codfw
  • 13:56 _joe_: [switchover #3] disabling cronjobs on terbium
  • 13:55 ori: [switchover #1]: disabling eqiad jobrunners via "salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'service jobrunner stop; service jobchron stop;'".
  • 13:47 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool one db per shard as a backup (duration: 00m 27s)
  • 11:33 volans: changing binlog_format to STATEMENT for codfw masters for shards s1-s7 T124699
  • 10:07 jynus: updated dns entries about mysql masters
  • 09:16 godog: shutdown restbase100[56]
  • 09:11 godog: stop cassandra and restbase on restbase1006
  • 09:06 godog: stop compactions on restbase1014
  • 03:13 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool pc1006 (duration: 00m 28s)
  • 03:12 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool pc2006 (duration: 00m 31s)
  • 02:54 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/api/ApiStashEdit.php: Ie9799f5ea: Segment stash edit cache stats by basis for hit/miss (duration: 00m 39s)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Apr 19 02:31:46 UTC 2016 (duration 9m 46s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 09m 47s)
  • 02:06 urandom: systemctl mask cassandra-b on restbase2004.codfw.wmnet (it should not be running)
  • 01:34 mutante: restbase2004 - starting crashed cassandra-b service
  • 01:11 mutante: restbase2004 - unit cassandra-b is failed
  • 00:42 mutante: kraz - signing puppet certs, adding salt keys
  • 00:29 mutante: kraz.codfw.wmnet - initial install, adding to site

2016-04-18

  • 23:33 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I1547834: Remove unused ['bits'] configuration (duration: 00m 27s)
  • 23:22 YuviPanda: running sync-common on mw1145 to catch up on deploys
  • 23:17 YuviPanda: rebooted mw1145 from mgmt
  • 23:14 logmsgbot: ori@tin Synchronized wmf-config: I0ec3c015f: Update parser cache configuration for tag-based hashing (duration: 01m 48s)
  • 23:08 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool pc2006 (duration: 01m 01s)
  • 23:06 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/objectcache/SqlBagOStuff.php: Ie9799f5ea: Allow tag names for SqlBagOStuff consistent hashing (duration: 04m 08s)
  • 22:52 logmsgbot: krinkle@tin Synchronized docroot/noc/conf: noc: remove bits references (duration: 00m 32s)
  • 22:45 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool pc2005 (duration: 00m 33s)
  • 22:44 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool pc1005 (duration: 00m 26s)
  • 22:20 chasemp: resetting labcontrol1001 puppet master with auth.conf which is fixing all the puppet clients in Labs
  • 21:33 gehel: deploying latest wdqs
  • 21:30 papaul: conf200[1-3] - signing puppet certs, salt-key, initial run
  • 20:58 papaul: OS install on conf200[1-3]
  • 20:55 mutante: furud.codfw - initial puppet install, signing certs
  • 18:43 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool pc1005 (duration: 00m 28s)
  • 18:42 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool pc2005 (duration: 00m 26s)
  • 18:37 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool pc2004 (duration: 00m 25s)
  • 18:27 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool pc1004 (duration: 00m 36s)
  • 17:31 elukey: restarted kafkatee on oxygen after module upgrade (was stuck from this morning due to a kafka broker restart)
  • 14:21 ottomata: upgraded pykafka on hafnium and restarted statsv
  • 14:04 andrewbogott: puppet agent -tv on labvirt1002 which restarted nrpe and resolved icinga alerts
  • 14:02 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool pc1004 (duration: 00m 29s)
  • 14:01 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool pc2004 (duration: 00m 27s)
  • 11:30 mobrovac: restbase restarting eqiad after https://gerrit.wikimedia.org/r/#/c/283958/
  • 10:07 _joe_: traffic from eqiad caches switched to codfw for restbase,citoid,cxserver
  • 09:43 moritzm: installing openssh updates on jessie systems
  • 09:20 _joe_: upgrading HHVM on terbium
  • 09:19 godog: depool restbase100[56].eqiad.wmnet, about to get decommissioned
  • 09:17 godog: repool restbase2006 after raid expansion
  • 09:08 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Update master comments in preparation for dc failover (duration: 00m 26s)
  • 09:07 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Update master comments in preparation for dc failover (duration: 01m 41s)
  • 07:02 jynus: changing binlog_format to MIXED on db2018

2016-04-17

  • 21:51 urandom: Decommissioning restbase1005.eqiad.wmnet : T95253
  • 21:50 urandom: `systemctl mask cassandra` on restbase1006.eqiad.wmnet (node is decommissioned) : T95253
  • 00:10 urandom: Decommissioning restbase1006.eqiad.wmnet : T95253

2016-04-16

  • 14:25 andrewbogott: rebooting mw1132 — OOM
  • 14:24 andrewbogott: rebooting mw1134 — OOM
  • 10:34 jynus: recreating _counters table on s3 to solve replication issues
  • 10:08 jynus: recreating hitcounter table on s3 to solve replication issues
  • 09:31 volans: Skipped on dbstore1002 query from replica: 'DELETE FROM `bawiktionary`.`hitcounter`'
  • 09:31 jynus: importing sys schema to db1075
  • 06:44 akosiaris: restart fermium to apply extra vcpus assignment
  • 06:23 volans: Stopped and restarted replica on dbstore2002 for s3 to "unstuck" the replica
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Apr 16 02:31:33 UTC 2016 (duration 9m 11s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 09m 03s)
  • 02:16 urandom: Bootstrapping restbase1009-a.eqiad.wmnet : T95253

2016-04-15

  • 22:28 mutante: rebooting bast4001 - debugging install issue
  • 21:04 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool new db1075,1077,1078 after TLS upgrade on s3 - T111654 (duration: 00m 36s)
  • 16:08 logmsgbot: demon@tin Synchronized wmf-config/CommonSettings.php: more liberal initialisesettings invalidation (duration: 00m 42s)
  • 15:34 godog: remove cassandra metrics for restbase100[1234]* restbase100[789] restbase2004 - T132771
  • 14:40 gehel: reindexing all wikis after the switch to codfw (T132762)
  • 14:06 urandom: start decommission of restbase1009-a.eqiad.wmnet : T95253
  • 12:15 gehel: increase max request size for elasticsearch
  • 12:14 volans: Re-arrange s3 replica topology: making codfw replicate from db1075 (this time for real) - T111654
  • 12:08 bblack: experimenting on carbon HTTP config (apt/mirrors/ubuntu.wm.o) - watch out for installer / package-update issues!
  • 12:00 gehel: remove maintenance from wdqs1002
  • 10:51 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool new db1075,1077,1078 to upgrade TLS on s3 - T111654 (duration: 00m 41s)
  • 10:37 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool db1042 after TLS upgrade on s4 - T111654 (duration: 00m 30s)
  • 10:21 _joe_: restarted zuul, zuul-merger on gallium
  • 10:10 Tim: on helium: scheduled restore of home_pmtpa to bast4001
  • 09:23 volans: Re-arrange s3 replica topology: making codfw replicate from db1075 - T111654
  • 09:01 gehel: reenabling wdqs1002 in varnish rotation after reinstall (T132387)
  • 08:09 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool db1042 to upgrade TLS on s4 - T111654 (duration: 00m 36s)
  • 08:05 volans: starting TLS upgrade for shard s4 T111654
  • 08:00 gehel: removing empty log archives from Fluorine (T132324)
  • 07:40 moritzm: rebooting oresrdb* to Linux 4.4
  • 07:17 moritzm: rebooting oxygen for kernel update to 4.4
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Apr 15 02:30:21 UTC 2016 (duration 8m 55s)
  • 02:21 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 08m 28s)

2016-04-14

  • 23:13 logmsgbot: demon@tin Synchronized portals: (no message) (duration: 00m 29s)
  • 23:12 logmsgbot: demon@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 29s)
  • 23:11 logmsgbot: demon@tin Synchronized php-1.27.0-wmf.21/includes/resourceloader/ResourceLoaderSpecialCharacterDataModule.php: I3e26d08a (duration: 00m 30s)
  • 23:08 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Add images.unsplash.com to $wgCopyUploadsDomains (duration: 00m 33s)
  • 23:06 logmsgbot: demon@tin Synchronized portals/: updating to master, removing top-links A/B test (duration: 00m 45s)
  • 21:57 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.21/extensions/ContentTranslation: Enable europeana2802016 campaign (T125626) (duration: 00m 34s)
  • 21:44 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: Syncing labs config change (duration: 00m 27s)
  • 21:43 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: Syncing labs config change (duration: 00m 34s)
  • 20:31 ostriches: gerrit restarting
  • 19:35 godog: force puppet run on cache_upload in eqiad
  • 19:26 godog: force puppet run for cache_upload in esams
  • 19:11 godog: depool upload/eqiad
  • 19:01 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.21
  • 18:58 godog: force puppet run on cache_upload in codfw
  • 18:51 godog: forcing puppet run on cache_upload in eqiad and codfw
  • 18:36 logmsgbot: demon@tin Finished scap: sync some symlink removals (duration: 04m 54s)
  • 18:31 logmsgbot: demon@tin Started scap: sync some symlink removals
  • 18:23 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: ldap logging tune (duration: 00m 34s)
  • 18:05 cwd: updated crm from 5877ee71abc21140e60934aef627003cfb2d11cc to 2f40e829195dec1cec71d08dc9c656eb247631ae
  • 17:14 godog: set zone access for all - private wikis in codfw
  • 16:49 godog: run on tin: mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php commonswiki --backend=local-multiwrite
  • 16:22 godog: rollback varnish backends to eqiad for thumbs
  • 16:13 godog: route upload backends to codfw - T129089
  • 16:08 logmsgbot: filippo@tin Synchronized wmf-config/filebackend-production.php: swift codfw sync replication T129089 (duration: 00m 26s)
  • 16:00 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable GuidedTour extension on sq.wikipedia (T132412) (duration: 00m 26s)
  • 15:54 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Babel configuration for uz.wikipedia (part 1/2) (T131924) (duration: 00m 26s)
  • 15:51 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Kartographer on pl.wikimedia (T132510) (duration: 00m 26s)
  • 15:48 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set wgKartographerWikivoyageMode (duration: 00m 26s)
  • 15:42 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: HD logo for lad.wikipedia (T132120, 3/3) (duration: 00m 26s)
  • 15:42 logmsgbot: dereckson@tin Synchronized static/images/project-logos/ladwiki-2x.png: HD logo for lad.wikipedia (T132120, 2/3) (duration: 00m 25s)
  • 15:41 logmsgbot: dereckson@tin Synchronized static/images/project-logos/ladwiki-1.5x.png: HD logo for lad.wikipedia (T132120, 1/3) (duration: 00m 25s)
  • 15:38 logmsgbot: dereckson@tin Synchronized static/images/project-logos/vecwikisource.png: New logo for vec.wikisource (T132157) (duration: 00m 26s)
  • 15:37 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add task identifier comment to vec.wikisource logo change (no-op, T132157) (duration: 00m 27s)
  • 15:33 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add museumcommons.wikimedia.nl to wgCopyUploadsDomains (T131841) (duration: 00m 26s)
  • 15:29 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add bio.acousti.ca to wgCopyUploadsDomains (T132140) (duration: 00m 27s)
  • 15:27 Dereckson: Synchronized wmf-config/InitialiseSettings.php: Change voteWiki to fa language temporarily (T132667) (duration: 00m 34s)
  • 15:14 moritzm: uploaded librsvg 2.40.5-1+deb8u1+wmf1 to carbon (T132584)
  • 14:24 ema: rebooting cp3022 to test workaround for T131961
  • 14:20 paravoid: cr1-esams: enabling IPv4/IPv6 BGP sessions with TeliaSonera
  • 12:41 Dereckson: Ran initSiteStats.php for 10 wikis to fix negative/off-by-one statistics errors (T131306)
  • 11:53 elukey: deployed new puppet-compiler version - 0.1.3 (adding submodules support)
  • 09:00 moritzm: fixed stray salt minion processes on labsdb1003 (apparently caused by stale pidfile)
  • 03:43 Krinkle: mwscript deleteEqualMessages.php --wiki kawikiquote
  • 03:02 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Apr 14 03:02:03 UTC 2016 (duration 9m 25s)
  • 02:52 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 12m 26s)
  • 02:31 bd808: backfilled missing SAL entries from 2016-04-13T10:56Z to 2016-04-13T20:20Z to https://tools.wmflabs.org/sal/production
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 11m 13s)
  • 01:29 mutante: mw1211 - restart hhvm
  • 01:26 mutante: rebooting unresponsive serpens.wm.org
  • 01:06 mutante: bast4001 back up unchanged - tftp wouldn't work across DC, probably network ACLs
  • 00:43 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/libs/IEUrlExtension.php: Ie9799f5ea: Revert Hack IEUrlExtension::haveUndecodedRequestUri() to always return true (duration: 00m 30s)
  • 00:42 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/libs/IEUrlExtension.php: Ie9799f5ea: Revert Hack IEUrlExtension::haveUndecodedRequestUri() to always return true (duration: 00m 32s)
  • 00:38 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I08cfeca7c: Force ['SERVER_SOFTWARE'] to be "Apache" (duration: 00m 26s)
  • 00:28 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.20/extensions/TextExtracts/: https://gerrit.wikimedia.org/r/#/c/282640/ (duration: 00m 26s)
  • 00:27 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.21/extensions/TextExtracts/: https://gerrit.wikimedia.org/r/#/c/282640/ (duration: 00m 26s)
  • 00:23 logmsgbot: dereckson@tin Synchronized wmf-config/abusefilter.php: Enable $wgAbuseFilterProfile for commonswiki (Gerrit:282806, Task T132200) (duration: 00m 28s)
  • 00:13 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.21/extensions/AbuseFilter/AbuseFilter.class.php: Fixes to filter profiling (Gerrit:283333, 3/3) (duration: 00m 26s)
  • 00:12 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.21/extensions/AbuseFilter/Views/AbuseFilterViewList.php: Fixes to filter profiling (Gerrit:283333, 1/3) (duration: 00m 26s)
  • 00:11 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.21/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php: Fixes to filter profiling (Gerrit:283333, 1/3) (duration: 00m 26s)
  • 00:07 mutante: rebooting bast4001 to PXE, no active users, no screens
  • 00:02 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Import sources on hi.wiktionary (Task T132417, Gerrit:282843) (duration: 00m 26s)

2016-04-13

  • 23:56 mutante: ssh alsafi
  • 23:46 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Echo survey on French-language wikis (Task T131893, Gerrit:283330) (duration: 00m 26s)
  • 23:38 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.20/extensions/Echo/modules/ooui/styles/mw.echo.ui.FooterNoticeWidget.less: Fix for Echo footer (Gerrit:282715) (duration: 00m 27s)
  • 23:35 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.20/extensions/Echo/modules/ooui/mw.echo.ui.FooterNoticeWidget.js: Fixes for Echo (Gerrit:282714 + Gerrit:282715) (duration: 00m 26s)
  • 23:25 Dereckson: Ran mwscript updateArticleCount.php --wiki=plwikisource --update
  • 23:23 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Consider all pages as valid content articles on pl.wikisource (Task T131771, Gerrit:283349) (duration: 00m 35s)
  • 23:11 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add signature edit button for the Comments namespace to ru.wikinews (Task T132241, Gerrit:283347) (duration: 00m 38s)
  • 22:23 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.21/./extensions/Wikidata/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.statementview.RankSelector.js: touch (duration: 00m 26s)
  • 22:11 logmsgbot: hoo@tin Finished scap: Update Wikibase to master (wmf21) (duration: 32m 06s)
  • 21:39 logmsgbot: hoo@tin Started scap: Update Wikibase to master (wmf21)
  • 21:09 bd808: https://tools.wmflabs.org/sal missing entries since 2016-04-13T09:21. Needs to be backfilled
  • 20:20 bblack: re-pooling ulsfo traffic T128424
  • 19:56 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
  • 19:24 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/libs/IEUrlExtension.php: Live-hack IEUrlExtension::haveUndecodedRequestUri() to always return true (duration: 00m 33s)
  • 19:23 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/libs/IEUrlExtension.php: Live-hack IEUrlExtension::haveUndecodedRequestUri() to always return true (duration: 00m 33s)
  • 18:48 gehel: activating cache headers for WDQS
  • 18:46 bblack: lvs4002 - enable->run puppet post-reboot
  • 18:45 bblack: lvs4001 - enable->run puppet post-reboot
  • 18:37 gehel: activated maintenance page for wdqs1002 (data load in progress)
  • 18:16 bblack: shutdown lvs400[12]
  • 18:06 mutante: bast1001 back with jessie
  • 18:03 robh: the cp systems in ulsfo will be rebooting into maint mode regularly for the next few hours. I'll be scheduling downtime for each host as I get to them, but not echoing every cp host reboot in SAL
  • 18:00 bblack: disable puppet, stop pybal on lvs400[12] (maint shutdown imminent, depooled from DNS since yesterday)
  • 17:46 robh: lvs4003 rebooted and back online, lvs4004 offlining for maint.
  • 17:41 mutante: tungsten stop and remove rsync package and config
  • 17:33 Krinkle: mwscript deleteEqualMessages.php --wiki cywiki (T45917)
  • 17:27 mutante: bast1001 - revoke puppet cert, delete salt key, reinstall with jessie
  • 17:15 mutante: rebooting bast1001 into PXE
  • 17:14 robh: lvs4003 going offline for maint (icinga has been silenced, i think ;)
  • 16:58 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings-labs.php: I5a0abcdc: Make MathML rendering default in labs (duration: 00m 39s)
  • 16:54 volans: completed TLS upgrade for s1 - T111654
  • 16:35 ottomata: rebuilding raid1 array on aqs1001 after hot swapping sdh
  • 16:32 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool db1057 after TLS upgrade on s1 - T111654 (duration: 00m 26s)
  • 16:18 andrewbogott: rebooting mw1139 — OOM
  • 16:11 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.20/extensions/CentralNotice: SWAT: Update CentralNotice gerrit:283205 (duration: 00m 30s)
  • 16:10 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.21/extensions/CentralNotice: SWAT: Update CentralNotice gerrit:283206 (duration: 00m 33s)
  • 15:56 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.20/extensions/WikidataPageBanner: SWAT: Attempt at fixing table of contents problem gerrit:282994 (duration: 00m 28s)
  • 15:21 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.21/extensions/WikidataPageBanner: SWAT: Attempt at fixing table of contents problem gerrit:282995 (duration: 00m 29s)
  • 15:10 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix typo in newikibooks namespaces gerrit:283183 (duration: 00m 30s)
  • 14:53 volans: start upgrading TLS for cross-dc replica on shards s1 - T111654
  • 14:53 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool db1057 to upgrade TLS on s1 - T111654 (duration: 00m 26s)
  • 14:48 elukey: Yarn nodemanager Xmx size bumped up from 1000m to 2048m on all the analytics* hosts to overcome the OutOfMemory errors.
  • 14:38 ottomata: restarting hadoop-yarn-nodemanager on all hadoop worker nodes one by one to apply increase in heap size
  • 14:35 godog: start cleanup on restbase100[569] - T128107
  • 14:25 volans: completed upgrading TLS for cross-dc replica on shards s2 - T111654
  • 13:52 logmsgbot: jzerebecki@tin Finished scap: php-1.27.0-wmf.21: Update Wikidata to wmf/1.27.0-wmf.21 (duration: 29m 56s)
  • 13:22 logmsgbot: jzerebecki@tin Started scap: php-1.27.0-wmf.21: Update Wikidata to wmf/1.27.0-wmf.21
  • 13:14 volans: start upgrading TLS for cross-dc replica on shards s2 - T111654
  • 11:46 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Reduce db1050 weight - T111654 (duration: 00m 30s)
  • 11:22 volans: completed upgrading TLS for cross-dc replica on shards s6 and s7 - T111654
  • 10:56 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool db1050 and db1041 after TLS upgrade - T111654 (duration: 00m 42s)
  • 09:22 volans: start upgrading TLS for cross-dc replica on shards s6 and s7 - T111654
  • 07:26 moritzm: temporarily bumped connection tracking table size on mw1163 to 512k (randomly spiking)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 11m 56s)
  • 00:38 Krinkle: mwscript deleteEqualMessages.php --wiki newiki
  • 00:05 Krinkle: mwscript deleteEqualMessages.php --wiki maiwiki

2016-04-12

  • 23:59 Krinkle: mwscript deleteEqualMessages.php --wiki fawiktionary
  • 23:50 logmsgbot: mattflaschen@tin Synchronized php-1.27.0-wmf.21/resources/lib/oojs-ui/: OOjs UI hotfix for search box styling (duration: 00m 27s)
  • 23:32 Krinkle: mwscript deleteEqualMessages.php --wiki elwiki (T45917)
  • 23:29 mutante: mw1080 depooled because read-only fs
  • 23:26 mutante: depooled mw1080
  • 23:18 matt_flaschen: Syncing failing only on mw1080
  • 23:16 logmsgbot: mattflaschen@tin Synchronized dblists: Make flow dblist explicit, rather than computed, retry (duration: 00m 29s)
  • 23:12 logmsgbot: mattflaschen@tin Synchronized dblists: Make flow dblist explicit, rather than computed (duration: 00m 31s)
  • 22:36 mutante: powercycled mw1163
  • 22:17 mobrovac: mathoid deploying ca7680521
  • 20:38 Krinkle: mwscript deleteEqualMessages.php --wiki glwikiquote
  • 20:38 Krinkle: mwscript deleteEqualMessages.php --wiki glwiki
  • 20:38 Krinkle: mwscript deleteEqualMessages.php --wiki glwikibooks
  • 20:38 Krinkle: mwscript deleteEqualMessages.php --wiki glwikisource
  • 19:33 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/db/loadbalancer/LBFactory.php: I7bc3b3aa: Revert "Measure commitMasterChanges() run time" (duration: 00m 27s)
  • 19:32 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/db/loadbalancer/LBFactory.php: I7bc3b3aa: Revert "Measure commitMasterChanges() run time" (duration: 00m 27s)
  • 19:32 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.21/includes/filerepo/file/LocalFile.php: I6457cb91: Don't report image cache hits / misses (duration: 00m 31s)
  • 19:31 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/filerepo/file/LocalFile.php: I6457cb91: Don't report image cache hits / misses (duration: 00m 39s)
  • 19:27 awight: update crm from a20ac4c64e195732a27d0e9cfd33f0c23f4a8d4e to 5877ee71abc21140e60934aef627003cfb2d11cc
  • 19:27 mutante: create ganeti VM kraz.codfw.wmnet
  • 19:01 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.21
  • 18:37 logmsgbot: demon@tin Finished scap: testwiki to wmf.21 and rebuild l10n (duration: 15m 09s)
  • 18:22 logmsgbot: demon@tin Started scap: testwiki to wmf.21 and rebuild l10n
  • 18:22 logmsgbot: demon@tin scap aborted: testwiki to wmf.21 and rebuild l10n (duration: 19m 23s)
  • 18:19 mutante: tin: rm php-1.27.0-wmf.12/13 from mw-staging
  • 18:03 logmsgbot: demon@tin Started scap: testwiki to wmf.21 and rebuild l10n
  • 17:57 mutante: ssh alsafi fixes ganeti VM timeouts once again
  • 17:02 gehel: reinstall of wdqs1002
  • 16:29 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-labs.php: No-op sync of beta cluster config change for T132408 (duration: 00m 26s)
  • 16:18 _joe_: reenabling puppet everywhere
  • 15:55 _joe_: disabling puppet on all mw hosts for apache change
  • 15:55 mutante: removed all email.donate. from DNS
  • 15:47 bblack: Draining traffic from ulsfo via GeoDNS updates for T128424 maintenance
  • 15:45 gehel: disabling wdqs1002 on the varnish cache::misc cluster via puppet
  • 15:10 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Make use of the local, not master, parsoid cluster gerrit:282894 (duration: 00m 28s)
  • 15:04 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor Single Edit Tab on the English Wikipedia gerrit:274131 (duration: 00m 55s)
  • 12:08 moritzm: uploaded ircd-ratbox 2.2.9-3 for jessie-wikimedia to carbon
  • 11:37 hashar: Restarting Jenkins
  • 11:34 hashar: Jenkins upgrading "Script Security Plugin" from 1.17 to 1.18.1 https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2016-04-11
  • 11:19 tgr: running CentralAuth/checkLocalUsers.php --delete on all wikis for T119736
  • 11:10 godog: start raid expansion on restbase2006 T127951
  • 11:03 godog: repool restbase2005 / depool restbase2006
  • 10:16 ema: restarted varnishstatsd-default.service on cp4017 (T132430)
  • 09:55 hoo: Updated Wikidata's property suggester with data from Monday's json dump
  • 09:21 ema: varnishstatsd-default restarted on cp4010 and cp4018
  • 08:35 moritzm: installing ntp bugfix updates (will trigger some ntpd Icinga warnings until the clocks have resynced)
  • 08:01 elukey: restarted hhvm on mw1111 - multiple SEGV signals, hhvm-dump-debug in /tmp/hhvm.8424.bt.
  • 08:00 volans: Set rpl_semi_sync_master_timeout=100 on db1038 T131753 (filling up error log)
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Apr 12 02:33:25 UTC 2016 (duration 8m 34s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 11m 20s)
  • 00:37 ori: Deleted refreshLinksDynamic jobs for commonswiki and enwiki (T132318)
  • 00:34 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/page/WikiPage.php: Ie9799f5ea: Flag triggerOpportunisticLinksUpdate() behind $wgMiserMode (duration: 00m 31s)
  • 00:11 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/collation/IcuCollation.php: Ie9799f5ea: Cache first-letter data in APC, if available (duration: 00m 25s)
  • 00:07 logmsgbot: dereckson@tin Synchronized php-1.27.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTargetLoader.js: Check wpSection before converting textbox contents for use in VE (Gerrit:282827) (duration: 00m 25s)

2016-04-11

  • 23:34 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Fix wgCopyUploadsDomains on Commons Beta (Task T132285, Gerrit:282495) — 2/2 (duration: 00m 25s)
  • 23:33 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings-labs.php: Fix wgCopyUploadsDomains on Commons Beta (Task T132285, Gerrit:282495) — 1/2 (duration: 00m 26s)
  • 23:30 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add mergehistory right to eliminator group on ja.wikipedia (Task T131751, Gerrit:282055) (duration: 00m 29s)
  • 23:25 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set wgSemiprotectedRestrictionLevels for fr.wikipedia (Task T132248, Gerrit:282469) (duration: 00m 29s)
  • 23:20 Krinkle: mwscript deleteEqualMessages.php --wiki idwiki (T45917)
  • 23:19 logmsgbot: dereckson@tin Synchronized wmf-config/CommonSettings.php: Remove $wgApiFrameOptions = SAMEORIGIN override for UploadWizard wikis (Task T131182, Gerrit:282768) (duration: 00m 31s)
  • 23:17 logmsgbot: dereckson@tin Synchronized static/images/project-logos/vecwiki.png: New logo for vec.wikipedia (and end of celebration logo) (Task T132185, Gerrit:282396) (duration: 00m 24s)
  • 23:12 Krinkle: mwscript deleteEqualMessages.php --wiki zh_classicalwiki (T45917)
  • 23:09 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Remove $wgApiFrameOptions = SAMEORIGIN override for UploadWizard wikis (Task T131182, Gerrit:282768) (duration: 00m 27s)
  • 22:11 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool db1049 after deploy of Puppet certs for TLS - T111654 (duration: 00m 30s)
  • 22:07 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/includes/page/WikiPage.php: Ie9799f5ea: Increase triggerOpportunisticLinksUpdate() backoff TTL (duration: 00m 34s)
  • 22:03 mutante: creating virtual machine furud.codfw.wmnet
  • 21:29 Krinkle: mwscript deleteEqualMessages.php --wiki bgwikiquote (T45917)
  • 21:29 Krinkle: mwscript deleteEqualMessages.php --wiki frwikibooks (T45917)
  • 21:22 ebernhardson: restore elasticsearch cluster high disk watermark to 90%
  • 21:07 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: revert oath in labs (duration: 00m 27s)
  • 21:06 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: reverting oath (duration: 00m 29s)
  • 21:01 bearND: mobileapps deployed 6ef3054
  • 20:55 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: retry oath in labs (duration: 00m 29s)
  • 20:53 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: retry oath in labs (duration: 00m 32s)
  • 20:43 bearND: starting mobileapps deploy
  • 20:41 Krinkle: mwscript deleteEqualMessages.php --wiki plwiki (T45917)
  • 20:19 arlolra: updated Parsoid to version e3766b79
  • 20:09 arlolra: synced code; restarted Parsoid on wtp1001.eqiad as a canary
  • 20:05 ebernhardson: temporarily changing elasticsearch high watermark to 75% to rebalance cluster
  • 20:03 volans: Deploy and use Puppet certs for TLS on cross-dc replica for shard s5 T111654
  • 20:00 mutante: antimony - shred and delete git.wikimedia.org SSL key
  • 19:57 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool db1049 to deploy Puppet certs for TLS - T111654 (duration: 00m 28s)
  • 19:55 mutante: antimony - remove Apache package and config
  • 19:53 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool db1049 to deploy Puppet certs for TLS - T111654 (duration: 02m 25s)
  • 19:52 mutante: powercycled mw1115
  • 19:41 mutante: antimony (gitblit) - stop Apache
  • 19:15 ejegg: extended globalcollect audit timeout from 90 min to 120 min for duration of NL campaign
  • 19:01 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.20/extensions/WikimediaEvents: I672624e9fc30: Collect impact of proposed ResourceLoader feature-test in statsd (duration: 00m 34s)
  • 18:08 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.20/includes/RevisionList.php: Undo oversight live hack (duration: 00m 25s)
  • 17:49 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.20/includes/RevisionList.php: Attempt to fix oversight timeouts (duration: 00m 28s)
  • 17:44 gehel: rebooting wdqs1002 for kernel upgrade
  • 17:36 gehel: rebooting wdqs1001
  • 17:28 gehel: deploying latest WDQS version
  • 16:16 ejegg: updated payments from f71d3fbc7b8331b0427748b3fd358b5e2fc626fa to f09297028acace67588c2de845b754e2ace75c97
  • 15:59 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Match JsonConfig change gerrit:282447 (duration: 00m 25s)
  • 15:54 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable Echo survey on French-language wikis" (duration: 00m 26s)
  • 15:40 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.20/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android production app: 100% gerrit:282434 (duration: 00m 27s)
  • 15:29 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Echo survey on French-language wikis gerrit:282414 (duration: 00m 25s)
  • 15:27 elukey: restarted hhvm on mw1190 (hhvm-dump-debug saved in /tmp/hhvm.19289.bt)
  • 15:24 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [Cleanup] Remove VisualEditor AutoAccountEnable config now unused gerrit:280870 (duration: 00m 25s)
  • 15:20 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [Cleanup] Remove VisualEditor experimental config 2 of 2 gerrit:280869 (duration: 00m 28s)
  • 15:20 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [Cleanup] Remove VisualEditor experimental config 1 of 2 gerrit:280869 (duration: 00m 28s)
  • 15:13 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor on the Project ("Wikipedya") of htwiki gerrit:281263 (duration: 00m 32s)
  • 15:10 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ladwiki gerrit:282201 (duration: 00m 29s)
  • 15:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new namespaces and new aliases for newikibooks gerrit:281443 (duration: 00m 45s)
  • 14:09 elukey: mw1140 powercycled (ssh not usable, can't access as root via console either)
  • 11:14 elukey: mw1148 powercycled, not responsive to ssh
  • 10:27 elukey: powercycled mw1144.eqiad.wmnet
  • 08:44 volans: Re-enabling Puppet on cluster mysql and parsercache to deploy change 282385, T111654
  • 08:30 godog: bootstrap restbase1014-b T128107
  • 08:28 godog: start raid expansion on restbase2005 T127951
  • 08:19 godog: repool restbase2002, depool restbase2005
  • 08:19 volans: Disabling Puppet on cluster mysql and parsercache to merge and test change 282385 on db2040, T111654
  • 07:05 moritzm: restarted hhvm on mw1143, mw1201 and mw1213
  • 04:42 Krinkle: mwscript deleteEqualMessages.php --wiki shwiki (T45917)
  • 04:34 MaxSem: ran `mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=enwiki 'Corvin Victor Paul'` due to CA bug
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Apr 11 02:33:52 UTC 2016 (duration 8m 32s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 11m 22s)

2016-04-10

  • 22:56 Krinkle: mwscript deleteEqualMessages.php --wiki srwiki (T45917)
  • 22:16 Krinkle: mwscript deleteEqualMessages.php --wiki yowiki (T45917)
  • 21:55 Krinkle: mwscript deleteEqualMessages.php --wiki trwikimedia (T45917)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Apr 10 02:31:13 UTC 2016 (duration 8m 33s)
  • 02:26 Krinkle: mwscript deleteEqualMessages.php --wiki zh_yuewiki
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 10m 00s)

2016-04-09

  • 20:44 logmsgbot: krenair@tin Synchronized wmf-config/LabsServices.php: sync labs-only merge of gehel's that was just left unmerged on tin (duration: 00m 41s)
  • 11:12 volans: Disabling tendril on db2047 (needs to be reimaged) to avoid flooding logs of tendril DB - T132011
  • 11:05 volans: Disabling tendril on es2005-es2010 (out of prod hosts) to avoid flooding logs of tendril DB - T129452
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Apr 9 02:31:58 UTC 2016 (duration 8m 40s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 10m 11s)

2016-04-08

  • 14:03 godog: correction of the above, restbase1014-a
  • 14:02 godog: start cassandra bootstrap of restbase2004-a
  • 10:15 moritzm: rebooting radium (tor node) for kernel upgrade to 4.4
  • 09:53 elukey: dumps.wikimedia.org is now accepting only https:// (redirecting http:// to https://)
  • 09:49 moritzm: rebooting krypton for kernel upgrade
  • 07:50 moritzm: enable base::firewall on netmon1001
  • 04:01 kart_: Update cxserver to bd4739b
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Apr 8 02:33:06 UTC 2016 (duration 8m 29s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 10m 04s)
  • 00:04 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set wgSemiprotectedRestrictionLevels for en.wikipedia (Task T131976) (duration: 00m 26s)

2016-04-07

  • 23:56 logmsgbot: dereckson@tin Synchronized static/images/project-logos/ladwiki.png: Logo update for lad.wikipedia (Task T118491) (duration: 00m 27s)
  • 23:46 Dereckson: Synchronized wmf-config/InitialiseSettings.php: Raise upload limit to 4 GB (Gerrit:280831) — erratum for 23:13
  • 23:42 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Raise upload-by-URL request timeout (task T118887) (duration: 00m 32s)
  • 23:26 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Disable Echo survey on all wikis except test (gerrit:282230) (duration: 00m 25s)
  • 23:13 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Raise upload-by-URL request timeout (T118887) (duration: 00m 41s)
  • 22:05 Pchelolo: finished update RESTBase to 7f69f86ee9
  • 22:00 Pchelolo: start update RESTBase to 7f69f86ee9
  • 21:32 logmsgbot: mattflaschen@tin Synchronized wmf-config/LabsServices.php: Beta Cluster change (duration: 00m 30s)
  • 21:30 logmsgbot: krinkle@tin Synchronized w/: I69653efe0f1968: rm old symlinks (duration: 00m 46s)
  • 19:55 bblack: stop->disable varnishrls service on non-text clusters (upload, maps, misc) - ( https://gerrit.wikimedia.org/r/#/c/281439/ )
  • 19:34 ottomata: restarting eventlogging so it runs out of the scap deploy in eventlogging/analytics
  • 19:15 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.20
  • 19:00 ejegg: updated paymentswiki fraud filter settings
  • 16:48 godog: start raid expansion on restbase2002 T127951
  • 16:37 godog: repool restbase2003, raid expansion finished, depool restbase2002
  • 15:30 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Use local resources in codfw for parsoid, url-downloader and mathoid gerrit:279355 (duration: 00m 25s)
  • 15:22 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use ProductionServices for the jobqueue configuration 3/3 gerrit:279350 (duration: 00m 31s)
  • 15:22 logmsgbot: thcipriani@tin Synchronized wmf-config/jobqueue.php: SWAT: Use ProductionServices for the jobqueue configuration 2/3 gerrit:279350 (duration: 00m 27s)
  • 15:21 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Use ProductionServices for the jobqueue configuration 1/3 gerrit:279350 (duration: 00m 30s)
  • 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor Beta Feature on Wikisources, Wiktionaries gerrit:280828 (duration: 00m 30s)
  • 14:47 gehel: syncing config to activate switch of CirrusSearch to codfw
  • 14:35 gehel: switching CirrusSearch to use Elasticsearch cluster in codfw (again), testing on mw1017 first
  • 13:58 chasemp: labstore restart nfs-export that crashed
  • 13:53 logmsgbot: gehel@tin Synchronized wmf-config: switching CirrusSearch to use Elasticsearch cluster in codfw (duration: 00m 35s)
  • 13:48 gehel: switching CirrusSearch to use Elasticsearch cluster in codfw
  • 13:28 _joe_: restarted hhvm on mw1015
  • 13:22 logmsgbot: gehel@tin Synchronized wmf-config: Fix TTMServer elastic config (duration: 00m 32s)
  • 12:50 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch on Wikidata after Property conversions (duration: 00m 26s)
  • 12:49 _joe_: restarting hhvm on mw1213, deadlock in HPHP::Treadmill::getAgeOldestRequest
  • 12:46 logmsgbot: gehel@tin Synchronized wmf-config: Point the codfw label back to the codfw cluster (duration: 00m 27s)
  • 12:37 logmsgbot: gehel@tin Synchronized wmf-config: Don't use HTTPS+pooling for labswiki (duration: 00m 27s)
  • 12:22 logmsgbot: gehel@tin Synchronized wmf-config: (no message) (duration: 00m 35s)
  • 12:08 gehel: enabling HTTPS + connection pooling for CirrusSearch on all mediawiki nodes (T131839)
  • 12:06 godog: remove restbase2004-b cassandra data directory
  • 11:57 godog: bump raid rebuild to 20MB/s on restbase2003
  • 10:03 _joe_: restarted hhvm on mw1114 (one of the child processes was defunct) and on mw1173 (deadlock in HPHP::Treadmill::getAgeOldestRequest)
  • 09:35 gehel: deploying HHVM HTTP pool sizing on all MW nodes (T131839 / https://gerrit.wikimedia.org/r/#/c/281881), not used yet, no impact expected
  • 09:28 elukey: de-pooling/re-pooling aqs100[23].eqiad.wmnet for nodejs upgrade
  • 09:19 elukey: re-added aqs1001.eqiad.wmnet to LVS pool via confctl
  • 08:59 elukey: removed aqs1001.eqiad.wmnet from LVS pool via confd for nodejs upgrade
  • 08:22 godog: stop cassandra bootstrap of restbase2004-b, not enough disk
  • 08:17 akosiaris: renamed duplicate cxserver-deploy and cxserver repos in phabricator to citoid-deploy and citoid respectively
  • 07:04 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Depool crashed db2047, needs to be reimaged T132011 (duration: 00m 38s)
  • 03:08 bd808: backfilled missing SAL entries from 2016-04-04T17:30Z to 2016-04-05T20:36Z to https://tools.wmflabs.org/sal/production
  • 03:08 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Apr 7 03:08:28 UTC 2016 (duration 9m 52s)
  • 02:58 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 10m 00s)
  • 02:38 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 17m 51s)
  • 02:09 chasemp: reboot db2047 (it's totally stuck w/ BUG: soft lockup - CPU#26 stuck for 22s! [migration/26:267] across)

2016-04-06

  • 23:44 logmsgbot: mattflaschen@tin Synchronized php-1.27.0-wmf.19/extensions/Flow/includes/Data/Listener/NotificationListener.php: Fix new topic notifications (duration: 00m 29s)
  • 23:40 logmsgbot: mattflaschen@tin Synchronized php-1.27.0-wmf.20/extensions/Flow/includes/Data/Listener/NotificationListener.php: Fix new topic notifications (duration: 00m 37s)
  • 23:11 logmsgbot: mattflaschen@tin Synchronized docroot/wikipedia.org/apple-app-site-association: Support handoff and credential sharing with the iOS app (duration: 00m 34s)
  • 22:49 bd808: Testing stashbot response in #wikimedia-fundraising
  • 22:38 awight: update payments from 6dbd26ce56f416af07655f9c500023096678450b to f71d3fbc7b8331b0427748b3fd358b5e2fc626fa
  • 22:36 awight: drupal update wmf_fredge_qc:7003
  • 22:34 awight: update fundraising-crm from 4cc17b635eb84204cced107d5de78533cc5ce06c to a20ac4c64e195732a27d0e9cfd33f0c23f4a8d4e
  • 21:57 awight: update fundraising-tools from d8c5cc0b399411efa3b0634fa891a236c3dbaab2 to b114b7174c3bd9bf53cd44bf55397049a03b96fb
  • 20:17 subbu: finished deploying parsoid sha 5f6c0c60
  • 20:10 subbu: synced code; restarted parsoid on wtp1001 as a canary
  • 20:06 subbu: starting parsoid deploy
  • 19:51 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: Promote labswiki to 1.27.0-wmf.20 following temporary rollback and fix
  • 19:49 logmsgbot: dduvall@tin Synchronized php-1.27.0-wmf.20/extensions/SemanticMediaWiki/includes/SMW_ParserExtensions.php: Replace usage of Title::newFromRedirect() (duration: 00m 38s)
  • 19:33 akosiaris: pool maps-test200{1,2,3} for kartotherian.svc.codfw.wmnet
  • 19:23 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.20
  • 19:22 akosiaris: bounce hhvm on mw1135, mw1145
  • 19:16 akosiaris: restart tilerator, tileratorui on maps-test200{1,2,3}
  • 19:15 akosiaris: restart kartotherian on maps-test200{1,2,3}
  • 18:58 YuviPanda: reboot notebook1001
  • 18:50 akosiaris: disable salt-minion on maps-test200{1,2,3} for maps services deployment. nodejs upgrade is in place
  • 17:00 logmsgbot: thcipriani@tin Finished scap: SWAT: Add user_wpzero AbuseFilter variable gerrit:281867 (duration: 27m 39s)
  • 16:32 logmsgbot: thcipriani@tin Started scap: SWAT: Add user_wpzero AbuseFilter variable gerrit:281867
  • 16:30 cwd: updated payments from bbf36d804220b61b8eb7e5bf7a9c427d98ae1aaa to 6dbd26ce56f416af07655f9c500023096678450b
  • 16:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove Language Overlay experiment gerrit:277837 (duration: 00m 26s)
  • 15:54 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Translate extension on uawikimedia gerrit:281403 (duration: 00m 28s)
  • 15:33 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: REVERT Enable Translate extension on uawikimedia gerrit:281403 (duration: 00m 25s)
  • 15:31 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Translate extension on uawikimedia gerrit:281403 (duration: 00m 27s)
  • 15:23 urandom: Restoring default stream throughput on restbase200{3,4-a}.codfw.wmnet
  • 15:20 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: REVERT Bump CirrusSearchRequestSet rev to 121456865906 PART II gerrit:280448 (duration: 00m 31s)
  • 15:17 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump CirrusSearchRequestSet rev to 121456865906 PART II gerrit:280448 (duration: 00m 30s)
  • 15:17 logmsgbot: thcipriani@tin Synchronized wmf-config/event-schemas: SWAT: Bump CirrusSearchRequestSet rev to 121456865906 PART I gerrit:280448 (duration: 00m 27s)
  • 15:15 ema: Upgrading cp* to jessie 8.4 point release and linux 4.4 (T131746, T131928). Not rebooting yet.
  • 15:07 logmsgbot: thcipriani@tin Synchronized docroot/mediawiki/xml: SWAT: Add Flow dumps schema gerrit:281640 (duration: 00m 28s)
  • 15:03 hashar: rebased php-1.27.0-wmf.19/MobileFrontend and php-1.27.0-wmf.20/MobileFrontend (single commit related to CI)
  • 14:34 logmsgbot: filippo@tin Synchronized wmf-config/ProductionServices.php: move mediawiki traffic back to restbase eqiad (duration: 00m 34s)
  • 14:26 elukey: hhvm restarted on mw1187
  • 14:15 bblack: rebooting baham (ns1) for 4.4 kernel + package updates
  • 14:05 godog: move restbase/citoid/cxserver varnish traffic back to eqiad
  • 13:45 ema: Upgrading cp1052 to jessie 8.4 point release and linux 4.4 (T131746, T131928)
  • 08:28 _joe_: disabling puppet on the mw servers to test hhvm changes
  • 06:07 _joe_: restarting HHVM on mw1134, deadlock in what appears to be HPHP::Treadmill::getAgeOldestRequest
  • 03:06 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Apr 6 03:06:18 UTC 2016 (duration 9m 27s)
  • 02:56 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.20) (duration: 09m 43s)
  • 02:31 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 11m 44s)
  • 01:02 logmsgbot: krenair@tin Finished scap: https://gerrit.wikimedia.org/r/#/c/281846/ - add messages for the new extendedconfirmed protection (duration: 95m 03s)
  • 00:55 MaxSem: restarted hhvm on mw1119, stuck

2016-04-05

  • 23:30 ejegg: updated payments wiki from d50c1b15f3d09c4724d7ad28e3668368ac8112be to bbf36d804220b61b8eb7e5bf7a9c427d98ae1aaa
  • 23:27 logmsgbot: krenair@tin Started scap: https://gerrit.wikimedia.org/r/#/c/281846/ - add messages for the new extendedconfirmed protection
  • 23:27 MaxSem: restarted hhvm on mw1188, stuck
  • 23:18 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/281807/ (duration: 00m 35s)
  • 23:17 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/281807/ (duration: 00m 35s)
  • 21:43 cscott: restarted parsoid on all nodes to complete deploy of version a5be1cdc
  • 21:37 cscott: restarted parsoid on wtp2001.codfw.wmnet as a (better) canary
  • 21:34 cscott: restarted parsoid on wtp1001.eqiad.wmnet as a canary
  • 21:34 cscott: updated Parsoid to version a5be1cdc
  • 21:34 bd808: https://tools.wmflabs.org/sal missing entries since 2016-04-04T16:36
  • 20:36 logmsgbot: dduvall@tin Purged l10n cache for 1.27.0-wmf.18
  • 20:32 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: Group0 to 1.27.0-wmf.20
  • 20:29 logmsgbot: dduvall@tin Finished scap: testwiki to php-1.27.0-wmf.20 and rebuild l10n cache (duration: 32m 19s)
  • 19:57 logmsgbot: dduvall@tin Started scap: testwiki to php-1.27.0-wmf.20 and rebuild l10n cache
  • 19:31 marxarelli: Applying security patches to wmf/1.27.0-wmf.20 checkout on tin
  • 19:23 marxarelli: Cloning mediawiki for checkout of new 1.27.0-wmf.20 branch on tin
  • 18:11 ejegg: updated payments wiki from cc298682f7ee9f43f5be863c6078e1c2ba70e3e6 to d50c1b15f3d09c4724d7ad28e3668368ac8112be
  • 18:09 marxarelli: Creating new wmf/1.27.0-wmf.20 branch on tin
  • 17:00 ejegg: updated payments wiki from a9659965d8b55b11518680a1170242f311c7f1d2 to cc298682f7ee9f43f5be863c6078e1c2ba70e3e6
  • 15:43 godog: ms-be1019 to weight 3500 - T116842
  • 14:40 logmsgbot: oblivian@tin Synchronized wmf-config/ProductionServices.php: make mediawiki talk to codfw restbase only (duration: 00m 47s)
  • 14:10 _joe_: external traffic for restbase, citoid, cxserver fully switched to codfw
  • 14:05 ema: repooling cp1043 (maps) after varnish upgrade to 4.1.2-1wm2. Bug: T131830
  • 13:58 ema: depooling cp1043 (maps) for varnish upgrade to 4.1.2-1wm2. Bug: T131830
  • 13:52 ema: repooling cp1044 (maps) after varnish upgrade to 4.1.2-1wm2. Bug: T131830
  • 13:49 urandom: Throttling outbound stream throughput to 15Mbps on restbase2004-a.codfw.wmnet and restbase2003.codfw.wmnet : T95253
  • 13:40 ema: depooling cp1044 (maps) for varnish upgrade to 4.1.2-1wm2. Bug: T131830
  • 13:29 cmjohnson1: re-installing snapshot1007 with trusty
  • 13:14 elukey: restarted hhvm on mw1248 (hhvm was segfaulting)
  • 10:16 _joe_: all hhvm extensions uploaded for debian jessie as well
  • 07:14 _joe_: uploading the hhvm 3.12.1 backport package for jessie to reprepro
  • 02:36 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Apr 5 02:36:21 UTC 2016 (duration 9m 19s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 11m 55s)
  • 01:29 eileen: Updating CiviCRM from bfe563f2451725e734d6ddbddb3be716463a3512 to 4cc17b635eb84204cced107d5de78533cc5ce06c
  • 00:14 Krinkle: krinkle@terbium Running deleteEqualMessages.php over previously cleaned wikis with --lang-code (T45917)
  • 00:12 ori: renamed frontend.navtiming.loading -> frontend.navtiming.loadEventStart and frontend.navtiming.sending -> frontend.navtiming.fetchStart on graphite2001 and graphite1001 ahead of merging https://gerrit.wikimedia.org/r/#/c/281082

2016-04-04

  • 23:30 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/281565/ (duration: 00m 29s)
  • 23:25 logmsgbot: maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/281070/ (duration: 00m 32s)
  • 23:18 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/280445/ (duration: 00m 27s)
  • 23:17 logmsgbot: maxsem@tin Synchronized static/images/project-logos/astwiki.png: https://gerrit.wikimedia.org/r/#/c/280445/ (duration: 00m 27s)
  • 23:13 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/281314/ (duration: 00m 28s)
  • 23:08 awight: update fundraising tool from 1bc23cb4bfaf2a9d4d215aad79dd67d891b5d973 to d8c5cc0b399411efa3b0634fa891a236c3dbaab2
  • 20:12 subbu: finished deploying parsoid sha 579ec3e6
  • 20:07 subbu: synced code; restarted parsoid on wtp1002 as a canary
  • 20:04 subbu: starting parsoid deploy
  • 19:48 ejegg: rolled payments-wiki back to a9659965d8b55b11518680a1170242f311c7f1d2
  • 19:40 apergos: disabled puppet on snapshot1001,2,4 while new hosts come online, probably until Apr 5-6
  • 19:38 akosiaris: depool maps-test2004
  • 18:59 ejegg: updated payments-wiki from a9659965d8b55b11518680a1170242f311c7f1d2 to eac5f979099056ef142500570555771dfb1992c1
  • 18:36 tgr: ran MassMessages/sendMessages.php for T128056
  • 18:18 cmjohnson1: stat1002 swapping failed disk slot 11
  • 18:04 cmjohnson1: db1052 swapping failed disk slot 8
  • 17:37 cmjohnson1: shutting down iridium to reapply thermal paste
  • 17:30 greg-g: Phabricator going down in about 10 minutes to hopefully address the overheating issue: T131742
  • 16:36 urandom: Restarting bootstrap of restbase2004.codfw.wmnet : T95253
  • 15:53 greg-g: 15:45 < elukey> !log aqs1001 re-added to the aqs pool (nodejs NOT upgraded due to issues with Cassandra)
  • 15:42 Krenair: ran wikitech-static updates
  • 15:18 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.19/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android production app: 50% gerrit:280957 (duration: 00m 46s)
  • 14:55 urandom: Restarting restbase2004-a.codfw.wmnet (cancelling bootstrap of 2004-b)
  • 14:46 elukey: de-pooled aqs1001.eqiad from the confd pool for nodejs upgrade
  • 13:40 ema: nginx rolling restart for openssl upgrade on cache hosts
  • 10:42 elukey: re-pooled aqs1001.eqiad (no node upgrade, need more info about restbase)
  • 10:23 godog: reduce reserved blocks for /srv on restbase2004
  • 09:59 moritzm: installing pcre3 updates
  • 09:57 godog: start expanding raid0 on restbase2003
  • 09:54 ema: nginx rolling restart for openssl upgrade: cp1046, cp1052, cp1068, cp1071, cp1099
  • 09:53 elukey: de-pooled aqs1001.eqiad.wmnet as pre-step for nodejs upgrade
  • 09:41 godog: depool restbase2003 before raid expansion
  • 09:37 godog: repool restbase2004
  • 09:33 godog: deploy restbase ba39d2bcd2f5 to restbase2004 before repooling
  • 09:26 volans: Re-enabling Puppet on cluster mysql and parsercache to deploy change 279596, T111654
  • 09:10 volans: Disabling Puppet on cluster mysql and parsercache to merge and test change 279596 on db2040, T111654
  • 08:50 moritzm: installing gnupg updates
  • 06:54 moritzm: installing apt bugfix updates
  • 04:18 Tim: iridium came back up, but mcelog reports high CPU temperature prior to the shutdown
  • 04:13 Tim: attempting to turn iridium back on via drac. "getraclog" says it powered itself off after resetting four times
  • 03:15 greg-g: Phabricator is down
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Apr 4 02:33:39 UTC 2016 (duration 8m 31s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 11m 45s)
  • 00:49 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 29s)
  • 00:30 logmsgbot: krinkle@tin Synchronized w/static.php: (no message) (duration: 00m 35s)
  • 00:21 Krinkle: mwscript deleteEqualMessages.php --wiki ruwiki (T45917)
  • 00:06 Krinkle: mwscript deleteEqualMessages.php --wiki metawiki --delete

2016-04-03

  • 22:37 Krinkle: mwscript deleteEqualMessages.php --wiki eswiki --lang-code ca (T45917)
  • 22:37 Krinkle: mwscript deleteEqualMessages.php --wiki eswiki (T45917)
  • 22:34 Krinkle: mwscript deleteEqualMessages.php --wiki eswiki
  • 19:16 Krenair: mw1146 began to respond with 503s to all requests, tried restarting apache2/hhvm and shortly afterwards it started working again
  • 13:26 logmsgbot: reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: labs copy upload thing. noooooop for prod (duration: 00m 33s)
  • 13:02 logmsgbot: reedy@tin Synchronized wmf-config/InitialiseSettings.php: labs copy upload thing. noooooop for prod (duration: 00m 28s)
  • 11:37 logmsgbot: ori@tin Synchronized wmf-config/extension-list-labs: I0d081186: Also add Newsletter to extension-list-labs (duration: 00m 27s)
  • 11:34 logmsgbot: ori@tin Synchronized wmf-config/InitialiseSettings-labs.php: I3ffe65b8: Load the Newsletter extension on the beta cluster (2/2) (duration: 00m 33s)
  • 11:34 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings-labs.php: I3ffe65b8: Load the Newsletter extension on the beta cluster (1/2) (duration: 00m 34s)
  • 03:14 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 56m 43s)

2016-04-02

  • 20:28 hashar: Restarted Jenkins on gallium
  • 19:16 logmsgbot: dereckson@tin Synchronized wmf-config/InitialiseSettings.php: 350K celebration logo for cs.wikipedia (T131605) (duration: 00m 29s)
  • 19:14 logmsgbot: dereckson@tin Synchronized static/images/project-logos/cswiki.png: 350K celebration logo for cs.wikipedia (duration: 00m 33s)
  • 10:45 logmsgbot: reedy@tin rebuilt wikiversions.php and synchronized wikiversions files: wikitech back to .19
  • 10:01 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.19/extensions/OATHAuth: Fix for 2FA testing (duration: 00m 30s)
  • 07:22 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.19/extensions/NavigationTiming/modules/ext.navigationTiming.js: T131565 (duration: 00m 33s)
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Apr 2 02:32:42 UTC 2016 (duration 8m 36s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 10m 48s)
  • 02:03 logmsgbot: l10nupdate@tin LocalisationUpdate failed (1.27.0-wmf.18) at 2016-04-02 02:03:32+00:00
  • 01:04 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.19/includes/specials/SpecialRedirect.php: T131328 (duration: 00m 39s)

2016-04-01

  • 23:43 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.19/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: T131337 (duration: 00m 38s)
  • 22:39 Pchelolo: rolling restart restbase to pick up https://gerrit.wikimedia.org/r/#/c/280951/ config change
  • 22:13 mutante: hooft - shutdown
  • 21:36 mutante: hooft: stopping ganglia aggregators, remove from icinga/storedconfigclean
  • 21:26 mutante: hooft.esams - stop puppet, stop salt
  • 20:57 mutante: bast3001 - sign puppet certs, salt keys
  • 19:20 logmsgbot: reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: noop change for labs (duration: 00m 29s)
  • 19:10 mutante: booting slauerhoff into PXE, installing as new bast3001
  • 19:10 logmsgbot: reedy@tin Synchronized wmf-config/CommonSettings.php: Replace old copyright image config (duration: 00m 32s)
  • 19:09 logmsgbot: reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: consistency (duration: 00m 28s)
  • 17:45 bearND: mobileapps deployed 66f8dac
  • 17:12 bearND: starting mobileapps deploy (retry of yesterday's, which did not complete)
  • 15:37 papaul: installing graphite2002
  • 15:04 YuviPanda: grafana made me and godog admins
  • 15:01 moritzm: rebooting copper for upgrade to 4.4
  • 13:39 gehel: deploying apache config for MW appservers: new cache-control headers for portals (https://gerrit.wikimedia.org/r/#/c/280204/ T126280)
  • 11:06 Dereckson: Imported Jerusalem_Old_City_Walking_to_the_Western_Wall_4K.webm to Commons (T131441)
  • 09:14 gehel: testing new Cache-Control headers on mw1017
  • 09:09 logmsgbot: reedy@tin Synchronized wmf-config/throttle.php: Fix my quote fail (duration: 00m 29s)
  • 08:42 logmsgbot: reedy@tin rebuilt wikiversions.php and synchronized wikiversions files: Revert labswiki to wmf.18 as 2FA seems to be broken
  • 08:38 godog: bootstrap restbase2004-b
  • 08:29 logmsgbot: dereckson@tin Synchronized wmf-config/throttle.php: Fix throttle rules (Gerrit change 280819). (duration: 00m 29s)
  • 08:11 _joe_: progressively reducing weight of the older servers in the api cluster
  • 07:56 _joe_: depooling mw1121-1130 from the api cluster
  • 07:15 logmsgbot: reedy@tin Synchronized wmf-config/throttle.php: T130460 Jerusalem Hackathon (duration: 00m 40s)
  • 07:00 _joe_: depooling mw1070-89 from the appserver cluster. T126242
  • 06:53 _joe_: setting all newer appservers weight to 20 in eqiad
  • 05:28 bearND: mobileapps deployed 66f8dac
  • 05:04 bearND: starting mobileapps deploy
  • 02:38 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Apr 1 02:38:39 UTC 2016 (duration 8m 58s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 10m 29s)
  • 00:55 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: Synching labs revert (duration: 00m 31s)
  • 00:54 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: Synching labs revert (duration: 00m 27s)
  • 00:37 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: Sync labs change to keep repo clean (duration: 00m 31s)
  • 00:36 logmsgbot: csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 34s)

2016-03-31

  • 23:48 logmsgbot: awight@tin Finished scap: (no message) (duration: 26m 44s)
  • 23:21 logmsgbot: awight@tin Started scap: (no message)
  • 23:14 logmsgbot: awight@tin Synchronized php-1.27.0-wmf.19/extensions/OATHAuth: SWAT deployment of OATHAuth fixes, take 2 (duration: 00m 32s)
  • 23:13 logmsgbot: awight@tin Synchronized php-1.27.0-wmf.19/extensions/OATHAuth: SWAT deployment of OATHAuth fixes (duration: 00m 46s)
  • 22:51 Pchelolo: rolling restart restbase. Apply https://gerrit.wikimedia.org/r/#/c/280711/ config change
  • 22:22 Pchelolo: finished update restbase to ba39d2bc
  • 22:15 Pchelolo: started update restbase to ba39d2bc
  • 22:06 Pchelolo: update restbase to ba39d2bc canary on restbase1005
  • 22:02 Pchelolo: reenable puppet on cerium in restbase staging
  • 21:26 Pchelolo: disabling puppet on cerium, updating config and deploying restbase to staging. Testing https://gerrit.wikimedia.org/r/#/c/280711/
  • 21:00 mutante: bast2001 has been reinstalled and can be used again. fingerprints at https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast2001.wikimedia.org
  • 20:38 mutante: bast2001 - revoking old, signing new puppet certs, salt key..
  • 20:21 mutante: bast2001 - reinstall after disk replacement
  • 20:01 ottomata: stopping eventlogging, uninstalling globally installed eventlogging python code, running puppet, restarting eventlogging from /srv/deployment/eventlogging/eventlogging
  • 19:59 ottomata: temporarily stopped puppet on eventlog1001
  • 19:12 volans: Starting slave for s2 on db1047
  • 19:04 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.19
  • 17:46 mutante: rcs1002 - stop ferm
  • 15:56 tgr: running checkLocalUser.php --delete on some wikis for T119736
  • 15:17 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.19/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php: SWAT: Ignore ResultSets that do not return pages gerrit:280669 (duration: 00m 38s)
  • 14:01 volans: manually running puppet on es2018 to double verify merged changes
  • 13:55 mutante: hooft - reboot to pxe, one more time
  • 13:49 mutante: netmon1001 - re-enabled puppet (was for torrus issue earlier)
  • 13:44 elukey: forced puppet agent on netmon1001
  • 13:27 ema: repooling cp1044, upgraded to varnish 4 (T122880)
  • 13:17 moritzm: installing rsync security updates
  • 13:09 ema: depooling cp1044 for varnish 4 upgrade (T122880)
  • 12:02 moritzm: removed rsync 3.10-2ubuntu0.1~wmf1 from carbon. This backport was only needed when the hosts used a mix of precise and trusty. That is no longer the case, so remove the backport and allow stock Ubuntu updates to be used on precise again
  • 10:42 elukey: stopping torrus-common on netmon1001 to try https://wikitech.wikimedia.org/wiki/Torrus#Deadlock_problem
  • 10:35 elukey: restarted torrus-common on netmon1001
  • 10:24 moritzm: logstash on logstash100[1-3] is now using systemd
  • 09:50 moritzm: disabled puppet on logstash1002/1003 (to activate the new logstash systemd unit in steps)
  • 09:00 volans: Forced WriteBack cache policy mode on db1047 RAID
  • 06:11 Krinkle: mwscript deleteEqualMessages.php --wiki zh_min_nanwiki (T45917)
  • 05:31 logmsgbot: krinkle@tin Synchronized docroot/wikipedia.org/speed-tests/: (no message) (duration: 00m 33s)
  • 05:13 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.19/extensions/MobileFrontend/: Iaa5ed38c712b19e (duration: 00m 31s)
  • 05:10 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.18/extensions/MobileFrontend/: Iaa5ed38c712b19e (duration: 00m 42s)
  • 02:48 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 10m 26s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 11m 52s)
  • 00:30 mutante: argon - removing apache and config
  • 00:04 logmsgbot: krinkle@tin Synchronized errorpages/: (no message) (duration: 00m 33s)

2016-03-30

  • 23:42 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/Kartographer/: (no message) (duration: 00m 27s)
  • 23:35 yuvipanda: cleaned salt keys for analytics1017 and 1021
  • 23:34 yuvipanda: cleaned out puppet cert for analytics 1017 and 1021
  • 23:10 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/280600/ (duration: 00m 31s)
  • 23:08 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/280601/ (duration: 00m 34s)
  • 23:06 mutante: torrus - follow instruction for deadlock problem
  • 23:06 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/280449/ (duration: 00m 43s)
  • 22:13 mutante: smokeping/netmon - remove hooft/bast3001, restart service
  • 21:05 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/CirrusSearch/: https://gerrit.wikimedia.org/r/#/c/280549/ (duration: 00m 40s)
  • 20:20 subbu: finished deploying parsoid version a20ef276
  • 20:15 Pchelolo: finished update restbase to 1b52276f
  • 20:11 subbu: synced code; restarted parsoid on wtp1004 as a canary
  • 20:10 Pchelolo: starting update restbase to 1b52276f
  • 20:07 subbu: starting parsoid deploy
  • 20:03 Pchelolo: update restbase to 1b52276f canary deploy to restbase1005
  • 19:48 mutante: hooft revoking puppet cert, salt key, reboot into PXE
  • 19:43 mutante: hooft is going to be reinstalled and renamed, affects esams bastion and ganglia during the install
  • 19:39 mutante: hooft - updating root password
  • 19:08 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.19
  • 18:58 Pchelolo: doing a puppet run in codfw restbase nodes and performing a rolling restart of restbase
  • 18:55 Pchelolo: finished restbase update to 9fe8676d
  • 18:20 Pchelolo: starting restbase update to 9fe8676d
  • 17:46 logmsgbot: hoo@tin Synchronized wmf-config/interwiki.php: Sync changes on meta (duration: 00m 42s)
  • 17:22 akosiaris: restarting grrrit-wm
  • 16:24 ostriches: gerrit: restarting to clean up mismatched jvm versions
  • 15:41 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.18/extensions/Echo/modules/ooui/mw.echo.ui.NotificationBadgeWidget.js: SWAT: Change threshold for survey invitation from 2 unread notifs to 1 gerrit:280369 (duration: 00m 28s)
  • 15:32 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.19/extensions/Translate/resources/js/ext.translate.editor.js: SWAT: Fix regressions in insertables placement gerrit:280414 (duration: 00m 27s)
  • 15:26 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.19/extensions/ContentTranslation/modules/publish/ext.cx.publish.js: SWAT: Try to avoid JS error gerrit:280389 (duration: 00m 29s)
  • 15:20 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.19/extensions/Echo/modules/ooui/mw.echo.ui.NotificationBadgeWidget.js: SWAT: Change threshold for survey invitation from 2 unread notifs to 1 gerrit:280370 (duration: 00m 28s)
  • 14:53 akosiaris: restart zotero on sca2001, sca2002
  • 14:52 akosiaris: restart zotero on sca1001, sca1002
  • 14:34 mobrovac: restbase rolling restart to apply https://gerrit.wikimedia.org/r/#/c/280091/
  • 14:29 mobrovac: restbase deploy 3ea08751a8 on restbase2004 after reimage
  • 14:26 godog: bootstrap cassandra-a on restbase2004
  • 12:40 ema: repooling cp1043 running varnish4 (T122880)
  • 11:37 godog: reimage restbase2004.codfw.wmnet
  • 11:18 kart_: Updated cxserver to 5699a49
  • 10:25 jynus: starting regression/stress testing of codfw mediawiki infrastructure
  • 09:07 ema: depooling cp1043 for varnish4 upgrade (T122880)
  • 08:58 moritzm: repooling maps-test2001
  • 08:45 moritzm: depooling maps-test2001 (to apply ferm)
  • 08:22 jynus: restarting db1047 due to data corruption on Aria tables
  • 06:56 moritzm: restarted hhvm on mw1128, mw1139
  • 06:06 Tim: on terbium: making visual diff dump lists with makeDumpList.php
  • 05:52 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.18/extensions/WikimediaMaintenance/makeDumpList.php: (no message) (duration: 00m 27s)
  • 05:39 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.19/extensions/WikimediaMaintenance/makeDumpList.php: (no message) (duration: 00m 38s)
  • 03:17 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Mar 30 03:17:15 UTC 2016 (duration 9m 29s)
  • 03:07 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 18m 03s)
  • 02:32 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 11m 29s)

2016-03-29

  • 23:55 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/WikidataPageBanner/: https://gerrit.wikimedia.org/r/#/c/280327/ (duration: 00m 33s)
  • 23:50 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/UploadWizard/: https://gerrit.wikimedia.org/r/#/c/280339/ (duration: 00m 33s)
  • 23:48 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/WikidataPageBanner/: https://gerrit.wikimedia.org/r/#/c/280328/ (duration: 00m 38s)
  • 23:10 awight: update payments from 5a1996bc21fe694b99556bc1b501e484075dabe2 to a9659965d8b55b11518680a1170242f311c7f1d2
  • 22:47 dapatrick: Deployed patch for T123653 to wmf18 and wmf19
  • 22:23 awight: update payments to experimentally rollback
  • 22:21 csteipp: Deployed patch for T127420 to wmf18 and wmf19
  • 21:47 awight: updating payments from 5a1996bc21fe694b99556bc1b501e484075dabe2 to a9659965d8b55b11518680a1170242f311c7f1d2
  • 21:40 ejegg: enabled thank you mail send job
  • 21:32 ejegg: updated CiviCRM from 18431eec74de251066deb77d5b63607e06f6f135 to bfe563f2451725e734d6ddbddb3be716463a3512
  • 20:53 awight: rollback payments
  • 20:52 awight: update payments from 5a1996bc21fe694b99556bc1b501e484075dabe2 to a9659965d8b55b11518680a1170242f311c7f1d2
  • 20:48 logmsgbot: thcipriani@tin Synchronized wmf-config/throttle.php: Wikipedia course, Prague throttle rule gerrit:280255 (duration: 00m 38s)
  • 20:48 mutante: welcome Dereckson to Mediawiki deployers
  • 20:29 logmsgbot: thcipriani@tin Purged l10n cache for 1.27.0-wmf.15
  • 20:27 awight: rolled back payments to 5a1996bc21fe694b99556bc1b501e484075dabe2
  • 20:26 awight: update payments from 5a1996bc21fe694b99556bc1b501e484075dabe2 to a9659965d8b55b11518680a1170242f311c7f1d2
  • 20:24 ejegg: disabled thank you mail send
  • 20:10 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.19
  • 20:02 logmsgbot: thcipriani@tin Finished scap: testwiki to php-1.27.0-wmf.19 and rebuild l10ncache (duration: 56m 27s)
  • 19:47 cmjohnson1: disabling puppet on hosts cp1056/57 and cp1069/70. All are being reclaimed
  • 19:06 logmsgbot: thcipriani@tin Started scap: testwiki to php-1.27.0-wmf.19 and rebuild l10ncache
  • 18:51 jynus: performing schema change on testwiki T130692
  • 18:39 mutante: alsafi deleted service template file remnant
  • 17:36 thcipriani: starting branch cut for wmf.19
  • 17:00 ejegg: updated SmashPig from 9f08f6a1891b0a2bb70eacf460c2f9a8153c3b4e to 5cadcf3abcfcda4552b068c783337d82b743e2e5
  • 16:34 gehel: restarting apache on palladium, strontium and rhodium. Restart should be graceful. In case it is not, puppet errors will happen.
  • 16:20 jynus: upgrading some packages on silver, restarting bacula-fd
  • 16:17 gehel: restarting puppetmaster on palladium
  • 15:59 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for DoubleWiki gerrit:280242 (duration: 00m 28s)
  • 15:52 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for UnicodeConverter gerrit:280227 (duration: 00m 28s)
  • 15:48 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for Poem gerrit:280226 (duration: 00m 28s)
  • 15:36 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Revert Use extension registration for DoubleWiki (duration: 00m 27s)
  • 15:33 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for DoubleWiki gerrit:280225 (duration: 00m 28s)
  • 15:29 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for SyntaxHighlight_GeSHi gerrit:280224 (duration: 00m 28s)
  • 15:26 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for ImageMap gerrit:280223 (duration: 00m 31s)
  • 15:12 logmsgbot: thcipriani@tin Synchronized static/images/project-logos/astwiktionary.png: SWAT: Logo for ast.wiktionary gerrit:280057 (duration: 00m 27s)
  • 15:08 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: HD logo for da.wikipedia gerrit:279897 (duration: 00m 27s)
  • 15:07 logmsgbot: thcipriani@tin Synchronized static/images/project-logos: SWAT: HD logo for da.wikipedia gerrit:279897 (duration: 00m 38s)
  • 15:06 moritzm: repooled maps-test2004
  • 15:02 moritzm: repooled maps-test2003 and depooled maps-test2004 (to apply ferm)
  • 14:54 moritzm: depooled maps-test2003 (to apply ferm)
  • 13:53 moritzm: repooled maps-test2002 (nodejs being put on hold prevented generation of ferm rules provided by service::node, this has been fixed)
  • 13:14 jynus: restoring (but not provisioning) older backup from labswiki sql
  • 13:08 moritzm: unheld nodejs on maps*
  • 11:51 hashar: Jenkins / Zuul lagging out trying to catch up with a huge number of changes I have sent
  • 10:11 logmsgbot: reedy@tin Synchronized wmf-config/throttle.php: Remove old throttle rules (duration: 00m 44s)
  • 09:50 godog: powercycle ms-be2008, got stuck again while diagnosing failed /dev/sdl
  • 09:08 paravoid: hard-resetting alsafi, I/O-stuck (qemu bug?)
  • 09:05 paravoid: hard-resetting mx1001, I/O-stuck (qemu bug?)
  • 08:45 moritzm: upgraded chromium on osmium
  • 03:29 bblack: cr1/2-eqiad: re-activating static routes for ns0/ns1 (ipv4/ipv6) pointing to radon
  • 02:45 Krinkle: mwscript deleteEqualMessages.php --wiki zhwikisource
  • 02:45 Krinkle: mwscript deleteEqualMessages.php --wiki zhwikinews
  • 02:36 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Mar 29 02:36:39 UTC 2016 (duration 8m 46s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 11m 34s)
  • 01:07 Pchelolo: restbase finish update to 3ea08751a8
  • 01:00 Pchelolo: restbase start update to 3ea08751a8
  • 00:15 logmsgbot: krinkle@tin Synchronized docroot/wikipedia.org/speed-tests/: (no message) (duration: 00m 32s)
  • 00:12 mutante: mw2090 - reinstalled, re-signed puppet/salt

2016-03-28

  • 23:38 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/includes/api/ApiUpload.php: https://gerrit.wikimedia.org/r/#/c/280092/ (duration: 02m 26s)
  • 23:34 mutante: mw2090 - reboot, reinstall as regular appserver
  • 23:27 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/280115/ (duration: 00m 35s)
  • 23:19 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/Echo/: https://gerrit.wikimedia.org/r/#/c/280112/ (duration: 00m 43s)
  • 23:16 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/MobileApp/: https://gerrit.wikimedia.org/r/#q,279980,n,z (duration: 01m 07s)
  • 22:03 mutante: mw2090 - disable puppet/cron, crons now on wasat
  • 21:46 mutante: mw1119 restarted hhvm
  • 21:26 bblack: http/2 enabled on pinkunicorn.wikimedia.org for testing - T96848
  • 21:12 Pchelolo: restbase finished deploy of 78e6eab37c
  • 21:04 Pchelolo: restbase start deploy of 78e6eab37c
  • 21:04 mdholloway: mobileapps deployed 90cfdcd
  • 20:37 Pchelolo: restbase canary deploy of 78e6eab37c to restbase1005
  • 20:35 mdholloway: starting mobileapps deployment
  • 20:09 ejegg: restarted thank you email job
  • 19:53 ejegg: rolled civicrm back to 18431eec74de251066deb77d5b63607e06f6f135
  • 19:27 ejegg: updated civiCRM from 18431eec74de251066deb77d5b63607e06f6f135 to d95c93f73a9329d6367403ac3e7d60c97987e61f
  • 19:11 ejegg: disabled thank you mail job
  • 18:30 andrewbogott: labs public DNS will remain in a transitional state until MarkMonitor updates their records and the TTL expires. DNS resolution should proceed as usual in the meantime.
  • 18:03 mutante: mw1140 restart hhvm
  • 17:52 cmjohnson1: shutting down radon to re-apply thermal paste
  • 17:45 mutante: mw1152,mw2090,terbium apt-get remove python-mysqldb (T84075)
  • 17:44 legoktm: manually fixed T129881 rename
  • 17:20 paravoid: cr1/2-eqiad: deactivating static routes for ns0/ns1 (ipv4/ipv6) pointing to radon
  • 17:18 bblack: rebooting radon again
  • 15:31 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.18/extensions/OpenStackManager/OpenStackManager.php: SWAT: Wikitech: Remove address, domain, proxy special pages gerrit:279569 (duration: 00m 31s)
  • 13:32 bblack: rebooting radon (unresponsive console)
  • 08:44 paravoid: force-rebooting ms-be2008 to clear hundreds of unkillable mkfs.xfs processes stuck due to a hosed disk
  • 08:21 paravoid: powercycling ms-be2016, down for ~1d, unresponsive on console
  • 02:35 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Mar 28 02:35:00 UTC 2016 (duration 8m 26s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 11m 32s)

2016-03-27

  • 22:21 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: I1b5c620b85: Better request profiling via XWD header (duration: 00m 33s)
  • 20:11 elukey: restarted hhvm on mw1123
  • 20:04 elukey: Increased nf_conntrack_max to ~528k for the kafka brokers (https://gerrit.wikimedia.org/r/279776)
  • 02:57 jzerebecki: disabling 2fa for Dereckson T130892 [labswiki]> delete from oathauth_users where id=402;
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 27 02:31:03 UTC 2016 (duration 8m 35s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 10m 06s)
  • 01:51 jzerebecki: disabling 2fa for Hym411 T130994 [labswiki]> delete from oathauth_users where id=1363;

2016-03-26

  • 21:11 csteipp_ooo: removed 2FA from wikitech accounts that looked to be affected by T130892
  • 16:21 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.18/extensions/SemanticForms: https://gerrit.wikimedia.org/r/#/c/279701/ (duration: 00m 32s)
  • 13:46 urandom: Cassandra on restbase2004.codfw.wmnet shut down, hardware failure; Down for the weekend : T130990
  • 13:26 urandom: Stopping Cassandra on restbase2004.codfw.wmnet
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 26 02:31:13 UTC 2016 (duration 8m 34s)
  • 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 10m 06s)
  • 00:11 mutante|away: scb[12]00[12] - delete changeprop main.log per request

2016-03-25

  • 21:52 ori: Updated Grafana to latest nightly
  • 21:19 logmsgbot: ori@tin Synchronized wmf-config/logging.php: Ibc96f9d3bd: Integrate X-Wikimedia-Debug with Logstash; I28b137f5e035: Logging: convert to short array syntax (duration: 00m 36s)
  • 20:46 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.18/includes/filerepo/LocalRepo.php: Idaa1237638: Request-local caching of image_redirect and I545ce6b160b: Lower pcTTL in checkRedirect() to 30 (duration: 00m 37s)
  • 19:32 tgr: running extensions/CentralAuth/maintenance/checkLocal{Names,User}.php on jawiktionary and dewikiquote
  • 19:08 mutante: scb1001 - Unit changeprop.service entered failed state - it's a new deployment though, acked earlier by mobrovac already
  • 18:27 mobrovac: restbase started mobile-sections dump of enwiki on restbase1009 for articles edited before 2016-03-23 as per T130698
  • 17:19 akosiaris: rolling restart of restbase nodes to apply https://gerrit.wikimedia.org/r/#/c/279597/
  • 16:55 logmsgbot: ori@tin Synchronized wmf-config/logging.php: Ia2cd5daaf3: Update logging config for request IDs (duration: 00m 31s)
  • 16:50 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.18/includes: Iaf90c20c330e: Provide a unique request identifier (duration: 01m 22s)
  • 15:32 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Remove T44894 FIXME note (duration: 00m 27s)
  • 15:03 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Remove sampling from ApiAction kafka channel (duration: 00m 33s)
  • 14:58 logmsgbot: demon@tin Synchronized wmf-config/db-codfw.php: docfix (duration: 00m 34s)
  • 14:43 godog: chown _graphite:_graphite frontend.navtiming.loadEventEnd
  • 14:38 godog: depool restbase and drain cassandra from restbase1007
  • 14:20 bblack: all authdns servers running ferm rules now
  • 13:15 bblack: enabling ferm on authdns servers (one at a time, while watching stuff...)
  • 08:38 akosiaris: depool maps-test2002.codfw.wmnet temporarily
  • 03:45 csteipp: redeployed Scribunto patch for T110143
  • 02:38 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 25 02:38:53 UTC 2016 (duration 8m 55s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 09m 54s)
  • 00:24 logmsgbot: catrope@tin Finished scap: Pushing out i18n changes from SWAT (duration: 26m 15s)

2016-03-24

  • 23:58 logmsgbot: catrope@tin Started scap: Pushing out i18n changes from SWAT
  • 23:52 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/ContentTranslation: Remove registration of missing i18n keys (duration: 00m 28s)
  • 23:51 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/Flow: Remove registration of missing i18n keys (duration: 00m 46s)
  • 23:51 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/Graph: Remove registration of missing i18n keys (duration: 00m 28s)
  • 23:42 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/MobileApp/: Roll out RESTbase usage to mobile app at 2% (duration: 00m 27s)
  • 23:38 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/MobileFrontend/: SWAT (duration: 00m 30s)
  • 23:31 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Echo footer notice on enwiki (duration: 00m 28s)
  • 23:22 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Switch back to legacy language overlay (duration: 00m 27s)
  • 23:16 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/Echo/: Styling fixes for footer link (duration: 00m 31s)
  • 23:15 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/OATHAuth/: SWAT (duration: 00m 28s)
  • 23:13 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/VisualEditor: SWAT (duration: 00m 28s)
  • 23:12 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: extension.json for ParserFunctions (duration: 00m 27s)
  • 23:11 logmsgbot: catrope@tin Synchronized wmf-config/extension-list: extension.json for ParserFunctions (duration: 00m 27s)
  • 23:10 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: massmessage group changes for zhwiki (duration: 00m 27s)
  • 23:07 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: extension.json changes (duration: 00m 32s)
  • 23:06 logmsgbot: catrope@tin Synchronized wmf-config/extension-list: extension.json changes (duration: 00m 27s)
  • 23:03 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable subpages in Template namespace on ruwikisource (duration: 00m 31s)
  • 22:24 jzerebecki: deactivate 2fa for my account T130892 mysql:wikiadmin@silver [labswiki]> delete from oathauth_users where id=922;
  • 22:10 csteipp: deployed followup patch for T110143
  • 22:09 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.18/extensions/SyntaxHighlight_GeSHi: (no message) (duration: 00m 29s)
  • 21:47 bblack: re-weighted eqiad upload frontend caches (even)
  • 21:34 bblack: restarting cache_misc frontends globally for https://gerrit.wikimedia.org/r/#/c/279460/
  • 21:08 logmsgbot: gehel@tin Synchronized wmf-config/LabsServices.php: T130219 Configure a single elasticsearch server for CirrusSearch beta cluster (duration: 00m 38s)
  • 21:05 gehel: deploying mediawiki-config: single elasticsearch server for CirrusSearch beta cluster (T130219)
  • 19:59 gehel: deploying latest version of WDQS
  • 19:06 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.18
  • 18:29 elukey: camus and puppet re-enabled on analytics1027
  • 18:18 bblack: starting decom/re-role of cp3003-6,cp3011-14 (already depooled)
  • 17:57 elukey: enabled Hadoop Master Node automatic failover on analytics1001/1002 (this time without fireworks).
  • 16:47 urandom: Reactivating restbase2004.codfw.wmnet : T130254
  • 16:43 logmsgbot: filippo@tin Synchronized wmf-config/filebackend-production.php: move codfw swift back to async replication T129089 (duration: 00m 28s)
  • 16:41 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-labs.php: T130219 CirrusSearch labs configured to use HTTPS connection pool (duration: 00m 30s)
  • 16:31 godog: force puppet run on upload caches in eqiad for https://gerrit.wikimedia.org/r/#/c/279358/
  • 16:25 bblack: depooling cp3003-6,cp3011-14 from esams text varnish-be over the next ~70 mins - prep for last steps of T125485
  • 16:22 logmsgbot: demon@tin Synchronized README: no-op, doing co-master sync (duration: 00m 38s)
  • 16:17 godog: force puppet run on upload caches in codfw/eqiad for https://gerrit.wikimedia.org/r/#/c/279357/
  • 16:15 elukey: disabled camus and puppet on analytics1027
  • 16:10 godog: force puppet run on upload caches in eqiad for https://gerrit.wikimedia.org/r/#/c/279356/
  • 15:46 bblack: starting depool/downtime/re-role process to decom cp3019-22
  • 15:43 bblack: restarted pybal on lvs300x (persistent failure to load new etcd data...)
  • 15:38 moritzm: upgrading xenon, cerium, praseodymium to Linux 4.4
  • 15:09 bblack: downtime/re-role of cp3007-10 starting - T125485
  • 14:21 bblack: restarted ganglia-monitor-aggregator instances on hooft.esams
  • 14:17 ema: puppet re-enabled on all cache nodes T124279
  • 13:29 ema: testing vcl v4 merge, puppet disabled on all cache nodes T124279
  • 13:22 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1027; pool db1078 with low weight (duration: 00m 42s)
  • 13:17 elukey: puppet re-enabled on analytics1027
  • 11:20 hashar: Restarted Jenkins, jobs got blocked notifying to IRC
  • 11:10 gehel: running puppet-merge manually on strontium to correct unmerged changes.
  • 10:18 gehel: activating SSL certificate check on elasticsearch - T130366
  • 10:07 moritzm: installing exim upgrades on non-mail hubs in production
  • 09:56 elukey: puppet disabled on analytics1027 to disable Camus
  • 09:50 elukey: puppet disabled on analytics1001/1002 (HDFS namenodes) as pre-step before enabling automatic failover
  • 09:30 jynus: stopping slave on db2018
  • 09:18 jynus: stopping db1027 for upgrade and cloning to db1078
  • 08:57 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1044; Pool db1077; Depool db1027 (duration: 00m 31s)
  • 08:56 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Add db1077 and db1078 (duration: 00m 28s)
  • 05:05 Krinkle: mwscript deleteEqualMessages.php --wiki zhwikiquote (T45917 , P1988)
  • 05:05 Krinkle: mwscript deleteEqualMessages.php --wiki zhwiktionary
  • 05:04 Krinkle: mwscript deleteEqualMessages.php --wiki zhwiki
  • 05:03 Krinkle: mwscript deleteEqualMessages.php --wiki zhwikibooks
  • 04:58 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: 3f2943e (duration: 00m 26s)
  • 04:55 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: 3f2943e (duration: 00m 41s)
  • 04:43 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 35s)
  • 03:02 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar 24 03:02:09 UTC 2016 (duration 8m 52s)
  • 02:53 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 09m 59s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 11m 07s)
  • 00:47 twentyafterfour: phab upgrade/maintenance finished
  • 00:47 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.18/includes/media/DjVu.php: Idbd11637a8: Request-local caching of DjVu dimensions (duration: 00m 39s)
  • 00:13 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/Echo/modules/ooui/mw.echo.ui.FooterNoticeWidget.js: touch (duration: 00m 26s)
  • 00:06 logmsgbot: catrope@tin Finished scap: Oops, Echo had i18n changes (duration: 26m 50s)
  • 00:02 moritzm: rebooting iridium (phabricator host) for kernel upgrade

2016-03-23

  • 23:45 yuvipanda: restart nova-compute on labvirt1008
  • 23:39 logmsgbot: catrope@tin Started scap: Oops, Echo had i18n changes
  • 23:33 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Plumbing for Echo footer link settings (duration: 00m 26s)
  • 23:33 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Echo footer link on testwiki (duration: 00m 26s)
  • 23:32 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/UploadWizard/: SWAT (duration: 00m 27s)
  • 23:31 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/extensions/Echo/: SWAT (duration: 00m 32s)
  • 23:29 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.18/includes/specials/SpecialEditWatchlist.php: Try to fix watchlist fatals (duration: 00m 25s)
  • 23:28 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.17/extensions/VisualEditor/lib/ve/src/ce/ve.ce.Surface.js: Move table paste logic out of external paste block (duration: 00m 26s)
  • 23:25 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Use extension registration for Cite (duration: 00m 28s)
  • 23:22 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Namespace config for newiktionary (duration: 00m 43s)
  • 23:21 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Import sources for newikibooks (duration: 00m 47s)
  • 23:03 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add channel for slow diff logs (duration: 00m 28s)
  • 22:17 andrewbogott: rebooting labvirt1010
  • 22:17 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.18/extensions/OATHAuth: Deploy HTMLForm update for OATH (duration: 00m 30s)
  • 22:13 andrewbogott: rebooting labvirt1007 <- correction
  • 22:12 andrewbogott: rebooting labvirt1006
  • 22:12 csteipp: ran update_scratch_token_format.php on silver
  • 22:03 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.18/extensions/OATHAuth: (no message) (duration: 00m 34s)
  • 21:59 andrewbogott: rebooting labvirt1006
  • 21:43 andrewbogott: rebooting labvirt1005
  • 21:41 bblack: depooling cp3043 - T125485
  • 21:38 bblack: depooling cp3042 - T125485
  • 21:20 chasemp: reboot labvirt1004
  • 20:53 chasemp: reboot labvirt1003
  • 20:40 mdholloway: found an issue with the mobileapps deployment, reverting to 85856f7
  • 20:29 mdholloway: starting mobileapps deployment
  • 20:18 chasemp: reboot labvirt1002
  • 20:14 subbu: finished deploying parsoid version 5538d868
  • 20:10 subbu: synced code. restarted parsoid on wtp1002 (~4 minutes back) as a canary
  • 20:03 subbu: starting parsoid deploy
  • 19:56 andrewbogott: rebooting labvirt1001
  • 19:38 chasemp: rebooting labvirt1011
  • 19:36 andrewbogott: rebooting labvirt1009
  • 19:27 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.18/includes/Revision.php: I77575d6d0ea: Request-local caching of revision text (duration: 00m 28s)
  • 19:25 andrewbogott: rebooting labvirt1008
  • 19:05 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.18
  • 19:01 legoktm: restarting zuul
  • 18:12 urandom: Removing compaction throughput throttling on restbase2004.codfw.wmnet : T130254
  • 18:05 urandom: Increasing compactionthroughput to 200MB/s on restbase2004.codfw.wmnet : T130254
  • 17:54 urandom: Removing old heap dumps on restbase2004.codfw.wmnet : T130254
  • 17:48 urandom: Increasing compactionthroughput to 120MB/s on restbase2004.codfw.wmnet : T130254
  • 17:36 urandom: Increasing compactionthroughput to 100MB/s on restbase2004.codfw.wmnet : T130254
  • 17:24 urandom: Starting scrub of parsoid_html on restbase2004.codfw.wmnet : T130254
  • 17:21 urandom: Disabling gossip and binary transport on restbase2004.codfw.wmnet : T130254
  • 17:18 urandom: Starting Cassandra on restbase2004.codfw.wmnet : T130254
  • 17:16 urandom: Disabling puppet on restbase2004.codfw.wmnet to override compactor concurrency : T130254
  • 17:14 urandom: Cancelling offline scrubs on restbase2004.codfw.wmnet : T130254
  • 17:02 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: touched (duration: 00m 25s)
  • 16:57 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: Logging: add ApiAction kafka logging (34f236c) (T108618) (duration: 00m 28s)
  • 16:56 logmsgbot: bd808@tin Synchronized wmf-config/event-schemas: Logging: add ApiAction kafka logging (34f236c) (duration: 00m 31s)
  • 16:51 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.17/includes/api/ApiMain.php: Rename ApiRequest to ApiAction (4dc12de) (duration: 00m 47s)
  • 15:49 elukey: updated puppet-compiler to 0.1.2 version (added submodule support)
  • 15:18 urandom: CORRECTION: Starting cleanups on restbase10{08,10,11}-{a,b}.eqiad.wmnet : T125842
  • 15:17 urandom: Starting cleanups on restbase10{08,12,13}-{a,b}.eqiad.wmnet : T125842
  • 14:33 godog: rolling-restart restbase after https://gerrit.wikimedia.org/r/279112
  • 12:29 godog: pool restbase1012 / restbase1013
  • 12:27 godog: halt restbase1003 / restbase1004
  • 12:17 moritzm: installing various security updates on mediawiki eqiad servers (along with HHVM restarts): graphite2, libldap, pixman, sqlite, pygments, gnutls26 (already running fine on canaries since yesterday)
  • 11:54 godog: swift eqiad-prod ms-be1020 / ms-be1021 to weight 3500
  • 10:59 jynus: stopping and restarting db1015 for upgrade and clone to db1077
  • 10:44 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1015, increase weight of db1044 (duration: 00m 25s)
  • 10:38 godog: depool restbase1003 / restbase1004 prior to deprovisioning the hardware
  • 10:29 moritzm: installing various security updates on mediawiki codfw servers (along with HHVM restarts): graphite2, libldap, pixman, sqlite, pygments, gnutls26 (already running fine on canaries since yesterday)
  • 09:21 jynus: start mysql on es2019 at es2018-bin.000044:287914983
  • 08:38 jynus: starting mysql at es2019
  • 08:37 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Add db2008 (x1) depooled, depool es2019 (duration: 00m 26s)
  • 08:30 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1044, pool db1075 (duration: 00m 25s)
  • 08:27 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Add db1075 (duration: 00m 40s)
  • 07:59 jynus: powercycling es2019 - it was down
  • 07:19 _joe_: progressively activating cross-dc replica and encryption between the jobqueue redises
  • 03:34 mutante: tin - re-arm keyholder
  • 03:26 mutante: tin - restart keyholder - re: < twentyafterfour> 1 mismatch. I guess that keyholder-proxy needs to be restarted on tin
  • 03:12 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Mar 23 03:12:07 UTC 2016 (duration 9m 20s)
  • 03:02 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 17m 31s)
  • 02:28 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 11m 18s)

2016-03-22

  • 23:47 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/WikimediaEvents/: https://gerrit.wikimedia.org/r/#/c/279066/ (duration: 00m 27s)
  • 23:46 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.17/extensions/WikimediaEvents/: https://gerrit.wikimedia.org/r/#/c/279066/ (duration: 00m 28s)
  • 23:40 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: SWAT x 3 (duration: 00m 26s)
  • 23:35 logmsgbot: maxsem@tin Synchronized portals: (no message) (duration: 00m 25s)
  • 23:35 logmsgbot: maxsem@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 26s)
  • 23:26 MaxSem: ran echo 'https://test.wikidata.org/static/favicon/testwikidata.ico' | mwscript purgeList.php
  • 23:23 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278334/ (duration: 00m 27s)
  • 23:23 logmsgbot: maxsem@tin Synchronized static/images/project-logos: https://gerrit.wikimedia.org/r/#/c/278334/ (duration: 00m 26s)
  • 23:20 logmsgbot: maxsem@tin Synchronized static/favicon/testwikidata.ico: https://gerrit.wikimedia.org/r/#/c/278321/ (duration: 00m 31s)
  • 23:15 mutante: mira - restarting and re-arming keyholder
  • 23:14 mutante: tin - restarting and re-arming keyholder
  • 23:04 mutante: re-arm keyholder on tin for new phab key
  • 23:02 mutante: mira re-arm keyholder for new phab key
  • 22:56 logmsgbot: aaron@tin Synchronized wmf-config: Lower "max lag" and $wgAPIMaxLagThreshold to 8/6 (duration: 00m 29s)
  • 22:52 logmsgbot: aaron@tin Synchronized wmf-config/jobqueue-eqiad.php: Bump timeout to 300ms (duration: 00m 29s)
  • 22:50 logmsgbot: aaron@tin Synchronized wmf-config/filebackend-production.php: comment tweak (duration: 00m 35s)
  • 22:50 mutante: replacing phab deploy key with a new one
  • 21:41 hoo: Ran sync-common on mw1139 (which missed two deploys)
  • 21:28 bd808: Restarted logstash process on logstash1002; dead from OOM since 2016-03-18T11:47:12
  • 21:27 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.18/extensions/Wikidata: Update Wikibase: Fix add qualifier link not getting disabled (duration: 03m 43s)
  • 21:24 ejegg: update payments from 79f5c9389edd089ae5951a7d172e74e68946a93c to 5a1996bc21fe694b99556bc1b501e484075dabe2
  • 21:23 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.17/extensions/Wikidata: Update Wikibase: Fix add qualifier link not getting disabled (duration: 07m 30s)
  • 21:01 mutante: mira - arm keyholder
  • 20:59 mutante: tin - arm keyholder with deployment keys
  • 20:49 urandom: Stopping restbase2004.codfw.wmnet for offline sstablescrub : T130254
  • 20:46 urandom: decommissioning restbase1004-a.eqiad.wmnet : T125842
  • 20:45 urandom: Rolling restart of RESTBase Cassandra cluster complete : T130393, T128787
  • 20:25 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.18
  • 20:22 gehel: deployed latest version of WDQS
  • 20:18 logmsgbot: thcipriani@tin Finished scap: testwiki to php-1.27.0-wmf.18 and rebuild l10n cache (duration: 46m 09s)
  • 19:53 urandom: Rolling restart of RESTBase Cassandra cluster : T130393, T128787
  • 19:32 logmsgbot: thcipriani@tin Started scap: testwiki to php-1.27.0-wmf.18 and rebuild l10n cache
  • 19:00 mutante: mw1125 - powercycle
  • 18:41 urandom: Aborting restart of Cassandra on maps cluster : T130393, T128787
  • 18:39 urandom: Rolling restart of Cassandra on maps cluster : T130393, T128787
  • 18:39 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Increase weight of db1024, db1074 and db1076 (duration: 01m 34s)
  • 17:33 mdholloway: mobileapps deployment complete, deployed 85856f7
  • 17:33 thcipriani: starting wmf.18 branching
  • 17:33 ejegg: rolled payments back to 79f5c9389edd089ae5951a7d172e74e68946a93c
  • 17:27 ejegg: updated payments from 79f5c9389edd089ae5951a7d172e74e68946a93c to 62365063548836618b93b12a3bcbe65781a62a94
  • 17:15 ejegg: rolled payments back to 79f5c9389edd089ae5951a7d172e74e68946a93c
  • 17:11 ejegg: updated payments from 79f5c9389edd089ae5951a7d172e74e68946a93c to 62365063548836618b93b12a3bcbe65781a62a94
  • 17:05 mdholloway: starting mobileapps deployment
  • 17:03 urandom: Rolling restart of AQS Cassandra cluster complete : T130393, T128787
  • 16:54 urandom: Performing rolling restart of AQS Cassandra cluster : T130393, T128787
  • 16:49 jynus: stopping db1044 mysql to clone to db1075
  • 16:37 elukey: re-added rdb1001 (Job Queue master) to the Jobrunners' config. Forcing puppet agent and restarting jobchron on all jobrunners/videoscalers in eqiad.
  • 16:33 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1024; pool db1074 and db1076; depool db1044 (duration: 00m 28s)
  • 16:32 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Add db1074 and db1076 eqiad databases (duration: 00m 31s)
  • 16:21 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: REVERT - Remove rdb1001 from the Redis Job Queues for maintenance (duration: 00m 25s)
  • 16:21 urandom: Restarting Cassandra on restbase1007.eqiad.wmnet (canary) : T130393, T128787
  • 15:56 dcausse: elasticsearch: creating wikimania2017wiki indices in codfw
  • 15:38 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable signature button at NS:102 for frwiki gerrit:272479 (duration: 00m 30s)
  • 15:34 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Configuring wgMetaNamespace for an.wiktionary gerrit:278876 (duration: 00m 29s)
  • 15:28 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.17/extensions/ContentTranslation/modules/tools/ext.cx.tools.mt.js: SWAT: Fix JS error in MT tool: MTControlCard.providers undefined gerrit:278843 (duration: 00m 25s)
  • 14:56 bblack: deploying VCL changes for do_stream on all pass traffic
  • 14:37 elukey: puppet disabled on rdb1001/rdb1002 (redis master slave) as part of the rdb1001 re-image. rdb1002 set with SLAVEOF NO ONE as precaution.
  • 14:09 elukey: removed rdb1001 from the JobRunner config (hieradata/eqiad/mediawiki/jobrunner). Also forcing a puppet run and a jobchron restart on all the Job Runners and VideoScalers in eqiad.
  • 14:03 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Remove rdb1001 from the Redis Job Queues for maintenance (duration: 00m 25s)
  • 13:30 moritzm: installing various security updates on mediawiki canary servers (along with HHVM restarts): graphite2, libldap, pixman, sqlite, pygments, gnutls26
  • 12:14 godog: nodetool decommission restbase1003 T125842
  • 10:35 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)
  • 10:20 moritzm: rolling reboot of ocg* for kernel upgrades
  • 09:59 jynus: stopping and cloning db1024 to db1074 and db1076
  • 09:49 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1024 for maintenance (duration: 01m 15s)
  • 09:46 moritzm: rolling reboot of sca* for kernel upgrades
  • 09:14 moritzm: rebooting tin for kernel upgrade
  • 08:46 moritzm: installing squid/jessie security updates
  • 07:59 moritzm: installing git security updates
  • 07:56 moritzm: restarted cassandra on restbase2004, it ran out of heap memory
  • 07:51 moritzm: restarting hhvm on mw1119, mw1121, mw1136 and mw1140, all got stuck over night
  • 06:54 _joe_: banning all pages with content-length of 25 from the caches, T130575
  • 06:04 _joe_: restarted hhvm on mw1122, memory leak

2016-03-21

  • 19:58 akosiaris: powercycle mw1142, console available but never prompting for the root password, stuck at username
  • 15:44 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch on Wikidata after Property conversions (duration: 00m 28s)
  • 15:13 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.17/includes/upload/UploadBase.php: SWAT: UploadBase: Set mFileSize, if given, even if mTempPath is unknown gerrit:278724 (duration: 00m 30s)
  • 14:26 ottomata: altering kafka topics webrequest_text and webrequest_upload, increasing each from 12 partitions to 24 partitions
  • 14:23 jynus: restarting labsdb1001 mysql
  • 14:05 mobrovac: restbase deploy end of 26f9e90
  • 13:33 mobrovac: restbase deploy start of 26f9e90 on canary restbase1003
  • 10:52 hashar: Live hacked puppet compiler on compiler02.puppet3-diffs.eqiad.wmflabs to debug it not processing submodules. Reinstalled it from the last tag in the process
  • 10:24 jynus: Altering user_properties engine to InnoDB on db1069:3313
  • 09:26 jynus: Altering change_tag engine to InnoDB on db1069:3313
  • 09:09 elukey: restarted hhvm on mw1116
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Mar 21 02:33:31 UTC 2016 (duration 8m 41s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 10m 57s)

2016-03-20

  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 20 02:32:04 UTC 2016 (duration 8m 30s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 10m 22s)

2016-03-19

  • 22:28 jynus: powercycling oxygen, looks kernel-dead
  • 22:16 urandom: removing 22G of heap dumps from restbase2004.codfw.wmnet
  • 22:16 urandom: removing 22G of heap dumps
  • 22:07 urandom: clearing snapshots on restbase2004.codfw.wmnet
  • 15:43 logmsgbot: reedy@tin Synchronized wmf-config/throttle.php: Throttle rules for event T130447 (duration: 00m 26s)
  • 11:38 godog: restart slapd on seaborgium, oom-killed
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 19 02:31:46 UTC 2016 (duration 8m 31s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 10m 07s)
  • 01:54 urandom: bootstrapping restbase1013-b.eqiad.wmnet : T125842

2016-03-18

  • 23:35 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.17/extensions/WikimediaEvents/modules/ext.wikimediaEvents.deprecate.js: (no message) (duration: 00m 35s)
  • 21:11 ostriches: cleaned up stale /srv/mediawiki/php-1.27.0-wmf.{10,11} from the apaches.
  • 21:09 logmsgbot: krinkle@tin Synchronized wmf-config/missing.php: (no message) (duration: 00m 25s)
  • 20:53 ottomata: reenabling puppet on krypton
  • 19:53 ottomata: temporarily disabling puppet on krypton
  • 19:21 ori: rebooting bohrium
  • 19:20 ori: upgraded bohrium VM: vcpus 2 => 8, ram 4 => 8g
  • 19:06 logmsgbot: ori@tin Synchronized wmf-config/logging.php: Iabca8858e: Allow finer-grained control over debug logging via XWD (duration: 00m 32s)
  • 18:56 logmsgbot: demon@tin Synchronized .arclint: no op really, co master sync (duration: 00m 39s)
  • 18:08 gehel: restarting elasticsearch server elastic1031.eqiad.wmnet
  • 17:59 mutante: netmon1001: failed torrus service - recovery steps as outlined on wikitech Torrus
  • 17:55 ori: on bohrium: /etc/apache2/sites-enabled/.links2 ; was causing puppet to refresh apache2 on each run
  • 17:30 gehel: restarting elasticsearch server elastic1030.eqiad.wmnet
  • 17:05 gehel: restarting elasticsearch server elastic1029.eqiad.wmnet
  • 16:53 jynus: starting enwiki import to labs from dbstore1002 (expect lag and consistency problems during the hot import)
  • 16:37 moritzm: restarted hhvm on mw1205
  • 16:30 moritzm: bumped connection tracking table size on mw1161-mw1169 to 524288 to cope with currently elevated connections on those (T130364)
  • 16:19 godog: reboot ms-be2010 to pick up new disk ordering
  • 15:23 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: REVERT - Re-enabled persistence between Job Queues and Job Runners. (duration: 00m 19s)
  • 15:03 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Re-enabled persistence between Job Queues and Job Runners. (duration: 00m 30s)
  • 15:02 godog: bootstrap restbase1013-a
  • 14:36 gehel: restarting elasticsearch server elastic1028.eqiad.wmnet
  • 14:02 elukey: restarted eventlog1001.eqiad.wmnet and eventlog2001.codfw.wmnet for kernel upgrade
  • 13:43 gehel: restarting elasticsearch server elastic1027.eqiad.wmnet
  • 13:24 gehel: restarting pybal on lvs2003.codfw.wmnet
  • 13:22 gehel: enabling all nodes for service search.svc.codfw.wmnet:9243 (elastic-https) on codfw
  • 13:22 gehel: restarting pybal on lvs2006.codfw.wmnet
  • 13:06 gehel: restarting elasticsearch server elastic1026.eqiad.wmnet
  • 12:43 gehel: restarting elasticsearch server elastic1025.eqiad.wmnet
  • 12:35 godog: finished ms-fe1* rolling reboot
  • 12:15 godog: finished ms-be1* rolling reboot
  • 12:00 elukey: Forcing puppet agent run on all the Jobrunners and videoscalers since rdb1005 is now back in service. Will also restart jobchron as well.
  • 11:58 elukey: Added rdb1005 back to the jobrunners puppet config after maintenance.
  • 11:57 gehel: restarting elasticsearch server elastic1024.eqiad.wmnet
  • 11:46 gehel: restarting pybal on lvs1003
  • 11:43 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Add rdb1005 back to the Redis Job Queues after maintenance (duration: 01m 22s)
  • 11:23 moritzm: powercycled mw1163, hung on reboot and serial console stuck
  • 11:05 moritzm: rolling reboot of mw1161 to mw1169 for kernel upgrade
  • 11:04 gehel: restarting pybal on lvs1012
  • 11:04 gehel: restarting pybal on lvs1009
  • 10:58 gehel: activating elasticsearch-ssl service on LVS / eqiad
  • 10:51 gehel: restarting pybal on lvs1006
  • 10:48 jynus: dbstore2002 just crashed
  • 10:34 godog: reboot ms-fe1003 for kernel upgrade
  • 10:33 akosiaris: gehel: restarting pybal on lvs1006
  • 10:27 gehel: activating elasticsearch HTTPS on LVS for eqiad - https://gerrit.wikimedia.org/r/#/c/277956/
  • 10:06 moritzm: rolling reboot of swift backend servers in codfw for kernel upgrade
  • 09:46 godog: rolling-reboot ms-be1* for kernel updates
  • 09:37 elukey: forcing puppet agent and restarting jobchron on all the Job Runners and VideoScalers as rdb1005 has been removed from the configs.
  • 09:32 elukey: removed rdb1005 from the Job Runners config for maintenance
  • 09:24 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Remove rdb1005 from the Redis Job Queues for maintenance (duration: 01m 07s)
  • 09:19 moritzm: rolling reboot of swift frontend servers in codfw for kernel upgrade
  • 09:08 godog: Issuing nodetool scrub -s -- local_group_wikipedia_T_parsoid_html data on restbase2004.eqiad.wmnet : T130254
  • 09:01 moritzm: rolling reboot of mw1001 to mw1016 for kernel upgrade
  • 08:22 _joe_: started cassandra on restbase2004
  • 08:11 moritzm: rearmed keyholder on mira
  • 08:06 moritzm: rebooting mira for kernel update
  • 07:42 jynus: restarting dbstore2002 to apply new mysql config
  • 06:29 gehel: restarting elasticsearch server elastic1023.eqiad.wmnet
  • 02:38 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 18 02:38:00 UTC 2016 (duration 8m 40s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 09m 58s)
  • 01:31 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.17/extensions/CirrusSearch/: Emergency fix https://gerrit.wikimedia.org/r/#/c/278224/ (duration: 00m 36s)
  • 00:38 ebernhardson: reboot elastic1022.eqiad.wmnet for kernel upgrade
  • 00:09 ebernhardson: reboot elastic1021.eqiad.wmnet for kernel upgrade
  • 00:09 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/276934/ (duration: 00m 26s)
  • 00:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/276410/ (duration: 00m 26s)
  • 00:01 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278206/ (duration: 00m 25s)

2016-03-17

  • 23:57 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/276917 (duration: 00m 26s)
  • 23:53 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278021/ (duration: 00m 26s)
  • 23:46 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278029/ (duration: 00m 28s)
  • 23:42 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.17/includes/specials/SpecialUploadStash.php: https://gerrit.wikimedia.org/r/#/c/278190/ (duration: 00m 27s)
  • 23:39 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278198/ (duration: 00m 37s)
  • 23:25 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278032/ (duration: 00m 32s)
  • 23:21 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/278054/ (duration: 00m 47s)
  • 22:52 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.17/includes/api/ApiMain.php: Cast API timeSpentBackend to an int (duration: 00m 25s)
  • 22:47 gehel: resetting cluster.routing.allocation.disk.watermark.high to 90% on eqiad elasticsearch cluster - shards have moved round, cluster is mostly balanced
  • 22:42 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: Disable ApiRequest properly (T108618) (duration: 00m 27s)
  • 22:37 gehel: restarting elasticsearch server elastic1020.eqiad.wmnet
  • 22:17 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.17/includes/specials/SpecialUploadStash.php: Revert: Debug code for T130204 (it worked!) (duration: 00m 25s)
  • 22:14 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.17/includes/specials/SpecialUploadStash.php: Debug code for T130204 (duration: 00m 28s)
  • 22:12 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 28s)
  • 22:10 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: Disable ApiRequest kafka logging (T108618) (duration: 00m 31s)
  • 22:05 logmsgbot: bd808@tin Synchronized wmf-config/InitialiseSettings.php: Add ApiRequest kafka logging (T108618) (duration: 00m 34s)
  • 22:04 logmsgbot: bd808@tin Synchronized wmf-config/event-schemas: Add ApiRequest kafka logging (T108618) (duration: 00m 38s)
  • 21:47 bd808: deleted /var/lib/l10nupdate/caches/cache-1.27.0-wmf.1[345] on tin. Freed ~4G of disk
  • 21:46 gehel: reducing cluster.routing.allocation.disk.watermark.high to 70% on eqiad elasticsearch cluster
  • 21:07 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all to 1.27.0-wmf.17
  • 20:51 logmsgbot: twentyafterfour@tin Finished scap: sync php-1.27.0-wmf.17 (duration: 26m 57s)
  • 20:40 urandom: Issuing `nodetool scrub -s -- local_group_wikipedia_T_parsoid_html data` on restbase2004.eqiad.wmnet : T130254
  • 20:24 logmsgbot: twentyafterfour@tin Started scap: sync php-1.27.0-wmf.17
  • 20:14 urandom: Starting Cassandra on restbase2004.codfw.wmnet (OOM. again.)
  • 20:13 urandom: Restarting Cassandra on restbase1007-a.eqiad.wmnet (compaction seems stalled)
  • 20:00 gehel: restarting elasticsearch server elastic1019.eqiad.wmnet
  • 19:41 gehel: restarting elasticsearch server elastic1018.eqiad.wmnet
  • 18:37 cscott: updated OCG to version c1a8232594fe846bd2374efd8f7c20d7e97ac449
  • 18:28 moritzm: rebooting labnet1001 for kernel update
  • 18:23 moritzm: rebooting labcontrol1002 for kernel update
  • 18:03 moritzm: rebooting silver for kernel update
  • 18:00 mobrovac: restbase restarted cassandra on restbase2004, OOM
  • 17:57 gehel: restarting elasticsearch server elastic1017.eqiad.wmnet
  • 17:10 urandom: rolling restart of restbase production complete
  • 16:51 urandom: rolling restart of restbase production to apply https://gerrit.wikimedia.org/r/#/c/277112/ and https://gerrit.wikimedia.org/r/#/c/277836/
  • 16:45 urandom: performing restart of restbase1003.eqiad.wmnet (canary) to apply https://gerrit.wikimedia.org/r/#/c/277112/ and https://gerrit.wikimedia.org/r/#/c/277836/
  • 16:41 urandom: rolling restart of restbase staging complete
  • 16:41 gehel: restarting elasticsearch server elastic1016.eqiad.wmnet
  • 16:35 urandom: performing rolling restart of restbase staging to apply https://gerrit.wikimedia.org/r/#/c/277112/ and https://gerrit.wikimedia.org/r/#/c/277836/
  • 16:27 urandom: restarting restbase on xenon.eqiad.wmnet (canary), to apply https://gerrit.wikimedia.org/r/#/c/277112/ and https://gerrit.wikimedia.org/r/#/c/277836/
  • 16:27 akosiaris: do a for i in mathoid citoid graphoid cxserver; do sudo confctl --tags dc=eqiad,cluster=sca,service=$i --action delete sca1002.eqiad.wmnet ; done. Same for sca1001.eqiad.wmnet
  • 16:18 urandom: bootstrapping restbase1012-b.eqiad.wmnet : T125842
  • 16:13 _joe_: load testing done
  • 15:58 _joe_: performing a load test on mw1239
  • 15:44 gehel: restarting elasticsearch server elastic1015.eqiad.wmnet
  • 15:32 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Use wmfLocalServices for wgUploadThumbnailRenderHttpCustomDomain PART II gerrit:277786 (duration: 00m 25s)
  • 15:32 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Use wmfLocalServices for wgUploadThumbnailRenderHttpCustomDomain PART I gerrit:277786 (duration: 00m 25s)
  • 15:31 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Use wmfLocalServices for wgUploadThumbnailRenderHttpCustomDomain PART II gerrit:277786 (duration: 00m 25s)
  • 15:30 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Use wmfLocalServices for wgUploadThumbnailRenderHttpCustomDomain PART I gerrit:277786 (duration: 00m 26s)
  • 15:22 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Portal namespace to ne.wikipedia gerrit:278009 (duration: 00m 28s)
  • 15:15 logmsgbot: thcipriani@tin Synchronized portals: SWAT: Bumping portals to master gerrit:277964 (duration: 00m 29s)
  • 15:14 logmsgbot: thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: SWAT: Bumping portals to master gerrit:277964 (duration: 00m 28s)
  • 15:09 logmsgbot: thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: Enable ICU Folding on greek wikipedia PART II gerrit:277477 (duration: 00m 31s)
  • 15:08 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ICU Folding on greek wikipedia PART I gerrit:277477 (duration: 00m 30s)
  • 15:07 gehel: restarting elasticsearch server elastic1014.eqiad.wmnet
  • 14:27 gehel: restarting elasticsearch server elastic1013.eqiad.wmnet
  • 14:09 moritzm: rolling reboot of parsoid servers in eqiad for kernel upgrade
  • 13:42 mobrovac: restbase restarting cassandra on restbase2004 - OOM again
  • 13:36 gehel: restarting elasticsearch server elastic1012.eqiad.wmnet
  • 13:02 gehel: restarting elasticsearch server elastic1011.eqiad.wmnet
  • 13:00 moritzm: rolling reboot of parsoid servers in codfw for kernel upgrade
  • 12:38 _joe_: switching all services back to eqiad
  • 12:25 gehel: restarting elasticsearch server elastic1010.eqiad.wmnet
  • 12:19 akosiaris: restart pybal on lvs2001
  • 12:16 akosiaris: restart pybal on lvs2002
  • 12:05 _joe_: switching cxserver to codfw
  • 11:44 moritzm: rebooting acamar for kernel upgrade
  • 11:41 gehel: restarting elasticsearch server elastic1009.eqiad.wmnet
  • 11:33 _joe_: switching citoid to use codfw as well
  • 11:31 moritzm: rebooting install2001 for kernel upgrade
  • 11:11 logmsgbot: oblivian@tin Synchronized wmf-config/ProductionServices.php: switching mediawiki to use restbase in eqiad again (duration: 00m 32s)
  • 11:04 mobrovac: restbase restarted cassandra-a on restbase1007
  • 11:02 gehel: restarting elasticsearch server elastic1008.eqiad.wmnet
  • 11:00 mobrovac: restbase restarted cassandra on restbase2004 - OOM
  • 10:45 moritzm: rebooting francium for kernel upgrade
  • 10:28 logmsgbot: oblivian@tin Synchronized wmf-config/ProductionServices.php: switching mediawiki to use restbase in codfw (duration: 00m 32s)
  • 10:20 moritzm: rebooting osmium for kernel upgrade
  • 10:14 _joe_: external traffic is now flowing through restbase in codfw
  • 10:08 _joe_: running puppet across eqiad varnishes to switch traffic of restbase eqiad => codfw
  • 09:34 volans: Copying data from db2009 to db2008 T130098
  • 09:02 jynus: reimporting missing rows from production to labs (expect some lag during the day)
  • 08:59 gehel: restarting elasticsearch server elastic1007.eqiad.wmnet
  • 08:44 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/jobs/RefreshLinksJob.php: Revert: Job queue bankruptcy: force all refreshlinks jobs to be non-recursive (duration: 00m 29s)
  • 08:44 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.17/includes/jobqueue/jobs/RefreshLinksJob.php: Revert: Job queue bankruptcy: force all refreshlinks jobs to be non-recursive (duration: 00m 39s)
  • 08:26 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.17/includes/jobqueue/jobs/RefreshLinksJob.php: Job queue bankruptcy: force all refreshlinks jobs to be non-recursive (duration: 00m 25s)
  • 08:25 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/jobs/RefreshLinksJob.php: Job queue bankruptcy: force all refreshlinks jobs to be non-recursive (duration: 00m 25s)
  • 07:48 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/jobs/RefreshLinksJob.php: Job queue bankruptcy: force all refreshlinks jobs to be non-recursive (duration: 00m 24s)
  • 07:06 ori: Short-circuiting RefreshLinksJob::run() to bail if the root job timestamp is older than March 1st
  • 06:46 ori: Powercycled elastic1006; unresponsive.
  • 06:11 ebernhardson: rebooting elastic1006.eqiad.wmnet for kernel update
  • 03:23 eileen: Updated CiviCRM from c99abc42188b7f47726202b4740fbaf68d1c06ab to 18431eec74de251066deb77d5b63607e06f6f135
  • 03:22 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar 17 03:22:55 UTC 2016 (duration 9m 33s)
  • 03:14 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/277452 (labs only change, just keeping file in sync) (duration: 00m 27s)
  • 03:13 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 17m 44s)
  • 02:38 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 17m 36s)
  • 00:58 ebernhardson: rebooting elastic1005.eqiad.wmnet for kernel update
  • 00:05 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow by default in all talk namespaces on gomwiki (duration: 00m 28s)

2016-03-16

  • 23:56 ebernhardson: rebooting elastic1004.eqiad.wmnet for kernel update
  • 23:47 logmsgbot: catrope@tin Finished scap: bswiki namespace changes; MobileFrontend SWAT patches (duration: 05m 04s)
  • 23:42 logmsgbot: catrope@tin Started scap: bswiki namespace changes; MobileFrontend SWAT patches
  • 23:15 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Namespace config changes for bswiki (duration: 00m 44s)
  • 22:09 ebernhardson: rebooting elastic1003.eqiad.wmnet for linux kernel upgrade
  • 21:47 csteipp: deployed patch for T123071
  • 21:18 gehel: restarting elasticsearch server elastic1003.eqiad.wmnet
  • 20:06 mutante: mira - git pull in mw-staging
  • 19:58 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.17
  • 19:57 gehel: regenerating puppet SSL certificates for elasticsearch eqiad cluster
  • 19:54 mutante: rcs1001 - starting redis, disabling puppet (T130147)
  • 19:42 mutante: rcs1002 - starting redis-server
  • 19:42 mutante: rcs1001 - starting redis-server
  • 19:32 twentyafterfour: Deploying 1.27.0-wmf.17 to group1 wikis
  • 19:31 godog: restart pybal on lvs1005 T130143
  • 19:21 ejegg: rolled back CiviCRM to c99abc42188b7f47726202b4740fbaf68d1c06ab
  • 19:18 legoktm: created urlshortcodes table on wikishared db for UrlShortener
  • 19:16 logmsgbot: legoktm@tin Synchronized wmf-config/CommonSettings.php: Configure UrlShortener extension in read-only mode (2/2) (duration: 00m 26s)
  • 19:15 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Configure UrlShortener extension in read-only mode (1/2) (duration: 00m 28s)
  • 19:14 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.17/includes/jobqueue/Job.php: 3da38ce (duration: 00m 37s)
  • 19:12 logmsgbot: legoktm@tin Finished scap: Building l10n cache for UrlShortener - T108557 (try #2) (duration: 63m 05s)
  • 19:11 gehel: regenerating puppet SSL certificates for elasticsearch codfw cluster
  • 19:08 gehel: restarting elasticsearch server elastic1002.eqiad.wmnet
  • 19:02 volans: restarted mysql on dbstore2002 and all replicas except x1, still investigating T130128
  • 19:01 godog: restart pybal on lvs1011 T130143
  • 18:52 ejegg: updated CiviCRM from c99abc42188b7f47726202b4740fbaf68d1c06ab to 9e03bcc4d0e6e80aedefedbbf1eed608b7e9d38d
  • 18:51 ejegg: disabled thank you mail sender
  • 18:20 mutante: repooling rcs1001
  • 18:18 gehel: restarting elasticsearch server elastic1001.eqiad.wmnet
  • 18:15 mutante: rebooting rcs1001
  • 18:10 mutante: depool rcs1001
  • 18:09 logmsgbot: legoktm@tin Started scap: Building l10n cache for UrlShortener - T108557 (try #2)
  • 18:09 legoktm: deleting /var/lib/l10nupdate/caches/cache-1.27.0-wmf.12 on tin to free up some space
  • 18:05 mutante: repooling rcs1002
  • 17:59 mutante: rcs1002 - traffic graph flat in ganglia, reboot
  • 17:57 mutante: rcs1002 - the last message was about 1002
  • 17:56 mutante: rcs1002 - depool from rcstream service
  • 17:55 logmsgbot: legoktm@tin scap aborted: Building l10n cache for UrlShortener - T108557 (duration: 02m 51s)
  • 17:52 logmsgbot: legoktm@tin Started scap: Building l10n cache for UrlShortener - T108557
  • 17:43 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester as default on ja, zh, pl, ar and nlwiki (duration: 00m 29s)
  • 17:28 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester as default on ru, fr, pt and itwiki (duration: 00m 28s)
  • 17:10 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester as default on eswiki (duration: 00m 25s)
  • 17:09 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.17/extensions/Kartographer/extension.json: (no message) (duration: 00m 25s)
  • 16:46 gehel: restarting elasticsearch server elastic2024.codfw.wmnet
  • 16:36 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester as default on enwiki (duration: 00m 30s)
  • 16:32 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 31s)
  • 16:11 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester as default on dewiki (duration: 00m 44s)
  • 16:11 moritzm: upgraded exim packages on magnesium to the version from USN 2933; without it, add_environment can't be set and exim failed to start
  • 16:06 moritzm: upgraded remaining exim binary packages on krypton to 4.84-8+deb8u2 (only exim4-daemon-heavy was at that version, resulting in the new add_environment variable being unknown and exim failing to start)
  • 16:02 gehel: restarting elasticsearch server elastic2023.codfw.wmnet
  • 15:25 moritzm: rebooting graphite1001 for kernel upgrade
  • 15:16 moritzm: rebooting graphite2001 for kernel upgrade
  • 15:09 volans: restarting mysql on dbstore2002 Aria engine crashed
  • 15:08 moritzm: rebooting lithium for kernel upgrade
  • 15:07 logmsgbot: jzerebecki@tin Synchronized wmf-config/InitialiseSettings.php: Whitelist feeds on mediawiki.org 622c186b2712c923fbcc48c27f65ebf396176f3e T127176 (duration: 00m 31s)
  • 14:52 volans: Reimporting table bnwikisource.shorturls into dbstore2002 as InnoDB
  • 14:51 gehel: restarting elasticsearch server elastic2022.codfw.wmnet
  • 14:36 volans: REPAIR NO_WRITE_TO_BINLOG TABLE bnwikisource.shorturls replica failed for corrupted table
  • 14:18 papaul: labstore200[3-4] - signing puppet certs, salt-key, initial run
  • 14:06 papaul: wasat - signing puppet certs, salt-key, initial run
  • 14:04 gehel: restarting elasticsearch server elastic2021.codfw.wmnet
  • 14:03 chasemp: cleanup snapshots on labstore1001
  • 14:02 volans: Reimaging db2008 to jessie T130098
  • 13:23 gehel: restarting elasticsearch server elastic2020.codfw.wmnet
  • 13:15 elukey: puppet re-enabled on analytics1027
  • 12:39 gehel: restarting elasticsearch server elastic2019.codfw.wmnet
  • 12:36 godog: bootstrapping restbase1012-a T125842
  • 11:44 elukey: rebooting analytics1001/1002 (Yarn/HDFS master nodes) for kernel upgrade
  • 11:24 elukey: rebooting analytics105* for kernel upgrade
  • 11:02 volans: Revoking puppet key for db2008.codfw.wmnet (puppet wasn't running already) T130098
  • 10:51 elukey: rebooting analytics102[89] and analytics104* for kernel upgrades
  • 10:51 moritzm: rolling reboot of mw* servers in eqiad (except job runners to not interfere with T129517)
  • 10:21 elukey: Rebooting analytics103* for kernel upgrade
  • 09:33 elukey: puppet disabled on analytics1027 as preparation step for the Hadoop nodes reboots
  • 09:18 jynus: setting up pending cross-datacenter master-master database links
  • 09:04 moritzm: rebooting stat1002/stat1003 for kernel upgrade
  • 08:54 _joe_: restarting hhvm on mw1007, stuck in a deadlock, apparently on HPHP::Treadmill::getAgeOldestRequest
  • 08:53 moritzm: rebooting labtest*2001 for kernel upgrade
  • 08:45 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Return s2 slaves to normal weight (duration: 00m 35s)
  • 08:04 moritzm: restarted mw1129/1002 (HHVM hung)
  • 06:47 ebernhardson: rebooting elastic2018.codfw.wmnet for kernel upgrade
  • 06:10 ebernhardson: rebooting elastic2017.codfw.wmnet for kernel upgrade
  • 05:39 ebernhardson: rebooting elastic2016.codfw.wmnet for kernel upgrade
  • 04:42 ebernhardson: rebooting elastic2015.codfw.wmnet for kernel upgrade
  • 03:41 ebernhardson: rebooting elastic2014.codfw.wmnet for kernel upgrade
  • 03:19 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Mar 16 03:19:26 UTC 2016 (duration 9m 41s)
  • 03:09 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.17) (duration: 17m 36s)
  • 02:45 ebernhardson: rebooting elastic2013.codfw.wmnet for kernel upgrade
  • 02:34 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 13m 44s)
  • 02:17 ebernhardson: rebooting elastic2012.codfw.wmnet for kernel upgrade
  • 01:32 ebernhardson: rebooting elastic2011.codfw.wmnet for kernel upgrade
  • 01:18 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: Debug (duration: 00m 26s)
  • 00:57 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: Debug (duration: 00m 24s)
  • 00:49 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/utils/BacklinkJobUtils.php: I32ec0 (duration: 00m 30s)
  • 00:46 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: Debug (duration: 00m 28s)
  • 00:17 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: Log REQUEST_URI (duration: 00m 28s)
  • 00:15 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/extensions/Wikidata/extensions/Wikibase/client/includes/Hooks/DataUpdateHookHandlers.php: Revert briefly disable Wikibase\Client\Hooks\DataUpdateHookHandlers::onParserCacheSaveComplete (duration: 00m 31s)
  • 00:09 eileen: Updating civicrm from 097bc2d298b93424ff07abb18eaf1cfb3f79a0f2 to c99abc42188b7f47726202b4740fbaf68d1c06ab
  • 00:01 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/extensions/Wikidata/extensions/Wikibase/client/includes/Hooks/DataUpdateHookHandlers.php: Briefly disable Wikibase\Client\Hooks\DataUpdateHookHandlers::onParserCacheSaveComplete (duration: 00m 26s)

2016-03-15

  • 23:43 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: (no message) (duration: 00m 24s)
  • 23:40 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (exclude job runners) (duration: 00m 28s)
  • 23:39 awight: update crm from 090d443c856574d45c80f89c4ae7ccb86c97448f to 097bc2d298b93424ff07abb18eaf1cfb3f79a0f2
  • 23:37 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (with ip) (duration: 00m 30s)
  • 23:33 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (duration: 00m 28s)
  • 23:12 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Strip rather than hide HTML in MobileFrontend (duration: 00m 29s)
  • 23:10 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: WikidataPageBanner config changes (duration: 00m 33s)
  • 22:40 logmsgbot: ori@tin Synchronized wmf-config/InitialiseSettings.php: I87c174cf83: Turn on RecursiveLinkPurge log bucket, for I29636c045 (duration: 00m 25s)
  • 22:32 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.16/extensions/VisualEditor/extension.json: T129704 (duration: 00m 24s)
  • 22:01 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/api/ApiPurge.php: I29636c04: Add RecursiveLinkPurge log for API requests (duration: 00m 33s)
  • 21:47 logmsgbot: twentyafterfour@tin Purged l10n cache for 1.27.0-wmf.14
  • 21:44 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.17
  • 21:24 gehel: restarting elasticsearch server elastic2010.codfw.wmnet
  • 20:48 gehel: restarting elasticsearch server elastic2009.codfw.wmnet
  • 20:45 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.16/extensions/MoodBar/MoodBar.php: T129978 (duration: 00m 53s)
  • 20:36 twentyafterfour: grr now it says .17, pebkac
  • 20:35 twentyafterfour: testwiki still shows 1.27.0-wmf.16 :(
  • 20:35 mutante: terbium - gzip nutcracker.log.1 for disk space
  • 20:34 logmsgbot: twentyafterfour@tin Finished scap: testwiki to 1.27.0-wmf.17 (duration: 52m 15s)
  • 20:05 papaul: labstore200[3-4] OS install on hold: making new partman recipe
  • 19:42 logmsgbot: twentyafterfour@tin Started scap: testwiki to 1.27.0-wmf.17
  • 19:16 gehel: restarting elasticsearch server elastic2008.codfw.wmnet
  • 19:05 akosiaris: enable puppet on neon
  • 19:04 _joe_: rolling restart of restbase in codfw
  • 19:00 mutante: alsafi back up with 4.4 kernel
  • 18:57 mutante: alsafi - url-downloader codfw - reboot
  • 18:44 gehel: restarting elasticsearch server elastic2007.codfw.wmnet
  • 18:43 akosiaris: disable puppet on neon for a few minutes while deploying https://gerrit.wikimedia.org/r/#/c/276199/7
  • 18:27 akosiaris: pool sca1001, sca1002 for apertium.svc.eqiad.wmnet in conftool
  • 18:23 papaul: OS install labstore200[3-4]
  • 17:37 gehel: restarting elasticsearch server elastic2006.codfw.wmnet
  • 17:05 mutante: mw1017 (canary) - test wikimania redirect change, restart apache
  • 16:58 gehel: restarting elasticsearch server elastic2005.codfw.wmnet
  • 16:55 _joe_: repooling mw1107
  • 16:51 _joe_: repool mw1201
  • 16:51 mutante: authdns-update - add labstore2004
  • 16:46 mutante: labstore200[3-4] added to DHCP (T128764) @papaul #codfw
  • 16:46 _joe_: repooling mw1196
  • 16:44 _joe_: repooling mw2020
  • 16:30 mutante: osmium - delete /srv/ruthenium data that has already been copied back
  • 16:27 mutante: osmium - stopping rsyncd, removing remnants from backup job for ruthenium upgrade T122328
  • 16:27 ottomata: reenabled puppet on analytics1027
  • 16:16 gehel: restarting elasticsearch server elastic2004.codfw.wmnet
  • 16:08 akosiaris: restarted pybal on lvs1004, lvs1005, lvs1006 to pickup apertium change
  • 16:07 akosiaris: restarted pybal on lvs2004, lvs2005, lvs2006
  • 15:53 ottomata: disabled puppet on analytics1027 for camus stop and reboot to apply kernel update
  • 15:50 moritzm: rebooting analytics1027 for kernel upgrade
  • 15:38 akosiaris: restarted pybal on lvs1007, lvs1008, lvs1009. Had already restarted pybal on lvs1010, lvs1011, lvs1012 about an hour before
  • 15:26 moritzm: rebooting nobelium for kernel upgrade
  • 15:20 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: guwikiquote namespace and import stuff (duration: 00m 27s)
  • 15:13 moritzm: rebooting californium for kernel upgrade
  • 15:11 logmsgbot: demon@tin Synchronized wmf-config/throttle.php: Taller d'iniciació a la Viquipèdia, Montserrat throttle rule (duration: 00m 27s)
  • 15:09 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: VE for ip users on dewiki (duration: 00m 28s)
  • 15:08 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: VE single edit table plwiki (duration: 00m 29s)
  • 15:05 logmsgbot: demon@tin Synchronized wmf-config/CommonSettings.php: password policy stuff (duration: 00m 32s)
  • 14:57 gehel: restarting elasticsearch server elastic2003.codfw.wmnet
  • 14:23 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 36s)
  • 14:15 gehel: restarting elasticsearch server elastic2002.codfw.wmnet
  • 13:14 gehel: restarting elasticsearch server elastic2001.codfw.wmnet
  • 11:55 logmsgbot: hoo@tin Synchronized wmf-config/: Consistency sync (duration: 02m 29s)
  • 11:24 hoo: Updated Wikidata's property suggester with data from Monday's json dump
  • 11:05 moritzm: rebooting subra/suhail for kernel upgrade
  • 10:55 moritzm: rolling reboot of mw* in codfw for kernel upgrade
  • 09:27 mobrovac: restbase rolling restart for https://gerrit.wikimedia.org/r/277056
  • 08:24 mobrovac: restbase deploy end of c68f5f456
  • 08:15 mobrovac: restbase deploy start of c68f5f456
  • 08:10 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I01c01dcd: Follow-up for Ieeb76087: tolerate missing trailing semicolon (duration: 00m 29s)
  • 07:59 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: Ieeb76087: Allow X-Wikimedia-Debug header to request pages in read-only mode (duration: 00m 25s)
  • 07:57 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: I82ec01a: X-Wikimedia-Debug: profile if "profile" attribute set (duration: 00m 25s)
  • 07:27 ori: log files for RO test in fluorine:/a/mw-log.read-only.1458025363.tar.bz2
  • 07:15 logmsgbot: ori@tin Synchronized wmf-config/db-eqiad.php: I1eb69f16: Revert "Put eqiad in read-only mode for scheduled test" (for real) (duration: 00m 28s)
  • 07:10 _joe_: reenabling puppet, jobrunner, jobchron on jobrunners and videoscalers
  • 07:10 logmsgbot: ori@tin Synchronized wmf-config/db-eqiad.php: I1eb69f16: Revert "Put eqiad in read-only mode for scheduled test" (duration: 00m 28s)
  • 07:02 logmsgbot: ori@tin Synchronized wmf-config/db-eqiad.php: Ie3f798ac: Put eqiad in read-only mode for scheduled test (duration: 00m 55s)
  • 06:59 _joe_: stopping all jobrunners in eqiad
  • 06:56 _joe_: stopping jobrunner and jobchron on the videoscalers in eqiad
  • 06:45 _joe_: stopping puppet on the eqiad jobrunners, in preparation for the read-only test
  • 02:53 logmsgbot: krinkle@tin Synchronized wmf-config/mobile.php: Remove legacy scripts from autoload on mobile (duration: 00m 26s)
  • 02:48 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Mar 15 02:48:16 UTC 2016 (duration 8m 54s)
  • 02:39 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 17m 31s)
  • 01:46 mutante: alsafi was hanging, and the moment I connected to the Ganeti console it was back as if nothing had happened
  • 00:07 logmsgbot: reedy@tin Synchronized wmf-config/extension-list: Remove OAI (duration: 00m 24s)
  • 00:06 logmsgbot: reedy@tin Synchronized wmf-config/CommonSettings.php: Disable OAI (duration: 00m 25s)
  • 00:01 MaxSem: ran mwscript maintenance/updateCollation.php --wiki=ruwikinews --force

2016-03-14

  • 23:55 MaxSem: ran mwscript maintenance/updateCollation.php --wiki=ruwikiquote --force
  • 23:52 MaxSem: ran mwscript maintenance/updateCollation.php --wiki=ruwikiversity --force
  • 23:51 MaxSem: ran mwscript maintenance/updateCollation.php --wiki=ruwikivoyage --force
  • 23:50 MaxSem: mwscript namespaceDupes.php --wiki=knwiki --fix produced 3 changes unrelated to new namespaces, --source-pseudo-namespace gave no results
  • 23:48 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/277432 (duration: 00m 26s)
  • 23:46 MaxSem: ran mwscript maintenance/updateCollation.php --wiki=ruwikibooks --force
  • 23:41 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/276743/ (duration: 00m 26s)
  • 23:39 MaxSem: mwscript namespaceDupes.php --wiki=kowiktionary gives no pages to fix
  • 23:37 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/276919/ (duration: 00m 27s)
  • 23:34 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/extensions/WikimediaEvents/: https://gerrit.wikimedia.org/r/#/c/277337/ (duration: 00m 26s)
  • 23:33 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/extensions/CirrusSearch/: (no message) (duration: 00m 32s)
  • 23:29 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: SWAT (duration: 00m 27s)
  • 23:28 logmsgbot: maxsem@tin Synchronized dblists/commonsuploads.dblist: SWAT (duration: 00m 26s)
  • 23:26 logmsgbot: maxsem@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/276895/ (duration: 00m 26s)
  • 23:23 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/276547/ (duration: 00m 26s)
  • 23:22 logmsgbot: maxsem@tin Synchronized static/images/project-logos/guwiktionary.png: https://gerrit.wikimedia.org/r/#/c/276547/ (duration: 00m 30s)
  • 23:21 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/276485 (duration: 00m 26s)
  • 23:14 logmsgbot: maxsem@tin Synchronized docroot/noc: https://gerrit.wikimedia.org/r/276993 (duration: 00m 35s)
  • 22:14 greg-g: read-only mode, intermittently, on mw.org and other s3 wikis(?)
  • 22:11 hoo: Restarted hhvm on mw1166 (was flooding with undefined variable notices)
  • 22:09 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings.php: Log 'Wikibase\Client\Changes\WikiPageUpdater' (duration: 00m 27s)
  • 21:56 mutante: logstash - nginx -> T129934
  • 20:36 gehel: HTTPS activated on elasticsearch (no client using it yet)
  • 19:17 logmsgbot: krinkle@tin Synchronized docroot/: Update static symlinks (duration: 00m 28s)
  • 19:09 logmsgbot: krinkle@tin Synchronized docroot/mediawiki/: Update static symlink (duration: 00m 29s)
  • 18:56 urandom: Starting Cassandra repairs on restbase1007-a.eqiad.wmnet : T108611
  • 18:44 logmsgbot: krinkle@tin Synchronized w: Replace /w/static with symlink (duration: 00m 30s)
  • 18:43 logmsgbot: krinkle@tin Synchronized static/: Moved /srv/mediawiki/w/static to /srv/mediawiki/static (duration: 00m 29s)
  • 18:27 ejegg: updated SmashPig from ad27a3a18cccd6bbd843e671218a1f5c190bdb8a to 9f08f6a1891b0a2bb70eacf460c2f9a8153c3b4e
  • 18:04 ejegg: increased max jobs per run for Adyen job runner from 10 to 250, time limit from 60 to 120
  • 18:00 chasemp: restart pdns on labservices1001 :)
  • 16:39 csteipp: deployed fix for scribunto issue related to T110143
  • 16:31 logmsgbot: thcipriani@tin Finished scap: Better announce new optional MT services available gerrit:277195 (duration: 26m 17s)
  • 16:04 logmsgbot: thcipriani@tin Started scap: Better announce new optional MT services available gerrit:277195
  • 16:01 _joe_: repooling mw1141
  • 16:00 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.16/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: SWAT: Update VE core submodule to wmf/1.27.0-wmf.16 HEAD gerrit:277183 (duration: 00m 30s)
  • 15:54 _joe_: repooling mw1128
  • 15:28 moritzm: rebooting nobelium for kernel update
  • 15:16 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.16/extensions/Echo: SWAT: thank-you-edit: canRender for deleted page and extra fix gerrit:276916 (duration: 00m 39s)
  • 14:53 ejegg: updated civicrm from 7e21d5ad1f9ff404ac155a38f771b744bf238ccf to 090d443c856574d45c80f89c4ae7ccb86c97448f
  • 14:37 elukey: re-imaging mw2090.codfw for T126987
  • 14:27 godog: pool restbase1009
  • 14:16 godog: pool restbase101[01]
  • 13:16 elukey: re-added kafka1002 to the eventbus confd pool after maintenance
  • 13:14 elukey: removed kafka1002.eqiad.wmnet from eventbus' pool via confd for maintenance
  • 13:13 elukey: re-added kafka1001 to the eventbus confd pool after maintenance
  • 12:57 elukey: removed kafka1001 from eventbus' pool via confd
  • 11:47 moritzm: installing security updates for openssl, curl, gcrypt, libpng, jasper, expat and libxml2 on remaining mw1* app servers (along with HHVM restarts)
  • 10:56 moritzm: installing rsync security updates
  • 10:38 godog: shut restbase100[12]
  • 10:37 moritzm: installing security updates for openssl, curl, gcrypt, libpng, jasper, expat and libxml2 on mw2* (along with HHVM restarts)
  • 10:27 mobrovac: restbase restarting restbase in prod to apply https://gerrit.wikimedia.org/r/276728
  • 10:13 mobrovac: restbase deployed 3bedb8f on restbase101[01].eqiad.wmnet
  • 10:06 mobrovac: restbase stopping restbase on rb100[12] before full deprovisioning
  • 09:46 godog: depool restbase1001 / restbase1002 before deprovisioning
  • 05:19 ori: Restarting HHVM on mw1025 with known-bad config option 'hhvm.enable_reusable_tc = true' for debugging
  • 02:38 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Mar 14 02:38:54 UTC 2016 (duration 8m 41s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 13m 23s)

2016-03-13

  • 02:39 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 13 02:39:06 UTC 2016 (duration 8m 35s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 13m 24s)

2016-03-12

  • 17:12 hoo: Updated operations/dumps/dcat on snapshot1003 from 92ab37d94e to e97408df39
  • 04:44 logmsgbot: legoktm@tin Synchronized php-1.27.0-wmf.16/extensions/CharInsert/: Revert "Remove inline event handler js from charinsert" - https://gerrit.wikimedia.org/r/#/c/276932/ T129524 (duration: 00m 29s)
  • 02:38 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 12 02:38:16 UTC 2016 (duration 8m 45s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 12m 52s)
  • 00:55 logmsgbot: aaron@tin Synchronized rpc/RunJobs.php: 7a8bd37247b7dfb (duration: 00m 38s)

2016-03-11

  • 23:21 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: test2wiki favicon (duration: 00m 30s)
  • 22:50 logmsgbot: aaron@tin Synchronized wmf-config/jobqueue-eqiad.php: Disable persistent redis connections and bump timeout a bit (duration: 00m 38s)
  • 19:40 jynus: restarting dbstore1002
  • 18:51 mutante: mw1001 - removed font again, mw1153 confirmed puppet installs it (only) on imagescalers
  • 18:42 mutante: mw1001 install fonts-gujr-extra to confirm gerrit 276501 is fine
  • 17:38 elukey: Increased mediawiki::jobrunner::runners_basic to 30 for mw116[123789]
  • 16:38 logmsgbot: mattflaschen@tin Synchronized wmf-config/InitialiseSettings.php: Comment-only change (duration: 00m 41s)
  • 15:44 chasemp: I am commandeering labtestmetal and labtestvirt for some NFS and storage testing that cannot be done in labs; I will reimage them when done
  • 15:41 elukey: forced puppet agent -tv on mw1164
  • 15:39 elukey: increased mediawiki::jobrunner::runners_basic from 20 to 30 for mw116[3-6]
  • 15:26 gehel: deploying https://gerrit.wikimedia.org/r/#/c/274382/ - new defined type not used anywhere, should have less than zero impact.
  • 15:20 moritzm: installing security updates for openssl, curl, gcrypt, libpng, jasper, expat and libxml2 on mw1018-1025 (and restarts of HHVM)
  • 15:04 logmsgbot: demon@tin Synchronized README: no-op, co-master sync (duration: 00m 30s)
  • 15:03 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: moving large.dblist back to wmf.16, did not help
  • 14:45 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: moving largest.dblist back to wmf.15 for now
  • 14:33 godog: run swiftrepl thumbs eqiad -> codfw with concurrency 96
  • 14:25 godog: run swiftrepl thumbs eqiad -> codfw with concurrency 128
  • 14:12 moritzm: installed security updates for openssl, curl, gcrypt, libpng, jasper, expat and libxml2 on mw1017 (other canaries will be upgraded later on if all is well, the rest of mw* on Monday)
  • 13:50 apergos: on fluorine truncated archive/redis.log-20160311 and archive/JobQueueFederated.log-20160311 to 5mb each, they were each about 500gb
  • 13:26 mobrovac: restbase deploy end of 3bedb8f5c42
  • 12:56 mobrovac: restbase deploy start of 3bedb8f5c42 on canary restbase1001
  • 12:36 mobrovac: mathoid deploying 7a282a4181a4
  • 12:11 moritzm: uploaded backport of linux-tools 4.4-4 for jessie-wikimedia to carbon (provides kbuild and perf amongst others)
  • 11:48 akosiaris: restarting gerrit on ytterbium
  • 11:27 jynus: running authdns-update to solve a dns typo
  • 11:25 jynus: killed some long-running git-upload-pack tasks
  • 10:26 hashar: Gerrit task management documented on https://wikitech.wikimedia.org/wiki/Gerrit#Tasks_management
  • 10:11 hashar: Gerrit: killed stuck "git-upload-pack '/mediawiki/core.git'" tasks
  • 09:46 elukey: Rebooting rdb200[56].codfw for kernel upgrade
  • 09:13 ema: umounted /sys/kernel/debug/tracing on cp1067
  • 09:12 hashar: Enabling puppet on gallium.wikimedia.org . Been disabled since ~ Mon Mar 7 10:26:01 UTC 2016
  • 09:07 moritzm: uploaded linux-meta 1.9 for jessie-wikimedia to carbon (which now defaults linux-meta to installing Linux 4.4)
  • 08:49 ema: cp1067: apt-get removed linux-image-3.16.0-4-amd64 and linux-image-4.4.0-1-amd64-dbg to free up some disk space
  • 08:20 moritzm: restarted hhvm on mw1246, mw1258
  • 08:18 moritzm: restarted hhvm on mw1220, mw1244
  • 08:17 moritzm: restarted hhvm on mw1174, mw1186, mw1211
  • 07:31 logmsgbot: kartik@tin Synchronized php-1.27.0-wmf.16/extensions/ContentTranslation: Deploying 276700 for ContentTranslation (duration: 00m 43s)
  • 06:44 _joe_: restarted hhvm on mw1166
  • 04:50 ori: Uninstalled HHVM on nobelium; not puppetized.
  • 04:15 papaul: rdb200[5-6] installation complete
  • 03:45 papaul: rdb200[5-6] signing puppet certs, salt-key, initial run
  • 03:00 ejegg: updated payments-wiki from 7248c10613018c1a15a1754ab80242f79d04532f to 79f5c9389edd089ae5951a7d172e74e68946a93c
  • 02:57 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWTemplatePage.js: touch - see comment on https://gerrit.wikimedia.org/r/#/c/274120/ (duration: 00m 31s)
  • 02:54 papaul: installing rdb200[5-6]
  • 02:39 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 11 02:39:01 UTC 2016 (duration 8m 45s)
  • 02:31 mutante: cygnus/technetium: puppet cert clean, salt-key -d (neodymium), puppetstoredconfigclean.rb (rm from Icinga), gnt-instance remove (destroy VMs)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 12m 33s)
  • 02:23 mutante: cygnus - poweroff (for variety)
  • 02:21 mutante: technetium - shutdown -h now
  • 01:56 ejegg: updated SmashPig from 154688db3595060b55498dea3cdf1ee206a854a0 to ad27a3a18cccd6bbd843e671218a1f5c190bdb8a
  • 01:25 logmsgbot: krinkle@tin Synchronized w/static.php: 6da604f and 49c07ac (duration: 00m 39s)
  • 01:06 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Un-revert Flow change on mw.org (duration: 00m 26s)
  • 01:06 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Un-revert Flow change on mw.org (duration: 00m 29s)
  • 01:04 MaxSem: Made mediawiki/php/wikidiff read-only in Gerrit
  • 00:46 logmsgbot: catrope@tin Synchronized portals: (no message) (duration: 00m 26s)
  • 00:45 logmsgbot: catrope@tin Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 00m 26s)
  • 00:43 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.16/extensions/Echo/Hooks.php: Try fixing the thank-you-edit bug again (duration: 00m 26s)
  • 00:41 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.16/includes/diff/DifferenceEngine.php: Convert timing to milliseconds (duration: 00m 26s)
  • 00:32 RoanKattouw: Running populateContentModel.php on mediawikiwiki so I can un-revert the Flow change
  • 00:29 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Revert Flow change on mw.org (duration: 00m 27s)
  • 00:28 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Revert Flow change on mw.org (duration: 00m 27s)
  • 00:24 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable cross-wiki notifications beta feature on all wikis (duration: 00m 27s)
  • 00:22 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Clean up old Flow occupy stuff (duration: 00m 26s)
  • 00:22 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Clean up old Flow occupy stuff (duration: 00m 25s)
  • 00:19 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Make Flow the default in talk namespaces on mediawikiwiki (duration: 00m 38s)
  • 00:08 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable completion suggester on all but top 12 wikis (duration: 00m 32s)

2016-03-10

  • 23:53 urandom: Starting Cassandra cleanup op on restbase10{07,10,11}-{a,b}.eqiad.wmnet : T125842
  • 23:53 urandom: Starting Cassandra cleanup op on restbase10{07,10,11}-{a,b}.eqiad.wmnet : T125832
  • 23:27 ejegg: updated civicrm from cbcfafcb2e6d6e1dae12a2b2d554445871992aff to 7e21d5ad1f9ff404ac155a38f771b744bf238ccf
  • 23:06 logmsgbot: twentyafterfour@tin Synchronized php-1.27.0-wmf.16/languages/LanguageConverter.php: deploying https://gerrit.wikimedia.org/r/#/c/276467/ (duration: 00m 31s)
  • 22:17 ejegg: updated payments wiki from 07dcb0f3962143fd9497ccad19b7b682beb991fe to 7248c10613018c1a15a1754ab80242f79d04532f
  • 22:03 csteipp: deployed patch for T129506 to wmf15 & 16
  • 21:55 mutante: cygnus , killing akumar's processes
  • 21:42 logmsgbot: twentyafterfour@tin Synchronized php-1.27.0-wmf.16/extensions/Wikidata: Deploy https://gerrit.wikimedia.org/r/#/c/276473/ (duration: 02m 10s)
  • 21:03 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.16
  • 20:32 cmjohnson1: replacing failed disk ms1001
  • 20:31 AaronSchulz: Ran <<mwscriptwikiset extensions/WikimediaMaintenance/filebackend/setZoneAccess.php private.dblist --backend=local-multiwrite --private>>
  • 19:23 godog: enable puppet in cache_cluster in eqiad, followed by the rest
  • 19:17 godog: set timeline and math data containers as readable in swift codfw
  • 18:53 logmsgbot: krinkle@tin Synchronized wmf-config/db-codfw.php: I47954e21 (duration: 00m 30s)
  • 18:46 godog: sync wikipedia-commons-gwtoolset-metadata with swiftrepl eqiad -> codfw T129359
  • 18:43 yurik: deployed latest kartotherian
  • 18:19 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.15/includes/GlobalFunctions.php: wfShellExec() debug logging for T129467 (take 2) (duration: 00m 26s)
  • 18:13 jynus: importing zhwiki missing records from production to labs
  • 18:13 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.15/includes/GlobalFunctions.php: wfShellExec() debug logging for T129467 (duration: 00m 28s)
  • 17:44 godog: running puppet in batches on cache_upload in eqiad after https://gerrit.wikimedia.org/r/#/c/276223/
  • 17:38 logmsgbot: filippo@tin Synchronized wmf-config/filebackend-production.php: swift codfw sync replication T129089 (duration: 00m 39s)
  • 17:19 ori: cleared hhbc bytecode repo on mw1082 and mw1245 on suspicion that old translations were reused
  • 16:36 moritzm: mw1122: restarted hhvm
  • 15:12 akosiaris: restart hhvm on mw1107
  • 14:29 moritzm: mw1107: restarted hhvm
  • 14:15 _joe_: restarted hhvm on mw112 multiple times; it didn't help, so manually removed the warmup upstart task and the old sqlite cache; also disabling puppet on the machine
  • 14:08 urandom: increasing outbound stream throughput on restbase1002.eqiad.wmnet to 200mbps : T125842
  • 13:32 bblack: restarted mw1122 hhvm
  • 13:26 bblack: mw1210: restarted hhvm
  • 13:25 moritzm: mw1241: restarted hhvm
  • 13:20 bblack: mw1188: restarted hhvm (before alert hit IRC, was already pending in icinga)
  • 13:19 bblack: mw1240: restarted hhvm
  • 13:17 bblack: mw1122: restarted hhvm, killed stale procs from Jan24 looking like: /usr/bin/hhvm --php -c /etc/hhvm/fcgi.ini -r echo ini_get("hhvm.jit_warmup_requests")?:11;
  • 13:15 bblack: restarting hhvm on mw1112
  • 13:12 bblack: restarting hhvm on mw1248, mw1151
  • 13:06 bblack: restarting hhvm on mw1251 ...
  • 13:02 bblack: restarting hhvm on mw1258 ...
  • 13:01 bblack: restarting hhvm on mw1243
  • 12:59 bblack: restarting hhvm on mw1091 mw1249 mw1252 mw1242 ...
  • 12:56 bblack: restarting hhvm on mw1217
  • 12:01 _joe_: restarted hhvm on mw1122, stuck at startup
  • 10:39 mobrovac: restbase deploy end of 26bd4aa28
  • 10:31 jynus: setting up the rest of the cross-datacenter master-master connections pending in wmf databases
  • 10:09 godog: decommissioning restbase1002.eqiad.wmnet : T125842
  • 09:50 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Rebalance external storage servers in codfw T127330 (duration: 00m 34s)
  • 09:50 mobrovac: restbase deploy start of 26bd4aa28 on restbase1001
  • 08:55 ema: apache2 and hhvm restarted on mw1107, mw1122 and mw1119
  • 03:08 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar 10 03:08:54 UTC 2016 (duration 9m 38s)
  • 02:59 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 12m 33s)
  • 02:33 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 14m 08s)
  • 00:45 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.15/extensions/ZeroBanner: SWAT (duration: 00m 28s)
  • 00:44 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.15/extensions/MobileFrontend: SWAT (duration: 00m 32s)
  • 00:39 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.16/extensions/ZeroBanner: SWAT (duration: 00m 28s)
  • 00:39 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.16/extensions/MobileFrontend: SWAT (duration: 00m 36s)
  • 00:31 ori: Upgrade of HHVM package to 3.12.1+dfsg-1 complete on all eqiad hosts save terbium
  • 00:23 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Set NS_PROJECT on azwiktionary (duration: 00m 29s)
  • 00:13 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Remove svnadmins group (duration: 00m 41s)

2016-03-09

  • 23:54 bblack: ending caches codfw->direct testing
  • 23:15 bblack: restarted puppetmaster on palladium
  • 23:03 bblack: caches: starting test codfw->direct (codfw caches -> eqiad apps)
  • 22:42 bearND: mobileapps deployed 26d4031
  • 22:36 bearND: starting mobileapps deploy, second try (without dead link removal patch)
  • 21:54 bearND: mobileapps deployed 95a2d76
  • 21:28 ori: Depooled mw1107
  • 21:26 bearND: starting mobileapps deploy
  • 21:22 ori: restarted HHVM on mw1107; lock-up
  • 21:06 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.16
  • 20:57 ori: Upgrade HHVM on CODFW app servers to 3.12.1+dfsg-1
  • 20:19 urandom: decommissioning restbase1001.eqiad.wmnet : T125842
  • 20:12 logmsgbot: krinkle@tin Synchronized wmf-config/db-eqiad.php: Clean up - Ibb4bb0b32f5 (duration: 00m 29s)
  • 20:11 logmsgbot: krinkle@tin Synchronized wmf-config/db-codfw.php: Clean up - Ibb4bb0b32f5 (duration: 00m 35s)
  • 19:01 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Change codfw external storage topology T127330 (duration: 00m 27s)
  • 18:52 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.16/includes/session/SessionManager.php: Add backtrace to log for MW_NO_SESSION warning mode gerrit:276232 (duration: 00m 50s)
  • 18:44 volans: Changing topology of local codfw masters for es2 and es3 before merging https://gerrit.wikimedia.org/r/#/c/276229/1 T127330
  • 17:53 csteipp: deployed patch for T110143 to wmf16
  • 17:40 csteipp: deployed patch for T122056
  • 16:09 logmsgbot: thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: Wikipedia while at Women of the World Festival throttle rule gerrit:276179 (duration: 00m 30s)
  • 16:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Change LanguageOverlay bucket rates gerrit:276172 (duration: 00m 52s)
  • 14:54 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Rebalance external storage servers in codfw T127330 (duration: 00m 41s)
  • 14:14 moritzm: installing Django security updates
  • 13:19 moritzm: installing beanshell security updates
  • 12:46 paravoid: set disable on cr2-knams:xe-1/2/0 (Init7), issues with at least one large network
  • 12:25 elukey: added rdb1003 back to the Job Runners queue. All the jobchron processes on jobrunners/videoscalers need to be restarted.
  • 12:07 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Add rdb1003 back to the Redis JobQueue pool after maintenance (duration: 00m 34s)
  • 12:00 elukey: re-enabled puppet on rdb1004
  • 11:47 moritzm: uploaded firmware-nonfree 20151018 to jessie-wikimedia on carbon
  • 11:39 godog: rolling restart cassandra in staging after merging https://gerrit.wikimedia.org/r/#/c/275917/
  • 11:27 jynus: setting up master-master cross-datacenter replication for s2 (db1018-db2017)
  • 11:20 ema: hhvm restarted on mw1140
  • 11:04 jynus: deployed manually new code at dbtree (noc.wikimedia.org)
  • 10:39 elukey: puppet disabled on rdb1004 (plus Redis instances) as a precautionary step for the reimage of master rdb1003
  • 10:35 jynus: setting up master-master replication on tools (labsdb100[45])
  • 09:32 moritzm: rebooting iron for kernel update
  • 09:26 elukey: puppet disabled on rdb1003 for debian re-image. Redis servers going to be stopped too as pre-step for backup.
  • 08:32 logmsgbot: oblivian@tin Synchronized wmf-config/jobqueue-eqiad.php: re-routing writes from rdb1003 to rdb1005 for reimaging (duration: 00m 36s)
  • 08:29 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2035 (duration: 00m 39s)
  • 07:49 _joe_: uploading hhvm 3.12, extensions to reprepro
  • 06:26 mutante: bast2001 - still installing in snail mode - please feel free to check if it's done, and if so re-add to puppet so users get created. thx
  • 03:16 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Mar 9 03:16:21 UTC 2016 (duration 8m 2s)
  • 03:08 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 17m 51s)
  • 02:32 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 13m 28s)
  • 02:13 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.15/includes/diff/DairikiDiff.php: I4d4b8f81c: Dont quote assert expressions in DairikiDiff (duration: 00m 27s)
  • 02:13 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.16/includes/diff/DairikiDiff.php: I4d4b8f81c: Dont quote assert expressions in DairikiDiff (duration: 00m 31s)
  • 01:40 csteipp: redeploy patch for T129120
  • 01:39 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.16/includes/api/ApiQueryInfo.php: (no message) (duration: 00m 33s)
  • 00:33 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.15/extensions/Echo/: SWAT (duration: 00m 32s)
  • 00:32 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.15/extensions/MobileFrontend/: SWAT (duration: 00m 32s)
  • 00:30 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/extensions/MobileFrontend/: SWAT (duration: 00m 29s)
  • 00:29 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.16/extensions/Echo/: SWAT (duration: 00m 38s)
  • 00:25 logmsgbot: maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/275378 - comment only (duration: 00m 27s)
  • 00:16 logmsgbot: maxsem@tin Synchronized wmf-config/filebackend-production.php: https://gerrit.wikimedia.org/r/272922 (duration: 00m 27s)
  • 00:13 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/275858/ (duration: 00m 28s)
  • 00:09 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/275275 (duration: 00m 29s)

2016-03-08

  • 23:55 ejegg: updated payments wiki from 2f16a0121fb9313a01a9646b8c71750385523569 to 07dcb0f3962143fd9497ccad19b7b682beb991fe
  • 23:20 mutante: bast2001 - install issues - extending downtime, bbiaw
  • 23:20 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.16
  • 23:11 mutante: neon: re-enable puppet, start icinga-wm
  • 23:03 logmsgbot: twentyafterfour@tin Synchronized w/static/: (no message) (duration: 00m 32s)
  • 22:53 mutante: bast2001 - powercycle, reinstall
  • 22:19 logmsgbot: twentyafterfour@tin Finished scap: testwiki to php-1.27.0-wmf.16 and rebuild l10n cache (duration: 26m 06s)
  • 22:04 ori: Canary application servers (mw1017-mw1025) and canary API application servers (mw1114-mw1119) upgraded to HHVM 3.12.1
  • 22:04 jynus: recreating slave watchdog events on all servers
  • 21:53 logmsgbot: twentyafterfour@tin Started scap: testwiki to php-1.27.0-wmf.16 and rebuild l10n cache
  • 21:02 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2040 (duration: 00m 34s)
  • 21:01 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1066 (duration: 00m 33s)
  • 20:49 ori: Updated mw1200 to HHVM 3.12.1; repooling
  • 20:45 ori: Depooled mw1200 for HHVM update
  • 20:40 jynus: killing long running queries on db1066
  • 20:39 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 (duration: 01m 49s)
  • 18:55 mutante: temp. stopped icinga-wm
  • 18:54 moritzm: uploaded linux 4.4.2-3+wmf1/jessie-wikimedia (based on Linux 4.4.4) to carbon
  • 17:46 bblack: switching text cache routing: ulsfo->codfw
  • 17:36 papaul: sinistra signing puppet certs, salt-key, initial run
  • 17:34 jynus: applying new grants to all database servers
  • 17:12 godog: upgrade cassandra to 2.1.13 on restbase1010
  • 16:57 godog: upgrade cassandra to 2.1.13 on restbase1009
  • 16:46 godog: upgrade cassandra to 2.1.13 on restbase1008
  • 16:34 godog: upgrade cassandra to 2.1.13 on restbase1007
  • 16:25 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove reader segmentation survey gerrit:275669 (duration: 00m 25s)
  • 16:19 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Update pbkdf2 hash parameters gerrit:274795 (duration: 00m 31s)
  • 16:16 godog: restarting cassandra on restbase1*
  • 16:12 godog: upgrade cassandra to 2.1.13 on restbase1001
  • 16:12 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor for new accounts on the German Wikipedia PART II gerrit:271712 (duration: 00m 29s)
  • 16:11 logmsgbot: thcipriani@tin Synchronized dblists/visualeditor-default.dblist: SWAT: Enable VisualEditor for new accounts on the German Wikipedia PART I gerrit:271712 (duration: 00m 32s)
  • 16:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Permissions configuration changes for gl.wikipedia gerrit:275285 and Modify throttle settings for frwiki and cawiki due to Workshop gerrit:275287 (duration: 00m 28s)
  • 15:40 godog: upgrade cassandra to 2.1.13 on restbase2006
  • 15:36 bblack: switching upload cache routing: ulsfo->codfw
  • 15:32 godog: upgrade cassandra to 2.1.13 on restbase2005
  • 15:29 godog: upgrade cassandra to 2.1.13 on restbase2004
  • 15:24 godog: upgrade cassandra to 2.1.13 on restbase2003
  • 15:15 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2038, db2039; depool db2035 for partitioning (duration: 00m 40s)
  • 15:09 bblack: switching misc-web cache routing: ulsfo->codfw
  • 14:51 godog: upgrade cassandra to 2.1.13 on restbase2002
  • 14:27 godog: upgrade cassandra to 2.1.13 on restbase2001
  • 14:10 godog: update reprepro with cassandra 2.1.13 T126629
  • 11:54 jynus: performing schema change (s7 partitioning) on db2040
  • 11:27 _joe_: restarting pybal on lvs2003,6 to pick up the config change
  • 08:43 elukey: Redis and Puppet stopped on rdb1002 (Job queue slave) as pre-step for Debian re-image
  • 08:00 jynus: starting schema change (s6 partitioning) on db2039
  • 03:05 ori: Upgraded mw1017 to hhvm_3.12.1+dfsg-1
  • 02:33 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 13m 50s)
  • 02:09 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.15/resources/src/mediawiki.special/mediawiki.special.preferences.js: T122702 (duration: 02m 28s)
  • 01:03 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.15/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/275732/ (duration: 02m 25s)
  • 00:36 ejegg: updated payments-wiki from b85a671bc96edda35f36846d5c669a78b294524a to 2f16a0121fb9313a01a9646b8c71750385523569, updated fraud filters
  • 00:30 logmsgbot: ebernhardson@tin Synchronized wmf-config/: Set VisualEditorSingleEditTabSwitchTime to correct dates (duration: 02m 27s)
  • 00:25 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.15/extensions/Echo/Hooks.php: T128249 Try and avoid race conditions with thank-you-edit notifications (duration: 02m 23s)
  • 00:20 logmsgbot: ebernhardson@tin Synchronized wmf-config: T128774 enable completion suggester by default on test/test2wiki (duration: 02m 27s)
  • 00:14 logmsgbot: ebernhardson@tin Synchronized wmf-config/filebackend-production.php: Enable async swift writes to all wikis except commons (duration: 02m 25s)

2016-03-07

  • 23:45 greg-g: ssh: connect to host mw2212.codfw.wmnet port 22: Connection timed out
  • 23:40 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.15/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js: Remember editor preference in WikiEditor too (duration: 02m 21s)
  • 23:34 ori: Updating mw1099 (which is depooled) to HHVM 3.12
  • 23:20 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/275703, now for realz (duration: 02m 22s)
  • 23:13 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/275703 (duration: 02m 23s)
  • 23:06 logmsgbot: ebernhardson@tin Synchronized tests/loggingTest.php: Sync out test only change (duration: 02m 20s)
  • 22:58 logmsgbot: maxsem@tin Finished scap: Enable Kartographer on testwiki (duration: 28m 31s)
  • 22:50 papaul: installing sinistra: new mw log host
  • 22:29 logmsgbot: maxsem@tin Started scap: Enable Kartographer on testwiki
  • 22:24 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.15/extensions/Kartographer: Initial deploy: get files into place (duration: 02m 27s)
  • 22:10 bearND: revert of mobileapps deploy complete
  • 22:06 bearND: starting to revert mobileapps deploy
  • 21:36 bearND: mobileapps deployed 49169e9
  • 21:31 bearND: starting mobileapps deploy
  • 21:14 subbu: finished deploying parsoid sha 5db1d28b
  • 21:07 subbu: synced code; restarted parsoid on wtp1001 as a canary
  • 21:05 subbu: starting parsoid deploy
  • 20:56 hashar: Nodepool successfully upgraded. T118573
  • 20:19 hashar: Nodepool restarting
  • 20:15 hashar: stopping nodepool
  • 20:07 csteipp: deployed patch for T129120
  • 20:02 hashar: Upgrading Nodepool from 0.1.1-wmf3 to 0.1.1-wmf.4 with andrewbogott | T118573
  • 19:17 csteipp: deployed initial patch for T109140
  • 19:03 mobrovac: restbase deploy end of 5add37b16
  • 18:32 mobrovac: restbase deploy start of 5add37b16 on restbase1001
  • 16:56 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable assignment of accountcreator for maiwiki gerrit:270897 (duration: 02m 21s)
  • 16:55 mobrovac: restbase deploy start of 88363c03e0 on restbase1001
  • 16:51 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: Update CirrusSearch PoolCounter for cross-dc search gerrit:270897 (duration: 02m 25s)
  • 16:46 logmsgbot: thcipriani@tin Synchronized wmf-config/PoolCounterSettings.php: SWAT: Create pool counter for CirrusSearch completion suggester gerrit:268029 (duration: 02m 22s)
  • 16:38 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.15/extensions/Translate/tag/PageTranslationHooks.php: SWAT: Fix regression in marking page for translation gerrit:275366 (duration: 02m 20s)
  • 16:33 elukey: moved OCG Redis Job Queue from rdb1002 to rdb1007 for maintenance.
  • 16:31 _joe_: powercycling mw2212
  • 16:30 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Ateneo de Manila University workshops throttle rule gerrit:275149 and Namespace configuration on he.wikivoyage gerrit:275154 (duration: 02m 30s)
  • 16:17 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on wuu.wikipedia gerrit:275155 (duration: 02m 26s)
  • 16:10 _joe_: shutting down mw1033
  • 15:02 jynus: depooling mw1033
  • 14:59 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2038, db2039, db2040 (duration: 02m 29s)
  • 14:54 chasemp: clean out snapshots from the weekend on labstore1001 as load is running higher than expected
  • 14:50 moritzm: installing squid security updates
  • 14:47 jynus: sync-common mw1033
  • 14:41 jynus: powercycling mw1033 (unresponsive)
  • 14:40 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2038, db2039, db2040 (duration: 02m 59s)
  • 13:42 jynus: performing schema change on db2038 (s5: T120513), lag on that server expected
  • 12:52 godog: repool ms-fe1003
  • 12:13 godog: depool ms-fe1003 for trusty upgrade T125024
  • 10:51 moritzm: reimaging iron to jessie
  • 10:24 godog: disable puppet on graphite1001 / graphite2001 / labmon1001 before merging https://gerrit.wikimedia.org/r/#/c/274716
  • 10:04 moritzm: uploaded kernel-wedge 2.93+wmf1 for jessie-wikimedia to carbon (needed to build modern kernels)
  • 08:41 _joe_: disabled puppet on mw1026-69, cleaning up puppet facts and certs, then shutting them down
  • 02:32 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 14m 27s)

2016-03-06

  • 10:45 _joe_: rebooting alsafi
  • 02:41 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 6 02:41:06 UTC 2016 (duration 8m 2s)
  • 02:33 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 15m 10s)

2016-03-05

  • 13:29 ema: hhvm restarted on mw1025
  • 11:32 godog: nodetool stop COMPACTION / CLEANUP on restbase1006
  • 11:14 volans: Data transfer completed during the night, (re)starting MySQL on es200[124] and es201[123] T127330
  • 02:37 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 5 02:37:44 UTC 2016 (duration 7m 43s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 13m 51s)
  • 00:01 chasemp: initiate dd replicate from labstore1001 tools snapshot to labstore1002 lv of tools-04032016

2016-03-04

  • 22:41 gwicke: restbase1005: `nodetool stop -- CLEANUP; nodetool stop -- COMPACTION`
  • 22:29 matt_flaschen: Ran P2709 against DB manually to work around T127693
  • 21:58 mutante: bast2001: if your ssh client shows the fingerprint as base64 SHA256 (the new default), you can run `ssh -o FingerprintHash=md5 bast2001.wikimedia.org` to compare
  • 21:29 mutante: bast2001 - reinstalled with jessie, fingerprints on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast2001.wikimedia.org
  • 21:17 mutante: bast2001 - revoke and sign new puppet cert / salt keys
  • 21:01 mutante: bast2001 - rebooting into PXE for T128899
  • 20:00 volans: Added logging to post-merge hook on palladium T128895
  • 17:51 cwd: updated paymentswiki from fbd22230ebf14e57f2c69ef1b342d4cdbc47d9a6 to b85a671bc96edda35f36846d5c669a78b294524a
  • 17:28 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-labs.php: prod nop, enables https in beta cluster for elasticsearch connections (duration: 00m 33s)
  • 17:25 jynus: chgrp recursive on tin to wikidev on .git/objects
  • 16:25 jynus: changing db1047 replication filters on the fly (hot)
  • 16:03 ejegg: re-enabled payments queue consumers
  • 16:02 Jeff_Green: taking payments & listener out of maintenance mode
  • 15:58 ottomata: puppet disabled on stat1003 for reportupdater deployment, paused until dan is out of meetings
  • 14:56 cmjohnson1: rebooting iron to fix virtual console problem
  • 14:34 jynus: upgrade and restart dbstore2002 to apply new replication filters
  • 14:28 apergos: all services back in operation from dataset1001
  • 14:20 apergos: web service restored for dumps/download.wikimedia.org
  • 14:16 moritzm: installing perl security updates
  • 14:09 Jeff_Green: putting payments, civi, listener into maintenance mode
  • 14:06 ejegg: disabled fundraising queue consumer jobs
  • 14:04 moritzm: installing postgres security updates on labsdb1004
  • 14:02 bblack: puppet back online for all caches (ipsec changes complete)
  • 13:48 moritzm: installing pillow security updates
  • 13:41 bblack: disabling puppet on esams,ulsfo,codfw caches for ipsec changes, to minimize alertspam...
  • 13:39 urandom: canceling doomed bootstrap on restbase1009-a.eqiad.wmnet
  • 13:31 apergos: dumps/download.wikimedia.org service interrupted now while the server is being upgraded
  • 13:03 apergos: nfs filesystem from dataset1001 now unavailable as we prep for upgrade
  • 11:19 jynus: deploying new replication check algorithm cross-fleet
  • 10:30 volans: Start copying data from es200[124] to es201[123] (ETA ~16-17h) T127330
  • 10:07 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Update codfw external storage servers topology T127330 (duration: 00m 39s)
  • 09:10 moritzm: re-imaging iron with jessie
  • 08:41 jynus: downtiming lag checks on all mysql replicas for 2 hours to test new alert check
  • 06:24 mutante: gerrit being restarted for config change 274741
  • 03:34 urandom: Starting `nodetool cleanup` on restbase100{1,2,7-a,7-b}.eqiad.wmnet and restbase1010-a : T95253
  • 03:28 urandom: starting decommission of restbase1009.eqiad.wmnet : T95253
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 4 02:33:49 UTC 2016 (duration 7m 44s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 11m 22s)
  • 00:50 ejegg: updated payments wiki from 111f92133cbd6b3890f53e29831ac022d3c26f51 to fbd22230ebf14e57f2c69ef1b342d4cdbc47d9a6
  • 00:35 logmsgbot: awight@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: followup Namespace configuration on wuu.wikipedia (duration: 00m 26s)
  • 00:34 RoanKattouw: Running extensions/Echo/maintenance/backfillUnreadWikis.php on all wikis. This will probably take a few days
  • 00:26 ejegg: rolled payments wiki back to 111f92133cbd6b3890f53e29831ac022d3c26f51
  • 00:24 ejegg: updated payments wiki from 111f92133cbd6b3890f53e29831ac022d3c26f51 to b8ac7f9188ba997d7e26882d0911f4eb97276d86
  • 00:23 logmsgbot: awight@tin Synchronized wmf-config: SWAT: Disable useless Echo eventlogging schema; Site name configuration on wuu.wikipedia; Set default completion suggester scoring for beta and prod (take 2) (duration: 00m 32s)
  • 00:12 logmsgbot: awight@tin Synchronized wmf-config: SWAT: Disable useless Echo eventlogging schema; Site name configuration on wuu.wikipedia; Set default completion suggester scoring for beta and prod (duration: 00m 36s)

2016-03-03

  • 22:36 twentyafterfour: applying hotfix for T128751 (link ../../../extensions/Gerrit* to /srv/phab/phabricator/src/extensions/Gerrit*) - will submit puppet patch to make this permanent.
  • 22:01 urandom: rolling restart of restbase production complete : T127387
  • 21:53 urandom: rolling restart of restbase production (config change) : T127387
  • 21:45 urandom: rolling restart of restbase staging complete : T127387
  • 21:40 urandom: rolling restart of restbase staging (config change) : T127387
  • 21:38 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Enable VisualEditor Single Edit Tab on officewiki https://gerrit.wikimedia.org/r/#/c/274814/ (duration: 00m 33s)
  • 20:26 urandom: rolling restart of restbase production complete : T127387
  • 20:22 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
  • 20:22 logmsgbot: demon@tin Synchronized php-1.27.0-wmf.15/extensions/Echo: Roan made me do it (duration: 00m 53s)
  • 20:20 urandom: rolling restart of restbase production to apply config change : T127387
  • 20:03 urandom: rolling restart of restbase staging complete : T127387
  • 19:58 urandom: rolling restart of restbase in staging to apply config change : T127387
  • 19:44 gehel: elastic1031.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 19:27 volans: Completed migration of data from es200[68] to es201[579], added es201[579] to tendril. T127330
  • 19:08 logmsgbot: csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 40s)
  • 19:05 chasemp: create temp 3T lv on labstore2001 to store test backup deltas
  • 18:41 gehel: elastic1030.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 18:14 urandom: lowering outbound stream throughput limit on restbase1010-a.eqiad.wmnet to 25mbps : T128107 T95253
  • 17:58 urandom: increasing stream throughput for restbase1010-b.eqiad.wmnet bootstrap by 25mbps (5x5) : T128107 T95253
  • 17:09 gehel: elastic1029.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 16:25 gehel: elastic1028.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 16:25 Krinkle: krinkle@terbium mwscript deleteEqualMessages.php --wiki frwiki
  • 16:18 godog: bump stream throughput to 60mbps on restbase1001
  • 16:10 jynus: downtime on all mariadb replication lag checks in preparation to changing its check
  • 16:09 logmsgbot: jzerebecki@tin Synchronized w/static/images/project-logos/wikitech.png: Change the wikitech favicon and logo to the actual wikitech logo a29196d359b9924719b9166dca98a474ad9a6a2b 2 of 2 (duration: 00m 29s)
  • 16:08 logmsgbot: jzerebecki@tin Synchronized w/static/favicon/wikitech.ico: Change the wikitech favicon and logo to the actual wikitech logo a29196d359b9924719b9166dca98a474ad9a6a2b 1 of 2 (duration: 00m 30s)
  • 16:06 logmsgbot: jzerebecki@tin Synchronized wmf-config/CirrusSearch-production.php: CirrusSearch: Enable popqual (quality+pageviews) scoring method for the completion suggester T127943 (duration: 00m 37s)
  • 15:42 gehel: elastic1027.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 15:12 godog: cassandra throttle 1001, 1002, 1007-a, 1007-b, and 1010-a to 30mbps T95253
  • 15:03 moritzm: uploaded openssl 1.0.2g for jessie-wikimedia
  • 14:34 gehel: elastic1026.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 13:47 gehel: elastic1025.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 12:48 gehel: elastic1024.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 12:22 godog: temporary repool ms-fe1004, apply https://gerrit.wikimedia.org/r/#/c/273431 to test T128081
  • 12:12 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Revert - Remove rdb1003 from the Redis JobQueue pool for maintenance (duration: 00m 28s)
  • 12:07 logmsgbot: elukey@tin Synchronized wmf-config/jobqueue-eqiad.php: Remove rdb1003 from the Redis JobQueue pool for maintenance (duration: 00m 32s)
  • 12:02 gehel: elastic1023.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 11:55 _joe_: disabled notifications from the redises IPSEC checks while replication is disabled
  • 11:53 volans: Migrating data es2006->es2015 and es2008->es2017->es2019 T127330
  • 11:40 moritzm: upgrading cp1008 to openssl 1.0.2g
  • 11:21 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Depool es2005,es2008 to migrate data to es2015,es2017 T127330 (duration: 00m 53s)
  • 10:54 godog: replicate swift unsharded -deleted containers eqiad -> codfw T128096
  • 10:48 volans: Changing local replica topology for shard es3 in codfw for T127330
  • 10:37 volans: Changing local replica topology for shard es2 in codfw for T127330
  • 10:21 _joe_: rolling restart of strongswan on eqiad failing servers
  • 10:17 _joe_: restarted strongswan on mc1011
  • 10:07 gehel: elastic1022.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 09:57 volans: Added es2014,es2016,es2018 to tendril [ T127330 ]
  • 09:46 jynus: schema change finished on all hosts (except delayed slaves)
  • 09:21 _joe_: puppet re-enabled everywhere, now troubleshooting ipsec issues
  • 08:59 moritzm: repooled scb1002
  • 08:49 moritzm: repooled scb1001, depooling scb1002 for nodejs upgrade
  • 08:35 _joe_: disabled puppet across the main redises fleet in order to merge https://gerrit.wikimedia.org/r/271261 safely
  • 08:33 moritzm: depooling scb1001 for nodejs upgrade
  • 08:27 jynus: altering heartbeat table on all production servers
  • 06:39 ebernhardson: upgrade elastic1021.eqiad.wmnet to elasticsearch 1.7.5
  • 05:47 ebernhardson: upgrade elastic1020.eqiad.wmnet to elasticsearch 1.7.5
  • 05:02 ebernhardson: upgrade elastic1019.eqiad.wmnet to elasticsearch 1.7.5
  • 04:42 eileen: CiviCRM updated from ff0e0c6f7bc8424f8097bc66e529b7836474d416 to cbcfafcb2e6d6e1dae12a2b2d554445871992aff
  • 04:05 bblack: disabling puppet on caches for a bit, JIC
  • 03:51 ebernhardson: upgrade elastic1018.eqiad.wmnet to elasticsearch 1.7.5
  • 03:13 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar 3 03:13:23 UTC 2016 (duration 8m 38s)
  • 03:04 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 18m 46s)
  • 03:03 ebernhardson: upgrade elastic1017.eqiad.wmnet to elasticsearch 1.7.5
  • 02:28 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 13m 44s)
  • 02:18 ebernhardson: upgrade elastic1016.eqiad.wmnet to elasticsearch 1.7.5
  • 02:03 bd808: Events flowing into logstash elasticsearch cluster again after forcing allocation of missing shard replica
  • 01:59 twentyafterfour: puppet ran on iridium, no errors. :)
  • 01:54 bd808: Deleted logstash-2016.02.03 index to free disk space
  • 01:51 bd808: New index not being created due to low disk watermark exceeded on logstash1006
  • 01:49 bd808: Logstash elasticsearch cluster not responsive; investigating
  • 01:48 ebernhardson: upgrade elastic1015.eqiad.wmnet to elasticsearch 1.7.5
  • 01:44 twentyafterfour: phabricator is back online
  • 01:22 twentyafterfour: manually installed scap package on iridium, will fix in puppet immediately after maintenance is finished
  • 01:19 mutante: elastic1013 "dpkg reports broken packages"
  • 01:19 twentyafterfour: puppet says "Provider scap3 is not functional on this host"
  • 01:16 twentyafterfour: testing puppet on iridium
  • 01:07 mutante: iridium - stop apache
  • 01:00 ebernhardson: upgrade elastic1014.eqiad.wmnet to elasticsearch 1.7.5
  • 00:51 twentyafterfour: Phabricator will be going down for maintenance around 01:00 UTC (Approximately 10 minutes from now)
  • 00:37 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.15/extensions/TemplateData/: Change default format to null instead of 'inline' (duration: 01m 02s)
  • 00:30 bd808: Ran sync-common on mw1025
  • 00:23 hoo: Ran sync-common on mw1025, because it apparently didn't pick up recent changes
  • 00:20 ebernhardson: upgrade elastic1013.eqiad.wmnet to elasticsearch 1.7.5
  • 00:19 hoo: Restarted hhvm on mw1025 because of "Cannot access property on non-object in /srv/mediawiki/php-1.27.0-wmf.14/includes/filerepo/LocalRepo.php"
  • 00:14 mutante: wikitech: delete /a/backup/public/foo and ./bar cruft
  • 00:10 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase-production.php: Set $wgWikimediaBadgesCommonsCategoryProperty to null on commons (T128661) (duration: 01m 09s)
  • 00:00 andrewbogott: restarting pdns on labservices1001

2016-03-02

  • 23:51 chasemp: ran puppet on elastic1012 manually, which started elasticsearch after it had mysteriously stopped (crashed?)
  • 22:49 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.15/includes/api/ApiMain.php: Fix PHP Notice (duration: 01m 17s)
  • 22:28 urandom: enabling brotli compression on local_group_wikipedia_T_parsoid_html.data in staging, and forcing rewrite of corresponding tables on xenon : T125906
  • 21:12 urandom: forcing a major compaction on {local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ,local_group_wikipedia_T_parsoid_html}.data, xenon.eqiad.wmnet : T125906
  • 20:53 bblack: repooling cp1048, seems unlikely to recrash (rare kernel bug)
  • 20:46 bblack: cp1048: depooled in confd, too
  • 20:45 bblack: cp1048: unresponsive console, powercycled
  • 20:35 gehel: elastic1011.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 20:11 logmsgbot: demon@tin Finished scap: group1 to wmf.15 (duration: 08m 41s)
  • 20:03 logmsgbot: demon@tin Started scap: group1 to wmf.15
  • 19:21 mobrovac: restbase rolling restart for https://gerrit.wikimedia.org/r/274456 T127387
  • 19:07 volans: Data transfer completed, started MySQL and replica on es2014,es2016,es2018 [ T127330 ]
  • 18:58 gehel: elastic1010.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 18:44 apergos: rolled back all changes for dataset1001, running with same old precise OS, grrrrr
  • 18:06 apergos: still slogging away at PXE boot with these Broadcom NetXtreme II NICs (dataset1001)
  • 17:55 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Repooling external storage DBs in codfw after data was copied: T127330 (duration: 01m 06s)
  • 17:44 godog: bounce statsdlb on graphite1001 to add 3x statsite instances T105679
  • 17:35 jynus: disabling puppet on db1009 (m5-master) to test heartbeat changes
  • 17:20 gehel: elastic1009.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 16:54 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enabling ShortURL for bnwikisource gerrit:273936 (duration: 01m 04s)
  • 16:39 mobrovac: restbase deploy end of fb66dbf
  • 16:34 gehel: elastic1008.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 16:31 mobrovac: restbase deploy continue of fb66dbf for the rest of the nodes
  • 16:30 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.15/extensions/ContentTranslation/includes/TranslationStorageManager.php: SWAT: Use correct timestamp for updates gerrit:274363 (duration: 00m 59s)
  • 16:28 urandom: starting post-bootstrap (1009-b) cleanup on restbase100{5,6,9-a}.eqiad.wmnet : T95253
  • 16:25 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.15/extensions/ContentTranslation/modules/widgets/translator/ext.cx.translator.js: SWAT: Translator widget: Fix js error if translator does not have recent contributions gerrit:274340 (duration: 01m 05s)
  • 16:07 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Do not send Referer from private wikis gerrit:274414 (duration: 01m 18s)
  • 15:53 apergos: extending maintenance window for dataset1001 by one hour to 5 pm UTC
  • 15:53 mobrovac: restbase deploy start of fb66dbf on restbase1001
  • 15:44 apergos: may extend the maintenance window for dataset1001 upgrade if headway can be made on PXE boot issues... 15 minutes left to decide
  • 15:16 andrewbogott: rebooting californium just to make sure dist-upgrade didn’t mess up grub
  • 15:15 gehel: elastic1007.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 15:06 andrewbogott: running apt-get dist-upgrade to upgrade californium packages to openstack Liberty
  • 15:02 mobrovac: restbase reverting to fa1207e95, problems spotted in logstash
  • 14:58 mobrovac: restbase deploy start of 5def2f8 on restbase1001
  • 14:32 gehel: elastic1006.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 14:06 godog: bootstrap restbase1010-a T128107
  • 14:03 apergos: web service for dumps.wikimedia.org and download.wikimedia.org is now unavailable (upgrade of server to jessie)
  • 13:32 apergos: nfs service for dataset1001 disabled (impacts users of stat100{2,3}) in prep for jessie upgrade
  • 13:23 gehel: elastic1005.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 13:13 _joe_: re-enabled puppet on scb1002, repooled scb1001 for mobileapps
  • 13:10 mobrovac: mobileapps re-deploying d384f1ba for T113542
  • 12:33 bblack: restarted logstash on logstash1002
  • 12:32 mobrovac: mobileapps stopping (again) the service on scb1001 for debugging, T113542
  • 12:29 bblack: restarted logstash on logstash1001
  • 12:27 _joe_: puppet disabled on both scb1001/2, depooled scb1001 for mobrovac to test, and config manually patched on scb1002 so that it runs correctly with the old code
  • 12:25 mobrovac: mobileapps rolling back to 68e38ec7, problems found in the latest deploy for T113542
  • 12:00 mobrovac: mobileapps stopping the service on scb1001 for debug purposes, T113542
  • 11:56 _joe_: stopped puppet on scb1002, depooled scb1001 from mobileapps
  • 11:36 mobrovac: mobileapps deploying d384f1ba
  • 11:09 jynus: profiling db1023 and db1061 for 24 hours; 1/20th of the queries will be slightly slower
  • 10:42 hashar: Zuul should no longer be caught in a death loop due to Depends-On on an event-schemas change. Hole filled with https://gerrit.wikimedia.org/r/#/c/274356/ T128569
  • 10:42 moritzm: restarting graphite-web on graphite1001 (for django security update)
  • 10:36 elukey: stopped Redis multi-instance on rdb1006 (Job Queue slave) as pre-step for Debian re-image
  • 10:16 gehel: elastic1004.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 09:43 volans: Cloning es2005->es2014, es2007->es2016, es2009->es2018, see T127330
  • 09:30 moritzm: installing nodejs updates on restbase*
  • 09:19 elukey: redis multi-instance stopped on rdb1004 (jobqueue slave) as pre-step for Debian re-image
  • 09:16 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Depooling external storage DBs in codfw for migration: T127330 (duration: 01m 24s)
  • 09:13 hashar: Zuul went crazy / was caught in a loop of doom. Same as Saturday. It recovered on its own at 08:32 UTC T128569
  • 08:48 gehel: elastic1003.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 08:33 moritzm: installing Django security updates
  • 08:17 _joe_: disabling puppet on all memcached hosts in preparation for enabling ipsec
  • 07:36 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Disable $wgReferrerPolicy on private wikis (duration: 01m 01s)
  • 06:45 _joe_: rebooting serpens
  • 03:04 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Mar 2 03:04:14 UTC 2016 (duration 8m 49s)
  • 02:55 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 09m 31s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 12m 32s)
  • 00:45 logmsgbot: krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/274316/ - try #2, this time with the submodule update (duration: 01m 17s)
  • 00:44 logmsgbot: krenair@tin Synchronized portals/prod/wikipedia.org/assets: https://gerrit.wikimedia.org/r/#/c/274316/ - try #2, this time with the submodule update (duration: 01m 16s)
  • 00:31 logmsgbot: krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/274316/ (duration: 01m 18s)
  • 00:30 logmsgbot: krenair@tin Synchronized portals/prod/wikipedia.org/assets: https://gerrit.wikimedia.org/r/#/c/274316/ (duration: 01m 18s)
  • 00:26 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/272926/ - prepare for VE default switch on dewiki (duration: 01m 17s)
  • 00:12 logmsgbot: krenair@tin Synchronized dblists/visualeditor-default.dblist: https://gerrit.wikimedia.org/r/#/c/274129/ - +testwiki (duration: 01m 20s)
  • 00:10 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/274129/ - VE SET on mediawikiwiki/testwiki (duration: 01m 21s)
  • 00:04 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/271932/ - disable Gather on enwiki (duration: 01m 26s)

2016-03-01

  • 23:57 ebernhardson: upgrade elastic1002.eqiad.wmnet to elasticsearch 1.7.5
  • 23:17 mutante: maps-test2001 - "could not find dependency for postgres class" is NOT related to my recent change; icinga has been CRITICAL for a long time
  • 22:34 mutante: re-enabled puppet runs on all mw* servers, mediawiki roles now in modules/role/manifests/mediawiki/
  • 22:27 mutante: temp. disabling puppet runs on mw appservers to be extra safe during mediawiki module change
  • 21:29 gehel: elastic1001.eqiad.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 20:29 logmsgbot: demon@tin Finished scap: group0 to wmf.15 (duration: 31m 24s)
  • 19:58 logmsgbot: demon@tin Started scap: group0 to wmf.15
  • 19:19 jynus: testing heartbeat in m5 (db1009, db2030)
  • 19:14 logmsgbot: demon@tin scap aborted: testwikis to wmf.15 and rebuild l10n (duration: 01m 19s)
  • 19:14 chasemp: clean out /var/log/atop and /var/log/account on iridium
  • 19:13 logmsgbot: demon@tin Started scap: testwikis to wmf.15 and rebuild l10n
  • 18:53 mutante: iridium - gzip /var/log/atop/atop_20160*
  • 18:51 mutante: iridium: apt-get clean for some more disk space
  • 18:49 subbu: finished deploying parsoid sha 1f7ed5d0
  • 18:44 subbu: synced parsoid code; restarted parsoid on wtp1002 as a canary
  • 18:41 subbu: starting parsoid deploy
  • 17:52 gehel: elastic2024.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 17:18 mobrovac: restbase rolling-restart restbase for https://gerrit.wikimedia.org/r/#/c/273974/
  • 17:05 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Add kafka1012.eqiad.wmnet back to the media-wiki config gerrit:273488 (duration: 00m 39s)
  • 16:44 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable rollbacker and suppressredirect group at cewiki gerrit:273828 (duration: 00m 41s)
  • 16:40 gehel: elastic2023.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 16:39 hashar: restarting Jenkins
  • 16:37 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Correct one Domain at $wgCopyUploadsDomains gerrit:273776 (duration: 00m 40s)
  • 16:33 hashar: A bunch of Jenkins jobs stalled because I killed threads in Jenkins to unblock integration-slave-trusty-1003 :-( Jenkins / Zuul is catching up.
  • 16:32 logmsgbot: thcipriani@tin Synchronized wmf-config/filebackend-production.php: SWAT: Configure redis LockManager in both DCs, use the master everywhere. PART II gerrit:266514 (duration: 00m 46s)
  • 16:30 logmsgbot: thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Configure redis LockManager in both DCs, use the master everywhere. PART I gerrit:266514 (duration: 00m 40s)
  • 16:24 logmsgbot: thcipriani@tin Synchronized wmf-config/redis.php: SWAT: Use wmfMasterDatacenter for picking the master redis config gerrit:266513 (duration: 00m 39s)
  • 16:18 logmsgbot: thcipriani@tin Synchronized wmf-config/CirrusSearch-production.php: SWAT: Add references to wmfServices for CirrusSearch gerrit:266512 (duration: 00m 56s)
  • 15:45 gehel: elastic2022.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 14:55 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Add mc1003 to the lock managers pool after maintenance (duration: 00m 40s)
  • 14:47 elukey: mc1003.eqiad added back to the redis/memcached pool after maintenance.
  • 14:34 gehel: elastic2021.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 13:53 gehel: elastic2020.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 13:09 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Remove mc1003 from the lock managers pool for maintenance (duration: 00m 40s)
  • 12:48 elukey: removed mc1003 from redis/memcached pools for maintenance
  • 12:37 gehel: elastic2019.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 12:16 moritzm: shutting down curium (decommissioned)
  • 11:53 moritzm: shutting down berkelium (decommissioned)
  • 11:44 jynus: shutting down pc100[123]
  • 11:43 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Add mc1002 back to the lock managers pool after maintenance (duration: 01m 01s)
  • 11:40 gehel: elastic2018.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 11:20 elukey: mc1002.eqiad added back to the memcached/redis pools after maintenance
  • 10:53 jynus: disabling puppet and following steps to decommission pc100[123]
  • 10:42 gehel: elastic2017.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 10:06 elukey: Amended previous log - Remove mc1002 from the lock managers after maintenance
  • 10:05 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Add mc1002 from the lock managers after maintenance (duration: 00m 56s)
  • 09:46 gehel: elastic2016.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 09:32 elukey: removed mc1002 from the redis/memcached pools for maintenance
  • 07:11 ebernhardson: upgrade elastic2015.codfw.wmnet to elasticsearch 1.7.5
  • 06:10 ebernhardson: upgrade elastic2014.codfw.wmnet to elasticsearch 1.7.5
  • 04:55 ebernhardson: upgrade elastic2013.codfw.wmnet to elasticsearch 1.7.5
  • 04:03 ebernhardson: upgrade elastic2012.codfw.wmnet to elasticsearch 1.7.5
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Mar 1 02:32:41 UTC 2016 (duration 7m 36s)
  • 02:30 ebernhardson: upgrade elastic2011.codfw.wmnet to elasticsearch 1.7.5
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 50s)
  • 01:40 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/extensions/CentralAuth: Idc873134: Avoid using "new CentralAuthUser" since it avoids the cache (duration: 00m 51s)
  • 01:36 ebernhardson: upgrade elastic2010.codfw.wmnet to elasticsearch 1.7.5
  • 01:03 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/includes/user/User.php: I419f356b: Cache user data in memory (duration: 00m 42s)
  • 00:45 ebernhardson: upgrade elastic2009.codfw.wmnet to elasticsearch 1.7.5
  • 00:39 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.14/extensions/Echo/Hooks.php: Add debug logging for thank-you-edit notifications (duration: 00m 43s)
  • 00:36 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Configure default Echo subscriptions user options on hewiki (take 2, part 2) (duration: 00m 40s)
  • 00:35 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Configure default Echo subscriptions user options on hewiki (take 2, part 1) (duration: 00m 42s)
  • 00:31 logmsgbot: catrope@tin Synchronized wmf-config/StartProfiler.php: XHGui: Use SCRIPT_NAME as the URI (duration: 00m 46s)
  • 00:30 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Revert "Configure default Echo subscriptions user options on hewiki", doesn't work (duration: 00m 46s)
  • 00:26 ori: Cleaned up remnants of XHGui role on hafnium, now that XHGui is on tungsten
  • 00:13 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Change rate for reader segmentation survey (duration: 00m 41s)
  • 00:10 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Configure default Echo subscriptions user options on hewiki (duration: 00m 48s)

2016-02-29

  • 23:56 ebernhardson: upgrade elastic2008.codfw.wmnet to elasticsearch 1.7.5
  • 23:17 eileen: Updating civicrm from 181ef79fd1360ca337990b6768ebc10b1185a431 to ff0e0c6f7bc8424f8097bc66e529b7836474d416
  • 23:05 ebernhardson: upgrade elastic2006.codfw.wmnet to elasticsearch 1.7.5
  • 22:56 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: I6d3ab949a6a: Apply xhgui role on tungsten (duration: 00m 56s)
  • 22:28 ostriches: phabricator: stopped phd daemons for the time being so replication can catch up and the number of connections stops yelling at people
  • 22:25 ebernhardson: upgrade elastic2006.codfw.wmnet to elasticsearch 1.7.5
  • 22:14 jynus: setting innodb_flush_log_at_trx_commit=0 on phabricator db slave
  • 21:37 ebernhardson: upgrade elastic2005.codfw.wmnet to elastic 1.7.5
  • 21:37 arlolra: updated Parsoid to version d809ad7a
  • 21:24 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Add mc1001 from the lock managers after maintenance (duration: 00m 54s)
  • 20:56 elukey: Added mc1001.eqiad back into memcached/redis pools after maintenance
  • 20:21 gehel: elastic2004.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 20:04 ori: Restarted statsdlb on graphite1001 with xxh64() improvement
  • 19:45 mutante: new planets, Albanian and Bulgarian: https://sq.planet.wikimedia.org/ | https://bg.planet.wikimedia.org/
  • 19:25 elukey: removed mc1001.eqiad from the redis/memcached pools for maintenance
  • 19:22 gehel: elastic2003.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 19:13 logmsgbot: ori@tin Synchronized wmf-config/filebackend-production.php: I0ad3e23719c: Follow-up for I8a87a679ed: unlist rdb1 server (duration: 00m 44s)
  • 19:07 logmsgbot: elukey@tin Synchronized wmf-config/filebackend-production.php: Remove mc1001 from the lock managers for maintenance (duration: 00m 41s)
  • 18:40 logmsgbot: mattflaschen@tin Synchronized dblists/nonflow.dblist: Re-enable Flow on ptwikibooks. Accidentally disabled earlier. (duration: 00m 47s)
  • 18:31 ejegg: updated SmashPig from cf71e988c25e8569b84fb4ee52d1b043c7da27a6 to 154688db3595060b55498dea3cdf1ee206a854a0
  • 18:05 logmsgbot: krinkle@tin Synchronized w/static.php: (no message) (duration: 01m 01s)
  • 18:01 gehel: elastic2002.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 17:48 ejegg: disable Adyen SmashPig job runner
  • 17:46 ejegg: updated smashpig from e32e8fcdb43fedee6d2296e25ca1aff1e5028163 to cf71e988c25e8569b84fb4ee52d1b043c7da27a6
  • 16:43 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set transwiki import sources for hi.wikiquote gerrit:272992 (duration: 00m 41s)
  • 16:38 logmsgbot: thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: Maintenance on throttle.php gerrit:273580 (duration: 00m 41s)
  • 16:32 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/Flow: SWAT: Fix board move DB issue using new hook TitleMoveStarting gerrit:273536 (duration: 01m 00s)
  • 16:28 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/Wikidata: SWAT: Update Wikibase: Fix over-encoding of expanded URLs gerrit:273917 (duration: 02m 04s)
  • 16:23 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/includes/MovePage.php: SWAT: Add TitleMoveStarting, mirroring TitleMoveCompleting gerrit:273535 (duration: 00m 41s)
  • 16:09 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Temporarily disable thank-you-edit notifications gerrit:273836 (duration: 00m 41s)
  • 16:06 logmsgbot: thcipriani@tin Synchronized wmf-config/Wikibase.php: SWAT: Update Wikidata property blacklist gerrit:271971 (duration: 01m 01s)
  • 16:02 _joe_: stopping hhvm on mw1050 (depooled) for testing bug T128380
  • 15:50 urandom: Rolling restart of restbase production complete : T103124
  • 15:43 urandom: Perform rolling restart of restbase in production cluster : T103124
  • 15:42 urandom: Rolling restart of restbase staging complete : T103124
  • 15:34 urandom: Perform rolling restart of restbase in staging cluster : T103124
  • 15:32 urandom: forcing puppet run in restbase cluster ((noop) config deploy) : T103124
  • 15:30 urandom: forcing puppet run in restbase staging ((noop) config deploy) : T103124
  • 14:17 bblack: upgrading nginx on cp* to 1.9.4-1+wmf2 for T126616
  • 14:16 gehel: elastic2001.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 14:11 gehel: re-enabling icinga notifications for elasticsearch cluster on codfw
  • 14:00 gehel: icinga: adding authorization for Gehel
  • 13:54 bblack: puppet back to normal on caches
  • 13:35 bblack: disabling puppet on cp* for traffic-pool work
  • 12:50 godog: gmetad on uranium restarted following https://gerrit.wikimedia.org/r/273881 it will converge asap
  • 12:16 gehel: elastic2001.codfw.wmnet: upgrading to 1.7.5, shipping logs to logstash (T122697, T109101)
  • 11:57 jynus: reimaging dbproxy1005 with jessie before it's put into production (potential alerts due to new role)
  • 10:46 ema: hhvm restarted on mw1119
  • 10:25 jynus: powercycling mw1140, almost 100% unresponsive, OOM probable cause
  • 09:58 godog: bootstrap restbase1009-b T95253
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Feb 29 02:32:43 UTC 2016 (duration 7m 43s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 11m 12s)

2016-02-28

  • 22:34 logmsgbot: legoktm@tin Synchronized php-1.27.0-wmf.14/includes/parser/ParserCache.php: Include ParserCache::save() backtrace in MF pollution debug log - T124356 (duration: 00m 50s)
  • 07:49 apergos: restarted logmsgbot on neon
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Feb 28 02:32:00 UTC 2016 (duration 7m 42s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 24s)
  • 00:59 ori_: Restarted logmsgbot on neon; it lost its IRC connection and did not recover it.

2016-02-27

  • 23:19 legoktm: 23:19:43 Synchronized php-1.27.0-wmf.14/extensions/wikihiero/: Revert "Compress PNGs with zopflipng" - https://gerrit.wikimedia.org/r/273661 (duration: 00m 46s)
  • 22:39 legoktm: legoktm just deployed https://gerrit.wikimedia.org/r/#/c/273655/ (MF cache pollution debug log) for T124356
  • 21:19 urandom: issuing test repair on cerium (restbase staging), wp_parsoid html keyspace, (-w 5) : T108611
  • 20:08 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/includes/mime.types: Iec9459c921: Unbreak ULS fonts via wmfstatic (duration: 00m 41s)
  • 19:32 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/includes/user/User.php: Revert I43cde3a48: Prevent duplicate memcached lookups for user record (T128246) (duration: 00m 56s)
  • 04:15 ori: On silver: commented out exception from I3eae28a58 to work around T128246

2016-02-26

  • 18:41 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/includes/user/User.php: I43cde3a48: Prevent duplicate memcached lookups for user record (duration: 01m 02s)
  • 16:46 chasemp: labstore1001 'mdadm --manage /dev/md126 --add /dev/sdaf'
  • 15:51 jynus: shutting down mariadb on db2030 to clone from db1009
  • 15:24 urandom: forcing puppet run on restbase1009.eqiad.wmnet
  • 15:24 urandom: re-enabling puppet on restbase1009.eqiad.wmnet
  • 15:20 jynus: performing backup of m5-master mysql data
  • 15:17 urandom: blocking CQL native port on restbase1009.eqiad.wmnet : https://phabricator.wikimedia.org/P2677
  • 15:14 urandom: disabling puppet on restbase1009.eqiad to preserve local changes during a quick experiment
  • 15:03 hashar: Switched MediaWiki core npm test to Nodepool instance T119143
  • 13:59 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: T99096: Enable wmgUseWmfstatic on remaining wikis (duration: 00m 50s)
  • 13:54 moritzm: rebooting lithium for kernel update
  • 13:27 godog: launch swiftrepl continuous replication for unsharded containers on ms-fe1003 T128096
  • 12:31 elukey: added mc1017/mc1018 back to the redis/memcached pools after maintenance
  • 11:42 godog: run swiftrepl eqiad -> codfw for unsharded containers
  • 11:01 elukey: removed mc1018/1017 from the redis/memcached pools for maintenance
  • 09:46 elukey: mc1016.eqiad re-added to the memcached/redis pools after maintenance
  • 08:12 elukey: removed mc1016.eqiad from the redis/memcached pools for maintenance
  • 08:01 moritzm: blacklisting aufs kernel module
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Feb 26 02:32:19 UTC 2016 (duration 7m 42s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 34s)
  • 01:06 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Lower survey rate again (duration: 01m 05s)
  • 00:33 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.14/extensions/MobileFrontend/: SWAT (duration: 01m 05s)
  • 00:31 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Raise file upload limit to 2047MB (duration: 01m 02s)
  • 00:22 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Add plumbing for wmgUseGraphWithJsonNamespace (duration: 01m 03s)
  • 00:21 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add wmgUseGraphWithJsonNamespace (duration: 01m 04s)
  • 00:15 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Run reader segmentation survey at 1:500 to test DNT (duration: 01m 21s)

2016-02-25

  • 23:16 bblack: turning puppet back on for cp*, pushing changes through https://gerrit.wikimedia.org/r/#/c/273385/
  • 22:33 bblack: disabling puppet on caches for more scary VCL merges
  • 22:33 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/extensions/CentralAuth: I2cfcbf98f3: Reduce memcache traffic for central session storage (duration: 01m 21s)
  • 22:06 bblack: turning puppet back on for cp*, pushing changes through https://gerrit.wikimedia.org/r/273217 to all
  • 22:02 ejegg: set payments wiki to maintenance mode
  • 22:02 ejegg: shut down CiviCRM queue consumer jobs
  • 22:02 ejegg: shut down fundraising campaigns for queue server reboot
  • 21:59 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/CirrusSearch/includes/InterwikiSearcher.php: Fix undefined variable $term in InterwikiSearcher gerrit:273369 (duration: 01m 08s)
  • 21:49 ori: Upgraded Grafana to v3.0.0-pre1.
  • 21:28 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.14
  • 21:21 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Update file that wasnt synced properly (duration: 01m 50s)
  • 20:58 ejegg: updated SmashPig from 5ef60d1fb73d70b7b1501bc97273505c2d625159 to e6d5c292352156d076ebb089c7e55c9d952b1149
  • 20:57 bblack: disabling puppet on caches for scarier VCL merges
  • 20:46 urandom: starting bootstrap of restbase1008-a T119935
  • 20:31 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: rollback wmf.14
  • 20:05 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.14
  • 19:13 elukey: added mc1015.eqiad.wmnet back to the redis/memcached pools
  • 18:58 jynus: testing schema changes on db2057:testwiki (s3)
  • 18:07 jynus: testing schema change on db2070 (enwiki)
  • 18:05 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.13/includes/session/SessionBackend.php: I43cde3a48: Revert Revert SessionBackend: skip isUserSessionPrevented check for anons (duration: 01m 38s)
  • 18:02 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.14/includes/session/SessionBackend.php: I43cde3a48: Revert Revert SessionBackend: skip isUserSessionPrevented check for anons (duration: 01m 39s)
  • 17:59 godog: bounce cassandra instances on restbase2001, cql not listening
  • 17:41 elukey: removed mc1015 from the redis/memcached pool for maintenance
  • 17:38 logmsgbot: thcipriani@tin Synchronized README: test sync-file (duration: 01m 42s)
  • 17:34 logmsgbot: thcipriani@tin Synchronized README: test sync-file (duration: 01m 46s)
  • 17:14 moritzm: installing xerces-c security updates
  • 17:06 jynus: applying schema change to flowdb (x1)
  • 17:01 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.13/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js: SWAT: Do not show a survey if DNT is enabled gerrit:273268 (duration: 01m 31s)
  • 16:58 jynus: applying schema change to officewiki:flow (s3)
  • 16:52 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/extensions/QuickSurveys/resources/ext.quicksurveys.init/init.js: SWAT: Do not show a survey if DNT is enabled gerrit:273267 (duration: 01m 35s)
  • 16:25 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: SWAT: wikidatawiki to wmf.14
  • 16:19 elukey: added mc1014 back to the redis/memcached pool after maintenance
  • 16:17 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.14/resources/lib/oojs-ui/oojs-ui-core.js: SWAT: OOjs UI: Fix #gatherPreInfuseState called incorrectly, causing TypeErrors gerrit:273250 (duration: 01m 42s)
  • 15:33 godog: stop cassandra/restbase on restbase2001 to finish raid0 grow
  • 15:23 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: wikidata back to wmf.13 for now
  • 11:47 hashar: Reverting session manager cherry-picks from wmf branches ( https://gerrit.wikimedia.org/r/#/c/273201/ and https://gerrit.wikimedia.org/r/#/c/273202/ ); they were not deployed after being merged
  • 11:33 godog: depool ms-fe1004 for trusty dist-upgrade
  • 11:23 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1021 and db1024 (duration: 01m 45s)
  • 10:58 moritzm: powercycling cp2010
  • 10:41 godog: set xff to 0.01 for graphite metrics swift.*.containers (was 0.5)
  • 10:03 hashar: starting Jenkins
  • 09:57 hashar: Stopping Jenkins
  • 09:29 elukey: removed mc1014.eqiad from the redis/memcached pool for maintenance
  • 04:44 urandom: decommissioning Cassandra on restbase1008-a.eqiad.wmnet T119935
  • 04:35 urandom: restarting restbase1008-a to cancel rebuild T108611 T119935
  • 03:19 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Feb 25 03:19:30 UTC 2016 (duration 9m 6s)
  • 03:10 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 18m 00s)
  • 02:35 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 19m 03s)
  • 00:57 bd808: Started crashed Logstash process on logstash1002 (systemd doesn't restart it automatically due to T127677)
  • 00:03 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I4cc836f3ca: Fully-qualify EventLoggingBaseUri (duration: 01m 40s)
  • 00:01 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: I016e23d81: xhgui: Sample fewer requests (1:100k instead of 1:10k) (duration: 01m 58s)

2016-02-24

  • 22:49 gehel: reboot logstash1005 for kernel and elasticsearch update
  • 22:29 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.14 too
  • 22:09 gehel: reboot logstash1006 for kernel and elasticsearch update
  • 22:07 logmsgbot: demon@tin Finished scap: group0 to wmf.14 (duration: 47m 50s)
  • 21:19 logmsgbot: demon@tin Started scap: group0 to wmf.14
  • 21:19 subbu: finished deploying parsoid version 581a43c75
  • 21:08 subbu: synced code; restarted parsoid on wtp1001 as a canary
  • 21:01 subbu: starting parsoid deploy
  • 20:49 moritzm: reboot logstash1004 for kernel/elasticsearch update
  • 20:47 ejegg: updated payments-wiki from fdea9fa9a5951d7ae57bb4d54aa9374f236638d6 to 111f92133cbd6b3890f53e29831ac022d3c26f51
  • 20:39 gehel: reboot logstash1003 for kernel and elasticsearch update
  • 20:28 gehel: reboot logstash1002 for kernel and elasticsearch update
  • 20:15 chasemp: reboot labstore1002 to ensure io scheduler grub options work
  • 20:13 moritzm: reboot logstash1001 for kernel update
  • 19:58 ejegg: rolled back payments-wiki to fdea9fa9a5951d7ae57bb4d54aa9374f236638d6 over a missing image :P
  • 19:56 ejegg: updated payments-wiki from fdea9fa9a5951d7ae57bb4d54aa9374f236638d6 to 8fad21b940e59b30ca5382a4b7b705f90404b337
  • 19:46 chasemp: runonce apply for https://gerrit.wikimedia.org/r/#/c/272891/ for labs VMs (only affects nfs clients)
  • 19:46 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/273032 (duration: 01m 41s)
  • 19:41 cmjohnson1: db1021 replacing disk 8
  • 19:04 logmsgbot: legoktm@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/273017 (duration: 01m 37s)
  • 18:52 papaul: es201[1-9]: signing puppet certs and salt keys, initial run
  • 18:39 mutante: restart gitblit
  • 18:10 bblack: disabling nginx keepalives on remaining clusters (upload, misc, maps)
  • 18:07 ori: hafnium did not have enough disk space for mongo to execute db.repairDatabase(), which is necessary for reclaiming disk space. Since existing profile data can be tossed, ran `db.dropDatabase(); db.repairDatabase();`. Need to think this through better, obviously.
  • 18:02 ori: mongodb on hafnium: ran `db.results.remove( { "meta.SERVER.REQUEST_URI": "/wiki/Special:BlankPage" } ); db.repairDatabase();` to drop profiles of PyBal requests and compact the database.
  • 17:44 logmsgbot: demon@tin Synchronized wmf-config/: poolcounter config simplification (duration: 01m 39s)
  • 17:21 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Re-apply "Set $wgResourceBasePath to /w for medium wikis" (duration: 01m 42s)
  • 17:16 logmsgbot: demon@tin Synchronized wmf-config/: service entries for initialisesettings + fix (duration: 01m 45s)
  • 16:59 papaul: es201[1-9] disabling/revoking puppet and salt keys for re-image
  • 16:57 papaul: es200[1-9] disabling/revoking puppet and salt keys for re-image
  • 16:53 bd808: https://wmflabs.org/sal/production missing SAL data since 2016-02-21T14:39 due to bot crash; needs to be backfilled from wikitech data
  • 16:41 _joe_: started nutcracker on mw1099
  • 16:39 bblack: +do_gzip done for all cache_text
  • 16:38 logmsgbot: demon@tin Synchronized wmf-config/: Rationalize services definitions for labs too. (duration: 01m 45s)
  • 16:16 logmsgbot: demon@tin Synchronized wmf-config/CommonSettings.php: Don't yet allow wikidatasparql graph urls (duration: 01m 37s)
  • 16:12 logmsgbot: demon@tin Synchronized wmf-config/throttle.php: New throttle settings for Edit-a-thon workshop for orwiki (urgent) (duration: 01m 29s)
  • 16:09 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Revert "Set $wgResourceBasePath to "/w" for medium wikis" (duration: 01m 30s)
  • 15:20 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/extensions/wikihiero: Ia0990f5f (duration: 01m 33s)
  • 15:18 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/extensions/wikihiero: Ia0990f5f (duration: 01m 33s)
  • 15:15 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/includes/OutputPage.php: Iad94bb2 (duration: 01m 50s)
  • 15:13 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/includes/OutputPage.php: Iad94bb2 (duration: 01m 43s)
  • 14:55 godog: nodetool-a repair -pr on restbase1008 T108611
  • 14:46 bblack: cache_text: -do_gzip experiment live on all
  • 14:43 godog: bump max reconstruction speed on restbase2001 to 20000 T127951
  • 14:08 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: T99096: Enable wmfstatic for medium wikis (duration: 01m 40s)
  • 13:46 godog: bump max reconstruction speed on restbase2001 to 12000 T127951
  • 13:15 godog: restart cassandra on restbase2001, throttle raid rebuild speed to 8MB/s
  • 11:06 godog: reboot restbase2001
  • 11:01 godog: mdadm errors on restbase2001 while growing the raid0, load increasing
  • 10:53 godog: grow restbase2001 raid0 to include a 5th disk
  • 08:42 moritzm: installing libssh2 security updates across the cluster
  • 05:46 logmsgbot: demon@tin Synchronized docroot/: removing skel-1.5 symlinks (duration: 01m 41s)
  • 05:37 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: import meta to wikitechwiki (duration: 01m 45s)
  • 05:28 twentyafterfour: applied https://secure.phabricator.com/rP03d6e7f1b699d89c829e92ba0da2178b41ad1d6a on iridium to fix visibility on pastes
  • 05:11 ori: Restarting HHVM on codfw app servers to make sure they pick up a file-scope change to stop profiling PyBal health-checks
  • 05:04 logmsgbot: ori@mira Synchronized wmf-config/StartProfiler.php: I0e7be0b5: Never profile PyBal health-checks (duration: 03m 12s)
  • 03:19 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Feb 24 03:19:41 UTC 2016 (duration 8m 46s)
  • 03:10 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 18m 05s)
  • 02:40 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add default to fix notices about wmgUseFlow (duration: 01m 36s)
  • 02:37 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.13/extensions/Echo: SWAT (duration: 01m 40s)
  • 02:35 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.14/extensions/Echo: SWAT (duration: 01m 42s)
  • 02:33 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 47s)
  • 01:50 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Use Flow dblists for deciding which wikis have Flow (duration: 01m 38s)
  • 01:47 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Recognize new Flow dblist (duration: 01m 35s)
  • 01:38 logmsgbot: catrope@tin Synchronized dblists/: Add new dblists for Flow (duration: 01m 32s)
  • 01:34 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Use project logos for welcome notifications (duration: 01m 34s)
  • 01:32 logmsgbot: catrope@tin Synchronized w/static/images: Add project logos for Echo (duration: 01m 33s)
  • 01:24 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/272889/ (duration: 01m 35s)
  • 01:21 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/272837/ (duration: 01m 35s)
  • 01:13 logmsgbot: aaron@tin Synchronized wmf-config/filebackend-production.php: Enable async secondary swift writes for non-"big" wikis (duration: 01m 31s)
  • 00:40 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/extensions/CentralAuth: SessionManager backports (duration: 01m 39s)
  • 00:37 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/extensions/WikimediaMessages: SessionManager backports (duration: 01m 37s)
  • 00:35 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.14/includes: SessionManager backports (duration: 02m 17s)
  • 00:00 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/includes: Backport MW_NO_SESSION changes (duration: 02m 13s)

2016-02-23

  • 23:57 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/autoload.php: (no message) (duration: 01m 42s)
  • 23:22 logmsgbot: krinkle@tin Synchronized w/static.php: (no message) (duration: 01m 35s)
  • 23:08 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Bump cache epoch for Wikidata for Wikiversity sitelinks section (duration: 01m 32s)
  • 22:36 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Add Wikibase settings for Wikiversity (duration: 01m 31s)
  • 22:30 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Add Wikiversity site link section to Wikidata (duration: 02m 23s)
  • 22:23 logmsgbot: aude@tin Finished scap: Add Wikidata i18n messages to WikimediaMessages, update Wikibase on wmf14, and some core backports (duration: 51m 45s)
  • 21:32 chasemp: reboot mw1114 via mgmt as unresponsive
  • 21:31 logmsgbot: aude@tin Started scap: Add Wikidata i18n messages to WikimediaMessages, update Wikibase on wmf14, and some core backports
  • 21:25 ejegg: updated SmashPig from e106af8d440b461779cca60d55efba33dbfa7518 to 5ef60d1fb73d70b7b1501bc97273505c2d625159
  • 21:01 ejegg: updated SmashPig from d2083b28c28bd6afe61d30fc913fa0edd4203f82 to e106af8d440b461779cca60d55efba33dbfa7518
  • 20:50 urandom: initiating `nodetool upgradesstables -a` in staging to rewrite sstables (restored earlier compression settings)
  • 20:26 logmsgbot: aude@tin Synchronized php-1.27.0-wmf.13/includes/session/CookieSessionProvider.php: Fix invalid key warning (duration: 01m 32s)
  • 19:38 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/272799 (duration: 01m 37s)
  • 19:02 yuvipanda: download and decompress enwiki sessions dataset on labsdb1001
  • 18:59 ejegg: updated SmashPig from c34a5fc3cfbb16d6fb019bea9a3d8d73eeb73f7b to d2083b28c28bd6afe61d30fc913fa0edd4203f82
  • 18:57 cmjohnson1: db1021 replacing disk 7
  • 18:34 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: un-rv (duration: 01m 37s)
  • 18:23 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: rv (duration: 01m 32s)
  • 18:15 robh: disabled puppet on dataset1001
  • 18:07 ejegg: updated SmashPig from 97629339994bffe8831a9067f5e9c21fa423586b to c34a5fc3cfbb16d6fb019bea9a3d8d73eeb73f7b
  • 18:05 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/272778/ (duration: 01m 31s)
  • 17:34 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Repool es1016 T127330 (duration: 01m 39s)
  • 17:29 godog: bootstrap restbase1008-b T119935
  • 17:16 bblack: depool cp1061 (eqiad upload) - T125486
  • 17:02 ottomata: rebooting analytics1027 for kernel upgrade
  • 16:58 jynus: stopping again db2035 replication
  • 16:49 godog: grow raid0 on restbase1009 T119935
  • 16:23 godog: depool restbase1007 and shut for ram upgrade
  • 16:20 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable uploader group at Simple English Wikipedia gerrit:272744 (duration: 01m 32s)
  • 16:16 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: VisualEditor: Switch to Single Edit Tab mode on Hungarian Wikipedia gerrit:270346 (duration: 01m 32s)
  • 16:11 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable the structured language overlay and increase the instrumentation rate gerrit:271264 (duration: 01m 32s)
  • 16:00 godog: shut restbase1009 for cpu/mem upgrade
  • 15:37 jynus: stopping db2035 replication to test no paging
  • 15:22 godog: halt restbase1008 for cpu/mem upgrade
  • 15:20 ottomata: shutting down analytics (hadoop) cluster for CDH 5.5 upgrade
  • 15:10 andrewbogott: restarted pdns on labservices1001
  • 14:46 jynus: testing database paging changes by stopping mysql slave on db1021 (it should page dbas)
  • 14:44 volans: Run sysbench on es1016 (already depooled) T127330
  • 14:43 bblack: depool cp1051 (upload eqiad) - T125486
  • 14:27 moritzm: fixed apt config on es2011 (also affected by T125044)
  • 14:23 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: Depool to compare RAID perfs with new es201* T127330 (duration: 01m 42s)
  • 13:45 gehel: elasticsearch 1.7.5 now available on apt repository
  • 13:40 gehel: updating reprepro configuration on carbon.eqiad.wmnet to include elasticsearch 1.7 repo
  • 12:38 moritzm: installing cpio security updates
  • 11:27 ema: re-enabling puppet agent on mc2016
  • 11:15 ema: re-enabling puppet agent on scandium
  • 11:08 volans: Starting MariaDB on es2011 (not yet in production) [ T127330 ]
  • 10:46 volans: Executing /opt/wmf-mariadb10/install on not-yet-production es2011-es2019 [T127330]
  • 08:30 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: Depool es2010 (duration: 01m 34s)
  • 05:25 ori: StartProfiler.php sync was of Ic952fab90f: xhgui: profile 1:10,000 requests
  • 05:20 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: (no message) (duration: 01m 46s)
  • 05:06 ori: Deleted gdash docroot on graphite2001 and krypton
  • 04:05 bblack: parsoid-lb.eqiad.wikimedia.org turned off
  • 03:03 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Feb 23 03:03:06 UTC 2016 (duration 9m 13s)
  • 02:53 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 11m 44s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 34s)
  • 02:27 tgr: switching mw1017 to wmf.12 for perf tests
  • 01:28 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.14/extensions/VisualEditor: SWAT (duration: 01m 34s)
  • 01:07 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Fix wgReferrerPolicy (duration: 01m 36s)
  • 01:04 ori: ran `mwscript namespaceDupes.php --wiki=labswiki` on silver
  • 01:00 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add Tool namespace on wikitech (duration: 01m 37s)
  • 00:52 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Freeze LQT on fiwikimedia (duration: 01m 39s)
  • 00:10 ori: Depooling mw1099 for debugging
  • 00:07 eileen: Updating civicrm from 07f8ea80d3fee19d4977d673a719f5a692ffbded to 181ef79fd1360ca337990b6768ebc10b1185a431

2016-02-22

  • 23:04 logmsgbot: krinkle@tin Synchronized w/static.php: Set MW_NO_SESSION to warn (duration: 01m 34s)
  • 22:42 jynus: powercycling es2010
  • 22:14 logmsgbot: ori@tin Finished scap: I1d4f90533: Third time's the charm: kill live-1.5 (duration: 11m 05s)
  • 22:03 logmsgbot: ori@tin Started scap: I1d4f90533: Third time's the charm: kill live-1.5
  • 22:02 logmsgbot: ori@tin Synchronized w: I1d4f90533: Third time's the charm: kill live-1.5 (duration: 01m 43s)
  • 21:40 logmsgbot: demon@tin Synchronized README: no-op for co-master sync (duration: 01m 31s)
  • 21:34 yurik: deployed graphoid https://gerrit.wikimedia.org/r/#/c/272602/
  • 21:29 ostriches: undid previous scap with dsh, live-1.5 is still used.
  • 21:28 logmsgbot: demon@tin Finished scap: removing live-1.5 symlink (duration: 13m 03s)
  • 21:26 ori: ran sync-common on silver
  • 21:15 logmsgbot: demon@tin Started scap: removing live-1.5 symlink
  • 21:05 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.13/extensions/Graph/: Graph ext https://gerrit.wikimedia.org/r/#/c/272473/ (duration: 01m 43s)
  • 20:59 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.14/extensions/Graph/lib/graph2.compiled.js: https://gerrit.wikimedia.org/r/#/c/272472/ (duration: 01m 40s)
  • 20:50 yurik: deployed and restarted kartotherian - https://gerrit.wikimedia.org/r/#/c/272425/
  • 20:33 ejegg: restarted antifraud queue consumer
  • 20:29 ejegg: update civicrm from 352bd7ce9abc4d3ce954e978aa333e6ec4a59849 to 07f8ea80d3fee19d4977d673a719f5a692ffbded
  • 20:28 ejegg: paused antifraud queue consumer
  • 18:32 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: T99096: Enable wmfstatic for small wikis (duration: 01m 43s)
  • 18:04 moritzm: corrected dpkg installation status for cassandra on restbase-test200[12]
  • 18:03 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.13/extensions/Wikidata: Reset entity access counts between parser runs (duration: 03m 01s)
  • 16:52 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: Cache morelike search queries for 24h (duration: 01m 34s)
  • 16:34 godog: reboot krypton.eqiad.wmnet, no answer to gnt-instance console / no ssh
  • 16:25 logmsgbot: demon@tin Synchronized README: no-op for co-master sync (duration: 01m 29s)
  • 16:20 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: azwiki wants some wikilove too (duration: 01m 29s)
  • 16:16 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: dpl on nowikimedia (duration: 01m 29s)
  • 16:13 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: default referrer policy (duration: 01m 31s)
  • 16:09 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: Translation Notifications on commons (duration: 01m 31s)
  • 16:07 logmsgbot: demon@tin Synchronized wmf-config/throttle.php: throttle exemption for or.wikipedia workshop (duration: 01m 32s)
  • 16:02 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Repool of es2001 (duration: 01m 39s)
  • 16:01 volans: repooled es2001 [ T127330 ]
  • 15:39 ottomata: stopping restbase on aqs1001
  • 15:26 ottomata: stopping puppet on aqs* for scap deployment
  • 13:50 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch on Wikidata after Property conversions (duration: 01m 39s)
  • 13:31 _joe_: restarting ocg on ocg1001, got stuck after redis restart on rdb1002
  • 12:23 moritzm: restarting apache on neon for glibc update
  • 11:09 moritzm: restarting apache on graphite1001 for glibc update
  • 10:53 moritzm: started salt-minion on scandium (process had died)
  • 04:00 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: Ie4c87619: xhgui: sanitize query string & I219c0901: xhgui: sanitize keys (duration: 01m 45s)
  • 03:03 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Feb 22 03:03:21 UTC 2016 (duration 8m 50s)
  • 02:54 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 12m 05s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 28s)
  • 02:13 bd808: Logstash process on logstash1002 died from jvm OOM

2016-02-21

  • 14:39 jynus: restarting db2070 to test configuration change
  • 03:02 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Feb 21 03:02:48 UTC 2016 (duration 8m 47s)
  • 02:54 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 11m 34s)
  • 02:37 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: T107395 (duration: 01m 29s)
  • 02:35 logmsgbot: krinkle@tin Synchronized images: clean up (duration: 01m 32s)
  • 02:33 logmsgbot: krinkle@tin Synchronized w/static/images: clean up (duration: 01m 32s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 20s)

2016-02-20

  • 19:39 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.13/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: Remove live-hacked logging code for T124356 (duration: 01m 42s)
  • 19:37 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/extensions/MultimediaViewer/extension.json: (no message) (duration: 01m 36s)
  • 12:56 jynus: disabling ipmi_si kernel module on labsdb* hosts for high cpu usage
  • 08:08 _joe_: gnt-console reboot of alsafi
  • 05:43 logmsgbot: ori@tin Synchronized wmf-config/StartProfiler.php: I14612374: Send xhprof profiles from mw1017 to xhgui (duration: 01m 38s)
  • 03:10 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Feb 20 03:10:55 UTC 2016 (duration 8m 53s)
  • 03:02 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 48s)
  • 02:33 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 16s)
  • 00:45 logmsgbot: ori@tin Finished scap: I2a66b40e4c6: add a dependency on xhprof/xhgui (duration: 50m 12s)
  • 00:32 ori: Restarted HHVM on mw1248 (locked up, T89912).

2016-02-19

  • 23:55 logmsgbot: ori@tin Started scap: I2a66b40e4c6: add a dependency on xhprof/xhgui
  • 23:43 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable SecurePoll poll creation on officewiki (duration: 02m 21s)
  • 23:11 logmsgbot: ori@mira scap failed: ValueError /srv/mediawiki-staging/multiversion/vendor/slim/slim/tests/templates/test.php has content before opening <?php tag (duration: 00m 17s)
  • 23:11 logmsgbot: ori@mira Started scap: I2a66b40e4c6: add a dependency on xhprof/xhgui
  • 20:23 chasemp: reboot labvirt1002
  • 17:01 andrewbogott: reenabling puppet on labservices1001 and restoring original dns settings, pending merge of https://gerrit.wikimedia.org/r/#/c/271797/
  • 16:55 chasemp: reboot labvirt1011
  • 16:09 elukey: rebooted kafka2001.codfw.wmnet for kernel upgrade
  • 15:59 elukey: rebooted kafka2002.codfw for kernel upgrade
  • 15:59 moritzm: rebooting osmium
  • 15:49 elukey: added kafka1002 back to eventbus pool via confctl
  • 15:48 moritzm: rolling restart of maps cluster for glibc update
  • 15:42 elukey: removed kafka1002.eqiad.wmnet from eventbus pool via confctl
  • 15:38 elukey: re-added kafka1001.eqiad.wmnet back to eventbus' pool via confctl
  • 15:34 elukey: rebooting kafka1001 for kernel upgrade
  • 15:29 elukey: removed kafka1001 from eventbus via confctl
  • 15:14 andrewbogott: restarting pdns on labservices1001 again
  • 15:02 andrewbogott: restarting pdns on labservices1001
  • 15:00 paravoid: setting up (e)BGP sessions between ulsfo-codfw
  • 14:52 apergos: labstore1001 issues were (again) cluebot writing to its error log. I chowned that log to root and left a README file in the directory with an explanation plus a pointer to us here if they have questions/need help.
  • 14:48 moritzm: rolling restart of swift-proxy in eqiad
  • 14:27 volans: Restarting MariaDB on es2001 (still depooled) [T127330]
  • 14:27 paravoid: cr2-knams: re-activating BGP with 1257
  • 14:12 jynus: restarting mysql at dbstore1001
  • 13:59 moritzm: restarting salt-master on neodymium
  • 13:49 elukey: puppet re-enabled on analytics1027
  • 13:37 moritzm: rebooting mira
  • 13:06 moritzm: restarting slapd on dubnium/pollux
  • 12:48 moritzm: restarting apache on uranium
  • 12:44 elukey: puppet stopped on analytics1027 for issues with the cluster
  • 11:40 godog: start a new ^global- swift container replication eqiad -> codfw
  • 11:39 akosiaris: reboot cygnus, stuck in 100% IOwait
  • 11:13 moritzm: rolling restart of aqs cluster for glibc update
  • 11:08 mark: correction: Reduced sync_speed_max to 100000 (half) for md125 on labstore1001
  • 11:08 mark: Reduced sync_speed_max to 100000 (half) for md126 on labstore1001
  • 11:02 moritzm: restarted gerrit on ytterbium (actual restart happened ten minutes earlier than this log entry, though)
  • 10:22 moritzm: rolling restart of cassandra on restbase/eqiad
  • 10:05 jynus: restarting db021 slave, will be testing/depooled for a while
  • 10:00 jynus: purging requested rows on eventlogging Edit table (m4) at db1046, db1047, dbstore1002 and dbstore2002
  • 09:59 moritzm: rolling restart of cassandra on restbase/codfw
  • 09:04 _joe_: idled the raid check for md123 on labstore1001
  • 08:51 _joe_: killed the backup rsync on labstore1001 to alleviate the high io load
  • 03:00 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Feb 19 03:00:47 UTC 2016 (duration 9m 10s)
  • 02:51 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 11m 18s)
  • 02:43 mutante: powercycle cp2017
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 18s)
  • 02:07 logmsgbot: krinkle@tin Synchronized wmf-config/InitialiseSettings.php: Clean up wmgUseWmfstatic setting (duration: 01m 34s)
  • 02:04 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: Clean up wmgUseWmfstatic setting (duration: 01m 56s)
  • 01:02 mobrovac: restbase deploy end of fa1207e95
  • 00:55 mobrovac: restbase deploy start of fa1207e95 for T127370
  • 00:55 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable cross-wiki notifications beta feature on initial set of wikis (duration: 01m 40s)
  • 00:55 mobrovac: restbase deploy start of fa1207e95
  • 00:49 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.13/extensions/Echo/: SWAT (duration: 01m 41s)
  • 00:44 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.14/extensions/Echo/: SWAT (duration: 01m 37s)
  • 00:41 logmsgbot: catrope@tin Synchronized wmf-config/filebackend-production.php: Enable deferred writes to codfw swift cluster (duration: 01m 33s)
  • 00:37 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Set initial $wgMaxUserDBWriteDuration value (duration: 01m 38s)
  • 00:28 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: VE config var cleanup (2/2) (duration: 01m 38s)
  • 00:26 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: VE config var cleanup (1/2) (duration: 01m 44s)

2016-02-18

  • 23:49 csteipp: deployed patch for T125290
  • 22:22 chasemp: reboot labvirt1010
  • 22:18 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.13/extensions/SpamBlacklist: I0ad5289324: Pre-cache the link list for external link filters (duration: 02m 26s)
  • 22:01 chasemp: reboot labvirt1009
  • 21:59 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/includes/rcfeed/IRCColourfulRCFeedFormatter.php: T127360: Ignore RC_CATEGORIZE (duration: 01m 41s)
  • 21:59 volans: Shutting down MySQL on es2001 (depooled) and starting data transfer to es2011 [ T127330 ]
  • 21:32 chasemp: reboot labvirt1008
  • 21:15 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: depool es2001 (duration: 01m 36s)
  • 20:52 volans: Depool es2001 to copy the data to es2011 (T127330)
  • 20:41 chasemp: reboot labvirt1006
  • 20:39 logmsgbot: krinkle@tin Synchronized w/static.php: Set MW_NO_SESSION (duration: 01m 55s)
  • 20:21 urandom: restarted restbase in staging (5 min. delay)
  • 20:16 chasemp: reboot labvirt1005
  • 20:00 ejegg: updated payments-wiki from 5b909f06acce6444186ac02a494439c2ddd624aa to fdea9fa9a5951d7ae57bb4d54aa9374f236638d6
  • 19:56 ejegg: rolled back payments-wiki to 5b909f06acce6444186ac02a494439c2ddd624aa
  • 19:55 ejegg: updated payments-wiki from 5b909f06acce6444186ac02a494439c2ddd624aa to fdea9fa9a5951d7ae57bb4d54aa9374f236638d6
  • 19:53 urandom: upgrade Cassandra to 2.1.13 on restbase-test200[1-3].codfw.wmnet (restbase staging) : T126629
  • 19:50 chasemp: reboot labvirt1004
  • 19:40 paravoid: cr2-knams: deactivating BGP with 1257
  • 19:21 chasemp: reboot labvirt1003
  • 19:06 moritzm: restarting apache on bohrium
  • 19:04 subbu: finished deploying parsoid dfbafb60
  • 18:58 subbu: synced code; restarted parsoid on wtp1001 as a canary
  • 18:56 logmsgbot: ori@mira Synchronized wmf-config/mobile.php: Ib6fff26be162: Modify $wgRenderHashAppend when disabling responsive images on mobile (duration: 02m 24s)
  • 18:56 subbu: starting parsoid deploy
  • 18:53 twentyafterfour: tagged phabricator hotfixes as release/2016-02-18/2 in the phabricator/phabricator repository. This includes fixes for T127290 and T127349
  • 18:50 logmsgbot: ori@mira Synchronized wmf-config/InitialiseSettings.php: I02dbbdb79ea9a: Re-set default vega version to 2 (duration: 03m 24s)
  • 18:49 jynus: stopping, upgrading and reconfiguring dbstore1001
  • 18:24 chasemp: rebooting labvirt1001
  • 18:22 yurik: deployed graphoid https://gerrit.wikimedia.org/r/#/c/271563/
  • 18:17 elukey: re-enabled puppet on analytics1027 after maintenance
  • 18:06 hoo: Updated Wikidata's property suggester with data from Monday's json dump
  • 18:04 mobrovac: restbase deploy end of a42976cc82
  • 18:03 twentyafterfour: applied a hotfix from https://secure.phabricator.com/D15306 on iridium to test a fix for https://phabricator.wikimedia.org/T127290
  • 18:00 godog: reenable puppet on restbase1008
  • 17:49 mobrovac: restbase deploy start of a42976cc82
  • 17:47 elukey: manual failover of hadoop master node (analytics1001) to secondary (analytics1002) for maintenance (plus service restarts)
  • 17:41 urandom: upgrading Cassandra to 2.1.13 on cerium.eqiad.wmnet (restbase staging) T126629
  • 17:28 mobrovac: restbase deploying a42976cc82 to restbase1002
  • 17:27 urandom: Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence?): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child : T126629
  • 17:26 urandom: Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child
  • 17:22 urandom: upgrading Cassandra to 2.1.13 on xenon.eqiad.wmnet (restbase staging) T126629
  • 17:20 elukey: disabled puppet on analytics1027 to avoid any Camus job to run
  • 17:04 dcausse: updating completion suggester indices in eqiad
  • 16:54 elukey: restarting hadoop services on analytics105* nodes for security updates
  • 16:49 gehel: removing cirrus maintenance crons from mw1152 (T127322)
  • 15:52 dcausse: creating adywiki indices in codfw
  • 15:44 elukey: restarting hadoop services on analytics104* nodes for security updates
  • 15:37 elukey: restarting hadoop services on analytics102* nodes for security update
  • 15:33 moritzm: restarting apache on silver/wikitech
  • 15:10 elukey: restarting hadoop services on analytics103* hosts for security upgrades
  • 14:06 bblack: restarting apache on gallium (integration)
  • 13:13 mark: decreased raid md2 sync_speed_max to 6000 on restbase1008
  • 12:55 elukey: rebooted kafka1022.eqiad.wmnet for kernel upgrade
  • 12:52 godog: decrease raid min_speed to 8000 on restbase1008
  • 12:50 logmsgbot: hoo@tin Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for Wikidata (duration: 01m 54s)
  • 12:41 elukey: rebooted kafka1020 for kernel upgrade.
  • 12:40 godog: decrease raid min_speed to 10000 on restbase1008
  • 12:24 godog: increase stripe_cache_size to 32470 on restbase1008
  • 12:21 godog: expand raid0 on restbase1008 to sdd and sde
  • 11:36 paravoid: upgrading mr1-ulsfo to its pre-recovery version and rebooting (T127295)
  • 11:34 hashar: Hard restarting Jenkins T127294
  • 11:32 jynus: logical import of db1021 starting for data consistency check and defragmenting purposes
  • 11:29 paravoid: mr1-ulsfo: "request system snapshot media internal slice alternate" + reboot (T127295)
  • 11:27 hashar: Jenkins web UI busy with 'jenkins.model.RunIdMigrator doMigrate' while it migrates build records. I did a bunch of cleanup yesterday. Jenkins runs jobs in the background just fine though. T127294
  • 11:12 hashar: Jenkins: reloading configuration from disk. Some metadata are corrupted T127294
  • 10:48 elukey: rebooted kafka1018 for maintenance
  • 10:17 elukey: rebooted kafka1014 for maintenance
  • 10:10 moritzm: restarting hhvm on mw1* to put glibc update into effect
  • 09:49 godog: remove old restbase metrics under restbase.* from graphite1001 and graphite2001
  • 03:13 twentyafterfour: running puppet one last time on iridium. Phabricator upgrade successful with just a few minor issues now resolved.
  • 03:01 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Feb 18 03:01:01 UTC 2016 (duration 9m 24s)
  • 02:51 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 11m 20s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 13m 55s)
  • 02:18 twentyafterfour: phabricator is back online, sprint extension is broken, I'm investigating
  • 01:57 mutante: powercycled frozen mw1147
  • 01:51 twentyafterfour: phab pre-upgrade: http://pastebin.com/RTmXfDhp
  • 01:49 twentyafterfour: about to bring down phabricator to do the upgrade
  • 01:49 twentyafterfour: ran puppet on iridium for testing
  • 01:08 twentyafterfour: stopped phd and started dumping phabricator's database to /srv/dumps/20160218.phabricator.sql.gz (just in case I need to roll back the update)
  • 00:34 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.13/extensions/Flow: Trying again (duration: 01m 50s)
  • 00:28 RoanKattouw: 00:28:25 64 apaches had sync errors, /usr/bin/sync-common missing
  • 00:28 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.13/extensions/Flow: SWAT (duration: 02m 06s)
  • 00:18 godog: restart cassandra-a on restbase1008 after extending /srv

2016-02-17

  • 23:53 csteipp: redeployed wmf14 patches
  • 23:30 csteipp: deployed all missing security patches from wmf14
  • 23:10 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.14/resources/src/mediawiki/page/patrol.ajax.js: add security patches (duration: 01m 28s)
  • 23:08 logmsgbot: csteipp@tin Synchronized php-1.27.0-wmf.14/includes: add security patches (duration: 01m 35s)
  • 23:03 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.13/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: live-hacked debug logging for T124356 (duration: 02m 16s)
  • 21:42 mobrovac: mathoid deploying ed98ffe9d
  • 21:35 mobrovac: restbase restarted restbase1002 on nodejs v4.3.0
  • 20:40 papaul: es201[1-9] - signing puppet certs, salt-key, initial run
  • 20:25 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: Re-enable T99096 for mediawiki.org (duration: 01m 29s)
  • 20:23 logmsgbot: catrope@tin Synchronized docroot/: (no message) (duration: 01m 33s)
  • 19:18 yuvipanda: truncate 1.2T php error log file on labstore1003 from cluebot
  • 18:35 jynus: testing now that alerts still work by stopping db1024 replication (depooled)
  • 18:30 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: T127194 (duration: 01m 31s)
  • 18:27 jynus: no issues found with new mysql, lag monitoring, re-enabling puppet again on the pending eqiad servers
  • 17:49 bblack: restarting pybal on eqiad primary LVS ( lvs100[123] )
  • 17:47 bblack: restarting pybal on codfw primary LVS ( lvs200[123])
  • 17:42 bblack: restarting pybal on ulsfo/esams primary LVS ( lvs[34]00[12])
  • 17:40 bblack: restarting pybal on eqiad backup LVS ( lvs100[456] )
  • 17:38 bblack: restarting pybal on eqiad inactive LVS clusters ( lvs1007-12 )
  • 17:38 bblack: restarting pybal on codfw backup LVS ( lvs200[456] )
  • 17:34 bblack: restarting pybal on ulsfo/esams backup LVS ( lvs[34]00[34])
  • 17:13 hoo: Updated the sites and site_identifiers table for all non-Wikipedias (including Wikidata)
  • 17:02 ema: depooled ulsfo https://phabricator.wikimedia.org/T127094
  • 16:48 ostriches: purged ancient boardvote gpg key from mediawiki fleet. unused since forever.
  • 16:25 logmsgbot: anomie@tin Synchronized wmf-config/: SWAT: Undeploy Extension:ApiSandbox (duration: 01m 30s)
  • 16:20 logmsgbot: anomie@tin Synchronized wmf-config/CommonSettings.php: SWAT: Remove $wgMWOAuthGrantPermissions (duration: 01m 34s)
  • 16:16 urandom: restbase deploy (15a6c50) complete, sans restbase1008.eqiad.wmnet (down for maintenance during deploy)
  • 16:16 anomie: Ran namespaceDupes.php on tawiki
  • 16:14 urandom: restbase deploy (15a6c50) complete
  • 16:14 hoo: Re-populating the sites and site_identifiers table for all Wikipedias and testwikidata
  • 16:10 urandom: restbase deploy restarting at restbase1009
  • 16:09 logmsgbot: anomie@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: New userrights and configuration for cswiki (task T126931) (duration: 01m 31s)
  • 16:09 urandom: restbase deploy stalled at restbase1008 (under maintenance)
  • 16:05 logmsgbot: anomie@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace aliases for tawiki (task T126604) (duration: 01m 31s)
  • 16:00 urandom: continuing production-wide restbase deploy (15a6c50)
  • 15:58 godog: copy restbase.{v1_*,sys_*,ALL,GET,HEAD,POST,OPTIONS,_robots} to restbase.external on graphite1001 and graphite2001
  • 15:55 godog: copy restbase.private to restbase.internal on graphite1001 and graphite2001
  • 15:53 logmsgbot: aude@tin Synchronized php-1.27.0-wmf.13/extensions/Wikidata: Fix caching data types bug: T127095 (duration: 01m 44s)
  • 15:53 urandom: canary deploy of restbase to restbase1001.eqiad.wmnet (15a6c50) complete
  • 15:53 urandom: canary deploy of restbase to restbase1001.eqiad.wmnet (15a6c50)
  • 15:51 bblack: package upgrades commencing on lvs*
  • 15:43 urandom: restbase staging deploy (15a6c50) complete
  • 15:38 urandom: deploying restbase (15a6c50) in staging
  • 15:33 jynus: stopping puppet on all database hosts (db, dbstore, es, etc.) for lag alert testing
  • 14:05 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1022 after maintenance (duration: 01m 34s)
  • 14:02 bblack: package upgrades on cp* commence
  • 13:23 elukey: rebooted kafka1013 for maintenance
  • 13:06 elukey: stopping kafka on kafka1013
  • 12:51 moritzm: rolling restart of ms-fe2*
  • 12:31 moritzm: restarting apache on krypton for glibc update
  • 12:00 godog: bump stripe_cache_size to 10240 for md2 on restbase1008
  • 11:54 jynus: restarting schema change on wikidatawiki (s5) T62539
  • 11:47 moritzm: restarting hhvm on mw2* to put glibc update into effect
  • 11:29 jynus: rolling schema change on wikidatawiki (s5)
  • 10:40 jynus: refreshed dns server config
  • 09:57 godog: stop cassandra-a on restbase1008 for raid expansion
  • 09:55 godog: depool restbase1008 for raid expansion T119935
  • 08:19 akosiaris: gnt-instance reboot alsafi.wikimedia.org
  • 08:19 akosiaris: enabled puppet on sca1001, sca1002
  • 03:15 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Feb 17 03:15:40 UTC 2016 (duration 9m 30s)
  • 03:06 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 31s)
  • 02:39 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 18m 59s)
  • 01:36 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.13/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js: touch - https://phabricator.wikimedia.org/T125249#2012068 (duration: 01m 31s)
  • 01:33 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.13/extensions/VisualEditor/extension.json: https://gerrit.wikimedia.org/r/#/c/271174/ (duration: 01m 29s)
  • 01:26 mutante: labservices1001 - out of disk again (T126572) - moved designate-mdns.log files to /srv/var/
  • 01:13 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.13/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/271153/ (duration: 01m 31s)
  • 00:47 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.13/extensions/Math/MathValidator.php: https://gerrit.wikimedia.org/r/#/c/270981/ (duration: 01m 31s)
  • 00:33 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/271074 (duration: 01m 29s)
  • 00:29 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/266603 (duration: 01m 30s)
  • 00:26 Krenair: ... after showing rsync common finished line
  • 00:25 Krenair: sync-common taking a while to terminate
  • 00:14 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/270985 (duration: 01m 34s)

2016-02-16

  • 23:39 logmsgbot: demon@tin Finished scap: pre-deploying wmf.14 to testwiki to prime l10n cache (duration: 30m 36s)
  • 23:17 hashar: Jenkins accepting slave creations again. Root cause is /var/lib/jenkins/config-history/nodes/ has reached the 32k inode limit.
  • 23:12 mobrovac: restbase deploy end of 6f69a30
  • 23:09 logmsgbot: demon@tin Started scap: pre-deploying wmf.14 to testwiki to prime l10n cache
  • 23:06 eileen: Updating CiviCRM from b2566a9849c2b3a2a0c8f0b524c69bbc2dca5f8f to bccb044c39b1eb9f270a60df69771db59623599f
  • 23:04 logmsgbot: demon@tin Synchronized w/health-check.php: Ensure procinfo exists (duration: 00m 58s)
  • 23:02 hashar: Nodepool can not authenticate with Jenkins anymore. Thus it can not add slaves it spawned.
  • 22:58 mobrovac: restbase deploy start of 6f69a30
  • 22:56 hashar: contint: Nodepool instances pool exhausted
  • 22:02 hashar: tin/mira : git -C /srv/patches/ config core.sharedRepository true
  • 21:55 chasemp: "chgrp -R mwdeploy /srv/mediawiki-staging/.git/objects && chmod -R g+w /srv/mediawiki-staging/.git/objects" on tin
  • 20:50 logmsgbot: demon@tin Finished scap: removing expired wmf.9 branch (duration: 28m 03s)
  • 20:22 logmsgbot: demon@tin Started scap: removing expired wmf.9 branch
  • 20:14 twentyafterfour: restarted phd on iridium to kick-start the outbound email queue
  • 19:38 logmsgbot: hashar@tin Synchronized multiversion/updateBranchPointers: Missing require_once MWWikiversions (duration: 00m 57s)
  • 19:16 logmsgbot: ori@mira Synchronized wmf-config/InitialiseSettings.php: I2926f73b78fa0: Revert "Test HTML stripping in production mobile beta" (duration: 01m 29s)
  • 19:01 hashar: tin: checking out mw 1.27.0-wmf.14
  • 18:16 eileen: Updating CiviCRM from b499564de8a2239b1c20acba0f8d52493c9a9c22 to b2566a9849c2b3a2a0c8f0b524c69bbc2dca5f8f
  • 18:15 logmsgbot: ori@tin Synchronized wmf-config/mobile.php: I5818f0350925: Don't serve HiDPI thumbs on mobile web (duration: 00m 58s)
  • 18:06 logmsgbot: aude@tin Synchronized php-1.27.0-wmf.13/extensions/Wikidata/extensions/Wikibase/repo/resources/dataTypes/wikibase.dataTypeStore.js: Try to fix T127095 (duration: 00m 57s)
  • 18:03 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.13/includes/mime.types: Fix .htc static (T99096) (duration: 00m 58s)
  • 17:48 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: Enable wmfstatic (wgResourceBasePath) for group0 wikis (duration: 01m 00s)
  • 17:31 logmsgbot: krinkle@tin Synchronized wmf-config/CommonSettings.php: wgLocalStylePath cleanup (use /static) (duration: 00m 59s)
  • 17:26 volans: Reboot+reimage of db1021 (T126996)
  • 17:07 volans: Disabled puppet on db1021 for reimaging: T126996
  • 17:01 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:270556 gerrit:270679 gerrit:270556 (duration: 00m 59s)
  • 16:54 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: CX: Remove ContentTranslationCorpora setting PART II gerrit:267236 (duration: 00m 59s)
  • 16:53 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Remove ContentTranslationCorpora setting PART I gerrit:267236 (duration: 00m 58s)
  • 16:24 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable survey at reduced sample rate" (duration: 00m 58s)
  • 16:20 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable survey at reduced sample rate" (duration: 01m 01s)
  • 16:18 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable survey at reduced sample rate gerrit:270344 (duration: 01m 00s)
  • 16:12 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.13/extensions/OpenStackManager: SWAT: rebase openstack submodule (duration: 00m 58s)
  • 16:08 logmsgbot: thcipriani@tin Synchronized wmf-config/throttle.php: Temporary lift of IP cap for an Edit-a-thon gerrit:270646, Cleanup: removing expired event gerrit:270648 (duration: 00m 58s)
  • 15:54 urandom: running `nodetool cleanup' on restbase{1,2,7-a}.eqiad.wmnet (bootstrap of 1007-b now complete)
  • 15:34 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Remove other projects sidebar beta feature and per wiki settings (duration: 01m 09s)
  • 15:32 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Enable in other projects sidebar feature by default, out of beta (duration: 00m 58s)
  • 15:24 logmsgbot: aude@tin Synchronized wmf-config/Wikibase-production.php: Enable external identifier data type on Wikidata (duration: 00m 57s)
  • 15:17 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: Enable identifiers section on Wikidata (duration: 01m 00s)
  • 15:13 logmsgbot: aude@tin Synchronized wmf-config/Wikibase-production.php: Link commons sidebar link to commons category (duration: 00m 59s)
  • 15:09 akosiaris: upgrades OTRS to 5.0.7
  • 14:58 moritzm: rebooting nescio
  • 12:28 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 58s)
  • 12:15 hoo: Restarted hhvm on mw1237
  • 12:01 paravoid: rebooting pfw-codfw for upgrade
  • 11:59 logmsgbot: hoo@tin Synchronized wmf-config/: Enable Capiunto on testwiki, test2wiki and testwikidata (T126399) (2nd try) (duration: 00m 58s)
  • 11:54 logmsgbot: hoo@tin Synchronized wmf-config/: rv (duration: 00m 58s)
  • 11:49 logmsgbot: hoo@tin Finished scap: Deploy Capiunto (master) to testwiki, test2wiki and testwikidata - T126399 (duration: 24m 46s)
  • 11:46 paravoid: upgrading pfw-codfw to newer JunOS
  • 11:25 logmsgbot: hoo@tin Started scap: Deploy Capiunto (master) to testwiki, test2wiki and testwikidata - T126399
  • 11:08 godog: repool restbase on restbase1007
  • 11:02 moritzm: installing nettle security updates
  • 10:50 godog: start swiftrepl commons thumbs for top50 popular size T125791
  • 10:14 akosiaris: disable puppet, stop salt-minion on sca100{1,2}
  • 09:52 hashar: will cut the wmf branches this afternoon starting around 14:00 CET
  • 09:31 logmsgbot: hoo@tin Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 58s)
  • 02:34 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Feb 16 02:34:30 UTC 2016 (duration 8m 34s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 12m 28s)

2016-02-15

  • 21:26 logmsgbot: krinkle@tin Synchronized w/static.php: (no message) (duration: 00m 58s)
  • 21:25 ejegg: enabled CiviCRM queue consumer jobs
  • 21:25 ejegg: set CiviCRM back online
  • 20:55 twentyafterfour: moved contents of /srv/phab/dumps into /srv/dumps
  • 20:48 twentyafterfour: backing up /srv/phab to /srv/phab.bak in case it needs to be restored in a hurry. Note: the backup excludes /srv/phab/repos which is backed up separately
  • 20:47 twentyafterfour: brought iridium up to date with a current puppet run and checked that all repositories are at the correct tag
  • 20:38 eileen: updating civicrm from c009af16944a6478bd0292422f5bb0151f7a22c1 to b499564de8a2239b1c20acba0f8d52493c9a9c22
  • 20:17 ejegg: set CiviCRM to offline maintenance mode
  • 20:15 ejegg: disabled CiviCRM queue consumer jobs
  • 18:23 volans: Upgrading db1022 (MariaDB and kernel) already depooled and put in scheduled downtime
  • 16:46 yurik: Deployed patch for T126897
  • 16:38 logmsgbot: volans@tin Synchronized wmf-config/db-eqiad.php: depool db1022 (duration: 00m 58s)
  • 16:34 bd808: Ran sync-common on mw1119 after stuck scap update was killed there
  • 16:29 bd808: Killed stuck rsync by mw1119 that was keeping l10nupdate running
  • 16:28 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 855m 18s)
  • 16:27 moritzm: installing postgres security updates on maps*
  • 15:32 _joe_: restarting the salt minions on all deployment targets
  • 15:22 apergos: disabled puppet again on iridium, had to restore sshd config from filebucket after puppet run
  • 15:18 hoo|busy: Changed email for global account "Frau pomerenke"
  • 14:51 apergos: phab role commented out on iridium and puppet re-enabled, no phab changes will be applied there
  • 14:24 godog: start restbase1007-b cassandra instance, bootstrapping T119935
  • 12:30 _joe_: deployment master is now tin
  • 11:57 _joe_: switching the deployment host back to tin
  • 11:48 _joe_: restarted hhvm on mw1130, memory exhausted
  • 11:28 jynus: offlining disk 32:5 on db1063
  • 10:58 logmsgbot: hoo@mira Synchronized wmf-config/throttle.php: Throttle exception for de:WP:Wikimedia Deutschland/WPFF Berlinale2016 (duration: 01m 28s)
  • 09:38 godog: swift codfw-prod: ms-be2020 / ms-be2021 weight to 3500
  • 09:09 _joe_: powercycled mw1131, OOM'd

2016-02-14

  • 23:18 jynus: repaired table and restarted replication on sanitarium (s3)
  • 23:06 andrewbogott: moved tools dbs off of labsdb1002 via https://gerrit.wikimedia.org/r/#/c/270650/
  • 22:40 _joe_: labsdb lvm partition /srv unavailable because /dev/sdc is apparently broken
  • 16:55 _joe_: powercycling mw1140, stuck in OOM, no ssh no console login
  • 13:57 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: vslow, dump to db1054; do not send api to db1054; 1036 only rc (duration: 01m 17s)
  • 13:14 logmsgbot: hoo@mira Synchronized wmf-config/db-eqiad.php: Depool db1021 (duration: 01m 26s)
  • 02:40 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Feb 14 02:40:27 UTC 2016 (duration 9m 8s)
  • 02:31 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.13) (duration: 12m 01s)

2016-02-13

  • 06:30 urandom: `nodetool stop -- COMPACTION && nodetool cleanup' on restbase1002.eqiad, an abundance of caution (https://phabricator.wikimedia.org/P2612)
  • 04:37 urandom: (ephemerally) dropping compactor thread count from 10 to 8 on restbase1002.eqiad
  • 01:18 mutante: omg testing this log feature that logs straight to tickets (T108720)
  • 00:42 logmsgbot: krinkle@mira Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 01m 14s)
  • 00:28 logmsgbot: krinkle@mira Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 01m 17s)
  • 00:10 mutante: ruthenium - restarting parsoid, now works out of /srv/

2016-02-12

  • 23:37 mutante: ruthenium - moving parsoid path, cleaning up old resources
  • 22:33 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.13/extensions/MobileFrontend/extension.json: I315628aef3: Don't use 'qlow' for NetSpeed=B (duration: 01m 16s)
  • 21:30 logmsgbot: tgr@mira Synchronized wmf-config/InitialiseSettings.php: T125455: log session-ip channel to logstash (duration: 01m 17s)
  • 21:22 logmsgbot: ori@mira Synchronized docroot and w: Ifc5b02cba4: Speed trials: add no-srcset variant (duration: 01m 16s)
  • 19:26 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.13/extensions/Disambiguator/Disambiguator.hooks.php: Check for array index existence (7b5f87f) (T126651) (duration: 01m 15s)
  • 19:16 bd808: Wikis back up thankfully
  • 19:12 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.13/includes/DefaultSettings.php: Log multiple IPs using the same session or the same user account (4d8b8ca) (duration: 01m 16s)
  • 18:34 logmsgbot: krenair@mira Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/270143/ (duration: 01m 15s)
  • 18:09 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.13/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/270328/ (duration: 01m 16s)
  • 17:55 logmsgbot: krenair@mira Synchronized wmf-config/interwiki.php: https://gerrit.wikimedia.org/r/#/c/270327/ (duration: 01m 18s)
  • 17:39 mutante: wikibugs broken in operations and other channels
  • 17:13 _joe_: reloading apache on all the appservers
  • 17:06 hashar: CI is processing jobs again. Nodepool instances are spawning
  • 17:03 _joe_: soft-reloading apache on half of appservers
  • 16:39 jynus: purging rows from analytics-slave as requested (eventlogging database)
  • 09:39 godog: restbase1001 nodetool cleanup && nodetool stop compaction
  • 09:36 godog: restbase1002 cleanup nodetool cleanup local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ && stop compactions
  • 07:55 moritzm: repooling elastic1023, hw problem has been fixed
  • 05:49 bblack: re-enabling cr2-knams xe-0/0/2 (Tele2)
  • 05:48 gwicke: restbase *staging*: started low-concurrency test dump run from ruthenium against xenon
  • 01:26 mobrovac: restbase deploy end of 6f6311f
  • 01:22 mobrovac: restbase2002 re-enable puppet
  • 00:45 logmsgbot: krenair@mira Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/270145/ (duration: 01m 16s)
  • 00:35 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.13: https://gerrit.wikimedia.org/r/#/c/270126/ (duration: 02m 26s)
  • 00:34 mobrovac: restbase deploy start of 6f6311f
  • 00:32 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.13/autoload.php: https://gerrit.wikimedia.org/r/#/c/270126/ (duration: 01m 16s)
  • 00:30 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.13/includes/session/MetadataMergeException.php: https://gerrit.wikimedia.org/r/#/c/270126/ (duration: 01m 14s)
  • 00:26 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/270140/ (duration: 01m 17s)
  • 00:19 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/270121/ (duration: 01m 14s)
  • 00:18 ori: Depooled mw2173
  • 00:17 urandom: `nodetool stop -- COMPACTION' on restbase1002.eqiad to free disk space (https://phabricator.wikimedia.org/P2598)
  • 00:00 mutante: add niedzielski to nda LDAP group (T106064)

2016-02-11

  • 23:53 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.13/extensions/MobileFrontend: I1fe8538eb67: Provide low-resolution NetSpeed option in Special:MobileOptions (duration: 01m 22s)
  • 23:28 logmsgbot: krenair@mira Synchronized w/static/images/project-logos/adywiki.png: https://gerrit.wikimedia.org/r/#/c/268004/ (duration: 01m 19s)
  • 23:27 XenoRyet: updated payments-wiki from fad669c99db8240b26a524aa70c85cfebd13a18c to 5b909f06acce6444186ac02a494439c2ddd624aa
  • 23:26 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/268004/ (duration: 01m 19s)
  • 23:24 logmsgbot: krenair@mira Synchronized langlist: https://gerrit.wikimedia.org/r/#/c/268004/ (duration: 01m 21s)
  • 23:22 logmsgbot: krenair@mira rebuilt wikiversions.php and synchronized wikiversions files: (no message)
  • 23:22 logmsgbot: krenair@mira Synchronized dblists: https://gerrit.wikimedia.org/r/#/c/268004/ (duration: 01m 23s)
  • 22:21 mobrovac: restbase1009 removed dpkg hold on nodejs
  • 22:17 bblack: disable port xe-0/0/2 on cr2-knams (tele2)
  • 20:08 mutante: mendelevium (OTRS) - delete unused ssl cert files, shred key
  • 20:01 logmsgbot: demon@mira rebuilt wikiversions.php and synchronized wikiversions files: remaining wikis to wmf.13
  • 19:45 godog: restart swift on ms-be1008, sdd offlined
  • 19:40 logmsgbot: demon@mira Synchronized multiversion/MWWikiversions.php: newlines redux (duration: 02m 13s)
  • 19:27 mutante: ruthenium - chmod g+w on git repo so wikidev members can deploy
  • 18:29 bblack: concluding manual experiment on cp1065 (puppet re-enabled)
  • 18:22 yurik: updated kartotherian https://gerrit.wikimedia.org/r/#/c/270016/
  • 18:12 elukey: Add mc1012/mc1013 to redis/memcached pools after maintenance.
  • 18:01 urandom: `nodetool stop -- COMPACTION' on restbase1001.eqiad to free disk space (https://phabricator.wikimedia.org/P2596)
  • 17:06 elukey: disabled puppet, memcached, redis on mc1012/mc1013 for maintenance
  • 17:04 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert enable math extension on Wikitech. Namespace configuration on ne.wikibooks (duration: 02m 15s)
  • 16:57 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Enable Math extension on Wikitech"" gerrit:269770 (duration: 02m 13s)
  • 16:51 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.13/extensions/SemanticForms/includes/SF_Utils.php: SWAT: Semantic form path fixes gerrit:269869 (duration: 02m 14s)
  • 16:44 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Change Nepali Wikibooks sitename and logo Part II gerrit:267170, Clean duplicate wgCopyUploadsDomains setting gerrit:268419 (duration: 02m 14s)
  • 16:41 logmsgbot: thcipriani@mira Synchronized w/static/images/project-logos/newikibooks.png: SWAT: Change Nepali Wikibooks sitename and logo Part I gerrit:267170 (duration: 02m 13s)
  • 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/CommonSettings.php: SWAT: Add $wgGraphAllowedDomains setting for future gerrit:269896 (duration: 02m 13s)
  • 16:27 elukey: Removed mc1012/mc1013 from the redis/memcached pools for maintenance.
  • 16:26 bblack: puppet disabled on cp1065 for some live and careful experimentation...
  • 16:20 logmsgbot: thcipriani@mira Synchronized wmf-config/wikitech.php: SWAT: Add wgOpenStackManagerNovaIdentityV3URI to wikitech configs gerrit:269558 (duration: 02m 11s)
  • 16:16 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new WEF enwiki IP rate limit exception gerrit:269862 (duration: 02m 13s)
  • 16:11 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: wgRCWatchCategoryMembership true on wikisource gerrit:264735 (duration: 02m 14s)
  • 15:43 elukey: Add mc1010/mc1011 back to the redis/memcached pools after maintenance.
  • 15:24 elukey: puppet re-enabled on mc1010/mc1011
  • 14:42 elukey: Stopped puppet, memcached, redis on mc1010/mc1011 for maintenance
  • 14:07 elukey: removed mc1010/mc1011 from the redis/memcached pools for maintenance
  • 14:01 urandom: `nodetool stop -- COMPACTION' on restbase1002.eqiad to free disk space (current state: https://phabricator.wikimedia.org/P2593)
  • 13:47 jynus: backing up, reimaging, restarting and defragmenting db1024 (old s2 master)
  • 13:37 paravoid: downgrading berkelium to Linux 3.19 & rebooting
  • 13:34 apergos: reenabled scap runs (removed flag)
  • 13:21 jynus: migrating dbstore1001 to the new s2-master
  • 13:00 paravoid: upgrading backup LVS servers to Linux 4.4.0 across all sites
  • 12:42 elukey: Add mc1008/mc1009 back into redis/memcached pools after maintenance
  • 12:40 moritzm: uploaded nodejs 4.3.0 for jessie-wikimedia to carbon
  • 12:13 elukey: re-enabled puppet on mc1009
  • 12:07 elukey: enabled puppet on mc1008
  • 11:54 _joe_: stopped scap runs for now (touched sync.flag)
  • 11:52 moritzm: updated xenon, praseodymium and cerium to nodejs 4.3.0
  • 11:46 apergos: removed scap3 deb install from all hosts in prod
  • 11:33 elukey: disabled puppet, memcached, redis on mc1008/mc1009 for maintenance
  • 11:12 jynus: moving pending s2 slaves to the new master
  • 10:52 elukey: Correction about mc1008/mc1009 - puppet not disabled, will be done later on.
  • 10:50 elukey: removed mc1008/mc1009 from redis/memcached pools. Puppet disabled.
  • 10:03 godog: ms-be2018 / ms-be2019 swift weight to 3500
  • 09:52 elukey: re-added mc1007.eqiad back into redis/memcached pools after maintenance
  • 09:39 moritzm: depooled elastic1023 for hardware problems (T126586)
  • 09:31 elukey: re-enabled puppet on mc1007.eqiad
  • 09:04 jynus: restarting HHVM on all running mediawiki job queue processors
  • 08:46 _joe_: enabled puppet on mw1017, was disabled with no reason
  • 08:45 elukey: Disabling puppet, redis, memcached on mc1007 for maintenance
  • 08:33 elukey: removed mc1007 from the redis/memcached pool for Jessie migration
  • 06:56 _joe_: removing content from sd{d,j}1 written on the root partition on ms-be1008
  • 06:39 _joe_: stopped swift on ms-be1008; disk full because objects were being written to the root partition
  • 06:16 bd808: Re-enabled puppet on logstash1003 and forced puppet run
  • 06:02 bd808: Raised logstash mem limit to -Xms512m on logstash1003
  • 05:57 bd808: disabled puppet on logstash1003 to debug configuration
  • 05:40 bd808: logstash process on logstash1003 flapping; continuing to investigate
  • 05:25 bd808: Restarted logstash on logstash1003; killed by OOM killer
  • 02:16 ori: restarted HHVM on mw1130; locked up (T89912)
  • 02:15 bd808: Fetched 1271d36 to tin:/srv/mediawiki-staging; should have been updated by sync-wikiversions
  • 01:58 logmsgbot: tgr@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.13 again
  • 01:47 mutante: labservices1001 - out of disk
  • 01:46 mutante: ms-be1008 - delete/gzip large syslog, out of disk
  • 01:41 logmsgbot: tgr@mira Synchronized wmf-config/mobile.php: fix for T49647 (duration: 02m 15s)
  • 01:05 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/269883/1 (duration: 02m 11s)
  • 00:53 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263051/ (duration: 02m 13s)
  • 00:48 logmsgbot: krenair@mira Synchronized w/static/images/project-logos/wuuwiki.png: https://gerrit.wikimedia.org/r/#/c/263051/ (duration: 02m 13s)
  • 00:44 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264937/ (duration: 02m 15s)
  • 00:35 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/269827/1 (duration: 02m 13s)
  • 00:32 logmsgbot: krenair@mira Synchronized w/static/images/project-logos/hiwikiquote.png: https://gerrit.wikimedia.org/r/#/c/269827/1 (duration: 02m 13s)
  • 00:12 logmsgbot: krenair@mira Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/259439/ (duration: 02m 20s)

2016-02-10

  • 23:49 tgr: switched mw1017 to wmf.13 (all groups)
  • 23:38 logmsgbot: ori@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis back to php-1.27.0-wmf.12
  • 23:34 ori: Restarted HHVM on mw1017
  • 23:27 urandom: performing rolling restart of Cassandra in restbase staging (experimental gc settings)
  • 23:15 logmsgbot: ori@mira Synchronized wmf-config/mobile.php: I6946eccf9c: Better hack for T49647 (duration: 02m 19s)
  • 23:10 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.12/includes/WebResponse.php: I13fcc3ce4: Allow changing cookie options in WebResponseSetCookie hook (duration: 01m 30s)
  • 23:08 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.13/includes/WebResponse.php: I13fcc3ce4: Allow changing cookie options in WebResponseSetCookie hook (duration: 01m 37s)
  • 22:49 logmsgbot: krinkle@mira Synchronized w/static.php: (no message) (duration: 02m 18s)
  • 22:40 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.13/extensions/MobileFrontend/resources/mobile.loggingSchemas/SchemaEdit.js: https://gerrit.wikimedia.org/r/#/c/269755/ (duration: 02m 23s)
  • 22:36 urandom: rolling restart of Cassandra staging complete (experimental gc settings)
  • 22:35 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/extensions/MobileFrontend/resources/mobile.loggingSchemas/SchemaEdit.js: https://gerrit.wikimedia.org/r/#/c/269848/ (duration: 02m 18s)
  • 22:32 yurik: deployed and restarted kartotherian services
  • 22:28 urandom: performing rolling restart of Cassandra in restbase staging (experimental gc settings)
  • 22:23 yurik: deployed and restarted tilerator & tileratorui services
  • 22:15 ori: Restarted apache on palladium and strontium
  • 21:21 subbu: finished deploying parsoid version 8976ab93
  • 21:16 hashar: CI dust has settled. Krinkle and I have pooled a lot more Trusty slaves to accommodate the overload caused by switching to php55 (jobs run on Trusty)
  • 21:14 subbu: synced code + restarted parsoid on wtp1001 as a canary
  • 21:10 subbu: starting parsoid deploy
  • 20:54 ejegg: updated DjangoBannerStats from 71df14d4d8b11f3ca0ef1eeb6c6e2db9be79103a to c143b8dd41ae413dab494ae3e194f40b8cd04bb1
  • 20:24 chasemp: tc per client shaping for labstore1001 test
  • 20:21 logmsgbot: demon@mira Synchronized php-1.27.0-wmf.13/includes/interwiki/Interwiki.php: fix cache stuff (duration: 02m 18s)
  • 20:17 apergos: cleaned up integrations slave trusty 1001,10012,10013, 1016, missed in first round.
  • 20:10 logmsgbot: demon@mira rebuilt wikiversions.php and synchronized wikiversions files: rebuild
  • 20:10 logmsgbot: demon@mira Synchronized multiversion/MWWikiversions.php: rm newline addition (duration: 02m 19s)
  • 20:03 logmsgbot: demon@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.13
  • 19:45 apergos: did cleanup across all integration slaves, some were very close to out of room. results: https://phabricator.wikimedia.org/P2587
  • 19:26 mutante: ms-be1008 - powercycling - the known XFS issue
  • 19:04 mobrovac: restbase disabled puppet in staging, testing brotli compression which requires JAVA_OPTS tuning
  • 18:30 mutante: puppetstoredconfigclean.rb iodine.wikimedia.org, revoke puppet cert, delete salt key on new master
  • 18:15 mutante: iodine - shutdown, decom
  • 18:10 mutante: iodine - schedule downtime, stop puppet, stop salt, ..
  • 17:21 logmsgbot: demon@mira Synchronized multiversion/MWWikiversions.php: newlines in wikiversion.json (duration: 02m 21s)
  • 17:15 logmsgbot: demon@mira Synchronized errorpages/404.html: minor html fix (duration: 02m 17s)
  • 17:12 logmsgbot: thcipriani@mira Finished scap: VE, Cite, and Citoid bumps gerrit:269592 gerrit:269593 gerrit:269575 gerrit:269590 (duration: 29m 53s)
  • 17:02 elukey: readded mc1006 back into redis/memcached pool
  • 16:42 logmsgbot: thcipriani@mira Started scap: VE, Cite, and Citoid bumps gerrit:269592 gerrit:269593 gerrit:269575 gerrit:269590
  • 16:35 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable Math extension on Wikitech" (duration: 02m 14s)
  • 16:24 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Math extension on Wikitech gerrit:269694 (duration: 02m 14s)
  • 16:18 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespaces configuration on ru.wikisource gerrit:269702 (duration: 02m 10s)
  • 16:09 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable specialcx campaign gerrit:269416 (duration: 02m 22s)
  • 15:51 jynus: recreating labsdb heartbeat views to correctly measure lag from the new s2-master
  • 15:46 elukey: re-enabled puppet on mc1006.eqiad
  • 15:42 ostriches: gerrit: flushed all caches, things will be slow for a bit while they warm
  • 15:22 elukey: disabled puppet/memcached/redis on mc1006.eqiad
  • 15:02 bblack: SPDY disable for cache_text: test starts in a few minutes! - https://phabricator.wikimedia.org/T125979
  • 14:48 elukey: removed mc1006 from the redis/memcached pool for Jessie migration
  • 13:48 godog: restbase1002 nodetool setstreamthroughput 500
  • 13:24 elukey: adding mc1005.eqiad back into service (redis/memcached)
  • 12:51 jynus: restarting hhvm at mw1015 - db errors continue
  • 12:42 Krinkle: Purged MediaWiki/wmfstatic/* metrics in Graphite (spurious test data)
  • 12:35 logmsgbot: krinkle@mira Synchronized w/static.php: (no message) (duration: 02m 11s)
  • 12:33 moritzm: disabling puppet on mw1001-1009, mw1011-1016 to enable ferm in batches
  • 11:51 jynus: restarting mw1015 jobqueue/chron processing again
  • 11:44 godog: restbase1001 nodetool setstreamthroughput 500
  • 11:37 elukey: re-enabled puppet on mc1005.eqiad
  • 11:36 godog: restbase1001 nodetool setstreamthroughput 400
  • 11:16 godog: restbase1007 nodetool-a setstreamthroughput 350
  • 11:02 jynus: enabling semi-sync replication for s2 slaves
  • 10:58 elukey: disabling puppet, redis and memcached on mc1005 (preparation for Jessie migration)
  • 10:44 _joe_: removing mw1051-mw1070 from the appservers pool (T126242)
  • 10:26 elukey: Removing mc1005.eqiad from the redis/memcached pools
  • 10:20 godog: ms-be2016 / ms-be2017 swift weight to 3500
  • 08:24 godog: removenode restbase1007-a finished, start cassandra-a on restbase1007 for bootstrap
  • 07:54 apergos: cleared out a bunch of clones from /mnt/jenkins-workspace/workspace on integration-slave-trusty-1016, /mnt was full preventing jenkins from completing e.g. https://integration.wikimedia.org/ci/job/operations-puppet-typos/50458/console
  • 07:20 ottomata: puppet disabled on analytics1027 til tomorrow while cron is disabled and CirrusSearchRequestSet backfills into Hadoop from kafka
  • 06:19 mutante: integration-slave-trusty-1014 - out of disk? Jenkins voted things -1 because it had no space left on device
  • 02:01 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Reducing new s2 master weight for reads (duration: 02m 15s)
  • 01:49 logmsgbot: jynus@mira Synchronized wmf-config/db-codfw.php: Updating new master on codfw configuration (duration: 02m 15s)
  • 01:41 jynus: starting pt-heartbeat on db1018
  • 01:23 jynus: started db1048 replication. For some reason, replication was stopped. Need further investigation.
  • 01:14 logmsgbot: catrope@mira Synchronized wmf-config/InitialiseSettings.php: Set $wgPageLanguageUseDB = true on testwiki (duration: 02m 14s)
  • 01:12 logmsgbot: catrope@mira Synchronized docroot/noc/conf/highlight.php: Remove ob_start() from highlight.php (duration: 02m 13s)
  • 00:46 RoanKattouw: Running updateCollation.php on nlwiki
  • 00:31 logmsgbot: catrope@mira Synchronized wmf-config/CommonSettings.php: BetaFeatures wmg->wg rename, part 2 (duration: 02m 13s)
  • 00:28 logmsgbot: catrope@mira Synchronized wmf-config/InitialiseSettings.php: Set collation to uca-nl on nlwiki; add Recherche: to content namespaces on frwikiversity; BetaFeatures wmg->wg rename (duration: 02m 12s)
  • 00:20 logmsgbot: catrope@mira Synchronized wmf-config/InitialiseSettings.php: Increase completion suggester replicas for busy wikis (duration: 02m 11s)
  • 00:18 logmsgbot: catrope@mira Synchronized wmf-config/logging.php: Reduce Kafka timeouts (duration: 02m 13s)
  • 00:11 logmsgbot: catrope@mira Synchronized wmf-config/InitialiseSettings.php: Test HTML stripping in production mobile beta (duration: 02m 12s)

2016-02-09

  • 23:55 jynus: restarting jobrunner and jobchron
  • 23:38 jynus: setting db1018's binlog_format as STATEMENT
  • 23:30 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Disable read only mode for s2 after its master failover (duration: 02m 09s)
  • 23:27 _joe_: disabled puppet on mc1004, added "bind 0.0.0.0" to its redis config, restarted redis (T126395)
  • 23:23 jynus: setting db1018 in read/write mode
  • 23:22 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Actual mediawiki master failover (duration: 02m 14s)
  • 23:19 bd808: Changed /srv/mediawiki/wikiversions.php on mw1017 (X-Wikimedia-Debug) to set all wikis to 1.27.0-wmf.13
  • 23:18 jynus: setting db1024 in read only mode
  • 23:16 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Enabling read only mode for s2 before its master failover (duration: 02m 14s)
  • 23:09 jynus: setting up circular replication between db1018 and db1024 for potential rollback
  • 23:02 jynus: changing topology of s2 slaves in preparation for master failover
  • 22:04 logmsgbot: bd808@mira Synchronized wmf-config/logging.php: Monolog: reorder Monolog processors (b356eeb) (duration: 02m 15s)
  • 21:33 Krenair: ran package upgrades on wikitech-static
  • 20:37 bblack: restarting nginx for libssl update on cp1049.eqiad.wmnet,cp4008.ulsfo.wmnet,cp3042.esams.wmnet,cp3049.esams.wmnet
  • 20:32 logmsgbot: demon@mira Finished scap: all group0 to wmf.13 (duration: 29m 45s)
  • 20:25 bblack: cache kernel reboots done (all on '3.19.0-2-amd64 #1 SMP Debian 3.19.3-9 (2016-01-04)', except 4x canaries on '4.4.0-1-amd64 #1 SMP Debian 4.4-1~wmf1 (2016-01-26)')
  • 20:11 bblack: cp1067, cp1071 (text, upload in eqiad) -> 4.4 canaries (rebooting over the next ~8 mins or so)
  • 20:02 logmsgbot: demon@mira Started scap: all group0 to wmf.13
  • 19:55 hoo: Updated operations/dumps/dcat on snapshot1003 from 0a71deb232 to 92ab37d94e
  • 19:37 logmsgbot: demon@mira Finished scap: pruning tons of stale branches + sync wmf.13 files for later + testwiki to wmf.13 to build l10n cache (try 2) (duration: 27m 24s)
  • 19:35 bblack: cp3048 (upload esams) rebooting -> kernel 4.4 canary
  • 19:13 mutante: gerrit - add ppchelko to mediawiki-services
  • 19:11 bblack: cp4006 (upload ulsfo) rebooting -> kernel 4.4 canary
  • 19:09 logmsgbot: demon@mira Started scap: pruning tons of stale branches + sync wmf.13 files for later + testwiki to wmf.13 to build l10n cache (try 2)
  • 19:07 logmsgbot: demon@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2315818744" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 00m 34s)
  • 19:07 logmsgbot: demon@mira Started scap: pruning tons of stale branches + sync wmf.13 files for later + testwiki to wmf.13 to build l10n cache
  • 18:57 yurik: deployed graphoid
  • 18:05 jynus: bringing down db1048's mysql for cloning to db2012
  • 17:54 Krenair: ssh: connect to host mw1037.eqiad.wmnet port 22: Connection timed out
  • 17:53 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/extensions/OpenStackManager: https://gerrit.wikimedia.org/r/#/c/269439/ (duration: 03m 15s)
  • 17:26 elukey: mc1004.eqiad put back into redis/memcached pool
  • 17:23 godog: nodetool-a removenode ec0c5a3d-2648-4933-8434-a8d163b92188 in preparation for restbase1007 bootstrap
  • 17:22 bblack: rebooting cp1008/pinkunicorn for 4.4-rt kernel test
  • 17:19 _joe_: powered down mw1037
  • 17:07 godog: start cassandra-a on restbase1007 with replace_address=10.64.0.230
  • 16:57 logmsgbot: thcipriani@mira Finished scap: SWAT: Clarify and expand messages mentioning loss of session data gerrit:269424 (duration: 27m 36s)
  • 16:53 bblack: rebooting cp1008/pinkunicorn for 4.4 kernel
  • 16:34 jynus: reimage db2012
  • 16:30 logmsgbot: thcipriani@mira Started scap: SWAT: Clarify and expand messages mentioning loss of session data gerrit:269424
  • 16:18 logmsgbot: thcipriani@mira Synchronized wmf-config: SWAT: Enable ArticlePlaceholder on test wikis gerrit:269399 (duration: 01m 19s)
  • 16:15 thcipriani: mw1037.eqiad.wmnet error during SWAT rsync: failed to set times on "/srv/mediawiki/.": Read-only file system (30)
  • 16:09 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable math data type on Wikidata and everywhere gerrit:269398 (duration: 02m 31s)
  • 15:59 elukey: puppet re-enabled on kafka1012
  • 15:56 paravoid: "power"cycling alsafi
  • 15:55 moritzm: uploaded linux 4.4-1~wmf1 (jessie-wikimedia/experimental) to carbon
  • 15:48 _joe_: re-removed the puppet facts for protactinium
  • 15:40 paravoid: echo 1 > /proc/sys/net/ipv4/vs/schedule_icmp on lvs3001
  • 15:36 elukey: disabled puppet on kafka1012, changing temporary kafka retention to purge some extra logs
  • 15:17 cmjohnson1: snapshot1002 mistakenly taken offline -- booting now
  • 15:15 paravoid: upgrading lvs4001/4002 to linux 4.4.0
  • 15:07 godog: stop cassandra on restbase1007, cpu/mem upgrade and reimage
  • 14:59 paravoid: upgrading lvs3001/3002 to linux 4.4.0
  • 14:53 godog: reboot ms-be1004, xfs hosed
  • 14:51 hashar: Cutting branches 1.27.0-wmf.13
  • 14:46 elukey: re-enabled puppet on mc1004.eqiad
  • 14:45 bblack: resuming cpNNNN rolling kernel reboots
  • 14:41 _joe_: setting mw1026-1050 as inactive in the appservers pool (T126242)
  • 13:58 hashar: shutting down jenkins finally, and restarting it
  • 13:51 hashar: Restarting Jenkins. It cannot manage to add slaves
  • 13:15 paravoid: upgrading lvs1001/lvs1007/lvs1002/lvs1008/lvs1003/lvs1009 to 4.4.0
  • 13:11 akosiaris: reboot serpens to apply memory increase of 2G
  • 13:07 paravoid: installing linux 4.4.0 on lvs1001
  • 13:01 hashar: Jenkins disabled again :(
  • 12:53 akosiaris: reboot seaborgium to apply memory increase of 2G
  • 12:47 hashar: Updated faulty script that caused 'php' to loop infinitely. Jenkins back up.
  • 12:36 hashar: Jenkins no longer accepts new jobs until the slaves are fixed :/
  • 12:33 hashar: all CI slaves looping to death because of a php loop
  • 11:43 paravoid: upgrading lvs2001, lvs2002, lvs2003 to kernel 4.4.0
  • 11:36 paravoid: reverting lvs2005 to 3.19 and rebooting, test is over and was successful
  • 11:19 paravoid: stopping pybal on lvs2002
  • 11:05 paravoid: installing linux-image-4.4.0 on lvs2005 and rebooting for testing
  • 10:53 apergos: salt minions on labs instances that respond to labcontrol1001 will be coming back up over the next 1/2 hour as puppet runs (salt master key fixes)
  • 10:45 elukey: disabled puppet, redis and memcached on mc1004 for jessie migration
  • 10:33 _joe_: pybal updated everywhere
  • 10:32 gehel: elasticsearch codfw: cleanup leftover logs /var/log/elasticsearch/*.[2-7]
  • 10:24 gehel: elasticsearch eqiad: cleanup leftover logs /var/log/elasticsearch/*.[2-7]
  • 10:09 _joe_: upgrading pybal on active nodes in esams and eqiad
  • 10:04 _joe_: depooling elastic1021.eqiad.wmnet as RAM has failed
  • 09:56 jynus: running table engine conversion script on db1069 (potential small lag on labs for 1 day)
  • 09:40 moritzm: restarted cassandra-a service on praseodymium
  • 09:21 ema: restarted hhvm on mw1132
  • 08:49 _joe_: installing the new pybal package in esams and eqiad backups
  • 08:23 moritzm: restarted cassandra-a service on praseodymium
  • 07:11 _joe_: manually touched (with -h) the wmf-config/PrivateSettings.php symlink on all mw* hosts
  • 07:02 logmsgbot: tgr@mira Synchronized wmf-config/PrivateSettings.php: Mass logout via $wgAuthenticationTokenVersion - T124440#2010709 (duration: 01m 20s)
  • 01:58 legoktm: added SMalyshev to wikidata-query gerrit group
  • 01:34 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/extensions: https://gerrit.wikimedia.org/r/#/c/269344/ and https://gerrit.wikimedia.org/r/#/c/269293/1 (duration: 01m 51s)
  • 01:27 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/extensions/OAuth/frontend/specialpages/SpecialMWOAuthManageConsumers.php: https://gerrit.wikimedia.org/r/#/c/269333/ (duration: 01m 19s)
  • 01:24 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/resources/src: https://gerrit.wikimedia.org/r/#/c/269140/ (duration: 01m 19s)
  • 00:54 logmsgbot: krenair@mira Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/266509/ (duration: 01m 17s)
  • 00:52 logmsgbot: krenair@mira Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/266509/8 (duration: 01m 17s)
  • 00:50 logmsgbot: krenair@mira Synchronized wmf-config/ProductionServices.php: https://gerrit.wikimedia.org/r/#/c/266509/8 (duration: 01m 18s)
  • 00:42 hashar: killed Zuul scheduler. On gallium edited /usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/trigger/gerrit.py and modified: replication_timeout = 300 -> replication_timeout = 10 . Started Zuul
  • 00:26 logmsgbot: krenair@mira Synchronized portals: https://gerrit.wikimedia.org/r/#/c/268849/ (duration: 01m 18s)
  • 00:25 logmsgbot: krenair@mira Synchronized portals/prod/wikipedia.org/assets: https://gerrit.wikimedia.org/r/#/c/268849/ (duration: 01m 18s)
  • 00:17 logmsgbot: krenair@mira Synchronized portals: https://gerrit.wikimedia.org/r/#/c/268849/ (duration: 01m 17s)
  • 00:15 logmsgbot: krenair@mira Synchronized portals/prod/wikipedia.org/assets: https://gerrit.wikimedia.org/r/#/c/268849/ (duration: 01m 16s)

2016-02-08

  • 23:30 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.12/includes/interwiki/Interwiki.php: ac6e170fa5: Fix-up for I5a979f047031e (duration: 01m 18s)
  • 23:24 logmsgbot: bd808@mira Synchronized wmf-config/logging.php: logging: Collect mw1017 logs for debugging (9d6d0e0) (duration: 01m 18s)
  • 23:20 logmsgbot: bd808@mira Synchronized wmf-config/logging.php: logging: Send all udp2log eligible messages to $wmgDefaultMonologHandler (cd25586) (duration: 01m 17s)
  • 23:16 logmsgbot: bd808@mira Synchronized wmf-config/logging.php: Monolog: Add mwversion to udp2log log events (9b54967) (duration: 01m 18s)
  • 23:05 logmsgbot: bd808@mira Synchronized wmf-config/logging.php: Monolog: normalize messages before PSR3 expansion (e5ee5d8) (duration: 01m 18s)
  • 22:52 logmsgbot: ori@mira Synchronized wmf-config/missing.php: Ib5407c560: Update missing.php for interwiki.php (duration: 01m 18s)
  • 22:51 logmsgbot: ori@mira Synchronized docroot and w: Ifd7fe8c3c: createTxtFileSymlinks.sh: drop interwiki.cdb; add interwiki.php (duration: 01m 21s)
  • 22:43 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.12/includes/debug/logger/monolog/WikiProcessor.php: Add $wgVersion to MediaWiki\Logger\Monolog\WikiProcessor (3cea726) (duration: 01m 19s)
  • 21:51 logmsgbot: ori@mira Synchronized wmf-config/CommonSettings.php: Ie9bdd77fb: Use interwiki.php on all wikis; delete unused interwiki.json (duration: 01m 19s)
  • 21:31 logmsgbot: ori@mira Synchronized wmf-config/CommonSettings.php: I39c9ecd4b: Enable static PHP interwiki cache on mediawikiwiki and testwiki (duration: 01m 18s)
  • 21:28 ori: Restarting HHVM on mw1017 to wipe APC cache
  • 21:23 logmsgbot: ori@mira Synchronized wmf-config/CommonSettings.php: Ib599f9984a: Add interwiki.php; use it on mw1017 & on labs (2/2) (duration: 01m 16s)
  • 21:22 logmsgbot: ori@mira Synchronized wmf-config/interwiki.php: Ib599f9984a: Add interwiki.php; use it on mw1017 & on labs (1/2) (duration: 01m 20s)
  • 21:19 subbu: finished deploying parsoid version 4d44fcc7
  • 21:10 subbu: synced code; restarted parsoid on wtp1003 as a canary
  • 21:04 subbu: starting parsoid deploy
  • 21:02 bblack: resuming rolling reboots of cpNNNN caches for kernel updates
  • 20:34 mobrovac: restbase deploy end of c929ceb
  • 20:26 mobrovac: restbase deploy start of c929ceb
  • 19:36 logmsgbot: mattflaschen@mira Synchronized wmf-config/InitialiseSettings-labs.php: Beta Cluster-only change (duration: 01m 20s)
  • 19:04 papaul: sarin - signing puppet certs, salt-key, initial run
  • 19:03 urandom: restart restbase on restbase-test2001.codfw (staging)
  • 18:54 mobrovac: mathoid deployed 4bdb2f18c
  • 18:43 urandom: rolling Cassandra restart in restbase staging complete
  • 18:35 MatmaRex: Reopened 54 Phabricator tasks that someone merged into one; hope I haven't made more of a mess than there was before
  • 18:30 jynus: applying ferm on dbstore1001 and dbstore1002
  • 18:29 urandom: performing rolling restart of Cassandra in staging (to pickup /usr/share/cassandra/lib/cassandra-brotli-1.0.0-a64ce47.jar in classpath)
  • 18:25 elukey: re-enabled puppet on mc1004
  • 18:17 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-common.php: dont reindex wgCirrusSearchNamespaceWeights from 0 (duration: 01m 17s)
  • 18:13 chasemp: cleanup snapshots on labstore1001
  • 17:03 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set category collation on gd.wikipedia gerrit:267820 (duration: 01m 21s)
  • 16:59 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy Translate extension on ru.wikimedia gerrit:267822 (duration: 01m 17s)
  • 16:52 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable signature button for the Project namespace in ru.wiki gerrit:267997 (duration: 01m 19s)
  • 16:45 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespaces configuration on mai.wikipedia gerrit:268573 (duration: 01m 17s)
  • 16:39 logmsgbot: thcipriani@mira Synchronized wmf-config/mobile.php: SWAT: Use custom generator for mobile search on Wikidata Part II gerrit:254645 (duration: 01m 19s)
  • 16:37 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Use custom generator for mobile search on Wikidata Part I gerrit:254645 (duration: 01m 18s)
  • 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Do not request pageprops for mobile search/nearby on wikidata gerrit:268208 (duration: 01m 20s)
  • 16:28 logmsgbot: thcipriani@mira Synchronized wmf-config/Wikibase-production.php: SWAT: Add $wgWBRepoSettings[sparqlEndpoint] gerrit:268467 (duration: 01m 18s)
  • 16:27 elukey: restarted nutcracker in G@cluster:appserver and G@site:eqiad due to connect error issues (5 hosts per batch)
  • 16:27 _joe_: reinstalling pybal's new version (reduced) on ulsfo and codfw caches
  • 16:24 jynus: reverting slaves topology back to db1024 master
  • 16:19 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: wgRCWatchCategoryMembership true everywhere except wikisource gerrit:264734 (duration: 01m 26s)
  • 16:01 andrewbogott: restarting pdns on labservices1001 to test loglevels
  • 15:30 elukey: stopping redis and memcached for mc1004.eqiad.wmnet due to Jessie re-image
  • 15:30 chasemp: restarting pdns and pdns-recursor on labservices1001
  • 15:01 jynus: restarting and upgrading dbstore1001 (db backups agent host)
  • 14:41 bblack: mobile LVS service decom complete (IPs now belong to text service)
  • 14:03 bblack: starting mobile LVS service decom (IPs moving to text) - puppet disabled on text caches and high-traffic1 LVSes
  • 13:56 bblack: cpNNNN rolling reboots paused (3038 still coming up)
  • 13:12 bblack: start up more rolling cache reboots for kernels (cpNNNN)
  • 13:09 elukey: updated hhvm on mw2016.codfw.wmnet, mw2161.codfw.wmnet, mw2199.codfw.wmnet, mw1259.eqiad.wmnet, mw1260.eqiad.wmnet
  • 13:05 _joe_: roll back installation of pybal, issues with udp and ipv6
  • 12:56 elukey: updated hhvm on mw1080, mw1084, mw1241
  • 12:32 elukey: restarting hhvm on mw1052, mw1075, mw1080, mw1081, mw1094, mw1095 to rollout the new version
  • 12:32 _joe_: uploaded a new pybal package; installing on codfw and ulsfo backups
  • 12:05 _joe_: restarted cron on tin, to catch up with the uid change for the l10nupdate user
  • 11:53 bblack: rebooting cp1074, cp3047 (for kernels, also to compare bios/drac settings...)
  • 11:26 jynus: stopping mysql at db2012
  • 11:25 jynus: starting mysql at db2012
  • 11:05 moritzm: rebooting db2012 for kernel update
  • 11:00 moritzm: rebooting terbium for kernel update
  • 10:26 moritzm: rebooting es2006,es2008 for kernel update
  • 10:25 moritzm: upgrading jobrunners/imagescalers in eqiad for hhvm float timeout fix
  • 10:20 jynus: changing s2 replication topology in preparation for master failover
  • 09:45 jynus: starting es2004
  • 09:29 moritzm: rebooting es2005,es2007,es2009,es2010 for kernel update
  • 09:15 elukey: hhvm restarted on mw1044.eqiad.wmnet due to hhvm package update
  • 09:15 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Feb 8 09:15:11 UTC 2016 (duration 8m 10s)
  • 09:12 elukey: hhvm restarted on mw1034.eqiad.wmnet due to hhvm package update
  • 09:07 logmsgbot: oblivian@tin sync-l10n completed (1.27.0-wmf.12) (duration: 11m 55s)
  • 08:42 _joe_: trying a manual run of l10nupdate since it failed last night again
  • 08:25 moritzm: rebooting es2001 to es2004 for kernel update

2016-02-07

  • 04:54 andrewbogott: upgraded python-openstackclient python-glanceclient python-novaclient python-keystoneclient on silver

2016-02-06

  • 05:43 bblack: rebooted cp2006 via racadm after crash - no crash data in logs...

2016-02-05

  • 23:54 chasemp: nfs shaping is really writes :)
  • 23:54 chasemp: tc to shape some nfs read traffic in tools for labs (also logged there) can be cancelled with: /sbin/tc qdisc del dev eth0 root
  • 23:51 YuviPanda: dropped old nfs snapshots from labstore1001
  • 23:30 logmsgbot: maxsem@mira Synchronized portals: (no message) (duration: 01m 18s)
  • 23:29 logmsgbot: maxsem@mira Synchronized portals/prod/wikipedia.org/assets: (no message) (duration: 01m 19s)
  • 22:56 jynus: reimaging db1018
  • 22:48 jynus: restarting slave on m2/codfw (db2011)
  • 22:41 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/268818/ (duration: 01m 22s)
  • 22:10 bblack: cache rolling reboots stopped for the weekend, can pick up the other half monday
  • 20:36 bblack: resuming rolling cache reboots
  • 20:07 mutante: cygnus - reboot VM
  • 19:28 bblack: halted rolling cache reboots, we seem to be having problems with a batch of them coming back...
  • 18:23 logmsgbot: demon@mira Synchronized wmf-config/InitialiseSettings.php: comment stuff, gerrit 267994 (duration: 01m 19s)
  • 18:15 jynus: stopping mysql@db1018 and starting to clone it for reimaging
  • 18:10 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1018 for maintenance (duration: 02m 12s)
  • 17:31 cmjohnson1: trouble shooting elastic1021
  • 17:08 bblack: rolling cpNNNN reboots are 27% complete, only two hosts so far failed to reboot on their own (but came up fine after manual racadm powercycle)
  • 16:20 ottomata: reenabling kafka1012 in analytics-eqiad kafka cluster
  • 16:03 jynus: reimaging db2030 to test jessie installer
  • 15:53 logmsgbot: oblivian@tin sync-l10n completed (1.27.0-wmf.12) (duration: 00m 08s)
  • 15:47 urandom: performing rolling restbase restart in staging env
  • 15:35 andrewbogott: rebooting silver for kernel update - wikitech outage will ensue
  • 15:33 urandom: re-restarting restbase on restbase1002.eqiad.wmnet,restbase1005.eqiad.wmnet,restbase1006.eqiad.wmnet,restbase1009.eqiad.wmnet (prior restarts may have happened before puppet run)
  • 15:29 andrewbogott: rebooting holmium for kernel update
  • 15:27 andrewbogott: rebooting labcontrol1002 for kernel update
  • 15:24 bblack: cp3005 didn't come back online during rolling reboot, investigating (remains depooled)
  • 15:22 _joe_: initializing mediawiki repos on tin
  • 15:22 andrewbogott: rebooting labnet1001 for kernel update
  • 15:15 urandom: restbase rolling restart complete
  • 15:08 urandom: performing rolling restbase restart to apply config change (https://gerrit.wikimedia.org/r/#/c/268611/)
  • 14:56 urandom: forcing puppet run and bouncing restbase on restbase1001.eqiad.wmnet (https://gerrit.wikimedia.org/r/#/c/268611/)
  • 14:41 elukey: confctl mw1228.eqiad.wmnet: weight changed 10 => 20
  • 14:24 moritzm: rebooting db2065 to db2070 for kernel update
  • 14:20 jynus: reimporting nlwiktionary revision into labs (expect some temporary lag on labs-s3)
  • 14:06 moritzm: rebooting db2060 to db2064 for kernel update
  • 13:34 bblack: starting rolling reboots of cp* (traffic cache hosts) for kernel updates
  • 12:51 moritzm: rebooting db2055 to db2059 for kernel update
  • 12:38 elukey: repooled mw1228.eqiad.wmnet
  • 12:34 moritzm: rebooting db2050 to db2054 for kernel update
  • 12:15 moritzm: rebooting db2045 to db2049 for kernel update
  • 12:07 jynus: reimporting nlwiktionary pages into labs
  • 12:05 logmsgbot: l10nupdate@tin LocalisationUpdate failed: git pull of core failed
  • 12:05 logmsgbot: l10nupdate@tin LocalisationUpdate failed: git clone of core failed
  • 11:54 moritzm: rebooting db2041 to db2044 for kernel update
  • 11:37 moritzm: rebooting db2038 to db2040 for kernel update
  • 11:36 godog: start swiftrepl replication pass of common thumbs eqiad -> codfw
  • 10:15 moritzm: rolling reboot of ocg* cluster
  • 02:27 mobrovac: restbase deploy end of caae1f7
  • 02:20 mobrovac: restbase deploy start of caae1f7
  • 01:52 Tim: deploying apache log format change following successful test on deployment-prep
  • 01:47 logmsgbot: aude@mira Finished scap: Re-add user rights messages in Echo (duration: 24m 37s)
  • 01:22 logmsgbot: aude@mira Started scap: Re-add user rights messages in Echo
  • 01:20 logmsgbot: aude@mira Synchronized php-1.27.0-wmf.12/extensions/Echo: Re-add user rights messages in Echo (duration: 01m 20s)
  • 01:06 logmsgbot: aude@mira Synchronized wmf-config/: Sync wikidata config changes for beta (duration: 01m 15s)
  • 01:00 mobrovac: restbase deploy end of 2aef1b67a0
  • 00:58 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: Set $wgEnotifMinorEdits to true on huwiki (duration: 01m 16s)
  • 00:53 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: Add museumvictoria.com.au to $wgCopyUploadsDomains (duration: 01m 17s)
  • 00:46 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: Re-enable ShortUrl on maiwiki, bhwiki and orwikisource, after creating db table (duration: 01m 18s)
  • 00:36 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: revert shorturl changes (duration: 01m 17s)
  • 00:29 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: Enable ShortUrl on maiwiki, bhwiki and orwikisource (duration: 01m 17s)
  • 00:26 akosiaris: restart apache on mendelevium.eqiad.wmnet. seems there's a memory leak, need to investigate tomorrow
  • 00:23 logmsgbot: aude@mira Synchronized wmf-config/InitialiseSettings.php: Re-enable category watch on wikipedia and commons (duration: 01m 18s)
  • 00:19 mobrovac: restbase deploy start of 2aef1b67a0 on rb1001

2016-02-04

  • 23:54 mobrovac: temporarily disabled puppet on restbase in prod to test https://gerrit.wikimedia.org/r/#/c/268597/
  • 23:29 mutante: rcs1001 - rebooting
  • 22:45 bblack: rebooting cp1060 to test traffic-pool stuff
  • 22:30 logmsgbot: demon@mira Synchronized wmf-config/: undo my cleanup grumble grumble (duration: 01m 16s)
  • 22:12 logmsgbot: demon@mira Synchronized wmf-config/: touch (duration: 01m 14s)
  • 22:10 logmsgbot: demon@mira Synchronized private/: touch (duration: 01m 15s)
  • 22:01 logmsgbot: demon@mira Synchronized wmf-config/PrivateSettings.php: touch symlink (duration: 01m 15s)
  • 21:57 logmsgbot: demon@mira Synchronized wmf-config/: gerrit 268471, 268454 (duration: 01m 18s)
  • 21:52 mutante: mira chgrp -R wikidev /srv/mediawiki-staging/.git/objects
  • 21:29 mutante: rcs1001 started redis
  • 21:21 paravoid: setting up OSPF/OSPF3/PIM between ulsfo and codfw (cr2-ulsfo/cr1-codfw)
  • 21:19 mutante: rcs1002 - start redis
  • 21:15 mutante: rcs1002 - reboot for kernel
  • 20:45 mutante_: rcs1001 - schedule downtime, reboot
  • 20:30 paravoid: cr1-ulsfo: deactivating BGP peering with GTT
  • 20:26 mutante: eeden service ntp restart
  • 20:26 hashar: All wikis to 1.27.0-wmf.12. No troubles so far, congratulations to everyone involved @wikimedia #wikimedia
  • 20:23 mutante: mw1115 service hhvm restart
  • 20:16 mutante: mw1117 - powercycled
  • 20:15 paravoid: cr1-ulsfo: turning up BGP with Zayo
  • 20:15 mutante: scb1001/scb1002 service mathoid restart
  • 20:03 logmsgbot: hashar@mira rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.12
  • 20:03 hashar: all wikis to 1.27.0-wmf.12 (yeah really)
  • 20:00 mobrovac: restbase rolling restart after merging https://gerrit.wikimedia.org/r/#/c/268016/
  • 19:54 logmsgbot: demon@mira Synchronized private/PrivateSettings.php: (no message) (duration: 02m 05s)
  • 19:43 ottomata1: rebooting kafka1012
  • 19:32 logmsgbot: demon@mira Synchronized wmf-config/InitialiseSettings.php: T125850 (duration: 02m 11s)
  • 19:00 logmsgbot: demon@mira Synchronized wmf-config/InitialiseSettings-labs.php: prod no op for completeness (duration: 03m 02s)
  • 18:49 logmsgbot: demon@mira Synchronized wmf-config/: removing old mwblocker.log (duration: 02m 07s)
  • 18:45 logmsgbot: demon@mira Synchronized private/mwblocker.log: (no message) (duration: 02m 10s)
  • 18:40 yurik: deployed and reenabled tilerator & tileratorui
  • 17:30 logmsgbot: hoo@mira Synchronized php-1.27.0-wmf.12/extensions/Wikidata: Fix editing terms in languages other than the interface language via the term box (duration: 02m 18s)
  • 17:29 hoo: (Re)started wdqs-updater on wdqs1001, but it seems not to work
  • 16:53 twentyafterfour: restarted phd to synchronize settings with phabricator
  • 16:52 twentyafterfour: restarted apache2 on iridium so that phabricator recognizes sprint.phragile-uri
  • 16:45 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.12/includes/media/Bitmap.php: SWAT: BitmapHandler: Implement validateParam() gerrit:268407 (duration: 02m 08s)
  • 16:07 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable confirmed group at nowiki gerrit:267804 (duration: 02m 15s)
  • 15:53 urandom: rolling restbase restart complete
  • 15:47 urandom: restbase cluster puppet run complete; performing rolling restart of restbase (applying https://gerrit.wikimedia.org/r/#/c/266297/)
  • 15:44 moritzm: rebooting db203[678] for kernel update
  • 15:35 urandom: forcing puppet run on restbase cluster (config deploy)
  • 15:34 urandom: reenabling puppet on restbase cluster (continue config deploy)
  • 15:18 urandom: restarting restbase on restbase1001.eqiad.wmnet (config deploy)
  • 15:15 urandom: re-enabling puppet and forcing run on restbase1001.eqiad.wmnet (canary config deploy)
  • 15:11 _joe_: restarting pybal on lvs200{3,6}
  • 15:08 urandom: disabling puppet on restbase cluster in preparation for configuration deploy (https://gerrit.wikimedia.org/r/#/c/266297/)
  • 15:07 moritzm: rebooting oxygen for kernel update
  • 14:58 ottomata: stopping eventlogging to reboot eventlog1001 for kernel update
  • 14:41 moritzm: rebooting db203[45] for kernel update
  • 14:09 hoo: Restarted blazegraph on wdqs1001
  • 13:57 godog: powercycle ms-be2020
  • 13:39 moritzm: continue rolling reboot of maps cluster for kernel update (2002-2004)
  • 12:21 jynus: starting mysql at db2009
  • 12:08 moritzm: rebooting db2001 to db2019 for kernel update
  • 11:44 jynus: dropping echo_* tables from labs
  • 11:19 dcausse: elastic codfw: resuming writes and setting cluster.routing.allocation.balance.threshold back to default (1%)
  • 10:35 dcausse: elastic codfw: freezing writes and setting cluster.routing.allocation.balance.threshold to 100% (fast recovery test)
  • 10:35 logmsgbot: hashar@mira Synchronized php-1.27.0-wmf.12/.gitmodules: Set branch in .gitmodules for extensions/Wikidata https://gerrit.wikimedia.org/r/#/c/268218/ (duration: 02m 08s)
  • 10:16 moritzm: rolling reboot of maps cluster for kernel update
  • 10:14 jynus: testing new replication filters from production's testwiki
  • 10:13 elukey: running smartctl -t long on kafka1012 (kafka not running, host de-pooled from the broker list)
  • 10:11 moritzm: repooling restbase2006
  • 10:01 jynus: applying live on the 7 sanitarium instances the newly puppet-configured labs replication filters
  • 09:57 moritzm: repooling restbase2005, depooling restbase2006 for kernel reboot/Java update
  • 09:46 dcausse: elastic in codfw: reducing the number of replicas from 0-3 to 0-2 for commonswiki_file
  • 09:46 moritzm: repooling restbase2004, depooling restbase2005 for kernel reboot/Java update
  • 09:39 ema: re-enabling puppet on mw1161
  • 09:34 moritzm: depooling restbase2004 for kernel reboot/Java update
  • 09:11 jynus: converting remaining InnoDB tables (s3) to TokuDB on db1069
  • 08:14 chasemp: iridium puppet agent --enable && puppet agent --disable "DO NO ENABLE AS IT WILL BREAK THINGS CONTACT MUKUNDA"
  • 07:51 twentyafterfour: phabricator repositories checked out to these revisions: http://pastebin.com/JxEaYKiW
  • 07:49 chasemp: git checkout tag release/2015-11-18/1 for phab & libphutil on iridium
  • 07:35 andrewbogott: disabling puppet on iridium to prevent it from smashing phabricator (as it seems to do now and then)
  • 07:00 andrewbogott: on iridium in /srv/deployment/phabricator/deploy/phabricator, naming the currently detached git branch 'andrewfounditlikethis'
  • 06:49 robh: phabricator down with errors during repo updates in phd daemon log
  • 02:12 mutante: OTRS - changed motd message in /opt/otrs/Kernel/Output/HTML/Templates/Standard/Motd.tt - admins can turn it on and off
  • 01:04 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/tests: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 08s)
  • 01:01 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/includes/parser: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 25s)
  • 01:00 moritzm: rebooting iridium (phabricator host) for kernel update
  • 00:42 YuviPanda: yuvipanda@labstore2001:~$ sudo lvremove backup/maps20160121040005
  • 00:41 YuviPanda: yuvipanda@labstore2001:~$ sudo lvremove backup/tools20160121020007
  • 00:04 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.12

2016-02-03

  • 23:53 moritzm: repooling restbase2002, depooling restbase2003 for kernel/Java update
  • 23:39 moritzm: repooling restbase2001, depooling restbase2002 for kernel/Java update
  • 23:36 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.12
  • 23:29 hashar: passing wmf12 responsibility to thcipriani. Crashing to bed myself.
  • 23:22 moritzm: depooling restbase2001 for kernel/Java update
  • 23:15 moritzm: rebooting wdqs1002 for kernel update
  • 23:08 hashar: Full script of my deployment session is on mira.codfw.wmnet:/home/hashar/wmf12-deploy.script
  • 23:07 logmsgbot: hashar@mira rebuilt wikiversions.php and synchronized wikiversions files: Clarify only testwiki and test2wiki are on php-1.27.0-wmf.12
  • 23:07 moritzm: rebooting wdqs1001 for kernel update
  • 22:51 hashar: test / test2 wikis are incredibly slow. Filed https://phabricator.wikimedia.org/T125727
  • 22:47 subbu: finished deploying parsoid sha 98619f7f
  • 22:43 logmsgbot: hashar@mira rebuilt wikiversions.php and synchronized wikiversions files: test2wiki to php-1.27.0-wmf.12
  • 22:43 hashar: sync-wikiversions "test2wiki to php-1.27.0-wmf.12"
  • 22:41 moritzm: repooling restbase1009
  • 22:38 logmsgbot: hashar@mira Finished scap: to properly sync other master tin due to l10nupdate uid mismatch (duration: 24m 27s)
  • 22:34 moritzm: repooling restbase1006, depooling restbase1009 for kernel/Java update
  • 22:34 hashar: Still looking at test.wikipedia.org being super "slow". scap still rebuilding though
  • 22:32 ejegg: updated payments-wiki from 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03 to fad669c99db8240b26a524aa70c85cfebd13a18c
  • 22:21 moritzm: repooling restbase1005, depooling restbase1006 for kernel/Java update
  • 22:14 ejegg: rolled payments-wiki back to 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03
  • 22:14 hashar: https://test.wikipedia.org/ switched to 1.27.0-wmf.12
  • 22:13 logmsgbot: hashar@mira Started scap: to properly sync other master tin due to l10nupdate uid mismatch
  • 22:13 subbu: restarted parsoid on wtp1002 as a canary
  • 22:13 logmsgbot: hashar@mira Finished scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions) (duration: 20m 23s)
  • 22:07 moritzm: repooling restbase1004, depooling restbase1005 for kernel/Java update
  • 22:06 ejegg: updated payments-wiki from 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03 to 52afbc735ef5d759fd42bef072bed286fe3a5581
  • 22:06 subbu: starting parsoid deploy
  • 22:03 mutante: mira, tin: find /srv/mediawiki-staging/ -uid 1001 -exec chown 10002 {} \;
  • 21:53 hashar: reopened https://phabricator.wikimedia.org/T119165 l10nupdate user uid mismatch between tin and mira
  • 21:52 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions)
  • 21:51 mutante: tin - find / -uid 1001 -exec chown 10002 {} \;
  • 21:49 mutante: tin - fixing UID of l10nupdate user (T119165)
  • 21:45 moritzm: depooling restbase1004 for kernel/Java update
  • 21:45 moritzm: repooling restbase1003
  • 21:35 hashar: mismatching uid for l10nupdate user between mira and tin
  • 21:34 logmsgbot: hashar@mira scap aborted: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions) (duration: 07m 41s)
  • 21:32 moritzm: depooling restbase1003 for kernel/Java update
  • 21:27 moritzm: repooling restbase1008
  • 21:26 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions)
  • 21:25 hashar: mira: had to hard reset CentralNotice / SemanticMediaWiki / SemanticResultFormats / Validator after we pointed them from master to their proper branch; the submodule update attempted a rebase automatically. That is a no-no.
  • 21:14 moritzm: depooling restbase1008 for kernel/Java update
  • 21:08 hashar: waiting for the submodule patch https://gerrit.wikimedia.org/r/#/c/268214/ to land and will scap again
  • 20:33 logmsgbot: hashar@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.bTBpxD6CuI" ' returned non-zero exit status 1 (duration: 01m 13s)
  • 20:32 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (after RandomRootPage had a dummy entry point added)
  • 20:31 logmsgbot: demon@mira Synchronized php-1.27.0-wmf.12/extensions/RandomRootPage/: unbreak (duration: 01m 19s)
  • 20:23 logmsgbot: hashar@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2188303825" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 01m 49s)
  • 20:21 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache
  • 20:20 hashar: Hacked wikiversions.json to only have testwiki on .12
  • 19:58 logmsgbot: demon@mira Synchronized wmf-config/InitialiseSettings.php: touch (duration: 01m 19s)
  • 19:49 logmsgbot: demon@mira Synchronized wmf-config/: fix wikibase/mobilefrontend config (duration: 01m 19s)
  • 19:48 robh: halting puppet on carbon for a few minutes to livehack a partition recipe change in netboot.cfg
  • 19:45 hashar: https://phabricator.wikimedia.org/T125672 blocking wmf.12 "Notice: Undefined variable: wgMFQueryPropModules in /srv/mediawiki/wmf-config/Wikibase.php on line 120"
  • 19:39 akosiaris: hot patch OTRS installation with https://github.com/OTRS/otrs/commit/c7ea6d64e02518e166fbac02f42f25dacad54342
  • 19:35 hashar: mira: manually fixed /php and /w/static/current symlinks to point back to .10 (wikiversions migrated them to .11 which we skip)
  • 19:30 moritzm: repooling restbase1002
  • 19:29 hashar: Create patches to update wikiversions.json
  • 19:24 hashar: Applying security patches on mira
  • 19:24 hashar: starting train deployment of 1.27.0-wmf.12
  • 19:09 csteipp: deployed patch for T125684
  • 19:08 moritzm: depooling restbase1002 for kernel/Java update
  • 18:38 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings-labs.php: Experiment one: Labs stripping HTML in beta (360e5af) (duration: 01m 19s)
  • 18:34 moritzm: rebooting californium for kernel update
  • 18:16 bblack: restarting pybal on lvs1001
  • 18:04 jynus: previous announcement was for db2011, not db2010
  • 18:02 jynus: starting slave IO thread on db2010
  • 17:32 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1060 after maintenance (duration: 01m 20s)
  • 17:22 mobrovac: restbase restarting rb1001
  • 17:03 mdholloway: mobileapps deployed 68e38ec
  • 17:02 bblack: restarting pybal on lvs1004 (not 1003!) T125397
  • 17:02 bblack: restarting pybal on lvs1003 T125397
  • 16:57 hashar: mira: updating /srv/mediawiki-staging/php-1.27.0-wmf.12 (prep deployment train)
  • 16:55 logmsgbot: thcipriani@mira Synchronized wmf-config/CirrusSearch-production.php: SWAT: Return more like search queries to codfw gerrit:268097 (duration: 01m 17s)
  • 16:45 logmsgbot: thcipriani@mira Synchronized wmf-config/CommonSettings.php: SWAT: Remove unused/no longer existing item-create oauth grant gerrit:265447 (duration: 01m 18s)
  • 16:39 logmsgbot: thcipriani@mira Synchronized wmf-config: SWAT: Enable math data type on test wikidata + test wikipedias gerrit:268086 (duration: 01m 18s)
  • 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/mobile.php: SWAT: Remove section collapsing config gerrit:267776 (duration: 01m 18s)
  • 16:28 akosiaris: OTRS migration to 4.0 completed, starting upgrade to 5.0
  • 16:24 logmsgbot: thcipriani@mira Synchronized wmf-config/CommonSettings.php: SWAT: MW parsoid URLs: s/parsoidcache/parsoid/ gerrit:267234 (duration: 01m 18s)
  • 16:18 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Add 2 sites to $wgCopyUploadsDomains gerrit:262893 (duration: 01m 18s)
  • 16:13 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Just use the default MobileFrontend specified page actions. Part II gerrit:267807 (duration: 01m 18s)
  • 16:11 logmsgbot: thcipriani@mira Synchronized wmf-config/mobile.php: SWAT: Just use the default MobileFrontend specified page actions. Part I gerrit:267807 (duration: 02m 14s)
  • 15:41 hashar: mira symlink pointing to current version got changed to wmf.11 by the checkoutMediaWiki script. Manually changed to proper wmf.10 https://phabricator.wikimedia.org/T125475#1994078
  • 15:32 jynus: restart and reconfigure mysql in db1060
  • 15:30 hashar: MediaWiki 1.27.0-wmf.12, from 1.27.0-wmf.12, successfully checked out.
  • 15:23 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1060 (duration: 00m 43s)
  • 15:21 hashar: mira: cloning 1.27.0-wmf.12 (no link updates)
  • 15:15 bblack: rebooting cp1060 (depooled/downtimed)
  • 15:11 bblack: depooling cp1060 temporarily from cache_mobile varnish backends
  • 14:56 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1054 with low weight, repool db1067 with original weight (duration: 01m 22s)
  • 14:50 bblack: rebooting cp1008 for kernel
  • 14:28 godog: investigating uwsgi processes for graphite-web not coming up after reboot
  • 14:10 moritzm: rebooting graphite1001 for kernel update
  • 13:41 godog: powercycle ms-be2015
  • 13:39 jynus: restarting and reconfiguring mysql at db1054
  • 13:27 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1067 at low weight; depool db1054 (duration: 01m 16s)
  • 11:45 jynus: restarting and reconfiguring mysql at db1067
  • 11:11 moritzm: repooling restbase1001
  • 11:04 akosiaris: OTRS database upgraded to 3.3, moving on with 4.0
  • 11:00 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1063 at 100% load; depool db1067 for maintenance (duration: 01m 16s)
  • 10:48 moritzm: depooling restbase1001 for kernel/Java update
  • 10:37 _joe_: ending the load test on the eqiad apaches
  • 10:11 moritzm: reboot francium for kernel update
  • 09:53 jynus: m2 backup finished on /srv/backups/2016-02-03_08-51-06, filename 'db1020-bin.000842', position 220103947
  • 09:50 moritzm: restarting neodymium for kernel update
  • 09:49 _joe_: doing some basic load test on appservers in eqiad
  • 08:52 akosiaris: stop otrs-daemon on mendelevium
  • 08:51 jynus: starting mysql backup on db1020 (/srv/backups)
  • 08:44 akosiaris: stop slave on db2011, db1020's (m2-master) slave, for OTRS migration. DO NOT ENABLE
  • 08:40 akosiaris: stop exim4, cron, apache2 on iodine, mendelevium
  • 08:39 akosiaris: disabling puppet on iodine, mendelevium, OTRS migration
  • 08:24 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1063 with low weight (duration: 01m 20s)

2016-02-02

  • 23:13 logmsgbot: demon@mira Finished scap: everything re-sync one more time for good measure (duration: 17m 04s)
  • 22:56 logmsgbot: demon@mira Started scap: everything re-sync one more time for good measure
  • 22:50 bblack: repooling scap proxies: mw10033, mw1070, mw1097, mw1216
  • 22:45 chasemp: restart hhvm & apache2 on mw1235.eqiad.wmnet
  • 22:44 _joe_: restarted hhvm on mw1231, stat_cache again
  • 22:42 logmsgbot: demon@mira Finished scap: resync final batch with master (duration: 06m 48s)
  • 22:35 logmsgbot: demon@mira Started scap: resync final batch with master
  • 22:31 logmsgbot: demon@mira Finished scap: re-sync batch of mw1136-50, mw1190-1220, mw2150-mw2200 with master (duration: 09m 33s)
  • 22:22 logmsgbot: demon@mira Started scap: re-sync batch of mw1136-50, mw1190-1220, mw2150-mw2200 with master
  • 22:20 ori: restarted HHVM on mw1243. Lock-up. Backtrace in /tmp/hhvm.2897.bt
  • 22:20 logmsgbot: demon@mira Finished scap: re-sync batch of mw1101-1135,1240-1260, 2101-2150 with master (duration: 12m 51s)
  • 22:07 logmsgbot: demon@mira Started scap: re-sync batch of mw1101-1135,1240-1260, 2101-2150 with master
  • 22:00 logmsgbot: demon@mira Finished scap: re-sync batch of mw1151-mw1225, mw2174-mw2214 with master (duration: 11m 24s)
  • 21:49 logmsgbot: demon@mira Started scap: re-sync batch of mw1151-mw1225, mw2174-mw2214 with master
  • 21:45 logmsgbot: demon@mira Finished scap: re-sync batch of mw1051-1100, mw2051-2100 with master (duration: 13m 41s)
  • 21:31 logmsgbot: demon@mira Started scap: re-sync batch of mw1051-1100, mw2051-2100 with master
  • 21:28 logmsgbot: demon@mira Finished scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (2nd try) (duration: 14m 33s)
  • 21:27 _joe_: depooling eqiad scap-proxies
  • 21:13 logmsgbot: demon@mira Started scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (2nd try)
  • 21:04 logmsgbot: demon@mira scap aborted: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (duration: 10m 11s)
  • 20:54 logmsgbot: demon@mira Started scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master
  • 20:32 hashar: mw1114-mw1119 (canary api appservers): finished syncing
  • 20:28 ori: restarted hhvm on mw1116
  • 20:17 hashar: Running sync-common on mw1114-mw1119 (canary api appservers)
  • 20:16 ostriches: mira: removed untracked wmf-config/x.php testing file
  • 20:11 ori: Running sync-common on canary app servers (mw1017-mw1025)
  • 19:46 hashar: Running sync-common on mw1260 (video scaler)
  • 19:40 ori: Running sync-common on all jobscalers
  • 19:35 ori: Running sync-common on mw1259 (video scaler) and mw1153 (image scaler) too
  • 19:29 ori: Running sync-common on mw100[123]
  • 18:59 _joe_: running sync-common on mw1020
  • 18:54 _joe_: repooled mw1119
  • 17:45 hashar: mira /srv/mediawiki-staging git submodule update --init --recursive
  • 17:43 hashar: mw1119 sync-common
  • 17:37 godog: disable unused swift container-sync for wikibooks-ka-local-thumb wikibooks-hr-local-thumb wikibooks-km-local-thumb wikibooks-sk-local-thumb wikibooks-tr-local-thumb wikipedia-it-local-thumb.fc
  • 17:36 hashar: mw1119:/srv/mediawiki/wmf-config/event-schemas is empty
  • 17:31 _joe_: depooled mw1119, partial sync
  • 16:59 hashar: files were /srv/mediawiki/docroot/wikimedia.org/WikipediaMobileFirefoxOS/.git and /srv/mediawiki/docroot/wikimedia.org/WikipediaMobileFirefoxOS/js/lib/MobileFrontend/.git
  • 16:58 ostriches: mw1017: removed stray .git directory from WikipediaFirefoxMobileOS or w/e. It shouldn't be there anyway. sync-common is happy again on it
  • 16:48 hashar: tin /srv/mediawiki-staging: running git submodule update --init --recursive
  • 16:47 hashar: tin /srv/mediawiki-staging: running git submodule update --init
  • 16:40 hashar: mw1017 sync-common --verbose
  • 16:35 _joe_: sync-common on mw2030 and mw1161; re-enable puppet, jobrunner, jobchron on mw1161
  • 16:34 _joe_: restarted puppet and rsync on both tin and mira, removed comments on the l10nupdate job on tin
  • 16:23 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: rebuild wikiversion.php
  • 14:57 godog: disable swift container-sync for wikipedia-it-local-public.a7
  • 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.10 php-1.27.0-wmf.10
  • 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.9 php-1.27.0-wmf.9
  • 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.8 php-1.27.0-wmf.8
  • 14:21 hashar: starting rebuilding /srv/mediawiki-staging from scratch on tin (not mira)
  • 14:20 hashar: starting rebuilding /srv/mediawiki-staging from scratch on mira
  • 14:04 bblack: nevermind, not looking at eeden
  • 14:04 bblack: looking at eeden
  • 13:58 moritzm: rebooting eeden for kernel update
  • 13:09 moritzm: rolling reboot of scb* (for kernel update)
  • 13:02 akosiaris: reboot dubnium for kernel upgrades
  • 13:01 akosiaris: reboot pollux for kernel upgrades
  • 12:45 moritzm: rebooting baham for kernel update
  • 12:20 _joe_: stopping rsync on mira too, to avoid accidental deploys
  • 12:15 _joe_: stopped puppet on mira, added a big warning in the motd
  • 12:15 _joe_: stopped rsync, puppet, l10nupdate cronjob on tin
  • 12:06 _joe_: stopped rsync on tin to avoid problems
  • 11:38 moritzm: rolling reboot of aqs* (for kernel update)
  • 11:24 hashar_: Restarting Zuul. Stuck in a dependency loop :(
  • 11:12 jynus: restarting and reconfiguring mysql at db1063
  • 10:51 _joe_: stopped jobrunner on mw1161 after failed sync-common
  • 10:44 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1063, repool db1036 (duration: 00m 21s)
  • 10:00 jynus: reconfigure and upgrade db1036
  • 09:51 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Testing scap-reduce db1018 weight (duration: 00m 21s)
  • 09:42 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1036, repool db1021 (duration: 00m 22s)
  • 09:38 hashar: Jenkins is fully up and operational
  • 09:36 jynus: armed keyholder on tin
  • 09:34 dcausse: elastic (codfw and eqiad): unfreezing indices
  • 09:33 moritzm: restarting gerrit on ytterbium for java security update
  • 09:33 _joe_: re-syncing tin homes
  • 09:32 hashar: gallium: apt-get upgrade | Restarting Jenkins
  • 09:12 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1036, repool db1021 (duration: 00m 21s)
  • 09:08 dcausse: elastic (codfw and eqiad): freezing indices to stop titlesuggest maint scripts
  • 09:03 godog: repool restbase1007 via confctl
  • 08:13 jynus: restarting and upgrading db1021
  • 08:02 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Pool db1018; Depool db1021 (duration: 00m 20s)
  • 07:46 jynus: https://phabricator.wikimedia.org/rOMWC2ea9167221d11eb1880e4d26eae64a85cb9b2697 and https://phabricator.wikimedia.org/rOMWCa55d2bf8cd3a2853fac35d5b8239b8e8c2fe6a0f merged but not deployed
  • 06:58 _joe_: reimaging tin.eqiad.wmnet
  • 01:30 logmsgbot: ebernhardson@mira Finished scap: Add Cookie statement link to footer of all WMF wikis per legal (duration: 19m 42s)
  • 01:11 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
  • 01:07 logmsgbot: ebernhardson@mira scap failed: CalledProcessError Command '/srv/deployment/scap/scap/bin/refreshCdbJsonFiles --directory="/srv/mediawiki-staging/php-1.27.0-wmf.10/cache/l10n" --threads=10 ' returned non-zero exit status 255 (duration: 03m 31s)
  • 01:03 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
  • 00:31 logmsgbot: ebernhardson@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="cawikibooks" --outdir="/tmp/scap_l10n_1684485672" --threads=10 --quiet' returned non-zero exit status 255 (duration: 02m 35s)
  • 00:30 mobrovac: restbase deploy end of c3bd864
  • 00:29 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
  • 00:26 logmsgbot: ebernhardson@mira Synchronized wmf-config/logging.php: Revert "monolog: Ensure that context data added by WebProcessor is utf-8 safe" (duration: 01m 27s)
  • 00:23 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: Move morelike query load back to eqiad to allow load testing on codfw (duration: 01m 38s)

2016-02-01

  • 23:51 mobrovac: restbase deploy start of c3bd864 on canary rb1001
  • 19:28 logmsgbot: ori@mira Synchronized docroot/wikipedia.org/speed-tests: I5b48a491390: Speed trials: add preconnect (duration: 01m 27s)
  • 18:54 bblack: banned obj.http.Content-Length == 13817 on all cache_text
  • 18:54 mutante: LDAP - added elukey to "ops" group
  • 18:11 mutante: planet1001 - rebooting for upgrade
  • 17:54 hoo: restarted hhvm on mw1253
  • 17:06 logmsgbot: thcipriani@mira Synchronized wmf-config: SWAT: Use extension registration for Graph gerrit:266433 (duration: 01m 29s)
  • 16:59 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink on or.wikipedia.org gerrit:267194 (duration: 01m 31s)
  • 16:54 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WikidataPageBanner on es.wikivoyage gerrit:267195 (duration: 01m 29s)
  • 16:52 _joe_: restarted pybal on lvs1001
  • 16:47 _joe_: installing the new HHVM package to the api appserver cluster in eqiad
  • 16:38 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set WikidataPageBanner namespaces on fr.wikivoyage gerrit:266541 (duration: 01m 26s)
  • 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on cu.wikipedia gerrit:265885 (duration: 01m 26s)
  • 16:26 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Centralise all VisualEditor feedback pages except for a few wikis gerrit:258206 (duration: 01m 30s)
  • 16:22 logmsgbot: thcipriani@mira Synchronized dblists/visualeditor-default.dblist: SWAT: Enable VisualEditor by default for some other wikis gerrit:264765 (duration: 01m 58s)
  • 16:05 ema: hhvm restarted on mw1072
  • 15:54 logmsgbot: krenair@mira Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 01m 52s)
  • 15:48 bblack: restarted pybal on lvs1004 (lvs1003 above was a bad log message!)
  • 15:42 bblack: restarted pybal on lvs1003
  • 15:13 bblack: cp3042 repooled
  • 15:10 ema: restarting hhvm on mw1057
  • 14:33 chasemp: labstore1002 cfg scheduling
  • 14:04 godog: set ms-be1019 swift weight to 4000
  • 13:33 moritzm: rolling reboot of xenon/cerium/praseodymium for kernel update (and updating to new openjdk-8)
  • 12:40 _joe_: depooling cp3042 from esams uploads
  • 12:15 _joe_: backing up tin homes before reimaging
  • 11:59 moritzm: rolling reboot of ms-be1016 to ms-be1021 for kernel update
  • 11:39 moritzm: uploaded openjdk-8 8u72-b15-1~bpo8+1 for jessie-wikimedia to carbon
  • 11:34 moritzm: uploaded openssl 1.0.2f for jessie-wikimedia to carbon
  • 11:19 godog: repool restbase1007
  • 10:32 godog: reboot ms-be1010, xfs
  • 10:27 jynus: partitioning revision and logging for db2037 and db2044 (s4)
  • 00:04 logmsgbot: tstarling@mira Synchronized php-1.27.0-wmf.11/includes: (no message) (duration: 01m 31s)

2016-01-31

  • 23:58 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/VisualEditor/extension.json: https://gerrit.wikimedia.org/r/#/c/267617/ (duration: 01m 28s)
  • 22:31 ori: restarted parsoid-rt-client.service
  • 22:14 ori: Updated parsoid on ruthenium and restarted parsoid-rt-client on ruthenium, per subbu's request.
  • 22:03 bd808: backfilled missing data in https://tools.wmflabs.org/sal/production from https://wikitech.wikimedia.org/wiki/Server_Admin_Log
  • 21:37 bd808: https://tools.wmflabs.org/sal/production missing data from 2016-01-30 until now
  • 21:33 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.10/includes/jobqueue/jobs/HTMLCacheUpdateJob.php: Live-hacked wfDebugLog() call for T124418 (duration: 01m 31s)
  • 16:01 tgr: changed wikiversions.php on mw1017 to serve wmf.10 for SessionManager-related debugging
  • 05:35 legoktm: restarted extensions/CentralAuth/maintenance/resetGlobalUserTokens.php
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 31 02:32:12 UTC 2016 (duration 7m 11s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 10m 14s)

2016-01-30

  • 23:20 logmsgbot: bd808@mira rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10 (again)
  • 23:01 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Revert Enable debug level session logging to fluorine (17bfb06) (duration: 01m 28s)
  • 22:36 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Enable debug level session logging to fluorine (5ac9412) (duration: 01m 26s)
  • 18:43 _joe_: updated visualdiff, restarted parsoid-vd
  • 13:00 godog: discard preserved cache on ms-be2003, powercycle
  • 03:40 Krenair: Deleted old /srv/mediawiki/php-1.27.0-wmf.[1-5] directories across the cluster to match the deployment tree, T124567
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 30 02:31:56 UTC 2016 (duration 7m 2s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 24s)
  • 00:08 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Remove proposed fix for T125267 (duration: 01m 33s)

2016-01-29

  • 23:53 jynus: restarted db1018 replication (and its codfw slaves) after a (somewhat) failed maintenance
  • 23:41 mutante: ruthenium - restart parsoid-rt-client, parsoid-vd-client
  • 23:37 mutante: ruthenium - git pull origin in /srv/visualdiff/
  • 23:22 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Testing proposed fix for T125267 (duration: 01m 26s)
  • 22:52 jynus: powercycling cp3042 to test it is really the broken one
  • 22:37 jynus: powercycle cp3049, not 42
  • 22:37 jynus: powercycle cp3042
  • 22:27 mutante: cp3042 - md0: unknown partition table
  • 22:23 mutante: powercycled cp1049
  • 22:06 mutante: powercycle cp3049
  • 21:13 mutante: bromine - stop and remove rsync service
  • 20:16 logmsgbot: aaron@mira Synchronized wmf-config/CommonSettings.php: Use the logical redis definition for GettingStarted (duration: 01m 26s)
  • 19:36 jynus: reinstall db1018
  • 18:11 jynus: creating special partitioning for db2037 and db2044 (ETA: 5 days, lag)
  • 18:01 jynus: creating special partitioning for db2034 and db2042 (ETA: 5 days, lag)
  • 17:51 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Stop the first survey in fawiki and eswiki (f89621d) (duration: 01m 25s)
  • 17:44 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/api/ApiMain.php: Log user-agents that are using HTTP when HTTPS is preferred (55ac0b7) (duration: 01m 26s)
  • 17:41 logmsgbot: bd808@mira Synchronized wmf-config/CommonSettings.php: Grant autocreateaccount to anons on loginwiki (d916008) (duration: 01m 27s)
  • 17:39 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: CentralAuth: Take auto-creation into account (f526ef1) (duration: 01m 28s)
  • 17:35 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: SessionManager: Save user name to metadata even if the user doesn't exist locally (a39b4ac) (duration: 01m 29s)
  • 17:01 jynus: restarting mysql at db1018
  • 16:50 robh: parsoid-vd restart was due to subbu's IRC request (I wasn't just randomly restarting things ;)
  • 16:47 robh: restarting parsoid-vd & parsoid-vd-client on ruthenium
  • 16:33 ottomata: uninstalling impala in analytics cluster
  • 15:45 bblack: upgrade packages (incl kernel) on eqiad caches hosts (cp1xxx)
  • 15:37 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1018 for maintenance (duration: 01m 49s)
  • 15:32 akosiaris: remove all networking configuration from asw-b-eqiad switch for nas1001-a, nas1001-b. Leave just descriptions
  • 15:21 bblack: upgrading packages (incl kernel) on esams cache hosts (cp3xxx) (codfw, ulsfo already done)
  • 15:11 akosiaris: powering off nas1001-a.eqiad.wmnet. https://phabricator.wikimedia.org/T124156
  • 15:08 akosiaris: powering off nas1001-b.eqiad.wmnet. https://phabricator.wikimedia.org/T124156
  • 15:01 elukey: re-enabled puppet on analytics1027
  • 14:39 elukey: stopped kafka (service) on kafka1012 (the host that caused the outage)
  • 14:24 moritzm: rebooting bohrium for kernel update
  • 14:04 _joe_: installing the new hhvm package on all the codfw appservers
  • 13:43 _joe_: installing the new HHVM package to the canary appservers (main and api)
  • 12:30 paravoid: force-rebooting pollux
  • 11:43 _joe_: uploaded hhvm_3.6.5+dfsg1-1+wm8 to trusty-wikimedia
  • 11:22 moritzm: rolling restart of swift in codfw
  • 11:14 elukey: disabled puppet on analytics1027 due to issues with Camus and HDFS
  • 10:17 moritzm: rolling restart of swift in esams
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 29 02:32:56 UTC 2016 (duration 7m 28s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 40s)
  • 01:31 logmsgbot: ori@mira Synchronized wmf-config: I83da57cf: Enable persistent redis connections for job runners (duration: 01m 11s)
  • 01:03 logmsgbot: krenair@mira Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/267186/ (duration: 01m 09s)
  • 01:01 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/265292/ (duration: 01m 14s)
  • 00:57 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267071/ (duration: 01m 11s)
  • 00:53 logmsgbot: krenair@mira Synchronized wmf-config/CirrusSearch-production.php: https://gerrit.wikimedia.org/r/#/c/266995/ (duration: 01m 11s)
  • 00:50 yurik: synced latest graphoid
  • 00:49 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/MobileFrontend/resources/skins.minerva.editor/init.js: https://gerrit.wikimedia.org/r/#/c/267168/ (duration: 01m 12s)
  • 00:45 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 10s)
  • 00:43 logmsgbot: krenair@mira Synchronized wmf-config/CirrusSearch-common.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 10s)
  • 00:42 logmsgbot: krenair@mira Synchronized tests/cirrusTest.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 11s)
  • 00:35 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267025/ (duration: 01m 12s)
  • 00:25 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Graph/modules/graph2.js: https://gerrit.wikimedia.org/r/#/c/267065/ (duration: 01m 11s)
  • 00:17 logmsgbot: krenair@mira Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/267060/ (duration: 01m 12s)
  • 00:02 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267189/2 (duration: 01m 11s)

2016-01-28

  • 23:51 mutante: caesium - stop puppet, shutdown server, remove from icinga, clean puppet cert ...
  • 23:46 Tim: on ruthenium installing build dependencies and compiling uprightdiff for test
  • 23:20 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.11/includes/api/ApiStashEdit.php: Ia4196eba9: Add ParserOutputStashForEdit hook for extension cache warming (duration: 01m 10s)
  • 23:17 logmsgbot: tgr@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionManager.php: T125161 (duration: 01m 11s)
  • 22:58 ottomata: restoring MobileWebSectionUsage_14321266 from db1047 to dbstore1002 using mysqlimport
  • 22:23 bblack: starting cache_mobile->cache_text conversion in eqiad - https://phabricator.wikimedia.org/T109286
  • 22:09 bblack: eqiad pybal->etcd conversion done
  • 22:01 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11/extensions/WikimediaEvents/WikimediaEventsHooks.php: deploying fix for T125151 (duration: 01m 15s)
  • 21:59 mutante: releases.wm.org - switched backend to bromine
  • 21:58 bblack: converting active eqiad LVS/pybal to etcd
  • 21:56 mutante: caesium - stopped apache
  • 21:31 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.11/extensions/AbuseFilter: I13fcc3ce4: Updated mediawiki/core Project: mediawiki/extensions/AbuseFilter 19baa3b6e51b8fe6baf6e3ce7e590060e8e6eec9 (duration: 01m 11s)
  • 21:27 bblack: converting backup/inactive eqiad LVS/pybal to etcd
  • 21:16 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.11
  • 20:54 mutante: sca1001 - stop mathoid,graphoid,citoid
  • 20:52 mutante: sca1002 - stop mathoid,graphoid,citoid
  • 20:50 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11: syncing 1.27.0-wmf.11 for T125114 and https://gerrit.wikimedia.org/r/#/c/267128/ (duration: 03m 30s)
  • 20:25 bblack: depool -> reboot cp4008 (ulsfo text, trying new kernel with live traffic)
  • 20:00 bblack: depool -> reboot cp4011 (ulsfo mobile, currently unused for traffic - testing local conftool-scripts depool + new kernel)
  • 19:55 logmsgbot: ori@mira Synchronized wmf-config: Iea2573ccfbe: Revert "Autopromotion: remove deprecated onView event, fix INGROUPS" (duration: 02m 13s)
  • 19:43 ori: added tgr and marxarelli to security group on phab
  • 19:26 ottomata: kafka preferred-replica-election to rebalance analytics-eqiad brokers
  • 18:22 elukey: rebooting analytics1001 for new kernel upgrade
  • 18:21 yurik: deployed graphoid
  • 17:43 elukey: rebooting analytics1002.eqiad.wmnet (Hadoop master's slave) for kernel upgrade
  • 17:39 urandom: finished deploying configuration change (https://gerrit.wikimedia.org/r/266299) to restbase staging
  • 17:38 robh: neglected to log that I finished icinga/neon updates and it's back to normal service (never interrupted)
  • 17:38 urandom: restarting restbase on restbase200[1-3].codfw.wmnet (restbase staging)
  • 17:34 urandom: forcing puppet run on restbase200[1-3].codfw.wmnet (restbase staging)
  • 17:30 urandom: forcing puppet run on praseodymium.eqiad.wmnet, and restarting restbase (staging env)
  • 17:27 urandom: restarting restbase on xenon.eqiad.wmnet (restbase staging)
  • 17:25 urandom: forcing puppet run on xenon.eqiad.wmnet (restbase staging)
  • 17:21 urandom: restarting restbase on cerium.eqiad.wmnet
  • 17:18 urandom: forcing puppet run on cerium.eqiad.wmnet (restbase staging)
  • 17:18 robh: pushing icinga updates (shouldn't affect service, but others shouldn't try to update neon right now either)
  • 17:17 logmsgbot: krenair@mira Synchronized README: testing (duration: 02m 08s)
  • 17:15 urandom: disabling puppet on restbase staging hosts
  • 17:01 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266957/ (duration: 02m 15s)
  • 16:52 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267040/ (duration: 02m 13s)
  • 16:48 cmjohnson1: mw1172, mw1178, mw1217, mw1257 powering off task# T124642
  • 16:45 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264219/ (duration: 02m 12s)
  • 16:42 logmsgbot: krenair@mira Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264219/ (duration: 02m 12s)
  • 16:37 Krenair: Downloaded and `chmod +x`'d mira:/srv/mediawiki-staging/.git/hooks/commit-msg
  • 16:29 mdholloway: mobileapps deployed 7583148, reverting in part 869ec35
  • 16:25 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: rv (duration: 02m 10s)
  • 16:25 bblack: upgrading packages (incl kernel) on all codfw caches
  • 16:19 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266955/ (duration: 02m 14s)
  • 16:13 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266564/ (duration: 02m 12s)
  • 16:05 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264733/ (duration: 02m 11s)
  • 15:39 bblack: kafka1012 booted up normally
  • 15:39 mdholloway: mobileapps deployed 869ec35
  • 15:37 bblack: rebooting kafka1012
  • 15:36 bblack: kafka1012: manually edited fstab, s/sdb1/sdb3/, s/sdc3/sdc1/, and now the filesystems mount and data looks right
  • 15:23 bblack: powering up kafka1012
  • 14:09 moritzm: rebooting serpens/seaborgium for kernel update
  • 13:58 logmsgbot: faidon@mira Synchronized wmf-config/InitialiseSettings.php: depool kafka1012 (duration: 02m 10s)
  • 13:31 bblack: citoid and cxserver public hostnames moving to cache_text
  • 12:59 moritzm: rebooting rutherfordium (peopleweb) for kernel update
  • 12:53 elukey: stopping kafka on kafka1012 + host reboot for kernel upgrade
  • 12:23 jynus: generating empty schema for new codfw parsercaches
  • 12:14 logmsgbot: jynus@mira Synchronized wmf-config/db-codfw.php: New parsercache servers for codfw datacenter (duration: 03m 10s)
  • 12:11 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: New parsercache servers for codfw datacenter (duration: 02m 15s)
  • 12:07 jynus: pooling new parsercaches for codfw datacenter
  • 12:01 moritzm: powercycled mw1163, was unreachable after reboot of the jobrunners (but now up again after powercycle via mgmt)
  • 11:31 elukey: disabled puppet on analytics1027 due to some issues with camus and hdfs
  • 10:42 moritzm: rebooted parsoid systems in codfw for kernel update, rolling reboot for eqiad
  • 10:39 _joe_: rolling reboot of jobrunners in eqiad
  • 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 06m 16s)
  • 02:41 logmsgbot: tgr@mira Synchronized php-1.27.0-wmf.11/includes/: deploy SessionManager patch for T124971: gerrit 266944, 266946 (duration: 03m 20s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 10m 21s)
  • 01:03 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264460/ (duration: 02m 30s)
  • 00:58 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264066/ (duration: 02m 26s)
  • 00:46 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Gather/resources: https://gerrit.wikimedia.org/r/#/c/266793/ and https://gerrit.wikimedia.org/r/#/c/266792/ (duration: 02m 23s)
  • 00:41 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/266939/ (duration: 02m 27s)
  • 00:27 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.10/extensions/Flow/includes: https://gerrit.wikimedia.org/r/#/c/266938/ (duration: 02m 29s)
  • 00:09 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/266945/ (duration: 02m 36s)

2016-01-27

  • 22:36 robh: restarting parsoid-rt-client service on ruthenium
  • 22:29 ottomata: starting mysqldump of MobileWebSectionUsage_14321266 from db1047 into m4-master
  • 21:45 yurik: updated graphoid on scb*
  • 21:29 mdholloway: mobileapps deployed 6f35859
  • 21:26 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
  • 21:26 logmsgbot: ori@mira Synchronized docroot and w: (no message) (duration: 02m 26s)
  • 19:48 YuviPanda: started nfs-exports daemon on labstore1001, had been dead for a few days
  • 19:32 mutante: stat1002 - redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out.
  • 19:31 mutante: stat1002 - running puppet, was reported as last run about 4 hours ago but not deactivated
  • 19:14 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11
  • 19:07 ejegg: set donation queue consumer time limit back to 90 sec
  • 18:49 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool pc1006 after cloning (duration: 02m 25s)
  • 18:48 bd808: HHVM on mw1019 still dying on a regular basis with "Lost parent, LightProcess exiting"
  • 18:00 csteipp: deploy patch for T103239
  • 17:50 csteipp: deploy patch for T97157
  • 17:47 jynus: migrating ruthenium parsoid-test database to m5-master
  • 17:27 elukey: rebooting analytics105* hosts to upgrade their kernel
  • 17:16 elukey: rebooting analytics1035.eqiad.wmnet for kernel upgrade
  • 16:23 ejegg: updated SmashPig from 072c7ec6ed94e7074ba35b7986d5dde94866fe2f to 97629339994bffe8831a9067f5e9c21fa423586b
  • 16:22 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/CentralAuthUtils.php: SWAT: Preserve certain keys when updating central session gerrit:266672 (duration: 02m 28s)
  • 16:11 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: SWAT: Avoid forceHTTPS cookie flapping if core and CA are setting the same cookie gerrit:266671 (duration: 02m 26s)
  • 16:03 elukey: rebooting analytics 1043 -> 1050 for kernel upgrade.
  • 15:47 elukey: rebooting analytics 1026, 1040 -> 1042 due to kernel upgrade.
  • 14:58 jynus: cloning parsercache contents from pc1003 to pc1006
  • 14:45 elukey: rebooting analytics 1036 to 1039 for kernel upgrade
  • 14:35 elukey: analytics 1035 hasn't been rebooted because it is a Hadoop Journal Node (will be restarted at the end)
  • 14:04 elukey: rebooting analytics 1032 to 1035 for kernel upgrades
  • 14:03 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1003 for cloning to pc1006 (duration: 02m 30s)
  • 13:59 jynus: about to move to new hardware/OS/mariadb-only for parsercache service
  • 13:32 elukey: rebooting analytics1030/1031 for kernel upgrade
  • 13:15 akosiaris: rebooting fermium for kernel upgrades
  • 13:10 elukey: rebooting analytics1029 for kernel upgrade
  • 12:29 moritzm: rebooting analytics1028 for kernel update
  • 10:25 ema: restarting apache2 and hhvm on mw1119
  • 03:19 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: Correct invalid cirrus shard configuration (duration: 02m 59s)
  • 02:55 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 27 02:55:21 UTC 2016 (duration 7m 13s)
  • 02:48 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 25s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 51s)
  • 01:59 logmsgbot: ori@mira Synchronized docroot and w: Icc4f6134b0: Add a speed experiment which inlines the top stylesheet (duration: 02m 28s)
  • 01:29 MaxSem: on terbium: ran mwscript namespaceDupes.php --wiki=wuuwiki --source-pseudo-namespace= --add-suffix=/renamed --fix
  • 01:26 MaxSem: Fail, trying something else...
  • 01:21 MaxSem: running mwscript namespaceDupes.php --wiki=wuuwiki --move-talk --fix
  • 00:52 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266497/ (duration: 02m 26s)
  • 00:48 logmsgbot: krenair@mira Synchronized w/static/images/project-logos/ukwikinews.png: https://gerrit.wikimedia.org/r/#/c/266497/ (duration: 02m 29s)
  • 00:44 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266161/ (duration: 02m 27s)
  • 00:15 logmsgbot: ebernhardson@mira Synchronized php-1.27.0-wmf.11/extensions/CirrusSearch/: Allow pointing morelike queries at a specific datacenter (duration: 03m 04s)
  • 00:10 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: point morelike queries back at the eqiad cluster (duration: 05m 41s)
  • 00:02 chasemp: enable puppet and codify the 192 thread count for nfsd

2016-01-26

  • 22:25 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11, for real this time
  • 22:17 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11
  • 22:15 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11: syncing wmf.11 backports of session fixes (duration: 03m 55s)
  • 21:55 logmsgbot: ori@mira Synchronized docroot and w: I9b054d847a: New set of speed experiments (duration: 01m 29s)
  • 21:41 marxarelli: filed https://phabricator.wikimedia.org/T124828 for fatal in extensions/Echo
  • 21:22 marxarelli: Fatal error: Cannot redeclare class CallbackFilterIterator in /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php on line 24
  • 21:21 marxarelli: lint error found when running sync-dir 'Errors parsing /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php'
  • 21:11 marxarelli: sync-dir php linting failed
  • 21:02 marxarelli: resuming sync-dir and ignoring error as a known issue
  • 20:59 marxarelli: getting 'Lost parent, LightProcess exiting' when running sync-dir
  • 20:57 chasemp: drop labstore1001 nfs threads down to 192
  • 20:46 chasemp: stopping nfs on labstore1001
  • 20:46 marxarelli: modified wikiversions.php locally on mw1017 to promote all wikis to wmf.11 for initial testing
  • 20:18 marxarelli: locally modified wikiversions.php and wikiversions.json on mw1017 for testing
  • 20:14 marxarelli: running 'sync-common --verbose deployment.eqiad.wmnet' on mw1017 to sync wmf.11 for initial testing
  • 20:02 marxarelli: proceeding with train deploy. wmf.11 to mw1017, then group0
  • 19:46 akosiaris: issuing a varnish ban on all esams mobile frontend varnish for req.http.host .*wikimedia.org
  • 19:45 akosiaris: issuing a varnish ban on all esams mobile backend varnish for req.http.host .*wikimedia.org
  • 19:44 akosiaris: issuing a varnish ban on all ulsfo mobile frontend varnish for req.http.host .*wikimedia.org
  • 19:44 akosiaris: issuing a varnish ban on all ulsfo mobile backend varnish for req.http.host .*wikimedia.org
  • 19:43 akosiaris: issuing a varnish ban on all codfw mobile frontend varnish for req.http.host .*wikimedia.org
  • 19:36 akosiaris: issuing a varnish ban on all codfw mobile backend varnish for req.http.host .*wikimedia.org
  • 19:36 akosiaris: issuing a varnish ban on all eqiad mobile frontend varnish for req.http.host .*wikimedia.org
  • 19:36 akosiaris: issuing a varnish ban on all eqiad mobile backend varnish for req.http.host .*wikimedia.org
  • 19:36 akosiaris: all of the above referred to cache_text
  • 19:29 akosiaris: all of the above already done, back logging
  • 19:29 akosiaris: issuing a varnish ban on all esams frontend varnish for req.http.host .*wikimedia.org
  • 19:29 akosiaris: issuing a varnish ban on all esams backend varnish for req.http.host .*wikimedia.org
  • 19:29 akosiaris: issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
  • 19:29 akosiaris: issuing a varnish ban on all ulsfo frontend varnish for req.http.host .*wikimedia.org
  • 19:28 akosiaris: issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
  • 19:28 akosiaris: issuing a varnish ban on all codfw frontend varnish for req.http.host .*wikimedia.org
  • 19:28 akosiaris: issuing a varnish ban on all codfw backend varnish for req.http.host .*wikimedia.org
  • 19:28 akosiaris: issuing a varnish ban on all eqiad frontend varnish for req.http.host .*wikimedia.org
  • 19:14 akosiaris: issuing a varnish ban on all eqiad backend varnish for req.http.host .*wikimedia.org
  • 19:02 marxarelli: backports to wmf.11 ready on mira but delaying train due to wikimedia.org outage
  • 18:44 _joe_: running salt --batch-size=20 -C 'G@luster:appserver and G@site:eqiad' cmd.run 'puppet agent -t --tags mw-apache-config'
  • 18:27 robh: i broke icinga, but then i fixed it, icinga back to normal.
  • 18:21 robh: icinga is broken, it seems it was from a change before mine, but my forced reload broke it
  • 18:18 legoktm: running mwscript updateArticleCount.php --wiki=jawiki --update=1
  • 18:14 cmjohnson1: starting puppet on mw cluster
  • 18:14 robh: i broke icinga, fixing
  • 18:08 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Pool new parsercache pc1005 after cloning it from pc1002 (duration: 01m 28s)
  • 17:43 thcipriani: ltwiki collation updated 503623 rows processed
  • 17:35 mutante: mw1258 - restart hhvm
  • 17:20 cmjohnson: disabling puppet on mw cluster
  • 17:02 thcipriani: running updateCollation on ltwiki
  • 17:01 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set category collation to uca-lt on lt.wikipedia gerrit:266427 (duration: 01m 33s)
  • 16:55 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on ur.wikipedia gerrit:265888 (duration: 07m 10s)
  • 16:36 logmsgbot: thcipriani@mira Synchronized w/static/images/project-logos/etwikiquote.png: SWAT: Update et.wikiquote logo gerrit:265623 (duration: 01m 27s)
  • 16:31 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink on nl.wikiquote gerrit:265666 (duration: 01m 26s)
  • 16:26 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespaces configuration on sk.wikipedia gerrit:265896 (duration: 01m 27s)
  • 16:19 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove Tranwiki namespace on wuu.wikipedia gerrit:265892 and Add Portal namespace on wuu.wikipedia gerrit:265893 (duration: 01m 27s)
  • 16:12 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration for wuu.wikipedia gerrit:265891 (duration: 01m 29s)
  • 14:57 ema: Finished migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
  • 14:48 chasemp: RPS on eth0 on labstores
  • 14:39 bblack: upgrading packages (incl kernel) on all ulsfo caches (cp4xxx)
  • 14:21 akosiaris: migrating alsafi,mx2001 back to 2004 for testing
  • 14:14 akosiaris: migrate alsafi,mx2001 back from ganeti2004 to fix a network misconfiguration
  • 13:32 moritzm: rebooted nescio/maerlant for kernel update
  • 13:14 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1002 for maintenance (clone to pc1005) (duration: 01m 39s)
  • 12:39 akosiaris: rolling reboot of ganeti200{1,2,3,4,5,6}.codfw.wmnet for kernel upgrade
  • 12:10 moritzm: rebooting mx2001/mx1001 (with a delay in between) for kernel update
  • 11:50 moritzm: rebooting etherpad1001 for kernel update
  • 11:46 moritzm: rebooting bromine for kernel update
  • 10:50 ema: Starting migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
  • 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
  • 09:28 _joe_: finishing reboots of appservers in eqiad
  • 04:27 legoktm: restarted resetGlobalUserTokens.php after it lost mysql connection again
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 26 02:30:58 UTC 2016 (duration 7m 0s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 36s)
  • 01:45 logmsgbot: krenair@mira Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/266453/ (duration: 01m 27s)
  • 00:45 mobrovac: mobileapps deploying c2318b6
  • 00:40 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 25s)
  • 00:37 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT bd808 (duration: 01m 34s)
  • 00:32 logmsgbot: ebernhardson@mira Synchronized portals/: SWAT jgirault (duration: 01m 28s)
  • 00:29 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT ebernhardson (duration: 01m 26s)
  • 00:27 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-common.php: SWAT ebernhardson (duration: 01m 26s)
  • 00:25 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT ebernhardson (duration: 01m 27s)
  • 00:15 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT AaronSchulz (duration: 01m 26s)
  • 00:13 logmsgbot: ebernhardson@mira Synchronized wmf-config/filebackend-production.php: SWAT AaronSchulz (duration: 01m 26s)
  • 00:10 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT James_F (duration: 01m 26s)
  • 00:08 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT James_F (duration: 01m 35s)

2016-01-25

  • 23:14 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/parser/: live hacks, now committed (duration: 01m 27s)
  • 23:07 logmsgbot: legoktm@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266410/ (duration: 01m 35s)
  • 22:52 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.10/includes/parser/ParserOutput.php: Fix-up for ParserOutput.php@263 debug logging (duration: 01m 27s)
  • 22:30 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/parser/: https://gerrit.wikimedia.org/r/#/c/266401/ + https://gerrit.wikimedia.org/r/#/c/266406/ + live hacks (duration: 01m 28s)
  • 22:28 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/content/WikitextContent.php: https://gerrit.wikimedia.org/r/#/c/266401/ (duration: 01m 29s)
  • 21:53 logmsgbot: hoo@mira Synchronized wmf-config/Wikibase-production.php: Disable (not yet deployed) commons category sidebar link overwrite in production (duration: 01m 28s)
  • 21:47 mutante: nitrogen - shutdown -h now ....
  • 21:45 mutante: alsafi - was reported down in icinga, is ganeti VM - fixed by just logging in as if it went to hibernate
  • 21:37 mdholloway: mobileapps deployed 9252a22
  • 21:30 mutante: nitrogen - stop puppet, stop salt, remove from stored configs / icinga
  • 20:19 logmsgbot: hoo@mira Synchronized wmf-config/Wikibase-labs.php: (no message) (duration: 01m 28s)
  • 20:14 chasemp: bump labstore nfs threads to 288 from 244
  • 19:32 paravoid: eqiad: removing static routes for 6to4/Teredo to nitrogen (decommissioning our own relays)
  • 19:10 bd808: Live hacking on mw1017 to debug 1.27.0-wmf.11 issues. All wikis there currently set to use 1.27.0-wmf.11.
  • 19:05 chasemp: labstore1001 temp change to CFQ scheduler on 01/22/2015
  • 19:04 chasemp: the nfsd thread change is on labstore1001
  • 19:04 chasemp: nfsd has 224 threads atm and was bumped up over the weekend
  • 18:58 ori: removed unused wikiversions.cdb on mira and tin
  • 18:28 jynus: retroactively logging the depool of mw1217, mw1178 and mw1257 3 hours ago (Jan 25 15:45:26)
  • 16:49 ema: Finished migration of mobile traffic to text cluster in ulsfo https://phabricator.wikimedia.org/T109286
  • 16:38 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (third try) (duration: 01m 35s)
  • 16:26 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (second try after running puppet) (duration: 03m 23s)
  • 16:25 _joe_: restarting salt-minion on all deployment targets
  • 16:24 _joe_: running salt deploy.fixurl on all deployment targets
  • 16:09 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (duration: 03m 32s)
  • 15:51 ejegg: updated DjangoBannerStats from a64fe0e373a978d3df0b7f1dd74ac4cc5c78d34e to 71df14d4d8b11f3ca0ef1eeb6c6e2db9be79103a
  • 15:35 ema: Starting migration of mobile traffic to text cluster in ulsfo https://phabricator.wikimedia.org/T109286
  • 15:14 chasemp: restart of pdns and pdns-recursor on labservices1001
  • 14:56 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: deploy new parsercache hardware (pc1004) substituting pc1001 (duration: 03m 25s)
  • 13:16 elukey: ran kafka preferred-replica-election on kafka1022 to balance the leaders
  • 13:07 elukey: restarting kafka on kafka1022
  • 12:57 elukey: restarting kafka on kafka1013
  • 12:38 elukey: restarting kafka on kafka1014
  • 12:20 jynus: compressed and truncated iridium's phab daemons.log - it was taking 20% of disk space
  • 12:04 ema: restarting kafka on kafka1018
  • 11:26 jynus: stopping mysql at pc1001 and cloning to pc1004
  • 10:55 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1001 for maintenance (clone to pc1004) (duration: 01m 41s)
  • 10:11 _joe_: switching the active deployment host to mira
  • 09:56 ema: limiting GCLogFileSize and restarting kafka on kafka1012
  • 09:31 _joe_: rolling reboot of the eqiad appserver cluster
  • 09:27 moritzm: installed fuse security update on labnodepool1001 (the other fuse installations are on Ubuntu, which doesn't ship the udev rule, but uses mountall instead)
  • 07:47 paravoid: stat1002: umount -f /mnt/hdfs
  • 07:34 _joe_: rebooting alsafi, unresponsive to ssh
  • 07:24 _joe_: restarting hhvm on mw1148, stuck in HPHP::Treadmill::startRequest (__lll_lock_wait)
  • 07:23 _joe_: restarting hhvm on mw1143, stuck into HPHP::SynchronizableMulti::waitImpl (__pthread_cond_wait)
  • 03:10 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserCache.php: (no message) (duration: 00m 25s)
  • 03:03 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserCache.php: (no message) (duration: 00m 25s)
  • 03:02 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserOutput.php: (no message) (duration: 00m 27s)
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 25 02:30:13 UTC 2016 (duration 6m 52s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 09s)

2016-01-24

  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 24 02:31:21 UTC 2016 (duration 6m 58s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 11s)

2016-01-23

  • 19:03 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: config change to repoint morelike search from eqiad to codfw (duration: 00m 26s)
  • 19:02 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/: Support code for repointing morelike queries from eqiad to codfw (duration: 00m 30s)
  • 19:00 ebernhardson: repoint most expensive search queries (morelike) at codfw cluster to reduce load. 1/2 of eqiad cluster maxed on cpu
  • 16:47 Krinkle: mwscript deleteEqualMessages.php --wiki wowiki
  • 13:25 jynus: upgrading and restarting db1046
  • 13:13 jynus: db1046 maintenance finished- restarting mysql to apply latest configuration
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 23 02:32:15 UTC 2016 (duration 7m 3s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 09s)
  • 01:33 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Back to 1.27.0-wmf.10 again after fixing l10n cache problems
  • 01:28 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Temporarily back to 1.27.0-wmf.11; need to rebuild l10n cache
  • 01:16 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10
  • 00:08 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265872/ (duration: 00m 25s)
  • 00:07 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.11/includes/session/CookieSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265871/ (duration: 00m 25s)

2016-01-22

  • 23:43 logmsgbot: legoktm@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265870/ (duration: 00m 26s)
  • 23:42 logmsgbot: legoktm@tin Synchronized php-1.27.0-wmf.11/includes/session/CookieSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265869/ (duration: 00m 26s)
  • 23:22 mobrovac: restbase cassandra truncating local_group_wiktionary_T_term_definition.data
  • 22:33 mdholloway: mobileapps deployed 2900faa
  • 22:23 logmsgbot: twentyafterfour@tin Finished scap: deploy https://gerrit.wikimedia.org/r/#/c/263415/ and clean up old branches (duration: 07m 02s)
  • 22:16 logmsgbot: twentyafterfour@tin Started scap: deploy https://gerrit.wikimedia.org/r/#/c/263415/ and clean up old branches
  • 22:06 bblack: upgrading vhtcpd on all caches
  • 22:05 eileen: upgrade Civicrm from b9ebf3d31aeab8120143cfbf6bc2df0f617341cf to c009af16944a6478bd0292422f5bb0151f7a22c1
  • 21:49 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/includes/: Fix T124468, for real this time (duration: 00m 36s)
  • 21:48 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/includes/: Fix T124468 (duration: 00m 38s)
  • 21:17 legoktm: running migrateAccount.php --attachbroken over list of all unattached users (T74791)
  • 20:04 mutante: ruthenium - rebooting for reinstall
  • 19:42 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Revert "Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog" (duration: 00m 32s)
  • 18:51 jynus: "repairing" enwiki.oldtable on dbstore1001
  • 18:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Aborting pc1001 maintenance (duration: 00m 31s)
  • 18:15 legoktm: running CentralAuth's resetGlobalUserTokens.php to force session resets for all users T124440
  • 18:02 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/includes/user/User.php: Fix T124414 (duration: 00m 33s)
  • 17:53 legoktm: manually attaching User:Mower Genetics and User:Themeetingplace because they made edits somehow (T74791)
  • 17:46 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Stop logging the CirrusSearchRequests channel (duration: 00m 32s)
  • 17:44 legoktm: running migrateAccount.php --attachbroken over lists on T74791
  • 17:39 _joe_: removed an archived CirrusSearchRequests.log on fluorine, now we have enough room for the weekend
  • 17:29 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes: Fix T124406 (duration: 00m 35s)
  • 17:05 mobrovac: mobileapps deploying bba45456
  • 17:00 logmsgbot: reedy@tin Synchronized docroot and w: Extra noc symlinks (duration: 00m 32s)
  • 16:58 logmsgbot: jynus@tin Synchronized wmf-config/InitialiseSettings.php: monolog: reduce on-disk logging of DBPerformance to warning (duration: 00m 32s)
  • 16:47 jynus: truncating 100GB DBPerformance.log on fluorine, compressed backup available
  • 16:46 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: Fix T124409, part 2 (duration: 00m 32s)
  • 16:46 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Fix T124409, part 1 (duration: 00m 33s)
  • 16:41 cmjohnson1: Troubleshooting mw1228
  • 16:36 _joe_: all api appservers in eqiad have been restarted
  • 16:21 ori: restarted statsv on hafnium
  • 15:53 ema: Finished migrating mobile traffic to text cluster in codfw (Mexico + green US states on this map https://phabricator.wikimedia.org/T114659)
  • 15:39 gwicke: aqs: increased compression block size on per-article table from 128k to 256k; expectation is to further increase compression ratio & reduce seeks on rotating disks
  • 15:22 Reedy: created translate tables on ruwikimedia T121766
  • 14:18 paravoid: cr1-eqord: turning up BGP with Zayo
  • 13:08 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.10/extensions/MobileFrontend: I08cdf37a1: Use TitleSquidURLs hook to purge mobile URLs directly (Bug: T124165) (duration: 00m 33s)
  • 13:05 logmsgbot: ori@tin Synchronized wmf-config/InitialiseSettings.php: If443f3c80: monolog: explicitly declare logstash as debug for sessions (duration: 00m 34s)
  • 12:31 ema: Starting migration of mobile traffic to text cluster https://phabricator.wikimedia.org/T109286
  • 11:35 logmsgbot: oblivian@tin Synchronized wmf-config/InitialiseSettings.php: Re-synching (duration: 00m 31s)
  • 11:25 logmsgbot: oblivian@tin Synchronized wmf-config/InitialiseSettings.php: Stop writing session logs to fluorine (duration: 01m 25s)
  • 11:17 bblack: codfw LVS under etcd/conftool control now, like ulsfo
  • 10:57 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool pc1001 for maintenance (duration: 02m 48s)
  • 10:45 _joe_: rolling restarting the API cluster in eqiad
  • 10:34 _joe_: rolling restart of all api appservers in eqiad
  • 10:07 _joe_: dropping api logs from 2015 on fluorine
  • 09:10 _joe_: rolling restart of imagescalers in eqiad
  • 08:48 _joe_: powercycling ms-be1002, blank console, down
  • 08:46 _joe_: rebooting mw1001 with a new kernel
  • 08:07 _joe_: upgrading kernel on all mw hosts in eqiad
  • 05:07 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.11/includes/parser/ParserCache.php: (no message) (duration: 01m 28s)
  • 02:42 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.11/includes/parser/ParserCache.php: (no message) (duration: 01m 28s)
  • 02:40 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.11/includes/OutputPage.php: (no message) (duration: 01m 32s)
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 09m 31s)
  • 01:44 logmsgbot: catrope@tin Finished scap: Deploying OATHAuth and WikimediaMessages i18n changes (duration: 30m 52s)
  • 01:37 gwicke: restbase cassandra: increased compression chunk size from 256 to 512k on wikimedia and wikipedia html and data-parsoid
  • 01:13 logmsgbot: catrope@tin Started scap: Deploying OATHAuth and WikimediaMessages i18n changes
  • 01:08 eileen: Updating CiviCRM from cb5e20c29d7376920c45eb5c343e6ee464217833 to b9ebf3d31aeab8120143cfbf6bc2df0f617341cf
  • 00:19 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Add ability for OfficeWiki sysops to add and remove flood group rights from themselves. (duration: 01m 27s)
  • 00:14 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: enable EventBus extension on mediawikiwiki (duration: 01m 27s)
  • 00:10 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: enable sandboxlink on ladwiki and don't send messages to autocreated accounts on metawiki (duration: 01m 27s)
  • 00:08 logmsgbot: ebernhardson@tin Synchronized wmf-config/throttle.php: Santiago Editatón throttle rule (duration: 01m 27s)
  • 00:02 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: configure cirrus completion suggester recycling (duration: 01m 29s)
  • 00:00 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: configure cirrus completion suggester recycling (duration: 01m 28s)

2016-01-21

  • 22:46 legoktm: started running migratePass0.php (CentralAuth) on group1 wikis
  • 22:24 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.11
  • 22:23 legoktm: started running migratePass0.php (CentralAuth) on group0 wikis
  • 21:35 ejegg: re-enabled low-level fundraising banner campaigns
  • 21:30 ejegg: reverted donatewiki maintenance message
  • 21:19 ejegg: updated paymentswiki from a7785baa7b40b442ecf0b60d47572502d0759780 to 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03
  • 21:13 andrewbogott: all reachable labs instances are now running security-patched kernels.
  • 21:12 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: cswiktionary to 1.27.0-wmf.11
  • 21:12 ejegg: disabled low-level fundraising banner campaigns
  • 21:12 andrewbogott: all labvirt10xx hosts are now running the latest utopic kernel
  • 21:09 ejegg: replaced form on donatewiki with maintenance notice
  • 21:08 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.11/includes/session/SessionManager.php: SessionManager: Notify AuthPlugin when auto-creating accounts gerrit:265578 (duration: 01m 26s)
  • 21:01 andrewbogott: rebooting labvirt1010
  • 20:51 andrewbogott: rebooting labvirt1009
  • 20:33 andrewbogott: rebooting labvirt1007
  • 20:33 logmsgbot: dduvall@tin Synchronized php-1.27.0-wmf.11/includes/user/BotPassword.php: deploy fix for T124335 (duration: 01m 29s)
  • 20:27 mobrovac: restbase deploy end of 79a4d27
  • 20:20 mobrovac: restbase deploy start of 79a4d27
  • 20:16 andrewbogott: rebooting labvirt1006
  • 19:58 mobrovac: mobileapps deploying 68c09e
  • 19:54 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: rollback cswiktionary to 1.27.0-wmf.10
  • 19:54 andrewbogott: rebooting labvirt1005
  • 19:32 andrewbogott: rebooting labvirt1004
  • 19:31 logmsgbot: dduvall@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthTokenSessionProvider.php: deploy https://gerrit.wikimedia.org/r/#/c/265545/ for 1.27.0-wmf.11 (duration: 01m 28s)
  • 19:24 mobrovac: restbase rolling-restart after firejail inclusion
  • 19:22 mobrovac: restbase re-enabling puppet in prod
  • 19:14 andrewbogott: rebooting labvirt1003
  • 18:57 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11
  • 18:53 marxarelli: starting train promotion of group1 to 1.27.0-wmf.11
  • 18:52 marxarelli: sync to mw2020 failed due to failed host key verification; mw2087/mw2039/mw2098 failed due to failed connections
  • 18:47 marxarelli: 4 apache sync failures during sync-file, appear to be known issues
  • 18:46 andrewbogott: rebooting labvirt1002
  • 18:43 logmsgbot: dduvall@tin Synchronized php-1.27.0-wmf.11/includes/session/PHPSessionHandler.php: deploy follow-up warning fix for T124126 (duration: 01m 28s)
  • 18:43 mobrovac: restbase disabling puppet in prod for testing firejail in staging
  • 18:41 akosiaris: enable puppet and salt-minion on sca100{1,2}.eqiad.wmnet
  • 18:39 akosiaris: depool sca1001, sca1002 for citoid
  • 18:34 akosiaris: pool scb1001, scb1002 for citoid
  • 18:07 andrewbogott: rebooting labvirt1001
  • 17:57 akosiaris: depool sca1001,sca1002 for graphoid pybal config
  • 17:49 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Really enable ContentTranslationCorpora gerrit:265514 (duration: 01m 29s)
  • 17:48 akosiaris: add scb1001, scb1002 in pybal graphoid config
  • 17:30 akosiaris: disabled puppet and salt-minion on sca1001, sca1002 for graphoid upgrade
  • 17:24 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Enable ContentTranslationCorpora Part II gerrit:265459 (duration: 01m 28s)
  • 17:22 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ContentTranslationCorpora Part I gerrit:265459 (duration: 01m 28s)
  • 17:12 _joe_: restarting pybal on the main balancers in ulsfo to consume from etcd
  • 17:02 andrewbogott: rebooting labvirt1008
  • 16:42 jynus: batch-converting m4-master (log) tables from innodb to tokudb
  • 16:42 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.11/extensions/MobileFrontend/MobileFrontend.php: SWAT: Use TitleSquidURLs hook to purge mobile URLs directly Part II gerrit:265486 (duration: 01m 28s)
  • 16:40 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.11/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: SWAT: Use TitleSquidURLs hook to purge mobile URLs directly Part I gerrit:265486 (duration: 01m 28s)
  • 16:35 ottomata: stopped eventlogging mysql consumers for long downtime: https://phabricator.wikimedia.org/T120187
  • 16:28 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.10/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android Beta app: 100% gerrit:265117 (duration: 01m 27s)
  • 16:22 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.11/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android Beta app: 100% gerrit:265118 (duration: 01m 28s)
  • 16:20 ottomata: started eventlogging mysql consumers
  • 16:19 paravoid: deactivating GTT BGP peering on cr2-eqiad
  • 16:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: wgRCWatchCategoryMembership true on dewiki gerrit:264732 (duration: 01m 28s)
  • 15:59 ottomata: stopping eventlogging mysql consumers for https://phabricator.wikimedia.org/T123546
  • 14:37 paravoid: upgraded cr2-codfw to JunOS 13.3R8.7
  • 13:20 _joe_: rolling reboot of imagescalers, jobrunners in codfw
  • 12:10 paravoid: upgrading cr1-codfw to JunOS 13.3R8.7
  • 11:27 _joe_: restarting pybal on lvs4003, switching to etcd
  • 11:25 _joe_: restarting pybal on lvs4004, switching to etcd
  • 11:09 jynus: adding new version of mariadb to carbon for jessie (10.0.23-1)
  • 10:19 _joe_: mw2098 doesn't reboot, console unreachable
  • 10:10 jynus: mw2098.codfw.wmnet failed to sync
  • 10:10 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Restore s5 DB configuration (duration: 01m 57s)
  • 09:53 _joe_: rolling reboot of the codfw appserver layer
  • 09:27 _joe_: powercycled mw1162, memory exhaustion
  • 08:01 _joe_: upgrading the kernel on all codfw appservers to linux-image-3.13.0-76-generic
  • 02:56 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Jan 21 02:56:44 UTC 2016 (duration 7m 9s)
  • 02:49 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 09m 39s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 33s)
  • 02:24 mobrovac: citoid deploying 3a1b6c8648
  • 02:16 ori: Restarting jobrunner service on job runners to ensure I180856917 gets picked up
  • 01:47 mutante: nitrogen - install package upgrades
  • 01:15 bd808: Restarted logstash on logstash1003
  • 01:14 bd808: Restarted logstash on logstash1002
  • 01:04 logmsgbot: maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/265395/ (duration: 00m 32s)
  • 00:56 logmsgbot: maxsem@tin Synchronized php-1.27.0-wmf.11/extensions/GeoData/: https://gerrit.wikimedia.org/r/#/c/265409/ (duration: 00m 33s)
  • 00:50 logmsgbot: maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/265142/ (duration: 00m 32s)

2016-01-20

  • 23:56 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.10/extensions/SemanticForms/: fix wikitech again (duration: 00m 34s)
  • 23:06 bd808: Restarted logstash on logstash1001
  • 23:04 bd808: Logstash1001 went nuts and decided that instead of 2016 it would go back to the start of 2015 after 2015-12-31T23:59
  • 22:54 bd808: no HHVM log events in logstash since 2015-12-31T23:59:44.000Z
  • 22:48 bd808: HHVM log messages not being recorded in Logstash; bd808 to investigate
  • 22:38 logmsgbot: tgr@tin Synchronized php-1.27.0-wmf.11/includes/: T124143,T124126 (duration: 00m 36s)
  • 22:06 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.11/extensions/OAuth: Deploy fix for T124224 (duration: 00m 32s)
  • 22:04 logmsgbot: anomie@tin Synchronized php-1.27.0-wmf.2/extensions/OAuth: Deploy fix for T124224 (duration: 00m 34s)
  • 21:51 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/SemanticResultFormats: Fix wikitech log noise (duration: 00m 31s)
  • 21:50 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/SemanticMediaWiki: Fix wikitech log noise (duration: 00m 34s)
  • 21:48 subbu: finished deploying parsoid sha f1ddfb88
  • 21:41 subbu: synced new parsoid code; restarted parsoid on wtp1001 as a canary
  • 21:35 subbu: starting parsoid deploy
  • 21:32 thcipriani: reverted group1 wikis to 1.27.0-wmf.10 due to session errors.
  • 21:30 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.10
  • 21:14 andrewbogott: rebooting labvirt1011
  • 21:08 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/SemanticForms/: Fix fatal on wikitech (duration: 00m 36s)
  • 20:37 akosiaris: s#/dev/md1#/dev/mapper/tank-data# on labvirt1010, reverted by puppet with Notice: /Stage[main]/Role::Labs::Openstack::Nova::Compute/Mount[/var/lib/nova/instances]/device: device changed '/dev/mapper/tank-data' to '/dev/md1'
  • 20:37 akosiaris: s#/dev/md1#/dev/mapper/tank-data#
  • 19:32 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11
  • 19:14 marxarelli: including labswiki and labtestwiki in group1 promotion after all
  • 19:09 marxarelli: starting promotion of group1, but holding back labswiki and labtestwiki until Jan 21 'all' promotion
  • 18:54 paravoid: manually triggering an ubuntu mirror update ("sudo -u mirror /usr/local/sbin/update-ubuntu-mirror" on carbon)
  • 18:41 jynus: schema change on wikidatawiki (wb_terms) finished; slaves already catching up
  • 18:34 mutante: restart hhvm on mw1206
  • 18:32 godog: bounce stuck hhvm on mw1205
  • 18:06 paravoid: turning up BGP with Zayo in codfw
  • 17:48 jynus: restarting replication on db1026 after schema change
  • 17:09 gwicke: restbase cassandra: set DTCS max_window_size_seconds to 70736000, large enough to accommodate a two-year window
  • 16:56 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set default graph vega version back to 1 gerrit:265289 (duration: 00m 32s)
  • 16:46 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add davidabian.com to wgCopyUploadsDomains gerrit:265286 (duration: 00m 32s)
  • 16:42 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Change default graph version param. Part II gerrit:265282 (duration: 00m 32s)
  • 16:42 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Change default graph version param. Part I gerrit:265282 (duration: 00m 36s)
  • 16:33 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add davidabian.com to wgCopyUploadsDomains gerrit:259003 (duration: 00m 32s)
  • 16:21 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add *.bodleian.ox.ac.uk to wgCopyUploadsDomains gerrit:265165 (duration: 00m 33s)
  • 16:19 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add *.archives.gov to wgCopyUploadsDomains gerrit:265163 (duration: 00m 32s)
  • 16:13 godog: bounce hhvm and runaway syntaxhighlight processes on mw1191
  • 16:05 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable active gadget user stats on enwiki since it takes too long gerrit:265185 (duration: 00m 32s)
  • 14:52 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/vendor/: Fix ?PHP properly from commit (duration: 00m 36s)
  • 14:50 godog: powercycle mw1123, hhvm oom
  • 14:47 ema: Finished reverting migration of mobile traffic to text cluster in codfw https://phabricator.wikimedia.org/T109286
  • 14:24 logmsgbot: hoo@tin Synchronized wmf-config/db-eqiad.php: Set db1045 load to 0 (duration: 00m 32s)
  • 14:23 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/: consistency (duration: 02m 38s)
  • 14:15 logmsgbot: hoo@tin Synchronized wmf-config/db-eqiad.php: Re-Pool lagged db1045 (duration: 00m 35s)
  • 14:14 _joe_: synchronizing /srv/deployment manually between the two deployment servers for the first time
  • 14:11 logmsgbot: hoo@tin Synchronized wmf-config/db-eqiad.php: Has not been synced before (duration: 00m 32s)
  • 14:07 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.10/: consistency (duration: 02m 38s)
  • 13:58 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/Validator/: noop for wikitech deploy (duration: 00m 32s)
  • 13:58 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/SemanticMediaWiki/: noop for wikitech deploy (duration: 00m 34s)
  • 13:57 logmsgbot: reedy@tin Synchronized php-1.27.0-wmf.11/extensions/SemanticResultFormats/: noop for wikitech deploy (duration: 00m 33s)
  • 13:41 ema: Revert migration of mobile traffic to text cluster in codfw https://phabricator.wikimedia.org/T109286
  • 12:55 akosiaris: restart hhvm on mw1130
  • 12:43 jynus: performing alter table on db1026 (ETA: 5 hours)
  • 12:20 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Setting s5 master as recentchanges role (duration: 00m 32s)
  • 12:04 jynus: trying schema change on wikidata (wb_terms)
  • 09:36 akosiaris: gnt-instance modify -H disk_aio=native cygnus.codfw.wmnet
  • 09:18 akosiaris: offline fr_archive volume on nas1001-a
  • 09:15 akosiaris: unexport /vol/fr_archive on nas1001-a
  • 07:56 _joe_: powercycling mw1162, unable to login from console, memory exhaustion
  • 07:24 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/includes/DataSender.php: stop checking for frozen indices while codfw elasticsearch recovers (duration: 01m 42s)
  • 06:24 ebernhardson: codfw elasticsearch cluster stopped responding during load test, idling test to see if it recovers
  • 03:44 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 20 03:44:48 UTC 2016 (duration 7m 29s)
  • 03:37 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 16m 21s)
  • 03:02 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 10m 06s)
  • 02:35 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 11m 20s)
  • 01:27 logmsgbot: aaron@tin Synchronized wmf-config: Configure $wgCdnReboundPurgeDelay (duration: 00m 32s)
  • 01:01 mobrovac: restbase deploy end of d621b76
  • 00:57 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264917/ (duration: 00m 32s)
  • 00:56 legoktm: delete from localuser where lu_name ="Αντώνης Μανιός" and lu_wiki ="mediawikiwiki" limit 1 on centralauth db for T119736
  • 00:53 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264920/ (duration: 00m 33s)
  • 00:49 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/MobileFrontend/includes/api/ApiMobileView.php: https://gerrit.wikimedia.org/r/#/c/264973/ (duration: 00m 32s)
  • 00:49 mobrovac: restbase deploy start of d621b76
  • 00:38 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264961/ (duration: 00m 31s)
  • 00:37 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264961/ (duration: 00m 33s)
  • 00:22 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264260/ (duration: 00m 32s)
  • 00:21 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/264260/ (duration: 00m 32s)
  • 00:17 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch: https://gerrit.wikimedia.org/r/#/c/265146/ (duration: 00m 33s)
  • 00:10 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php: https://gerrit.wikimedia.org/r/#/c/264989/ (duration: 00m 32s)

2016-01-19

  • 23:33 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog (duration: 00m 32s)
  • 23:22 logmsgbot: krenair@tin Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/265145 (duration: 02m 24s)
  • 23:19 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11
  • 23:13 logmsgbot: dduvall@tin Finished scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache (duration: 72m 03s)
  • 22:01 logmsgbot: dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache
  • 21:35 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/265135 (duration: 00m 32s)
  • 21:33 logmsgbot: krenair@tin Synchronized dblists/nonglobal.dblist: https://gerrit.wikimedia.org/r/265135 (duration: 03m 21s)
  • 21:33 ema: Finished migrating mobile traffic to text cluster in codfw (Mexico + green US states on this map https://phabricator.wikimedia.org/T114659)
  • 21:15 logmsgbot: dduvall@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.qyk48j8kem" ' returned non-zero exit status 1 (duration: 16m 11s)
  • 20:59 Krenair: sync-common on labtestweb2001
  • 20:58 logmsgbot: dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache
  • 20:48 mutante: tin: deleted unused things from /srv/deployment (T120157)
  • 20:46 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Disable global AbuseFilters on non-global wikis (duration: 02m 04s)
  • 20:25 logmsgbot: dduvall@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="labtestwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.jRNpeW67FO" ' returned non-zero exit status 1 (duration: 01m 31s)
  • 20:23 logmsgbot: dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache
  • 20:13 mutante: ruthenium: disable puppet, copy data over to osmium (screen)
  • 20:12 mutante: ruthenium: service mysql stop
  • 19:15 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: EventBus plumbing (duration: 00m 30s)
  • 19:14 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Disable Flow on wikitech; add EventBus plumbing (duration: 00m 31s)
  • 19:13 logmsgbot: catrope@tin Synchronized wmf-config/extension-list: Add EventBus (duration: 00m 31s)
  • 19:00 marxarelli: starting branch cut for 1.27.0-wmf.11
  • 18:42 ema: Starting migration of mobile traffic to text cluster https://phabricator.wikimedia.org/T109286
  • 17:54 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/UploadWizard/UploadWizard.config.php: https://gerrit.wikimedia.org/r/#/c/264969/ (duration: 00m 31s)
  • 16:51 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264964/ (duration: 00m 31s)
  • 16:47 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Graph/modules/graph-loader.js: https://gerrit.wikimedia.org/r/#/c/264715/ (duration: 00m 31s)
  • 16:45 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264469/ (duration: 00m 31s)
  • 16:41 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264437/ (duration: 00m 32s)
  • 14:58 cmjohnson1: reseating asw-c-eqiad uplink module (xe-1/1/0 and xe-1/1/2)
  • 14:29 jynus: reimporting some fawiki tables from production into labsdb hosts
  • 13:52 godog: powercycle ms-be1001
  • 13:51 paravoid: powercycling alsafi
  • 02:53 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 19 02:53:40 UTC 2016 (duration 7m 0s)
  • 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 21s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 40s)

2016-01-18

  • 23:26 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/264895 (duration: 00m 31s)
  • 23:08 logmsgbot: krenair@tin Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/264786/ (duration: 00m 32s)
  • 22:55 logmsgbot: krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
  • 22:55 logmsgbot: krenair@tin Synchronized dblists: (no message) (duration: 00m 31s)
  • 22:53 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/wikitech.png: https://gerrit.wikimedia.org/r/#/c/264786/ (duration: 00m 31s)
  • 17:30 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/264758 - labs-only change (duration: 00m 36s)
  • 14:24 godog: powercycle praseodymium
  • 10:42 godog: powercycle ms-be2016, high load avg
  • 10:16 godog: dist-upgrade ms-be3002 to trusty
  • 02:57 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 18 02:57:41 UTC 2016 (duration 7m 8s)
  • 02:50 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 08m 39s)
  • 02:49 YuviPanda: updated annualreport for folks
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 11m 38s)

2016-01-17

  • 04:58 YuviPanda: started restbase on restbase1002
  • 02:53 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 17 02:53:19 UTC 2016 (duration 6m 59s)
  • 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 08m 53s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 41s)
  • 01:47 paravoid: restarting HHVM on mw1120, mw1125, mw1127, mw1132, mw1148; OOM

2016-01-16

  • 19:52 andrewbogott: renaming and reimaging labcontrol2001 -> labtestweb2001
  • 15:57 milimetric: piwik is receiving events on bohrium, but the interface can't complete its loading queries because there's too much data. MySQL is maxing out the CPU but seems OK for now; will check again Monday.
  • 15:22 milimetric: restarted mysql on bohrium because it had stopped working (probably due to piwik performance problems)
  • 03:02 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 16 03:02:21 UTC 2016 (duration 6m 57s)
  • 02:55 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 08m 35s)
  • 02:35 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 18m 55s)

2016-01-15

  • 22:43 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthUseSlaves for testwiki (duration: 00m 33s)
  • 22:38 mutante: gadolinium - shutdown -h now
  • 22:35 mutante: erbium - killing from puppet/icinga/salt
  • 21:54 mutante: mira - starting salt
  • 21:29 mutante: protactinium - shut down, unused system with outdated software
  • 21:09 mutante: (ganglia for ulsfo will be affected, brb)
  • 21:07 mutante: bast4001 - reinstalling with jessie
  • 18:55 ori: disabled gzip in apache for javascript mime types and did an apache config reload
  • 18:04 logmsgbot: ori@tin Synchronized docroot and w: Ie60638b0: Mirror homepage.js from 15.wikipedia.org (duration: 00m 42s)
  • 16:01 godog: bounce hhvm on mw1129 / mw1204
  • 15:41 godog: reimage ms-be3001 with trusty
  • 14:54 godog: reimage ms-fe3002 with trusty
  • 14:13 mark: Temporarily paused md126 RAID check on labstore1001 (sync_action idle)
  • 14:09 chasemp: phab: restarted phd (reported as not running in Phabricator itself); seems OK now
  • 14:03 mark: set sync_speed_min to 5000 for md126 on labstore1001
  • 13:28 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: w:he as import source for commonswiki (duration: 00m 49s)
  • 12:17 hashar: restarting Jenkins for plugins updates
  • 11:07 _joe_: re-enabled puppet on mw1013, restarted HHVM to make it pick up our latest changes
  • 10:01 moritzm: installed ganeti security updates
  • 09:18 moritzm: installed git security updates on all jessie systems
  • 03:10 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 15 03:10:09 UTC 2016 (duration 6m 48s)
  • 03:03 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 16m 02s)
  • 02:30 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/includes/api/ApiQueryRecentChanges.php: https://gerrit.wikimedia.org/r/264231 (duration: 00m 42s)
  • 02:29 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 14m 00s)
  • 02:23 YuviPanda: pull annualreport git repo on bromine for Krenair
  • 01:00 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/includes/api/ApiQueryWatchlist.php: https://gerrit.wikimedia.org/r/#/c/264224/ (duration: 00m 31s)
  • 00:27 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/263905/ (duration: 00m 32s)
  • 00:24 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 31s)
  • 00:22 logmsgbot: krenair@tin Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/264091/ (duration: 00m 32s)
  • 00:06 mobrovac: restbase started a dump of enwiki to populate storage with mobileapps renders

2016-01-14

  • 23:56 mobrovac: restbase end deploy of dac31a8c
  • 23:49 mobrovac: restbase start deploy of dac31a8c
  • 22:17 csteipp: deployed patch for T122807
  • 19:55 ottomata: restarted eventlogging_sync script to insert batches of 1000
  • 19:31 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: rollback labswiki to wmf.9
  • 19:02 logmsgbot: dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.10
  • 18:40 bblack: removing old eqiad misc-web IP (DNS switched 50h ago, not 26h as stated in the 18:39 entry; TTLs are max 1h)
  • 18:39 bblack: removing old eqiad misc-web IP (DNS switched 26h ago, TTLs are max 1h)
  • 18:01 paravoid: turning up BGP with Zayo in eqiad
  • 16:25 logmsgbot: demon@tin Synchronized wmf-config/throttle.php: (no message) (duration: 00m 49s)
  • 15:48 moritzm: installed DHCP security updates across the fleet
  • 14:44 _joe_: powercycling mw1013, console stuck
  • 11:28 godog: bounce uwsgi on labmon1001
  • 11:18 godog: upgrade graphite-carbon / graphite-web on labmon1001
  • 10:38 _joe_: restarting hhvm on odd-numbered jobrunners
  • 10:29 moritzm: installed DHCP security updates on carbon
  • 04:28 paravoid: powercycling mw1005/mw1011
  • 04:24 paravoid: restart hhvm on odd-numbered appservers
  • 02:30 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 12m 21s)
  • 01:32 Krenair: Wikitech rolled back to wmf.9 due to T123583
  • 01:27 logmsgbot: krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
  • 01:06 mutante: mw1009 - restarted hhvm
  • 01:00 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/VisualEditor/extension.json: https://gerrit.wikimedia.org/r/#/c/264031/ (duration: 01m 35s)
  • 00:30 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/includes: https://gerrit.wikimedia.org/r/#q,263991,n,z (duration: 06m 08s)
  • 00:11 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/263804/ (duration: 00m 31s)
  • 00:10 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263804/ (duration: 00m 31s)
  • 00:08 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Echo/modules/echo.variables.less: https://gerrit.wikimedia.org/r/#/c/263767/ (duration: 00m 45s)

2016-01-13

  • 23:46 tgr: T123451: running mwscript sql.php --wiki=metawiki patch-bot_passwords.sql
  • 23:09 mobrovac: restbase end deploy of 536e15b6
  • 22:58 andrewbogott: /etc/init.d/nfs-kernel-server restart on labstore1001
  • 22:54 mobrovac: restbase start deploy of 536e15b6
  • 22:20 logmsgbot: catrope@tin Synchronized wmf-config/: sync labs-only config changes (duration: 00m 32s)
  • 21:54 mobrovac: restbase end deploy of 559a13a
  • 21:44 mobrovac: restbase start deploy of 559a13a
  • 21:40 mdholloway: mobileapps deployed c9e7e28
  • 21:27 aude: Updated cirrus search mappings for testwikidata and wikidata to add new fields
  • 21:02 ori: Disabling Puppet on mw1013 (eqiad jobrunner) to hack in some debug logging into GWT jobs.
  • 20:01 ottomata: dropped MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 from analytics-store eventlogging slave db
  • 19:55 ostriches: elasticsearch: wikimania2017wiki_content was reporting as missing in logstash; ran updateSearchIndexConfig. Messy aliases? Seems to be working again.
  • 19:27 ottomata: dropping eventlogging tables MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 from the m4-master log database. These are too large and have been blacklisted from mysql, so no more events will be inserted into mysql for them. We are attempting to help replication catch up on the analytics-store slave.
  • 19:11 logmsgbot: thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.10
  • 18:33 RobH: restarted zotero/mobileapps on sca1*/scb1* respectively for marko's code deploy
  • 18:27 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: OfficeIT namespace on wikitech (duration: 00m 31s)
  • 18:03 mobrovac: zotero deploying translators 0476aa0
  • 17:12 gwicke: restarted mathoid on scb1001 and scb1002
  • 17:06 gwicke: restarted mathoid on sca1001 and sca1002
  • 17:00 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/263865/ (duration: 00m 41s)
  • 16:31 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/263625/ (duration: 00m 31s)
  • 16:28 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263341/ (duration: 00m 31s)
  • 16:22 logmsgbot: krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263796/ (duration: 00m 31s)
  • 16:20 logmsgbot: krenair@tin Synchronized wmf-config/Wikibase-production.php: https://gerrit.wikimedia.org/r/#/c/263838/ (duration: 00m 31s)
  • 16:14 logmsgbot: krenair@tin Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/263354/ (duration: 00m 31s)
  • 16:03 logmsgbot: krenair@tin Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/263370/3 (duration: 00m 31s)
  • 14:11 godog: bounce hhvm on mw1007
  • 14:03 godog: bounce hhvm on mw1005, powercycle mw1011
  • 13:46 godog: bounce hhvm on mw1009, powercycle mw1003
  • 13:39 godog: bounce hhvm on mw1013
  • 10:31 paravoid: upgrading grafana 2.6.0-beta1 -> 2.6.0
  • 06:45 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.9/extensions/GWToolset: Ib9375b: Make sure XMLReader::close() is always called (T122069) (duration: 00m 32s)
  • 06:43 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.10/extensions/GWToolset: Ib9375b: Make sure XMLReader::close() is always called (T122069) (duration: 01m 07s)
  • 03:15 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 13 03:15:57 UTC 2016 (duration 7m 13s)
  • 03:08 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 16m 09s)
  • 02:57 Krinkle: Manually killed uwsgi graphite-web child processes on graphite1001. Service recovered itself from there.
  • 02:44 Krinkle: Graphite is down. Consistently returns HTTP 502 Bad Gateway for any/all requests
  • 02:34 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 11m 13s)
  • 01:33 yurik: deployed tilerator maps service
  • 01:19 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Echo/Resources.php: https://gerrit.wikimedia.org/r/#/c/263645/ (duration: 00m 32s)
  • 01:18 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Flow/modules/editor/editors/visualeditor/mw.flow.ve.Target.js: https://gerrit.wikimedia.org/r/#/c/263644/ (duration: 00m 31s)
  • 01:03 logmsgbot: krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263770/ - after having done the submodule update this time (duration: 00m 31s)
  • 00:37 logmsgbot: krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263770/ (duration: 00m 33s)
  • 00:31 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/261994/ (duration: 00m 31s)
  • 00:28 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/262895/ (duration: 00m 32s)
  • 00:25 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/262894/ (duration: 00m 30s)
  • 00:17 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263237/ (duration: 00m 31s)
  • 00:15 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/262999/ (duration: 00m 31s)
  • 00:10 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 30s)
  • 00:08 yurik: switched all maps kartotherian servers to v5, restarted
  • 00:06 logmsgbot: krenair@tin Synchronized images/mobile/wikivoyage.png: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 31s)
  • 00:06 logmsgbot: krenair@tin Synchronized images/mobile/wikidata.png: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 32s)

2016-01-12

  • 21:58 ori: Restarting jobchron / jobrunner / HHVM on all job runners for I44990808
  • 21:07 logmsgbot: hoo@tin Synchronized php-1.27.0-wmf.10/extensions/Math/: Introduce a "MathEnableWikibaseDataType" config (duration: 00m 32s)
  • 20:52 logmsgbot: hoo@tin Synchronized wmf-config/: Set $wgMathEnableWikibaseDataType to false (duration: 01m 29s)
  • 20:44 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.10
  • 20:34 logmsgbot: thcipriani@tin Finished scap: testwiki to php-1.27.0-wmf.10 and rebuild l10n cache (duration: 54m 42s)
  • 20:14 mobrovac: restbase switching restbase200x to node 4.2
  • 20:13 mobrovac: restbase switch of restbase100[1-4] to node 4.2 completed
  • 20:10 mobrovac: restbase switching restbase100[1-4] to node 4.2
  • 19:39 logmsgbot: thcipriani@tin Started scap: testwiki to php-1.27.0-wmf.10 and rebuild l10n cache
  • 19:31 logmsgbot: dduvall@tin scap failed: CalledProcessError Command 'sudo -u www-data -n -- /bin/mktemp' returned non-zero exit status 1 (duration: 00m 42s)
  • 19:30 logmsgbot: dduvall@tin Started scap: testwiki to php-1.27.0-wmf.10 and rebuild l10n cache
  • 19:26 YuviPanda: import new r-base package into carbon
  • 18:15 marxarelli: cutting MW branch 1.27.0-wmf.10
  • 17:37 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263632/ (duration: 00m 31s)
  • 16:53 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Import sources on gu.wikipedia gerrit:258441 (duration: 00m 29s)
  • 16:48 logmsgbot: thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of old unused $wgAllowed* variables gerrit:256853 (duration: 00m 29s)
  • 16:47 _joe_: restarted salt-minion on tin
  • 16:44 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add portal namespace to ps.wikipedia.org gerrit:255519 (duration: 00m 30s)
  • 16:42 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove proxyunbannable gerrit:254842 (duration: 00m 30s)
  • 16:37 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow sysop to grant and revoke transwiki on gu.wikipedia gerrit:258474 (duration: 00m 29s)
  • 16:33 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on pa.wikipedia gerrit:258436 (duration: 00m 29s)
  • 16:22 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on my.wikipedia gerrit:258442 (duration: 00m 30s)
  • 15:56 godog: reprovision ms-fe3001 with jessie
  • 14:55 ema: added myself to ops and wmf ldap groups
  • 11:57 _joe_: enabling auth on the production etcd cluster
  • 08:37 paravoid: ms-be1002: echo b > /proc/sysrq-trigger, kernel misbehaving and unrecoverable (out of kernel memory/XFS issues)
  • 07:38 paravoid: cr2-eqiad: reenable BGP peerings with GTT
  • 05:31 paravoid: rm CirrusSearchRequests.log-201510*.gz on fluorine (saving ~200G)
  • 04:07 paravoid: cleaning up elastic1006's /var/log from old logs
  • 03:59 paravoid: reenabling puppet on sca1001/2; no reason was left
  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 12 02:33:00 UTC 2016 (duration 6m 55s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 47s)
  • 00:46 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: rv 443026e3ad18934dd0017a258673d88104cf6b5e (duration: 00m 29s)
  • 00:32 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258670/ (duration: 00m 30s)
  • 00:29 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258672/ (duration: 00m 30s)
  • 00:25 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258453/ (duration: 00m 30s)
  • 00:18 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258444/ (duration: 00m 30s)
  • 00:14 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/255361/ (duration: 00m 30s)
  • 00:10 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/244140/ (duration: 00m 30s)
  • 00:09 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/244140/ (duration: 00m 30s)
  • 00:06 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/260242/ (duration: 00m 30s)

2016-01-11

  • 22:52 logmsgbot: jzerebecki@tin Synchronized wmf-config/throttle.php: deploying https://gerrit.wikimedia.org/r/#/c/263427/ (duration: 00m 30s)
  • 22:48 YuviPanda: restart eventlogging_synch on dbstore1002
  • 22:47 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dispatchChanges.php: restoring truncated Wikidata dispatchChanges.php to let dispatchers run again (duration: 00m 30s)
  • 22:46 mutante: restbase1004, restbase2002, restbase2005 - manually install nodejs
  • 22:45 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo: deploying https://gerrit.wikimedia.org/r/#/c/253898/ with dispatchChanges.php still truncated (duration: 00m 33s)
  • 22:40 mutante: restbase1001 - apt-get install nodejs
  • 22:40 jzerebecki: dispatchChanges.php killed on terbium
  • 22:38 logmsgbot: jzerebecki@tin Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dispatchChanges.php: truncating Wikidata dispatchChanges.php to stop dispatchers as preparation for https://gerrit.wikimedia.org/r/#/c/253898/ (duration: 00m 31s)
  • 21:19 papaul: pc200[4-6] - signing puppet certs, salt-key, initial run
  • 21:13 subbu: finished deploying parsoid sha 07494cf2
  • 21:06 papaul: installing OS on pc200[4-6]
  • 21:06 subbu: synced new code; restarted parsoid on wtp1003 as a canary
  • 21:02 subbu: starting parsoid deploy
  • 18:52 RobH: rt.w.o cert expired and its replacement will be installed later today (rt is internal ops only tool)
  • 18:36 RobH: tendril cert updated and neon returned to normal service
  • 18:30 ori: Restarting HHVM on all job runners, to vacate memory now that the cause of the leak appears to have subsided (T122069)
  • 18:24 RobH: tendril updating ssl cert on neon, https may flap for a second (this is on neon, so icinga https portal may also flap)
  • 17:29 hoo: Updated Wikidata's property suggester with data from today's json dump
  • 17:16 papaul: db2033 - signing puppet certs, salt-key, initial run
  • 16:58 papaul: installing OS on db2033
  • 16:49 logmsgbot: thcipriani@tin Synchronized robots.txt: SWAT: Remove overeager unrequested /wiki/User: robots.txt rule gerrit:263360 (duration: 00m 30s)
  • 16:41 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable new user groups on gu.wikipedia.org gerrit:255810 (duration: 00m 30s)
  • 16:34 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: dewikibooks: Set $wgRestrictDisplayTitle to false gerrit:260964 (duration: 00m 30s)
  • 16:30 godog: halt ms-be1013, required to reset idrac
  • 16:27 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable global AbuseFilter at French Wikipedia gerrit:257868 (duration: 00m 29s)
  • 16:23 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Changed user group rights at trwikiquote gerrit:261869 (duration: 00m 30s)
  • 16:16 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Added noindex rule for uawikimedia user namespace gerrit:261902 (duration: 00m 30s)
  • 16:09 logmsgbot: thcipriani@tin Synchronized robots.txt: SWAT: Tidy robots.txt gerrit:240065 (duration: 00m 30s)
  • 16:08 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgLocaltimezone for orwiki gerrit:260745 (duration: 00m 29s)
  • 16:03 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add enwiki as transwiki import source for ta.wikipedia gerrit:262352 (duration: 00m 33s)
  • 15:05 godog: repool restbase1004 in pybal, fully bootstrapped and running latest code
  • 11:14 _joe_: upgrading etcd to 2.2.1 in production
  • 10:36 _joe_: updating nodejs on restbase-test2002
  • 07:17 _joe_: restarting HHVM on a few jobrunners
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 11 02:32:37 UTC 2016 (duration 6m 55s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 39s)
  • 01:11 paravoid: deactivating eqiad<->GTT BGP peering, reported network issues (P2469)

2016-01-10

  • 22:00 gwicke: restbase: 1005-1009 now on node 4.2
  • 19:44 paravoid: powercycling mw1004, mw1008, mw1012
  • 19:38 paravoid: restarting hhvm on jobrunners again
  • 12:40 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 626m 20s)
  • 10:13 ori: disabled categoryMembershipChange on mw1165 too, then restart jobrunner / jobchron / hhvm on mw1165 and mw1164
  • 08:55 ori: mw1166 -- disabled puppet; disabled categoryMembershipChange jobs
  • 08:48 ori: mw1167 -- disabled puppet; disabled deleteLinks and refreshLinks* jobs
  • 08:45 ori: mw1168 -- disabled puppet; disabled restbase jobs
  • 08:41 ori: mw1169 -- disabled cirrus jobs.
  • 08:33 ori: Attempting to isolate cause of T122069 by toggling job types on mw1169. Disabling Puppet to prevent it from clobbering config changes.
  • 08:29 paravoid: restarting hhvm on jobrunners again
  • 04:58 paravoid: powercycling mw1005, mw1008, mw1009 -- unresponsive due to OOM
  • 04:56 paravoid: restarting HHVM on eqiad jobrunners, OOM, memleak faster than the 24h restarts

2016-01-09

  • 02:33 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 9 02:33:40 UTC 2016 (duration 6m 57s)
  • 02:26 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 11m 19s)

2016-01-08

  • 23:49 RobH: stalled puppet on carbon for now, messing with partman files
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 8 02:31:46 UTC 2016 (duration 7m 0s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 15s)

2016-01-07

  • 23:24 akosiaris: repooled scb1002 for mobileapps
  • 23:24 akosiaris: enabled puppet,salt on scb1001
  • 23:23 mobrovac: mobileapps deploying 58b371a on scb1001
  • 23:09 mobrovac: mobileapps deploying 58b371a on scb1002
  • 23:01 akosiaris: apt-mark hold nodejs on scb1001, etherpad1001 and maps-test200{1,2,3,4}
  • 22:58 akosiaris: disable puppet and salt on scb1001 for nodejs 4.2 transition
  • 22:57 akosiaris: depool scb1002 for mobileapps. Transition to nodejs 4.2 ongoing
  • 19:21 YuviPanda: started tools / maps backup on labstore1001
  • 19:13 YuviPanda: remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway (on labstore2001)
  • 19:13 YuviPanda: remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway
  • 19:11 jynus: setting up watchdog process killing long-running queries on db1051
  • 19:11 YuviPanda: run sudo lvremove backup/tools20151216020005 on labstore2001 to clean up full snapshot
  • 18:54 _joe_: also resetting the drac
  • 18:53 _joe_: powercycling ms-be1013
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Jan 7 02:32:04 UTC 2016 (duration 6m 54s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 33s)

2016-01-06

  • 23:03 gwicke: switched restbase1009 to node 4.2 for testing, and restarted restbase; see https://phabricator.wikimedia.org/T107762
  • 02:34 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 6 02:34:38 UTC 2016 (duration 6m 53s)
  • 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 30s)

2016-01-05

  • 22:38 logmsgbot: aaron@tin Synchronized rpc: 830e1ed8d80295710dc02f18102b4fadae7fca86 (duration: 00m 55s)
  • 18:34 logmsgbot: jzerebecki@tin scap aborted: deploy-log (duration: 00m 04s)
  • 18:34 logmsgbot: jzerebecki@tin Started scap: deploy-log
  • 15:47 ottomata: transitioned analytics1001 to active namenode
  • 03:51 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.9/includes/specials/SpecialJavaScriptTest.php: Idaacf71870 (duration: 00m 30s)
  • 03:50 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki.special/: Idaacf71870 (duration: 00m 30s)
  • 03:49 logmsgbot: krinkle@tin Synchronized php-1.27.0-wmf.9/resources/Resources.php: Idaacf71870 (duration: 00m 36s)
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 5 02:31:46 UTC 2016 (duration 6m 54s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 13s)

2016-01-04

  • 20:50 mutante: ms-be1011 - powercycled, was frozen
  • 20:43 mutante: ms-be2007 - System halted! Error: Integrated RAID
  • 20:42 mutante: ms-be2007 - powercycle (was status: on but all frozen) (I assume XFS, like be2006 that appears in SAL recently)
  • 20:36 mutante: mw2019 - puppet run (icinga claimed it failed but just here)
  • 20:19 mutante: rutherfordium - attempt to restart with gnt-instance
  • 20:12 mutante: rutherfordium (people.wm) was down for days per icinga, then magically fixed itself when I connected to the console but before even logging in (ganeti VM)
  • 20:00 mutante: mw1123 - start HHVM (was 503 and service stopped)
  • 19:28 mutante: elastic1006 - out of disk - gzip eqiad_index_search_slowlog.log files
  • 17:37 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/: Deployed Graph ext - gerrit 262357 (duration: 00m 33s)
  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 4 02:32:10 UTC 2016 (duration 6m 53s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 05s)

2016-01-03

  • 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 3 02:31:58 UTC 2016 (duration 6m 52s)
  • 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 22s)

2016-01-02

  • 03:34 twentyafterfour: deploying https://gerrit.wikimedia.org/r/261725, restarted apache2 on iridium
  • 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 2 02:31:28 UTC 2016 (duration 6m 58s)
  • 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 09s)
  • 01:04 YuviPanda: imported vagrant 1.8.1 for jessie per bd808
  • 00:04 ori: (at 23:46 UTC) restarted nova-compute on labvirt1002

2016-01-01

  • 23:50 legoktm: restarted nodepool on labnodepool1001
  • 23:37 ori: restarting nodepool on labnodepool1001.eqiad.wmnet (T122731)
  • 19:41 bd808: Updated scholarships.wikimedia.org with latest translation data from translatewiki
  • 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 1 02:30:27 UTC 2016 (duration 6m 47s)
  • 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 58s)
