Server Admin Log

From Wikitech

2017-06-29

  • 01:08 mutante: mwlog1001 - deleted /srv/xenon/logs from 2015 and 2016 as requested by Krinkle. Also merged https://gerrit.wikimedia.org/r/#/c/362114/ so now logs are retained for 14 days
  • 00:23 krinkle@tin: Synchronized wmf-config/InitialiseSettings.php: I8ce28a4ce7 - test2wiki config cleanup (duration: 00m 47s)
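Entries in this log share a regular shape: a bullet, an HH:MM timestamp, a user (optionally `user@host`), a colon, and a free-text message that sometimes ends in a `(duration: ...)` suffix. The sketch below shows one way to split an entry into those fields. The names `Entry` and `parse_entry` and the two regular expressions are illustrative assumptions inferred from the lines on this page, not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred from this page: "• HH:MM who: message".
ENTRY_RE = re.compile(
    r"^\s*\u2022\s*(?P<time>\d{2}:\d{2})\s+(?P<who>[^:\s]+):\s+(?P<message>.*)$"
)
# Matches both "(duration: 00m 47s)" and "(duration 7m 0s)" at the end of a message.
DURATION_RE = re.compile(r"\(duration:?\s+(?:(?P<m>\d+)m\s+)?(?P<s>\d+)s\)\s*$")


class Entry(NamedTuple):
    time: str                  # "HH:MM" as written in the log
    user: str                  # part before "@", or the whole name
    host: Optional[str]        # part after "@", if any
    message: str               # everything after the first "who: "
    duration_s: Optional[int]  # trailing duration in seconds, if present


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for headings and blank lines."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    user, _, host = match.group("who").partition("@")
    message = match.group("message")
    duration_s = None
    found = DURATION_RE.search(message)
    if found:
        duration_s = int(found.group("m") or 0) * 60 + int(found.group("s"))
    return Entry(match.group("time"), user, host or None, message, duration_s)
```

Date headings such as `2017-06-29` do not match the entry pattern and come back as `None`, so a caller can track the current date separately.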

2017-06-28

  • 23:44 thcipriani@tin: Synchronized php-1.30.0-wmf.7/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Adding ssclick events for sister-search results T168916 (duration: 00m 46s)
  • 23:36 thcipriani@tin: Synchronized php-1.30.0-wmf.6/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: SWAT: Revert "Run DiffViewHeader in mobile mode, too" T169024 (duration: 00m 46s)
  • 23:35 thcipriani@tin: Synchronized php-1.30.0-wmf.7/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: SWAT: Revert "Run DiffViewHeader in mobile mode, too" T169024 (duration: 00m 47s)
  • 22:10 demon@tin: Synchronized wmf-config/InitialiseSettings.php: rm more stupid logging, wow this stuff has piled up (duration: 00m 46s)
  • 22:09 ppchelko@tin: Finished deploy [eventstreams/deploy@ba71a84]: redeploy to pick up config changes (duration: 02m 01s)
  • 22:07 ppchelko@tin: Started deploy [eventstreams/deploy@ba71a84]: redeploy to pick up config changes
  • 22:06 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill temp-debug (duration: 00m 46s)
  • 21:50 robh: wtp1025-1048 are having icinga reporting errors, they are new installs on stretch
  • 21:48 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill weird testwiki logging (duration: 00m 47s)
  • 21:38 ppchelko@tin: Finished deploy [eventstreams/deploy@05bcc8f]: redeploy to pick up config changes (duration: 00m 20s)
  • 21:37 ppchelko@tin: Started deploy [eventstreams/deploy@05bcc8f]: redeploy to pick up config changes
  • 21:34 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill oai logging channel (duration: 00m 47s)
  • 20:17 twentyafterfour@tin: Synchronized php-1.30.0-wmf.7/extensions/VisualEditor/VisualEditor.hooks.php: sync https://gerrit.wikimedia.org/r/#/c/361941/ refs T169132 T167536 (duration: 00m 47s)
  • 20:08 mutante: migrating servermon to stretch on netmon1002 is currently blocked by "python-django-south" package not existing anymore
  • 19:36 robh: puppet suspended on install1002 for robh to livehack the dhcp file for a single reboot of wtp1025
  • 19:26 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.7 refs T167536
  • 19:26 twentyafterfour@tin: Synchronized php-1.30.0-wmf.7/extensions/LoginNotify/includes/Hooks.php: deploy https://gerrit.wikimedia.org/r/#/c/361935/ to wmf.7 refs T168899 + T167536 (duration: 00m 45s)
  • 19:17 twentyafterfour: cherry-picked https://gerrit.wikimedia.org/r/#/c/361935/ to wmf.7 refs T168899 + T167536
  • 19:00 ebernhardson: starting load testing of elasticsearch in codfw
  • 18:31 joal@tin: Finished deploy [analytics/refinery@f6cccf9]: Regular deploy - One week late - Big changes (duration: 04m 49s)
  • 18:26 joal@tin: Started deploy [analytics/refinery@f6cccf9]: Regular deploy - One week late - Big changes
  • 18:13 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable autopatrol flag on ptwikivoyage T168981 (duration: 00m 47s)
  • 18:05 aaron@tin: Synchronized wmf-config/CommonSettings.php: Set $wgTrxProfilerLimits[PostSend] to avoid notices for now (duration: 00m 47s)
  • 18:04 kartik@tin: Finished deploy [cxserver/deploy@894e3fe]: (no justification provided) (duration: 02m 03s)
  • 18:02 kartik@tin: Started deploy [cxserver/deploy@894e3fe]: (no justification provided)
  • 15:52 marostegui: Temporarily ignore jawiki.watchlist table during replication on dbstore1001 - T169050
  • 15:47 kartik@tin: Finished deploy [cxserver/deploy@894e3fe]: (no justification provided) (duration: 02m 47s)
  • 15:44 kartik@tin: Started deploy [cxserver/deploy@894e3fe]: (no justification provided)
  • 15:29 jynus: slowly enabling puppet on pending database hosts, checking diff on each one
  • 14:42 hashar: pypi.python.org is back again - T169091
  • 14:06 hashar: pypi.python.org has an issue with its CDN. That would affect any CI jobs relying on tox/python - See https://status.python.org for updates and T169091
  • 14:03 hashar: pypi.python.org has an issue with its CDN. That would affect any CI jobs relying on tox/python - See https://status.python.org for updates
  • 13:51 XioNoX: tightening BGP configuration on cr* routers - T169048
  • 13:44 gehel: start reimage of the maps-test cluster - T169011
  • 13:30 akosiaris: renumber install1002
  • 12:47 marostegui: Deploy alter table on s3 directly on codfw master (db2018) and let it replicate - T168661
  • 12:42 jynus: starting enabling puppet on db2* hosts
  • 12:37 XioNoX: restricted inbound BGP to configured neighbors on pfw - T169048
  • 12:18 marostegui: Deploy alter table on s7 directly on codfw master (db2029) and let it replicate - T168661
  • 11:48 akosiaris: renumber dubnium fermium meitnerium ununpentium
  • 11:14 elukey: stop eventlogging_sync on db1047 - alter tables running
  • 11:04 jynus: restarting db2062's mysql
  • 10:52 jynus: restarting db2072's mysql for testing of new config
  • 09:05 legoktm@tin: Synchronized php-1.30.0-wmf.7/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 46s)
  • 08:46 legoktm@tin: Synchronized php-1.30.0-wmf.6/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 48s)
  • 07:49 jynus: disable puppet on all database hosts for deployment of gerrit:361456
  • 07:33 marostegui: Re-enable event scheduler on dbstore2001 - T168354
  • 07:01 elukey: stop jobrunner/jobchron on mw130[4,5,6] and reboot them for kernel updates
  • 06:43 elukey: stop jobrunner/jobchron on mw130[2,3] and reboot them for kernel updates
  • 06:37 elukey: restart pdfrender.service on scb1003 - xpra race condition
  • 06:35 elukey: executed sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +10 -delete on graphite1001 to free space
  • 06:34 marostegui: Stop Replication in sync on db2033 and dbstore2001 (x1) - T168354
  • 05:55 marostegui: Temporarily disable event scheduler on dbstore2001 - https://phabricator.wikimedia.org/T168354
  • 05:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1033 status - T166208 (duration: 00m 47s)
  • 05:24 marostegui: Stop MySQL and reboot db1034 for maintenance - T166208
  • 03:05 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 28 03:05:57 UTC 2017 (duration 7m 0s)
  • 02:58 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 14m 50s)
  • 02:46 eileen: Update civicrm from d558df2 to e53d621
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 55s)
  • 01:43 demon@tin: Synchronized README: profiling (duration: 00m 47s)

2017-06-27

  • 23:26 demon@tin: Synchronized php-1.30.0-wmf.6/extensions/RelatedArticles/: Hygiene and stuff (duration: 00m 46s)
  • 23:22 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox (duration: 00m 46s)
  • 23:20 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Removing wgMFContentNamespace (duration: 00m 46s)
  • 23:14 demon@tin: Synchronized portals: (no justification provided) (duration: 00m 47s)
  • 23:13 demon@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 47s)
  • 23:05 demon@tin: Synchronized dblists/: ukwikimedia swapped from closed to deleted (duration: 00m 46s)
  • 22:44 demon@tin: Synchronized README: force co-master sync (duration: 00m 47s)
  • 21:58 bblack: pybal restarts on lvs4004,lvs4002 for misc@ulsfo
  • 21:50 bblack: removing cp4001-4 (cache_misc@ulsfo), expect a few minor related alerts from race conditions
  • 21:24 bblack: cp1074: restart backend (mailbox lag)
  • 21:03 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.7 refs T167536
  • 20:46 twentyafterfour@tin: Finished scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536 (duration: 30m 44s)
  • 20:16 twentyafterfour@tin: Started scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536
  • 18:41 godog: switch thumbor back on with a fix for T168949
  • 18:35 godog: upgrade thumbor to 0.1.41
  • 18:25 gehel: reduce cluster_concurrent_rebalance to 8 and node_concurrent_recoveries to 4 on elasticsearch eqiad
  • 18:05 hashar: Some CI jobs are broken with "tidy.so: cannot open shared object file: No such file or directory" see T169004
  • 17:52 twentyafterfour: branching 1.30.0-wmf.7 - T167536
  • 17:44 bblack: restart pybal on lvs4004
  • 16:37 mutante: releases1001 - setting boot parameters to network, rebooting
  • 16:26 mutante: rebooting ganeti instance releases1001 - which is down network-wise but was running
  • 16:23 godog: revert back to imagescalers for thumbs - T168949
  • 16:22 twentyafterfour: restarted apache on iridium, phabricator was running an old version of libphutil
  • 14:22 elukey: stop jobchron/jobrunner on mw1300 and mw1301 and reboot the hosts for kernel updates
  • 13:52 marostegui: Rename table enwiki.localisation_file_hash on db1089 - T119811
  • 12:35 marostegui: Deploy alter table on s4 directly on codfw master (db2019) to let it replicate - T168661
  • 12:19 marostegui: Deploy alter table on s5 directly on codfw master (db2023) to let it replicate - T168661
  • 12:06 elukey: stop jobchron/jobrunner on mw1167 and mw1299 and reboot the hosts for kernel updates
  • 11:58 marostegui: Deploy alter table on s6 directly on codfw master (db2028) to let it replicate - T168661
  • 11:54 elukey: stop nova-spiceproxy and neutron-metadata-agent on labtestnet2001 to keep the root partition from filling up
  • 11:48 akosiaris: upload apertium-spa-cat_2.1.0~r79717-1 to apt.wikimedia.org/jessie-wikimedia/main
  • 11:36 elukey: stop jobchron/jobrunner on mw116[56] and reboot the hosts for kernel updates
  • 11:36 akosiaris: upload apertium-spa_1.1.0~r79716-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
  • 11:36 akosiaris: upload apertium-cat_2.2.0~r79715-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
  • 10:29 elukey: stop jobchron/jobrunner on mw116[34] and reboot the hosts for kernel updates
  • 10:25 elukey: re-enabled puppet and eventlogging_sync on db1047
  • 09:49 marostegui: executing alter tables to the log database on dbstore1002 for https://phabricator.wikimedia.org/T167162#3340421
  • 09:43 bawolff@tin: Synchronized php-1.30.0-wmf.6/api.php: Use redirect for api requests with pathinfo (duration: 00m 43s)
  • 09:24 gehel: restart of maps eqiad cluster completed
  • 08:59 elukey: stop puppet and eventlogging_sync on db1047
  • 08:46 elukey: executing alter tables to the log database on db1047 for https://phabricator.wikimedia.org/T167162#3340421
  • 08:44 gehel: reboot maps eqiad cluster
  • 08:33 gehel: restart of maps codfw cluster completed
  • 08:25 akosiaris: upload etherpad-lite_1.6.0-3 to apt.wikimedia.org/jessie-wikimedia/main
  • 08:18 elukey: stop jobchron/jobrunner on mw116[12] and reboot the hosts for kernel updates
  • 08:14 marostegui: Re-enable event scheduler on dbstore2001 - T168354
  • 08:08 godog: roll-restart swift-proxy on ms-fe1* to pick up thumbor changes
  • 07:57 gehel: reboot maps codfw cluster
  • 07:16 marostegui: Temporarily disable event scheduler on dbstore2001 - T168354
  • 07:11 marostegui: Deploy alter table db1034 - T166208
  • 06:48 marostegui: Deploy alter table s7 on labsdb1001 - T166208
  • 06:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1034 - T166208 (duration: 00m 43s)
  • 06:40 marostegui: Deploy alter table s7 - dbstore1002 - T166208
  • 05:58 elukey: restored rdb2004 as slave of rdb2003 (end of experiment)
  • 05:08 marostegui: Global rename of Green Cardamom → GreenC - T168776
  • 05:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 - T166208 (duration: 00m 43s)
  • 03:43 mutante: smokeping on stretch means 2.6.11-3 vs 2.6.9-1 we had before
  • 03:35 mutante: smokeping - stop/rsync/fix permissions/start one more time to minimize gaps in graphs - now fully migrated netmon1001->netmon1002, historic data has been copied (T159756)
  • 03:28 mutante: netmon1002 - ganglia apache_status.py broken in stretch (?), ganglia deprecated, stopping gmond, aggregator role got removed, was for torrus
  • 03:03 mutante: netmon1002 - fixing permissions on /var/lib/smokeping rrd files (rsynced, inconsistent UIDs)
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 27 02:29:22 UTC 2017 (duration 6m 25s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 46s)
  • 00:39 mutante: netmon1001 - rsyncing smokeping data (/var/lib/smokeping) over to netmon1002

2017-06-26

  • 23:51 maxsem@tin: Synchronized php-1.30.0-wmf.6/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/361584/ (duration: 00m 44s)
  • 23:38 maxsem@tin: Synchronized fonts/: https://gerrit.wikimedia.org/r/361195 (duration: 00m 45s)
  • 23:24 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/extensions/Scribunto/engines/LuaSandbox/Engine.php: deploy https://gerrit.wikimedia.org/r/#/c/361508 (duration: 00m 43s)
  • 23:23 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/361508
  • 22:56 halfak@tin: Finished deploy [ores/deploy@82dfd56]: Unscheduled/urgent deploy (T168099) (duration: 30m 55s)
  • 22:49 bd808: Updated LDAP loginShell to /bin/bash for 969 accounts that were still set to /usr/local/bin/sillyshell (T86668)
  • 22:34 legoktm@tin: Synchronized php-1.30.0-wmf.6/extensions/Linter/includes/ApiRecordLint.php: Add debug logging for missing 'dsr' - T168900 (duration: 00m 43s)
  • 22:32 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Enable 'Linter' debug log channel (duration: 00m 44s)
  • 22:27 mutante: netmon1001 - deactivate rancid crons - now running on netmon1002 instead - avoid duplicate mails (T159756)
  • 22:25 halfak@tin: Started deploy [ores/deploy@82dfd56]: Unscheduled/urgent deploy (T168099)
  • 21:50 robh: shutting down and decommissioning mw117[0-9] per T168271
  • 21:27 bawolff: deployed patch for T128209
  • 21:00 robh: attempting firmware update on lvs1007, which is currently offline
  • 20:38 bsitzmann@tin: Finished deploy [mobileapps/deploy@07066c7]: Update mobileapps to 0b05026 (duration: 03m 41s)
  • 20:34 bsitzmann@tin: Started deploy [mobileapps/deploy@07066c7]: Update mobileapps to 0b05026
  • 19:56 herron: updated ops list accept_these_nonmembers regex (T168903)
  • 19:41 hashar: Restarted Jenkins to lower console log spam ( https://gerrit.wikimedia.org/r/#/c/359116/ )
  • 19:35 urandom: T160570: Upgrading restbase-dev1003 to Cassandra 3.11.0 (release)
  • 19:30 urandom: T160570: Upgrading restbase-dev1002 to Cassandra 3.11.0 (release)
  • 19:05 mobrovac@tin: Finished deploy [restbase/deploy@3975ab2]: Update Parsoid HTML version to 1.5.0 - T39902 (duration: 06m 16s)
  • 18:59 mobrovac@tin: Started deploy [restbase/deploy@3975ab2]: Update Parsoid HTML version to 1.5.0 - T39902
  • 18:51 arlolra: Updated Parsoid to b59045f2 (T39902, T149794)
  • 18:32 urandom: T160570: Upgrading restbase-dev1001 to Cassandra 3.11.0 (release)
  • 18:31 arlolra@tin: Finished deploy [parsoid/deploy@70538a6]: Updating Parsoid to b59045f2 (duration: 11m 13s)
  • 18:20 arlolra@tin: Started deploy [parsoid/deploy@70538a6]: Updating Parsoid to b59045f2
  • 18:18 niharika29@tin: Finished scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084) (duration: 03m 14s)
  • 18:15 niharika29@tin: Started scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084)
  • 18:14 niharika29@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) (duration: 02m 15s)
  • 18:14 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:11 niharika29@tin: Started scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084)
  • 17:46 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.6
  • 17:36 twentyafterfour: Deploying 1.30.0-wmf.6 to all wikis refs T167535
  • 17:35 twentyafterfour: resuming the train for wmf.6 which was blocked at group 1
  • 17:12 gehel@tin: Finished deploy [wdqs/wdqs@f8b9294]: (no justification provided) (duration: 03m 42s)
  • 17:09 gehel@tin: Started deploy [wdqs/wdqs@f8b9294]: (no justification provided)
  • 16:59 elukey: EXPERIMENT - T163337 - set slaveof no one on rdb2004 to remove its dependency to rdb2003 (puppet disabled on rdb2004, to rollback just systemctl unmask redis-instance-tcp_6380.service, enable/run puppet and start redis if it is not up)
  • 16:55 elukey: stop neutron-server on labtestnet2001 to keep the root partition from filling up
  • 15:41 marostegui: Deploy alter table s7 - db1079 - T166208
  • 15:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T166208 (duration: 00m 46s)
  • 15:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1086 - T166208 (duration: 00m 46s)
  • 14:47 marostegui: Deploy alter table on silver and labtestweb2001 - T168661
  • 13:49 marostegui: Deploy alter table s7 - db1033 - T166208
  • 13:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1033 status - T166208 (duration: 00m 48s)
  • 13:08 elukey: truncate /var/log/upstart/neutron-server.log on labtestnet2001 (root filled up, spam in logs for 'ERROR neutron.service OperationalError: (sqlite3.OperationalError) no such table:')
  • 12:58 marostegui: Deploy alter table on db2062 and db2055 - T168661
  • 12:55 elukey: reboot mw129[5,6,7,8] for kernel update (mw imagescalers, two at a time)
  • 12:02 marostegui: Deploy alter table on s2 codfw master (db2017) and let it replicate - T168661
  • 11:05 godog: roll-restart pybal in codfw to pick up thumbor.svc.codfw.wmnet
  • 10:28 elukey: reboot mw1288->90 for kernel updates (last batch of api-appservers)
  • 10:18 elukey: reboot mw128[4,5,6,7] for kernel updates (api-appservers)
  • 10:03 godog: roll-restart nginx on thumbor to disable te: chunked
  • 09:34 elukey: reboot mw128[0,1,2,3] for kernel updates (api-appservers)
  • 09:04 elukey: reboot mw127[6,7,8,9] for kernel updates (api-appservers)
  • 08:58 elukey: reboot mw127[3,4,5] for kernel updates (appservers)
  • 08:50 gehel: starting restart of elasticsearch codfw for kernel upgrade
  • 08:48 elukey: reboot mw1269 -> mw1272 for kernel updates (appservers)
  • 08:37 godog: roll-restart swift-proxy to use thumbor for commons
  • 08:28 elukey: reboot mw1258, 126[6,7,8] for kernel updates (appservers)
  • 08:11 elukey: reboot mw125[4,5,6,7] for kernel updates (appservers)
  • 07:55 marostegui: Stop replication on db1069:3313 (s3) and db1044 in the same position - T166546
  • 07:15 elukey: restart pdfrender on scb1002 for the xpra issue
  • 07:08 elukey: powercycle elastic1017 (stuck in console, no ssh access)
  • 06:57 marostegui: Drop table wikilove_image_log from silver - T127219
  • 06:56 elukey: truncated neutron-server.log files in /var/log on labtestnet2001 to free some space in root
  • 06:55 marostegui: Drop table wikilove_image_log from s1 - T127219
  • 06:51 marostegui: Drop table wikilove_image_log from s3 - T127219
  • 06:50 elukey: execute sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete on graphite1001 to free some space for /var/lib/carbon
  • 06:49 marostegui: Drop table wikilove_image_log from s7 - T127219
  • 06:47 marostegui: Drop table wikilove_image_log from s2 - T127219
  • 06:45 marostegui: Drop table wikilove_image_log from s4 - T127219
  • 06:44 marostegui: Drop table wikilove_image_log from s6 - T127219
  • 06:36 marostegui: Deploy alter table s7 - db1086 - T166208
  • 06:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1086 - T166208 (duration: 00m 46s)
  • 06:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1041 long running alter status - T166208 (duration: 00m 47s)
  • 03:01 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 26 03:01:35 UTC 2017 (duration 6m 52s)
  • 02:54 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 08m 04s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 08m 03s)

2017-06-25

  • 09:00 elukey: Executing 'sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete' on graphite1001 to free some space (/var/lib/carbon filling up) - T1075
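
The cleanup above relies on `find ... -type f -mtime +15 -delete`, which removes regular files whose age, counted in whole 24-hour periods, is greater than 15. As a rough illustration of that selection rule, here is a minimal Python sketch. The function names `stale_files` and `delete_stale_files` and the dry-run default are assumptions made for this example, not what was actually run on graphite1001.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def stale_files(root, days: int, now: Optional[float] = None) -> List[Path]:
    """List regular files under root whose age in whole days exceeds `days`.

    Mirrors the `-type f -mtime +N` test: the age is truncated to whole
    24-hour periods before comparing, so N=15 selects files last modified
    at least 16 full days ago.
    """
    now = time.time() if now is None else now
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # -type f does not match symlinks, so skip them here too.
            if path.is_symlink() or not path.is_file():
                continue
            age_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
            if age_days > days:
                matches.append(path)
    return sorted(matches)


def delete_stale_files(root, days: int, dry_run: bool = True,
                       now: Optional[float] = None) -> List[Path]:
    """Return the stale files; delete them only when dry_run is False."""
    victims = stale_files(root, days, now)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Calling `delete_stale_files(some_dir, 15)` only reports what would be removed. Passing `dry_run=False` performs the deletion, which is worth doing only after checking the returned list.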

2017-06-23

  • 23:42 akosiaris: bounce celery-ores-worker on scb1004
  • 19:38 ppchelko@tin: Finished deploy [changeprop/deploy@ffabd13]: Re-enable ORES rules back (duration: 01m 07s)
  • 19:37 ppchelko@tin: Started deploy [changeprop/deploy@ffabd13]: Re-enable ORES rules back
  • 19:34 akosiaris: restart celery-ores-workers on scb1001, scb1002, scb1003, leave scb1004 alone
  • 18:39 godog: roll restart celery-ores-worker in codfw
  • 17:01 mobrovac@tin: Finished deploy [changeprop/deploy@1f45fae]: Temporarily disable ORES (ongoing outage) (duration: 01m 19s)
  • 16:59 mobrovac@tin: Started deploy [changeprop/deploy@1f45fae]: Temporarily disable ORES (ongoing outage)
  • 16:44 mobrovac: scb1001 disabling puppet
  • 16:34 akosiaris: restart celery ores worker on scb1003
  • 15:54 hashar_: Restarted Jenkins
  • 15:45 godog: bounce celery-ores-worker on scb1001 with logging level INFO
  • 13:51 akosiaris: issue flushdb on oresrdb1001:6379
  • 13:21 akosiaris: issue flushdb on oresrdb1001:6379
  • 13:13 akosiaris: bump uwsgi-ores and celery-ores-worker on scb100*
  • 12:38 akosiaris: disable changeprop due to ORES issues
  • 12:26 Amir1: restarting celery and uwsgi on all scb nodes in eqiad
  • 11:55 Amir1: restarted uwsgi-ores and celery-ores-worker services in scb1003
  • 11:45 ema: scb1001: restart pdfrender.service
  • 09:55 elukey: reboot mw1250-53 for kernel updates
  • 09:27 jynus: reapplying dns change - small downtime on tendril until puppet deploy and run
  • 08:38 jynus: deploying dns change to tendril
  • 06:17 mutante: releases1001 - systemctl reset-failed to clear Icinga systemd status CRIT - service puppet
  • 06:17 marostegui: Deploy alter table on db1041 - s7 - T166208
  • 06:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1041 long running alter status - T166208 (duration: 00m 46s)
  • 06:08 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2066 - T168354 (duration: 00m 46s)
  • 05:59 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 - T166207 (duration: 00m 47s)
  • 00:15 mutante: RT (ununpentium) installing pending package upgrades

2017-06-22

  • 23:15 Dereckson: kbp.wikipedia wiki creation done.
  • 23:11 dereckson@tin: Synchronized wmf-config/interwiki.php: Add kbp.wikipedia to interwiki map (T160868) (duration: 00m 46s)
  • 23:07 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Add kbp.wikipedia to interwiki map (T160868) (duration: 00m 47s)
  • 22:56 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for kbp.wikipedia (T160868) (duration: 00m 45s)
  • 22:54 dereckson@tin: Synchronized langlist: +kbp (T160868) (duration: 00m 46s)
  • 22:53 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: +kbpwiki (T160868)
  • 22:52 dereckson@tin: Synchronized dblists: (no justification provided) (duration: 00m 48s)
  • 22:51 Dereckson: Create tables for kbpwiki (T160868)
  • 21:43 RainbowSprinkles: gerrit: Stopping momentarily, reindexing accounts
  • 21:03 andrewbogott: restarting rabbitmq-server on labcontrol1001
  • 20:34 mutante: icinga - re-enabling disabled notifications for IPMI temp checks on some mc* and mw* hosts where check is fine and OK
  • 20:21 andrewbogott: labtestnet2001 turning neutron debug logs off because they're flooding the (very small) '/' partition
  • 19:52 twentyafterfour: the train is currently blocked by https://phabricator.wikimedia.org/T168681
  • 19:31 thcipriani@tin: Finished scap: SWAT: Translation updates for QuickSurveys T131949 (duration: 22m 10s)
  • 19:09 thcipriani@tin: Started scap: SWAT: Translation updates for QuickSurveys T131949
  • 19:04 thcipriani@tin: Synchronized wmf-config: SWAT: Create a FeaturedFeed for the Wikimag bulletin on frwiki T168005 (duration: 00m 54s)
  • 18:51 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant the "movefile" right to the "autopatrolled" group on rowiki T168192 (duration: 00m 48s)
  • 18:39 thcipriani@tin: Synchronized php-1.30.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Switch to data-attribute for sister-search sidebar results T164854 (duration: 00m 50s)
  • 18:29 thcipriani@tin: Synchronized wmf-config: SWAT: relatedArticles: SamplingRate -> BucketSize PART II (duration: 00m 48s)
  • 18:27 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: relatedArticles: SamplingRate -> BucketSize PART I (duration: 00m 53s)
  • 18:24 jynus: restart db2062
  • 17:51 jynus: testing in-place upgrade from jessie to stretch of db2062
  • 17:34 bsitzmann@tin: Finished deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d (duration: 02m 54s)
  • 17:31 bsitzmann@tin: Started deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d
  • 17:24 gehel: restarting logstash on logstash1001 to validate plugin deployment with scap3
  • 17:23 gehel@tin: Finished deploy [logstash/plugins@720b648]: (no justification provided) (duration: 00m 02s)
  • 17:23 gehel@tin: Started deploy [logstash/plugins@720b648]: (no justification provided)
  • 17:14 gehel: moving to scap for logstash plugin deployment
  • 17:13 jynus: disable puppet on db2062 before maintenance
  • 17:05 andrewbogott: rebooting labsdb1007
  • 17:04 bd808: Log events between 15:46 and 17:03 missed due to stashbot downtime
  • 17:03 andrewbogott: rebooting labsdb1007
  • 15:46 moritzm: repooling scb1003 after hardware maintenance
  • 15:31 otto@tin: Finished deploy [eventlogging/analytics@328dea6]: inserting eventlogging events into mysql based on topic name if it exists, falling back to schema name (duration: 00m 03s)
  • 15:31 otto@tin: Started deploy [eventlogging/analytics@328dea6]: inserting eventlogging events into mysql based on topic name if it exists, falling back to schema name
  • 15:21 moritzm: rebooting restbase2005 for kernel update
  • 14:37 gehel: restarting maps-test cluster for kernel upgrade
  • 14:22 gehel: restart wdqs servers completed
  • 13:55 gehel: restart wdqs servers for kernel upgrade
  • 13:45 akosiaris: reboot planet1001 for kernel upgrades and renumbering
  • 13:21 moritzm: rebooting restbase2006 for kernel update
  • 13:09 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Reader Survey using QuickSurveys - T131949 (duration: 01m 04s)
  • 12:17 moritzm: rebooting restbase2008 for kernel update
  • 11:21 moritzm: rebooting ms-be2026 to ms-be2030 for kernel update
  • 11:12 moritzm: rebooting restbase2009 for kernel update
  • 10:45 ema: cp1074: restart varnish backend
  • 10:25 moritzm: rebooting ms-be2022 to ms-be2025 for kernel update
  • 10:19 moritzm: rearmed keyholder on tin
  • 10:12 moritzm: rebooting restbase2010 for kernel update
  • 10:00 moritzm: depooled mw1228, broken disk causes boot failure
  • 09:50 moritzm: rebooting tin for kernel update
  • 09:46 jynus: reimage db2072
  • 09:42 moritzm: powercycling mw1228, stuck in reboot
  • 09:36 akosiaris: rebooting chlorine.eqiad.wmnet etcd1004.eqiad.wmnet etcd1005.eqiad.wmnet mwdebug1002.eqiad.wmnet neon.eqiad.wmnet sca1004.eqiad.wmnet for kernel upgrades
  • 09:25 moritzm: rebooting mw1221-mw1235 for kernel update
  • 09:15 moritzm: rebooting restbase2011 for kernel update
  • 09:11 marostegui: Deploy alter table s5 - labsdb1003 - T166207
  • 09:06 elukey: rebooting kafka100[23] for kernel updates (eventbus eqiad)
  • 09:01 moritzm: rebooting rhenium for kernel update
  • 08:55 marostegui: Stop MySQL and reboot labsdb1011 - T168584
  • 08:50 moritzm: rebooting restbase2012 for kernel update
  • 08:44 marostegui: Stop MySQL and reboot labsdb1010 - T168584
  • 08:40 moritzm: rearmed keyholder on naos
  • 08:32 akosiaris: reboot etcd1002 for kernel upgrades
  • 08:20 moritzm: rebooting naos for kernel update
  • 08:20 marostegui: Stop MySQL and reboot labsdb1009 - T168584
  • 08:07 moritzm: powercycling labtestservices2001 (didn't come up after reboot)
  • 07:26 moritzm: rebooting suhail/subra for kernel update
  • 07:24 elukey: reboot kafka1001 for kernel updates (eventbus eqiad)
  • 07:24 marostegui: Deploy alter table s5 - db1026 - T166207
  • 07:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T166207 (duration: 00m 44s)
  • 07:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 - T166207 (duration: 01m 03s)
  • 07:12 marostegui: Deploy alter table s5 - dbstore1001 - T166207
  • 06:53 moritzm: rebooting mw1205-mw1208 for kernel update
  • 06:38 moritzm: rebooting bast2001 for kernel update
  • 05:34 moritzm: rebooting mw1238-mw1249 for kernel update
  • 05:02 moritzm: rebooting ms-be2015-ms-be2020 for kernel update
  • 03:40 mutante: regarding my last log message: this is just true for stretch! ah!
  • 03:35 mutante: netmon1002 - installed psmisc to have 'killall' - will clean it up, but also suggest we add psmisc to base packages. it provides killall, fuser, pstree...
  • 02:49 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 22 02:49:58 UTC 2017 (duration 6m 53s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 23s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 08m 10s)
  • 00:15 twentyafterfour: finished phabricator deployments
  • 00:13 twentyafterfour: deploying https://phabricator.wikimedia.org/D687

2017-06-21

  • 23:19 twentyafterfour@tin: Synchronized static/images/project-logos/wikimania2017wiki.png: swat (duration: 00m 45s)
  • 23:10 twentyafterfour@tin: Synchronized static/images/project-logos/wikimania2017wiki.png: swat (duration: 00m 45s)
  • 22:38 mutante: new language din.wikipedia.org has been created in DNS - Dinka is a Nilotic dialect cluster spoken by the Dinka people, the major ethnic group of South Sudan. (T168518) - https://en.wikipedia.org/wiki/Dinka_language
  • 22:34 mutante: DNS - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to trigger template recreation after edit to langs.tmpl
  • 22:31 chasemp: remove manual 10.64.37.26 definition from eth1 on labstore1005 in /etc/network/interfaces
  • 22:27 chasemp: reboot labstore1004 to reset network config from boot
  • 21:44 RainbowSprinkles: cobalt: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)
  • 21:40 RainbowSprinkles: gerrit2001: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)
  • 21:14 mutante: apt.wm.org - reprepro copy stretch-wikimedia jessie-wikimedia gerrit - make gerrit available in stretch
  • 21:05 mutante: apt.wm.org - reprepro, include gerrit_2.13.8+git1-wmf.6 for jessie-wikimedia
  • 21:02 mutante: install1002 - rsynced gerrit packages from copper, closed firewall again, cleaned up rsyncd config from old unused things
  • 20:57 arlolra: Updated Parsoid to 881ade32 (T127421, T167933, T167714)
  • 20:50 mutante: install1002 - allow rsync from copper (build host) to /srv/wikimedia/incoming , temp for package upload
  • 20:49 arlolra@tin: Finished deploy [parsoid/deploy@2c4c0de]: Updating Parsoid to 881ade32 (duration: 12m 02s)
  • 20:37 arlolra@tin: Started deploy [parsoid/deploy@2c4c0de]: Updating Parsoid to 881ade32
  • 20:35 mutante: install1002 - removing rsyncd config fragments from carbon migration, running puppet
  • 20:25 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.6
  • 20:15 bearND: rolled back deploy since scap could not connect to scb1003
  • 20:14 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/includes/gallery/ImageGalleryBase.php: deploy https://gerrit.wikimedia.org/r/#/c/360695/ refs T168479 to unblock the train (duration: 00m 56s)
  • 20:13 bsitzmann@tin: Finished deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d (duration: 08m 43s)
  • 20:13 andrewbogott: deleting the old IAD wikitech-static server so we stop paying rackspace for it
  • 20:11 ppchelko@tin: Finished deploy [changeprop/deploy@63e6a7b]: Actually start black-listing and rate-limiting articles. T161710 (duration: 01m 16s)
  • 20:09 ppchelko@tin: Started deploy [changeprop/deploy@63e6a7b]: Actually start black-listing and rate-limiting articles. T161710
  • 20:04 bsitzmann@tin: Started deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d
  • 19:41 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for stretch (experimental)
  • 19:39 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for jessie
  • 19:30 twentyafterfour: The train for wmf.6 (T167535) is currently blocked by T168479
  • 19:13 madhuvishy: Rebooting labstore1004 (secondary in drbd pair)
  • 18:55 andrewbogott: rebooting labnet1001, which will cause a labs-wide network outage
  • 18:46 gehel: restarting wdqs-updater on all wdqs servers
  • 18:38 krinkle@tin: Synchronized static/images/: I737e6f9fce (duration: 00m 46s)
  • 18:20 gehel@tin: Finished deploy [wdqs/wdqs@d67d4a4]: (no justification provided) (duration: 01m 50s)
  • 18:18 gehel@tin: Started deploy [wdqs/wdqs@d67d4a4]: (no justification provided)
  • 18:17 gehel: deploying wdqs to fix missing lib
  • 18:14 andrewbogott: rebooting labnet1002
  • 18:09 andrewbogott: rebooting labnodepool1001
  • 18:02 andrewbogott: rebooting labcontrol1001
  • 18:02 andrewbogott: rebooting labservices1001
  • 18:02 andrewbogott: rebooting silver
  • 18:02 andrewbogott: rebooting californium
  • 17:59 andrewbogott: rebooting labservices1001
  • 17:58 andrewbogott: disabling the openstack scheduler so that we don't get new inconsistent VMs during some reboots
  • 17:53 andrewbogott: rebooting labcontrol1002
  • 17:53 andrewbogott: rebooting labservices1002
  • 17:37 twentyafterfour: phabricator is back online
  • 17:36 andrewbogott: rebooting labvirt1013
  • 17:35 herron: iridium - upgraded exim packages and rebooted to apply kernel upgrade
  • 17:35 ottomata: beginning reboots of kafka10(14|18|20|22) for kernel upgrade
  • 17:34 twentyafterfour: phabricator will be offline momentarily while iridium reboots
  • 17:25 andrewbogott: rebooting labvirt1012
  • 17:12 andrewbogott: rebooting labvirt1011
  • 16:59 andrewbogott: rebooting labvirt1010
  • 16:57 herron: reboot fermium (lists) for kernel upgrade
  • 16:42 andrewbogott: rebooting labvirt1009
  • 16:41 moritzm: rebooting video scalers in codfw for kernel update
  • 16:35 moritzm: rebooting mw1293/mw1294 for kernel update
  • 16:32 andrewbogott: rebooting labvirt1008
  • 15:53 godog: upgrade ms-be10[31-39] to swift 2.10
  • 15:46 ema: reboot lvs[4001-4002] (ulsfo primaries) for kernel update
  • 15:45 moritzm: upgrade ms-be2013/ms-be2014 to final stretch release and reboot for kernel update
  • 15:34 ema: reboot lvs[4003-4004] (ulsfo secondaries) for kernel update
  • 15:32 moritzm: reboot image scalers in codfw for kernel update
  • 15:32 andrewbogott: rebooting labvirt1007
  • 15:13 andrewbogott: rebooting labvirt1006
  • 15:04 moritzm: rebooting ruthenium for kernel update
  • 15:01 moritzm: reboot job runners in codfw for kernel update
  • 15:01 elukey: reboot kafka200[23] for kernel updates (eventbus codfw)
  • 14:53 andrewbogott: rebooting labvirt1005
  • 14:40 moritzm: reboot remaining scb* hosts for kernel update
  • 14:38 andrewbogott: rebooting labvirt1004
  • 14:32 ema: reboot lvs[3001-3002] (esams primaries) for kernel update
  • 14:25 andrewbogott: rebooting labvirt1003
  • 14:21 andrewbogott: rebooting labvirt1002
  • 14:18 herron: rebooting mx1001 for kernel upgrade
  • 14:08 ema: reboot lvs[3003-3004] (esams secondaries) for kernel update
  • 14:03 elukey: reboot eventlog2001 for kernel update
  • 14:02 andrewbogott: rebooting labvirt1001
  • 14:01 gehel: restarting wdqs1001 for kernel upgrade
  • 14:01 godog: reimage ms-be1020 / ms-be1021 with stretch
  • 13:52 gehel: install analysis-kuromoji plugin on relforge
  • 13:52 herron: install exim security updates on fermium (lists)
  • 13:51 elukey: rebooting eventlog1001 for kernel update (eventlogging host)
  • 13:50 moritzm: pruning old kernels on prometheus*
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.6/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: SWAT: Fix errors leading to wrong slider scroll positions T168299 (duration: 00m 44s)
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.5/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: SWAT: Fix errors leading to wrong slider scroll positions T168299 (duration: 00m 46s)
  • 13:44 elukey: reboot aqs100[89] for kernel updates
  • 13:39 ema: reboot lvs[2001-2003] (codfw primaries) for kernel update
  • 13:29 elukey: reboot aqs1007 for kernel update
  • 13:22 marostegui: Deploy alter table on s7 - directly on codfw master (db2029) - this will generate lag on codfw - T166208
  • 13:21 elukey: reboot kafka1013 for kernel updates
  • 13:16 marostegui: Deploy alter table s5 - labsdb1001 - T166207
  • 13:15 marostegui: Deploy alter table s5 - db1045 - T166207
  • 13:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1045 - T166207 (duration: 00m 44s)
  • 13:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 - T166207 (duration: 00m 46s)
  • 13:05 elukey: reboot analytics1003 (Hue, Camus, Oozie, Hive master) for kernel upgrade
  • 12:32 gehel: deploying T167871 and restarting kartotherian / tilerator on maps eqiad
  • 12:32 moritzm: rebooting mw1189-mw1199 for kernel update
  • 12:10 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sca1004.eqiad.wmnet
  • 12:09 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
  • 11:59 moritzm: rebooting mw1209-mw1220 for kernel update
  • 11:45 moritzm: rebooting mediawiki api servers in codfw for kernel update
  • 11:42 akosiaris: rollback change in asw-a-eqiad for ganeti interface range due to alerts
  • 11:23 akosiaris: reboot ganeti1007 for insertion into ganeti cluster
  • 11:14 elukey: reboot aqs1006 for kernel update
  • 11:04 moritzm: rebooting mw1180-mw1188 for kernel update
  • 11:02 akosiaris: starting up all instances on ganeti01.svc.codfw.wmnet
  • 11:01 godog: reimage ms-be1018 / 1019 with stretch
  • 10:58 ema: reboot lvs[2004-2006] (codfw secondaries) for kernel update
  • 10:50 akosiaris: rebooting all ganeti200X nodes
  • 10:47 akosiaris: shutdown all VMs on the ganeti01.svc.codfw.wmnet cluster
  • 10:43 elukey: reboot analytics1001 (Hadoop master) for kernel update
  • 10:35 akosiaris: rebooting the entire codfw ganeti cluster for kernel upgrades. Silenced hosts in icinga already. T167643
  • 10:30 moritzm: rebooting bast4001 for kernel update
  • 10:21 ema: reboot lvs[1001-1003] (eqiad primaries) for kernel update
  • 10:17 elukey: running a script in tmux on rdb[12]003 called "check" to dump periodically LLEN enwiki:jobqueue:enqueue:l-unclaimed and stopped the one on rdb2004
  • 10:07 ema: reboot lvs[1004-1006] (eqiad secondaries) for kernel update
  • 10:01 elukey: reboot analytics1002 (Hadoop master standby) for kernel update
  • 10:01 moritzm: rebooting auth* servers for kernel update
  • 09:48 ema: reboot lvs[1010-1012] for kernel update
  • 09:48 elukey: reboot aqs1005 for kernel update
  • 09:10 elukey: reboot kafka2001 for kernel update (eventbus codfw)
  • 09:06 moritzm: rebooting restbase1017 for kernel update
  • 08:52 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=restbase2001.codfw.wmnet,dc=codfw,service=restbase
  • 08:49 _joe_: correction: restarting pybal
  • 08:49 _joe_: restarting etcd on lvs2003/2006, connection lost to etcd
  • 08:34 elukey: reboot kafka1012 for kernel upgrades
  • 08:34 marostegui: Deploy alter table db1070 s5 - T166207
  • 08:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T166207 (duration: 00m 44s)
  • 08:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1082 - T166207 (duration: 00m 45s)
  • 08:26 godog: reimage ms-be1014 / 1015 with jessie
  • 07:37 marostegui: Stop and reset slave s5 on dbstore2001 - T168354
  • 06:23 mutante: planet2001 wget missing unpuppetized logo file from https://en.planet.wikimedia.org/images/planet-wm2.png - should fix puppet run
  • 06:19 marostegui: Stop replication and puppet on db2066 for maintenance - T168354
  • 06:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2066 - T168354 (duration: 00m 43s)
  • 06:08 elukey: reboot thorium for kernel upgrades (outage to all the analytics websites)
  • 06:05 marostegui: Deploy alter table s5 - db1082 - T166207
  • 06:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1082 - T166207 (duration: 00m 44s)
  • 06:04 marostegui: Deploy alter table s5 - dbstore1002 - T166207
  • 05:59 elukey: reboot stat100[2,3,4] for kernel upgrades
  • 05:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1087 - T166207 (duration: 00m 44s)
  • 05:54 marostegui: Deploy alter table s5 - labsdb1011 - T166207
  • 05:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1021 - T166205 (duration: 01m 00s)
  • 05:41 marostegui: Start relearn BBU cycle on db1016 - T166344
  • 03:13 mutante: planet - copying HTML files from docroot from planet1001 to planet2001 - (don't serve Debian default page)
  • 03:03 mutante: planet1001 - remove/purge all php5* packages
  • 02:57 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 21 02:57:19 UTC 2017 (duration 6m 41s)
  • 02:50 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 06m 06s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 06m 52s)
  • 01:45 mutante: planet1001 - remove php5 package
  • 00:34 mutante: planet2001 - revoke old puppet cert, salt-key, re-add new cert/key after reinstall
  • 00:24 mutante: planet2001 - scheduled downtime, reinstall with stretch
  • 00:06 mutante: tin (deployment): manually remove l10nupdate cron, let puppet re-create it after gerrit:350749. stops l10nupdate cron from running on weekends. naos didn't need an action. (T164035).
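
Several reboot entries above name groups of hosts with bracketed ranges, for example lvs[4001-4002], ms-be10[31-39] and stat100[2,3,4]. The Python sketch below expands that notation into individual host names. It assumes brackets hold comma-separated items or start-end numeric ranges only; the function name `expand_hosts` is hypothetical. Regex-style forms that also appear in the log, such as kafka200[23] or kafka10(14|18|20|22), are not handled and would need separate treatment.

```python
import re
from typing import List

BRACKET_RE = re.compile(r"\[([^\]]+)\]")


def expand_hosts(pattern: str) -> List[str]:
    """Expand [a-b] ranges and [x,y,z] lists in a host pattern, left to right."""
    m = BRACKET_RE.search(pattern)
    if m is None:
        return [pattern]
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    hosts: List[str] = []
    for item in m.group(1).split(","):
        item = item.strip()
        if "-" in item:
            start, end = item.split("-", 1)
            width = len(start)  # preserve zero padding such as 01-03
            values = [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
        else:
            values = [item]
        for value in values:
            # Recurse so that any later bracket group in the suffix is expanded too.
            hosts.extend(expand_hosts(prefix + value + suffix))
    return hosts


print(expand_hosts("lvs[4001-4002]"))
```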

2017-06-20

  • 23:06 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Remove temp wiktionary site link settings (duration: 00m 43s)
  • 23:05 aude@tin: Synchronized wmf-config/Wikibase-labs.php: Remove temp wiktionary site link settings (duration: 00m 44s)
  • 23:03 aude@tin: Synchronized wmf-config/Wikibase-production.php: Remove temp wiktionary site link settings for test wikidata (duration: 00m 43s)
  • 22:59 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase (phase 1) on Wiktionary wikis (duration: 00m 44s)
  • 22:49 aude: created wbc_entity_usage table and updated sites table on wiktionary wikis
  • 21:36 legoktm@tin: Synchronized wmf-config: touch (duration: 00m 45s)
  • 21:29 arlolra@tin: Started restart [parsoid/deploy@4b60bf9]: (no justification provided)
  • 21:17 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis (try #2) - T148609 (duration: 00m 44s)
  • 21:17 andrewbogott: rebooting labvirt1014 as practice for tomorrow's security reboots
  • 21:13 mutante: labtestpuppetmaster2001 - install-console, activate puppet, sign cert, initial puppet run, add salt key (T167157)
  • 20:54 twentyafterfour: Finished train deployment for group0, train will resume tomorrow as scheduled.
  • 20:53 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: Group0 to 1.30.0-wmf.6 refs T167535
  • 20:44 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/includes/changes/EnhancedChangesList.php: deploy bad7bde refs T167535 (duration: 00m 53s)
  • 20:37 twentyafterfour@tin: Finished scap: sync 1.30.0-wmf.6 refs T167535 (duration: 29m 16s)
  • 20:08 twentyafterfour@tin: Started scap: sync 1.30.0-wmf.6 refs T167535
  • 19:17 twentyafterfour: Prepping 1.30.0-wmf.6 - T167535
  • 18:09 mutante: netmon1002 - arm keyholder with rancid key
  • 18:06 ema: route ulsfo back to codfw T167274
  • 18:02 chasemp: ssh labsdb101[0|1].eqiad.wmnet 'sudo maintain-meta_p --all-databases --debug'
  • 17:53 mutante: cobalt (gerrit) - re-enabling puppet, running it. nothing should change, the system unit file mentioned in T168360#3362314 does not get installed by puppet, it comes from the deb
  • 17:49 subbu: Since arlolra noticed some unexpected warnings from the canaries, the Parsoid deploy was rolled back, so Parsoid was not updated to e2e2b5f6 (contrary to what scap said above).
  • 17:48 gehel@tin: Finished deploy [wdqs/wdqs@b60d224]: (no justification provided) (duration: 01m 41s)
  • 17:47 XioNoX: repool codfw - T167274
  • 17:46 gehel@tin: Started deploy [wdqs/wdqs@b60d224]: (no justification provided)
  • 17:45 gehel: deploying wdqs blazegraph and GUI updates
  • 17:43 mutante: RT - ununpentium - upgraded rt4-db-mysql
  • 17:42 arlolra@tin: Finished deploy [parsoid/deploy@4b60bf9]: Updating Parsoid to e2e2b5f6 (duration: 07m 57s)
  • 17:40 mutante: mwreleases1001 - puppet node clean, puppet node deactivate - was reinstalled as releases1001
  • 17:34 arlolra@tin: Started deploy [parsoid/deploy@4b60bf9]: Updating Parsoid to e2e2b5f6
  • 17:29 elukey: running a script in tmux on rdb200[34] called "check" to dump periodically LLEN enwiki:jobqueue:enqueue:l-unclaimed
  • 17:21 elukey: restart redis-instance-tcp_6380.service on rdb2003 to force sync with its master
  • 17:16 elukey: restart redis-instance-tcp_6380.service on rdb2004 to force sync with its master
  • 17:04 XioNoX: re-enable igmp-snooping on asw-d-codfw
  • 17:01 bd808: Ran maintain-meta_p --all-databases on labsdb1003
  • 16:55 bd808: Ran maintain-meta_p --all-databases on labsdb1001
  • 16:53 paravoid: updating the d-i image for stretch in puppet volatile
  • 16:09 chasemp: openstack server delete admin-monitoring openstack project instances (we have leaked 7)
  • 16:05 elukey: reboot kafka1013 for kernel upgrade
  • 15:08 XioNoX: starting asw-d-codfw switch upgrade - T167274
  • 14:47 elukey: rolling restart of druid100[123] for kernel upgrades
  • 14:32 XioNoX: depooled codfw - T167274
  • 14:27 moritzm: rebooting scb1001 for kernel update
  • 14:17 hashar: CI is fully back up (following reboot of contint1001 / labnodepool1001 )
  • 14:16 hashar: Upgraded Jenkins plugins
  • 14:05 hashar: Starting Jenkins on contint1001
  • 14:05 elukey: reboot kafka2001 for kernel upgrade
  • 14:02 hashar: Rebooting contint1001
  • 14:00 hashar: Stopping Nodepool service to prevent new builds
  • 13:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1087 - T166207 (duration: 01m 41s)
  • 13:55 marostegui: Deploy alter table db1087 - s5 - T166207
  • 13:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T166207 (duration: 00m 41s)
  • 13:44 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wiktionary site links on test.wikidata (duration: 00m 43s)
  • 13:42 _joe_: manually started nrpe on ms-be1016
  • 13:39 marostegui: Deploy alter table on db1049 - s5 - T166207
  • 13:39 moritzm: rebooting labnodepool1001 for kernel update
  • 13:37 hashar: Restarting Jenkins
  • 13:36 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=sca1004.eqiad.wmnet
  • 13:36 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=mwdebug1002.eqiad.wmnet
  • 13:33 godog: pool thumbor100[34] into service - T168297
  • 13:26 marostegui: Deploy alter table labsdb1010 - s5 - T166207
  • 13:14 moritzm: rebooting restbase staging cluster (cerium/praseodymium/xenon) for kernel update
  • 12:09 gehel: starting cluster restart elasticsearch eqiad
  • 12:00 elukey: reboot analytics1029 -> analytics1069 for kernel upgrades (Hadoop worker nodes)
  • 11:36 moritzm: installing libgcrypt security updates
  • 11:29 moritzm: rebooting mediawiki app servers in codfw for kernel update
  • 11:13 akosiaris: renumber sca1004, mwdebug1002. Downtime should be a few minutes
  • 11:08 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
  • 10:56 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=sca1004.eqiad.wmnet
  • 10:07 moritzm: rebooting mwdebug servers for kernel update
  • 10:03 elukey: reboot kafka1012, analytics1028, aqs1004 for kernel upgrades (canary hosts)
  • 10:00 godog: reimage ms-be1016 with stretch
  • 09:53 godog: reset ms-be1014 idrac via ipmitool
  • 09:46 moritzm: rebooting app server canaries for kernel update
  • 09:40 godog: roll-restart thumbor to increase swift timeout
  • 09:29 marostegui: Rename table on db1089 enwiki.wikilove_image_log - T127219
  • 08:46 marostegui: Drop table titlekey from s1 - T164949
  • 08:35 godog: roll restart swift-proxy on ms-fe* to pick up thumbor changes
  • 08:30 _joe_: restarting gerrit T168360
  • 08:25 _joe_: manually patching gerrit's systemd unit file to allow more open files
  • 08:22 marostegui: Drop table titlekey from s3 - T164949
  • 08:15 marostegui: Drop table titlekey from s4 - T164949
  • 08:06 marostegui: Drop table titlekey from s7 - https://phabricator.wikimedia.org/T164949
  • 07:45 marostegui: Drop table titlekey from s5 - T164949
  • 07:35 gehel: restarting elastic1017 to validate upgrades
  • 07:27 marostegui: kill alter table on enwiki.revision db1047 after running for 13 days - T166452
  • 07:23 moritzm: installing glibc security updates
  • 07:22 marostegui: Stop MySQL dbstore2001 for maintenance - T168354
  • 07:20 marostegui: Deploy alter table s5 - db1071 - T166207
  • 07:10 marostegui: Deploy alter table s5 - db1095 - T166207
  • 06:57 moritzm: install remaining exim security updates
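
The conftool entries above log an action and a selector in one message, for example "conftool action : set/pooled=inactive; selector: name=sca1004.eqiad.wmnet". The Python sketch below splits such a message into two dictionaries. It assumes exactly the layout seen in this log, with a single key=value after set/ and comma-separated key=value pairs in the selector; `parse_conftool` is a hypothetical helper, not part of conftool itself.

```python
from typing import Dict, Tuple


def parse_conftool(message: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a logged conftool message into (settings, selector) dictionaries."""
    action_part, selector_part = message.split("; selector:", 1)
    # action_part looks like "conftool action : set/pooled=inactive"
    key, value = action_part.split("set/", 1)[1].strip().split("=", 1)
    settings = {key: value}
    # selector_part looks like " name=host.example,dc=codfw,service=restbase"
    selector = dict(pair.split("=", 1) for pair in selector_part.strip().split(","))
    return settings, selector


settings, selector = parse_conftool(
    "conftool action : set/pooled=inactive; selector: name=sca1004.eqiad.wmnet"
)
print(settings, selector)
```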

2017-06-19

  • 23:38 andrewbogott: are we logging?
  • 23:35 legoktm@tin: Synchronized static/images/project-logos/: Upload logos for the Dinka Wikipedia (duration: 00m 42s)
  • 22:45 andrewbogott: removed some big dirs from /home/ori on install1002
  • 22:30 andrewbogott: find /srv/carbon/whisper/archived_metrics -mtime +730 -type f -delete on labmon1001
  • afk: Added non-voting operations-puppet-tests-docker job for operations/puppet repo, should (hopefully) be fast, and will timeout after 1 minute if it's not. More info https://gerrit.wikimedia.org/r/#/c/360091/ + T166888
  • afk: updated payments-wiki from 7a50542 to 8bdd706
  • 19:39 mepps: correction: updated civicrm from dfc26f0 to d558df2
  • 19:28 mepps: updated from dfc26f0 to d558df2
  • 18:35 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 41s)
  • 18:34 reedy@tin: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 42s)
  • 18:29 reedy@tin: Synchronized wmf-config/abusefilter.php: (no justification provided) (duration: 00m 41s)
  • 18:21 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: logos (duration: 00m 41s)
  • 18:20 reedy@tin: Synchronized static/favicon/wmf.ico: (no justification provided) (duration: 00m 41s)
  • 18:19 reedy@tin: Synchronized wmf-config/flaggedrevs.php: Remove old setting that does nothing (duration: 00m 41s)
  • 18:18 reedy@tin: Synchronized static/images: (no justification provided) (duration: 00m 41s)
  • 18:10 reedy@tin: Synchronized dblists/securepollglobal.dblist: (no justification provided) (duration: 00m 41s)
  • 18:02 reedy@tin: Synchronized wmf-config/InterwikiSortOrders.php: Add atjwiki (duration: 00m 41s)
  • 17:42 ejegg: updated fundraising tools from 585f546 to 457bddb
  • 17:21 moritzm: installing exim4 security updates
  • 15:48 moritzm: uploaded linux-meta_1.13 to apt.wikimedia.org (with this update the linux-meta package now also defaults to 4.9 (previously 4.4))
  • 15:47 moritzm: uploaded linux_4.9.25-1~bpo8+3 to apt.wikimedia.org
  • 15:25 volans: installed python-setuptools-scm on copper
  • 15:16 marostegui: Deploy alter table labsdb1009 - T166207
  • 15:12 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=chlorine.eqiad.wmnet
  • 15:03 mobrovac: restbase restbase2001 is out of rotation, performing experiments with the new cassandra driver v3.2.2 which seems to be causing problems only in production
  • 14:59 godog: cold reset ms-be1013 drac
  • 14:53 gehel: pausing cluster restart of elasticsearch eqiad
  • 14:24 godog: roll-upgrade swift to 2.10 on ms-be10[22-30] - T162609
  • 14:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 - T153743 (duration: 00m 41s)
  • 14:06 gehel: starting cluster restart on elasticsearch / cirrus / eqiad for ltr plugin deployment
  • 14:01 gehel: restarting elasticsearch / relforge for ltr plugin deployment
  • 13:58 gehel: remove decommissioned nodes from redis / trebuchet for elasticsearch/plugins
  • 13:48 gehel: deploying latest elasticsearch plugin (ltr plugin)
  • 13:48 moritzm: fixing salt minion setup on wtp1047
  • 13:44 hashar: European SWAT completed
  • 13:44 aude@tin: Synchronized wmf-config/Wikibase.php: Remove old constraints section config (duration: 00m 41s)
  • 13:42 aude@tin: Synchronized wmf-config/Wikibase-production.php: Add constraints section to property pages on test.wikidata (duration: 00m 41s)
  • 13:29 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [cleanup] remove old interwiki search config (duration: 00m 41s)
  • 13:28 dcausse@tin: Synchronized wmf-config/CirrusSearch-labs.php: [cleanup] remove old interwiki search config (duration: 00m 41s)
  • 13:21 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI buttons on EditPage for plwiki - T162849 (duration: 00m 42s)
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Add sandbox link for dtywiki - T168038 (duration: 00m 42s)
  • 12:54 dcausse: restarting elasticsearch on relforge1* to pickup new snapshot of the ltr plugin
  • 12:36 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=chlorine.eqiad.wmnet
  • 12:04 elukey: run 'echo "autoLearnMode=1" > /tmp/disable_learn && megacli -AdpBbuCmd -SetBbuProperties -f /tmp/disable_learn -a0' on all the analytics workers to disable BBU Auto learn - T167809
  • 11:33 marostegui: Rename user Smuconlaw → Sgconlaw - T168109
  • 11:31 jynus: restarting replication on dbstore1002:s3 and db1015
  • 11:19 moritzm: rebooting cp3007 for kernel update
  • 11:01 _joe_: depooling mw1170-mw1179 for decommissioning, T168271
  • 10:15 godog: roll-upgrade swift to 2.10 on ms-fe1* - T162609
  • 09:56 akosiaris: migrate neon.eqiad.wmnet to ganeti01.svc.eqiad.wmnet's row_A nodegroup
  • 09:55 dcausse: restarting elasticsearch on relforge1* to pickup new snapshot of the ltr plugin
  • 09:33 jynus: temporarily stop dbstore1002:s3 and db1015 to fix srwiki
  • 09:30 marostegui: Deploy alter table on s2 - dbstore1001 - T166205
  • 09:18 godog: swift eqiad-prod: remove ms-be1001 - ms-be1012 - T166489
  • 09:13 paravoid: rebooting achernar to address CPU throttling and apply the BIOS update
  • 09:11 paravoid: upgrading achernar's BIOS from 1.2.4 to 2.4.2 hoping it will address recurring CPU throttling issue (T162850)
  • 09:07 akosiaris: restart ircecho on einsteinium, was not notifying due to a thrown exception
  • 08:35 marostegui: Drop table titlekey from s2 - T164949
  • 08:16 marostegui: Drop table titlekey on s6 - T164949
  • 07:59 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool pc2004,5,6 after maintenance (duration: 00m 41s)
  • 07:42 moritzm: restarting app server canaries to pick up gnutls update
  • 07:13 marostegui: Reboot ms-be1010
  • 07:10 marostegui: Deploy alter table s5 - codfw master - db2023 (and will replicate) so this will generate lag on codfw slaves - T166207
  • 07:09 jynus: upgrade, reboot and clear data on pc2006
  • 07:05 jynus: upgrade, reboot and clear data on pc2005
  • 07:03 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool pc2005 & pc2006 (duration: 00m 41s)
  • 06:58 moritzm: installing gnutls security updates
  • 06:38 marostegui: Deploy alter table s2 - labsdb1001 - T166205
  • 06:37 jynus: force learning cycle to db1046 controller T166141
  • 06:23 marostegui: Deploy alter table on s2 - db1021 - T166205
  • 06:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1021 - T166205 (duration: 00m 41s)
  • 04:21 reedy@tin: Synchronized composer.lock: update (duration: 00m 41s)
  • 04:20 reedy@tin: Synchronized composer.json: update (duration: 00m 41s)
  • 04:19 reedy@tin: Synchronized multiversion/vendor/: Update! (duration: 01m 05s)
  • 04:05 reedy@tin: Synchronized wmf-config/CommonSettings.php: Fix comments minor code style (duration: 00m 42s)
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 19 02:26:06 UTC 2017 (duration 6m 8s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 04s)
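
The 22:30 cleanup above runs find with -mtime +730 -type f -delete, which removes regular files last modified more than 730 whole days ago. Before a deletion like that, it can help to preview what would match. The Python sketch below lists candidates without deleting anything. It approximates find's rule of ignoring the fractional part of the age in days; `files_older_than` is a hypothetical name, and the sketch has not been checked against find on every edge case.

```python
import os
import time
from typing import Iterator, Optional

SECONDS_PER_DAY = 86400


def files_older_than(root: str, days: int, now: Optional[float] = None) -> Iterator[str]:
    """Yield regular files under root whose age in whole days exceeds `days`."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror -type f: skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            age_days = int((now - os.path.getmtime(path)) // SECONDS_PER_DAY)
            if age_days > days:
                yield path


if __name__ == "__main__":
    # Dry run only: print candidates under the current directory, delete nothing.
    for path in files_older_than(".", 730):
        print(path)
```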

2017-06-18

  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Jun 18 02:25:55 UTC 2017 (duration 6m 8s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 27s)

2017-06-17

  • 19:30 ebernhardson: restarting elasticsearch on relforge to pick up new version of ltr-query
  • 16:51 volans: restarted pdfrender on scb200[2,4] T159922
  • 15:26 jynus: rebuild pc2004's (depooled) data from scratch
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 17 02:29:51 UTC 2017 (duration 6m 8s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 09s)

2017-06-16

  • 19:54 Reedy: disabled cluster 2fa for Chrissymad for T168064 (confirmed by email)
  • 19:26 ejegg: re-enabled paypal audit download and parse job
  • 19:13 ebernhardson: restarting elasticsearch on relforge to pick up new ltr-query plugin version
  • 18:14 mutante: ms-be1001: did not change config, tried again, now detected 13 drives again, coming back
  • 18:10 mutante: ms-be1001 - The following VDs are missing: 09
  • 18:08 mutante: ms-be1001 - powercycling crashed server - "[14076481.245487] general protection fault: 0000 [#4] SMP
  • 13:36 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1018 current status - T166205 (duration: 00m 41s)
  • 13:26 twentyafterfour: fixed phabricator "upgrade database" error.
  • 13:20 twentyafterfour: fixing phab database migrations
  • 13:01 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091 after performance testing (duration: 00m 41s)
  • 10:18 jynus: running analyze on db1091 (depooled), may create lag
  • 10:11 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 for performance testing (duration: 00m 42s)
  • 09:52 moritzm: installing guile security updates
  • 09:13 moritzm: re-enabled puppet on mw2129 (no reason was given why it was disabled)
  • 08:50 jynus: bringing down pc1005 and pc1006 for maintenance T167567
  • 08:40 jynus@tin: Synchronized wmf-config/db-codfw.php: Add db1099 and db1001 hosts to config (duration: 00m 41s)
  • 08:23 jynus@tin: Synchronized wmf-config/db-eqiad.php: Switchover pc1005 and pc1006 to db1099 and db1001 (duration: 00m 45s)
  • 08:20 jynus: about to switchover pc1005 and pc1006 to db1099 and db1001
  • 05:45 ebernhardson: increase enwiki_content replicas on codfw from 2 to 3 to match eqiad
  • 02:37 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Jun 16 02:37:05 UTC 2017 (duration 6m 25s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 02s)
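
Sync and deploy entries end with a duration written either as "(duration: 00m 41s)" or, for the ResourceLoader cache refresh, as "(duration 6m 25s)" without the colon. The Python sketch below converts either form to seconds, which is convenient when comparing run times across days. `duration_seconds` is a hypothetical helper and covers only the two forms seen in this log.

```python
import re
from typing import Optional

# Matches "(duration: 07m 02s)" and "(duration 6m 25s)".
DURATION_RE = re.compile(r"\(duration:?\s+(\d+)m\s+(\d+)s\)")


def duration_seconds(message: str) -> Optional[int]:
    """Return the logged duration in seconds, or None if the message has none."""
    m = DURATION_RE.search(message)
    if m is None:
        return None
    minutes, seconds = (int(g) for g in m.groups())
    return minutes * 60 + seconds


print(duration_seconds("scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 02s)"))
```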

2017-06-15

  • 23:37 mutante: added stretch support for jenkins (https://gerrit.wikimedia.org/r/#/c/359227/, https://gerrit.wikimedia.org/r/#/c/359356/) | 'reprepro copy stretch-wikimedia jessie-wikimedia jenkins' to make .deb available on stretch | releases1001 now running jenkins , icinga recovered | (hashar) (T164030)
  • 23:30 mutante: APT - reprepro copy stretch-wikimedia jessie-wikimedia jenkins (copy existing jenkins package to stretch, it can be used on both)
  • 23:18 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T166408: Remove dead config variable MinervaPrintStyles (duration: 00m 41s)
  • 23:15 ebernhardson@tin: Finished scap: wmf-config Scap: T162276: Enable crossproject search (duration: 03m 37s)
  • 23:11 ebernhardson@tin: Started scap: wmf-config Scap: T162276: Enable crossproject search
  • 23:10 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: Scap: T162276: Enable crossproject search (duration: 00m 51s)
  • 22:59 mutante: mw2251 - repooled
  • 22:56 mutante: mw2251 - scap pull
  • 22:53 ebernhardson: restarting elasticsearch on relforge to pickup new ltr-query plugin
  • 22:30 ejegg: updated DjangoBannerStats from 9e6b117 to 5963e7c
  • 22:02 volans: restarted pdfrender on scb1001 T159922
  • 21:45 mutante: powercycling mw2251 (frozen console)
  • 21:39 volans@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2251.codfw.wmnet
  • 21:37 volans: re-enabled puppet and force run to re-enable ircecho on einsteinium
  • 21:29 demon@tin: Finished scap: Removing Cards extension (duration: 21m 49s)
  • 21:08 demon@tin: Started scap: Removing Cards extension
  • 20:57 mutante: upgrading RT (request tracker)
  • 19:35 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.5
  • 19:22 ladsgroup@tin: Finished deploy [ores/deploy@ab88a74]: Deploying gerrit:359224/1 for missing config variables (duration: 24m 15s)
  • 19:17 XioNoX: Re-enabled link between cr2-codfw and cr1-eqdfw - T167261
  • 18:58 ladsgroup@tin: Started deploy [ores/deploy@ab88a74]: Deploying gerrit:359224/1 for missing config variables
  • 18:44 paravoid: restarting all puppetmasters
  • 18:40 paravoid: temporarily stopping icinga-wm
  • 18:27 demon@tin: Synchronized wmf-config/CirrusSearch-common.php: Remove quirks and enable token_count_router thingie (duration: 00m 44s)
  • 18:16 demon@tin: Synchronized php-1.30.0-wmf.5/includes/libs/objectcache/MultiWriteBagOStuff.php: T167465 (duration: 00m 44s)
  • 18:14 demon@tin: Synchronized wmf-config/InitialiseSettings.php: T167617 (duration: 00m 44s)
  • 18:12 demon@tin: Synchronized wmf-config/FeaturedFeedsWMF.php: T167617 (duration: 00m 44s)
  • 17:50 mutante: install2002 - re-enabled puppet, reverted live hack, back to normal (issue seems to be NIC or other)
  • 17:28 mutante: install2002 - temp disabling puppet and applying hot fix to debug install issue for papaul
  • 17:27 bblack: disabling puppet on cp*wmnet to avoid puppet races on https://gerrit.wikimedia.org/r/#/c/341729 merge
  • 14:39 gehel: killing stuck replication on maps1001
  • 14:38 krinkle@tin: Synchronized wmf-config/CommonSettings.php: no-op Ifc7b1ea80 - Remove EtcdConfig from beta (duration: 00m 45s)
  • 13:24 gehel: elasticsearch upgrade to 5.3.2 on relforge cluster completed, cluster still recovering - T163708
  • 13:23 aude@tin: Synchronized wmf-config/Wikibase.php: Add constraints statements section on Wikidata T167126 (duration: 00m 43s)
  • 13:19 dcausse: [cirrus] reindexing all zh wikis (eqiad & codfw)
  • 13:14 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Enable BM25 for Chinese wikis (duration: 00m 44s)
  • 13:13 aude@tin: Synchronized tests/cirrusTest.php: (no justification provided) (duration: 00m 45s)
  • 13:02 gehel: starting elasticsearch upgrade to 5.3.2 on relforge cluster - T163708
  • 12:14 gehel: restart elasticsearch on relforge1001 to validate latest config changes
  • 10:16 moritzm: rollout remaining systemd updates from jessie point release
  • 09:14 jynus: shutting down and deleting data at pc1004 for cloning from db1096
  • 09:10 hashar: Jenkins back up and happy.
  • 09:05 moritzm: reenable puppet on notebook1002, was disabled for the merge of the zookeeper role refactor two days ago, can be re-enabled now
  • 09:04 hashar: Restarting Jenkins. It seems I managed to deadlock it
  • 08:52 ariel@tin: Finished deploy [dumps/dumps@1734c6d]: history dump rebalance script, fixup for extension script dumps, root logger for misc dumps (duration: 00m 02s)
  • 08:52 ariel@tin: Started deploy [dumps/dumps@1734c6d]: history dump rebalance script, fixup for extension script dumps, root logger for misc dumps
  • 08:40 gehel: restart relforge1001 to validate latest config changes
  • 08:16 akosiaris@tin: Finished deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec (duration: 07m 44s)
  • 08:09 akosiaris@tin: Started deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec
  • 08:02 moritzm: updating HHVM on terbium/wasat to 3.18
  • 07:57 akosiaris@tin: Finished deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec (duration: 00m 38s)
  • 07:57 akosiaris@tin: Started deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec
  • 07:48 akosiaris: schedule 2 hours downtime for all citoid endpoints health on scb boxes
  • 06:08 marostegui: Deploy alter table s2 - labsdb1003 - T166205
  • 05:50 marostegui: Deploy alter table s2 - db1018 - T166205
  • 05:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1018 current status - T166205 (duration: 00m 43s)
  • 05:41 marostegui: Deploy alter table s4 - dbstore1001 - T166206
  • 05:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T166205 (duration: 00m 44s)
  • 02:50 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 15 02:50:16 UTC 2017 (duration 6m 48s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 34s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 09m 15s)
  • 01:17 mutante: releases1001 - reinstalling with stretch
  • 00:15 mutante: dumpsdata1001 - was reported in icinga as CRIT systemdstate - reason was puppet service was failed with "Invalid value '"no"' for boolean parameter: daemonize" (it was ok on other hosts??). commented the option, stopped puppet, systemctl reset-failed - which made it recover (T165368)
  • 00:02 twentyafterfour: Deploying phabricator update (tagged release/2017-06-14/1) details: https://phabricator.wikimedia.org/project/view/2831/

2017-06-14

  • 23:55 mutante: mwreleases: revoke puppet cert, delete salt key, remove from icinga. releases1001 still syncing disks for a while (50m), being created... T164030
  • 23:49 mutante: ganeti: removed instance mwreleases1001, created new instance releases1001 with same parameters (2 VCPUS,4G memory, 1 x 128G disk) (T164030)
  • 23:41 mutante: mwreleases1001 - scheduled downtime, shutdown, kill VM, re-install as releases1001 (T164030)
  • 23:33 catrope@tin: Synchronized php-1.30.0-wmf.5/includes/: Unbreak watchlist highlighting T167922 (duration: 01m 30s)
  • 23:30 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Send search traffic back to eqiad T149006 (duration: 00m 44s)
  • 23:23 catrope@tin: Synchronized wmf-config/: ORES config cleanups (duration: 00m 46s)
  • 22:43 reedy@tin: Synchronized php-1.30.0-wmf.5/extensions/WikimediaMaintenance/addWiki.php: Remove accountaudit (duration: 00m 44s)
  • 22:33 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: meta namespace talk for atjwiki (duration: 00m 44s)
  • 21:36 reedy@tin: Synchronized wmf-config/interwiki.php: Update interwiki map for atjwiki T167714 (duration: 00m 44s)
  • 21:29 reedy@tin: Synchronized langlist: Add atj T167714 (duration: 00m 43s)
  • 21:29 reedy@tin: Synchronized static/images/project-logos/: atjwiki T167714 (duration: 00m 43s)
  • 21:27 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: atjwiki T167714 (duration: 00m 43s)
  • 21:26 reedy@tin: rebuilt wikiversions.php and synchronized wikiversions files: Add atjwiki T167714
  • 21:25 reedy@tin: Synchronized dblists/: add atjwiki T167714 (duration: 00m 42s)
  • 21:22 reedy@tin: Synchronized php-1.30.0-wmf.4/extensions/WikimediaMaintenance/addWiki.php: Remove accountaudit (duration: 00m 44s)
  • 21:15 reedy@terbium: scap aborted: (no justification provided) (duration: 00m 01s)
  • 21:15 reedy@terbium: Started scap: (no justification provided)
  • 20:06 reedy@tin: Synchronized wmf-config/CommonSettings-labs.php: noop (duration: 00m 43s)
  • 20:05 reedy@tin: Synchronized wmf-config/CommonSettings.php: CollaborationKit loader code (duration: 00m 43s)
  • 20:03 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Add CollaborationKit to testwiki (duration: 00m 44s)
  • 19:47 demon@tin: Synchronized wmf-config/CommonSettings-labs.php: no-op (duration: 00m 44s)
  • 19:42 Reedy: running mwscript initSiteStats.php srnwiki --update
  • 19:37 demon@tin: Synchronized wmf-config/extension-list-labs: No-op (duration: 00m 44s)
  • 19:23 demon@tin: Synchronized php: symlink bump (duration: 00m 43s)
  • 19:17 bblack: restart varnish backend on cp1074
  • 19:08 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.5
  • 18:50 otto@tin: Finished deploy [eventlogging/analytics@1ce446d]: (no justification provided) (duration: 00m 04s)
  • 18:49 otto@tin: Started deploy [eventlogging/analytics@1ce446d]: (no justification provided)
  • 18:34 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/358007/ Add wmgBabelMainCategory for many languages (duration: 00m 43s)
  • 18:32 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:25 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Sort wmgBabelMainCategory alphabetically https://gerrit.wikimedia.org/r/#/c/358006/ (duration: 00m 44s)
  • 18:24 jynus: reimporting data from pc1004 to db1096
  • 18:17 niharika29@tin: Synchronized tests/cirrusTest.php: https://gerrit.wikimedia.org/r/#/c/358625/ Test elastic2020 does not fall out of cluster (duration: 00m 43s)
  • 18:13 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/358625/ Test elastic2020 does not fall out of cluster (duration: 00m 44s)
  • 18:06 moritzm: installing unzip security updates
  • 17:55 moritzm: restarting hhvm on mw1261-mw1265 to pick up libxslt update
  • 17:49 moritzm: installing mongodb update from jessie point release on tungsten
  • 16:03 godog: point varnish upload in esams back to eqiad
  • 16:00 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 51s)
  • 15:55 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 15:44 godog: point varnish upload back to swift eqiad
  • 15:14 ema: restart varnish-backend on cp2017
  • 15:08 moritzm: installing systemd bugfix updates from jessie point update
  • 15:00 ema: restart varnish-backend on cp2014
  • 13:50 zeljkof: eu swat finished
  • 13:42 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove ContentTranslationTargetNamespace config (T167865) (duration: 00m 43s)
  • 13:41 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove unneeded ContentTranslationTargetNamespace (T167865) (duration: 00m 44s)
  • 13:35 zfilipin@tin: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 12:24 jynus@tin: Synchronized wmf-config/db-codfw.php: Switchover pc2004 to db2072 (duration: 00m 43s)
  • 12:13 akosiaris: upload apertium-spa-ita_0.2.0~r78826-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 12:13 akosiaris: upload apertium-fra-cat_1.2.0~r78602-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 11:41 jynus@tin: Synchronized wmf-config/db-eqiad.php: Switchover pc1004 to db1096 (duration: 00m 54s)
  • 11:34 jynus: about to deploy performance-impacting change on the parsercache persistent storage T167567
  • 11:19 marostegui: Deploy alter table s4 - labsdb1011 - T166206
  • 09:46 marostegui: Rename table titlekey before dropping it on enwiki - db1089 - T164949
  • 09:18 godog: delete files older than 365d from 'servers' graphite hierarchy
  • 07:59 marostegui: Drop table updates on s3 - T139342
  • 07:32 moritzm: installing zziplib security updates on jessie
  • 07:04 elukey: restart pdfrender on scb200[2,4] (xpra race condition)
  • 07:03 elukey: restart pdfrender on scb1004 (xpra race condition)
  • 06:32 moritzm: installing remaining libtasn security updates
  • 03:14 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 14 03:14:28 UTC 2017 (duration 6m 56s)
  • 03:07 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 14m 52s)
  • 02:32 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 58s)
  • 01:48 mutante: netmon1002 - chown rancid:rancid /var/lib/rancid ; touch /var/lib/rancid/.gitconfig, let rancid write to config, then git config --global user.email and user.name as the rancid user | fix permissions on .git/objects files, let rancid user own them all | re-commit .gitignore change | SSH_AUTH_SOCK=/run/keyholder/proxy.sock /usr/lib/rancid/bin/rancid-run as user "rancid" runs clean
  • 01:20 mutante: netmon1002 - copied missing router.db, routers.all/.down/.up over from netmon1001 to /var/lib/rancid/core. routers.db is an untracked file, the others are in .gitignore. this is all like on netmon1001 as well. adding routers.db to .gitignore file on both, like the other router* files already were (T159756)
  • 01:00 mutante: netmon1002 - locally "git clone /var/lib/rancid/GIT/core" into /var/lib/rancid (I rsynced that, but it's a bare repository without a work tree; the work tree is /var/lib/rancid/core after this) (T159756)
  • 00:44 mutante: naos: disarmed keyholder and armed it again to prove I didn't break anything on jessie by fixing keyholder on stretch with gerrit:358884
  • 00:39 demon@tin: Synchronized wmf-config/CommonSettings.php: extdist update (duration: 00m 44s)
  • 00:09 aaron@tin: Synchronized wmf-config/InitialiseSettings.php: Capture messages on 'autoloader' debug log channel (duration: 00m 44s)

2017-06-13

  • 23:29 RainbowSprinkles: gerrit: upgrading on master 2.13.4-13-gc0c5cc4742 -> 2.13.8-1-g7c438d37a2 (been running on slave for a week)
  • 23:13 mutante: contint1001 - started zuul using the old init script
  • 23:05 mutante: netmon1001/1002: rsynced /var/lib/rancid/CVS and /var/lib/rancid/GIT from 1001 to 1002 for rancid migration (T159756)
  • 23:04 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/OpenStackManager: Re-adding deleted special page (duration: 00m 45s)
  • 22:06 ejegg: updated fundraising tools from f2522cd to 585f546
  • 21:59 gwicke: restarted pdfrender on scb1003; was spinning on CPU & using 15G of memory (!)
  • 21:58 gwicke: restarted pdfrender on scb1002 and scb1004; was spinning on CPU
  • 21:56 hashar: Zuul back, running in an interactive terminal.
  • 21:46 mutante: netmon1002 - was able to "keyholder arm" after stretch install after applying https://gerrit.wikimedia.org/r/358884 as hotfix
  • 21:30 mobrovac@tin: Finished deploy [restbase/deploy@9a86d4c]: (no justification provided) (duration: 01m 06s)
  • 21:29 mobrovac@tin: Started deploy [restbase/deploy@9a86d4c]: (no justification provided)
  • 21:13 hashar: Gracefully restarting Zuul
  • 21:11 ppchelko@tin: Finished deploy [changeprop/deploy@4ba3c59]: Rate-limiter enhancements (duration: 01m 08s)
  • 21:10 ppchelko@tin: Started deploy [changeprop/deploy@4ba3c59]: Rate-limiter enhancements
  • 21:02 demon@tin: Synchronized php-1.30.0-wmf.5/extensions/CentralAuth/includes/CentralAuthHooks.php: Fix bad method name (duration: 00m 44s)
  • 20:37 hashar: Restarting Nodepool. Apparently confused in pool tracking and spawning too many Trusty nodes (7 instead of 4)
  • 20:02 demon@tin: Synchronized php-1.30.0-wmf.5/includes/api/ApiParse.php: T167826 (duration: 00m 44s)
  • 20:00 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 29s)
  • 19:56 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 19:37 Amir1: restarting ores-related services in scb1001 (T167819)
  • 19:24 mutante: scb1001 - killed process 10971 (pdfrendering/electron)
  • 19:24 demon@tin: Synchronized php-1.30.0-wmf.5/extensions/CategoryTree/CategoryPageSubclass.php: Fix up variable visibility (duration: 00m 44s)
  • 19:12 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.5
  • 19:09 mobrovac@tin: Finished deploy [restbase/deploy@9a86d4c]: (no justification provided) (duration: 07m 33s)
  • 19:08 mutante: netmon1002 - reinstalled with stretch, revoked puppet cert, salt key, signing new cert, accepting new key, initial puppet run (T159756)
  • 19:01 mobrovac@tin: Started deploy [restbase/deploy@9a86d4c]: (no justification provided)
  • 18:56 mutante: reinstalling netmon1002 with stretch - scheduled icinga downtime
  • 18:54 legoktm: starting to delete all rows from linter tables on large wikis - T167758
  • 18:48 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 36s)
  • 18:43 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:39 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:37 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 19s)
  • 18:33 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:27 demon@tin: Finished scap: testwiki to wmf.5 + l10n bootstrap (duration: 42m 16s)
  • 17:52 bblack: cp4021 reboot for bnx2x modparam change
  • 17:50 ottomata: merged removal of x_forwarded_for from all varnishkafka webrequest instances
  • 17:45 ladsgroup@tin: Finished deploy [ores/deploy@862aea9]: ORES deploy early June: T167223 (duration: 33m 52s)
  • 17:45 demon@tin: Started scap: testwiki to wmf.5 + l10n bootstrap
  • 17:42 demon@tin: Pruned MediaWiki: 1.30.0-wmf.2 [keeping static files] (duration: 01m 13s)
  • 17:40 demon@tin: Pruned MediaWiki: 1.30.0-wmf.1 [keeping static files] (duration: 05m 10s)
  • 17:39 bblack: restart varnish-be on cp2002 (mailbox lag, likely induced by swift traffic testing in codfw)
  • 17:11 ladsgroup@tin: Started deploy [ores/deploy@862aea9]: ORES deploy early June: T167223
  • 17:06 akosiaris: rebooting sca2003 for tests
  • 16:35 moritzm: upgrading osmium to HHVM 3.18
  • 16:08 moritzm: installing libnl security updates on trusty
  • 15:41 akosiaris: upload apertium-spa_1.0.0~r78827-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-ita_0.9.0~r78828-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-fra_1.1.0~r78695-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-cat_2.1.0~r78615-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 gehel: restart of relforge1001 to test https://gerrit.wikimedia.org/r/#/c/358353/
  • 15:09 gehel: applying new GC configuration on elastic1018 - T167636
  • 14:53 godog: update inter-routing for upload to point esams to codfw
  • 14:22 gehel: restarting elasticsearch on relforge to validate GC configuration - T167636
  • 14:17 ottomata: stopping puppet on cp1045, testing removal of xff from varnishkafka webrequest data
  • 14:14 godog: point upload varnish to swift in codfw - T162609
  • 14:11 moritzm: upgrading mw1299-mw1306 to HHVM 3.18
  • 14:10 urandom: T164865: Restart RESTBase dev; apply range delete probability of 1.0
  • 13:30 godog: Thumbor to group1 wikis + mediawiki.org - T167793
  • 13:15 hashar: European SWAT completed
  • 13:13 hashar@tin: Synchronized php-1.30.0-wmf.4/extensions/Popups: actions/rest: Use DB-key version of title - T167633 (duration: 00m 41s)
  • 13:08 hashar@tin: Synchronized php-1.30.0-wmf.4/includes/htmlform/OOUIHTMLForm.php: Do not try to parse empty argument in getErrorsOrWarnings in OOUI - T167644 (duration: 00m 41s)
  • 13:04 hashar@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wikidata echo notifications for all wikis (except enwiki, frwiki, dewiki) - T142102 (duration: 00m 42s)
  • 12:44 marostegui: Deploy alter table on s2 on db1036 - T166205
  • 12:39 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T166205 (duration: 00m 41s)
  • 12:12 marostegui: Deploy alter table on s2 on dbstore1002 - T166205
  • 12:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1064 - T166206 (duration: 00m 51s)
  • 11:56 godog: enable thumbor serving for group0 wikis with media files - T167782
  • 11:41 moritzm: upgrading HHVM on tin/naos to HHVM 3.18
  • 10:59 moritzm: upgrading mw1283-mw1290 to HHVM 3.18
  • 10:21 godog: reenable thumbor swift storage, same paths as mediawiki - T167783
  • 10:11 elukey: completed rollout of https://gerrit.wikimedia.org/r/354449
  • 09:54 moritzm: upgrading mw2248-mw2250 to HHVM 3.18
  • 09:37 godog: disable thumbor shadow requests, enable thumbor-only serving for testwiki - T167490
  • 09:28 moritzm: upgrading mw1276-mw1282 to HHVM 3.18
  • 09:27 elukey: puppet disabled on kafka*, analytics*, druid*, conf* for https://gerrit.wikimedia.org/r/354449 - incremental rollout
  • 09:13 marostegui: Deploy alter table s4 - db1095 - T166206
  • 08:56 moritzm: upgrading mw1165-mw1167 to HHVM 3.18
  • 08:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T166205 (duration: 00m 41s)
  • 08:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight - T166935 (duration: 00m 42s)
  • 08:21 gehel: restart OSM synchronisation on maps2001
  • 08:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight (duration: 00m 42s)
  • 08:05 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2020.codfw.wmnet
  • 08:01 gehel: adding elastic2020 back in the elasticsearch cluster - T149006
  • 07:48 marostegui: Drop table updates on enwiki (s1) - T139342
  • 07:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight (duration: 00m 41s)
  • 07:30 moritzm: restarting HHVM on mw canaries to pick up libtasn update
  • 07:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with less weight (duration: 00m 41s)
  • 07:12 marostegui: Reboot scb2005 - T167638
  • 06:55 elukey: executed "cumin 'mw2*.codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;'" to fix the last occurrences of wrong root:adm owned hhvm error logs
  • 06:51 moritzm: installing libtasn security updates
  • 06:43 marostegui: Stop MySQL on db1089 to upgrade its raid controller firmware - T166935
  • 06:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T166935 (duration: 00m 42s)
  • 02:33 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 13 02:33:23 UTC 2017 (duration 6m 12s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 08m 00s)
  • 01:29 papaul: OS install on labtestnet2002
  • 00:40 andyrussg@tin: Finished scap: Update CentralNotice (duration: 20m 51s)
  • 00:19 andyrussg@tin: Started scap: Update CentralNotice

2017-06-12

  • 23:22 mutante: netmon1002 - keyholder arm - loaded rancid deploy key (uses separate passphrase from deployment key)
  • 22:01 mutante: netmon1002 - apt-get -t jessie-backports install rancid (upgrade from 2.3.8 to 3.6.2 to match version on netmon1001) - rancid version is not specified in puppet so even though backports gets enabled the older version gets installed and this manual step is needed unless we start specifying the version in the manifest (T159756)
  • 20:30 mutante: ns0, ns1 - same as before - gen zones, check zones, reload zones, to add "atj.wikipedia.org" (T167714)
  • 20:26 mutante: ns2 - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to add new Wikipedia language "atj" (needed when editing langlist but not touching templates) (T167714)
  • 19:10 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikipedia workshop (14 June 2017) T167011 + Fix throttle rule for Scotland university editathon (duration: 00m 41s)
  • 18:46 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Editathon (13 June 2017) T167517 (duration: 00m 41s)
  • 18:40 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikipedia Editathon (June 16th 2017) T167201 (duration: 00m 41s)
  • 18:30 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add NS:100 to wgNamespacesToBeSearchedDefault for enwikisource T167511 (duration: 00m 41s)
  • 18:27 thcipriani@tin: Synchronized php-1.30.0-wmf.4/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: SWAT: RCFilters: Retain extra url params when comparing url equivalency T167551 (duration: 00m 41s)
  • 18:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Setup the new wgPopupsGateway config variable. NOOP T165018 (duration: 00m 42s)
  • 17:24 joal@tin: Finished deploy [analytics/refinery@08fe129]: Bug correction on regular weekly deploy of refinery (2) (duration: 03m 00s)
  • 17:24 gehel: running stress + bonnie on elastic2020 to check new hardware - T149006
  • 17:21 joal@tin: Started deploy [analytics/refinery@08fe129]: Bug correction on regular weekly deploy of refinery (2)
  • 17:07 gehel@tin: Finished deploy [wdqs/wdqs@84557b8]: (no justification provided) (duration: 02m 32s)
  • 17:05 gehel@tin: Started deploy [wdqs/wdqs@84557b8]: (no justification provided)
  • 16:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with less weight (duration: 00m 41s)
  • 14:32 gehel: restart elasticsearch on relforge1001 to validate GC configuration
  • 14:14 moritzm: updating tor on radium to 0.2.9.11-1~d80.jessie+1
  • 14:14 hashar: European SWAT completed
  • 14:13 hashar@tin: Synchronized static/images/project-logos/: Update logo for the Norwegian Wikisource - T167192 (duration: 00m 41s)
  • 14:12 hashar@tin: Synchronized static/images/: Delete duplicate HD logos for the Punjabi Wikipedia (duration: 00m 41s)
  • 14:04 moritzm: updating tor in jessie-wikimedia to 0.2.9.11-1~d80.jessie+1 (via reprepro update from tor repository)
  • 13:59 moritzm: upgrading mw1296-mw1298 to HHVM 3.18
  • 13:53 marostegui: Shutdown db1089 for maintenance - T166935
  • 13:48 hashar@tin: Synchronized php-1.30.0-wmf.4/includes/specials/SpecialNewimages.php: SpecialNewimages: Do not add the module when the special page is included - T167601 (duration: 00m 41s)
  • 13:40 hashar: redoing all the fawiki* updateCollation.php since I ran them without deploying the IS.php change :(
  • 13:38 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Change Persian Wikis from uca-fa to xx-uca-fa - T139110 (duration: 00m 41s)
  • 13:35 moritzm: uploaded openssl 1.1.0f to apt.wikimedia.org
  • 13:31 joal@tin: Finished deploy [analytics/refinery@0dda4a9]: Bug correction for regular weekly deploy of refinery (duration: 03m 40s)
  • 13:30 aharoni: running mwscript updateCollation.php --wiki=bawikibooks
  • 13:28 joal@tin: Started deploy [analytics/refinery@0dda4a9]: Bug correction for regular weekly deploy of refinery
  • 13:25 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikiquote --previous-collation=uca-fa
  • 13:24 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikinews --previous-collation=uca-fa
  • 13:24 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikibooks --previous-collation=uca-fa
  • 13:24 aharoni: running mwscript updateCollation.php --wiki=bawiki
  • 13:23 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawiktionary --previous-collation=uca-fa
  • 13:22 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikisource --previous-collation=uca-fa
  • 13:21 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawiki --previous-collation=uca-fa
  • 13:17 moritzm: upgrading cp1008 to openssl 1.1.0f
  • 13:13 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Set collation for Bashkir wikis to uppercase-ba - T162823 (duration: 00m 41s)
  • 13:10 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: update some logos 6974b9ab4..76939d15f (duration: 00m 41s)
  • 13:08 hashar@tin: Synchronized static/images/project-logos: (no justification provided) (duration: 00m 43s)
  • 12:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance - T166935 (duration: 00m 41s)
  • 12:01 moritzm: upgrading mw1266-mw1275 to HHVM 3.18
  • 11:09 joal@tin: Finished deploy [analytics/refinery@d9c3419]: Regular weekly deploy of refinery (mostly unique_devices patches) (duration: 06m 18s)
  • 11:05 moritzm: upgrading job runners mw1162-mw1164 to HHVM 3.18
  • 11:03 joal@tin: Started deploy [analytics/refinery@d9c3419]: Regular weekly deploy of refinery (mostly unique_devices patches)
  • 10:59 marostegui: Drop table updates on commonswiki (s4) - T139342
  • 10:28 moritzm: upgrading mw1250-mw1258 to HHVM 3.18
  • 09:55 moritzm: upgrading mw1221-mw1235 to HHVM 3.18
  • 09:25 godog: swift eqiad-prod finish decom ms-be1005/6/7 - T166489
  • 09:13 moritzm: upgrading mw1236-mw1249 to HHVM 3.18
  • 09:12 marostegui: Drop table updates on dewiki and wikidatawiki (s5) - T139342
  • 08:31 godog: reboot ms-be1002, load avg slowly creeping up
  • 08:22 elukey: powercycle scb2005 (console frozen, host unresponsive)
  • 07:40 elukey: restarted citoid on scb1001 (kept failing health checks for Error: write EPIPE)
  • 07:38 marostegui: Reboot ms-be1008 as xfs is failing
  • 07:31 marostegui: Deploy alter table s2 - db1060 - T166205
  • 07:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 - T166205 (duration: 00m 41s)
  • 07:26 elukey: ran restart-pdfrender on scb1001 (OOM errors in the dmesg from hours ago)
  • 07:22 elukey: ran restart-pdfrender on scb1002 (OOM errors in the dmesg from hours ago)
  • 07:21 marostegui: Deploy alter table s4 - db1064 - https://phabricator.wikimedia.org/T166206
  • 07:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1064 - T166206 (duration: 00m 41s)
  • 06:53 moritzm: upgrade remaining app servers running HHVM 3.18 to 3.18.2+wmf5
  • 05:38 marostegui: Deploy alter table s4 - labsdb1003 - T166206
  • 02:14 l10nupdate@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)

2017-06-11

  • 14:14 elukey: executed cumin 'mw22[51-60].codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;' to reduce cron-spam (new hosts added in March) - T146464
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Jun 11 02:25:53 UTC 2017 (duration 6m 6s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 37s)

2017-06-10

  • 11:54 andrewbogott: cleared leaked instances out of the nova fullstack test. Six were up and running and reachable, one had a network failure.
  • 10:19 TimStarling: on terbium: running purgeParserCache.php prior to cron job due to observed disk space usage increase
  • 10:00 marostegui: Purge binary logs on pc1006-pc2006
  • 09:58 marostegui: Purge binary logs on pc1004-pc2004 and pc1005-pc2005
  • 02:22 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 10 02:22:22 UTC 2017 (duration 6m 13s)
  • 02:16 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 33s)

2017-06-09

  • 21:18 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: (no justification provided) (duration: 01m 40s)
  • 21:17 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: (no justification provided)
  • 21:07 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2) (duration: 05m 23s)
  • 21:02 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2)
  • 21:01 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (duration: 04m 57s)
  • 20:56 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045
  • 20:54 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response (duration: 03m 39s)
  • 20:50 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response
  • 20:12 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Really fix it this time (duration: 00m 43s)
  • 19:49 mutante: fermium: $ sudo /usr/local/sbin/disable_list wikino-bureaucrats (T166848)
  • 19:46 RainbowSprinkles: mw1299: running scap pull, maybe out of date?
  • 18:12 gehel: retry allocation of failed shards on elasticsearch eqiad
  • 15:47 _joe_: installed python-service-checker 0.1.3 on einsteinium,tegmen T167048
  • 15:44 _joe_: uploaded service-checker 0.1.3
  • 15:11 _joe_: upgraded python-service-checker to 0.1.2 on tegmen,einsteinium
  • 13:18 godog: upgrade thumbor to 0.1.40 - T167462
  • 12:36 gehel: reducing high watermark on elasticsearch eqiad to rebalance shards
  • 07:51 elukey: run megacli -LDSetProp -Direct -LALL -aALL on analytics[1058-1068] - T166140
  • 07:40 moritzm: upgrade app servers in codfw running HHVM 3.18 to +wmf5
  • 07:26 elukey: run megacli -LDSetProp ADRA -LALL -aALL on analytics[1058-1068] - T166140
  • 07:15 elukey: deleted /etc/logrotate.d/nova-manage from labtestvirt2003 to reduce cronspam (same solution used in T132422#2679434)
  • 06:58 moritzm: updating mw117* to HHVM 3.18+wmf5
  • 06:41 moritzm: updating mw1161 to HHVM 3.18
  • 05:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 41s)
  • 05:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1074 - T166205 (duration: 00m 42s)
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Jun 9 02:25:29 UTC 2017 (duration 6m 27s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 06m 04s)
  • 00:36 ejegg: disabled banner impressions loader
  • 00:15 mutante: mw1275 depooled (T124956)
  • 00:08 ejegg: updated CiviCRM from 5a83ee1 to dfc26f0
  • 00:01 mutante: seeing "php: Lost parent, LightProcess exiting" in syslog on mw1275 today (T124956)

2017-06-08

  • 23:48 mutante: mw1275 - restarted hhvm (php: Lost parent, LightProcess exiting in syslog)
  • 23:37 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: remaining wikis to wmf.4
  • 23:16 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Fix array access bug (duration: 00m 43s)
  • 23:15 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/GeoData/includes/Searcher.php: Temp hax to point GeoData at codfw DC (duration: 00m 43s)
  • 22:56 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider/src/RevisionSliderHooks.php: Re-syncing with permanent committed fix (duration: 00m 44s)
  • 22:36 ejegg: updated civicrm from c70ae65 to 5a83ee1
  • 22:29 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider/src/RevisionSliderHooks.php: Livehack/test (duration: 00m 44s)
  • 22:17 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: (no justification provided) (duration: 00m 44s)
  • 22:15 mobrovac@tin: Finished deploy [changeprop/deploy@836b070]: Rate limiting, attempt #2 (duration: 01m 23s)
  • 22:13 mobrovac@tin: Started deploy [changeprop/deploy@836b070]: Rate limiting, attempt #2
  • 21:56 mobrovac@tin: Finished deploy [changeprop/deploy@dc1948f]: (no justification provided) (duration: 01m 39s)
  • 21:54 mobrovac@tin: Started deploy [changeprop/deploy@dc1948f]: (no justification provided)
  • 21:54 mobrovac@tin: Finished deploy [changeprop/deploy@56f7511]: (no justification provided) (duration: 01m 32s)
  • 21:52 mobrovac@tin: Started deploy [changeprop/deploy@56f7511]: (no justification provided)
  • 21:50 mobrovac@tin: Finished deploy [changeprop/deploy@56f7511]: (no justification provided) (duration: 00m 34s)
  • 21:50 mobrovac@tin: Started deploy [changeprop/deploy@56f7511]: (no justification provided)
  • 21:42 urandom: T160570: Rolling Cassandra restart, restbase-dev
  • 21:35 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: Revert previous deploy (duration: 01m 07s)
  • 21:34 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: Revert previous deploy
  • 21:31 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1 Revert previous deploy
  • 21:29 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1 (duration: 00m 16s)
  • 21:29 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1
  • 21:24 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: Rate limiting code and config. T161710 (duration: 01m 46s)
  • 21:23 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: Rate limiting code and config. T161710
  • 20:23 RainbowSprinkles: gerrit2001: upgraded to 2.13.8+git1-wmf.5 / 2.13.8-1-g7c438d37a2
  • 20:12 mutante: imported gerrit_2.13.8+git1-wmf.5_amd64 on apt.wikimedia.org (T158946)
  • 19:26 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.4
  • 19:13 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: mw.org -> wmf.4
  • 19:05 demon@tin: Synchronized wmf-config/InitialiseSettings.php: New wordmark for mk/srwiki (duration: 00m 57s)
  • 19:03 demon@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-sr.svg: new wordmark (duration: 00m 46s)
  • 18:59 maxsem@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/357846/ (duration: 00m 49s)
  • 18:55 maxsem@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:49 urandom: Restarting Cassandra, restbase-dev1001-a to test alternative disk access mode
  • 18:42 mutante: built gerrit_2.13.8+git1-wmf.5 on copper (T158946)
  • 18:40 maxsem@tin: Synchronized php-1.30.0-wmf.4/extensions/LoginNotify/: https://gerrit.wikimedia.org/r/#/c/357743/ (duration: 00m 44s)
  • 18:36 maxsem@tin: Synchronized php-1.30.0-wmf.4/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/357855/ (duration: 00m 45s)
  • 18:25 maxsem@tin: Synchronized multiversion/submodules.json: https://gerrit.wikimedia.org/r/#/c/352985/3 (duration: 00m 43s)
  • 18:17 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/356881/4 (duration: 00m 44s)
  • 18:09 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/354731/6 (duration: 00m 44s)
  • 17:55 arlolra: Updated Parsoid to 108eed81 (T136653, T167081)
  • 17:46 arlolra@tin: Finished deploy [parsoid/deploy@f82cb4f]: Updating Parsoid to 108eed81 (duration: 10m 12s)
  • 17:36 arlolra@tin: Started deploy [parsoid/deploy@f82cb4f]: Updating Parsoid to 108eed81
  • 16:44 nuria@tin: Finished deploy [analytics/refinery@2fbed63]: (no justification provided) (duration: 04m 08s)
  • 16:40 nuria@tin: Started deploy [analytics/refinery@2fbed63]: (no justification provided)
  • 16:33 godog: delete net.ifnames for ms-be2001 and ms-be2013 - T158429
  • 16:24 bblack: cp1074: varnish-backend-restart for mailbox lag
  • 15:22 moritzm: updating mw1262-mw1265 to HHVM 3.18.2+wmf5
  • 15:11 XioNoX: Upgrading rancid to 3 - T167288
  • 14:56 moritzm: updating mw1261 to HHVM 3.18.2+wmf5
  • 14:54 XioNoX: 2 blackhole IPs pushed to cr* routers
  • 14:02 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Do not enable Wikibase data access yet on beta wiktionary (duration: 00m 43s)
  • 13:47 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider: Fix fatal error: T167359 (duration: 00m 44s)
  • 13:41 aude@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 13:33 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/Wikidata: Fix warning in date formatting T167360 (duration: 02m 16s)
  • 13:31 XioNoX: blackhole v4 IPs removed from all cr* routers
  • 12:39 moritzm: updating mwdebug* to HHVM 3.18.2+wmf5
  • 12:17 moritzm: uploaded hhvm 3.18.2-dfsg-1+wmf5 to apt.wikimedia.org
  • 12:17 moritzm: updated hhvm 3.18.2-dfsg-1+wmf5 to apt.wikimedia.org
  • 11:41 marostegui: Drop table updates on s7 - T139342
  • 11:41 moritzm: powercycling mw1294, mgmt is unresponsive
  • 09:41 moritzm: updating mysql-connector-java on hadoop cluster
  • 09:05 elukey: upgrade zookeeper packages to 3.4.5+dfsg-2+deb8u2 on conf100[123], conf200[23] and druid100[123]
  • 08:58 godog: swift eqiad-prod: decom ms-be1005/6/7 - T166489
  • 08:50 TabbyCat: Rename user "Mlpearc" to "FlightTime" on Central Auth is now finished (T166028)
  • 08:36 godog: temporarily stop ircecho on tegmen, puppet spam
  • 08:22 TabbyCat: Starting big global rename as requested in T166028
  • 07:00 marostegui: Drop table updates on s6 - T139342
  • 05:59 _joe_: uploading new service-checker version to reprepro, T167048
  • 05:54 marostegui: Deploy alter table s2 - db1074 - T166205
  • 05:53 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1074 - T166205 (duration: 00m 43s)
  • 05:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1076 - T166205 (duration: 00m 45s)
  • 02:56 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 8 02:56:27 UTC 2017 (duration 6m 26s)
  • 02:50 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 07s)
  • 02:40 twentyafterfour: deploying hotfix for T166958
  • 02:34 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 08m 41s)
  • 01:45 mutante: manually running mediawiki maintenance job "echo_mail_batch" (on terbium as www-data, just like cron). did _NOT_ get denied by DB (T167373)
  • 01:37 maxsem@tin: Synchronized php-1.30.0-wmf.2/extensions/GeoData/includes/Searcher.php: Livehack to stop exceptions (duration: 00m 46s)
  • 00:54 mutante: cp4019 - powercycled (same as others) | lvs1007 - sits at installer - waiting for IP to be configured (T150256)
  • 00:47 mutante: cp1059 - same thing - powercycle after failed boot after reimaging script
  • 00:41 mutante: cp4011 - like cp4010 - powercycling (host down, console sat at initramfs). it had the "did not detect disk by uid" issue but boots normally after powercycle
  • 00:34 mutante: cp4020 - powercycling (host down, console sat at initramfs)
  • 00:31 mutante: cp2012 - fixed salt key issue as for cp3005 (delete key, stop/start minion, accept new key)
  • 00:25 mutante: salt-master: deleted salt-key for cp3005, stopped/started minion cp3005 - key got accepted again (was: Salt Master has rejected this minion's public key)

2017-06-07

  • 23:33 ppchelko@tin: Finished deploy [trending-edits/deploy@e0a8716]: Include reverts from bots to get rid of false positives (duration: 07m 00s)
  • 23:30 catrope@tin: Synchronized php-1.30.0-wmf.4/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.eventLogging/index.js: T167236 (duration: 00m 43s)
  • 23:28 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Relaunch related pages A/B test to 98% of users on enwiki (T167310) (duration: 00m 44s)
  • 23:26 ppchelko@tin: Started deploy [trending-edits/deploy@e0a8716]: Include reverts from bots to get rid of false positives
  • 22:24 bblack: reimaging ex-cache_maps hosts (fresh role::spare::system installs)
  • 22:18 bblack: puppet node clean+deactivate for cp3003
  • 22:15 bblack: lvs4002 - restarting pybal to remove old maps table entries
  • 22:14 bblack: lvs3002 - restarting pybal to remove old maps table entries
  • 22:13 bblack: lvs2002 - restarting pybal to remove old maps table entries
  • 22:13 bblack: lvs1002 - restarting pybal to remove old maps table entries
  • 22:12 bblack: lvs4004 - restarting pybal to remove old maps table entries
  • 22:11 bblack: lvs3004 - restarting pybal to remove old maps table entries
  • 22:09 bblack: lvs2005 - restarting pybal to remove old maps table entries
  • 22:07 bblack: lvs1005 - restarting pybal to remove old maps table entries
  • 21:32 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.2
  • 21:31 twentyafterfour: rolling back to wmf.2 due to error spike and popups no longer working refs T166829
  • 21:25 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.4
  • 21:23 twentyafterfour@tin: Synchronized php-1.30.0-wmf.4/: sync 3248a17 refs T167343 (duration: 07m 52s)
  • 20:26 twentyafterfour@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend: Deploy 66ef9cb refs T167216 (duration: 00m 46s)
  • 20:04 twentyafterfour: Preparing to deploy the MediaWiki train for group1 wikis, 1.30.0-wmf.4 refs T166829
  • 18:22 thcipriani@tin: Synchronized wmf-config: SWAT: Enable archive indexing on delete for select wikis T162302 (duration: 00m 47s)
  • 18:14 thcipriani@tin: Synchronized portals: SWAT: Updating portals stats T128546 (duration: 00m 44s)
  • 18:13 thcipriani@tin: Synchronized portals/prod/wikipedia.org/assets: SWAT: Updating portals stats T128546 (duration: 00m 44s)
  • 17:14 elukey: restart nutcracker on thumbor1002 (too many connections approaching the 1024 ulimit)
  • 15:37 akosiaris: disable puppet on puppetmaster1001, depool rhodium for tests
  • 14:51 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2007.codfw.wmnet
  • 14:48 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1007.codfw.wmnet
  • 14:11 dcausse: eu swat done
  • 12:56 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/Wikidata: Fix parser function registration T167238 (duration: 02m 20s)
  • 12:43 marostegui: Drop table updates on s2 - T139342
  • 12:40 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Enable Wikibase Client on beta wiktionary sites T158323 (duration: 00m 43s)
  • 12:40 elukey: upgrade zookeeper packages on conf2002 to 3.4.5+dfsg-2+deb8u2
  • 12:32 bblack: cp1072, cp1063 restarting varnish backend for mailbox lag
  • 12:26 aude@tin: Synchronized wmf-config/Wikibase.php: Site links for non-main namespace wiktionary pages T158323 (duration: 00m 43s)
  • 12:19 aude@tin: Synchronized wmf-config/Wikibase-labs.php: Site links for non-main namespace wiktionary pages (duration: 00m 44s)
  • 11:08 gehel: restarting cron on logstash cluster
  • 10:29 moritzm: installing tiff regression security update on trusty
  • 10:26 ema: upgrade lvs1*/lvs2* to jessie 8.8 point release T164703
  • 09:49 ema: upgrade lvs[3001-3004] to jessie 8.8 point release T164703
  • 09:28 gehel: upgrading kibana to v5.3.3 on logstash cluster - T167266
  • 09:15 ema: upgrade lvs4001-4004 to jessie 8.8 point release T164703
  • 08:58 marostegui: Deploy alter table on s2 - db1076 - T166205
  • 08:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1076 - T166205 (duration: 00m 43s)
  • 08:50 marostegui: Deploy alter table s4 - db1056 - T166206
  • 08:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 - T166206 (duration: 00m 43s)
  • 08:02 marostegui: Run redact_sanitarium on db1095 for dewiki - T153743
  • 07:22 marostegui: Deploy alter table on db1047 enwiki.revision - T162807
  • 06:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 44s)
  • 05:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1053, depool db1056 - T166206 (duration: 01m 03s)
  • 03:11 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 7 03:11:40 UTC 2017 (duration 6m 54s)
  • 03:04 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 14m 29s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 57s)
  • 00:21 RainbowSprinkles: gerrit: rolled back to 2.13.4-13-gc0c5cc4742 from 2.13.8. T152640 rearing its ugly head again (login issues)

2017-06-06

  • 23:59 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/Flow/includes/Content/BoardContentHandler.php: SWAT: Revert "Throw when unserializing invalid Flow workflow metadata JSON" T166100 T156813 (duration: 00m 43s)
  • 23:58 thcipriani@tin: Synchronized php-1.30.0-wmf.4/extensions/Flow/includes/Content/BoardContentHandler.php: SWAT: Revert "Throw when unserializing invalid Flow workflow metadata JSON" T166100 T156813 (duration: 00m 45s)
  • 23:56 RainbowSprinkles: gerrit: back from reindexing
  • 23:55 RainbowSprinkles: gerrit: force stopping for a second to reindex accounts
  • 23:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable page previews on wikispecies T166894 (duration: 00m 44s)
  • 23:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update ContentNamespaces for Commons Wiki T167077 (duration: 00m 46s)
  • 21:57 RainbowSprinkles: gerrit: restarting last time, didn't work like I wanted
  • 21:53 RainbowSprinkles: gerrit: restarting to test a config tweak
  • 21:41 mutante: contint1001 - graceful'ed Apache to deploy gerrit:351391
  • 21:19 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: unbreak mw.org pref page
  • 20:21 RainbowSprinkles: gerrit: Down for just a moment, finally doing point release on cobalt
  • 19:57 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.4
  • 19:45 demon@tin: Finished scap: testwiki to wmf.4 + prepping l10n. again (x2) (duration: 20m 25s)
  • 19:36 mutante: cobalt - removed systemd unit file (that has issues with ulimit and isn't used yet) - ran "systemctl reset-failed" which cleared the "systemctl status" which made the Icinga check recover
  • 19:24 demon@tin: Started scap: testwiki to wmf.4 + prepping l10n. again (x2)
  • 19:23 demon@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) (duration: 13m 32s)
  • 19:23 demon@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 19:10 demon@tin: Started scap: testwiki to wmf.4 + prepping l10n. again
  • 19:08 demon@tin: Synchronized README: No-op, just forcing co-master sync (duration: 01m 27s)
  • 19:01 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: testwiki back to wmf.2
  • 18:55 maxsem@tin: Finished scap: LoginNotify to testwiki - rebuild messages (duration: 38m 19s)
  • 18:16 maxsem@tin: Started scap: LoginNotify to testwiki - rebuild messages
  • 18:15 maxsem@tin: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/357317/2 (duration: 00m 44s)
  • 18:10 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357317/2 (duration: 00m 44s)
  • 18:03 demon@tin: Finished scap: testwiki to wmf.3, prepping l10n cache (duration: 31m 58s)
  • 17:31 demon@tin: Started scap: testwiki to wmf.3, prepping l10n cache
  • 16:53 moritzm: installing wireshark security updates on trusty (jessie already fixed)
  • 16:41 bblack: rebooted lvs1007 (kernel update)
  • 16:35 bblack: rebooted lvs1007 (kernel update)
  • 15:21 otto@tin: Finished deploy [eventlogging/analytics@37233cd]: (no justification provided) (duration: 00m 04s)
  • 15:21 otto@tin: Started deploy [eventlogging/analytics@37233cd]: (no justification provided)
  • 14:58 moritzm: installing libsndfile security updates on trusty
  • 14:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1094 original weight (duration: 00m 40s)
  • 13:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1094 weight (duration: 00m 40s)
  • 13:39 elukey: shutdown analytics1033 and analytics1039 to replace their BBU - T166140
  • 13:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 with low weight (duration: 00m 40s)
  • 12:58 marostegui: Shutdown db1094 for maintenance - T166518
  • 12:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 for maintenance - T166518 (duration: 00m 39s)
  • 12:51 godog: upgrade scap to 3.5.8 - T127762
  • 12:41 mobrovac@tin: Finished deploy [changeprop/deploy@e92dd66]: Bump src to bc8abf3 (duration: 01m 45s)
  • 12:40 mobrovac@tin: Started deploy [changeprop/deploy@e92dd66]: Bump src to bc8abf3
  • 12:16 bblack: cp1049 - restarted varnish backend for mailbox lag
  • 12:08 gehel: kill stuck osm replication on maps1001
  • 11:28 akosiaris@tin: Finished deploy [servermon/servermon@4a2288f]: (no justification provided) (duration: 00m 04s)
  • 11:28 akosiaris@tin: Started deploy [servermon/servermon@4a2288f]: (no justification provided)
  • 11:17 moritzm: uploaded ferm 2.3.2+wmf1 to apt.wikimedia.org/stretch-wikimedia (T166653)
  • 11:02 ladsgroup@tin: Synchronized wmf-config/Wikibase-production.php: Enabling writing in full entity id in testwikidatawiki (T165197) (duration: 00m 39s)
  • 10:22 moritzm: installing NSS security updates
  • 09:43 moritzm: installing perl security updates
  • 09:41 akosiaris: stop jobchron/jobrunner processes across jobrunner and videoscalers in codfw
  • 09:35 akosiaris: restart jobchron service across videoscalers T129148
  • 09:33 akosiaris: restart jobchron service across jobrunners T129148
  • 09:32 akosiaris@tin: Finished deploy [jobrunner/jobrunner@161c84c]: (no justification provided) (duration: 01m 17s)
  • 09:31 akosiaris@tin: Started deploy [jobrunner/jobrunner@161c84c]: (no justification provided)
  • 09:29 akosiaris: running puppet on jobrunners T129148
  • 09:25 akosiaris: running puppet on videoscalers T129148
  • 09:25 akosiaris: moving around jobrunner/jobrunner was probably not required T129148
  • 09:19 akosiaris: running puppet again on tin, after moving /srv/deployment/jobrunner/jobrunner T129148
  • 09:12 akosiaris: running puppet on mw1161 T129148
  • 09:11 akosiaris: git pull and scap deploy --init for jobrunner T129148
  • 09:08 akosiaris: running puppet on tin T129148
  • 09:04 akosiaris: disable puppet on all jobrunners T129148
  • 09:04 akosiaris: disable puppet on all jobrunners
  • 08:54 dcausse: restarting elastic2014 to reclaim free space on deleted log file
  • 08:43 jynus: stopping db2035 and preparing for reimage
  • 08:39 gehel: raise log level to WARN for TransportShardBulkAction on elasticsearch cirrus - T167091
  • 07:53 gehel: starting upgrade to elasticsearch 5.3.2 on cirrus eqiad cluster - T163708
  • 06:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments about current status of db1089 - T166935 (duration: 00m 39s)
  • 05:56 marostegui: Deploy alter table s3 on db1075 (eqiad master) - T166278
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 6 02:27:37 UTC 2017 (duration 6m 3s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 32s)

2017-06-05

  • 23:33 thcipriani: running on terbium: mwscript extensions/ORES/maintenance/CheckModelVersions.php frwiki && mwscript extensions/ORES/maintenance/PopulateDatabase.php frwiki
  • 23:32 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ORES review tool in frwiki T165044 (duration: 00m 40s)
  • 23:23 thcipriani: frwiki create tables ores_model and ores_classification T165044
  • 22:03 bblack: cp1074 - varnish-backend-restart (mailbox lag)
  • 22:02 bblack: cp1099 - varnish-backend-restart (mailbox lag)
  • 21:34 bawolff: deployed patch for T165846
  • 21:01 reedy@tin: Synchronized wmf-config/CommonSettings.php: Run Pdf Processors in firejails T164145 T164000 (duration: 00m 40s)
  • 20:16 subbu: updated parsoid to 141fc07d (T166655)
  • 20:10 ssastry@tin: Finished deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d (duration: 07m 02s)
  • 20:03 ssastry@tin: Started deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d
  • 18:52 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357169/2 (duration: 00m 39s)
  • 18:43 MaxSem: ran mwscript maintenance/namespaceDupes.php --wiki=etwiki --fix
  • 18:41 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357025/2 (duration: 00m 39s)
  • 18:36 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/355594/2 (duration: 00m 39s)
  • 18:29 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357186/2 (duration: 00m 42s)
  • 18:25 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357026/2 (duration: 00m 38s)
  • 18:11 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/356437/4 (duration: 00m 40s)
  • 16:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight - T166935 (duration: 00m 38s)
  • 16:19 jynus: stopping db2037 and preparing for reimage
  • 15:16 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 39s)
  • 15:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 38s)
  • 14:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight - T166935 (duration: 00m 39s)
  • 13:47 bblack: rebooting lvs1010 again
  • 13:27 zeljkof: eu swat finished
  • 13:16 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikimedia Chile editathon (T166788) (duration: 00m 39s)
  • 13:02 bblack: rebooting lvs1010 (post-reinstall)
  • 12:54 marostegui: Stop MySQL db1047 - T166452
  • 09:06 marostegui: Stop replication on db1070 for maintenance - T153743
  • 08:10 godog: swift eqiad-prod decom ms-be1009 / 10 / 11 - T166489
  • 07:43 marostegui: Stop labsdb1011 to take a backup - T153743
  • 07:41 jynus: stopping db2038 mysql and preparing for reimage
  • 07:15 marostegui: Deploy alter table in s2 (codfw master) this will generate lag in codfw - T166205
  • 06:20 marostegui: Deploy alter table s4 - on labsdb1001 - T166206
  • 06:15 marostegui: Deploy alter table on s3 - db1069 - T166278
  • 06:13 marostegui: Deploy alter table on s4 - db1053 - T166206
  • 06:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1053 - T166206 (duration: 00m 39s)
  • 05:58 marostegui: Stop MySQL on db1095 to take a backup - T153743
  • 05:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1089's current status (duration: 00m 39s)
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 5 02:27:53 UTC 2017 (duration 6m 2s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 08m 14s)

2017-06-04

  • 10:31 ema: mw2256 down, console stuck on 'Starti'. power cycled.
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 09m 12s)

2017-06-03

  • 05:20 marostegui: Reboot db1089 - T166933
  • 05:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - it is broken (duration: 00m 41s)
  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 3 02:30:27 UTC 2017 (duration 6m 24s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 47s)
  • 00:04 mutante: wikitech-static-iad: mv /etc/acme/cert/wikitech-static-iad-signed.csr /etc/acme/cert/wikitech-static-iad.chained.crt ; wikitech-static-ord: copy wiki logo: /srv/mediawiki/images# wget https://wikitech-static-iad.wikimedia.org/w/images/labswiki.png

2017-06-02

  • 23:53 demon@tin: Synchronized wmf-config/throttle.php: pruning some old throttle exceptions (duration: 00m 40s)
  • 23:46 mutante: wikitech-static-iad: edited acme_tiny.py to adjust URL to agreement PDF, to fix "Provided agreement URL [1] does not match current agreement URL[2]"
  • 23:45 mutante: wikitech-static-iad: create new cert for "iad" hostname, using acme-setup/acme-tiny: /usr/local/sbin# acme-setup -i "wikitech-static-iad" -s "wikitech-static-iad.wikimedia.org" ; python acme_tiny.py --account-key /etc/acme/acct/acct.key --csr /etc/acme/csr/wikitech-static-iad.pem --acme-dir /var/acme/challenge/ > /etc/acme/cert/wikitech-static-iad-signed.csr  ; had to hack acme_tiny.py
  • 23:22 mutante: wikitech-static-ord copied Lets-Encrypt intermediate certs from /usr/local/share/ca-certificates on old server
  • 23:19 mutante: wikitech-static (iad): adjust Apache config to use wikitech-static-iad
  • 23:18 mutante: wikitech-static-ord: installed package upgrades, installed vim, removing "ord" from Apache config after DNS change ..
  • 23:14 mutante: maintenance on status.wikimedia.org and wikitech-static.wikimedia.org
  • 20:08 ejegg: re-enabled AstroPay/dLocal payment methods
  • 19:36 ejegg: updated payments-wiki from 5edd788 to 7a50542
  • 19:23 ejegg: updated CiviCRM from 9c06bd2 to c70ae65
  • 18:29 mobrovac@tin: Finished deploy [restbase/deploy@4b14527]: (no justification provided) (duration: 00m 41s)
  • 18:29 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: (no justification provided)
  • 18:28 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: h
  • 17:01 bblack: starting wmf-auto-reimage on lvs1007-10
  • 16:16 RainbowSprinkles: gerrit2001: gerrit updated to 2.13.8+git1-wmf.4
  • 16:03 bblack: start wmf-auto-reimage of lvs1011, lvs1012
  • 15:01 jynus: restarting ircecho on tegmen
  • 14:32 mobrovac@tin: Finished deploy [restbase/deploy@4b14527]: Add the extract_html property to the summary end point for T165017 (duration: 06m 43s)
  • 14:25 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: Add the extract_html property to the summary end point for T165017
  • 13:28 gehel: restart elastic2003 to reload logging configuration
  • 12:11 hashar: restarting Jenkins to upgrade the logstash plugin
  • 09:49 jynus: stopping db2041 to prepare it for reimage
  • 09:18 marostegui: Deploy alter table s3 - db1015 - T166278
  • 09:12 marostegui: Deploy alter table s3 - labsdb1003 - T166278
  • 07:47 marostegui: Resume alter table on db1047 enwiki.revision - T166452
  • 07:45 moritzm: uploaded gerrit 2.13.8+git1-wmf4 to apt.wikimedia.org
  • 07:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1059 - T166206 (duration: 00m 39s)
  • 07:36 marostegui: Deploy alter table on s4 - labsdb1009 - T166206
  • 07:02 akosiaris: starting fleet wide PCC for gerrit change 356030. Should take a while to complete
  • 05:25 jynus@tin: Synchronized wmf-config/db-eqiad.php: Emergency pool of db1049 (duration: 00m 48s)
  • 04:42 elukey: removed some old scap revs for the Analytics refinery on stat1002 to free space (git fat jars replicating after each deployment, known issue)
  • 02:46 bd808: Loadavg on mw1198 very high (44+) and nginx/hhvm checks flapping

2017-06-01

  • 23:33 twentyafterfour: phabricator upgrade complete.
  • 23:29 twentyafterfour: Performing phabricator update, expect momentary downtime.
  • 23:25 twentyafterfour: Preparing phabricator update to tag release/2017-06-01/1 [ https://phabricator.wikimedia.org/project/view/2802/ ]
  • 23:20 ebernhardson@tin: Synchronized wmf-config/CirrusSearch-common.php: T163463: apply sister search restrictions requested by enwiki (duration: 00m 39s)
  • 23:18 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T163463: apply sister search restrictions requested by enwiki (duration: 00m 40s)
  • 21:59 RainbowSprinkles: gerrit2001: Upgraded to 2.13.8, seems to be running fine this time.
  • 20:37 mobrovac@tin: Finished deploy [citoid/deploy@ba0db9c]: Update spec to minimise alert noise - T163986 (duration: 05m 20s)
  • 20:32 mobrovac@tin: Started deploy [citoid/deploy@ba0db9c]: Update spec to minimise alert noise - T163986
  • 20:23 bsitzmann@tin: Finished deploy [mobileapps/deploy@2a8e648]: Update mobileapps to c4dc72d (duration: 05m 18s)
  • 20:18 bsitzmann@tin: Started deploy [mobileapps/deploy@2a8e648]: Update mobileapps to c4dc72d
  • 19:30 mepps: updated SmashPig from 4f84d88 to d4458fa
  • 19:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.2
  • 19:23 gehel@tin: Finished deploy [wdqs/wdqs@3936e36]: (no justification provided) (duration: 01m 20s)
  • 19:22 gehel@tin: Started deploy [wdqs/wdqs@3936e36]: (no justification provided)
  • 19:08 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: Revert "Add RejectParserCacheValue handler for mw-parser-output invalidation" T166345 (duration: 00m 43s)
  • 18:21 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
  • 18:20 gehel: wdqs1002 back in LVS - thermal paste added - T166524
  • 17:42 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1002.eqiad.wmnet
  • 17:41 gehel: shutting down wdqs1002 for maintenance - T166524
  • 17:02 elukey: stop mysql, eventlogging_sync and shutdown db1047 (analytics-store) for maintenance - T159266
  • 16:22 jynus: retrying reimage of db2044
  • 15:03 elukey: restart kafka100[23] for jvm upgrades
  • 14:21 mforns@tin: Finished deploy [analytics/refinery@7540403]: (no justification provided) (duration: 02m 50s)
  • 14:18 mforns@tin: Started deploy [analytics/refinery@7540403]: (no justification provided)
  • 14:00 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2048 after maintenance (duration: 00m 44s)
  • 13:18 marostegui: Deploy alter table s3 revision on labsdb1001 - T166278
  • 13:15 marostegui: Deploy alter table s3 revision on labsdb1011 - T166278
  • 13:11 gilles: restored original configuration on mwdebug1001
  • 11:33 godog: test upgrade of swift 2.10 on ms-fe2005 - T162609
  • 10:24 gilles: Point nutcracker to localhost on mwdebug1001
  • 10:06 godog: run puppet to blacklist acpi_power_meter across the fleet and rmmod the module
  • 09:51 _joe_: refreshing facts on the puppet compiler
  • 08:15 godog: upgrade grafana to 4.3.2 on labmon1001 / krypton
  • 07:49 gilles: editing wikiversions.php manually on mwdebug1001 to point enwiki to wmf.2
  • 06:08 marostegui: Deploy alter table on s3, labsdb1010 - T166278
  • 06:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1035 - T166278 (duration: 00m 57s)
  • 06:04 marostegui: Deploy alter table on s3, db1044 - T166278
  • 06:02 marostegui: Deploy alter table on s3, dbstore1001 - T166278
  • 05:58 elukey: powercycle cp3032 - T166758
  • 05:43 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet
  • 02:52 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 1 02:52:25 UTC 2017 (duration 6m 42s)
  • 02:45 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 02s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 29s)

2017-05-31

  • 23:59 dereckson@tin: Synchronized wmf-config/throttle.php: Add throttle rules for 2017-06-01 Fortaleza event (T166619) (duration: 00m 41s)
  • 23:03 ejegg: disabled d*local payment methods
  • 22:37 ejegg: updated payments-wiki from 4786e7c to 5edd788
  • 22:14 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias back to 1.30.0-wmf.1
  • 21:41 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: touch InitialiseSettings.php (duration: 00m 39s)
  • 21:37 ejegg: reverted payments-wiki to 4786e7c
  • 21:32 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias to 1.30.0-wmf.2
  • 21:29 AaronSchulz: Restored mwdebug1001 to wmf1 with normal nutcracker/memcached and puppet running
  • 21:23 ejegg: updated payments-wiki from 4786e7c to d467d3b
  • 20:17 RainbowSprinkles: gerrit: bringing offline for a few minutes for point release (2.13.4 -> 2.13.8, T158946)
  • 20:15 mobrovac@tin: Finished deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308 (duration: 02m 32s)
  • 20:13 mobrovac@tin: Started deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308
  • 19:31 demon@tin: Synchronized scap/plugins/clean.py: cleanup r us (duration: 00m 42s)
  • 19:13 gehel@tin: Finished deploy [wdqs/wdqs@af495a2]: (no justification provided) (duration: 01m 29s)
  • 19:11 gehel@tin: Started deploy [wdqs/wdqs@af495a2]: (no justification provided)
  • 17:30 godog: swift eqiad-prod decom ms-be100[128] - T166489
  • 16:53 ema: restart varnish-backend on cp1074
  • 16:53 ema: merge cache_maps into cache_upload: finished moving LVS IPs T164608
  • 16:33 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Index article placeholders up to Q16956 on cywiki (T162244) (duration: 00m 42s)
  • 15:58 hoo: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
  • 15:31 ema: merge cache_maps into cache_upload: move LVS IPs T164608
  • 14:34 XioNoX: init7 fixed the issue, ping works from the init7 interface, reenabling the BGP session - T166663
  • 14:02 moritzm: upgrading install2002 to reprepro 5.1.1
  • 13:26 hoo@tin: Synchronized wmf-config/Wikibase-production.php: WikibaseClient: Don't persist Statement usages (T151717) (duration: 00m 41s)
  • 13:21 ema: cache_eqiad: upgrade to jessie 8.8 point release T164703
  • 13:20 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Log "api-readonly" errors (T164191, T123867) (duration: 00m 43s)
  • 13:15 ema: cache_codfw: upgrade to jessie 8.8 point release T164703
  • 13:10 ema: cache_esams: upgrade to jessie 8.8 point release T164703
  • 13:08 marostegui: Stop MySQL on db1048 and shutdown the host for maintenance - T160731
  • 13:08 moritzm: uploaded zookeeper 3.4.5+dfsg-2+deb8u2 to apt.wikimedia.org
  • 12:36 ema: cache_ulsfo: upgrade to jessie 8.8 point release T164703
  • 12:35 marostegui: Deploy alter table on s3 revision table - db1035 - https://phabricator.wikimedia.org/T166278
  • 12:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077, depool db1035 - T166278 (duration: 00m 41s)
  • 12:22 ema: cp1008: upgrade to jessie 8.8 point release T164703
  • 12:11 XioNoX: Disable v6 BGP session with Init7 in knams because of routing loop on their network
  • 12:04 volans: merged stringify_facts=false for production hosts T166372
  • 10:59 jynus: preparing for backup and reimage to jessie of db2044
  • 10:35 moritzm: updated reprepro on install1002 to 5.1.1 from backports (for support of dbgsym and buildinfo files)
  • 10:29 godog: remove salt-minion salt-common from stretch-wikimedia - T166646
  • 09:30 marostegui: Deploy alter table on s3 revision table - db1078 - https://phabricator.wikimedia.org/T166278
  • 09:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1078, depool db1077 - T166278 (duration: 00m 42s)
  • 09:24 _joe_: etcd in eqiad in read-write mode
  • 09:22 _joe_: started etcd replica eqiad => codfw
  • 09:15 _joe_: etcd replica codfw => eqiad now stopped
  • 09:09 _joe_: etcd in read-only mode for switchover to eqiad
  • 08:27 godog: complete linux 4.9 upgrade on Debian ms-be2* machines
  • 08:24 moritzm: installing imagemagick security updates on trusty (jessie already fixed)
  • 07:47 elukey: restart kafka on kafka10[14,22,20] for jvm upgrades
  • 06:45 moritzm: installing sudo security updates
  • 06:45 marostegui: Deploy alter table s3 revision table - dbstore1002 - T166278
  • 06:31 marostegui: Deploy alter table on s4 - db1059 - T166206
  • 06:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1081, depool db1059 - T166206 (duration: 00m 41s)
  • 06:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1078 - T166278 (duration: 00m 43s)
  • 06:04 marostegui: Deploy alter table on s3 revision table - db1078 - T166278
  • 06:04 marostegui: Deploy alter table on s3 revision table - db1095 - T166278
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 45s)

2017-05-30

  • 23:15 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Page images can come outside the lead for all projects except Wikipedia (duration: 00m 41s)
  • 23:09 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Add Wikipedia wordmark in Serbian/Macedonian (duration: 00m 45s)
  • 23:08 demon@tin: Synchronized static/images/mobile/copyright/: Compressed + new images (duration: 00m 42s)
  • 22:43 Reedy: created securepoll_elections.el_owner on testwiki T166568
  • 22:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Make Flow default in all namespaces on cawikiquote (T165497) (duration: 00m 43s)
  • 22:20 mutante: Welcome new root shell user herron (T166587)
  • 22:10 RoanKattouw: Running populateContentModel.php on all talk namespaces for all tables on cawikiquote
  • 21:28 RoanKattouw: Running Flow/convertNamespaceFromWikitext.php on all discussion namespaces on cawikiquote (T165497)
  • 21:21 mobrovac@tin: Finished deploy [zotero/translators@f051fe7]: Translators update for T95128 and T166292 (duration: 00m 05s)
  • 21:21 mobrovac@tin: Started deploy [zotero/translators@f051fe7]: Translators update for T95128 and T166292
  • 20:36 AaronSchulz: Set all wikis to wmf.2 via wikiversions.php on mwdebug1001 only; manual nutcracker running in a screen to use local memcached for debugging
  • 20:18 mutante: LDAP - added uid=herron to groups "ops" and "wmf" for ops onboarding of Keith (T166587)
  • 20:09 gilles: Restarting nutcracker on mwdebug1001
  • 20:06 gilles: Overwriting nutcracker.yml on mwdebug1001 to point memcache cluster only to memcached on localhost
  • 20:05 gilles: Manually installed memcached on mwdebug1001, running on default port 11211
  • 20:04 gilles: Disabled puppet on mwdebug1001
  • 18:37 urandom: T160570: Upgrading dev env to Cassandra 3.11 (snapshot)
  • 17:55 thcipriani: branching 1.30.0-wmf.3 T165957
  • 17:28 arlolra: Updated Parsoid to d07dfe1a (T161151, T136653)
  • 17:17 arlolra@tin: Finished deploy [parsoid/deploy@744f719]: Updating Parsoid to d07dfe1a (duration: 08m 41s)
  • 17:09 arlolra@tin: Started deploy [parsoid/deploy@744f719]: Updating Parsoid to d07dfe1a
  • 16:40 moritzm: installing shadow regression update
  • 15:33 marostegui: Deploy alter table on s3.revision on labsdb1009 - T166278
  • 15:14 moritzm: installing bash security updates on trusty (jessie already fixed)
  • 15:03 moritzm: installing mysql-connector-java security update on analytics1031
  • 14:53 _joe_: failing citoid over to codfw, T165105
  • 14:48 moritzm: updating mw2140-mw2147, mw2251-mw2253 to HHVM 3.18
  • 14:27 _joe_: restarting squid on aluminium.
  • 13:58 moritzm: updating mw2240-mw2242, mw2254-mw2260 to HHVM 3.18
  • 13:47 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgPageImagesAPIDefaultLicense for wikidata (duration: 00m 41s)
  • 13:44 elukey: restart kafka on kafka1013 for jvm upgrades
  • 13:35 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wikibase echo notifications on Wikipedia, except enwiki, dewiki, frwiki T142102 (duration: 00m 42s)
  • 13:21 elukey: restart kafka on kafka1001 for jvm upgrades
  • 13:14 ema: upgrade prometheus-node-exporter to 0.14.0~git20170523-0 on ubuntu systems
  • 12:43 elukey: restart kafka on kafka200[123] for jvm upgrades (main-codfw, eventbus)
  • 12:10 moritzm: installing jbig2dec security updates
  • 12:07 elukey: restart kafka on kafka1012 for jvm upgrades
  • 12:01 moritzm: installing jbig2dec security updates
  • 11:48 marostegui: Rename update table on enwiki on db1089 host - T139342
  • 11:31 moritzm: installing fop security updates
  • 11:14 godog: upgrade grafana to 4.3.1 on krypton
  • 10:44 gilles: run refreshFileHeaders for group 0 wikis on Terbium
  • 10:32 akosiaris: enable calico IPv6 BGP peering for cr1-eqiad
  • 10:18 jynus: stopping and backing up db2048 in preparation for reimage
  • 09:50 ema: upgrade prometheus-node-exporter to 0.14.0~git20170523-0 on debian systems
  • 09:43 jynus: restarting db2055 for mariadb and kernel upgrade
  • 08:23 elukey: restart jmxtrans on all the kafka brokers (analytics+main-codfw/eqiad) for jvm upgrades
  • 08:17 elukey: restart kafka on kafka1018 for jvm upgrades
  • 07:38 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
  • 07:38 gehel: wdqs1002 back in LVS - T166524
  • 07:09 marostegui: Deploy alter table on enwiki.revision on db1047 - T166452
  • 06:45 marostegui: Deploy alter table on s3 db1038 - T166278
  • 06:41 marostegui: Deploy alter table on s3 dbstore1002 - https://phabricator.wikimedia.org/T166278
  • 06:35 marostegui: Deploy alter table s4 - db1081 - https://phabricator.wikimedia.org/T166206
  • 06:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1084, depool db1081 - T166206 (duration: 00m 59s)
  • 06:23 marostegui: Deploy alter table on s3 dbstore2001 - T166278
  • 02:49 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 30 02:49:20 UTC 2017 (duration 6m 44s)
  • 02:42 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 54s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 22s)

2017-05-29

  • 20:04 mobrovac@tin: Started restart [zotero/translation-server@50f216a]: Memory at 50%
  • 19:56 gehel: removing wdqs1002 from LVS pending investigation of T166524
  • 19:55 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1002.eqiad.wmnet
  • 18:57 gehel: restarting wdqs-updater on wdqs1002
  • 17:40 volans: re-enabled puppet on tegmen and re-enabled raid_handler T163998
  • 17:29 volans: disabled puppet on tegmen and disabled raid_handler temporarily T163998
  • 15:02 gehel: restarting wdqs-updater on wdqs1002
  • 14:33 moritzm: rebooting multatuli for systemd modules-load.d debugging
  • 14:24 godog: upgrade prometheus-hhvm-exporter to 0.3-1 in codfw/eqiad with less verbose logging - T158286
  • 14:15 gehel: reset remote for elasticsearch/plugins deployment - T163708
  • 14:14 marostegui: Stop MySQL labsdb1009 to take a backup - T153743
  • 14:04 gehel: starting upgrade to elasticsearch 5.3.2 on cirrus codfw cluster - T163708
  • 14:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2036 - T166278 (duration: 00m 41s)
  • 14:01 marostegui: Deploy alter table s3 on codfw master db2018 - T166278
  • 13:42 moritzm: updating gdb on mw* servers
  • 13:10 marostegui: Stop replication on db1070 to flush tables for export - T153743
  • 13:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T153743 (duration: 00m 41s)
  • 13:02 akosiaris: enable puppet across eqiad/esams after puppetmaster upgrade.
  • 12:52 akosiaris: disable puppet across eqiad/esams for puppetmaster upgrade. This should avoid any irc spam about failed puppet agent runs
  • 12:52 akosiaris: enable puppet across codfw/ulsfo after puppetmaster upgrade
  • 12:41 akosiaris: disable puppet across codfw/ulsfo for puppetmaster upgrade. This should avoid any irc spam about failed puppet agent runs
  • 12:36 moritzm: installing imagemagick security updates on jessie
  • 12:31 akosiaris: update kubernetes policy-options on cr{1,2}-{eqiad,codfw}. T165732
  • 10:39 moritzm: installing fop security updates
  • 10:18 ema: upgrade nginx to 1.11.10-1+wmf1 on hassium and hassaleh
  • 09:53 moritzm: upgrade remaining mw* hosts already running HHVM 3.18 to 3.18.2+dfsg-1+wmf4
  • 09:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 (duration: 00m 41s)
  • 09:01 marostegui: Drop gather tables from: testwiki, test2wiki, enwikivoyage, hewiki, enwiki - T166097
  • 08:02 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove db1023 - T166486 (duration: 00m 41s)
  • 08:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove db1023 - T166486 (duration: 00m 42s)
  • 07:38 marostegui: Stop MySQL on db1095 to take a backup - this will make labsdb1009,10 and 11 break replication while it is down - T153743
  • 07:01 _joe_: reenabling scap on mw2140, T166328
  • 06:45 _joe_: restarting changeprop on scb1002, using 15 gigs of RAM
  • 06:42 marostegui: Deploy alter table s3 - dbstore2002 - T166278
  • 06:41 marostegui: Deploy alter table s4 - dbstore1002 - T166206
  • 06:33 _joe_: trying to restart pdfrender on scb1002
  • 06:32 marostegui: Deploy alter table s3 - db2036 - T166278
  • 06:32 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2043, depool db2036 - T166278 (duration: 01m 44s)
  • 06:29 _joe_: powercycling mw1294
  • 06:11 marostegui: Deploy alter table on s4 db1084 - T166206
  • 06:10 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091, depool db1084 - T166206 (duration: 02m 45s)
  • 06:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 - T166206 (duration: 03m 01s)
  • 05:54 marostegui: Restart MySQL on db1047 - T166452
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 20s)

2017-05-28

  • 13:19 jynus: restart db1069:3313 mysql instance, stuck on replication
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 46s)

2017-05-27

  • 02:51 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 27 02:51:13 UTC 2017 (duration 6m 49s)
  • 02:44 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 05s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 41s)

2017-05-26

  • 14:29 marostegui: Stop pt-table-checksum on s1 - T162807
  • 14:04 marostegui: Deploy alter table on s3 revision table db2043 - T166278
  • 14:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2050, depool db2043 - T166278 (duration: 00m 41s)
  • 13:57 _joe_: consuming the backlog of htmlCacheUpdate jobs for enwiktionary
  • 13:19 gehel: restart wdqs-updater on all wdqs nodes - T166378
  • 12:55 marostegui: Deploy alter table s4 on db1097 - T166206
  • 12:44 elukey: Restart Hadoop daemons on analytics100[12] (Hadoop master nodes) for jvm upgrades
  • 12:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T166206 (duration: 00m 41s)
  • 10:56 gehel: restart wdqs-updater on all wdqs nodes - T166378
  • 09:30 volans: slowly testing if puppet stringify_facts=false is a noop across the fleet T166372
  • 08:45 volans: killed daemonized puppet on tegmen, lvs1006 T166203
  • 06:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1097 - T166206 (duration: 00m 40s)
  • 06:10 marostegui: Deploy alter table on s3 - db2050 - T166278
  • 06:09 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2057, depool db2050 - T166278 (duration: 00m 56s)
  • 06:05 marostegui: Resume pt-table-checksum on s1 - T162807
  • 02:58 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 26 02:58:48 UTC 2017 (duration 6m 37s)
  • 02:52 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 59s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 31s)
  • 01:33 bsitzmann@tin: Finished deploy [mobileapps/deploy@a8d0c91]: Update mobileapps to db6493c (duration: 03m 45s)
  • 01:29 bsitzmann@tin: Started deploy [mobileapps/deploy@a8d0c91]: Update mobileapps to db6493c
  • 00:16 thcipriani@tin: Finished scap: SWAT: Fix version of DonationInterface deployed to donatewiki T166302 (duration: 19m 44s)

2017-05-25

  • 23:56 thcipriani@tin: Started scap: SWAT: Fix version of DonationInterface deployed to donatewiki T166302
  • 23:44 thcipriani@tin: Synchronized php-1.30.0-wmf.2/resources/src/jquery/jquery.makeCollapsible.js: SWAT: jquery.makeCollapsible: Restore considering empty <a> as part of toggle T166298 (duration: 00m 42s)
  • 23:20 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Revert "Add Code of Conduct footer links to wikitech and mw.o"" PART II (duration: 00m 41s)
  • 23:19 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Add Code of Conduct footer links to wikitech and mw.o"" PART I (duration: 00m 43s)
  • 22:58 thcipriani: mw1170 wikipedias back to 1.30.0-wmf.1
  • 22:26 thcipriani: mw1170 running wmf.2 for all wikis for troubleshooting T166345
  • 22:24 thcipriani: mw1161 wikipedias back to running wmf.1
  • 22:20 thcipriani: mw1161 running wmf.2 for all wikis for troubleshooting T166345
  • 22:17 papaul: ores200[1-9] - signing puppet certs, salt-key, initial run
  • 21:43 papaul: OS install on ores200[1-9]
  • 21:31 arlolra: Updated Parsoid to 5b52d07b (T166068)
  • 21:25 arlolra@tin: Finished deploy [parsoid/deploy@4a2c3f4]: Updating Parsoid to 5b52d07b (duration: 07m 43s)
  • 21:18 arlolra@tin: Started deploy [parsoid/deploy@4a2c3f4]: Updating Parsoid to 5b52d07b
  • 20:30 urandom: T164865: RESTBase dev, disable revision range deletes
  • 20:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias back to 1.30.0-wmf.1
  • 19:48 chasemp: restart redises on rdb2003
  • 19:44 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: revert SWAT: Add Code of Conduct footer links to wikitech and mw.o Part II (duration: 00m 38s)
  • 19:43 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: revert SWAT: Add Code of Conduct footer links to wikitech and mw.o Part I (duration: 00m 39s)
  • 19:23 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Add Code of Conduct footer links to wikitech and mw.o Part II (duration: 00m 39s)
  • 19:22 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Code of Conduct footer links to wikitech and mw.o Part I (duration: 00m 39s)
  • 19:09 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.2
  • 18:47 volans: completed upgrade of facter across the fleet T166203 (apart from a few hosts that are down)
  • 18:39 volans: forcing BBU learn on db1016
  • 18:34 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove special Math extension settings for hewiki Remove UseMathJax from CommonSettings.php T165475 (duration: 00m 43s)
  • 18:27 urandom: T164865: RESTBase dev, re-enable render range deletes
  • 18:12 thcipriani: mwscript namespaceDupes.php hewiki --fix
  • 18:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add namespace aliases for Hebrew Wikipedia T164858 (duration: 00m 47s)
  • 17:51 volans@sarin: conftool action : set/pooled=inactive; selector: name=mw2140.codfw.wmnet
  • 17:31 jynus@neodymium: conftool action : set/pooled=no; selector: name=mw2140.codfw.wmnet
  • 17:30 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2055 after maintenance, 2nd try (duration: 02m 42s)
  • 17:25 jynus@neodymium: conftool action : set/pooled=inactive; selector: name=mw2140.codfw.wmnet
  • 17:14 bsitzmann@tin: Finished deploy [mobileapps/deploy@614d752]: Update mobileapps to 946fe1f (duration: 04m 04s)
  • 17:12 jynus: powercycling mw2140
  • 17:10 bsitzmann@tin: Started deploy [mobileapps/deploy@614d752]: Update mobileapps to 946fe1f
  • 17:07 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2055 after maintenance (duration: 02m 43s)
  • 16:27 urandom: T164865: RESTBase dev, re-enable revision range deletes
  • 15:43 godog: delete thumbnails with > 2000px for wikivoyage / wikiversity / wikisource / wikiquote - T162796
  • 15:28 jynus: restarting and upgrading db2055 for kernel downgrade
  • 14:40 bblack: restart cp1074 backend (mailbox)
  • 14:08 godog: shut ms-be1021 for BBU replacement - T163777
  • 13:39 jynus: restarting and upgrading db2055 for maintenance
  • 13:29 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2055 for maintenance (duration: 00m 41s)
  • 13:22 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077 back to high load after maintenance (duration: 00m 41s)
  • 13:04 elukey: restart cassandra-a on aqs1004 to test https://gerrit.wikimedia.org/r/354107
  • 12:41 akosiaris: cordon kubernetes100{2,3,4} for testing calico-node on kubernetes1001
  • 10:01 elukey: restart HDFS datanode daemons on all the hadoop worker nodes for jvm upgrades
  • 09:39 elukey: reimage analytics1030 to Debian Jessie - T165529
  • 09:35 elukey: restart Yarn nodemanager daemons on all the hadoop worker nodes for jvm upgrades
  • 09:28 godog: ban commons object on request in ulsfo
  • 09:07 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077 after maintenance with low weight (duration: 00m 41s)
  • 08:25 jynus: stopping and restarting db1077
  • 08:03 volans: resuming slow upgrade of facter across the fleet, checking it is a noop T166203
  • 07:58 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1077 for maintenance and upgrade (duration: 00m 41s)
  • 07:40 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2057 - T166278 (duration: 00m 41s)
  • 07:28 godog: roll-restart jessie ms-be2* for linux 4.9 update - T162029
  • 06:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T166206 (duration: 00m 55s)
  • 05:58 marostegui: Start pt-table-checksum on s1 - T162807
  • 02:51 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 25 02:51:53 UTC 2017 (duration 6m 38s)
  • 02:45 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 18s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 09m 46s)
  • 00:18 aaron@tin: Synchronized wmf-config/ProductionServices.php: Enable HTTPs for Swift usage (duration: 00m 41s)
  • 00:15 aaron@tin: Synchronized wmf-config/filebackend.php: Enable HTTPs for Swift usage (duration: 00m 41s)
  • 00:10 twentyafterfour: phabricator upgrade complete, service is online
  • 00:06 twentyafterfour: upgrading phabricator, expect momentary downtime

2017-05-24

  • 23:53 ejegg: updated payments-wiki from 5fa4a70 to 4786e7c
  • 23:49 XenoRyet: updated civicrm from 9b7a74c to 9c06bd2
  • 23:28 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Allow page images outside the lead on Wikivoyage wikis (T166251) (duration: 00m 41s)
  • 23:19 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable related pages for everyone (T155079) (duration: 00m 42s)
  • 23:18 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable print styles in Minerva (T163287) (duration: 00m 42s)
  • 23:10 catrope@tin: Synchronized multiversion/MWMultiVersion.php: Allow absolute script path for getMediaWikiCli() (duration: 00m 44s)
  • 22:33 krinkle@tin: Synchronized php-1.30.0-wmf.2/extensions/wikihiero: Fix styles queue warning - T92459 (duration: 00m 42s)
  • 22:02 mutante: terbium: dbtree: git stash and git pull origin to fix unclean repo state, deploy fix to syntax error
  • 21:53 urandom: T164865: Disabling range delete-based render culling, dev env
  • 21:34 Dereckson: Run fixProofreadIndexPagesContentModel.php new version (with Gerrit:355534 fix) to every wikisource
  • 21:10 Dereckson: Fixed wikisource Index: content model for ta.wikisource, en.wikisource and not wikisource databases (frrwiki + test2 + sourceswiki)
  • 21:10 demon@tin: Synchronized php-1.30.0-wmf.2/extensions/ProofreadPage/maintenance/fixProofreadIndexPagesContentModel.php: Now with proper batch support (duration: 00m 41s)
  • 20:38 demon@tin: Synchronized scap/plugins/clean.py: cleanups (duration: 00m 41s)
  • 20:29 Dereckson: Run fixProofreadIndexPagesContentModel on vec.wikisource (requested by Tpt), aborted after 50k (as that's greater than the expected number of rows)
  • 20:08 ejegg: reverted payments-wiki to 5fa4a70
  • 20:04 ejegg: updated payments-wiki from 5fa4a70 to 4786e7c
  • 19:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.2
  • 19:20 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 01s)
  • 19:20 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:14 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:14 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 01s)
  • 19:12 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:12 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 demon@tin: Synchronized wmf-config/: Dropping old ExtensionMessages (duration: 00m 42s)
  • 19:11 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:11 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:11 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:11 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:07 demon@tin: Synchronized wmf-config/: Dropping old contribution-tracking-setup.php -- finally (duration: 00m 42s)
  • 19:03 demon@tin: Synchronized wmf-config/CommonSettings.php: Dropping old ContribTracking config (duration: 00m 41s)
  • 19:02 demon@tin: Synchronized .gitignore: Completeness (duration: 00m 41s)
  • 19:00 thcipriani@tin: Finished scap: SWAT: Use file width/height instead of metadata for getContentHeaders Batch/pipeline backend operations in refreshFileHeaders T150741 (duration: 03m 12s)
  • 18:57 thcipriani@tin: Started scap: SWAT: Use file width/height instead of metadata for getContentHeaders Batch/pipeline backend operations in refreshFileHeaders T150741
  • 18:56 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/TimedMediaHandler/handlers: SWAT: Make getContentHeaders rely on fallback width/height T150741 (duration: 00m 41s)
  • 18:55 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/PagedTiffHandler/PagedTiffHandler_body.php: SWAT: Update getContentHeaders signature T150741 (duration: 00m 42s)
  • 18:54 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/PdfHandler/PdfHandler_body.php: SWAT: Update getContentHeaders signature T150741 (duration: 00m 40s)
  • 18:31 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: mobileFrontend: Move first paragraph before infobox T150325 (duration: 00m 41s)
  • 18:18 thcipriani: running mwscript namespaceDupes.php trwiki --fix
  • 18:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create a new namespace "Vikiproje" for trwiki T166102 (duration: 00m 41s)
  • 18:10 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgUploadNavigationUrl on srwiki T165901 (duration: 00m 42s)
  • 18:00 urandom: T164865: Upgrading Cassandra from 3.7.3-instaclustr to 3.10
  • 17:45 ottomata: rolling druid back to 0.9.0
  • 16:58 moritzm: installing ghostscript regression update on trusty (jessie security update was not affected)
  • 16:56 jynus: restarting and upgrading db2047
  • 16:54 volans: pausing the slow upgrade of facter across the fleet, resuming tomorrow T166203
  • 16:37 marostegui: Stop pt-table-checksum on s1 - T162807
  • 16:26 bblack: restarting varnish backend on cp1099 (mailbox lag)
  • 15:45 godog: test-upgrade grafana 4.3.1 on labmon1001
  • 15:35 krinkle@tin: Synchronized php-1.30.0-wmf.2/resources/Resources.php: Restore mediawiki.page.watch.ajax dependency - Iebfda85c7 (duration: 00m 42s)
  • 15:00 godog: deploy thumbor 0.1.39 for memcache-based throttling - T151065
  • 14:54 moritzm: uploaded gerrit 2.13.8+wmf2 to apt.wikimedia.org
  • 14:04 moritzm: installing jasper security updates on trusty (jessie already fixed)
  • 13:59 marostegui: Start running pt-table-checksum on s1 (will not run over night for now) - T162807
  • 13:59 paravoid: cr2-esams: enabling netflows experimentally
  • 13:54 elukey: upgrade Druid daemons on druid100[123] to 0.10 - T164008
  • 13:28 volans: slowly upgrading facter across the fleet, checking it is a noop T166203
  • 13:14 godog: upload prometheus-hhvm-exporter 0.3-1 to jessie-wikimedia - T158286
  • 12:20 moritzm: upgrade application servers using HHVM 3.18 to the latest 3.18.2+wmf4 build
  • 12:09 moritzm: updating puppet on puppetmaster2002
  • 12:08 godog: bounce pybal on lvs1003 - T134893
  • 11:52 XioNoX: progressively adding "remove-private" to ix4/6 and transit4/6 bgp groups on cr2-esams T83037
  • 11:36 moritzm: uploaded puppet_3.8.5-2~bpo8+2 to apt.wikimedia.org
  • 10:50 akosiaris: repool esams T133387
  • 10:46 volans: stopped temporarily ircecho to avoid alert spam
  • 10:43 ema: upgrade prometheus-node-exporter on lvs hosts to 0.14.0~git20170523-0 T160156
  • 10:43 ema: upgrade prometheus-node-exporter on cache hosts to 0.14.0~git20170523-0 T160156
  • 10:05 volans: forcing puppet run on failed hosts only in esams T133387
  • 09:59 XioNoX: asw-esams back up (T133387)
  • 09:53 XioNoX: rebooting asw-esams for upgrade (T133387)
  • 09:49 ema: upgrade prometheus-node-exporter on cache hosts to 0.14.0~git20170523-0 T147569
  • 09:26 godog: upload prometheus-node-exporter 0.14.0~git20170523-0 to jessie-wikimedia - T160156
  • 09:15 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: cp3036.esams.wmnet
  • 09:10 akosiaris: drain esams for network tests for T133387
  • 08:52 marostegui: Deploy alter table on codfw master (db2019 and let it replicate) on s4 - T166206
  • 08:51 joal@tin: Finished deploy [analytics/refinery@9377d9c]: Deploying to fix yesterday's deploy bugs (duration: 02m 44s)
  • 08:49 akosiaris: depool cp3036 for T133387 testing
  • 08:49 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: cp3036.esams.wmnet
  • 08:48 joal@tin: Started deploy [analytics/refinery@9377d9c]: Deploying to fix yesterday's deploy bugs
  • 07:29 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 - T164530 (duration: 00m 41s)
  • 07:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1086, depool db1094 - T164530 (duration: 00m 41s)
  • 07:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079, depool db1086 - T164530 (duration: 00m 42s)
  • 06:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T164530 (duration: 00m 54s)
  • 06:34 marostegui: Deploy alter table on s2.fawiki directly on codfw master (db2029) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:04 marostegui: Run pt-table-checksum on s7.frwiktionary - https://phabricator.wikimedia.org/T163190
  • 06:02 marostegui: Deploy alter table on s2 db1047 - https://phabricator.wikimedia.org/T162611
  • 03:04 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 24 03:04:03 UTC 2017 (duration 6m 45s)
  • 02:57 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 13m 38s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 30s)

2017-05-23

  • 21:00 mepps: deployed payment wiki 0c06f8e
  • 20:24 bblack: enable BBR for all caches - T147569
  • 20:20 bblack: enable BBR for all caches @ codfw - T147569
  • 20:10 bblack: enable BBR for all caches @ ulsfo - T147569
  • 20:06 bblack: disabling puppet on all caches for BBR deploy control
  • 19:52 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.30.0-wmf.2
  • 19:34 thcipriani@tin: Finished scap: testwiki to php-1.30.0-wmf.2 and rebuild l10n cache (duration: 27m 52s)
  • 19:07 bblack: resetting cp1074 queues again: "fq flow_limit 200 buckets 10240"
  • 19:06 thcipriani@tin: Started scap: testwiki to php-1.30.0-wmf.2 and rebuild l10n cache
  • 18:43 bblack: resetting cp1074 queues again: "fq flow_limit 200 buckets 4096"
  • 17:40 bblack: fq on cp1074 reset to flow_limit 200 (resets counters)
  • 17:24 ladsgroup@tin: Finished deploy [ores/deploy@4874809]: Trying again with deploying ores (duration: 21m 30s)
  • 17:09 thcipriani: starting branch cut for 1.30.0-wmf.2 T163512
  • 17:03 ladsgroup@tin: Started deploy [ores/deploy@4874809]: Trying again with deploying ores
  • 16:50 volans: upgrading facter on mw[2250-2259] as a test batch
  • 16:49 bblack: BBR: enabling bbr on cp1074 - T147569
  • 16:43 bblack: BBR: enabling mq+fq on cp1074 - T147569
  • 16:26 bblack: puppet re-enabled on caches
  • 16:24 demon@tin: Synchronized README: testing (duration: 00m 38s)
  • 16:17 bblack: disabled puppet on all cp* for RPS-related deployments (just in case!)
  • 16:16 bblack: disabled puppet on all lvs* for RPS-related deployments
  • 16:15 ema: cp1074: enable prometheus node_exporter qdisc collector T147569
  • 15:50 marostegui: Stop replication on dbstore1002 s7 thread for maintenance - T163190
  • 15:23 volans: re-enabled raid_handler and puppet on tegmen
  • 15:02 otto@tin: Finished deploy [eventlogging/analytics@UNKNOWN]: (no justification provided) (duration: 00m 02s)
  • 15:01 otto@tin: Started deploy [eventlogging/analytics@UNKNOWN]: (no justification provided)
  • 14:56 otto@tin: Finished deploy [eventlogging/analytics@25f8096]: (no justification provided) (duration: 00m 04s)
  • 14:56 otto@tin: Started deploy [eventlogging/analytics@25f8096]: (no justification provided)
  • 14:42 volans: temporarily disabled raid_handler and puppet on tegmen
  • 14:25 jynus: deploying new check_raid monitoring write policy for megacli T166108
  • 14:21 Dereckson: EU SWAT done
  • 14:21 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on dty.wikipedia (T166121) (duration: 00m 38s)
  • 14:09 XioNoX: re-enabling BGP session to Init7 - T165288
  • 14:03 moritzm: installing nutcracker update in codfw (T163795)
  • 13:37 marostegui: Run CleanDuplicateScores script to clean up possible duplicates on fawiki before starting to create the UNIQUE keys - https://phabricator.wikimedia.org/T164530
  • 13:23 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Add *.esa.int to CopyUploadsDomains (T164643) (duration: 00m 39s)
  • 12:47 elukey@tin: Finished deploy [analytics/refinery@679aeea]: Updated stat1002 with the last refinery deployment (duration: 00m 42s)
  • 12:46 elukey@tin: Started deploy [analytics/refinery@679aeea]: Updated stat1002 with the last refinery deployment
  • 12:46 elukey@tin: Finished deploy [analytics/refinery@679aeea]: (no justification provided) (duration: 00m 01s)
  • 12:45 elukey@tin: Started deploy [analytics/refinery@679aeea]: (no justification provided)
  • 12:39 joal@tin: Finished deploy [analytics/refinery@679aeea]: Weekly deploy (2 weeks late, big deploy)-2 (duration: 01m 35s)
  • 12:38 joal@tin: Started deploy [analytics/refinery@679aeea]: Weekly deploy (2 weeks late, big deploy)-2
  • 12:24 joal@tin: Finished deploy [analytics/refinery@679aeea]: Weekly deploy (with 2 weeks late, big deploy) (duration: 04m 24s)
  • 12:20 moritzm: upgrading mw1261-mw1265 to hhvm 3.18.2+dfsg-1+wmf4
  • 12:20 joal@tin: Started deploy [analytics/refinery@679aeea]: Weekly deploy (with 2 weeks late, big deploy)
  • 12:13 joal@tin: Finished deploy [analytics/refinery@222d0c0]: (no justification provided) (duration: 03m 56s)
  • 12:09 joal@tin: Started deploy [analytics/refinery@222d0c0]: (no justification provided)
  • 12:09 moritzm: uploaded hhvm 3.18.2+dfsg-1+wmf4 to apt.wikimedia.org (contains extended upstream fix for XML reader crash) (T162586)
  • 11:56 elukey: set vm.dirty_background_bytes=25165824 on aqs1004 as part of testing for https://gerrit.wikimedia.org/r/#/c/354107 (Rollback: set vm.dirty_background_ratio=10)
  • 11:51 _joe_: uploaded calico-cni 1.8.3-1~wmf1 to jessie-wikimedia
  • 11:51 _joe_: uploaded calicoctl 1.2.0-1~wmf1 to jessie-wikimedia
  • 11:44 _joe_: pushed calico/node:1.2.0 to the docker registry
  • 11:42 _joe_: pushed calico/kube-policy-controller:0.6.0 to the docker registry
  • 11:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T164530 (duration: 00m 38s)
  • 11:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1088, depool db1093 - T164530 (duration: 00m 38s)
  • 10:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1085, depool db1088 - T164530 (duration: 00m 38s)
  • 10:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1085 - T164530 (duration: 00m 38s)
  • 10:27 godog: upload kafkatee 0.1.5 to jessie-wikimedia, remove unused kafkatee 0.1.4 from trusty-wikimedia - T149451
  • 10:14 marostegui: Run pt-table-checksum on s7.frwiktionary - T165743
  • 09:56 moritzm: restarting cassandra on restbase1013, restbase1014, restbase1015, restbase1017 to pick up Java security updates
  • 09:49 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 3500 - T160640
  • 09:46 addshore: addshore@terbium:~$ ~/mymwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php et+wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 09:24 moritzm: restarting cassandra on restbase1007, restbase1009, restbase1012 to pick up Java security updates
  • 09:16 hashar: Restarting Jenkins on contint1001
  • 09:15 elukey: reverted manual hack on mw1161 with scap pull
  • 08:15 elukey: apply manually https://gerrit.wikimedia.org/r/#/c/351854/2/wmf-config/jobqueue.php (persistent connections between hhvm and redis) to mw1161 as production test
  • 08:13 marostegui: Force WB as a default policy on db1031 because of degraded BBU
  • 08:00 addshore: the last script I started is now stopped
  • 07:48 addshore: addshore@terbium:~$ ~/mymwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php et+wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 07:25 moritzm: installing openjdk security updates on maps and wdqs clusters
  • 07:13 marostegui: Deploy schema change on ruwiki.ores_classification directly on codfw master (db2028) - T164530
  • 07:07 marostegui: Rename gather_list gather_list_flag gather_list_item on db1078 db1094 and db1089 - T166097
  • 06:29 marostegui: Deploy alter table on s7.frwiktionary db2040 and db1034 - T165743
  • 06:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1021 - T162611 (duration: 00m 38s)
  • 06:20 marostegui: Deploy alter table on s2 eqiad master db1054 - T162611
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 23 02:29:17 UTC 2017 (duration 6m 16s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 25s)

2017-05-22

  • 23:51 aaron@tin: Synchronized wmf-config: Move swift auth URL to ProductionServices (duration: 00m 52s)
  • 23:49 aaron@tin: Synchronized static/images/project-logos/hywiki-2x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
  • 23:48 aaron@tin: Synchronized static/images/project-logos/hywiki-1.5x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
  • 23:34 demon@tin: Synchronized wmf-config/ProductionServices.php: I4b19b4 (duration: 00m 38s)
  • 23:33 demon@tin: Synchronized wmf-config/filebackend.php: I4b19b4 (duration: 00m 38s)
  • 23:20 aaron@tin: Synchronized wmf-config/filebackend.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
  • 23:19 aaron@tin: Synchronized wmf-config/ProductionServices.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
  • 23:15 aaron@tin: Synchronized wmf-config/logging.php: Include DB shard in production SPI log entries (duration: 00m 38s)
  • 21:11 bblack: BBR: cp1065: reverted back to cubic+pfifo_fast - T147569
  • 21:10 bblack: BBR: cp1074: reverted back to cubic+pfifo_fast - T147569
  • 20:56 ladsgroup@tin: Finished deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging (duration: 05m 23s)
  • 20:50 ladsgroup@tin: Started deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging
  • 20:46 arlolra: Updated Parsoid to ebac1890 (T165139)
  • 20:43 ladsgroup@tin: Finished deploy [ores/deploy@263255a]: (no justification provided) (duration: 29m 07s)
  • 20:40 arlolra@tin: Finished deploy [parsoid/deploy@a9f2229]: Updating Parsoid to ebac1890 (duration: 07m 54s)
  • 20:32 arlolra@tin: Started deploy [parsoid/deploy@a9f2229]: Updating Parsoid to ebac1890
  • 20:14 ladsgroup@tin: Started deploy [ores/deploy@263255a]: (no justification provided)
  • 20:14 Amir1: starting deploy of ores:68cca85 to prod
  • 19:30 bblack: BBR: cp1074: switching congestion control to bbr manually - T147569
  • 19:29 bblack: BBR: cp1074: switching qdisc to mq+fq manually - T147569
  • 19:25 bblack: BBR: cp1065: switching congestion control to bbr manually - T147569
  • 19:16 bblack: BBR: cp1065: switching qdisc to mq+fq manually - T147569
  • 18:57 demon@tin: Synchronized README: forcing co-master sync (duration: 00m 42s)
  • 18:56 demon@tin: Pruned MediaWiki: 1.29.0-wmf.20 (duration: 01m 21s)
  • 18:22 ejegg: updated payments-wiki from 3b84521 to 5fa4a70
  • 18:18 bblack: rebooting acamar
  • 18:06 mepps: updated thank you send drush command
  • 18:01 mepps: updated civicrm 9b7a74c
  • 18:00 mepps: updated process control for new thank you send drush command 7c9572b
  • 17:49 ejegg: turned off paypal audit parser
  • 16:06 akosiaris: re-enable notifications in icinga
  • 15:27 _joe_: restarted puppetmasters in codfw
  • 13:23 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: Use wikitech db group instead of labswiki+ labtestwiki (duration: 00m 39s)
  • 13:22 akosiaris: silence icinga
  • 13:17 dcausse@tin: Synchronized wmf-config/CommonSettings.php: Enable TimedMediaHandler's new video player Beta Feature in Labs (duration: 00m 43s)
  • 13:02 _joe_: restarted etcdmirror on conf1002, consequence of https://gerrit.wikimedia.org/r/354095
  • 09:59 moritzm: repooled mw2221 (was down for hardware error)
  • 09:37 marostegui: Deploy alter table s7.frwiktionary on db1039 - https://phabricator.wikimedia.org/T165743
  • 09:15 marostegui: Drop table MediaWikiInstallPingback_15732959 from db1046, db1047 and dbstore1002 - T165836
  • 09:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026, depool db1045 - T164530 (duration: 00m 39s)
  • 08:55 marostegui: Restart mysql on db1069 to apply new replication filters - T165977
  • 08:50 marostegui: Restart mysql on db1095 to apply new replication filters - T165977
  • 08:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T164530 (duration: 00m 38s)
  • 08:02 marostegui: Deploy alter table on s2 (revision table) db1021 - https://phabricator.wikimedia.org/T162611
  • 08:02 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1021 - T162611 (duration: 00m 38s)
  • 07:56 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2035 - T162611 (duration: 00m 38s)
  • 07:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db2035 - T162611 (duration: 00m 38s)
  • 07:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T162611 (duration: 00m 39s)
  • 07:22 moritzm: installing openjdk-7 security updates on jessie
  • 07:14 marostegui: Deploy alter table on s5 wikidatawiki.ores_classification directly on codfw master - T164530
  • 07:07 marostegui: Run CleanDuplicateScores script to clean up possible duplicates on wikidatawiki before starting to create the UNIQUE keys - T164530
  • 06:56 marostegui: Deploy alter table s7.frwiktionary on dbstore1001 - https://phabricator.wikimedia.org/T165743
  • 06:53 marostegui: Deploy alter table s7.frwiktionary on db2029 (codfw master) - https://phabricator.wikimedia.org/T165743
  • 06:47 marostegui: Deploy alter table on db2035 and db1036 for s2. bgwiktionary,eowiki, idwiki - T162611
  • 06:47 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2035 - T162611 (duration: 00m 38s)
  • 06:37 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T162611 (duration: 00m 39s)
  • 06:02 smalyshev@tin: Finished deploy [wdqs/wdqs@e4301da]: Redeploy GUI due to breakage in T165228 (duration: 01m 50s)
  • 06:00 smalyshev@tin: Started deploy [wdqs/wdqs@e4301da]: Redeploy GUI due to breakage in T165228
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 22 02:26:59 UTC 2017 (duration 6m 0s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 18s)

2017-05-21

  • 09:42 Reedy: force ran puppet on deployment-tin to pickup dbname in wmf-beta-update-database.py
  • 09:07 smalyshev@tin: Finished deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228 (duration: 00m 19s)
  • 09:06 smalyshev@tin: Started deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 21 02:27:46 UTC 2017 (duration 6m 3s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 30s)

2017-05-20

  • 21:54 Dereckson: Run namespaceDupe on fr.wikisource and en.wikisource
  • 17:29 addshore: addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 17:29 addshore: addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log
  • 09:08 thcipriani: restarting jenkins on contint1001
  • 08:24 smalyshev@tin: Finished deploy [wdqs/wdqs@227ab25]: Whitelist update (duration: 02m 32s)
  • 08:22 smalyshev@tin: Started deploy [wdqs/wdqs@227ab25]: Whitelist update
  • 07:52 gehel: restart wdqs-updater on all wdqs clusters (stuck on too large update)
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 20 02:29:14 UTC 2017 (duration 6m 13s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)

2017-05-19

  • 16:44 reedy@tin: Synchronized wmf-config/throttle.php: Wikimedia Vienna Hackathon (duration: 00m 39s)
  • 15:40 mutante: planet10001 - manually deleting cron job for deleted sr.planet (should puppetize the "absence" too)
  • 13:58 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2047 - T165743 (duration: 00m 38s)
  • 13:47 marostegui: Deploy alter table s7.frwiktionary db1033 - T165743
  • 13:31 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2047 - T165743 (duration: 00m 39s)
  • 13:09 moritzm: downgraded mw1161 to HHVM 3.12 (crashes often compared to app servers, downgrade over the weekend)
  • 12:58 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2068 - T165743 (duration: 00m 39s)
  • 12:40 marostegui: Deploy alter table s7.frwiktionary on db2068 - T165743
  • 12:40 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2068 - T165743 (duration: 00m 40s)
  • 11:39 marostegui: Deploy alter table s2.revision table on labsdb1003 - T162611
  • 11:05 moritzm: uploaded nutcracker 0.4.1-1+wm3~jessie1 to apt.wikimedia.org (T163795)
  • 10:31 ebernhardson: restarting elasticsearch on relforge1001 to pull in remote reindex
  • 10:19 moritzm: powercycling mw2221, stuck in reboot and serial console unresponsive
  • 10:08 _joe_: moved stale repos to /srv/deployment/STALE on tin, T129290
  • 10:07 moritzm: rebooting mw2220/mw2221 for update to Linux 4.9 / HHVM 3.18 / nutcracker tests
  • 09:15 reedy@tin: Synchronized dblists/: Update size dblists (duration: 00m 39s)
  • 09:01 reedy@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaMaintenance/makeSizeDBLists.php: Catch a silly error (duration: 00m 39s)
  • 08:14 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2048 for reimage (duration: 00m 39s)
  • 07:36 akosiaris: reboot kubernetes2001 for tests
  • 06:51 moritzm: installing openjdk-7/trusty regression update
  • 06:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 - T159753 T164530 (duration: 00m 38s)
  • 06:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 - T159753 T164530 (duration: 00m 39s)
  • 06:09 marostegui: Deploy alter table s2.revision table - db1018 - https://phabricator.wikimedia.org/T162611
  • 06:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T162611 (duration: 00m 40s)
  • 05:56 jynus: shutting down db2049 and preparing it for reimage
  • 02:28 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 19 02:28:08 UTC 2017 (duration 6m 0s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 53s)

2017-05-18

  • 20:47 mutante: wasat - git pull - bring to latest, the last change had never been deployed here like on terbium, but it's also not a backend for dbtree yet (T163141)
  • 20:44 mutante: terbium / dbtree - deploying gerrit:353388 (sudo -u mwdeploy git pull origin in /srv/dbtree) (T163143)
  • 20:03 urandom: T164865: restarting RESTBase-dev, range delete-based render retention
  • 19:52 urandom: T164865: restarting RESTBase-dev to apply range delete-based render retention
  • 19:06 urandom: T164865: configure RESTBase tables for size-tiered compaction (dev env only)
  • 18:37 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/SecurePoll/includes/pages/DumpPage.php: Revert "Dump should return decrypted votes" (T145695) (duration: 00m 48s)
  • 17:10 robh: mr1-ulsfo having oob connection re-routed at ulsfo, will flap a bit from 1700-1730 gmt
  • 17:09 moritzm: upgrading mw2130-mw2139 to Linux 4.9 and HHVM 3.18
  • 16:28 moritzm: restarting cassandra on restbase1010, restbase1011, restbase1016, restbase1018 to pick up OpenJDK security updates
  • 16:11 elukey: upgraded cassandra-tools-wmf on aqs hosts
  • 15:54 _joe_: uploaded package cni to jessie-wikimedia
  • 15:34 marostegui: Deploy alter table s2.revision table - db1060 - https://phabricator.wikimedia.org/T162611
  • 15:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1074, depool db1060 - T162611 (duration: 00m 39s)
  • 14:46 moritzm: rebooting restbase1008 for update to Linux 4.9 and to pick up OpenJDK security updates
  • 14:32 XioNoX: rebooting mr1-ulsfo for software upgrade - T164970
  • 14:12 akosiaris: perform a final reboot on kubernetes200X
  • 13:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 - T159753 T164530 (duration: 00m 39s)
  • 13:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 - T159753 T164530 (duration: 01m 03s)
  • 13:33 jynus: stopping mariadb and preparing for reimage at db2051
  • 13:14 elukey: AMEND prev: reloaded kafkatee on oxygen
  • 13:14 elukey: reloaded kafkatee to test T151748
  • 12:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 - T159753 T164530 (duration: 00m 38s)
  • 12:51 moritzm: upgrading mw1209-mw1219 to Linux 4.9 and HHVM 3.18
  • 12:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072, depool db1066 - T159753 T164530 (duration: 00m 38s)
  • 12:44 marostegui: Deploy alter table s2.revision table - db1074 - https://phabricator.wikimedia.org/T162611
  • 12:44 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1076, depool db1074 - T159753 T164530 (duration: 00m 39s)
  • 12:42 moritzm: upgrading mw1161 (job runner) to HHVM 3.18
  • 11:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073, depool db1072 - T159753 T164530 (duration: 00m 39s)
  • 11:10 marostegui: Run pt-table-checksum on s7.metawiki - T163190
  • 09:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080, depool db1073 - T159753 T164530 (duration: 00m 39s)
  • 09:47 moritzm: upgrading image scalers in codfw to Linux 4.9 and HHVM 3.18
  • 09:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1083, depool db1080 - T159753 T164530 (duration: 00m 38s)
  • 09:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1083 - T159753 T164530 (duration: 00m 39s)
  • 09:07 moritzm: upgrading image scalers mw1294/mw1295 to Linux 4.9 and HHVM 3.18
  • 09:06 marostegui: Deploy alter table s2.revision table - db1076 - https://phabricator.wikimedia.org/T162611
  • 09:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1090, depool db1076 - T162611 (duration: 00m 39s)
  • 08:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T159753 T164530 (duration: 00m 39s)
  • 08:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T159753 T164530 (duration: 00m 39s)
  • 08:32 moritzm: upgrading mw1180-mw1188, mw1200-mw1208 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 08:16 apergos: reboot dataset1001 for kernel update
  • 08:09 marostegui: Deploy alter table on s1.enwiki directly on codfw master (db2016) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 08:01 moritzm: reboot rhenium for update to Linux 4.9
  • 07:36 moritzm: installing freetype security updates on trusty (jessie already fixed)
  • 07:27 akosiaris: restart nagios-nrpe-server on dbstore2001
  • 07:01 marostegui: Deploy alter table on s2.plwiki directly on codfw master (db2017) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:43 moritzm: installing tiff security updates
  • 06:24 marostegui: Deploy alter table on s2.ptwiki directly on codfw master (db2017) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:21 marostegui: Deploy alter table s2.revision table - labsdb1001 - T162611
  • 06:10 marostegui: Deploy alter table s2.revision table - dbstore1001 - T162611
  • 06:10 marostegui: Deploy alter table s2.revision table - db1090 - T162611
  • 06:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1090 - T162611 (duration: 00m 38s)
  • 06:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2062 - T116557 (duration: 00m 39s)
  • 05:01 Jamesofur: insert decryption key for WMF Board Election
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 18 02:26:11 UTC 2017 (duration 5m 59s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 14s)

2017-05-17

  • 23:45 cwd: re-enabled p-c jobs
  • 23:06 cwd: disabled p-c jobs
  • 22:27 ejegg: updated SmashPig from 0145e2d to 4f84d88
  • 22:00 urandom: T164865: altering compaction strategy to sizetiered, local_group_wikipedia_T_parsoid_html.data (in RESTBase dev)
  • 21:50 ejegg: rolled back SmashPig to 0145e2d
  • 21:47 ejegg: updated SmashPig from 0145e2d to 1affad1
  • 20:36 ejegg: updated paypal EC fallback currency in payments-wiki config
  • 19:21 robh: mr1-ulsfo replacement underway
  • 18:54 urandom: T164865: restarting RESTBase in dev env to apply range-delete probability bug-fix
  • 18:30 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js: Do not check for visual editor availability when loading source editor (Gerrit:354126) (duration: 00m 39s)
  • 18:24 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on ilo. and ms.wikipedia (T164230, T165247) (duration: 00m 39s)
  • 18:08 paravoid: reprepro include facter 2.4.6 to jessie-wikimedia/trusty-wikimedia
  • 16:52 bblack: restarting varnish backend on cp1099 (mailbox)
  • 16:42 moritzm: upgrading mw2120-mw2129 to Linux 4.9 and HHVM 3.18
  • 15:08 moritzm: upgrading mw1189-mw1199 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 14:50 marostegui: Deploy alter table on s2.revision table on db1069 - T162611
  • 14:26 demon@tin: Synchronized README: No-op, forcing co-master sync (duration: 00m 40s)
  • 14:20 demon@tin: Pruned MediaWiki: 1.29.0-wmf.21 [keeping static files] (duration: 00m 22s)
  • 14:20 moritzm: upgrading mw1170-mw1179 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 14:19 demon@tin: Pruned MediaWiki: 1.29.0-wmf.19 (duration: 01m 07s)
  • 14:17 demon@tin: Pruned MediaWiki: 1.29.0-wmf.19 [keeping static files] (duration: 00m 12s)
  • 13:47 cmjohnson1: replacing optics on cr1-3/1/2 and/or asw-c-eqiad:xe-8/0/38 T165008
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/TwoColConflict/modules/: SWAT Fix issues with column alignment T165129 (duration: 00m 39s)
  • 13:44 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT Take RevisionSlider out of beta on all sites NOOP PT 2/2 (duration: 00m 39s)
  • 13:42 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT Take RevisionSlider out of beta on all sites T163685 PT 1/2 (duration: 00m 40s)
  • 13:42 elukey: shutdown analytics1030 for T165529
  • 13:41 moritzm: upgrading mw1261-mw1265 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 13:23 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Harden zerowiki config (T162771) (duration: 00m 41s)
  • 12:42 marostegui: Deploy alter table on s2.trwiki directly on codfw master (db2017) after running the clean up duplicates script - T164530
  • 11:27 moritzm: uploaded php-luasandbox_2.0.12~jessie3 to apt.wikimedia.org (adds a separate debug package hhvm-luasandbox-dbg)
  • 11:17 moritzm: rebooting restbase2012 for update to Linux 4.9 and to pick up openjdk security updates
  • 10:58 moritzm: rebooting restbase2011 for update to Linux 4.9 and to pick up openjdk security updates
  • 10:47 jynus: stopping db2052 and preparing it for reimage
  • 10:26 moritzm: rebooting restbase2010 for update to Linux 4.9 and to pick up openjdk security updates
  • 09:58 moritzm: rebooting restbase2009 for update to Linux 4.9 and to pick up openjdk security updates
  • 09:31 moritzm: rebooting restbase2008 for update to Linux 4.9 and to pick up openjdk security updates
  • 08:50 marostegui: Deploy alter table on codfw master (db2016) and let it replicate - T159753
  • 06:56 marostegui: Drop already renamed tables from labtestweb2001 (labtestwiki) - T164887
  • 06:54 marostegui: Drop already renamed tables from silver (labswiki) - T164887
  • 06:52 marostegui: Deploy alter table on s2 (revision table) dbstore1002 - T162611
  • 06:26 marostegui: Deploy alter table on s2 (revision table) db2017 (codfw master) - https://phabricator.wikimedia.org/T1626111
  • 06:22 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2041 and db2049 - T162611 (duration: 00m 39s)
  • 06:01 marostegui: Resume pt-table-checksum on s7.centralauth - https://phabricator.wikimedia.org/T163190
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 17 02:25:59 UTC 2017 (duration 6m 1s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 58s)

2017-05-16

  • 23:25 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Update wmde-policy RSS feed on meta. (T165285) (duration: 00m 39s)
  • 22:42 Dereckson: Tin now has an up-to-date /srv/mediawiki-staging HEAD, with operations/mediawiki-config repo = prod = staging
  • 20:22 mobrovac@tin: Started restart [restbase/deploy@d98af6f] (dev-cluster): Apply the revision range deletion algorithm, take 2 - T164865
  • 20:10 mobrovac@tin: Started restart [restbase/deploy@d98af6f] (dev-cluster): Apply the revision range deletion algorithm - T164865
  • 18:49 jynus: rolled back to HEAD~2 on tin to leave things the way I found them
  • 18:41 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 after reimage (duration: 00m 39s)
  • 18:26 bblack: cp1074: run-no-puppet varnish-backend-restart (has high mailbox lag, causing small 503 spikes)
  • 17:23 cmjohnson1: swapping optics asw-c-eqiad xe-8/0/38 T165008
  • 17:05 moritzm: upgrading mw2017/mw2099 to Linux 4.9 and HHVM 3.18
  • 16:40 moritzm: upgrading mw2190-mw2199 to Linux 4.9 and HHVM 3.18
  • 16:22 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2098.codfw.wmnet
  • 15:48 jynus: restarting and upgrading db1095
  • 15:01 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove old comment (duration: 00m 39s)
  • 14:53 marostegui: Deploy alter table on s2 (revision table) db2041 - https://phabricator.wikimedia.org/T162611
  • 14:53 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2056, depool db2041 - T162611 (duration: 00m 41s)
  • 14:50 mobrovac@tin: Started restart [restbase/deploy@d98af6f]: Apply new puppet role/profile paradigm
  • 14:36 kartik@tin: Finished deploy [cxserver/deploy@6118dda]: Update cxserver to 740641f (duration: 02m 21s)
  • 14:34 kartik@tin: Started deploy [cxserver/deploy@6118dda]: Update cxserver to 740641f
  • 14:27 moritzm: upgrading mw2180-mw2189 to Linux 4.9 and HHVM 3.18
  • 14:08 jynus: rolling restart labsdb1009,10,11 for mariadb upgrade (and kernel upgrade)
  • 14:06 moritzm: rebooting restbase2007 for update to Linux 4.9 and to pick up openjdk security updates
  • 13:53 moritzm: upgrading mw2170-mw2179 to Linux 4.9 and HHVM 3.18
  • 13:48 addshore: SWAT done
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/QuickSurveys/extension.json: SWAT: Explicitly add mediawiki.cookie dependency (duration: 00m 39s)
  • 13:40 moritzm: rebooting restbase2006 for update to Linux 4.9 and to pick up openjdk security updates
  • 13:39 addshore@tin: Synchronized wmf-config/throttle.php: SWAT: Raise the account creation limit for www.enwp.org/WP:Meetup/Eugene/WikiAPA T165421 (duration: 00m 39s)
  • 13:36 addshore@tin: Synchronized wmf-config/: SWAT: #1 T164502, #2, #3 (duration: 00m 41s)
  • 13:19 moritzm: upgrading mw2163-mw2169 to HHVM 3.18
  • 13:07 moritzm: upgrading mw2110-mw2117 to HHVM 3.18
  • 12:55 marostegui: Run pt-table-checksum on s7.centralauth - https://phabricator.wikimedia.org/T163190
  • 12:06 marostegui: Deploy alter table on s2 (revision table) db2049 - https://phabricator.wikimedia.org/T162611
  • 12:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2049 - T162611 (duration: 00m 39s)
  • 11:37 moritzm: upgrading mw1190-mw1208 to Linux 4.9 and HHVM 3.18
  • 11:28 jynus: stopping db1055 before reimage for backup
  • 11:27 Amir1: ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki
  • 11:25 Amir1: cleaning up is completely done current number of rows: 9,261,264 T159753
  • 11:24 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 for reimage (duration: 00m 39s)
  • 10:56 moritzm: upgrading codfw app servers already using HHVM 3.18 to 3.18.2+wmf3
  • 10:49 marostegui: Deploy schema change on testwikidatawiki.wb_terms on s3 codfw master - T165246
  • 10:36 jynus: upgrading and restarting db2062's mariadb service
  • 10:30 moritzm: installing openjdk-7 security updates on trusty hosts
  • 10:28 addshore: T164407 addshore@terbium mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 10:27 addshore: addshore@terbium mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 10:14 moritzm: upgrading mw1185-mw1189 to Linux 4.9 and HHVM 3.18
  • 09:26 moritzm: upgrading mw1189 / mw1293 from HHVM 3.18.2+wmf2 to 3.18.2+wmf3
  • 08:59 moritzm: upgrading mw1170-mw1184 from HHVM 3.18.2+wmf2 to 3.18.2+wmf3
  • 08:45 moritzm: upgrading git packages on tin/naos from local 2.11 backport to the version from jessie-backports
  • 08:22 moritzm: installing git security updates on trusty (jessie already fixed)
  • 07:39 godog: upload prometheus-mysqld-exporter 0.10.0 to jessie-wikimedia - T161296
  • 07:10 moritzm: upgrading mw1261-mw1265 to HHVM 3.18.2+wmf3
  • 07:06 Amir1_: start of cleaning up ores_classification table in enwiki last round (four hours) (T159753)
  • 06:58 moritzm: restarted hhvm on mw1165 (stuck in HPHP::Treadmill deadlock)
  • 06:37 marostegui: Stop replication at the same position on db1044 and db2018 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 06:32 marostegui: Disable replication codfw > eqiad on s3 https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 06:08 marostegui: Run pt-table-checksum on s7.viwiki - T163190
  • 06:02 marostegui: Deploy alter table on s2 (revision table) db2056 - T162611
  • 06:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2063, depool db2056 - T162611 (duration: 00m 40s)
  • 05:17 XioNoX: fyi, one of the links between codfw and eqiad is down for a scheduled Zayo maintenance. No outage, traffic routed around.
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 16 02:26:19 UTC 2017 (duration 6m 3s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 12s)
  • 00:30 ejegg: updated payments-wiki from 57451de to 3b84521
  • 00:10 ejegg: updated CiviCRM from 061cd61 to 4ece34c

2017-05-15

  • 23:41 bd808@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: RCFilters: Actually read/write highlight parameter (T165107) (duration: 00m 40s)
  • 22:23 mobrovac@tin: Finished deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091 (duration: 06m 44s)
  • 22:16 mobrovac@tin: Started deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091
  • 21:19 mobrovac@tin: Finished deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091 (duration: 06m 32s)
  • 21:13 mobrovac@tin: Started deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091
  • 20:48 gilles: run refreshImageMetadata --force for group1 + group2 wikis except commons on terbium T150741
  • 20:20 subbu: Updated Parsoid to a182c227 (T141226, T164792, T37247, T153107, T163091, T164006, T161151, T162920, T163549)
  • 20:11 ssastry@tin: Finished deploy [parsoid/deploy@132d0e5]: Updating Parsoid to a182c227 (duration: 07m 21s)
  • 20:04 ssastry@tin: Started deploy [parsoid/deploy@132d0e5]: Updating Parsoid to a182c227
  • 19:42 catrope@tin: Synchronized php-1.30.0-wmf.1/includes/api/ApiQueryRevisions.php: T165100 (duration: 00m 40s)
  • 18:45 catrope@tin: Synchronized php-1.30.0-wmf.1/extensions/MobileFrontend/: Revert "Use csrf token for watching" (T165209) (duration: 00m 41s)
  • 18:45 RoanKattouw: Canary failing on mw1279 due to Wikimedia\Rdbms\Database::makeList: empty input for field rev_id from ApiQueryRevisions
  • 18:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Disable test reader QuickSurveys (T131949, T164769, T164894, T164960, T164943) (duration: 00m 40s)
  • 17:24 mobrovac@tin: Finished deploy [restbase/deploy@c70a1e1] (dev-cluster): Bring RESTBase up to date in the Dev Cluster (duration: 01m 51s)
  • 17:22 mobrovac@tin: Started deploy [restbase/deploy@c70a1e1] (dev-cluster): Bring RESTBase up to date in the Dev Cluster
  • 15:39 akosiaris: upgrade pybal to 1.13.6 across the LVS fleet
  • 15:10 mobrovac@tin: Finished deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308 (duration: 02m 49s)
  • 15:07 mobrovac@tin: Started deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308
  • 14:24 mobrovac@tin: Started restart [restbase/deploy@c70a1e1] (dev-cluster): Restart after applying https://gerrit.wikimedia.org/r/#/c/352851/
  • 13:50 moritzm: upgrading mwdebug servers to 3.18.2+wmf3
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.1/includes/media/DjVu.php: SWAT: Add X-Content-Dimensions support to DjVu T150741 (duration: 00m 39s)
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/TimedMediaHandler/handlers: SWAT: Fix X-Content-Dimensions support T150741 (duration: 00m 40s)
  • 13:37 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/VisualEditor: SWAT: #1 #2 T165238 T165238 VisualEditor (duration: 00m 41s)
  • 13:27 moritzm: uploaded HHVM 3.18.2+dfsg-1+wmf3 to apt.wikimedia.org (addresses segfault in XML reader; T162586, T165074)
  • 13:20 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/maintenance/populateCognatePages.php: SWAT: Add a clear-first option to populatePages script T164407 PT 2/2 (duration: 00m 39s)
  • 13:19 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/src/CognateStore.php: SWAT: Add a clear-first option to populatePages script T164407 PT 1/2 (duration: 00m 40s)
  • 13:10 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add QuickSurvey for reader segmentation research T131949 T164769 T164894 T164960 T164963 (duration: 00m 40s)
  • 12:54 akosiaris: upload pybal 1.13.6 to apt.wikimedia.org/jessie-wikimedia/main
  • 12:33 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable data type for tabular data (duration: 00m 41s)
  • 11:09 Amir1_: cleaning up ores_classification has finished: 18M rows deleted, current number of rows: 38,937,217 (T159753)
  • 10:36 moritzm: rebooting mw2224-mw2242 for update to Linux 4.9
  • 10:18 moritzm: installing batik security updates on trusty
  • 10:14 moritzm: installing fop security updates on trusty
  • 09:34 moritzm: installing bind security updates (we're using client-side libs/tools only)
  • 09:10 godog: swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785
  • 08:29 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 3000 - T160640
  • 08:26 moritzm: installing rtmpdump security updates on jessie
  • 08:17 Amir1_: start of cleaning up ores_classification table
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 15 02:27:02 UTC 2017 (duration 5m 59s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 42s)
  • 01:25 bblack: depooled cp1053 from all services (possible hardware issues)

2017-05-14

  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 14 02:26:33 UTC 2017 (duration 6m 2s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)

2017-05-13

  • 12:27 gehel: restarting wdqs updater on wdqs cluster
  • 02:33 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 13 02:33:34 UTC 2017 (duration 6m 9s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 07s)
  • 01:18 mobrovac: zotero restart as mem is above 50%
  • 00:54 urandom: T165139: Truncating RESTBase feed_aggregated tables (corruption)
  • 00:31 urandom: T165139: Truncating RESTBase summary tables (corruption)

2017-05-12

  • 20:55 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 39s)
  • 20:49 demon@tin: Synchronized wmf-config/: Swapping DynamicSidebar to normal extension registration (duration: 00m 19s)
  • 19:20 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/TextExtracts/includes/ApiQueryExtracts.php: API: Change memcache key to clear cache T165161 (duration: 00m 39s)
  • 19:02 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: Add RejectParserCacheValue handler for mw-parser-output T165161 (duration: 00m 40s)
  • 18:47 bblack: starting spaced-out ~4h run of "run-no-puppet varnish-frontend-restart" on cache_upload+cache_text to re-set transient storage levels (in screen on neodymium)
  • 18:10 thcipriani@tin: Finished scap: Revert "Wrap parser output in <div class="mw-parser-output">" 4/4 (duration: 19m 13s)
  • 17:51 thcipriani@tin: Started scap: Revert "Wrap parser output in <div class="mw-parser-output">" 4/4
  • 17:51 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/api/ApiParse.php: Revert "Wrap parser output in <div class="mw-parser-output">" 3/4 (duration: 00m 42s)
  • 17:50 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/cache/MessageCache.php: Revert "Wrap parser output in <div class="mw-parser-output">" 2/4 (duration: 00m 39s)
  • 17:49 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/parser/Parser.php: Revert "Wrap parser output in <div class="mw-parser-output">" 1/4 (duration: 00m 39s)
  • 17:38 ema: cp4010: upgrade varnish back to 4.1.6-1wm1, transient storage issues are unrelated
  • 17:33 krinkle@tin: Synchronized php-1.30.0-wmf.1/includes/resourceloader/ResourceLoaderClientHtml.php: (no justification provided) (duration: 00m 40s)
  • 16:53 moritzm: powercycling mw1294 (machine inaccessible/locked up)
  • 16:23 moritzm: repooled mw1172 after scap pull (was down with hardware error)
  • 14:10 moritzm: rebooting mw2163-mw2179 for update to Linux 4.9
  • 13:47 moritzm: rebooting mw2110-mw2117 for update to Linux 4.9
  • 13:06 moritzm: repooled mw2098 (was down with hardware error)
  • 12:53 moritzm: downgrading mw1161 (job runner) to HHVM 3.12; HHVM 3.18 has some known instabilities and a fix for one will likely be available next week, so going the conservative way over the weekend
  • 11:35 gehel: cleaning old elasticsearch and logstash logs on logstash cluster
  • 10:38 _joe_: moved hpssacli.tar.gz to /root on puppetmaster1001
  • 09:59 hashar@tin: Synchronized php-1.30.0-wmf.1/extensions/MobileFrontend: Correctly handle the mw-parser-output wrapper - T164733 (duration: 00m 43s)
  • 09:02 akosiaris: move planet2001 to ganeti nodegroup row_A
  • 08:58 marostegui: Rename semantic tables before dropping them on wikitech hosts (silver and labtestweb2001) - T164887
  • 06:05 marostegui: Deploy alter table on s2 (revision table) db2063 - https://phabricator.wikimedia.org/T162611
  • 06:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2064, depool db2063 - T162611 (duration: 00m 39s)
  • 05:53 marostegui: Stop MySQL dbstore2001 for testing - T165033
  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 12 02:30:11 UTC 2017 (duration 6m 16s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 49s)

2017-05-11

  • 23:49 thcipriani@tin: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable saving RC Filters on Beta Cluster (beta-only-change) (duration: 00m 39s)
  • 23:39 thcipriani@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki.rcfilters: SWAT: Gate option to save RC filters to default false 3/3 (duration: 00m 39s)
  • 23:39 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/specials/SpecialRecentchanges.php: SWAT: Gate option to save RC filters to default false 2/3 (duration: 00m 39s)
  • 23:38 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/DefaultSettings.php: SWAT: Gate option to save RC filters to default false 1/3 (duration: 00m 39s)
  • 23:30 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/TemplateData/extension.json: SWAT: Fix styles queue violation for "ext.templateData" T92459 (duration: 00m 39s)
  • 23:23 twentyafterfour: restart apache on iridium to apply hotfix for T163967
  • 23:21 thcipriani@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki/mediawiki.Upload.Dialog.js: SWAT: mw.Upload.Dialog: Define .static.name T164999 (duration: 00m 40s)
  • 23:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable OOUI for EditPage on MW.org (duration: 00m 40s)
  • 23:09 Amir1: clean up for ores_classification is finished for now, 9M rows cleaned, current number of rows: 55,959,017 (T159753)
  • 21:19 twentyafterfour@tin: Synchronized php-1.30.0-wmf.1/includes/specials/SpecialSearch.php: hotfix T165091 (duration: 00m 39s)
  • 21:02 Amir1: start of cleaning up ores_classification in enwiki for two hours (T159753)
  • 20:57 hashar: CI Phpunit jobs were segfaulting due to an upgrade of HHVM to 3.18. Got rolled back to 3.12 - T165074
  • 20:06 demon@tin: Synchronized scap/plugins/prep.py: scap prep is fast now (duration: 00m 44s)
  • 19:41 demon@tin: Synchronized scap/plugins/clean.py: no-op, completeness (duration: 00m 42s)
  • 19:35 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.1
  • 18:53 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/Gadgets/includes/GadgetResourceLoaderModule.php: SWAT: Revert "Move gadget styles from main stylesheet request to site request" T165040 T165031 (duration: 00m 42s)
  • 18:47 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable OOUI in EditPage for fawiki T162849 (duration: 00m 42s)
  • 18:39 hoo: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
  • 18:23 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: SWAT: RecentChangesClicks: Address minor performance concerns T158458 (duration: 00m 42s)
  • 15:35 ladsgroup@tin: Synchronized wmf-config: Set oresDamagingPref default to values that actually exist (T165011) (duration: 00m 44s)
  • 15:35 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging$ scap sync-dir wmf-config 'Set oresDamagingPref default to values that actually exist (T165011)'
  • 15:30 chasemp: rotate novaadmin in /labtest/ ldappasswd -H ldap://labtestservices2001.wikimedia.org -x -D "uid=novaadmin,ou=people,dc=wikimedia,dc=org" -W -A -S
  • 14:37 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sca2004.codfw.wmnet
  • 14:36 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=sca2004.codfw.wmnet
  • 14:10 gehel@tin: Finished deploy [wdqs/wdqs@bc30531]: (no justification provided) (duration: 01m 23s)
  • 14:08 gehel@tin: Started deploy [wdqs/wdqs@bc30531]: (no justification provided)
  • 14:07 gehel: deploying WDQS to fix T165029
  • 14:01 mobrovac@tin: Started restart [zotero/translation-server@50f216a]: Zotero unresponsive
  • 13:59 aude@tin: Synchronized php-1.30.0-wmf.1/extensions/Wikidata: Update quality constraints (duration: 02m 14s)
  • 13:56 mobrovac@tin: Started restart [zotero/translation-server@6a4a828]: (no justification provided)
  • 13:48 addshore@tin: Synchronized wmf-config/jobqueue-labs.php: SWAT: LABS ONLY Re-enable persistent connection to Redis for jobrunners in lab (duration: 00m 41s)
  • 13:33 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: (notask) wgRevisionSliderAlternateSlider true everywhere PT 2/2 (duration: 00m 42s)
  • 13:33 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: (notask) wgRevisionSliderAlternateSlider true everywhere PT 1/2 (duration: 00m 43s)
  • 13:31 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T162796 Stop prerendering thumbs at 2560/2880 pixels (duration: 00m 41s)
  • 13:23 moritzm: rebooting restbase2005 for update to Linux 4.9 / new openjdk
  • 13:21 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/src/CognateStore.php: SWAT: T165005 Dont pass ConnectionRefs to ConnectionManager::releaseConnection (duration: 00m 42s)
  • 13:10 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T164888 Correct alias(es) from es.wikisource to eo.wikisource (duration: 00m 42s)
  • 12:55 akosiaris: migrate sca2004 to ganeti nodegroup row_A
  • 12:33 marostegui: Run pt-table-checksum on s7.ukwiki - https://phabricator.wikimedia.org/T163190
  • 12:19 elukey: reboot kafka100[23] for kernel upgrades (kafka main-eqiad, eventbus eqiad)
  • 11:03 marostegui: Deploy alter table on s2 (revision table) db2064 - T162611
  • 11:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2064 - T162611 (duration: 00m 42s)
  • 10:15 akosiaris: reboot ganeti200{5,6,7,8} for network reconfiguration
  • 10:10 marostegui: Run pt-table-checksum on s7.rowiki - https://phabricator.wikimedia.org/T163190
  • 10:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1056 original load (duration: 00m 49s)
  • 09:46 ema: cp4010: downgrade varnish to 4.1.5-1wm4 and check frontend transient memory usage
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1003.eqiad.wmnet
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1002.eqiad.wmnet
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1001.eqiad.wmnet
  • 09:10 moritzm: upgrading mw1170-mw1188 to HHVM 3.18 / Linux 4.9 (also pruning HHVM CLI bytecode since downtimed anyway)
  • 08:55 moritzm: migrating mw1161 (job runner) to HHVM 3.18 and Linux 4.9
  • 08:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1056 load (duration: 00m 43s)
  • 08:35 marostegui: Run pt-table-checksum on s7.kowiki - https://phabricator.wikimedia.org/T163190
  • 08:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1056 load (duration: 00m 42s)
  • 08:26 moritzm: migrating mw1189 (API server) to HHVM 3.18 and Linux 4.9
  • 07:53 godog: roll-restart ms-fe1* for linux 4.9 upgrade - T162029
  • 06:50 moritzm: migrating mw1293 (image scaler) to HHVM 3.18 and Linux 4.9
  • 06:30 marostegui: Drop mira user on wikitech database - T164968
  • 06:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 with less load (duration: 00m 43s)
  • 05:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1067 - T147166 T130067 (duration: 00m 57s)
  • 03:14 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 11 03:14:51 UTC 2017 (duration 6m 44s)
  • 03:08 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 13m 33s)
  • 02:46 Jamesofur: all election emails out
  • 02:41 Jamesofur: Sending English and all other language election emails via terbium
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 13m 22s)
  • 02:21 Jamesofur: sending Chinese election emails via terbium
  • 02:18 Jamesofur: sending uk and vi election emails via terbium
  • 02:10 Jamesofur: sending pt,pt-br and ru election emails via terbium
  • 01:55 Jamesofur: sending Polish and Dutch election emails via terbium
  • 01:32 Jamesofur: sending Italian and Japanese election emails via terbium
  • 01:21 Jamesofur: sending he, hi and id election emails via terbium
  • 01:08 Jamesofur: sending French election emails via terbium
  • 01:05 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.1
  • 01:00 Jamesofur: sending Farsi election emails via terbium
  • 00:50 Jamesofur: sending Spanish election emails via terbium
  • 00:36 Jamesofur: sending German election emails via terbium
  • 00:29 Jamesofur: sending bg and bn election emails via terbium
  • 00:11 Jamesofur: sending Arabic election emails via terbium
  • 00:03 maxsem@tin: Finished deploy [kartotherian/deploy@9401f38]: Try https://gerrit.wikimedia.org/r/#/c/352886/ and https://gerrit.wikimedia.org/r/#/c/353184/ on test hosts (duration: 145m 42s)

2017-05-10

  • 23:50 twentyafterfour@tin: Finished scap: Sync fix for T164983 plus i18n files leftover from swat. refs T162954 (duration: 30m 37s)
  • 23:19 twentyafterfour@tin: Started scap: Sync fix for T164983 plus i18n files leftover from swat. refs T162954
  • 23:13 catrope@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaEvents/: T164617 (duration: 00m 42s)
  • 23:08 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable archive search on select wikis (T162302) (duration: 00m 41s)
  • 21:38 twentyafterfour@tin: Synchronized php-1.30.0-wmf.1/extensions/ORES/includes/Hooks.php: sync fix for T164984 refs T162954 (duration: 00m 42s)
  • 21:38 maxsem@tin: Started deploy [kartotherian/deploy@9401f38]: Try https://gerrit.wikimedia.org/r/#/c/352886/ and https://gerrit.wikimedia.org/r/#/c/353184/ on test hosts
  • 20:55 elukey: restart hhvm on mw1268 (HHVM 3.12, HPHP::Treadmill::getAgeOldestRequest issue)
  • 20:37 demon@tin: Synchronized README: no-op, comaster sync (duration: 00m 42s)
  • 20:36 Dereckson: Run namespaceDupes.php on es.wikisource (T164195)
  • 20:35 bsitzmann@tin: Finished deploy [mobileapps/deploy@5d3b34a]: Update mobileapps to 75b135e (duration: 03m 55s)
  • 20:33 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Restore Autor: and Portal: namespaces on es.wikisource (T164195) (duration: 00m 42s)
  • 20:31 bsitzmann@tin: Started deploy [mobileapps/deploy@5d3b34a]: Update mobileapps to 75b135e
  • 19:51 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 19:51 twentyafterfour: rolling group1 back to 1.29.0-wmf.21 due to T164984
  • 19:45 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.1
  • 19:33 twentyafterfour: deploying 1.30.0-wmf.1 to group1 wikis. refs T162954
  • 19:29 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/TimedMediaHandler/: Store original media dimensions as additional header (T150741) (duration: 00m 43s)
  • 19:28 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/PdfHandler/PdfHandler_body.php: Store original media dimensions as additional header (T150741) (duration: 00m 42s)
  • 19:27 dereckson@tin: Synchronized wmf-config/interwiki.php: Interwiki map update (disable __list sorting, T145337) (duration: 00m 41s)
  • 19:26 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/PagedTiffHandler/PagedTiffHandler_body.php: Store original media dimensions as additional header (T150741) (duration: 00m 42s)
  • 19:17 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/TwoColConflict/: Add "oojs-ui" dep to ext.TwoColConflict.filterOptionsJs (duration: 00m 42s)
  • 18:57 paravoid: mr1-ulsfo: request system snapshot media internal slice alternate; request system reboot
  • 18:53 dereckson@tin: Synchronized php-1.29.0-wmf.21/extensions/TwoColConflict/: Add "oojs-ui" dep to ext.TwoColConflict.filterOptionsJs (duration: 00m 42s)
  • 18:30 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/CirrusSearch/maintenance/forceSearchIndex.php: Fix index usage on archive indexing (duration: 00m 42s)
  • 18:14 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Put Cognate in write mode for all wiktionaries (T164407) (duration: 00m 42s)
  • 17:46 jynus: setting db1056's cpu scaling_governor to performance, rather than powersave
  • 17:20 moritzm: installing groovy security updates
  • 17:03 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Revert "Create Autor and Portal namespaces on Spanish Wikisource" (T164195) (duration: 00m 43s)
  • 16:30 godog: roll-restart swift object servers to apply https://gerrit.wikimedia.org/r/#/c/353078
  • 15:44 moritzm: installing git security updates on jessie systems
  • 15:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1067 - T147166 T130067 (duration: 01m 43s)
  • 15:18 moritzm: uploaded HHVM 3.18.2 and HHVM extensions to apt.wikimedia.org/main (previously only in experimental)
  • 15:03 jynus: shutting down db1056 for physical maintenance T164944
  • 14:57 elukey: reboot kafka1001 for kernel upgrades (kafka main-eqiad, eventbus eqiad)
  • 14:50 marostegui: Stop replication at the same position on db1067 and db2016 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 14:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1067 - T147166 T130067 (duration: 00m 43s)
  • 14:43 marostegui: Run pt-table-checksum on s7.huwiki - https://phabricator.wikimedia.org/T163190
  • 14:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1097 (duration: 00m 43s)
  • 14:39 jynus: disabling puppet to solve disk mount issues T164915
  • 14:36 godog: roll-restart swift-proxy to apply https://gerrit.wikimedia.org/r/#/c/353078/
  • 14:36 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1005.eqiad.wmnet
  • 14:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 (duration: 00m 43s)
  • 14:27 hashar: European SWAT completed
  • 14:21 moritzm: upgrading mw1263-mw1265 to latest HHVM package (including the redis QUIT patch)
  • 14:19 hashar@tin: Finished scap: Store original media dimensions as additional header - T150741 (duration: 03m 53s)
  • 14:15 hashar@tin: Started scap: Store original media dimensions as additional header - T150741
  • 14:15 hashar@tin: scap aborted: Store original media dimensions as additional header - T150741 (duration: 00m 00s)
  • 14:15 hashar@tin: Started scap: Store original media dimensions as additional header - T150741
  • 14:15 hashar@tin: scap aborted: (no justification provided) (duration: 00m 00s)
  • 14:15 hashar@tin: Started scap: (no justification provided)
  • 14:13 hashar: ValueError: /srv/mediawiki-staging/php-1.30.0-wmf.1/extensions/Collection/.eslintrc.json is an invalid JSON file
  • 13:53 elukey: reboot kafka200[23] for kernel upgrades (kafka main-codfw cluster, eventbus codfw)
  • 13:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Clean up inappropriate usages of wmg - T151891 (duration: 00m 42s)
  • 13:34 hashar@tin: Synchronized wmf-config/CommonSettings.php: Clean up inappropriate usages of wmg - T151891 (duration: 00m 42s)
  • 13:28 marostegui: Disable replication codfw > eqiad on s1 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 13:24 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 00m 43s)
  • 13:19 hashar@tin: Synchronized php-1.30.0-wmf.1/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 01m 08s)
  • 13:17 hashar@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ar.svg: (no justification provided) (duration: 00m 42s)
  • 13:13 hashar@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ar.svg: Add new Arabic Wikipedia logo - T164648 (duration: 00m 44s)
  • 13:12 akosiaris: restart pybal on lvs1006, lvs1009, lvs1012 to pick up the kubemaster LVS service
  • 13:09 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Add new Arabic Wikipedia logo - T164648 && Disable page previews beta features on various projects - T164740 (duration: 00m 42s)
  • 13:07 marostegui: Run pt-table-checksum on s7.hewiki - https://phabricator.wikimedia.org/T163190
  • 13:04 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Import sources on dty.wikipedia - T164573 (duration: 00m 43s)
  • 12:47 moritzm: installing irqbalance updates from jessie point update
  • 12:45 akosiaris: rebooting ganeti2007, ganeti2008 for networking config update
  • 12:34 moritzm: installing logback security updates
  • 11:27 jynus: stopping mariadb and preparing db1056 for reimage
  • 11:22 marostegui: Stop replication at the same position on db1049 and db2023
  • 11:14 marostegui: Stop replication at the same position on db1050 and db2028
  • 10:50 marostegui: Stop replication at the same position on db1033 and db2029 - T147166 T130067
  • 10:44 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 for reimage (duration: 00m 43s)
  • 10:43 marostegui: Disable replication codfw > eqiad on s7 - T147166 T130067
  • 09:36 godog: roll-restart ms-fe2* for linux 4.9 upgrade - T162029
  • 09:11 moritzm: installing vim security updates on jessie
  • 09:05 volans: updated CI puppet compiler facts from production
  • 08:59 moritzm: installing wget security updates on jessie
  • 08:35 moritzm: rebooting mx2001 for update to Linux 4.9
  • 08:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: wmgUseTwoColConflict true for all wikis (duration: 00m 54s)
  • 07:30 marostegui: Stop replication at the same position on db10418 and db2017 - T147166 https://phabricator.wikimedia.org/T130067
  • 07:16 marostegui: Disable replication codfw > eqiad on s2 - T147166 T130067
  • 07:13 Amir1: another round of cleaning up ores_classification is done, 12M rows deleted. Current number of rows: 64,902,521 (T159753)
  • 06:36 moritzm: installing rtmpdump security updates on trusty
  • 06:15 marostegui: Deploy alter table wikidatawiki.wb_terms on dbstore1001 - T162539 T163190
  • 06:08 marostegui: Run pt-table-checksum on s7.frwiktionary - T163190
  • 05:04 Amir1: start of cleaning up ores_classification rows for three hours
  • 04:49 kartik@tin: Finished deploy [cxserver/deploy@533b4f4]: Update cxserver to 534619c (duration: 02m 38s)
  • 04:46 kartik@tin: Started deploy [cxserver/deploy@533b4f4]: Update cxserver to 534619c
  • 03:02 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 10 03:02:23 UTC 2017 (duration 6m 37s)
  • 02:55 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 50s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 09s)
  • 00:29 maxsem@tin: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/352980/3 (duration: 00m 42s)
  • 00:12 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on fiwiki (T163011) (duration: 00m 43s)
  • 00:10 RoanKattouw: Running extensions/ORES/maintenance/PopulateDatabase.php on fiwiki

2017-05-09

  • 23:54 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable RCFilters beta feature on all remaining wikis (T144458) (duration: 00m 44s)
  • 23:33 mutante: db1040 - remove from puppet, puppet node clean/deactivate, deleted salt-key, remove from icinga by running puppet on tegmen after that (T164057)
  • 23:23 demon@tin: Finished scap: rebuilding l10n for extension-list swap (duration: 34m 10s)
  • 23:13 mutante: analytics1027 - decom: revoke puppet cert, delete salt key, puppet node clean/deactivate, check icinga removal (T161597)
  • 22:49 demon@tin: Started scap: rebuilding l10n for extension-list swap
  • 22:46 reedy@tin: Synchronized wmf-config/extension-list-wikitech: Consistency (duration: 00m 42s)
  • 22:20 reedy@tin: Synchronized wmf-config/wikitech.php: Disable Semantic extensions (duration: 00m 42s)
  • 22:03 reedy@tin: scap aborted: (no justification provided) (duration: 00m 03s)
  • 22:03 reedy@tin: Started scap: (no justification provided)
  • 21:40 twentyafterfour: Mediawiki train group0 finished, will resume tomorrow with group 1 wikis. refs T162954
  • 21:32 twentyafterfour: group0 wikis to 1.30.0-wmf.1 refs T162954
  • 21:32 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.1
  • 20:52 twentyafterfour@tin: Finished scap: MediaWiki sync new branch wmf/1.30.0-wmf.1 + localization cache and deploy to testwikis refs T162954 (duration: 29m 41s)
  • 20:22 twentyafterfour@tin: Started scap: MediaWiki sync new branch wmf/1.30.0-wmf.1 + localization cache and deploy to testwikis refs T162954
  • 19:47 maxsem@tin: Finished deploy [kartotherian/deploy@740235c]: https://gerrit.wikimedia.org/r/#/c/352886/ (duration: 05m 35s)
  • 19:42 maxsem@tin: Started deploy [kartotherian/deploy@740235c]: https://gerrit.wikimedia.org/r/#/c/352886/
  • 18:39 bblack: varnish: manually setting runtime lru_interval / nuke_limit via varnishadm for all clusters' backends to match start-time change in https://gerrit.wikimedia.org/r/#/c/352827/
  • 18:26 subbu: updated Parsoid to 9d8badc8 (T151277)
  • 18:22 ssastry@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 07m 09s)
  • 18:16 mepps: updated SmashPig from 200f63e to 0145e2d
  • 18:15 ssastry@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 17:29 elukey: executing varnish-backend-restart on cp1072 as attempt to mitigate "FetchError Could not get storage" and "ExpKill LRU_Fail" - T145661
  • 17:25 elukey: executing varnish-backend-restart on cp1074 as attempt to mitigate "FetchError Could not get storage" and "ExpKill LRU_Fail" - T145661
  • 17:23 twentyafterfour: Preparing to branch 1.30.0-wmf.1 [ T162954 ]
  • 16:08 elukey: playing with mw2146 for T163674
  • 16:00 elukey: stopping Hadoop daemons and shutting down analytics[1032-1033,1040].eqiad.wmnet - T132256
  • 15:20 moritzm: installing rpcbind/libtirpc security updates on ms1001
  • 15:15 moritzm: uploaded kubernetes 1.5.5-1+wmf1 to stretch-wikimedia/experimental
  • 15:02 urandom: starting instances restbase2005
  • 14:55 moritzm: repooled mw1264 after hardware error has been fixed (and scap pull)
  • 14:45 hashar: European SWAT completed
  • 14:39 bblack: varnish: varnishadm runtime set default_ttl=86400 for text+upload fe+be layers via cumin, to match deployed start-time changes in https://gerrit.wikimedia.org/r/#/c/352826/
  • 14:22 hashar@tin: Finished scap: (no justification provided) (duration: 03m 10s)
  • 14:19 hashar@tin: Started scap: (no justification provided)
  • 14:16 elukey: correction: reboot kafka2001 for kernel upgrades (eventbus codfw)
  • 14:16 elukey: reboot kafka1001 for kernel upgrades (eventbus codfw)
  • 14:10 hashar@tin: Finished scap: TwoColConflict update (duration: 19m 30s)
  • 14:09 marostegui: Stop MySQL and shutdown db1048 (phabricator slave) to replace BBU - T160731
  • 14:06 marostegui: Run pt-table-checksum on s7.fawiki - T163190
  • 13:51 hashar@tin: Started scap: TwoColConflict update
  • 13:49 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/TwoColConflict: BACKPORTS from master - T162806 T163886 (duration: 00m 41s)
  • 13:47 hashar@tin: Synchronized wmf-config/Wikibase-production.php: Enable sending Wikidata notification on Wikivoyage - T142103 (duration: 00m 39s)
  • 13:46 gehel: upgrade deployment-prep cluster to elasticsearch 5.3.2 - T163707
  • 13:44 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Create Autor and Portal namespaces on Spanish Wikisource - T164195 (duration: 00m 39s)
  • 13:39 gehel: cancel upgrading elasticsearch on relforge (plugin under test is missing a release for 5.3.2) - T163703
  • 13:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Allow page move only autopatrolled at hiwiki - T164239 (duration: 00m 42s)
  • 13:33 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Allow new page patrol for autoconfirmed users on bnwiki - T164159 (duration: 00m 40s)
  • 13:26 ayounsi@tin: Finished deploy [librenms/librenms@b10cc7c]: (no justification provided) (duration: 00m 04s)
  • 13:26 ayounsi@tin: Started deploy [librenms/librenms@b10cc7c]: (no justification provided)
  • 13:25 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: Fix the container calculation for template editor - T163105 (duration: 00m 40s)
  • 13:23 gehel: upgrading elasticsearch on relforge - T163703
  • 13:11 reedy@tin: Synchronized wmf-config/extension-list: PageTriage to extension.json in extension-list (duration: 00m 39s)
  • 13:08 reedy@tin: Synchronized wmf-config/mobile.php: wfLoadExtension for ZeroBanner (duration: 00m 41s)
  • 13:02 moritzm: rebooting restbase2004 for update to Linux 4.9 and new OpenJDK
  • 12:34 gehel: upgrade ELK on deployment-logstash2
  • 12:19 moritzm: rebooting restbase2003 for update to Linux 4.9 and new OpenJDK
  • 11:47 marostegui: Stop replication at the same position on db1049 and db2023 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 11:45 marostegui: Disable replication codfw > eqiad on s5 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 11:35 moritzm: rebooting restbase2002 for update to Linux 4.9 and new OpenJDK
  • 11:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1097 - T147166 T130067 (duration: 00m 39s)
  • 11:03 elukey: forced net.netfilter.nf_conntrack_tcp_timeout_time_wait = 65 to all the kafka brokers
  • 10:39 ayounsi@tin: Finished deploy [librenms/librenms@259e998]: (no justification provided) (duration: 00m 09s)
  • 10:39 ayounsi@tin: Started deploy [librenms/librenms@259e998]: (no justification provided)
  • 10:35 akosiaris@tin: Finished deploy [librenms/librenms@259e998]: (no justification provided) (duration: 00m 02s)
  • 10:35 akosiaris@tin: Started deploy [librenms/librenms@259e998]: (no justification provided)
  • 10:34 elukey: reboot kafka1022 for kernel upgrades
  • 10:09 elukey: reboot kafka1020 for kernel upgrades
  • 09:57 moritzm: restarting hhvm on mw1190, deadlocked in HPHP::Treadmill::getAgeOldestRequest
  • 09:41 marostegui: Stop replication at the same position on db1097 and db2019 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 09:37 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T147166 T130067 (duration: 00m 41s)
  • 09:21 marostegui: Disable replication codfw > eqiad on s4 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 09:12 marostegui: Run pt-table-checksum on s7.eswiki - T163190
  • 09:07 hoo: Removed 2fa from global account Jcornelius (T164682)
  • 08:05 godog: roll-restart swift proxy for ratelimit middleware - T162793
  • 07:53 moritzm: uploaded kubernetes 1.4.2-6 for stretch-wikimedia to apt.wikimedia.org
  • 07:34 moritzm: removing unneeded rpcbind/nfs-common packages (T106477)
  • 07:31 marostegui: Stop replication at the same position on db1050 and db2028 - T147166 T130067
  • 07:27 marostegui: Disable replication codfw > eqiad on s6 - T147166 T130067
  • 07:12 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool es2019 - T149526 (duration: 00m 39s)
  • 07:11 elukey: reboot kafka1014 for kernel upgrades
  • 07:01 _joe_: installing the new version of python-service-checker across the fleet
  • 06:37 marostegui: Run pt-table-checksum on s7.cawiki - T163190
  • 06:01 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2038 - T162539 T163548 (duration: 00m 41s)
  • 05:54 marostegui: Deploy alter table on wikidatawiki.wb_terms on codfw master db2023 - https://phabricator.wikimedia.org/T162539 - https://phabricator.wikimedia.org/T163548
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 9 02:27:46 UTC 2017 (duration 5m 58s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 17s)

2017-05-08

  • 23:21 bd808@tin: Synchronized wmf-config/wikitech.php: Revert "Disable creation of new forms on wikitech" (T53642) (duration: 01m 10s)
  • 22:54 bd808@tin: Finished deploy [striker/deploy@00e8545]: openstack: Role modifications require global admin rights (T164787) (duration: 00m 27s)
  • 22:54 bd808@tin: Started deploy [striker/deploy@00e8545]: openstack: Role modifications require global admin rights (T164787)
  • 22:17 bd808: Deleted 2fa for user Mdann52 on wikitech after verifying account ownership via ssh file creation. T164804
  • 22:01 andrewbogott: rebooting labservices1002 to mess with the bios
  • 21:55 bblack: depooled cp3035 (memory issues - already scheduled for FE restart to fix, which will repool when it's reached in the list...)
  • 21:25 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 01m 39s)
  • 21:24 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:22 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 03m 51s)
  • 21:18 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:17 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 00m 38s)
  • 21:17 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:12 mobrovac@tin: Finished deploy [restbase/deploy@c70a1e1]: Remove the mobile-text end point - T158128 (duration: 06m 23s)
  • 21:06 mobrovac@tin: Started deploy [restbase/deploy@c70a1e1]: Remove the mobile-text end point - T158128
  • 21:05 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 02m 43s)
  • 21:02 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:48 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 01m 36s)
  • 20:47 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:37 gehel: silencing elasticsearch shard icinga check, recovery after upgrade is going to take a long time - T161908
  • 20:34 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 04m 50s)
  • 20:30 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:27 gehel: restarted kibana on logstash cluster - T161908
  • 20:21 gehel: upgrading kibana on logstash cluster - T161908
  • 20:02 gehel: restarting elasticsearch on logstash cluster after upgrade - T161908
  • 19:47 gehel: logstash / elasticsearch downtime coming up - T161908
  • 19:34 bd808: Deployment of Striker for T162508 complete; will continue debugging the keystone issue that is preventing Tool Labs membership requests from being approved
  • 19:34 bblack: restarted varnishxcache service on cp3031, was malfunctioning and sending crazy stats to grafana...
  • 19:28 gehel: starting ELK (logstash) upgrade - T161908
  • 19:17 bd808@tin: Finished deploy [striker/deploy@3836477]: Implement Tool Labs membership application and processing (T162508) (duration: 00m 32s)
  • 19:17 bd808@tin: Started deploy [striker/deploy@3836477]: Implement Tool Labs membership application and processing (T162508)
  • 19:15 bd808: Forced puppet run on californium to provision new striker config settings
  • 19:07 bd808: Applied database migration for T162508 to striker database on m5-master
  • 18:58 MaxSem: Restarted tilerator and tileratorui across the cluster
  • 18:52 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164621 (duration: 00m 39s)
  • 18:50 bblack: running varnish frontend restarts to fix memory sizing on 256G+ hosts over the next ~4.5 h (mostly text+upload hosts)
  • 18:49 bblack: cp4006 repooled (frontend restarted)
  • 18:45 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164498 (duration: 00m 39s)
  • 18:44 bblack: running varnish frontend restarts to fix memory sizing on 96GB and 192GB hosts over the next ~45m (mostly maps+misc hosts)
  • 18:41 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/Popups/: T163198 (duration: 00m 39s)
  • 18:40 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp4006.ulsfo.wmnet
  • 18:40 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp4006.ulsfo.wmnet
  • 18:38 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/extension.json: T164472 (duration: 00m 39s)
  • 18:36 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/modules/ve-mw/dm/metaitems/ve.dm.MWFlaggedMetaItem.js: T164054 (duration: 00m 38s)
  • 18:33 catrope@tin: Synchronized php-1.29.0-wmf.21/includes: T100999 (duration: 01m 24s)
  • 18:33 maxsem@tin: Finished deploy [tilerator/deploy@001811e]: 001811e, was in testing for 3 weeks (duration: 00m 20s)
  • 18:32 maxsem@tin: Started deploy [tilerator/deploy@001811e]: 001811e, was in testing for 3 weeks
  • 18:30 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164614 (duration: 00m 40s)
  • 18:26 catrope@tin: Synchronized static/images/project-logos/: T163048 (duration: 00m 39s)
  • 18:15 gehel: restarting wdqs-updater
  • 17:08 gehel@tin: Finished deploy [wdqs/wdqs@e637cf0]: (no justification provided) (duration: 01m 36s)
  • 17:07 gehel@tin: Started deploy [wdqs/wdqs@e637cf0]: (no justification provided)
  • 16:27 _joe_: installing the new service-checker on restbase2001,scb2001
  • 16:01 papaul: ganeti200[7-8] - signing puppet certs, salt-key, initial run
  • 15:40 papaul: OS install on ganeti200[7-8]
  • 15:28 bblack: cp4016 repooled
  • 14:23 _joe_: uploading new version of service-checker to reprepro
  • 14:20 zeljkof: eu swat finished!
  • 14:19 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: $wmgRelatedArticlesShowInSidebar is now undefined (duration: 00m 39s)
  • 14:19 marostegui: Run pt-table-checksum on s7.arwiki - T163190
  • 14:15 chasemp: touch /forcefsck && /sbin/reboot labservices1002
  • 14:09 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Bengali logo to mobile site (T164652) (duration: 00m 39s)
  • 14:08 zfilipin@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-bn.svg: SWAT: Add Bengali logo to mobile site (T164652) (duration: 00m 39s)
  • 14:02 zeljkof: extending eu swat for a few minutes
  • 13:55 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: pagePreviews: Fix NavPopups gadget detection (T164044) (duration: 00m 39s)
  • 13:47 chasemp: labservices1002 'touch /forcefsck && sudo reboot'
  • 13:45 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Wikivoyage should show related pages in footer of skin (T164391) (duration: 00m 39s)
  • 13:44 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Wikivoyage should show related pages in footer of skin (T164391) (duration: 00m 39s)
  • 13:42 moritzm: depooled mw1264 (set to inactive), since the host is down (T164725)
  • 13:07 moritzm: restarting cassandra on restbase2001 to pick up openjdk security updates
  • 11:15 ladsgroup@tin: Synchronized wmf-config/Wikibase-production.php: Use redis lockManager for change dispatching (T159826) (duration: 00m 56s)
  • 11:14 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging$ scap sync-file wmf-config/Wikibase-production.php 'Use redis lockManager for change dispatching (T159826)'
  • 09:54 moritzm: upgrading mw1261-mw1264 to Linux 4.9
  • 09:30 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 2000 - T160640
  • 09:25 elukey: rolling restart of cassandra on aqs* hosts to pick up new jvm upgrades
  • 09:17 godog: swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785
  • 08:55 elukey: restart Kafka mirror maker on kafka101[24]
  • 08:47 elukey: reboot kafka1013 for kernel upgrades
  • 08:25 godog: swift eqiad-prod: ms-be1028/ms-be1039 container/account full weight - T160640
  • 08:06 Amir1: clean up party of ores_classification is done now (T159753) 10M rows deleted. Current number of rows: 76,586,043
  • 06:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Decommission db1024 - T162699 (duration: 00m 39s)
  • 06:30 marostegui@tin: Synchronized wmf-config/db-codfw.php: Decommission db1024 - T162699 (duration: 00m 39s)
  • 06:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2045, depool db2038 - T162539 T163548 (duration: 00m 40s)
  • 05:09 Amir1: start of cleaning up ores_classification rows for two hours (T159753)
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 8 02:27:37 UTC 2017 (duration 5m 58s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 25s)

2017-05-07

  • 21:09 elukey: depooled cp4016.ulsfo.wmnet (sudo -i depool from localhost) due to issues with vhtcpd (segfaults in dmesg).
  • 17:20 andrewbogott: clearing out broken instances in the nova fullstack queue and restarting the tests.
  • 17:12 andrewbogott: rebooting labservices1002 in hopes of getting its IO unstuck
  • 16:52 andrewbogott: switching primary designate server from labservices1002 to labservices1001
  • 16:07 andrewbogott: restarted designate-central on labservices1002 due to many log messages like 'Deadlock detected. Retrying...'
  • 16:05 andrewbogott: restarted pdns and pdns-recursor on labcontrol1002
  • 09:08 ema: cp4018: restart vhtcpd and varnish services; repool
  • 08:43 elukey: depooled cp4018.ulsfo.wmnet (sudo -i depool from localhost) due to issues with HTCP
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 7 02:27:14 UTC 2017 (duration 5m 59s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 07m 49s)

2017-05-06

  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 6 02:30:10 UTC 2017 (duration 6m 2s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 07m 38s)

2017-05-05

  • 20:21 demon@tin: Synchronized scap/scap.cfg: no-op (duration: 00m 39s)
  • 18:54 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 after maintenance (duration: 00m 40s)
  • 18:21 mutante: ocg1002 - apt-get clean'ed for disk space
  • 16:09 jynus: shutting down db1070 for hw maintenance T160392
  • 16:06 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 for hw maintenance (duration: 00m 39s)
  • 15:30 jynus: running schema change on puppet.fact_values (m1)
  • 15:28 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2045 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 15:28 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2052, depool db2045 - T162539 T163548 (duration: 00m 41s)
  • 15:18 elukey: increase nginx error log verbosity on mw2146 as test for T163674 (correct task)
  • 15:13 elukey: increase nginx error log verbosity on mw2146 as test for T164586
  • 15:04 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_upload
  • 14:59 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_text
  • 14:41 bblack: restarting all maps+misc varnish frontends for mem sizing update (spread over the next ~1.5h)
  • 14:30 bblack: restarting varnish frontend on cp4010 (text) for mem size update
  • 13:45 moritzm: installing remaining freetype security updates
  • 13:40 akosiaris@tin: Finished deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms (duration: 00m 03s)
  • 13:39 akosiaris@tin: Started deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms
  • 13:28 urandom: T163292: bootstrapping Cassandra on restbase1008-c
  • 13:25 chasemp: labstore1005/1004 'dpkg -i /home/jmm/*deb' for rpcbind fix (these are new security packages from moritzm)
  • 12:34 akosiaris@tin: Finished deploy [librenms/librenms@9fa1391]: (no justification provided) (duration: 00m 07s)
  • 12:34 akosiaris@tin: Started deploy [librenms/librenms@9fa1391]: (no justification provided)
  • 12:16 elukey: reboot kafka1018 for kernel upgrades
  • 11:30 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s)
  • 11:30 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:29 moritzm: installing openjdk-8 security updates/cassandra restarts on restbase staging clusters
  • 11:26 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:26 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:17 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:17 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:09 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 14s)
  • 11:08 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:02 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 23s)
  • 11:02 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:01 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 01s)
  • 11:01 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:01 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:00 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:00 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s)
  • 11:00 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 10:58 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 10:58 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 10:57 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 13s)
  • 10:57 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 09:00 elukey: re-arm keyholder on mira (new scap key added for librenms)
  • 08:48 elukey: re-arming keyholder on naos
  • 08:46 godog: swift codfw-prod: ms-be2001 - ms-be2012 weight 700 - T162785
  • 07:49 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore1002 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 07:11 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore2001 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 06:45 ema: starting cache_upload upgrades to varnish 4.1.6-1wm1
  • 05:55 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2052 - T162539 T163548
  • 05:55 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2059, depool db2052 - T162539 T163548 (duration: 00m 40s)
  • 04:21 mutante: scheduled long downtime for mailman I/O stats on fermium - until we find better ways to deal with the normal spikes causing alerts
  • 02:38 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 5 02:38:35 UTC 2017 (duration 5m 14s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 29s)
  • 01:26 urandom: T163292: starting bootstrap of restbase1018-b

2017-05-04

  • 20:06 maxsem@tin: Synchronized php-1.29.0-wmf.21/extensions/JsonConfig: https://gerrit.wikimedia.org/r/#/c/351749/ (duration: 00m 40s)
  • 19:01 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T164407 wgCognateReadOnly false for medium wikis (duration: 00m 39s)
  • 18:18 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T164407 wgCognateReadOnly false for small wikis (duration: 00m 40s)
  • 17:32 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_maps
  • 17:30 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate: Add stats tracking for CognateRepo method usage (duration: 00m 39s)
  • 17:01 thcipriani@tin: Synchronized wmf-config: Revert revert Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:59 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate/src/CognateStore.php: Construct DBReadOnlyError with null db (duration: 00m 39s)
  • 16:55 urandom: T163292: Starting bootstrap of restbase1018-a
  • 16:49 thcipriani@tin: Synchronized wmf-config: Revert Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:42 thcipriani@tin: Synchronized wmf-config: Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:29 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate: SWAT: Add read only mode T164407 (duration: 00m 56s)
  • 16:18 bblack: nginx upgraded to 1.11.10-1+wmf1 on all cache_misc
  • 16:14 thcipriani@tin: Synchronized README: test tin is back (duration: 01m 06s)
  • 16:09 filippo@tin: scap aborted: README (duration: 00m 28s)
  • 16:09 filippo@tin: Started scap: README
  • 16:03 urandom: T160759: restoring default Cassandra tombstone_threshold in eqiad
  • 16:00 godog: switch deployment server back to tin.eqiad.wmnet
  • 15:57 jynus@naos: Synchronized wmf-config/db-eqiad.php: Remove all read traffic from x1, es2 & es3-master-eqiad (duration: 01m 08s)
  • 15:45 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-rw,name=codfw
  • 15:45 bblack: nginx upgraded to 1.11.10-1+wmf1 on cp1051 (cache_misc)
  • 15:42 bblack: nginx upgraded to 1.11.10-1+wmf1 on cp1045 (cache_misc)
  • 15:36 godog: run-puppet-agent on cache_upload in codfw/swift for swift a/p in codfw
  • 15:34 chasemp: add cwd to acl*procurement-review for phab S4
  • 15:32 godog: run-puppet-agent on cache_upload in codfw/swift for swift a/a
  • 15:31 oblivian:: Setting swift-rw in eqiad UP
  • 15:31 oblivian:: Setting swift-rw in codfw DOWN
  • 15:16 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2059- https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 15:15 chasemp: labsdb1003 maintain-views --databases ptwikimedia,pawikisource,wbwikimedia,dtywiki --replace-all --debug T164103
  • 15:14 marostegui@naos: Synchronized wmf-config/db-codfw.php: Repool db2066, depool db2059 - T162539 T163548 (duration: 01m 06s)
  • 15:03 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Restore db1070 original weight - T160392 (duration: 00m 57s)
  • 14:46 oblivian:: Setting wdqs in codfw UP
  • 14:44 oblivian:: Setting restbase-async in eqiad DOWN
  • 14:43 oblivian:: Setting restbase in codfw DOWN
  • 14:43 _joe_: forcing a puppet run on cache (text,maps, misc) in eqiad/codfw to complete the switchback
  • 14:40 oblivian:: Setting restbase in eqiad UP
  • 14:39 oblivian:: Setting restbase-async in codfw UP
  • 14:36 moritzm: installing mysql-connector-java security updates on hadoop cluster
  • 14:35 _joe_: running puppet on varnishes in eqiad (text,misc,maps) to pick up the a/a traffic to services
  • 14:29 jynus: dropping and recreating user for maintain-views on labsdb1001 T164103
  • 14:24 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Increase db1070 weight - T160392 (duration: 01m 10s)
  • 14:23 chasemp: maintain-meta_p --databases dtywiki,pawikisource,ptwikimedia,wbwikimedia --debug labsdb1003 for T164103
  • 14:16 chasemp: maintain-meta_p --all-databases --purge --debug labsdb1001 for T164103
  • 14:09 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1070 with less weight - T160392 (duration: 01m 16s)
  • 14:03 chasemp: maintain-meta_p --all-databases --purge --debug labsdb1009/1010/1011 for T164103
  • 13:31 gehel: restart services on maps eqiad
  • 13:21 dereckson@naos: Synchronized wmf-config/throttle.php: Lift Account registration limit for cywiki for an event / T164482 (duration: 01m 08s)
  • 13:18 gehel: restart services on maps codfw
  • 13:15 gehel: restart services on maps-test
  • 12:42 marostegui: Stop MySQL db1070 for maintenance - T160392
  • 12:40 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1070 for maintenance - T160392 (duration: 01m 35s)
  • 12:28 marostegui: Deploy alter table enwiki.revision on dbstore1001 - T132416
  • 11:56 moritzm: installing mysql-connector-java security updates
  • 11:45 ema: starting cache_text upgrades to varnish 4.1.6-1wm1
  • 11:38 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 06s)
  • 11:36 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 25s)
  • 10:48 moritzm: installing tomcat security updates
  • 10:22 elukey: executed DEL ocg_job_status on rdb1007:6379 (new ocg_job_status hash is stored on the ocg* hosts) - T159850
  • 10:11 moritzm: restarting hhvm on mediawiki canaries to pick up freetype security update
  • 10:05 ema: restart varnish-be on cp2024 without RT experiment
  • 09:40 elukey: stop kafka on kafka1012 and reboot the host for kernel upgrade
  • 09:16 joal@naos: Finished deploy [analytics/refinery@9d35029]: (no justification provided) (duration: 02m 58s)
  • 09:13 joal@naos: Started deploy [analytics/refinery@9d35029]: (no justification provided)
  • 08:50 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove db1040 from config files as it will be decommissioned - T164057 (duration: 00m 48s)
  • 08:49 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove db1040 from config files as it will be decommissioned - T164057 (duration: 00m 55s)
  • 08:23 gehel: restart elasticsearch on relforge for JDK update
  • 07:59 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove tempdb2001 from config files as it will be decommissioned - T161712 (duration: 01m 07s)
  • 07:58 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove tempdb2001 from config files as it will be decommissioned - T161712 (duration: 01m 25s)
  • 07:25 _joe_: restarted cp3043 backend varnish at 7:13 UTC while trying to debug issues
  • 06:58 moritzm: installing freetype security updates
  • 06:26 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool tempdb2001, no longer needed - T161712 (duration: 01m 08s)
  • 06:17 marostegui: Stop MySQL on tempdb2001 to take a backup and prepare to decommission - T161712
  • 06:10 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2066 - T162539 T163548
  • 06:10 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool db2066 - T162539 T163548 (duration: 01m 25s)
  • 06:09 Dereckson: CentralAuth: Removed MediaWiki 2FA for Alexsh (T164265)
  • 06:03 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore2002 - T162539 T163548
  • 02:31 l10nupdate@naos: ResourceLoader cache refresh completed at Thu May 4 02:31:22 UTC 2017 (duration 5m 21s)
  • 02:26 l10nupdate@naos: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 02s)
  • 02:08 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3a (duration: 08m 37s)
  • 02:00 urandom: T160759: lowering tombstone threshold to 1000 on all eqiad nodes
  • 01:59 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3a
  • 01:58 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3 (duration: 03m 29s)
  • 01:54 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3
  • 01:51 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki (duration: 03m 28s)
  • 01:47 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki
  • 01:47 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki (duration: 04m 12s)
  • 01:42 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki
  • 01:22 urandom_: T160759: lowering tombstone_threshold on restbase1013 & restbase1014
  • 01:09 urandom_: T160759: starting restbase1012-a

2017-05-03

  • 22:59 RainbowSprinkles: gerrit: Quick restart to pick up logging config change
  • 22:47 ejegg: updated fundraising tools from 20afe9d to f2522cd
  • 22:23 ejegg: updated fundraising tools from a1e9342 to 20afe9d
  • 21:06 demon@naos: Synchronized README: No-op, forcing co-master sync (duration: 02m 28s)
  • 20:35 mutante: mw1167 - same as mw1166 (jobrunners) - there was a hhvm[12547]: Fatal error: unknown exception followed by mysql slow query, SELECT MASTER_TID_WAIT... | systemctl restart hhvm recovers it
  • 20:30 mutante: mw1166 - restart hhvm service (Fatal error: request has exceeded memory limit)
  • 20:13 urandom: T160759: restoring default tombstone thresholds, restbase101{3,4,6}
  • 19:57 mutante: mw1287 - also restarting hhvm (with systemctl restart)
  • 19:56 mutante: mw1287 - restarted crashed apache (proxy_fcgi:error)
  • 19:48 demon@naos: Finished scap: Cleaning up some unused branches, no-op (duration: 15m 13s)
  • 19:33 demon@naos: Started scap: Cleaning up some unused branches, no-op
  • 19:32 demon@naos: Pruned MediaWiki: 1.29.0-wmf.18 (duration: 00m 19s)
  • 19:30 demon@naos: Pruned MediaWiki: 1.29.0-wmf.20 [keeping static files] (duration: 00m 44s)
  • 19:27 ppchelko@naos: Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout (duration: 01m 39s)
  • 19:26 ppchelko@naos: Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout
  • 19:25 ppchelko@naos: Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 (duration: 07m 39s)
  • 19:18 ppchelko@naos: Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759
  • 18:48 papaul: db2084 - signing puppet certs, salt-key, initial run
  • 18:48 urandom: T160759: reducing tombstone threshold to 1000, restbase1014
  • 18:46 urandom: T160759: reducing tombstone threshold to 1000, restbase1016
  • 18:39 urandom: T160759: reducing tombstone threshold to 1000, restbase1013
  • 18:35 urandom: restarting restbase1016-c
  • 18:34 urandom: restarting restbase1013-b
  • 18:00 bblack: restart cp2005 backend (lag)
  • 17:34 moritzm: uploaded openjdk-8 u131 to apt.wikimedia.org
  • 17:14 jynus@naos: Synchronized wmf-config/InitialiseSettings.php: Disable cognate- it is causing an outage on x1 (duration: 01m 06s)
  • 16:30 jynus@naos: Synchronized wmf-config/db-eqiad.php: Fine-tune per-server load to reduce db connection errors (duration: 01m 27s)
  • 16:17 mutante: install2002 / db2084 - reverting live hack, re-enabling puppet. db2084 doesn't even talk to DHCP; all other new db servers are fine, just this one out of 22 is not. Seems to be an actually broken NIC: cable was switched, switch config was checked too
  • 16:08 mutante: install2002 - temp stop puppet to debug dhcp issue of db2084
  • 15:13 catrope@naos: Synchronized php-1.29.0-wmf.21/includes/logging/LogPager.php: Replace FORCE INDEX(ls_field_val) with IGNORE INDEX(ls_log_id) (https://gerrit.wikimedia.org/r/#/c/351653/ for T17441) (duration: 01m 14s)
  • 15:09 RoanKattouw: Live-hacked (cherry-picked) https://gerrit.wikimedia.org/r/#/c/351653/ onto naos and synced to mwdebug1002 for testing
  • 14:54 gehel: restart of elasticsearch on relforge
  • 14:43 END: (PASS) - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium)
  • 14:27 START: - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium)
  • 14:26 END: (PASS) - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium)
  • 14:25 START: - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium)
  • 14:25 godog: start swiftrepl on ms-fe1005
  • 14:24 END: (PASS) - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium)
  • 14:22 START: - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium)
  • 14:21 END: (PASS) - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium)
  • 14:21 START: - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium)
  • 14:20 END: (PASS) - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:20 MediaWiki: read-only period ends at: 2017-05-03 14:20:28.286697 (switchdc/oblivian@neodymium)
  • 14:20 root@naos: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-write mode in datacenter eqiad (duration: 00m 32s)
  • 14:19 START: - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:19 END: (PASS) - Set core DB masters in read-write mode in eqiad, ensure masters in codfw are read-only - t07_coredb_masters_readwrite (switchdc/oblivian@neodymium)
  • 14:19 START: - Set core DB masters in read-write mode in eqiad, ensure masters in codfw are read-only - t07_coredb_masters_readwrite (switchdc/oblivian@neodymium)
  • 14:19 END: (PASS) - Switch the Redis masters from codfw to eqiad and invert the replication - t06_redis (switchdc/oblivian@neodymium)
  • 14:19 START: - Switch the Redis masters from codfw to eqiad and invert the replication - t06_redis (switchdc/oblivian@neodymium)
  • 14:18 END: (PASS) - Switch traffic flow to the appservers from codfw to eqiad - t05_switch_traffic (switchdc/oblivian@neodymium)
  • 14:17 START: - Switch traffic flow to the appservers from codfw to eqiad - t05_switch_traffic (switchdc/oblivian@neodymium)
  • 14:16 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from codfw to eqiad - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 14:16 root@naos: Synchronized wmf-config/CommonSettings.php: Switch MediaWiki active datacenter to eqiad (duration: 00m 31s)
  • 14:15 START: - Switch MediaWiki master datacenter and read-write discovery records from codfw to eqiad - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 14:15 END: (PASS) - Wipe and warmup caches in eqiad - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 14:12 elukey: restart kafka-mirror-main-eqiad_to_analytics.service on kafka1012
  • 14:12 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 14:09 START: - Wipe and warmup caches in eqiad - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 14:08 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 14:08 END: (PASS) - Set core DB masters in read-only mode in codfw, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/oblivian@neodymium)
  • 14:08 START: - Set core DB masters in read-only mode in codfw, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/oblivian@neodymium)
  • 14:08 END: (PASS) - Set MediaWiki in read-only mode in codfw (db-codfw config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:07 root@naos: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-only mode in datacenter codfw (duration: 00m 45s)
  • 14:07 MediaWiki: read-only period starts at: 2017-05-03 14:07:08.261300 (switchdc/oblivian@neodymium)
  • 14:07 START: - Set MediaWiki in read-only mode in codfw (db-codfw config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:06 END: (PASS) - Stop MediaWiki jobrunners, videoscalers and cronjobs in codfw - t01_stop_maintenance (switchdc/oblivian@neodymium)
  • 14:01 START: - Stop MediaWiki jobrunners, videoscalers and cronjobs in codfw - t01_stop_maintenance (switchdc/oblivian@neodymium)
  • 14:00 godog: stop swiftrepl on ms-fe1005
  • 13:59 END: (PASS) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium)
  • 13:59 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium)
  • 13:59 END: (PASS) - Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium)
  • 13:58 START: - Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium)
  • 13:16 hashar: Restarting Jenkins
  • 13:06 marostegui: db1028: Increased /srv/ by 20G to clear the warning
  • 11:59 moritzm: rebooted kubernetes1002, not 1003
  • 11:59 moritzm: rebooting kubernetes1003 for update to Linux 4.9
  • 11:39 moritzm: rebooting kubernetes1001 for update to Linux 4.9
  • 11:37 oblivian@naos: Synchronized wmf-config: Changing the read-only reason for the DC switchover (T164177) (duration: 01m 20s)
  • 11:25 moritzm: uploaded nodepool 0.1.1+wmf7 to apt.wikimedia.org
  • 11:23 hashar: Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106
  • 11:16 jynus: restarting replication on s*, and x1 eqiad -> codfw
  • 11:02 hashar: Restarting Nodepool
  • 10:58 moritzm: upgrading nodepool on labnodepool1001 to a package including https://gerrit.wikimedia.org/r/351608
  • 10:18 END: (PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 10:17 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 10:14 END: (PASS) - Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:14 START: - Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:14 END: (PASS) - Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:13 START: - Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:13 _joe_: testing reverted steps of switchdc, non-dry-run --dc-from eqiad --dc-to codfw (should be noop)
  • 10:05 moritzm: installing icu security updates on trusty (jessie already fixed)
  • 09:50 marostegui: Restart db1097 to change its binlog to STATEMENT - T155099
  • 09:19 elukey: reboot mc[1019-1036].eqiad.wmnet for kernel upgrades
  • 09:18 moritzm: rebooting restbase1018 for update to Linux 4.9
  • 09:05 godog: rebuild mismounted FSes on ms-be1035 - T163673
  • 08:53 _joe_: rebooting restbase1018 T163280
  • 08:24 _joe_: deactivating restbase1018-vg for RAID failover and rebuild T163280
  • 08:01 hashar: Rolling back Jenkins 2.46.2 -> 2.46.1 - T144106
  • 07:53 hashar: Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106
  • 07:42 _joe_: rebuilding RAIDs on restbase1018 T163280
  • 07:35 hashar: Restarting Nodepool to catch up with python-jenkins 0.4.14
  • 07:35 moritzm: updated python-jenkins on labnodepool1001 to 0.4.14 (needed by latest Jenkins LTS)
  • 02:48 l10nupdate@naos: ResourceLoader cache refresh completed at Wed May 3 02:48:33 UTC 2017 (duration 5m 21s)
  • 02:43 l10nupdate@naos: scap sync-l10n completed (1.29.0-wmf.21) (duration: 14m 02s)
  • 01:41 mutante: kubernetes - puppet fails because "E: Unable to locate package cni"
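
The switchdc entries above record the exact start and end of the MediaWiki read-only period for the codfw->eqiad switchback. A minimal sketch of computing the length of that window from the two logged timestamps; the function name read_only_seconds is hypothetical, and the timestamp format is assumed from the log entries themselves:

```python
from datetime import datetime

# Format of the timestamps in the "read-only period starts/ends at" entries.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def read_only_seconds(start: str, end: str) -> float:
    """Return the length of the read-only window in seconds."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()

# Timestamps copied from the 14:07 and 14:20 entries of 2017-05-03.
start = "2017-05-03 14:07:08.261300"
end = "2017-05-03 14:20:28.286697"
seconds = read_only_seconds(start, end)
print(f"read-only for {seconds:.3f} s (about {seconds / 60:.1f} min)")
```

The same calculation applies to the 2017-05-02 test run, using the timestamps from that day's 16:57 and 17:35 entries.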

2017-05-02

  • 23:42 TimStarling: EtcdConfig changes all reverted
  • 23:17 tstarling@puppetmaster1001: conftool action : set/@read-only.yaml; selector: name=ReadOnly,scope=eqiad
  • 23:07 TimStarling: scap pull on mw2017 and mwdebug1001 for etcd testing
  • 23:00 TimStarling: locking scap on naos for deployment of EtcdConfig https://gerrit.wikimedia.org/r/#/c/351132/
  • 22:57 _joe_: upgrading python-conftool across the fleet
  • 22:38 mutante: gerrit (cobalt/gerrit2001) - deployed firewall change to allow ssh between gerrit servers for clustering, new iptables rules exist now (T152525)
  • 21:52 jynus: running previously failed alter tables on s3-eqiad T163912
  • 21:33 jynus: creating missing math table on bdwikimedia (s3)
  • 20:04 hashar: Restarting Jenkins for plugin rollback
  • 17:51 bblack: codfw->eqiad switchback: end-user edge traffic back to normal @ eqiad ( https://gerrit.wikimedia.org/r/#/c/351330/ ) - 10 minute TTL for bulk traffic pattern shift starts now.
  • 17:50 mobrovac@naos: Finished deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079 (duration: 06m 04s)
  • 17:48 papaul: new db servers signing puppet certs,salt-key, initial run
  • 17:44 mobrovac@naos: Started deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079
  • 17:40 END: (PASS) - Start MediaWiki jobrunners, videoscalers and maintenance in codfw - t09_start_maintenance (switchdc/volans@neodymium)
  • 17:39 mobrovac@naos: Finished deploy [restbase/deploy@6adb0f2]: (no justification provided) (duration: 01m 34s)
  • 17:38 START: - Start MediaWiki jobrunners, videoscalers and maintenance in codfw - t09_start_maintenance (switchdc/volans@neodymium)
  • 17:37 mobrovac@naos: Started deploy [restbase/deploy@6adb0f2]: (no justification provided)
  • 17:37 END: (PASS) - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/volans@neodymium)
  • 17:37 START: - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/volans@neodymium)
  • 17:35 END: (PASS) - Set MediaWiki in read-write mode in codfw - t08_stop_mediawiki_readonly (switchdc/volans@neodymium)
  • 17:35 MediaWiki: read-only period ends at: 2017-05-02 17:35:48.111079 (switchdc/volans@neodymium)
  • 17:35 START: - Set MediaWiki in read-write mode in codfw - t08_stop_mediawiki_readonly (switchdc/volans@neodymium)
  • 17:35 oblivian@puppetmaster1001: conftool action : set/val=test; selector: name=ReadOnly,scope=codfw
  • 17:33 END: (PASS) - Set core DB masters in read-write mode in codfw, ensure masters in eqiad are read-only - t07_coredb_masters_readwrite (switchdc/volans@neodymium)
  • 17:33 START: - Set core DB masters in read-write mode in codfw, ensure masters in eqiad are read-only - t07_coredb_masters_readwrite (switchdc/volans@neodymium)
  • 17:32 END: (PASS) - Switch the Redis masters from eqiad to codfw and invert the replication - t06_redis (switchdc/volans@neodymium)
  • 17:32 START: - Switch the Redis masters from eqiad to codfw and invert the replication - t06_redis (switchdc/volans@neodymium)
  • 17:31 END: (PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:31 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:23 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:23 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:20 END: (PASS) - Switch traffic flow to the appservers from eqiad to codfw - t05_switch_traffic (switchdc/volans@neodymium)
  • 17:17 START: - Switch traffic flow to the appservers from eqiad to codfw - t05_switch_traffic (switchdc/volans@neodymium)
  • 17:08 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:08 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:05 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: T164157 (duration: 01m 00s)
  • 17:03 END: (FAIL) - Set core DB masters in read-only mode in eqiad, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/volans@neodymium)
  • 17:03 START: - Set core DB masters in read-only mode in eqiad, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/volans@neodymium)
  • 16:58 END: (FAIL) - Set MediaWiki in read-only mode in eqiad - t02_start_mediawiki_readonly (switchdc/volans@neodymium)
  • 16:57 MediaWiki: read-only period starts at: 2017-05-02 16:57:37.952132 (switchdc/volans@neodymium)
  • 16:57 START: - Set MediaWiki in read-only mode in eqiad - t02_start_mediawiki_readonly (switchdc/volans@neodymium)
  • 16:56 ppchelko@naos: Finished deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check timeout (duration: 07m 56s)
  • 16:53 END: (FAIL) - Stop MediaWiki jobrunners, videoscalers and cronjobs in eqiad - t01_stop_maintenance (switchdc/volans@neodymium)
  • 16:53 START: - Stop MediaWiki jobrunners, videoscalers and cronjobs in eqiad - t01_stop_maintenance (switchdc/volans@neodymium)
  • 16:52 END: (PASS) - Disabling puppet on selected hosts in eqiad and codfw - t00_disable_puppet (switchdc/volans@neodymium)
  • 16:51 START: - Disabling puppet on selected hosts in eqiad and codfw - t00_disable_puppet (switchdc/volans@neodymium)
  • 16:51 END: (PASS) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 END: (FAIL) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:48 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check timeout
  • 16:47 volans: testing (not dry-run) tasks for tomorrow's switchover in reverse mode eqiad->codfw
  • 16:43 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check fail
  • 16:42 ppchelko@naos: Finished deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements (duration: 05m 47s)
  • 16:37 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements
  • 16:36 END: (PASS) - Wipe and warmup caches in codfw - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 16:32 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 16:32 _joe_: message about cache warmup is wrong, it is being executed in eqiad
  • 16:29 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 16:29 START: - Wipe and warmup caches in codfw - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 16:29 _joe_: testing (not dry-run) cache wipe/warmup and redis resync for the switchover codfw->eqiad
  • 16:25 papaul: OS install on new db servers
  • 16:16 elukey@naos: Synchronized wmf-config/ProductionServices.php: Replace Redis lock IPs after hw refresh (duration: 01m 16s)
  • 15:53 oblivian@puppetmaster1001: conftool action : set/@read-only.yaml; selector: name=ReadOnly,scope=eqiad
  • 15:36 ema: cache_misc: upgrade varnish to 4.1.6-1wm1
  • 15:24 _joe_: restarting confd in eqiad/esams to pick up the server change
  • 15:20 godog: add 100G to graphite1003 and graphite2002
  • 15:01 elukey: stop and masked memcached on mc10[01-18].eqiad.wmnet
  • 14:35 moritzm: rebooting rdb1007 for update to latest 4.4 kernel
  • 14:22 moritzm: rebooting rdb1005 for update to latest 4.4 kernel
  • 13:52 moritzm: rebooting rdb1003 for update to latest 4.4 kernel
  • 13:39 moritzm: rebooting rdb1001 for update to latest 4.4 kernel
  • 13:26 gehel: stopping load on elastic2020 - T149006
  • 13:15 ema: cache_maps: upgrade varnish to 4.1.6-1wm1
  • 13:13 gehel: load testing elastic2020 before putting it back in the cluster - T149006
  • 13:03 godog: rebuild mismounted FSes on ms-be1036 - T163673
  • 12:22 moritzm: rebooting rdb1008 for kernel update to Linux 4.9
  • 12:19 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=pdf,name=ocg1001.eqiad.wmnet
  • 12:15 _joe_: manually set ocg1001,3 to be redis slaves of ocg1002
  • 11:47 moritzm: rebooting rdb1006 for kernel update to Linux 4.9
  • 11:37 gehel: restart of relforge cluster to activate hebrew plugin
  • 11:30 moritzm: rebooting rdb1004 for kernel update to Linux 4.9
  • 11:23 hashar: Restarting Nodepool
  • 11:23 moritzm: downgraded python-jenkins on labnodepool1001 to 0.2.1 (0.4.11 is still broken with the new Jenkins LTS)
  • 11:06 moritzm: rebooting rdb1002 for kernel update to Linux 4.9
  • 10:51 hashar: Restarting Nodepool with python-jenkins 0.4.11
  • 10:50 moritzm: upgrading python-jenkins on labnodepool1001 to 0.4.11
  • 10:44 akosiaris: create new ganeti nodegroup called row_A holding ganeti2005, ganeti2006. Renamed the default nodegroup to row_B. T164011
  • 10:20 elukey: restart ocg on ocg1002 (localhost:8000 - frontend - not reachable)
  • 10:12 hashar: Upgrading Jenkins to 2.46.1 - T144106
  • 10:11 jynus: stopping replication on db1015
  • 09:58 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 09:56 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 09:55 _joe_: testing pre-switchover the step to restart & resync redises in dc_to (eqiad)
  • 09:48 jynus@naos: Synchronized wmf-config/db-codfw.php: Add db1097 (duration: 01m 00s)
  • 09:47 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1015 & add db1097 (duration: 01m 17s)
  • 09:36 hashar: Jenkins/CI is back up!
  • 09:34 hashar: Nodepool cannot add instances to Jenkins any more. Rolling back Jenkins to 2.32.3
  • 09:29 akosiaris: Set description for ganeti2005, ganeti2006 on asw-a-codfw. T164011
  • 09:27 akosiaris: create interface range ganeti on asw-a-codfw. T164011
  • 09:24 akosiaris: remove configuration from ge-8/0/0, ge-8/0/3 from asw-b-codfw for ganeti2005, ganeti2006 move to row A. T164011
  • 09:21 hashar: Starting Nodepool
  • 09:16 hashar: Stopping Nodepool
  • 09:14 hashar: OpenStack / wmflabs fails to create new instances
  • 08:40 hashar: Upgrading Jenkins to 2.46.2 - T144106
  • 08:40 elukey: run puppet and restart nutcracker on eqiad hosts with profile::mediawiki::nutcracker
  • 08:33 hashar: Upgrading Jenkins to 2.32.3 - T144106
  • 08:32 elukey: stop and mask redis on mc1001-mc1018 - T137345
  • 08:26 hashar: Upgrading Jenkins to 2.19.4 - T144106
  • 08:14 hashar: Installing Jenkins Pipeline plugin
  • 08:04 hashar: Installing Jenkins plugin Pipeline: Stage View https://plugins.jenkins.io/pipeline-stage-view
  • 08:04 hashar: Upgrading Jenkins to 2.7.4 - T144106
  • 07:59 elukey: Swap mc1001->mc1012 with mc1019->mc2030 - T137345 (more informative :)
  • 07:58 elukey: Swap mc1001->mc1012 with mc1019->mc2030
  • 07:36 _joe_: starting etcd replication codfw => eqiad
  • 06:46 _joe_: disabling etcd auth on conf1*, converting to use nginx for TLS/auth T159687
  • 03:10 mattflaschen@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs/: Urgent deploy: Fix FlaggedRevs fatal, and also a filter issue: T164096 and T164049 (duration: 00m 56s)
  • 02:45 tstarling@naos: Synchronized php-1.29.0-wmf.21/includes/config/EtcdConfig.php: EtcdConfig backported bug fixes (duration: 01m 02s)
  • 02:34 tstarling@naos: Synchronized wmf-config/CommonSettings.php: siteinfo hook (duration: 02m 39s)
  • 00:33 tstarling@puppetmaster1001: conftool action : set/@read-write.yaml; selector: name=ReadOnly
  • 00:33 tstarling@puppetmaster1001: conftool action : set/@dc-codfw.yaml; selector: name=WMFMasterDatacenter
  • 00:25 TimStarling: populating production etcd with initial mediawiki config keys
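
Several switchdc tasks above were retried after a FAIL before passing. A minimal sketch of tallying PASS/FAIL outcomes per task from the "END:" entries, assuming the line layout seen in this log ("END: (RESULT) - description - task_name (switchdc/user@host)"); the function name tally_results is hypothetical and not part of switchdc:

```python
import re
from collections import Counter

# Captures the result and the task identifier (e.g. t05_switch_datacenter).
END_RE = re.compile(r"END: \((PASS|FAIL)\) - .* - (t\d+_\w+) \(switchdc/")

def tally_results(entries):
    """Count (task, result) pairs across all switchdc END entries."""
    counts = Counter()
    for entry in entries:
        match = END_RE.search(entry)
        if match:
            result, task = match.group(1), match.group(2)
            counts[(task, result)] += 1
    return counts

# The three t05_switch_datacenter END entries of 2017-05-02, copied from the log.
entries = [
    "17:31 END: (PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)",
    "17:23 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)",
    "17:08 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)",
]
counts = tally_results(entries)
for (task, result), n in sorted(counts.items()):
    print(f"{task}: {result} x{n}")
```

START entries and non-switchdc entries do not match the pattern and are ignored.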

2017-05-01

  • 23:41 mutante: netmon1002 - signed puppet cert, initial puppet run, accept salt-key,.. (T159756)
  • 23:15 mutante: netmon1002 - boot into PXE, initial OS install (T159756)
  • 23:06 bd808: Ran puppet cert clean striker-deploy03.striker.eqiad.wmflabs on labcontrol1001
  • 19:43 ejegg: updated payments-wiki from 4c56302 to 57451de
  • 19:10 mobrovac@naos: Finished deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version (duration: 02m 08s)
  • 19:08 mobrovac@naos: Started deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version
  • 18:46 mutante: temp. re-enabling puppet on restbase1018 and running it once to fix icinga config syntax error. then disabling it again. restbase service stopped before and after. this box has a broken disk.
  • 18:35 mutante: brought mc1018 back up, ran puppet on it and then on Icinga. parent was adjusted from asw-d-eqiad to asw2-2-eqiad. reduced icinga config errors by 50% :p (1 of 2 left, restbase1018)
  • 18:28 mutante: powercycling mc1018
  • 18:19 mutante: manually removed asw-d-eqiad remnants from /etc/icinga/puppet_hosts.cfg to fix icinga config after gerrit:351167 / T148506. fixes Icinga config error. then puppet adds it back
  • 18:03 andrewbogott: restarting nova-fullstack tests but saving instance 2d60e8c5-fb2a-4681-ac0a-ae2162bb13fb for future research
  • 17:03 mutante: phab2001 - start/stop phd service - that fixed "systemd state" icinga check, even though phd does not run just like before
  • 16:53 bblack: reverting inter-caching routing from codfw-switchover period: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback
  • 16:52 bblack@neodymium: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet
  • 16:19 mobrovac@naos: Finished deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 (duration: 02m 19s)
  • 16:17 mobrovac@naos: Started deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514
  • 15:46 jynus: shutting down db1063 for maintenance T164107
  • 15:13 bblack: restarting varnish backend on cp2002 (mailbox issues)
  • 12:58 Amir1: cleaning ores_classification rows half an hour or so (T159753)
  • 11:31 jynus: running alter table on categorylinks on db1054, 68, 62 T164185
  • 11:25 jynus: running alter table on enwiki.categorylinks on db1052 T164185
  • 03:46 tstarling@naos: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 01m 01s)
  • 03:44 tstarling@naos: Synchronized wmf-config/etcd.php: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 02m 39s)

2017-04-30

  • 16:35 urandom: T160759: Restoring default tombstone_threshold on restbase1009
  • 16:29 ppchelko@naos: Finished deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues (duration: 07m 27s)
  • 16:21 ppchelko@naos: Started deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues
  • 15:31 elukey: set tombstone_failure_threshold=1000 to restbase1009-a with P5165 on restbase1009-a - T160759
  • 15:24 elukey: set tombstone_failure_threshold=10000 to restbase1009-a with P5165 on restbase1009-a - T160759
  • 07:45 elukey: deleted /srv/cassandra-a/commitlog/CommitLog-5-1490738321543.log from restbase1009-a (empty commit log file created before OOM - backup in /home/elukey)

2017-04-29

  • 10:50 elukey: set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?)
  • 10:39 elukey: start ferm on kafka1020/18 (nodes were previously down for maintenance, not sure why ferm wasn't started)
  • 09:59 reedy@naos: Synchronized wmf-config/CommonSettings.php: Revert pdf processor firejails T164045 (duration: 02m 41s)

2017-04-28

  • 21:24 Dereckson: End of live debug on mwdebug1001, restored previous state with a local scap pull
  • 21:00 ejegg: updated payments-wiki from 1620b82 to 4c56302
  • 20:23 Dereckson: Live debug on mwdebug1001 for T164059
  • 19:30 jynus: shutting down db1063 - I see high temperatures reported, and going up T164107
  • 19:09 urandom: T163936: reenabling puppet on restbase-dev1001
  • 18:14 urandom: T163936: disabling puppet on restbase-dev1001 (t-shooting c-m-c)
  • 17:09 jynus: restarting replication on all nodes on s7-eqiad T164092
  • 16:38 jynus: stopping replication on all nodes on s7-eqiad in case db1062 boots up in a corrupted state
  • 16:36 jynus: restarting db1062 once more T164092
  • 15:56 godog: poweroff prometheus1004 for ram upgrade - T163385
  • 15:40 jynus: deploying new events_coredb_slave.sql on codfw T160984
  • 15:21 godog: poweroff prometheus1003 for ram upgrade - T163385
  • 14:55 gehel: shutting down elastic2020 for mainboard replacement - T149006
  • 14:32 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1063 IP and rack - T163895 (duration: 00m 48s)
  • 14:31 marostegui@naos: Synchronized wmf-config/db-codfw.php: Change db1063 IP and rack - T163895 (duration: 00m 50s)
  • 14:04 marostegui: Stop and shutdown db1063 - T163895
  • 14:04 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1062 rack location - T163895 (duration: 00m 52s)
  • 13:59 moritzm: installing ghostscript security updates
  • 13:56 urandom: T163936: restarting cassandra-metrics-collector, restbase production
  • 13:55 urandom: $ readlink /usr/local/lib/cassandra-metrics-collector/cassandra-metrics-collector.jar
  • 13:50 ema: varnish 4.1.6-1wm1 uploaded to apt.w.o
  • 13:46 urandom: T163936: restarting cassandra-metrics-collector on restbase1007
  • 13:46 marostegui@naos: Synchronized wmf-config/db-codfw.php: Change db1061 IP - T163895 (duration: 01m 00s)
  • 13:44 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1061 IP - T163895 (duration: 01m 19s)
  • 13:44 urandom: T163936: forcing puppet run on restbase1007
  • 13:30 marostegui: Stop MySQL and shutdown db1061 - T163895
  • 13:26 marostegui: Stop MySQL and shutdown db1062 - T163895
  • 10:47 akosiaris: migrate/evacuate ganeti2005, ganeti2006 for T164011
  • 10:42 akosiaris: reboot oresrdb1002 for kernel upgrade
  • 09:56 moritzm: installing libxslt security updates on trusty
  • 09:29 marostegui: upgrade mariadb db1059,db1056 from 10.0.22 to 10.0.28
  • 09:17 marostegui: upgrade mariadb db1071 from 10.0.23 to 10.0.28
  • 09:15 akosiaris: reboot oresrdb1001 for kernel upgrade
  • 09:02 marostegui: Upgrade mariadb on db1081 and db1084 from 10.0.23 to 10.0.28
  • 08:03 Amir1: cleanup done, 4M rows deleted (T159753)
  • 07:58 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1045 - T162539 T163548 (duration: 02m 38s)
  • 06:48 Amir1: cleaning around 5-10M rows in ores_classification in enwiki (half-an-hour script, T159753)
  • 01:18 ejegg: rolled payments-wiki back to 1620b82
  • 01:15 ejegg: updated payments-wiki from 1620b82 to 4c56302

2017-04-27

  • 23:36 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/SecurePoll/includes/pages/CreatePage.php: Stop gap for fix global election creation (T164043) (duration: 00m 43s)
  • 23:34 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner on viwikivoyage (T163662) (duration: 00m 46s)
  • 23:29 ejegg: rolled back payments-wiki to 1620b82
  • 23:29 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Enable responsive references on elwiki (T163074) (duration: 00m 49s)
  • 23:27 ejegg: updated payments-wiki from 1620b82 to 4c56302
  • 23:22 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Set ORES thresholds in new format for all enabled wikis (T162760) (duration: 00m 53s)
  • 23:16 catrope@naos: Synchronized php-1.29.0-wmf.21/includes/deferred/LinksUpdate.php: Release prior row locks beforehand in LinksUpdate::updateCategoryCounts (T163801) (duration: 01m 01s)
  • 23:13 catrope@naos: Synchronized wmf-config/CirrusSearch-common.php: Enable sistersearch title profile for wikivoyage (duration: 01m 19s)
  • 21:57 cwd: updated process-control to 1.0.6
  • 21:56 volans: shutting down gadolinium, it came up 1h25m ago and stole the public IP from meitnerium
  • 21:08 ppchelko@naos: Finished deploy [restbase/deploy@61c1ceb]: Automatically rerender parsoid, only store summaries if they are changed, don't rerender data-parsoid (duration: 07m 16s)
  • 21:01 ppchelko@naos: Started deploy [restbase/deploy@61c1ceb]: Automatically rerender parsoid, only store summaries if they are changed, don't rerender data-parsoid
  • 20:53 ppchelko@naos: Finished deploy [restbase/deploy@fcfc537]: Automatically rerender parsoid, only store summaries if they are changed (duration: 11m 33s)
  • 20:53 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.21
  • 20:47 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs: deploy fix for T163994 (duration: 01m 17s)
  • 20:42 ppchelko@naos: Started deploy [restbase/deploy@fcfc537]: Automatically rerender parsoid, only store summaries if they are changed
  • 20:37 mutante: ocg1001 - has been reinstalled but ocg package deployment fails currently "has the minion key been accepted", should not be repooled just yet
  • 20:32 mutante: ores/cache::misc: switch ores back to codfw-only - everything is like it was before the failed deploy yesterday again
  • 20:21 andrewbogott: stripping a bunch of unneeded extensions from wikitech-static
  • 20:20 mutante: ocg1001 - re-added to puppet, initial run, reinstall ongoing (T161158)
  • 20:18 mutante: ores is active/active now, for a short time
  • 20:16 mutante: ocg1001 - revoke old puppet cert, salt key
  • 20:15 mutante: run puppet on cache::misc to push ores change - cumin -b 5 -s 10 'R:class = role::cache::misc' 'run-puppet-agent -q'
  • 20:03 twentyafterfour: 1.29.0-wmf.21 is blocked by T163994
  • 20:01 mutante: ocg1001 - reboot into PXE, re-install
  • 19:59 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs/frontend/FlaggedRevsUI.hooks.php: deploy fix for T163994 (duration: 01m 04s)
  • 19:33 twentyafterfour: start mediawiki deployment train group 2 - all wikis to 1.29.0-wmf.21
  • 19:24 reedy@naos: Synchronized wmf-config/CommonSettings.php: Run pdf processors in firejails T164000 (duration: 01m 20s)
  • 19:20 XenoRyet: Updated paymentswiki from ee7d402 to 1620b82
  • 18:47 addshore: Morning SWAT Done!
  • 18:46 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT WMDE Spring campaign - Remove logging (no longer needed) (duration: 00m 47s)
  • 18:44 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT wmgUseGettingStarted true for dewiki (duration: 00m 48s)
  • 18:41 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable Cognate Logging (duration: 00m 48s)
  • 18:40 XenoRyet: Roll back paymentswiki from 030b2f9 to ee7d402
  • 18:34 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch: SWAT #1 #2 (duration: 00m 59s)
  • 18:31 addshore@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT update name of sistersearch profile for wikivoyage (duration: 00m 49s)
  • 18:24 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/WikimediaEvents/WikimediaEventsHooks.php: SWAT WMDE Spring campaign - Remove hook PT2/2 (duration: 00m 52s)
  • 18:23 urandom: T163936: restarting cassandra-metrics-collector, restbase production
  • 18:22 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/WikimediaEvents/extension.json: SWAT WMDE Spring campaign - Remove hook PT1/2 (duration: 00m 57s)
  • 18:21 urandom: T163936: restarting cassandra-metrics-collector, restbase staging
  • 18:20 addshore@naos: Synchronized php-1.29.0-wmf.21/includes/api/ApiQueryPagePropNames.php: SWAT Do not add limit to ApiQueryPagePropNames when database type is mysql (duration: 01m 04s)
  • 18:17 twentyafterfour: restarting apache on iridium to hotfix T164005
  • 18:07 addshore@naos: Synchronized wmf-config/Wikibase-production.php: SWAT Fix echoIcon for wikibase in testwikis (duration: 01m 27s)
  • 17:44 XenoRyet: Updated paymentswiki from ee7d402 to 030b2f9
  • 17:36 ladsgroup@naos: Finished deploy [ores/deploy@68cca85]: (no justification provided) (duration: 21m 50s)
  • 17:30 _joe_: started pybal on lvs1006 after network was fixed
  • 17:25 XenoRyet: reverted paymentswiki from 030b2f9 to ee7d402
  • 17:20 XenoRyet: Updated paymentswiki from ee7d402 to 030b2f9
  • 17:15 ladsgroup@naos: Started deploy [ores/deploy@68cca85]: (no justification provided)
  • 17:15 Amir1: ladsgroup@naos:/srv/deployment/ores/deploy$ scap deploy (T163950)
  • 17:12 demon@naos: Pruned MediaWiki: 1.29.0-wmf.18 [keeping static files] (duration: 00m 20s)
  • 17:08 _joe_: stop pybal on lvs1006 to stop announcing via BGP
  • 17:08 demon@naos: Pruned MediaWiki: 1.29.0-wmf.16 (duration: 00m 13s)
  • 17:04 demon@naos: Synchronized scap/plugins/clean.py: One last fix (duration: 01m 04s)
  • 16:53 gehel: unbanning all elasticsearch servers in eqiad row D - T148506
  • 16:48 demon@naos: Synchronized scap/plugins/clean.py: --keep-static is nice now. Also need a co-master sync (duration: 01m 28s)
  • 16:45 andrewbogott: re-enabling labs instance creation/deletion
  • 16:42 demon@naos: Pruned MediaWiki: 1.29.0-wmf.19 [keeping static files] (duration: 00m 15s)
  • 16:32 gehel: unbanning elasticsearch servers in eqiad row D - elastic10(17|18|19|20) - T148506
  • 15:56 elukey: restart of jmxtrans on all the hadoop worker nodes
  • 15:51 andrewbogott: disabling labs instance create/delete to avoid hilarity during network maintenance
  • 15:50 elukey: forced 'service ferm start' on the failed analytics hosts
  • 15:46 marostegui: Upgrade db1091 mariadb from 10.0.23 to 10.0.28
  • 15:39 marostegui: Upgrade db1089 mariadb from 10.0.23 to 10.0.28
  • 15:34 marostegui: Upgrade db1090 mariadb from 10.0.23 to 10.0.28
  • 15:22 jynus: stopping all replication channels on dbstore1001 for topology changes
  • 14:34 ema: upgrade upload-codfw to varnish 4.1.5-1wm4 T145661
  • 14:29 marostegui: Stop MySQL and shutdown es2019 for HW replacement - T149526
  • 14:26 ema: varnish 4.1.5-1wm4 uploaded to apt.w.o T145661
  • 14:08 marostegui: Deploy alter table labswiki.revision on labtestweb2001 - T132416
  • 14:04 marostegui: Deploy alter table labswiki.revision on silver - T132416
  • 13:57 _joe_: restarting HHVM on mw2213, stuck in HPHP::Treadmill::getAgeOldestRequest
  • 13:52 ladsgroup@naos: Synchronized wmf-config/Wikibase-production.php: SWAT: Set echoIcon for notification of wikibase in test wikis (T142102) (duration: 00m 57s)
  • 13:52 Amir1: start of scap sync-file wmf-config/Wikibase-production.php 'SWAT: Set echoIcon for notification of wikibase in test wikis (T142102)'
  • 13:45 ladsgroup@naos: Synchronized portals: (no justification provided) (duration: 01m 05s)
  • 13:44 ladsgroup@naos: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 01m 21s)
  • 13:43 Amir1: ladsgroup@naos:/srv/mediawiki-staging$ portals/sync-portals (T128546)
  • 12:53 volans: disabled puppet on rdb*
  • 12:06 marostegui: Upgrade es1011 and es1014 from mariadb 10.0.22 to mariadb 10.0.28
  • 11:50 marostegui: Upgrade mariadb from 10.0.22 to 10.0.28 on es1015
  • 09:46 moritzm: upgrading mysql on bohrium/piwik
  • 09:25 _joe_: restarting all redis instances for jobqueues on eqiad to force a full resync with masters in codfw T163337
  • 08:55 jynus: deploying alter table to all wikis on s6 T163979
  • 08:54 _joe_: restarting redis rdb1001:6380 after cleaning up the current AOF files for investigation of T163337
  • 08:50 moritzm: installing django security updates
  • 08:29 godog: ms-be1039 issue "controller slot=3 pd 1I:1:5 modify disablepd" to force failed sdc - T163690
  • 08:25 ema: restart varnish-be on cp2024 with expiry thread RT experiment enabled
  • 08:19 ema: upgrade varnish to 4.1.5-1wm3 on cp2024
  • 07:56 elukey: aqs100[69] back serving AQS traffic
  • 07:55 ema: varnish 4.1.5-1wm3 uploaded to apt.w.o T145661
  • 07:16 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool hosts that needed to be moved for the network maintenance - T162681 (duration: 02m 32s)
  • 06:53 marostegui: Reboot es1014 for kernel upgrade - T162029
  • 06:50 elukey: executed kafka preferred-replica-election to rebalance topic leaders in the analytics cluster after maintenance
  • 06:45 marostegui: Reboot es1011 for kernel upgrade - T162029
  • 06:39 marostegui: Logging for the record: drop table hashs from s2, s3 and s7 (only places where it existed) - T54927
  • 06:23 _joe_: moving orphaned objects in ms-be1039's root partition in sdc1/stale_root to save space
  • 06:17 marostegui: Deploy schema change on s7 metawiki.pagelinks to remove partitioning on db1041 - T153300
  • 06:14 marostegui: Deploy alter table on s5 (wikidatawiki) on db1049 - T163548
  • 06:14 marostegui: Deploy alter table on s5 (wikidatawiki) on db1070 (running locally instead of neodymium as this host will be affected by the network maintenance) - T163548
  • 06:11 marostegui: Deploy alter table on s5 (wikidatawiki) on db1070 (running locally instead of neodymium as this host will be affected by the network maintenance) - T130067 T162539
  • 06:09 marostegui: Deploy alter table on s5 (wikidatawiki) on db1049 - T130067 T162539
  • 05:59 marostegui: Deploy alter table labsdb1003 (wikidatawiki) https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 05:24 Amir1: cleaning some rows in ores_classification in enwiki (T159753)
  • 03:44 ottomata: starting kafka broker on kafka1020
  • 03:40 ottomata: running kafka replica election to bring kafka1018 back as preferred leader
  • 02:21 Jamesofur: running populateEditCount.php in screen on wast for T163854, counting edits for board vote eligibility
  • 02:16 RoanKattouw: Reset 2FA for T163931 on labswiki
  • 00:14 twentyafterfour: starting phabricator update
  • 00:05 ebernhardson@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch/includes/Searcher.php: cirrus: align sister search boost template config variable with documentation (duration: 00m 50s)
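
The entries above share a loose shape: a time, an author (optionally followed by @host), a colon, a free-text message, and for scap syncs and deploys a trailing "(duration: MMm SSs)". A minimal sketch for pulling those fields out of one line, assuming that shape holds; the names Entry and parse_entry are hypothetical and not part of any Wikimedia tooling:

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str                   # "HH:MM" as written in the log
    author: str                 # e.g. "catrope" or "_joe_"
    host: Optional[str]         # e.g. "naos"; None for hand-written entries
    message: str                # everything after the first "author:" colon
    duration_s: Optional[int]   # trailing "(duration: ...)" in seconds, if any


# Assumed line shape: optional bullet, time, author[@host], colon, message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+"
    r"(?P<author>[^\s:@]+)(?:@(?P<host>[^\s:]+))?:\s*(?P<message>.*)$"
)
DURATION_RE = re.compile(r"\(duration:\s*(\d+)m\s*(\d+)s\)\s*$")


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed fields, or None if the line is not a log entry."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    message = m.group("message")
    d = DURATION_RE.search(message)
    duration = int(d.group(1)) * 60 + int(d.group(2)) if d else None
    return Entry(m.group("time"), m.group("author"), m.group("host"),
                 message, duration)
```

Date headings and blank lines return None, so a caller can track the current date separately and attach it to each parsed entry.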

2017-04-26

  • 23:51 niharika29@naos: Synchronized php-1.29.0-wmf.21/includes/interwiki/ClassicInterwikiLookup.php: Interwiki: Dont override interwiki map order (T145337) (duration: 01m 00s)
  • 23:38 niharika29@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch/: Align other index template boosting config names (duration: 00m 57s)
  • 23:34 niharika29@naos: Synchronized wmf-config/InitialiseSettings.php: Increase max field count for wikidata; Enable Flow beta feature on arwiki (T155720) (duration: 00m 58s)
  • 23:31 niharika29@naos: Synchronized wmf-config/InitialiseSettings.php: Increase max field count for wikidata; Enable Flow beta feature on arwiki (T155720) (duration: 01m 04s)
  • 23:29 niharika29@naos: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] Increase max field count for wikidata (duration: 01m 23s)
  • 21:42 mutante: running puppet on all cache::misc nodes via cumin to switch ORES to eqiad
  • 21:30 mutante: restarting uwsgi-ores service on all scb2* with systemctl restart
  • 21:15 twentyafterfour: finished with mediawiki deployment train for group1. Everything appears stable, no increase in logspam.
  • 21:12 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 21:09 halfak@naos: Started restart [ores/deploy@cc12103]: (no justification provided)
  • 21:08 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/Flow/Hooks.php: sync https://gerrit.wikimedia.org/r/#/c/350481/ refs T163896 T161733 (duration: 01m 20s)
  • 21:05 arlolra: Updated Parsoid to 4949857a (T116508, T64270, T133673)
  • 20:55 arlolra@naos: Finished deploy [parsoid/deploy@8d109eb]: Updating Parsoid to 4949857a (duration: 06m 52s)
  • 20:48 arlolra@naos: Started deploy [parsoid/deploy@8d109eb]: Updating Parsoid to 4949857a
  • 20:48 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/350481/1 to get the train back on track refs T161733
  • 20:35 bsitzmann@naos: Finished deploy [mobileapps/deploy@b5afcb8]: Update mobileapps to 14bd4a5 (duration: 15m 17s)
  • 20:34 halfak@naos: Finished deploy [ores/deploy@cc12103]: T162892 (duration: 21m 28s)
  • 20:31 elukey: restart zookeeper on conf1003 after network maintenance
  • 20:20 bsitzmann@naos: Started deploy [mobileapps/deploy@b5afcb8]: Update mobileapps to 14bd4a5
  • 20:12 halfak@naos: Started deploy [ores/deploy@cc12103]: T162892
  • 19:50 elukey: restart kafka nodes (kafka1018 and kafka1020) after network maintenance
  • 19:45 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.20
  • 19:42 twentyafterfour: rolling back group1 to wmf.20 due to T163896 refs T161733
  • 19:31 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 19:24 twentyafterfour: begin deployment train: group1 wikis to 1.29.0-wmf.21 refs T161733
  • 19:22 bblack: initiating cumin-based restart of all varnish backends for cache_upload in codfw to downgrade from experimental package. 30 minute spacing, 10 hosts, ~5h to completion...
  • 19:17 thcipriani@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable collectionsaveascommunitypage right on es.wikipedia T163767 (duration: 00m 49s)
  • 19:05 bblack: restarting varnish frontend and backend on cp3033 to downgrade
  • 19:03 bblack: restarting varnish-frontend on cp2014 to downgrade
  • 18:58 thcipriani@naos: Synchronized wmf-config/CommonSettings.php: SWAT: Workaround issue of overriding whitelist config variable T163114 (duration: 00m 53s)
  • 18:56 bblack: downgrading varnish back to 4.1.5-wm1 on all -wm2 hosts
  • 18:50 thcipriani@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch: SWAT: Provide a way to blacklist a set of wikis for crosswiki search T163546 (duration: 01m 02s)
  • 18:44 thcipriani@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Adjust sistersearch against wikivoyage to require title matching T163547 (duration: 01m 11s)
  • 18:38 thcipriani@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Configure multimedia search template boosting T163223 (duration: 00m 53s)
  • 18:30 thcipriani@naos: Synchronized php-1.29.0-wmf.20/extensions/SecurePoll: SWAT: Add voter scripts for board/fdc election 2017 T163854 (duration: 00m 57s)
  • 18:26 thcipriani@naos: Synchronized php-1.29.0-wmf.21/extensions/SecurePoll: SWAT: Add voter scripts for board/fdc election 2017 T163854 (duration: 01m 00s)
  • 18:23 thcipriani@naos: Synchronized dblists/commonsuploads.dblist: SWAT: Enable local uploads on knwiki T133137 (duration: 01m 06s)
  • 18:16 ema: start varnish-frontend on cp2014
  • 18:14 jynus: running alter table on all wikis of s3 T163912
  • 17:49 jynus: rebooting es1019 for upgrading and to fix race condition on services
  • 17:46 elukey: restart nutcracker on the eqiad mw hosts to pick up the new shard config (spamming elasticsearch memcached and triggering alarms)
  • 17:44 elukey: unmasking and starting daemons on restbase-dev1003
  • 17:41 reedy@naos: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 01m 23s)
  • 17:02 mobrovac@naos: Started restart [trending-edits/deploy@7112062]: Restart for ICU lib update
  • 17:01 mobrovac@naos: Started restart [mobileapps/deploy@5c2b9a9]: Restart for ICU lib update
  • 17:00 mobrovac@naos: Started restart [mathoid/deploy@7eb4092]: Restart for ICU lib update
  • 16:43 mobrovac@naos: Started restart [electron-render/deploy@9156760]: Restart for ICU lib update
  • 16:39 mobrovac@naos: Started restart [graphoid/deploy@128206b]: Restart for ICU lib update
  • 16:37 mobrovac@naos: Started restart [eventstreams/deploy@05bcc8f]: Restart for ICU lib update
  • 16:37 mobrovac@naos: Started restart [electron-render/deploy@9156760]: Restart for ICU lib update
  • 16:36 mobrovac@naos: Started restart [cxserver/deploy@6899032]: Restart for ICU lib update
  • 16:34 mobrovac@naos: Started restart [citoid/deploy@b8c4cb2]: Restart for ICU lib update
  • 16:14 elukey: stop and mask cassandra and restbase on restbase-dev1003 for row-d maintenance
  • 16:07 _joe_: disabled and masked strongswan, memcached, redis on mc1013-17 for decommissioning
  • 15:43 XioNoX: VRRP priority removed, interfaces cr2/asw2 renamed - T148506
  • 15:40 _joe_: shutting down conf1003 T148506
  • 15:33 XioNoX: "cr2-eqiad# delete interfaces ae4 disable" done, confirmed links and LACP are up - T148506
  • 15:33 XioNoX: "cr2-eqiad# delete interfaces ae4 disable" done, confirmed links and LACP are up
  • 15:24 marostegui: Shutdown es2019 for maintenance with papaul and Dell - T149526
  • 15:12 XioNoX: switch ports for rack D7 and D8 configured - T148506
  • 14:47 marostegui: Stop MySQL db1070 (just in case) to test drac cold restart
  • 14:47 bblack@neodymium: conftool action : set/pooled=no; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet
  • 14:26 elukey: depooling aqs100[69] from AQS for network maintenance
  • 14:20 elukey: stop zookeeper on conf1003 for row-d maintenance (Hadoop, Kafka related)
  • 14:04 XioNoX: "cr2-eqiad# set interfaces ae4 disable" done, (1 ping loss) - T148506
  • 14:00 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1026, depool db1045 - T162539 T163548 (duration: 00m 53s)
  • 13:59 XioNoX: lowered VRRP priority for T148506
  • 13:58 andrewbogott: put labservices1001 into downtime to minimize (but probably not totally eliminate) alert spam
  • 13:56 andrewbogott: disabled instance creation on Horizon via https://gerrit.wikimedia.org/r/#/c/350414/ and on wikitech via a strategic edit in extensions/OpenStackManager/special/SpecialNovaInstance.php
  • 13:56 godog: downtime and poweroff ms-be 21 26 27 37 38 39 before switch relocation - T148506
  • 13:54 gehel: downtime "ElasticSearch health check for shards" checks for logstash and elasticsearch eqiad - T148506
  • 13:53 elukey: stop kafka on kafka1020 and kafka1018 for row-d extended maintenance (D2)
  • 13:44 _joe_: shutting down mc1013-18 for row D maintenance
  • 13:40 aude@naos: Synchronized wmf-config/CommonSettings-labs.php: (no justification provided) (duration: 00m 57s)
  • 13:32 aude@naos: Synchronized wmf-config/Wikibase-production.php: disable tabular-data for now on wikidata and enable echo notification on test wikis (duration: 01m 06s)
  • 13:29 marostegui: Deploy alter table on db1069 (wikidatawiki) https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:27 marostegui: Deploy alter table labsdb1001 https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:23 marostegui: Deploy alter table db1045 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:22 elukey: restart HDFS on analytics100[12] (Hadoop master nodes) to pick up recent topology changes for the cluster
  • 13:10 aude@naos: Synchronized wmf-config/throttle.php: (no justification provided) (duration: 01m 23s)
  • 13:02 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:00 ema: cp2017: restart varnish-be
  • 12:56 marostegui: Shutdown db1092 for maintenance - https://phabricator.wikimedia.org/T162681
  • 12:55 gehel: restart elasticsearch on relforge1001 to validate new config - T161830
  • 12:46 moritzm: installing mysql security updates (5.5 as packaged in Debian jessie)
  • 12:43 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 11:32 jynus: applying new events_coredb_slave.sql on db2055 T160984
  • 11:31 moritzm: rebooting mwlog2001 for update to Linux 4.9
  • 10:47 ladsgroup@naos: Synchronized wmf-config/Wikibase-labs.php: T142104, part II (duration: 00m 56s)
  • 10:45 ladsgroup@naos: Synchronized static/images/wikibase/echoIcon.svg: T142104, part I (duration: 01m 04s)
  • 10:44 marostegui: Deploy alter table on s5, on db1063 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 10:39 jynus@naos: Synchronized wmf-config/db-eqiad.php: switch s5 eqiad master from db1049 to db1063 (duration: 01m 24s)
  • 09:48 jynus: migrating s5 eqiad replicas under db1063
  • 09:42 jynus: restarting mariadb at db1063
  • 09:24 marostegui: Shutdown db1094, db1093, db1091 for maintenance - T162681
  • 09:16 marostegui: Shutdown es1019 for maintenance - T162681
  • 08:32 elukey: Gracefully stopping hadoop daemons on Hadoop nodes affected by Row-D maintenance
  • 08:30 marostegui: Deploy alter table on change_tag and tag_summary on silver and labtestweb2001 - T147166
  • 08:27 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool hosts that need to be moved for the network maintenance - T162681 (duration: 02m 25s)
  • 08:22 moritzm: reimaging terbium to jessie
  • 07:59 jynus: shutting down mariadb on db1040 as a backup before decommissioning
  • 07:48 marostegui: Deploy alter table on s1, on db1052 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:30 marostegui: Deploy alter table on s7, on db1062 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:24 marostegui: Deploy alter table on s4, on db1068 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:09 marostegui: Deploy alter table on s6, on db1061 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 06:56 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T162539 T163548 (duration: 02m 24s)
  • 06:45 marostegui: Deploy alter table on s2, on db1054 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 06:10 marostegui: Deploy alter table on s3, on db1075 (eqiad master) for tables: change_tag and tag_summary - T147166
  • 05:57 marostegui: Deploy alter table enwiki.revision on labsdb1011 - T132416
  • 00:20 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/Flow/modules/flow/ui/widgets/mw.flow.ui.ReplyWidget.js: T163749 (duration: 01m 24s)

2017-04-25

  • 22:24 mutante: mediawiki maintenance servers: last log entry was _before_ merging https://gerrit.wikimedia.org/r/#/c/342777/ and making a change
  • 22:23 andrewbogott: re-enabling dns on labservices1001
  • 22:22 mutante: mediawiki maintenance servers: making wasat identical to terbium. wasat is currently the active server running crons. no change there at all. on terbium where crons are inactive, some log files were removed
  • 22:13 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.29.0-wmf.21
  • 22:08 madhuvishy: Reenabled labs instance creation and deletion on horizon
  • 22:05 twentyafterfour@naos: Finished scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #5) (duration: 21m 52s)
  • 22:02 andrewbogott: causing an intentional outage of labs-ns0 and labs-recursor0 to make sure we're properly girded for tomorrow's switch replacement.
  • 21:43 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #5)
  • 21:41 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_66989801"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 03m 38s)
  • 21:38 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #4)
  • 21:33 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_930292683"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 03m 46s)
  • 21:30 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #3)
  • 21:23 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2414756836"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 00m 54s)
  • 21:23 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #2)
  • 21:09 twentyafterfour@naos: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_3498979833" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 56s)
  • 21:07 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733
  • 20:00 madhuvishy: Labs instance creation and deletion on horizon temporarily disabled via https://gerrit.wikimedia.org/r/350266
  • 19:50 demon@naos: Synchronized wmf-config/CommonSettings-labs.php: no-op, beta change (duration: 01m 58s)
  • 18:55 chasemp: restart nova-fullstack on labnet1001
  • 18:50 chasemp: downtime labservices1001 as we fail away from it and puppet staleness on labservices1002
  • 18:38 andrewbogott: disabling nova-api for another try at labservices failover
  • 18:33 twentyafterfour: Deployment Train: Branching mediawiki wmf/1.29.0-wmf.21 from master refs T161733
  • 17:36 jynus: running test schema change on etwiki on eqiad (depooled) T17441
  • 17:35 RainbowSprinkles: gerrit: Quick reboot to pick up new bouncycastle library
  • 17:25 arlolra: Updated Parsoid to 55b90511 (T153885, T163330, T89262, T154709, T162919, T161306)
  • 17:20 moritzm: rebooting ruthenium for update to Linux 4.9
  • 17:19 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 17:19 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 05s)
  • 17:18 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 08s)
  • 17:18 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 arlolra@naos: Finished deploy [parsoid/deploy@719d7bd]: Updating Parsoid to 55b90511 (duration: 08m 02s)
  • 17:17 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 17:17 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:11 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 02m 18s)
  • 17:09 arlolra@naos: Started deploy [parsoid/deploy@719d7bd]: Updating Parsoid to 55b90511
  • 17:08 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:54 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 25s)
  • 16:53 godog: flush wikiwix cache from planet2001 and rebuild files
  • 16:53 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:53 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 16:53 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:50 andrewbogott: labservices failover aborted due to cryptic routing/firewall issue
  • 16:45 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet,service=apache2
  • 16:44 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 20s)
  • 16:44 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:42 godog: flush wikiwix cache from planet1001 and rebuild files
  • 16:41 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet,service=apache2
  • 16:41 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet,service=apache2
  • 16:40 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=apache2
  • 16:38 andrewbogott: stopping nova-api for labservices switchover
  • 16:36 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 53s)
  • 16:35 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:29 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 04s)
  • 16:29 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 06s)
  • 16:17 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:09 thcipriani@naos: Synchronized README: test new scap version (duration: 01m 03s)
  • 15:59 akosiaris: restart pybal on lvs[2001-2002].codfw.wmnet,lvs[3001-3002].esams.wmnet,lvs[4001-4002].ulsfo.wmnet,lvs[1001-1002].wikimedia.org T159687
  • 15:50 moritzm: installing libav security updates
  • 15:48 bawolff@naos: Synchronized wmf-config/CommonSettings-labs.php: Test account creation limits on labs (duration: 01m 14s)
  • 15:47 akosiaris: restart pybal on lvs2003.codfw.wmnet,lvs3003.esams.wmnet,lvs4003.ulsfo.wmnet,lvs1003.wikimedia.org T159687
  • 15:46 marostegui: Stop replication on db1086 and db1094 in sync - https://phabricator.wikimedia.org/T130067
  • 15:36 mobrovac@naos: Finished deploy [changeprop/deploy@7521b2f]: Bring back the concurrency level - T163292 (duration: 01m 13s)
  • 15:35 mobrovac@naos: Started deploy [changeprop/deploy@7521b2f]: Bring back the concurrency level - T163292
  • 15:33 jynus: stopping replication on dbstore1001 to change its replication topology
  • 15:33 akosiaris: restart pybal on lvs[2004-2006].codfw.wmnet,lvs3004.esams.wmnet,lvs4004.ulsfo.wmnet,lvs[1004-1006].wikimedia.org T159687
  • 15:28 filippo@neodymium: conftool action : set/pooled=yes; selector: name=mw2017.codfw.wmnet
  • 15:27 mobrovac@naos: Finished deploy [changeprop/deploy@e0e3684]: Bring back the concurrency level - T163292 (duration: 00m 10s)
  • 15:26 mobrovac@naos: Started deploy [changeprop/deploy@e0e3684]: Bring back the concurrency level - T163292
  • 15:18 ema: start cache_text upgrade to linux 4.9 T162029
  • 15:14 marostegui: Deploy alter table s7 on watchlist table directly on the master (db1062) - https://phabricator.wikimedia.org/T130067
  • 15:14 filippo@neodymium: conftool action : set/pooled=no; selector: name=mw2017.codfw.wmnet
  • 14:59 jynus@naos: Synchronized wmf-config/db-eqiad.php: switch s7 eqiad master from db1041 to db1062 (duration: 00m 54s)
  • 14:54 bblack: upgrading nginx on cp1008
  • 14:30 bawolff@naos: Synchronized private/PrivateSettings.php: rv change to T163477 to see if it fixes logging (duration: 01m 14s)
  • 14:27 bawolff: Logging has seemed to stop after last deploy to private settings :(
  • 14:20 bblack: uploaded WMF nginx-1.11.10-1+wmf1 packages to jessie-wikimedia repo
  • 14:17 marostegui: Stop replication in sync on db1089 and db1083 for maintenance - https://phabricator.wikimedia.org/T130067
  • 14:08 jynus: restarting mariadb on db1062
  • 14:07 jynus: moving s7 eqiad replicas under db1062
  • 14:02 godog: poweroff ms-be1016 for controller swap - T150206
  • 14:02 bawolff@naos: Synchronized wmf-config/PrivateSettings.php: Hopefully cause previous changes to be picked up try2 (duration: 00m 44s)
  • 13:58 bawolff@naos: Synchronized wmf-config/PrivateSettings.php: Hopefully cause previous changes to be picked up (duration: 00m 44s)
  • 13:51 hashar: European SWAT complete
  • 13:49 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Re-enable ContentTranslation - T163344 (duration: 00m 44s)
  • 13:37 hashar@naos: Synchronized php-1.29.0-wmf.20/includes/media/TransformationalImageHandler.php: media: Capture stderr when running convert --version - T158649 (duration: 00m 47s)
  • 13:35 moritzm: rebooting einsteinium for update to Linux 4.9
  • 13:31 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Fix namespace Wikipedia_talk for zh_classicalwiki - T162547 (duration: 00m 48s)
  • 13:24 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Two namespace aliases for zh_classicalwiki - T162547 (duration: 00m 49s)
  • 13:22 marostegui: Deploy alter table on s3 (only etwiki) for tag_summary and change_tag tables - T147166
  • 13:20 hashar@naos: Synchronized php-1.29.0-wmf.20/includes: Fix bogus field reference in Category::getCountMessage() callback - T162941 (duration: 01m 14s)
  • 13:16 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Add NS aliases for zh_classicalwiki - T162547 (duration: 01m 00s)
  • 13:15 marostegui: Deploy alter table on silver.watchlist and labtestweb2001.labtestwiki for the watchlist table - T130067
  • 13:12 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Add Draft namespace to zh_classicalwiki - T163655 (duration: 01m 19s)
  • 13:10 hashar: zh_classicalwiki : renamed broken page via namespaceDupes.php : id=73504 ns=0 dbk=模板:Protected_logo -> 模板:Protected_logobroken
  • 12:35 marostegui: Stop replication in sync on db1092 and db1087 for maintenance - https://phabricator.wikimedia.org/T130067
  • 11:57 gehel: banning elasticsearch row D node in preparation for maintenance
  • 11:46 marostegui: Deploy alter table s5 on watchlist table directly on the master (db1049) - https://phabricator.wikimedia.org/T130067
  • 11:28 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1022, promote db1061 as the s6 eqiad master (duration: 01m 17s)
  • 11:27 marostegui: Deploy alter table s1 on watchlist table directly on the master (db1052) - https://phabricator.wikimedia.org/T130067
  • 11:01 jynus: switching eqiad s6 master to db1061
  • 10:45 jynus: stopping replication on db1050
  • 10:39 marostegui: Stop replication in sync on db1090 and db1076 for maintenance - https://phabricator.wikimedia.org/T130067
  • 10:15 jynus: restarting db1061's mysql process
  • 10:12 jynus: moving all slaves of s6 eqiad under db1061
  • 09:49 marostegui: Stop replication in sync on db1091 and db1084 for maintenance - T130067
  • 09:46 marostegui: Deploy alter table s2 on watchlist table directly on the master (db1054) - T130067
  • 09:10 jynus@naos: Synchronized wmf-config/db-eqiad.php: Promote db1054 as the new s2 master on eqiad (duration: 01m 19s)
  • 08:56 marostegui: Stop replication on db1088 and db1093 in sync - T130067
  • 08:53 jynus: stopping replication on s2-eqiad and restarting db1054
  • 08:52 marostegui: Deploy alter table s4 commonswiki.watchlist directly on db1068 (eqiad master) - T130067
  • 08:24 marostegui: Stop MySQL db1041 (eqiad master) to reclone db1062 from it - T163665
  • 08:03 jynus: moving all slaves of s2 eqiad under db1054
  • 07:14 ema: upgrade cp3033 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 06:34 marostegui: Deploy alter table on s3, all the wikis to the watchlist table on db1075, eqiad master - T130067
  • 06:10 marostegui@naos: Synchronized wmf-config/db-codfw.php: Restore db2061 original weight (duration: 00m 57s)
  • 06:06 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071, depool db1026 - T162539 T163548 (duration: 01m 17s)
  • 05:41 marostegui: Deploy alter table enwiki.revision on labsdb1009 and labsdb1010 - T132416
  • 02:22 bawolff: deployed patch for T163477
  • 01:42 MaxSem: Deployed security patches for T163166
  • 00:53 bawolff: unconfirming emails associated with T163477
  • 00:38 mutante: ocg1001 - powercycle into installer, was sitting at partman step with "failure to read from sda"...
  • 00:25 twentyafterfour: restarted apache2 on iridium to tune rate limiting value
  • 00:16 twentyafterfour@naos: Synchronized wmf-config/CommonSettings.php: fix "Notice: Undefined variable: wmgRelatedArticlesFooterWhitelistedSkins" (duration: 01m 11s)

2017-04-24

  • 23:41 twentyafterfour@naos: Synchronized wmf-config/: deploy https://gerrit.wikimedia.org/r/#/c/348472/ refs T163114 (duration: 01m 05s)
  • 23:22 ejegg: updated civicrm from 40d88c0 to 061cd61
  • 23:08 ejegg: updated civicrm from a11c108 to 40d88c0
  • 22:46 bawolff: deploy patch for T155277
  • 21:53 hoo: Updated the sites and site_identifiers tables on all Wikidata clients for dtywiki T161529.
  • 21:41 ejegg: updated civicrm from 51dbbad to a11c108
  • 19:52 mattflaschen@naos: Finished scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change (duration: 17m 06s)
  • 19:35 mattflaschen@naos: Started scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change
  • 19:10 bblack: cp2026: restart to wm2 varnish package
  • 18:42 thcipriani@naos: Synchronized wmf-config/throttle.php: SWAT: New throttle rule T163726 (duration: 01m 03s)
  • 18:19 thcipriani@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test (duration: 00m 53s)
  • 18:18 jynus: disabling mysql replication eqiad -> codfw on s[1-7] and x1 shards T155099
  • 18:10 thcipriani@naos: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Full path to xvfb-run (beta only change) (duration: 01m 07s)
  • 17:53 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2061 weight (duration: 00m 47s)
  • 17:46 marostegui: Alter table labtestwiki.user_groups on labtestweb2001 - T155605
  • 17:43 bblack: installing varnish 4.1.5-1wm2 on all cache_upload hosts @ codfw (no restarts)
  • 17:41 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight (duration: 00m 49s)
  • 17:36 demon@naos: Synchronized dblists/group0.dblist: moving labtestwiki to group0 (duration: 00m 54s)
  • 17:35 bblack: upgrade cp2024 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 17:28 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight - T163339 (duration: 00m 58s)
  • 17:19 gehel: restarting wdqs-updater for new configuration
  • 17:10 gehel@naos: Finished deploy [wdqs/wdqs@481346a]: (no justification provided) (duration: 01m 47s)
  • 17:08 gehel@naos: Started deploy [wdqs/wdqs@481346a]: (no justification provided)
  • 16:58 marostegui@naos: Synchronized wmf-config/db-codfw.php: Repool db2043 and db2061 with less weight - T163339 (duration: 01m 16s)
  • 16:56 godog: poweroff prometheus2004 for memory upgrade - T163386
  • 16:11 ema: upgrade cp2017 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 15:44 jynus: stopping all slaves on dbstore1001 for maintenance
  • 15:44 godog: poweroff prometheus2003 for memory upgrade - T163386
  • 15:28 mattflaschen@naos: Synchronized wmf-config/CommonSettings.php: T163696: Only copy filter thresholds if they are set (duration: 01m 10s)
  • 15:10 matt_flaschen: GuidedTour/RCFilters/ORES deployment complete and tested
  • 15:09 XioNoX: disabling the bgp session between pfw-codfw and cr2 for T163447
  • 15:07 ema: varnish 4.1.5-1wm2 uploaded to apt.w.o T145661
  • 15:06 matt_flaschen: Preference updates (for ORES on enwiki) done, using naos instead of terbium
  • 14:54 mattflaschen@naos: Synchronized php-1.29.0-wmf.20/extensions/ORES: Make the preference for the "r" flag on the RC page also control highlighting (duration: 00m 48s)
  • 14:50 mattflaschen@naos: Synchronized wmf-config/: Release RC Filters on more wikis and prep changes for that (duration: 00m 53s)
  • 14:39 matt_flaschen: Deployment of T152827 ("Enable GuidedTour on all wikis") complete and tested
  • 14:38 Dereckson: Created linter table on ptwikimedia and dtywiki
  • 14:34 mattflaschen@naos: Synchronized wmf-config/InitialiseSettings.php: Enable GuidedTour on all wikis (duration: 00m 59s)
  • 14:27 marostegui: Deploy alter table on s3 etwiki on watchlist table directly on the master (db1075) - T130067
  • 14:17 marostegui: Stop MySQL db2043 and db2061 for maintenance - https://phabricator.wikimedia.org/T163339
  • 14:14 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool db2043 and db2061 - T163339 (duration: 01m 08s)
  • 14:14 moritzm: rebooting ms1001 for kernel update to Linux 4.9
  • 14:10 hashar@naos: Finished scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1) (duration: 16m 06s)
  • 14:09 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 14:08 ema: re-pooling cp2002's varnish-be with increased priority for expiry thread T145661
  • 13:57 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 13:54 hashar@naos: Started scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1)
  • 13:50 addshore: Initial run of populateCognatePages.php complete. 27,595,121 rows in cognate_pages & 17,263,411 in cognate_titles
  • 13:49 godog: swift eqiad-prod: more weight on ms-be1028 -> ms-be1039 - T160640
  • 13:47 elukey: reimage analytics1003 to Jessie (Oozie/Hive/Camus not available during this timeframe in the Analytics Hadoop cluster)
  • 13:47 marostegui: Deploy unscheduled alter table on silver (labswiki.user_groups) - T159416
  • 13:26 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Enable user group expiry in production - T159416 (duration: 00m 49s)
  • 13:16 marostegui: Remove replication codfw - eqiad on s3 (db2018 codfw master will not be a slave of eqiad master) - https://phabricator.wikimedia.org/T130067 https://phabricator.wikimedia.org/T147166 T162133
  • 13:14 hashar@naos: Synchronized php-1.29.0-wmf.20/extensions/ProofreadPage/ProofreadPage.namespaces.php: Fix language code for Norwegian (duration: 00m 54s)
  • 13:12 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1082 - T162539 - T163548
  • 13:11 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1063 - T162539 https://phabricator.wikimedia.org/T163548
  • 13:10 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Make sysops able to grant/remove confirmed user group at cswiki - T163206 (duration: 00m 55s)
  • 13:09 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki - T163207 (duration: 01m 09s)
  • 13:06 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Set timezone to Asia/Kolkata on wb.wikimedia - T163322 (duration: 00m 44s)
  • 13:05 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Remove all feeds added in T127176 from RSS whitelist for mw.org - T163217 (duration: 00m 45s)
  • 13:03 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on zh_classicalwiki - T163043 (duration: 00m 46s)
  • 12:52 aude@naos: Synchronized wmf-config/Wikibase-production.php: Disable use of new column in wb_terms table for now (duration: 00m 48s)
  • 12:46 aude@naos: Synchronized wmf-config/Wikibase-production.php: (no justification provided) (duration: 00m 47s)
  • 12:41 Dereckson: pt.wikimedia.org and dty.wikipedia.org wikis creation done
  • 12:38 dereckson@naos: Synchronized wmf-config/interwiki.php: +dty +wmpt and other fixes (duration: 00m 48s)
  • 12:28 Dereckson: mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php dtywiki --backend=local-multiwrite (T162874)
  • 12:14 dereckson@naos: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for dty.wikipedia (T161529) (duration: 00m 49s)
  • 12:13 dereckson@naos: Synchronized langlist: +dty (T161529) (duration: 00m 50s)
  • 12:09 dereckson@naos: rebuilt wikiversions.php and synchronized wikiversions files: +dtywiki
  • 12:08 Dereckson: Create dtywiki database (T161529)
  • 12:08 dereckson@naos: Synchronized dblists: +dtywiki (duration: 00m 56s)
  • 12:07 dereckson@naos: Synchronized static/images/project-logos/: Logo for dty.wikipedia (T161529) (duration: 01m 13s)
  • 11:59 Dereckson: Purged https://pt.wikimedia.org/ URL (T126832)
  • 11:55 dereckson@naos: Synchronized multiversion/MWMultiVersion.php: Entry point for pt.wikimedia.org (T126832) (duration: 00m 44s)
  • 11:50 Dereckson: mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php ptwikimedia --backend=local-multiwrite (T126832)
  • 11:48 dereckson@naos: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for pt.wikimedia (T126832)
  • 11:42 dereckson@naos: rebuilt wikiversions.php and synchronized wikiversions files: +pt.wikimedia (T126832)
  • 11:42 dereckson@naos: Synchronized dblists/: Respawn pt.wikimedia configuration (duration: 00m 44s)
  • 11:41 Dereckson: Recreate database for ptwikimedia (T126832)
  • 11:28 dereckson@naos: Synchronized php-1.29.0-wmf.20/languages/messages/MessagesDty.php: Localize namespaces in Doteli (T162872) (duration: 00m 50s)
  • 11:27 dereckson@naos: Synchronized php-1.29.0-wmf.20/extensions/Gadgets/Gadgets.namespaces.php: Localize namespaces in Doteli (T162873) (duration: 00m 44s)
  • 11:26 dereckson@naos: Synchronized php-1.29.0-wmf.20/extensions/Scribunto/Scribunto.namespaces.php: Localize namespaces in Doteli (T162874) (duration: 00m 46s)
  • 11:16 addshore: addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 11:14 addshore@naos: Synchronized wmf-config/InitialiseSettings-labs.php: Deploy Cognate to production wiktionaries T150182 PT 4/4 (duration: 00m 47s)
  • 11:12 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Deploy Cognate to production wiktionaries T150182 PT 3/4 (touched) (duration: 00m 52s)
  • 11:02 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Deploy Cognate to production wiktionaries T150182 PT 3/4 (duration: 00m 57s)
  • 11:01 addshore@naos: Synchronized wmf-config/CommonSettings-labs.php: Deploy Cognate to production wiktionaries T150182 PT 2/4 (duration: 01m 01s)
  • 10:57 addshore@naos: Synchronized wmf-config/CommonSettings.php: Deploy Cognate to production wiktionaries T150182 PT 1/4 (duration: 01m 18s)
  • 10:28 addshore: addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist
  • 10:27 addshore: 180 rows added to cognate_titles & cognate_pages
  • 10:25 addshore: addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php zawiktionary
  • 10:25 addshore: 172 sites added to cognate_sites
  • 10:24 addshore: addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php enwiktionary --site-group=wiktionary
  • 10:16 addshore@naos: Finished scap: Add Cognate to extension-list T150182 (duration: 15m 26s)
  • 10:01 addshore@naos: Started scap: Add Cognate to extension-list T150182
  • 10:00 jynus: disabling puppet on app servers for apache config deploy T126832
  • 09:56 addshore@naos: Synchronized wmf-config/InitialiseSettings-labs.php: wmgUseInterwikiSorting true for wiktionaries PT 2/2 (duration: 00m 46s)
  • 09:54 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: wmgUseInterwikiSorting true for wiktionaries PT 1/2 (duration: 00m 47s)
  • 09:51 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Configure InterwikiSorting orders for Wiktionaries PT 2/2 (duration: 00m 48s)
  • 09:50 addshore@naos: Synchronized wmf-config/InterwikiSortOrders.php: Configure InterwikiSorting orders for Wiktionaries PT 1/2 (duration: 00m 53s)
  • 09:49 jynus: testing mediawiki changes on mwdebug1001
  • 09:44 addshore@naos: Synchronized docroot/noc/conf/InterwikiSortOrders.php.txt: NOOP Add InterwikiSortOrders to noc docroot (docs only) (duration: 01m 00s)
  • 09:42 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Use group0 to reduce lines for WMDE related config settings (duration: 01m 18s)
  • 09:15 marostegui: Stop MySQL on db1062 to back up its mysql - T163665
  • 09:14 jynus: dropping ptwikimedia from es1012,es1016,es1018,es2011,es2012,es2013, T126832
  • 09:11 jynus: dropping ptwikimedia from es3 T126832
  • 09:08 jynus: dropping ptwikimedia from es2 T126832
  • 09:04 jynus: dropping ptwikimedia from x1 T126832
  • 08:55 jynus: dropping ptwikimedia from s3 T126832
  • 08:03 marostegui: Deploy alter table enwiki.revision on db1095 (sanitarium2) - T132416
  • 07:34 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1080 and db1067 (duration: 01m 18s)
  • 06:23 marostegui: Deploy alter table enwiki.revision db1052 (eqiad master) - T132416
  • 06:12 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1087 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 06:12 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1092, depool db1087 - T162539 T163548 (duration: 02m 19s)

2017-04-23

  • 19:13 ema: cp2020: restart varnish-be
  • 17:49 jynus: disabling puppet on db2062 and upgrading MariaDB package to 10.1 T116557
  • 03:12 andrewbogott: removing files in /srv/deployment/ocg/postmortem on ocg1003, another case of T162780

2017-04-22

  • 13:41 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 07:53 jynus: restarting es2019.codfw.wmnet after upgrade
  • 07:43 jynus: powercycling es2019.codfw.wmnet, unresponsive
  • 07:21 jynus@naos: Synchronized wmf-config/db-codfw.php: Depool es2019 (duration: 02m 16s)
  • 03:21 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 02:56 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 02:18 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 00:34 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be

2017-04-21

  • 23:52 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2026.codfw.wmnet,service=varnish-be
  • 22:49 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2026.codfw.wmnet,service=varnish-be
  • 15:06 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase weight db2071 (duration: 01m 17s)
  • 14:32 marostegui: Analyze revision, logging and page table on s1 db1067 - https://phabricator.wikimedia.org/T116557
  • 14:26 ema: ban objects with CT < 1024 on codfw cache_upload T145661
  • 14:00 moritzm: installing postgresql bugfix update from jessie point release on labsdb1004
  • 13:35 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1092 - T162539 T163548
  • 13:20 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1092 - T162539 T163548 (duration: 01m 18s)
  • 12:51 akosiaris: reboot puppetmaster1002 for kernel upgrade
  • 12:07 marostegui: Analyze revision, logging and page table on s1 db1080 - T116557
  • 12:07 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Update db1080 depool reason (duration: 01m 18s)
  • 10:35 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T163109 (duration: 01m 20s)
  • 09:20 moritzm: rebooting etherpad1001 (running etherpad.wikimedia.org) for update to Linux 4.9
  • 09:10 jynus: stopping and upgrading/reconfiguring db2062 (depooled) T116557
  • 08:49 jynus@naos: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 01m 20s)
  • 08:32 akosiaris: looking at tcpircbot (logmsgbot) problems at tegmen
  • 08:20 elukey: rolling restart of aqs (nodejs) on aqs* to pick up upgrades
  • 08:01 moritzm: rolling restart of hhvm on application servers in eqiad to pick up ICU security update
  • 07:47 marostegui: Stop MySQL on db1071 and db1063 to reclone db1063 - T163109
  • 07:43 moritzm: installing further icu security updates
  • 06:21 marostegui: Restart MySQL on db1065 for maintenance - T163351
  • 06:09 marostegui: Deploy alter table enwiki.revision db1067 - T132416

2017-04-20

  • 22:28 twentyafterfour: enable rate limiting in phabricator
  • 22:17 paravoid: setting tw_reuse to 1 on dbproxy1003
  • 21:47 twentyafterfour: started phd on iridium
  • 21:31 twentyafterfour: stopped phd on iridium to reduce load on the database
  • 19:26 Amir1: deploy finished
  • 19:24 Amir1: start of ladsgroup@naos:/srv/mediawiki-staging/php-1.29.0-wmf.20$ scap sync-file php-1.29.0-wmf.20/extensions/ORES/includes/Hooks.php 'Disable ORES in Recentchangeslinked (T163063)'
  • 19:15 mutante: test logging in fundraising channel
  • 19:06 mutante: fixing duplicate ircecho situation - since today it should run from tegmen, the active icinga server
  • 17:51 mutante: restarted icinga-wm (ircecho) to pick up config change
  • 17:13 jynus: stopping replication on db1040
  • 17:09 andrewbogott: disabling puppet on serpens, seaborgium, pollux, dubnium, labservices1001, labservices1002 for tentative rollout of https://gerrit.wikimedia.org/r/#/c/348920/
  • 16:58 jynus: moving GTID s4 eqiad replicas under db1068
  • 16:46 ema: repool varnish-be on cp2017
  • 16:18 ema: depool varnish-be on cp2017
  • 16:08 elukey: uploaded piwik 2.17.1-1 to jessie-wikimedia main
  • 15:17 Amir1: deleting duplicate rows in ores_classification dated after revision 775502802 (dated April 15th) (T163337)
  • 15:16 XioNoX: disabling pybal on lvs2002 for T163323
  • 14:32 moritzm: upgrading tor on radium to 0.2.9.10
  • 14:23 moritzm: rebooting radium (tor relay) for kernel update to Linux 4.9
  • 14:09 moritzm: rebooting osmium for kernel update to Linux 4.9
  • 14:06 gehel: rolling restart of kartotherian / tilerator on maps codfw cluster
  • 13:58 gehel: rolling restart of kartotherian / tilerator on maps eqiad cluster
  • 13:58 marostegui: Stop MySQL on db1068 and db1081 for maintenance - T163110
  • 13:57 jynus: running reset slave all on db2019
  • 13:53 gehel: rolling restart of kartotherian / tilerator on maps-test cluster
  • 13:18 moritzm: restarting hhvm on mw2097/2098 to pick up icu security update
  • 13:11 elukey: upgrading Piwik to 2.17.1 (brief downtime during the maintenance announced)
  • 12:12 elukey: restart Yarn Resource manager on analytics1001 (hadoop master) to pick up new JVM settings
  • 12:11 moritzm: installing icu security updates
  • 11:32 _joe_: removing hack for jobqueue's refreshlinks T163418 from the jobrunners
  • 11:23 jynus: changing db2071 to replicate from db2016
  • 10:32 moritzm: installing remaining dbus updates from jessie point update
  • 10:07 elukey: restart Yarn Resource manager on analytics1002 (hadoop master standby) to pick up new JVM settings
  • 09:47 Amir1: running the cleanup script for ores_classification in enwiki
  • 09:38 _joe_: live-hack redeployed, running scap pull on codfw jobrunners T163418
  • 09:38 _joe_: live-hack redeployed, running scap pull on codfw jobrunners
  • 09:34 hashar@naos: Synchronized rpc/RunJobs.php: Revert "rpc: raise exception instead of die" - causes monitoring spam (duration: 01m 20s)
  • 09:17 _joe_: removed the live hack, running scap pull again on mw2154
  • 09:14 _joe_: scap pull of live hack for T163418 on mw2154
  • 08:47 _joe_: live-patching ./includes/jobqueue/jobs/RefreshLinksJob.php to drop all recursive jobs, T163418
  • 07:59 jynus: shutting down db1080 for cloning and upgrade T163413
  • 07:54 jynus@naos: Synchronized wmf-config/db-codfw.php: Add db2071, depooled (duration: 00m 53s)
  • 07:53 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 01m 02s)
  • 07:53 marostegui: Deploy alter table enwiki.revision db1065 - https://phabricator.wikimedia.org/T132416
  • 07:31 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1065 - T132416 (duration: 02m 18s)
  • 07:12 marostegui: Deploy alter table on s4.image on eqiad master db1040 (this will create lag on eqiad - all hosts have been silenced) - https://phabricator.wikimedia.org/T73563
  • 06:39 marostegui: Deploy alter table on s4.oldimage on eqiad master db1040 (this will create lag on eqiad - all hosts have been silenced) - T73563
  • 01:37 mutante: mw2150 - restarted hhvm (had 'thread leakage' alert)
  • 01:28 mutante: ran puppet on all (16) Dell R320 via cumin to add CPU frequency check
  • 00:37 ejegg: updated CiviCRM from 90d679b to 51dbbad

2017-04-19

  • 23:58 ejegg: updated payments-wiki from ccfbf98 to ee7d402
  • 22:37 papaul: OS installation on db2071
  • 21:44 ejegg: updated SmashPig from 17c56b0 to 200f63e
  • 21:37 krinkle@naos: Synchronized php-1.29.0-wmf.20/resources/src/startup.js: I34bbe8edf - Fix js fatal (duration: 01m 20s)
  • 20:08 ejegg: updated payments-wiki from 5398b23 to ccfbf98
  • 19:22 krinkle@naos: Synchronized php-1.29.0-wmf.20/resources/src/mediawiki/mediawiki.js: Ie50bdd (duration: 00m 58s)
  • 19:20 krinkle@naos: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/extension.json: T162604 (duration: 01m 20s)
  • 19:17 XenoRyet: Updated SmashPig from 3db064d to 17c56b0
  • 18:58 ejegg: rolled back payments-wiki to 5398b23
  • 18:56 ejegg: updated payments-wiki from 5398b23 to 68e3ac6
  • 18:27 ariel@naos: Finished deploy [dumps/dumps@ad621e6]: doc fixes thanks to awight (duration: 00m 04s)
  • 18:27 ariel@naos: Started deploy [dumps/dumps@ad621e6]: doc fixes thanks to awight
  • 18:25 ejegg: updated payments-wiki from 36f38f6 to 5398b23
  • 18:19 mobrovac: restbase stopping RB and disabling puppet on restbase1018 due to T163292
  • 18:18 ariel@naos: Finished deploy [dumps/dumps@101f8a4]: page range fixes and standalone scripts (duration: 00m 18s)
  • 18:18 ariel@naos: Started deploy [dumps/dumps@101f8a4]: page range fixes and standalone scripts
  • 17:27 Amir1: mwscript extensions/ORES/maintenance/CleanDuplicateScores.php on all wikis with ORES review tool enabled (T163337)
  • 17:26 thcipriani@naos: Synchronized docroot/noc/index.html: test scap on naos.codfw.wmnetdocroot/noc/index.html: trailing whitespace (duration: 02m 02s)
  • 17:25 mobrovac@naos: Started restart [restbase/deploy@1bfada4]: Restart to stop trying to connect to dead restbase1018 Cassandra instances - T163292
  • 17:08 thcipriani@naos.codfw.wmnet: test
  • 17:03 filippo@naos: Finished deploy [prometheus/jmx_exporter@7327459]: test deploy from naos (duration: 00m 03s)
  • 17:03 filippo@naos: Started deploy [prometheus/jmx_exporter@7327459]: test deploy from naos
  • 17:02 godog: bounce tcpircbot on einsteinium to pick up changes
  • 17:02 _joe_: manually running enwiki refreshLinks jobs to catch up a bit
  • 16:59 papaul: power balancing on mw2215
  • 16:58 Amir1: ladsgroup@naos:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki froze
  • 16:49 Amir1: ladsgroup@naos:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki (T163337)
  • 16:33 godog: deploy.fixurl on G@deployment_target:* after deployment server switchover
  • 16:20 gehel: disabling deprecation warning logs on elasticsearch eqiad - T163345
  • 16:19 jynus: setting db2033 as read write
  • 16:13 godog: run puppet on naos.codfw.wmnet - new deployment server
  • 16:03 gehel: disabling deprecation warning logs on elasticsearch codfw - T163345
  • 15:51 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=elasticsearch,name=elastic2020.*
  • 15:49 jynus: shutting down db2033 (x1-master)
  • 15:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw2256.*
  • 15:48 jynus@tin: Synchronized wmf-config/db-codfw.php: Failing over x1-master (duration: 00m 41s)
  • 15:46 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic2020.codfw.wmnet
  • 15:42 jynus@tin: Synchronized wmf-config/InitialiseSettings.php: Disable cx_translation- it is causing an outage on x1 (duration: 02m 44s)
  • 15:40 dzahn@puppetmaster2001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet
  • 15:32 mutante: mw2256 went down and showed " PANIC: double fault, error_code: 0x0"
  • 15:16 jynus@tin: Synchronized wmf-config/db-codfw.php: Pool db2055 as an additional API server (duration: 01m 02s)
  • 15:11 _joe_: ran cumin 'R:class = role::mediawiki::jobrunner and *.eqiad.wmnet' 'systemctl reset-failed' manually
  • 15:07 godog: start swiftrepl on ms-fe1005 for codfw switchover
  • 15:04 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Successfully completed
  • 14:53 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet,service=apache2
  • 14:53 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet,service=nginx
  • 14:48 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=nginx
  • 14:48 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=apache2
  • 14:46 gehel: banning elastic2020 from codfw cluster - T149006
  • 14:46 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Rolling restart parsoid in eqiad and codfw
  • 14:44 oblivian@tin: Synchronized wmf-config/ProductionServices.php: Fix redis locks (duration: 02m 24s)
  • 14:41 akosiaris: powercycle mw2256
  • 14:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(eqiad, codfw) Successfully completed
  • 14:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(eqiad, codfw) Update Tendril configuration for the new masters
  • 14:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Successfully completed
  • 14:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Start MediaWiki maintenance in the new master DC
  • 14:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Successfully completed
  • 14:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Restore the TTL of all the MediaWiki discovery records
  • 14:30 switchdc: (volans@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Successfully completed
  • 14:30 switchdc: (volans@sarin) MediaWiki read-only period ends at: 2017-04-19 14:30:05.678665
  • 14:30 root@tin: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-write mode in datacenter codfw (duration: 00m 18s)
  • 14:29 switchdc: (volans@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 14:28 switchdc: (volans@sarin) END TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) Successfully completed
  • 14:28 switchdc: (volans@sarin) START TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) set core DB masters in read-write mode
  • 14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(eqiad, codfw) Successfully completed
  • 14:25 switchdc: (volans@sarin) START TASK - switchdc.stages.t06_redis(eqiad, codfw) Switch the Redis replication
  • 14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed
  • 14:22 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter
  • 14:22 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Successfully completed
  • 14:22 root@tin: Synchronized wmf-config/CommonSettings.php: Switch MediaWiki active datacenter to codfw (duration: 00m 19s)
  • 14:21 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Switch MediaWiki configuration to the new datacenter
  • 14:21 switchdc: (volans@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 14:15 switchdc: (volans@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 14:15 switchdc: (volans@sarin) END TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) Successfully completed
  • 14:15 switchdc: (volans@sarin) START TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) set core DB masters in read-only mode
  • 14:14 switchdc: (volans@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Successfully completed
  • 14:14 root@tin: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-only mode in datacenter eqiad (duration: 01m 29s)
  • 14:13 switchdc: (volans@sarin) MediaWiki read-only period starts at: 2017-04-19 14:12:54.007017
  • 14:12 switchdc: (volans@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 14:09 switchdc: (volans@sarin) END TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Successfully completed
  • 14:07 switchdc: (volans@sarin) START TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Stop MediaWiki maintenance in the old master DC
  • 14:06 godog: stop swiftrepl on ms-fe1005 for codfw switchover
  • 14:06 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Successfully completed
  • 14:06 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Reduce the TTL of all the MediaWiki discovery records
  • 14:06 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
  • 14:05 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on selected hosts
  • 14:00 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:42 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:28 urandom: cqlsh -f /etc/cassandra/adduser.cql, recreating user/perms (as-needed)
  • 12:38 urandom: T163292: Starting removal of Cassandra instance restbase1018-c.eqiad.wmnet
  • 11:36 oblivian:: Setting swift-rw in eqiad DOWN
  • 11:36 oblivian:: Setting swift-rw in codfw UP
  • 11:36 ema: repool varnish-be on cp3044
  • 11:23 godog: add naos to git-deploy term on common-infrastructure4 - T162900
  • 11:03 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 10:57 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 10:56 _joe_: running the warmup stage in codfw for final testing
  • 10:41 ema: depool varnish-be on cp3044 because of mailbox lag issues
  • 09:34 moritzm: installing dbus security updates
  • 09:11 elukey: cleaning up ocg1003's /srv/deployment/ocg/postmortem dir (root partition filled up)
  • 07:26 hoo: Updated the sites and site_identifiers tables on all Wikidata clients for T149522.
  • 06:57 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 06:56 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 06:52 _joe_: artificially stopping slave replication on rdb2001 for a final test of the switchover redis stage
  • 03:53 urandom: T163292: Starting removal of Cassandra instance restbase1018-b.eqiad.wmnet
  • 03:49 mobrovac@tin: Started restart [restbase/deploy@1bfada4]: (no justification provided)
  • 03:40 mobrovac@tin: Started restart [restbase/deploy@1bfada4]: Kick RB to pick up restbase1018 instances are gone
  • 03:32 mobrovac@tin: Finished deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292 (duration: 00m 53s)
  • 03:31 mobrovac@tin: Started deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292
  • 01:58 mutante: naos: rsyncd is of course legitimately running on a deployment server separate from this (unlike in other cases where we used it for syncing during migration), so this was just the one config fragment for /home and not removing the service or anything
  • 01:56 mutante: naos: manually deleting rsyncd config remnants (puppet wouldn't know to clean up after itself)
  • 01:47 mutante: rsyncing /home from mira to naos (T162900)
  • 01:21 urandom: T163292: Starting removal of Cassandra instance restbase1018-a.eqiad.wmnet
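The switchdc START TASK / END TASK entries above can be paired to estimate how long each stage took. The following is a minimal Python sketch, assuming the entry layout seen in this log; the regular expression and the helper name stage_durations are hypothetical and are not part of switchdc itself. Timestamps in the log have minute resolution, so the results are only approximate.

```python
import re
from datetime import datetime

# Matches entries such as:
# "14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(eqiad, codfw) ..."
# The layout is an assumption inferred from the log lines, not a documented format.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) switchdc: \((?P<who>[^)]+)\) "
    r"(?P<kind>START|END) TASK - "
    r"switchdc\.stages\.(?P<stage>\w+)\((?P<src>\w+), (?P<dst>\w+)\)"
)


def stage_durations(lines):
    """Pair START/END entries and return {stage: whole minutes}.

    `lines` is expected newest-first, as in the log, so it is walked in reverse.
    """
    starts = {}
    durations = {}
    for line in reversed(list(lines)):
        m = ENTRY.search(line)
        if not m:
            continue  # not a switchdc task entry
        key = (m["stage"], m["src"], m["dst"])
        t = datetime.strptime(m["time"], "%H:%M")
        if m["kind"] == "START":
            starts[key] = t
        elif key in starts:
            delta = t - starts.pop(key)
            durations[m["stage"]] = int(delta.total_seconds() // 60)
    return durations


sample = [
    "14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(eqiad, codfw) Successfully completed",
    "14:25 switchdc: (volans@sarin) START TASK - switchdc.stages.t06_redis(eqiad, codfw) Switch the Redis replication",
    "14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed",
    "14:22 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter",
]
print(stage_durations(sample))
```

With the four sample entries copied from the log, t05_switch_traffic comes out at 3 minutes and t06_redis at 0 minutes (started and finished within the same logged minute).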

2017-04-18

  • 23:04 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1018.eqiad.wmnet
  • 23:02 mutante: ms1001 - deleting old GlobalCert SSL cert for dumps.wm that was about to expire and is replaced by Letsencrypt
  • 22:30 mutante: ocg1003 gzipping ocg.log for disk space
  • 21:12 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 20:36 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 17:26 mobrovac@tin: Finished deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons (duration: 07m 12s)
  • 17:26 ssastry@tin: Finished deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m) (duration: 06m 25s)
  • 17:19 ssastry@tin: Started deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m)
  • 17:19 mobrovac@tin: Started deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons
  • 17:12 XenoRyet: updated tools from a8b8d72 to a1e9342
  • 17:09 elukey: restart nutcracker in codfw (profile::mediawiki::nutcracker) to make sure that all the daemons are running with the latest config
  • 16:26 bblack: completed Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )
  • 16:21 bblack: starting Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )
  • 16:15 jynus: reimporting some rows to dbstore1002 on jawiki and ruwiki T160509
  • 16:12 godog: reboot tin to fix cpu mhz issue and check bios settings - T163158
  • 16:09 mobrovac@tin: Finished deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page (duration: 08m 16s)
  • 16:01 mobrovac@tin: Started deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page
  • 16:00 mobrovac@tin: Finished deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page (duration: 01m 42s)
  • 15:58 mobrovac@tin: Started deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page
  • 15:20 elukey: restored default output-buffer config for rdb2005:6479
  • 15:08 godog: puppet-run on cache_upload in codfw/eqiad to pick up swift a/p changes
  • 15:02 godog: puppet-run on cache_upload in codfw/eqiad to pick up switch a/a changes
  • 15:02 gehel: upgrading elastic2020 to elasticsearch 5.1.2
  • 14:55 _joe_: switchover of services, misc things done
  • 14:54 oblivian:: Setting restbase-async in codfw DOWN
  • 14:54 oblivian:: Setting restbase-async in eqiad UP
  • 14:43 _joe_: switching traffic for all a/a services plus maps and restbase to codfw-only
  • 14:38 _joe_: forcing puppet run on caches for catching up with the a/a setting of maps and restbase
  • 14:33 oblivian:: Setting restbase in eqiad DOWN
  • 14:33 _joe_: starting switchover of services eqiad => codfw; external traffic will be switched over, as well as internal traffic to restbase
  • 14:25 gehel: un-ban elastic2020 to get ready for real-life test during switchover - T149006
  • 14:22 elukey: executed config set client-output-buffer-limit "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60" on rdb2005:6749 as attempt to solve slave lagging - T159850
  • 14:21 oblivian:: Setting mobileapps in eqiad UP
  • 14:14 oblivian:: Setting mobileapps in eqiad DOWN
  • 14:11 elukey: executed CONFIG SET appendfsync everysec (default) to restore defaults on rdb2005:6479- T159850
  • 14:08 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Successfully completed
  • 14:04 elukey: executed CONFIG SET appendfsync no on rdb2005:6479 to test if fsync stalls affect replication - T159850
  • 13:50 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Rolling restart parsoid in eqiad and codfw
  • 13:35 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 13:35 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 12:32 moritzm: upgrading labnodepool1001 to Linux 4.9
  • 12:13 moritzm: upgraded mw1261 to HHVM 3.18.2+wmf2
  • 11:39 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Successfully completed
  • 11:38 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 11:37 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(codfw, eqiad) Successfully completed
  • 11:37 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(codfw, eqiad) Update Tendril configuration for the new masters
  • 11:35 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(eqiad, codfw) Successfully completed
  • 11:35 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(eqiad, codfw) Update Tendril configuration for the new masters
  • 11:34 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(codfw, eqiad) Successfully completed
  • 11:34 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(codfw, eqiad) Update Tendril configuration for the new masters
  • 11:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Successfully completed
  • 11:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Restore the TTL of all the MediaWiki discovery records
  • 11:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Successfully completed
  • 11:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 11:30 switchdc: (volans@sarin) END TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) Successfully completed
  • 11:30 switchdc: (volans@sarin) START TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) set core DB masters in read-write mode
  • 11:18 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 11:18 switchdc: (volans@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 11:14 moritzm: upgrading logstash* to Linux 4.9
  • 10:58 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 10:56 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 10:56 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Successfully completed
  • 10:55 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Switch MediaWiki configuration to the new datacenter
  • 10:48 switchdc: (volans@sarin) END TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) Failed to execute
  • 10:48 switchdc: (volans@sarin) START TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) set core DB masters in read-only mode
  • 10:43 switchdc: (volans@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Successfully completed
  • 10:43 switchdc: (volans@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 10:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 10:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 10:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Successfully completed
  • 10:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Reduce the TTL of all the MediaWiki discovery records
  • 10:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Successfully completed
  • 10:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Disabling puppet on selected hosts
  • 10:28 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Failed to execute
  • 10:28 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Reduce the TTL of all the MediaWiki discovery records
  • 10:26 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Successfully completed
  • 10:26 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Disabling puppet on selected hosts
  • 10:25 volans: Final test of switchdc steps in the codfw->eqiad configuration, only idempotent changes, T160178
  • 10:25 moritzm: installing wireshark security updates
  • 10:20 moritzm: uploaded HHVM 3.18.2+wmf2 for jessie-wikimedia/experimental (includes fix for T162354)
  • 09:52 oblivian:: Setting zotero in codfw UP
  • 09:50 _joe_: testing switchover script for services, will act on zotero in codfw
  • 09:45 _joe_: adding 60G to the ocg output partition on ocg1003
  • 09:17 oblivian@neodymium: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 09:03 volans: upgrading conftool to v0.4.1 on neodymium/sarin
  • 07:48 _joe_: uploaded python-conftool 0.4.1 to jessie-wikimedia
  • 07:42 _joe_: cleaning up orphaned COW images in /var/cache/pbuilder/build/ on copper
  • 06:16 marostegui: For the record: restarted s7 instance on db1069 - T163183
  • 00:36 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend/resources/mobile.mainMenu/mainmenu.less: T163059 (duration: 03m 07s)
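The client-output-buffer-limit value in the 14:22 elukey entry above packs four fields per client class: class name, hard limit in bytes, soft limit in bytes, and soft-limit seconds. The following is a minimal Python sketch for decoding such a string into readable sizes; the helper names parse_buffer_limits and human are hypothetical and are not part of Redis or of any tooling mentioned in this log.

```python
def parse_buffer_limits(value):
    """Split a client-output-buffer-limit string into per-class limits.

    Assumes groups of four tokens: class, hard limit, soft limit, soft seconds.
    """
    tokens = value.split()
    limits = {}
    for i in range(0, len(tokens), 4):
        cls, hard, soft, seconds = tokens[i:i + 4]
        limits[cls] = {
            "hard_bytes": int(hard),
            "soft_bytes": int(soft),
            "soft_seconds": int(seconds),
        }
    return limits


def human(n):
    """Render a byte count in GiB or MiB when it is large enough."""
    for unit, size in (("GiB", 1 << 30), ("MiB", 1 << 20)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n} B"


# Value copied from the log entry for rdb2005.
logged = "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60"
for cls, lim in parse_buffer_limits(logged).items():
    print(cls, human(lim["hard_bytes"]), human(lim["soft_bytes"]), lim["soft_seconds"])
```

Read this way, the logged setting gives replica ("slave") connections a 2 GiB hard and soft limit with a 300 second soft window, and pubsub clients 32 MiB hard and 8 MiB soft over 60 seconds.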

2017-04-17

  • 23:37 mutante: running rmmod acpi_pad on the 16 R320 via cumin, since blacklisting in puppet does not actively remove, confirmed unloaded. (16/16) success ratio (>= 100.0% threshold) for command: 'lsmod|grep -c acpi_pad ||:' (T162850)
  • 23:33 mutante: running puppet via cumin on all 16 Dell PowerEdge R320, adding blacklist file for acpi_pad kernel module. 15/16 success, all but tin (T162850)
  • 22:46 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 03m 01s)
  • 22:42 mutante: tin - load average going down, acpi_pad processes gone, cpu usage low again (T163158)
  • 22:40 mutante: tin - rmmod acpi_pad (T163158)
  • 22:08 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 16m 23s)
  • 19:16 mutante: tegmen test ircecho stop/start service to confirm it's fine on jessie/prod icinga role (that's the passive server)
  • 19:02 demon@tin: Synchronized wmf-config/: Pruning some old extension message files, co-master sync (duration: 01m 52s)
  • 18:58 demon@tin: Pruned MediaWiki: 1.29.0-wmf.15 (duration: 00m 14s)
  • 18:46 maxsem@tin: Finished deploy [tilerator/deploy@001811e]: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only (duration: 00m 19s)
  • 18:46 maxsem@tin: Started deploy [tilerator/deploy@001811e]: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only
  • 18:45 maxsem@tin: scap aborted: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only (duration: 00m 19s)
  • 18:45 maxsem@tin: Started scap: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only
  • 15:48 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: Update client caching headers for T161284 (duration: 08m 15s)
  • 15:40 mobrovac@tin: Started deploy [restbase/deploy@6595298]: Update client caching headers for T161284
  • 15:34 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: (no justification provided) (duration: 01m 29s)
  • 15:33 mobrovac@tin: Started deploy [restbase/deploy@6595298]: (no justification provided)
  • 15:32 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: (no justification provided) (duration: 01m 42s)
  • 15:31 mobrovac@tin: Started deploy [restbase/deploy@6595298]: (no justification provided)
  • 09:33 marostegui: Silence alerts for restbase2004 and restbase2009 T160759

2017-04-16

  • 15:44 elukey: restart ocg on ocg1003 to clean up deleted files in lsof
  • 15:35 elukey: executing sudo find -name *.pdf -mtime +3 -exec rm {} \; on ocg1003's /srv/deployment/ocg/output to clean up some disk space - T162780

2017-04-14

  • 23:14 jynus: skipping CREATE DATABASE wbwikimedia on dbstore2001- duplicate declaration due to multi-source
  • 22:58 jynus: skipping CREATE DATABASE pawikisource on dbstore2001- duplicate declaration due to multi-source
  • 22:49 volans: restarting parsoid to get the disable linter change T148609
  • 22:17 Reedy: created linter tables on wbwikimedia T148609
  • 22:16 Reedy: created linter tables on pawikisource T148609
  • 20:53 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Disable Linter on larger wikis T148609 (duration: 00m 41s)
  • 20:26 reedy@tin: Synchronized wmf-config/abusefilter.php: abusefilter-modify-restricted for trwiki T161960 (duration: 01m 38s)
  • 17:48 mutante: mw1297 - restarted hhvm and apache
  • 17:07 twentyafterfour: deployed phabricator hotfix for T162943
  • 10:29 elukey: rollback sysctl settings on mw1306 after experiment (stop jobchron/runner, stop hhvm, restore sysctl settings, restart hhvm and job* daemons)
  • 09:50 elukey: temporarily set sysctl -w net.netfilter.nf_conntrack_max=524288 on mw1306 (jobrunner) as test - (rollback: sysctl -w net.netfilter.nf_conntrack_max=262144)
  • 09:43 elukey: temporarily set sysctl -w net.ipv4.ip_local_port_range="15000 64000" on mw1306 (jobrunner) as test - (rollback: sysctl -w net.ipv4.ip_local_port_range="32768 60999") - T157968
  • 08:32 elukey: restored appendfsync to 'everysec' on Redis rdb2005:6380 (end of performance experiment)
  • 07:23 elukey: executed CONFIG SET appendfsync no on redis2005:6780 as performance test
  • 00:39 niharika29@tin: Synchronized wmf-config/abusefilter.php: Fix Abuse Filter configuration for tr.wikipedia (T161960) (duration: 00m 42s)
  • 00:30 niharika29@tin: Finished scap: Reword ORES preferences (T162831), Put ORES r behind a preference (T162831), Deploy Special:Autoblocklist (T146414) (duration: 24m 44s)
  • 00:05 niharika29@tin: Started scap: Reword ORES preferences (T162831), Put ORES r behind a preference (T162831), Deploy Special:Autoblocklist (T146414)
  • 00:03 mutante: mw1297 - restart hhvm/apache
  • 00:03 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 00:02 niharika29@tin: Synchronized wmf-config/CommonSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 00:00 mutante: mw1293 - restart hhvm

2017-04-13

  • 23:56 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Retry sync Revert Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:51 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Revert Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 23:43 niharika29@tin: Synchronized wmf-config/CommonSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:41 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:39 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable related pages on Vector for htwiki (T126826) (duration: 00m 41s)
  • 23:26 niharika29@tin: Synchronized php-1.29.0-wmf.20/extensions/CirrusSearch/: Revert Workaround OOM issue on ngrams field (duration: 00m 54s)
  • 23:19 Dereckson: Create account for Jayantanth on wb.wikimedia (bureaucrat)
  • 23:09 dereckson@tin: Synchronized wmf-config/interwiki.php: DMOZ, pa.wikisource and wb.wikimedia interwiki map update (duration: 00m 41s)
  • 23:01 Dereckson: Create local-multiwrite stores for wb.wikimedia (T162510)
  • 23:01 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for wb.wikimedia.org (T162510) (duration: 00m 40s)
  • 23:00 Dereckson: Create Translate extension tables for wb.wikimedia (T162510)
  • 22:59 dereckson@tin: Synchronized multiversion/MWMultiVersion.php: Add wb.wikimedia.org to wikimedia.org domains to serve as wikis (T162510) (duration: 00m 40s)
  • 22:59 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: Create wb.wikimedia.org (T162510)
  • 22:58 dereckson@tin: Synchronized dblists: Create wb.wikimedia.org (T162510) (duration: 00m 41s)
  • 22:47 dereckson@tin: Synchronized static/images/project-logos/: Logos for wb.wikimedia (T162510) (duration: 00m 41s)
  • 22:32 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: pa.wikisource creation (take two) (duration: 00m 41s)
  • 22:31 dereckson@tin: Synchronized w/static/images/project-logos/: pa.wikisource creation (take two) (duration: 00m 40s)
  • 22:30 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: pa.wikisource creation (take two)
  • 22:30 dereckson@tin: Synchronized dblists: pa.wikisource creation (take two) (duration: 00m 41s)
  • 22:15 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for pa.wikisource (T149522) (duration: 00m 41s)
  • 22:14 dereckson@tin: Synchronized static/images/project-logos/: Logos for pa.wikisource (T149522) (duration: 00m 41s)
  • 22:12 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: (no justification provided)
  • 22:12 dereckson@tin: Synchronized dblists: pa.wikisource creation (T149522) (duration: 00m 41s)
  • 21:56 demon@tin: Finished scap: pruned cdb files from wmf.18 (duration: 07m 55s)
  • 21:48 demon@tin: Started scap: pruned cdb files from wmf.18
  • 20:07 urandom: T161243: Clearing all snapshots
  • 19:45 ejegg: updated civicrm from 908b9c1 to 90d679b
  • 19:43 ejegg: updated SmashPig from ab52dbe to 3db064d
  • 19:16 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.20
  • 18:57 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Clean Wikisource namespaces T46320 (duration: 00m 43s)
  • 18:42 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Education Program on it.wikiversity T162692 (duration: 00m 43s)
  • 18:38 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/LiquidThreads: Remove extra parameter from hook (duration: 00m 45s)
  • 18:35 reedy@tin: Synchronized wmf-config/abusefilter.php: Enable AbuseFilter blocks on tr.wikipedia T161960 (duration: 00m 43s)
  • 18:30 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on tr.wikiquote T161962 (duration: 00m 43s)
  • 18:30 urandom: T161243: Truncating parsoid tables (wikimedia storage group)
  • 18:29 mutante: restarting jenkins service to apply logging change gerrit:347877. it was already tested on jenkinstest.integration.eqiad.wmflabs
  • 18:25 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/Wikidata: Stop some logspam for deprecated hooks (duration: 02m 06s)
  • 18:23 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents: Stop some logspam for deprecated hooks (duration: 00m 43s)
  • 18:21 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/LiquidThreads: Stop some logspam for deprecated hooks (duration: 00m 45s)
  • 18:19 reedy@tin: Synchronized php-1.29.0-wmf.19/extensions/Wikidata: Stop some logspam for deprecated hook usage (duration: 02m 14s)
  • 18:16 urandom: T161243: Truncating parsoid tables (default storage group)
  • 18:16 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Document EducationProgram config (duration: 00m 43s)
  • 18:12 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgUsejQueryThree to false everywhere ahead of further testing (duration: 00m 43s)
  • 18:09 reedy@tin: Synchronized wmf-config/CommonSettings-labs.php: Run 3d2png with xvfb-run on beta (duration: 00m 43s)
  • 16:55 elukey: restored default value of client-output-buffer-limit on rdb1007:6379 - T159850
  • 16:23 mobrovac@tin: Finished deploy [citoid/deploy@b8c4cb2]: Test deploy for T162814 (duration: 02m 24s)
  • 16:21 mobrovac@tin: Started deploy [citoid/deploy@b8c4cb2]: Test deploy for T162814
  • 16:15 thcipriani@tin: Synchronized README: scap.cfg change test (duration: 00m 44s)
  • 15:49 mobrovac@tin: Finished deploy [citoid/deploy@212800d]: Enable multiple results for T115248 and remove b/c for T114515 (duration: 03m 10s)
  • 15:46 mobrovac@tin: Started deploy [citoid/deploy@212800d]: Enable multiple results for T115248 and remove b/c for T114515
  • 15:02 andrewbogott: disabling puppet on dubnium and pollux for a cautious merge of https://gerrit.wikimedia.org/r/#/c/348071
  • 15:01 andrewbogott: disabling puppet on seaborgium and serpens for a cautious merge of https://gerrit.wikimedia.org/r/#/c/348071
  • 14:56 ppchelko@tin: Finished deploy [changeprop/deploy@e47afea]: Provide separate rules for ORES precaching in both DCs (duration: 00m 58s)
  • 14:55 ppchelko@tin: Started deploy [changeprop/deploy@e47afea]: Provide separate rules for ORES precaching in both DCs
  • 14:50 moritzm: installing bouncycastle security updates
  • 14:27 bblack: disabling puppet on recnds/ntp boxes to control patch rollout
  • 13:28 moritzm: powercycling thumbor1001, stuck in reboot
  • 13:18 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 43s)
  • 13:16 hashar@tin: Synchronized dblists/closed.dblist: Close wikimania2016 - T161183 (duration: 00m 43s)
  • 13:14 hashar@tin: Synchronized static/images/project-logos: (no justification provided) (duration: 00m 46s)
  • 13:00 moritzm: Upgrading thumbor* to Linux 4.9
  • 12:52 elukey: temporarily ran config set client-output-buffer-limit "slave 5368709120 5368709120 180" on rdb1007:6379
  • 12:34 volans@tin: Synchronized wmf-config/db-eqiad.php: Use a generic retry for the read only message T160178 (duration: 00m 44s)
  • 12:34 elukey: temporarily ran config set client-output-buffer-limit "slave 3221225472 3221225472 180" on rdb1007:6379
  • 12:22 volans@tin: Synchronized wmf-config/db-codfw.php: Use a generic retry for the read only message T160178 (duration: 01m 54s)
  • 12:16 moritzm: restarting ntp on achernar
  • 11:59 elukey: temporarily ran config set client-output-buffer-limit "slave 2536870912 2536870912 60" on rdb1007:6379
  • 11:37 elukey: temporarily ran config set client-output-buffer-limit "slave 2147483648 2147483648 60" on rdb1007:6379 to give time to rdb2005's replication to catch up - T159850
  • 10:58 moritzm: rebooting alsafi to Linux 4.9
  • 10:58 moritzm: rebooting alfafi to Linux 4.9
  • 10:47 elukey: reverted previous config for Redis rdb2005
  • 10:47 XioNoX: Confirmed we can still reach cr2-knams:lo0 via v6 (from esams), disabling IPv4 transit sessions for T162601
  • 10:42 XioNoX: disable V6 transit BGP session on cr2-knams for T162601
  • 10:22 elukey: executed CONFIG SET appendfsync no (prev value: "everysec") to Redis instance 6380 on rdb2005 - T125735
  • 10:13 godog: upgrade thumbor to 0.1.38
  • 10:08 moritzm: rebooting restbase1016 to Linux 4.9
  • 09:39 moritzm: rebooting restbase1011 to Linux 4.9
  • 09:12 moritzm: rebooting restbase1010 to Linux 4.9
  • 06:29 elukey: re-arm keyholder on mira after reboot
  • 06:14 elukey: powercycle mira - eth0 errors in the dmesg, CPU system utilization skyrocketed
  • 04:14 mutante: ms-be2023 is rebooting
  • 04:12 mutante: ms-be2023 icinga alerts, no more swift processes. can't ssh to it. attempt to power cycle. mgmt console enormous spam of "rejecting I/O to offline device"
  • 01:58 bblack@neodymium: conftool action : set/pooled=yes; selector: name=achernar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 00:36 catrope@tin: Finished scap: Split RCFilters GuidedTour messages for ORES vs non-ORES (T162693) (duration: 53m 47s)

2017-04-12

  • 23:42 catrope@tin: Started scap: Split RCFilters GuidedTour messages for ORES vs non-ORES (T162693)
  • 23:37 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend/: Log only infoboxes which are not a direct children of lead section (T149884) (duration: 01m 05s)
  • 23:35 catrope@tin: Synchronized php-1.29.0-wmf.20/resources/src/mediawiki.widgets: Fix setDisabled in mw.widgets.Complex* (T162667) (duration: 00m 42s)
  • 23:32 catrope@tin: Synchronized php-1.29.0-wmf.19/resources/src/mediawiki.widgets: Fix setDisabled in mw.widgets.Complex* (T162667) (duration: 00m 44s)
  • 23:25 awight: rebuilt and reenabled process-control jobs
  • 23:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Disable cross-wiki uploads to Commons (T162374) (duration: 00m 43s)
  • 23:19 cwd: removed p-c crontab to stop all jobs
  • 23:15 bblack@neodymium: conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 23:13 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on cawiki (T161307) and bgwiki (T162145) (duration: 00m 44s)
  • 23:02 bblack@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 22:50 bblack: acamar fixed up BIOS: HT disabled and power mgmt was set to PPW (DAPC) instead of PPW (OS)
  • 22:45 bblack: downtiming acamar again to fixup bios stuff (HT at least)
  • 21:31 Dereckson: Create Education Program tables on it.wikiversity (T162692)
  • 20:44 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis - T148609 (duration: 00m 44s)
  • 20:42 bblack@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 20:25 mutante: planet2001 - manually updating all feeds to make it active (or would have to wait for crons)
  • 20:12 ssastry@tin: Finished deploy [parsoid/deploy@323cebb]: Updating Parsoid to 75debae3 (duration: 09m 16s)
  • 20:07 mutante: planet2001 - activating all the crons, making planet active/active eqiad/codfw
  • 20:03 ssastry@tin: Started deploy [parsoid/deploy@323cebb]: Updating Parsoid to 75debae3
  • 19:42 bd808@tin: Synchronized wmf-config/mc.php: Revert "wikitech: Enable binary memcached protocol" (duration: 00m 43s)
  • 19:05 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.20
  • 19:05 XenoRyet: reverted SmashPig from aede277 to ab52dbe
  • 19:05 demon@tin: Synchronized php: symlink bump (duration: 00m 42s)
  • 19:04 ejegg: updated payments-wiki from 0b396a3 to 36f38f6
  • 18:52 XenoRyet: updated SmashPig from ab52dbe to aede277
  • 18:45 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: formatter: Increase log level of infobox message T149884 (duration: 00m 46s)
  • 18:44 ppchelko@tin: Finished deploy [changeprop/deploy@e403f56]: Config: Send ORES precache requests to both DCs. Attempt #2. T159615 (duration: 01m 15s)
  • 18:43 ppchelko@tin: Started deploy [changeprop/deploy@e403f56]: Config: Send ORES precache requests to both DCs. Attempt #2. T159615
  • 18:38 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: formatter: Change log channel of infobox message T149884 (duration: 00m 46s)
  • 18:37 ppchelko@tin: Finished deploy [changeprop/deploy@0a9a008]: Config: Send ORES precache requests to both DCs. T159615 (duration: 06m 53s)
  • 18:30 ppchelko@tin: Started deploy [changeprop/deploy@0a9a008]: Config: Send ORES precache requests to both DCs. T159615
  • 18:26 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: setMobileOptions at time of skin creation T125588 (duration: 00m 46s)
  • 18:18 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Tweak Russian logo wordmark T162036 PART II (duration: 00m 43s)
  • 18:16 thcipriani@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ru.svg: SWAT: Tweak Russian logo wordmark T162036 PART I (duration: 00m 43s)
  • 16:46 awight@tin: rebuilt wikiversions.php and synchronized wikiversions files: (no justification provided)
  • 16:37 awight@tin: Synchronized php-1.29.0-wmf.20/extensions/FundraiserLandingPage: Fix for donatewiki T162716 (duration: 00m 45s)
  • 16:35 awight@tin: Synchronized php-1.29.0-wmf.19/extensions/FundraiserLandingPage: Fix for donatewiki T162716 (duration: 00m 48s)
  • 15:53 chasemp: remove 2fa for Freddy2001 on wikitech per T162772
  • 14:31 andrewbogott: running maintain-meta_p on labsdb1001/1003/1009/1010/1011
  • 14:23 hashar: Restarting Jenkins for git/scm plugins updates
  • 14:06 hashar: European SWAT complete
  • 13:51 switchdc: (volans@neodymium) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 13:48 switchdc: (volans@neodymium) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 13:42 volans: testing t05_switch_traffic of the switchdc
  • 13:41 elukey: apply SLOWLOG RESET and CONFIG SET slowlog-max-len 100000 (prev value 10000, 10ms) to rdb1005:6380 to track down slow reqs - T125735
  • 13:37 hoo@tin: Synchronized php-1.29.0-wmf.20/extensions/Wikidata: Update Wikibase/ ArticlePlaceholder (duration: 02m 19s)
  • 13:33 hoo@tin: Synchronized php-1.29.0-wmf.19/extensions/Wikidata: Update Wikibase/ ArticlePlaceholder (duration: 02m 16s)
  • 13:33 elukey: restored slowlog-log-slower-than 10000 on rdb2005
  • 13:25 elukey: applied CONFIG SET slowlog-log-slower-than 300000 to Redis 6379 on rdb2005 and reset slowlog history to play with the stats
  • 13:10 addshore@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/extension.json: WMDE Spring campaign PT2/2 (duration: 00m 45s)
  • 13:09 addshore@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/WikimediaEventsHooks.php: WMDE Spring campaign PT1/2 (duration: 00m 45s)
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporarily enable change dispatch logging on testwikidata" - T159828 (duration: 00m 47s)
  • 12:23 elukey: restart HDFS datanode daemons on all the Hadoop worker nodes to pick up the new JVM settings
  • 12:18 kartik@tin: Finished deploy [cxserver/deploy@2842efa]: Update cxserver to 56a012d (duration: 03m 58s)
  • 12:14 kartik@tin: Started deploy [cxserver/deploy@2842efa]: Update cxserver to 56a012d
  • 11:57 elukey: restart Yarn nodemanager daemons on all the Hadoop worker nodes to pick up the new JVM settings
  • 11:05 _joe_: downgrading python-urllib3 on puppetmaster1001
  • 11:02 akosiaris: upgrade puppet across the trusty fleet to 3.8. T162462
  • 10:34 hashar: Upgrading Jenkins "Email Extension" plugin 2.57.1..2.57.2 and restarting Jenkins
  • 10:07 hashar: Upgrading Jenkins "Git client" plugin 2.3.0..2.4.1 and restarting Jenkins
  • 09:58 switchdc: (volans@neodymium) END TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) Successfully completed
  • 09:58 switchdc: (volans@neodymium) START TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) set core DB masters in read-write mode
  • 09:56 switchdc: (volans@neodymium) END TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) Failed to execute
  • 09:56 switchdc: (volans@neodymium) START TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) set core DB masters in read-only mode
  • 09:53 _joe_: removing the old directory of data from ocg1003
  • 09:52 volans: testing t03 and t07 DB-RO/RW stages of switchdc (codfw->eqiad); we are already in that situation, so t03 will fail the verification, which is expected
  • 09:52 godog: swift codfw-prod: ms-be2001 - ms-be2012 initial decom - T162785
  • 09:47 _joe_: remounting the new partition under /srv/deployment/ocg/output, cleaning out the old dir. Will cause a service interruption for requests to ocg1003 for a few minutes. T162780
  • 09:42 gehel: starting load on elastic2020 - T149006
  • 09:41 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: wmgUseGettingStarted false for dewiki (duration: 00m 45s)
  • 09:26 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: WMDE Spring campaign - Add logging from WikimediaEvent (duration: 00m 46s)
  • 09:22 hashar: Restarting Jenkins for Matrix related plugins updates (3)
  • 09:12 _joe_: copying data from / to the new partition on ocg1003 T162462
  • 09:10 hashar: Restarting Jenkins for plugins update (2)
  • 09:06 _joe_: creating a LVM volume on ocg1003
  • 09:05 hashar: Restarting Jenkins for plugins update
  • 08:59 addshore@tin: Synchronized php-1.29.0-wmf.19/extensions/WikimediaEvents/extension.json: patch1 & patch2 WMDE Spring campaign PT2/2 (duration: 00m 45s)
  • 08:58 addshore@tin: Synchronized php-1.29.0-wmf.19/extensions/WikimediaEvents/WikimediaEventsHooks.php: patch1 & patch2 WMDE Spring campaign PT1/2 (duration: 00m 47s)
  • 08:52 ema: upgrade cache_upload to linux 4.9 T162029
  • 08:44 gehel: reimaging elastic2020 for testing - T149006
  • 08:24 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Successfully completed
  • 08:22 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 08:14 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Failed to execute
  • 08:14 root@tin: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-write mode in datacenter eqiad (duration: 00m 35s)
  • 08:13 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 08:09 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 08:09 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 08:02 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Successfully completed
  • 08:02 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Switch MediaWiki configuration to the new datacenter
  • 08:00 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Successfully completed
  • 07:59 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Restore the TTL of all the MediaWiki discovery records
  • 07:58 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 07:55 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 07:55 _joe_: resuming non-dry run tests of switchdc, all logs from switchdc by me are just tests
  • 06:57 _joe_: the last messages are just a test and nothing was really done, as codfw is already in read-only mode right now
  • 06:57 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Failed to execute
  • 06:57 root@tin: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-only mode in datacenter codfw (duration: 00m 23s)
  • 06:57 switchdc: (oblivian@sarin) MediaWiki read-only period starts at: 2017-04-12 06:56:53.822926
  • 06:56 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 06:53 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 06:53 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 06:50 _joe_: testing switchover codfw => eqiad, no destructive actions will be taken
  • 06:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T17441 (duration: 00m 46s)
  • 06:37 elukey: reimage mw2246.codfw.wmnet mw2152.codfw.wmnet to remove the /tmp partition (codfw videoscalers, switchover prep)
  • 06:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072 - T132416 (duration: 00m 46s)
  • 06:28 _joe_: killing long-running puppet-agent on db2058 too
  • 06:20 _joe_: killing badly-started puppet agents on mc1010, tempdb2001,db1090, db2058, hydrogen, possibly others later
  • 06:13 marostegui: Deploy alter table on db1075 eqiad master (s3, image table) - T160415
  • 06:04 marostegui: Deploy schema change on s6 - db1093 - T17441
  • 06:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 02m 00s)
  • 05:56 marostegui: Deploy alter table on db2108 codfw master (s3, image table) - T160415
  • 04:53 legoktm: started `mwscriptwikiset refreshLinks.php small.dblist` on terbium

2017-04-11

  • 23:58 thcipriani@tin: Synchronized wmf-config/CirrusSearch-production.php: SWAT: Enable deleted archive indexing & searching T109561 PART II (duration: 00m 45s)
  • 23:56 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable deleted archive indexing & searching T109561 PART I (duration: 00m 45s)
  • 23:29 ejegg: updated fundraising-tools from 0a42db3 to a8b8d72
  • 23:27 thcipriani@tin: Synchronized portals: SWAT: Bumping portals to master T128546 (duration: 00m 46s)
  • 23:26 thcipriani@tin: Synchronized portals/prod/wikipedia.org/assets: SWAT: Bumping portals to master T128546 (duration: 00m 46s)
  • 23:23 mutante: ocg: clearing host cache for ocg1001 which is shutdown for hardware repair. (on ocg1003: sudo -u ocg -g ocg nodejs-ocg /srv/deployment/ocg/ocg/mw-ocg-service/scripts/clear-host-cache.js -c /etc/ocg/mw-ocg-service.js ocg1001) T161158
  • 23:15 thcipriani@tin: Synchronized docroot/noc/conf/pageassessments.dblist: SWAT: Adding pageassessments.dblist for maintenance script T159438 PART II (duration: 00m 45s)
  • 23:14 thcipriani@tin: Synchronized dblists/pageassessments.dblist: SWAT: Adding pageassessments.dblist for maintenance script T159438 PART I (duration: 00m 45s)
  • 23:11 mutante: ocg1001 - scheduled downtime in icinga for host and all services, confirmed it's not actively doing things anymore, shutting down for hardware replacement (T161158)
  • 23:10 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Flow beta feature on frwikiversity T162022 (duration: 00m 46s)
  • 23:04 mutante: ocg1001 - apt-get clean for disk space
  • 22:36 mutante: ocg1003 started picking up jobs (mw-ocg-latexer) after it was enabled with gerrit:347781, ocg1001 was disabled in the same change. Also ganglia graphs confirm it. T84723 T161158
  • 22:22 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Enable alternate RevSlider slider on group0 T160410 (duration: 00m 45s)
  • 22:19 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=ocg1001.eqiad.wmnet
  • 22:17 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Enable TwoColConflict BetaFeature on fiwiki (duration: 00m 46s)
  • 21:23 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Update the legal text in the API docs (duration: 06m 49s)
  • 21:17 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Update the legal text in the API docs
  • 21:16 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Staging: Update the legal text in the API docs (duration: 03m 55s)
  • 21:12 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Staging: Update the legal text in the API docs
  • 21:12 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Dev cluster: Update the legal text in the API docs (duration: 01m 37s)
  • 21:11 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Dev cluster: Update the legal text in the API docs
  • 20:51 _joe_: killed running 'puppet agent t-v' on ruthenium
  • 19:20 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, attempt#2 T160764 (duration: 01m 25s)
  • 19:18 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, attempt#2 T160764
  • 19:11 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, T160764 (duration: 03m 38s)
  • 19:08 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, T160764
  • 19:08 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.20
  • 19:01 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#3 T160764 (duration: 00m 52s)
  • 19:00 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#3 T160764
  • 18:34 elukey: restart hhvm on mw1165 (debug in /tmp/hhvm.5384.bt.)
  • 18:25 demon@tin: Finished scap: testwiki to wmf.20 to bootstrap (duration: 35m 27s)
  • 17:49 demon@tin: Started scap: testwiki to wmf.20 to bootstrap
  • 17:49 demon@tin: Pruned MediaWiki: 1.29.0-wmf.17 [keeping static files] (duration: 00m 16s)
  • 17:41 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Initial Scap3 config deploy - T116335 (duration: 10m 39s)
  • 17:30 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Initial Scap3 config deploy - T116335
  • 17:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 57s)
  • 17:14 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 17:08 mobrovac: restbase enabling back puppet for T116335
  • 17:07 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy, take 2 - T116335 (duration: 02m 12s)
  • 17:06 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 17:06 marostegui: Deploy unscheduled alter table on db1044 (s3, image table) - T160415
  • 17:05 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy, take 2 - T116335
  • 17:05 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#2 T160764 (duration: 03m 22s)
  • 17:04 marostegui: Deploy unscheduled alter table on db1015 (s3, image table) - T160415
  • 17:02 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy, take 2 - T116335 (duration: 00m 58s)
  • 17:02 marostegui: Deploy unscheduled alter table on db1038 (s3, image table) - T160415
  • 17:02 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: nope, no wmf.19 for donatewiki. life is hard
  • 17:02 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy, take 2 - T116335
  • 17:01 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#2 T160764
  • 17:00 marostegui: Deploy unscheduled alter table on db1035 (s3, image table) - T160415
  • 16:58 marostegui: Deploy unscheduled alter table on db1077 (s3, image table) - T160415
  • 16:56 marostegui: Deploy unscheduled alter table on db1078 (s3, image table) - T160415
  • 16:54 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: donatewiki back to wmf.19. you put your left foot in, you put your left foot out...
  • 16:48 marostegui: Deploy unscheduled alter table on db1093 (adding pl_from index)
  • 16:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 42s)
  • 16:43 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed
  • 16:43 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy - T116335 (duration: 01m 33s)
  • 16:42 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001 T160764 (duration: 04m 28s)
  • 16:41 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy - T116335
  • 16:40 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter
  • 16:37 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001 T160764
  • 16:37 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: donatewiki still busted
  • 16:35 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: donatewiki back to wmf.19
  • 16:33 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy - T116335 (duration: 01m 04s)
  • 16:32 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 16:32 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy - T116335
  • 16:28 mobrovac: restbase disabling puppet for T116335
  • 16:27 demon@tin: Synchronized README: no-op, co-master sync (duration: 00m 43s)
  • 16:24 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 16:11 switchdc: (volans@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 16:08 switchdc: (volans@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 16:08 volans: testing the codfw caches wipe+warm, take 2
  • 16:04 demon@tin: Synchronized scap/plugins/clean.py: syncing to both masters (duration: 00m 44s)
  • 15:56 switchdc: (volans@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Failed to execute
  • 15:54 switchdc: (volans@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 15:53 volans: testing the codfw caches wipe+warm: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Phase_4.1_-_Wipe_caches T160178
  • 15:25 thcipriani@tin: Synchronized README: test sync for new scap version 3.5.5 (duration: 00m 59s)
  • 15:19 godog: upload scap 3.5.5-1 - T127762
  • 15:05 ema: upgrade cp4005 (cache_upload) to linux 4.9 T162029
  • 14:31 moritzm: powercycled restbase1007, stuck during reboot
  • 14:18 moritzm: upgrading restbase1007 to Linux 4.9
  • 13:55 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable RCFilters beta feature on fawiki, ruwiki, trwiki, and frwiki (T144458) (duration: 00m 39s)
  • 13:54 ottomata: reimaging stat1004 as jessie
  • 13:53 akosiaris: upgrade puppet agent to 3.8 across the jessie fleet. Do that in stages, starting with parsoid hosts. Move on to mw fleet next. T162462
  • 13:51 akosiaris: upgrade puppet agent to 3.8 across the jessie fleet. Do that in stages, starting with parsoid hosts
  • 13:49 godog: roll-upgrade swift to 2.2.0 across eqiad machines - T162609
  • 13:45 hashar: Updating all Jenkins jobs using the git plugin due to JJB change cdfeb7b - https://phabricator.wikimedia.org/T162674
  • 13:39 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add autopatrolled group to svwiktionary (T161919) (duration: 00m 39s)
  • 13:34 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase default image thumbnail size on Finnish Wikipedia to 250px (T162376) (duration: 00m 39s)
  • 13:10 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give sysops ability to promote users to eliminator at fawiki (T162396) (duration: 00m 39s)
  • 13:01 godog: roll-upgrade swift to 2.2.0 across codfw machines - T162609
  • 12:55 moritzm: powercycling wtp2013, stuck during reboot
  • 12:47 elukey: reimage mw2246 (Debian codfw videoscaler) to Trusty
  • 12:46 marostegui: Deploy schema change on db1069 (s7 instance) - T160390
  • 11:42 ema: upgrade cache_misc to linux 4.9 T162029
  • 11:33 elukey: resume reboot of analytics1040->1050 for kernel upgrades
  • 11:27 moritzm: wtp2* to Linux 4.9
  • 11:27 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: NOOP (Beta file only) - Remove redundant wmgUseRevisionSlider in InitialiseSettings-labs (duration: 00m 38s)
  • 11:09 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: NOOP - Remove redundant testwiki from wmgUseLinter (already has group0) (duration: 00m 39s)
  • 11:02 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: NOOP (Beta file only) - Fix some tabs (duration: 00m 39s)
  • 10:46 moritzm: upgrading wtp1020-wtp1024 to Linux 4.9
  • 10:13 moritzm: upgrading wtp1010-wtp1019 to Linux 4.9
  • 09:17 moritzm: install remaining pam updates from jessie point update
  • 09:11 godog: upgrade swift to 2.2.0 on ms-be2001 - T162609
  • 06:58 moritzm: restarted cassandra-a on restbase2004, crashed with "out of heap memory"
  • 06:50 marostegui: Deploy alter table enwiki.revision dbstore1002 - T132416
  • 06:48 moritzm: installing jasper security updates
  • 06:30 elukey: restart hhvm on mw1299 - dump debug in /tmp/hhvm.84379.bt
  • 06:28 marostegui: Deploy alter table enwiki.revision db1072 - T132416
  • 06:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1072 - T132416 (duration: 00m 43s)
  • 06:07 marostegui: Deploy schema change on db1041 (eqiad master) (s7) - T160390
  • 06:02 marostegui: Deploy schema change labsdb1003 (s7) - T160390
  • 06:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073 - T132416 (duration: 00m 39s)
  • 02:59 bblack: jessie recdns software upgrades complete
  • 02:52 bblack@neodymium: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org,service=pdns_recursor
  • 02:51 bblack: upgrading maerlant to pdns-recursor 4.x
  • 02:50 bblack@neodymium: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org,service=pdns_recursor
  • 02:48 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Apr 11 02:48:56 UTC 2017 (duration 5m 43s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 07m 16s)
  • 02:37 bblack@neodymium: conftool action : set/pooled=yes; selector: name=chromium.wikimedia.org,service=pdns_recursor
  • 02:32 bblack: upgrading chromium to pdns-recursor 4.x
  • 02:31 bblack@neodymium: conftool action : set/pooled=no; selector: name=chromium.wikimedia.org,service=pdns_recursor
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 07m 47s)
  • 02:16 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: service=pdns_recursor,name=nescio.wikimedia.org
  • 02:13 bblack: upgrading nescio to pdns-recursor 4.x
  • 02:06 bblack: jessie-recdns: unpausing upgrade process...

2017-04-10

  • 23:43 bblack: jessie-recdns: upgrade to pdns-recursor 4.x paused - hydrogen updated and in-service; chromium/nescio/maerlant still puppet-disabled. Going to leave things in this state for a while. If something seems amiss, hydrogen can be re-depooled via confctl: confctl select name=hydrogen.wikimedia.org,service=pdns_recursor set/pooled=no
  • 23:34 bblack@neodymium: conftool action : set/pooled=yes; selector: name=hydrogen.wikimedia.org,service=pdns_recursor
  • 23:33 bblack: upgrading hydrogen to pdns-recursor 4.x
  • 23:25 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Set ORES thresholds for fawiki, ruwiki, trwiki (duration: 00m 39s)
  • 23:18 bblack@neodymium: conftool action : set/pooled=no; selector: name=hydrogen.wikimedia.org,service=pdns_recursor
  • 23:04 bblack: puppet disabled on jessie recdns (maerlant, nescio, hydrogen, chromium) for complex upgrade process ( https://gerrit.wikimedia.org/r/#/c/346937/ )
  • 22:45 dapatrick: Deployed patch for T162621 to wmf18 and wmf19
  • 22:04 ejegg: updated CiviCRM from b6c8f3e to 908b9c1
  • 21:37 ejegg: updated payments-wiki from b5bcfa1 to 0b396a3
  • 21:33 gehel: logstash upgrade on all logstash1* nodes completed - T161908
  • 21:31 gehel: upgrading logstash on logstash1003 - T161908
  • 21:22 gehel: upgrading logstash on logstash1002 - T161908
  • 21:17 gehel: logstash upgrade on logstash1001 completed - T161908
  • 21:13 gehel: running puppet on logstash1001 to deploy new logstash plugins - T161908
  • 20:45 ejegg: updated payments-wiki from 9622a4b to b5bcfa1
  • 20:29 gehel: upgrading logstash on logstash1001 - T161908
  • 20:27 ebernhardson: deployed new logstash plugins to logstash100[123]
  • 20:16 bsitzmann@tin: Finished deploy [mobileapps/deploy@9bc8c07]: Update mobileapps to 1695900 (duration: 05m 27s)
  • 20:10 bsitzmann@tin: Started deploy [mobileapps/deploy@9bc8c07]: Update mobileapps to 1695900
  • 19:51 andrewbogott: upgrading qemu and oslo packages on labvirt1002
  • 19:38 gehel: disabling puppet on logstash1* - T161908
  • 19:38 gehel: starting logstash upgrade - some log messages will be lost! - T161908
  • 18:22 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ORES review tool in hewiki T161621 (duration: 00m 39s)
  • 18:12 thcipriani: mwscript extensions/ORES/maintenance/CheckModelVersions.php hewiki && mwscript extensions/ORES/maintenance/PopulateDatabase.php hewiki
  • 18:06 thcipriani: create ores tables on hewiki
  • 17:51 elukey: restore Hadoop masters to analytics1001
  • 17:16 papaul: testing lvs2002 after mainboard replacement
  • 17:06 gehel@tin: Finished deploy [wdqs/wdqs@1cfbd8d]: (no justification provided) (duration: 01m 22s)
  • 17:04 gehel@tin: Started deploy [wdqs/wdqs@1cfbd8d]: (no justification provided)
  • 16:48 _joe_: not really restarting parsoid, still testing switchdc
  • 16:45 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Rolling restart parsoid in eqiad and codfw
  • 16:02 mobrovac@tin: Finished deploy [restbase/deploy@2c70843]: Initial deployment with Scap3 (duration: 07m 52s)
  • 15:58 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
  • 15:58 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on MediaWiki jobrunners and videoscalers
  • 15:55 mobrovac@tin: Started deploy [restbase/deploy@2c70843]: Initial deployment with Scap3
  • 15:47 cmjohnson1: troubleshooting link cr2-eqiad:xe-3/0/1 {#2014 to asw-b-eqiad:xe-1/1/2 per T162199
  • 15:35 mobrovac@tin: Finished deploy [restbase/deploy@a8d4d02]: Initial deployment with Scap3 (duration: 00m 10s)
  • 15:35 mobrovac@tin: Started deploy [restbase/deploy@a8d4d02]: Initial deployment with Scap3
  • 15:33 mobrovac: restbase enabling back puppet in prod
  • 15:31 mobrovac@tin: Finished deploy [restbase/deploy@a8d4d02]: Initial deployment with Scap3 on staging (duration: 03m 31s)
  • 15:28 mobrovac@tin: Started deploy [restbase/deploy@a8d4d02]: Initial deployment with Scap3 on staging
  • 15:19 mobrovac@tin: Finished deploy [restbase/deploy@a8d4d02]: (no justification provided) (duration: 01m 22s)
  • 15:18 mobrovac@tin: Started deploy [restbase/deploy@a8d4d02]: (no justification provided)
  • 15:15 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 after maintenance with full weight (duration: 00m 39s)
  • 15:05 mobrovac: restbase disabling puppet for upgrade to scap3 deploys
  • 15:01 andrewbogott: disabling puppet on labcontrol1001 to raise log levels
  • 14:58 moritzm: upgrading wtp1006-wtp1009 to Linux 4.9
  • 14:52 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
  • 14:52 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on MediaWiki jobrunners and videoscalers
  • 14:48 marostegui: Deploy alter table enwiki.revision db1073 - T132416
  • 14:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1073 - T132416 (duration: 00m 39s)
  • 14:47 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Failed to execute
  • 14:46 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 14:45 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Successfully completed
  • 14:45 ema: upgrade cache_maps to linux 4.9 T162029
  • 14:45 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Restore the TTL of all the MediaWiki discovery records
  • 14:45 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Failed to execute
  • 14:45 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 14:39 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
  • 14:39 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on MediaWiki jobrunners and videoscalers
  • 14:31 gehel: deploying new postgresql replication check, might generate a few icinga alerts - T162345
  • 14:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1028 - T160390 (duration: 00m 38s)
  • 14:05 elukey: reimage analytics1001 to Debian Jessie
  • 13:49 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 after maintenance with low weight (duration: 00m 38s)
  • 13:41 moritzm: upgrading wtp1002-wtp1005 to Linux 4.9
  • 13:30 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgTranslateNumerals false on bhwiki - T160098 (duration: 00m 40s)
  • 13:26 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Create editprotected right on ptwikinews - T162577 (duration: 00m 40s)
  • 13:19 elukey: reboot analytics1040->1050 to pick up the new kernel
  • 13:17 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Increase default thumb size to 250px at nowiki - T155892 (duration: 00m 45s)
  • 13:16 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: pagePreviews: Enable NavPopups gadget detection - T160081 (duration: 00m 40s)
  • 13:00 twentyafterfour: stopped search indexer on iridium to lighten load on m3 databases.
  • 12:55 marostegui: Run pt-table-checksum on s4 - T162593
  • 12:40 akosiaris: upload apertium-spa-cat_2.0.0~r77288-2+wmf1 on apt.wikimedia.org jessie-wikimedia/main
  • 11:11 akosiaris: upload puppet_3.8.5-2~bpo8+1 on apt.wikimedia.org jessie-wikimedia/main
  • 11:00 akosiaris: upload apertium-cat_2.0.0~r77286-1+wmf1, apertium-spa_1.0.0~r77293-1+wmf1 on apt.wikimedia.org/jessie-wikimedia
  • 10:58 gehel: starting load test on elastic2020 - T149006
  • 10:48 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:32 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1004.eqiad.wmnet
  • 10:31 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1003.eqiad.wmnet
  • 10:23 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1003.eqiad.wmnet
  • 10:23 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1002.eqiad.wmnet
  • 10:12 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1002.eqiad.wmnet
  • 10:11 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1001.eqiad.wmnet
  • 10:03 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1001.eqiad.wmnet
  • 10:02 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2004.codfw.wmnet
  • 10:01 gehel: rolling restart of maps1* (eqiad) cluster
  • 09:52 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2004.codfw.wmnet
  • 09:52 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2003.codfw.wmnet
  • 09:44 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2003.codfw.wmnet
  • 09:44 XioNoX: all interfaces back up on cr2-esams, BGP sessions up as well T162239
  • 09:44 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2002.codfw.wmnet
  • 09:33 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2002.codfw.wmnet
  • 09:29 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2001.codfw.wmnet
  • 09:18 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2001.codfw.wmnet
  • 09:17 XioNoX: remote hands work started to replace the FPC on cr2-esams T162239
  • 09:16 gehel: rolling restart of maps2* cluster
  • 08:52 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=wdqs
  • 08:51 godog: swift codfw-prod: bump ms-be2028 ms-be2039 object weight to 4000 - T158337
  • 08:48 gehel: reimage elastic2020 - T149006
  • 08:43 gehel: rolling restart of maps-test cluster
  • 08:39 elukey: manual failover of Hadoop master daemons from analytics1001 to analytics1002 (T160333)
  • 07:48 _joe_: testing a dry-run of the switchdc software on sarin
  • 07:02 moritzm: installing pam updates from jessie point update
  • 06:26 marostegui: Deploy schema change labsdb1001 (s7) - T160390
  • 06:24 marostegui: Deploy schema change db1028 (s7) - T160390
  • 06:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1028 - T160390 (duration: 00m 39s)
  • 06:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1034 - T160390 (duration: 00m 38s)
  • 06:07 marostegui: Deploy schema change db1034 (s7) - T160390
  • 06:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add tempdb2001 to x1 as a slave - T162290 (duration: 00m 38s)
  • 06:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1034 - T160390 (duration: 00m 39s)
  • 02:49 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Apr 10 02:49:06 UTC 2017 (duration 5m 40s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 07m 32s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 08m 17s)

2017-04-09

  • 02:59 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Apr 9 02:59:29 UTC 2017 (duration 5m 35s)
  • 02:53 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 07m 36s)
  • 02:28 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 07m 56s)

2017-04-08

  • 20:56 bblack: removed varnishkafka logs and daemon.log.1 on cp1052 to free disk space and clear alert
  • 17:43 chasemp: service nova-compute restart labvirt1002
  • 17:36 chasemp: nova reset-state on 15 nodepool nodes stuck in deletion, then force-delete
  • 17:29 chasemp: manually delete on labcontrol all nodepool instances in delete state
  • 17:25 chasemp: openstack server delete 970a86ce-2549-4cf3-be91-1f8558ab1b32 (admin-monitoring stuck in build)
  • 17:21 chasemp: restart rabbitmq on labcontrol1001
  • 17:20 chasemp: restart nova-api on labnet
  • 16:00 bblack: banning obj.http.Content-Type ~ text/html on cache_upload
  • 15:46 bblack: banning obj.http.X-Orig-Content-Type !~ . on cache_upload in ulsfo
  • 14:56 bblack: banning obj.http.X-Orig-Content-Type !~ . on cache_upload in esams
  • 13:54 bblack: banning obj.http.X-Orig-Content-Type !~ . on cache_upload in codfw
  • 13:27 bblack: banning obj.http.X-Orig-Content-Type !~ . on cache_upload in eqiad
  • 11:55 bblack: banning obj.http.Content-Type ~ text/html on cache_upload
  • 10:55 jynus: setting labsdb1001 and labsdb1003 in read only mode
  • 09:55 reedy@tin: Finished scap: Rebuild EP l10n cache for namespace aliases T162481 (duration: 79m 11s)
  • 08:36 reedy@tin: Started scap: Rebuild EP l10n cache for namespace aliases T162481
  • 08:34 reedy@tin: Synchronized wmf-config/CommonSettings.php: T162481 (duration: 00m 39s)
  • 08:33 reedy@tin: Synchronized wmf-config/extension-list: T162481 (duration: 00m 40s)
  • 02:56 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Apr 8 02:56:37 UTC 2017 (duration 5m 33s)
  • 02:51 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 08m 03s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 08m 13s)

2017-04-07

  • 23:16 mutante: gerrit2001 - deleting netmon1001 backup (/srv/netmon1001), stop rsyncd, remove rsyncd config (T125020)
  • 23:06 ejegg: updated DjangoBannerStats from 220f80e to 9e6b117
  • 22:18 reedy@tin: Synchronized php-1.29.0-wmf.19/extensions/EducationProgram/EducationProgram.php: Load wgExtensionMessagesFiles in PHP entry point for mergeMessageLists T162481 (duration: 00m 49s)
  • 20:07 demon@tin: Synchronized README: no-op, testing master sync speed now (duration: 00m 38s)
  • 20:05 demon@tin: Synchronized README: no-op, co-master sync (duration: 00m 39s)
  • 19:41 demon@tin: Finished scap: no-op, final history sync (duration: 23m 05s)
  • 19:18 demon@tin: Started scap: no-op, final history sync
  • 18:40 demon@tin: Synchronized php-1.29.0-wmf.19/includes/specials/: no-op, cleaning up history (duration: 01m 00s)
  • 18:16 demon@tin: Synchronized php-1.29.0-wmf.19/includes/api/: No-op, cleaning up git history (duration: 00m 54s)
  • 17:17 demon@tin: Finished scap: no-op, cleaning up wmf.19 history (duration: 25m 07s)
  • 16:51 demon@tin: Started scap: no-op, cleaning up wmf.19 history
  • 16:29 demon@tin: Synchronized php-1.29.0-wmf.19/extensions/SyntaxHighlight_GeSHi/: no-op, cleaning up history (duration: 00m 44s)
  • 15:32 gehel: reimaging elastic2020 - T149006
  • 14:58 marostegui: Deploy schema change dbstore1001 (s7 wikis) - T160390
  • 14:40 marostegui: Deploy schema change db1033 (already depooled) (s7) - T160390
  • 14:13 elukey: restart hadoop-hdfs-namenode on an1002 (Hadoop Master standby) to pick up new jvm settings
  • 14:07 elukey: restart hadoop-mapreduce-historyserver on an1001 to pick up the new jvm settings
  • 14:02 switchdc: (oblivian@sarin) Executing task switchdc.stages.t00_reduce_ttl(eqiad, codfw): Reduce the TTL of all the MediaWiki discovery records
  • 14:01 _joe_: running tests of the switchdc automation in dry-run mode
  • 14:01 switchdc: (oblivian@sarin) Executing task switchdc.stages.t00_disable_puppet(eqiad, codfw): Stop puppet execution on maintenance, jobqueues
  • 12:52 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Enable alternate RevisionSlider slider on beta BETA ONLY (duration: 00m 51s)
  • 12:48 bblack: banning cache_upload obj.http.Content-type ~ text/html
  • 12:46 bblack: banning cache_upload obj.http.Content-type == text/html
  • 12:45 bblack: banning cache_upload obj.http.Content-type ~ text
  • 10:53 elukey: increase Redis connection timeout manually (.3s -> .5s) on mw1306 as performance test - T125735
  • 09:22 marostegui: Deploy schema change db1062 (already depooled) (s7) - T160390
  • 08:15 moritzm: upgrade mw1262-mw1265 to HHVM 3.18.2
  • 07:58 elukey: added "notifempty" to /etc/logrotate.d/nginx on cp1008, it should remove cronspam for access_pipe.log.1.gz
  • 07:51 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=wdqs
  • 07:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 - T160390 (duration: 00m 50s)
  • 07:50 marostegui: Deploy schema change db1039 (already depooled) (s7) - T160390
  • 07:21 jynus: reimporting several damaged db tables on s2 T154485
  • 07:17 ariel@tin: Finished deploy [dumps/dumps@af61d8d]: I mean: handle page range generation for wikis with PAGES with hundreds of thousands of revisions (duration: 00m 02s)
  • 07:17 ariel@tin: Started deploy [dumps/dumps@af61d8d]: I mean: handle page range generation for wikis with PAGES with hundreds of thousands of revisions
  • 07:16 ariel@tin: Finished deploy [dumps/dumps@af61d8d]: handle page range generation for wikis with hundreds of thousands of revisions (duration: 00m 03s)
  • 07:16 ariel@tin: Started deploy [dumps/dumps@af61d8d]: handle page range generation for wikis with hundreds of thousands of revisions
  • 06:06 marostegui: Deploy schema change db1094 (s7) - T160390
  • 06:05 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 - T160390 (duration: 00m 49s)
  • 03:04 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Apr 7 03:04:52 UTC 2017 (duration 5m 13s)
  • 02:59 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 14m 11s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 09m 54s)

2017-04-06

  • 23:14 dereckson@tin: Synchronized php-1.29.0-wmf.19/extensions/Popups: actions: Correctly delay FETCH_COMPLETE (Gerrit:346832) (duration: 00m 41s)
  • 22:23 maxsem@tin: Finished deploy [tilerator/deploy@9cf2338]: https://gerrit.wikimedia.org/r/#/c/346913/ to test hosts only (duration: 00m 18s)
  • 22:22 maxsem@tin: Started deploy [tilerator/deploy@9cf2338]: https://gerrit.wikimedia.org/r/#/c/346913/ to test hosts only
  • 22:15 ejegg: re-enabled adyen and paypal SmashPig job runners
  • 22:07 ejegg: re-enabled two main dedupe jobs and orphan rectifier
  • afk: set thank-you batch size back to 400
  • 20:52 awight: change thank_you_batch from 400->1
  • 19:42 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Fix ORES threshold settings again (duration: 00m 40s)
  • 19:10 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.19
  • 18:48 legoktm@tin: Synchronized php-1.29.0-wmf.19/extensions/Linter/includes/RecordLintJob.php: Split statsd metrics by wiki - https://gerrit.wikimedia.org/r/#/c/346807 (duration: 00m 42s)
  • 18:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgLinterStatsdSampleFactor (duration: 00m 45s)
  • 18:10 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Adjust plwiki, ptwiki ORES thresholds for new model deployment (duration: 00m 40s)
  • 18:00 switchdc: (volans@neodymium) Test switchdc IRC/SAL announcement (2)
  • 17:57 switchdc: (volans@neodymium) Test switchdc IRC/SAL announcement
  • 17:46 maxsem@tin: Finished deploy [tilerator/deploy@71aed11]: https://gerrit.wikimedia.org/r/#/c/346782/ to test hosts (duration: 00m 19s)
  • 17:45 maxsem@tin: Started deploy [tilerator/deploy@71aed11]: https://gerrit.wikimedia.org/r/#/c/346782/ to test hosts
  • 17:41 halfak@tin: Finished deploy [ores/deploy@3396b64]: T161748 (duration: 21m 08s)
  • 17:20 halfak@tin: Started deploy [ores/deploy@3396b64]: T161748
  • 17:19 arlolra@tin: Finished deploy [parsoid/deploy@b5c2a2b]: Updating Parsoid to 56ae82bb (duration: 08m 29s)
  • 17:13 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to medium wikis too - T148609 (duration: 00m 40s)
  • 17:11 arlolra@tin: Started deploy [parsoid/deploy@b5c2a2b]: Updating Parsoid to 56ae82bb
  • 16:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 - T160390 (duration: 00m 43s)
  • 16:18 elukey: restart hhvm on mw1227 - debug in /tmp/hhvm.30097.bt. - threads stuck in HPHP::Treadmill::getAgeOldestRequest
  • 16:17 hoo@tin: Synchronized wmf-config/Wikibase-production.php: Try using redisLockManager for test.wikidata.org (T159828) (duration: 00m 39s)
  • 16:11 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Temporarily enable change dispatch logging on testwikidata (duration: 00m 45s)
  • 15:48 hoo@tin: Synchronized wmf-config/Wikibase.php: Fix Wikibase site groups for testwiki and test2wiki (duration: 00m 40s)
  • 15:36 hoo@tin: Synchronized wmf-config/Wikibase.php: Don't set removed Wikibase client settings (duration: 00m 40s)
  • 15:27 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add tempdb2001 to config files - T162290 (duration: 00m 40s)
  • 15:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add tempdb2001 to config files - T162290 (duration: 00m 39s)
  • 14:55 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 42s)
  • 14:51 hoo: Restarted apache on mwdebug1001 in order to test a potential CACHE_ACCEL issue
  • 14:46 hoo@tin: Synchronized wmf-config/: Don't use "enwiki" as Wikibase site id on testwiki and test2wiki (T94416) (duration: 01m 08s)
  • 14:12 hoo@tin: Synchronized wmf-config/Wikibase.php: Add testwiki and test2wiki to "specialSiteLinkGroups" on testwikidata (T94416) (duration: 00m 40s)
  • 14:04 elukey: reimage analytics1002 to Debian Jessie (Hadoop Master Node standby)
  • 13:44 gehel: re-generating tiles for tasmania on maps codfw cluster
  • 13:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T160390 (duration: 00m 39s)
  • 13:39 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db1079 - T160390 (duration: 00m 43s)
  • 13:39 marostegui: Deploy schema change db1079 (s7 wikis) - T160390
  • 13:34 hashar: European SWAT completed
  • 13:30 marostegui: Deploy schema change dbstore1002 (s7 wikis) - T160390
  • 13:20 hashar@tin: Synchronized php-1.29.0-wmf.19/extensions/Popups: renderer: Pass event to behavior for processing - T162324 (duration: 00m 51s)
  • 12:51 ema: upgrade cp3007 to linux 4.9 T162029
  • 12:50 moritzm: upgraded mw1261 to HHVM 3.18.2 with cherrypicked fix for stat_cache deadlock, now running with stat_cache enabled again
  • 12:39 ema: rebooting cp2006 again to check for potential issues bringing up network ifaces / loading intel_uncore T162029
  • 12:28 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1081 original weight - T161088 (duration: 00m 40s)
  • 12:28 ema: cp2009 stuck rebooting, powercycled
  • 12:21 ema: upgrade cp2009 to linux 4.9 T162029
  • 12:16 moritzm: uploaded HHVM 3.18.2 to jessie-wikimedia/experimental
  • 11:51 ema: upgrade cp2006 to linux 4.9 T162029
  • 11:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1081 weight - T161088 (duration: 00m 40s)
  • 10:59 _joe_: running some tests for the switchdc automation
  • 09:33 moritzm: installing freetype security updates on trusty
  • 08:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1081 weight - T161088 (duration: 00m 39s)
  • 08:41 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=wdqs
  • 08:40 moritzm: installing glibc updates on trusty
  • 08:37 gehel: shutting down wdqs codfw for data reimport - T162111
  • 08:34 hashar: starting Jenkins on contint1001
  • 08:27 moritzm: rebooting contint1001 to Linux 4.9
  • 08:02 elukey: restart hhvm on mw1194 - dump debug in /tmp/hhvm.1692.bt. - threads stuck in HPHP::Treadmill::getAgeOldestRequest
  • 07:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1081 weight - T161088 (duration: 00m 39s)
  • 07:32 ema: cache_upload: ban all objects with content-type ~ "^text" T162035
  • 07:16 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1081 with low weight - T161088 (duration: 00m 48s)
  • 06:29 elukey: restart hhvm on mw1165 (jobrunner) - dump debug in /tmp/hhvm.19449.bt. - threads stuck in HPHP::Treadmill::getAgeOldestRequest
  • 06:09 marostegui: Deploy schema change db2029 (s7 codfw master) - T160390
  • 06:02 marostegui: Configure and start replication on db1081 after the defragment - T161088
  • 05:58 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2047 - T160390 (duration: 00m 40s)
  • 05:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 after compression - T153743 (duration: 00m 51s)
  • 04:06 twentyafterfour: restarting apache2 on iridium to apply a minor hotfix
  • 03:06 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Apr 6 03:06:35 UTC 2017 (duration 5m 59s)
  • 03:00 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 15m 46s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 10m 28s)
  • 01:45 mutante: restarting gerrit to pick up config change gerrit:346180 (disable MD5)
  • 00:39 mutante: install1002/2002: deleting /srv/autoinstall/precise.cfg
  • 00:37 mutante: install1002/2002: deleting /srv/tftboot/precise-installer | puppetmaster1002/2001: deleting /var/lib/puppet/volatile/tftpboot/precise-installer (clean up after gerrit:345549)
  • 00:25 twentyafterfour: Phabricator upgrade completed uneventfully, other than the undisputable fact that the new search functionality is awesome.
  • 00:21 mutante: added #wikimedia-traffic channel to stashbot config, test
  • 00:19 mutante: stopping and starting stashbot for config change - added #wikimedia-traffic channel
  • 00:19 twentyafterfour: updating phabricator, the service will be offline for just a few moments.
  • 00:08 twentyafterfour: preparing to update Phabricator to tag release/2017-04-05/1 #phab-2017-04-05

2017-04-05

  • 23:29 thcipriani@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ru.svg: SWAT: Update Russian Wikipedia logo T162036 (duration: 00m 40s)
  • 23:18 demon@tin: Synchronized wmf-config/CommonSettings.php: unbreak dashiki again (duration: 00m 40s)
  • 23:13 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy Page previews to stable on Hungarian and Hebrew Wikipedias T162162 (duration: 00m 40s)
  • 23:12 demon@tin: Synchronized php-1.29.0-wmf.19/extensions/Dashiki/: swattttttt (duration: 00m 41s)
  • 22:37 mobrovac: restbase deploying a8d4d027
  • 22:12 ppchelko@tin: Finished deploy [trending-edits/deploy@46544de]: Correctly calculate since parameter and allow to change decay for debugging, attempt 2 (duration: 07m 06s)
  • 22:05 ppchelko@tin: Started deploy [trending-edits/deploy@46544de]: Correctly calculate since parameter and allow to change decay for debugging, attempt 2
  • 22:04 ppchelko@tin: Finished deploy [trending-edits/deploy@46544de]: Correctly calculate since parameter and allow to change decay for debugging (duration: 02m 29s)
  • 22:02 ppchelko@tin: Started deploy [trending-edits/deploy@46544de]: Correctly calculate since parameter and allow to change decay for debugging
  • 21:58 demon@tin: Synchronized wmf-config/CommonSettings.php: bump video transcode timeouts, brion made me do it (duration: 00m 40s)
  • 20:53 ppchelko@tin: Finished deploy [trending-edits/deploy@475a5c0]: Fix edit scorer (duration: 05m 34s)
  • 20:47 ppchelko@tin: Started deploy [trending-edits/deploy@475a5c0]: Fix edit scorer
  • 20:44 ppchelko@tin: Finished deploy [trending-edits/deploy@475a5c0]: Fix edit scorer (duration: 02m 51s)
  • 20:41 ppchelko@tin: Started deploy [trending-edits/deploy@475a5c0]: Fix edit scorer
  • 20:27 arlolra: Updated Parsoid to 32b7c677 (T112043, T161936)
  • 20:18 arlolra@tin: Finished deploy [parsoid/deploy@f2d4eee]: Updating Parsoid to 32b7c677 (duration: 11m 26s)
  • 20:07 arlolra@tin: Started deploy [parsoid/deploy@f2d4eee]: Updating Parsoid to 32b7c677
  • 19:55 ppchelko@tin: Finished deploy [trending-edits/deploy@d8ca758]: Providing a debug endpoint (duration: 04m 59s)
  • 19:50 ppchelko@tin: Started deploy [trending-edits/deploy@d8ca758]: Providing a debug endpoint
  • 19:50 ppchelko@tin: Finished deploy [trending-edits/deploy@d8ca758]: Providing a debug endpoint (duration: 07m 56s)
  • 19:44 XioNoX: pushing https://www.irccloud.com/pastebin/Kecy61aZ/ to cr1/2.codfw for T162099
  • 19:43 awight: reenabled NL Fundraising campaigns
  • 19:42 ppchelko@tin: Started deploy [trending-edits/deploy@d8ca758]: Providing a debug endpoint
  • 19:38 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: roll back donatewiki to wmf.18
  • 19:37 awight: disabled NL campaigns per T162300
  • 19:12 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.19
  • 18:04 mutante: lvs2002 - power off via mgmt (it was down but still showed power as on)
  • 18:02 awight: rerunning paypal_audit
  • 16:57 moritzm: rearmed keyholder on mira after reboot
  • 15:20 elukey: playing with hhvm settings on mwdebug1002
  • 13:05 hashar@tin: Synchronized wmf-config/throttle.php: Add new throttle rule - T162089 (duration: 00m 40s)
  • 12:57 elukey: reimage analytics1035 (journal node) to Debian Jessie
  • 12:44 marostegui: Deploy schema change db2047 (s7) - T160390
  • 12:44 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2047 - T160390 (duration: 00m 41s)
  • 12:40 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2054 - T160390 (duration: 00m 44s)
  • 12:04 moritzm: upgrade remaining ca-certificates from jessie point update
  • 12:00 volans: re-enabled puppet on nitrogen/nihal/einsteinium, restarted ircecho
  • 11:42 volans: disabling ircecho for the merge of gerrit/346110 ( T159163 ) and postgres upgrade
  • 11:18 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 for maintenance (duration: 00m 40s)
  • 09:48 volans: deleted a third swift thumb that was making swiftrepl stuck in a loop: T162122
  • 09:11 elukey: reimage analytics1057 to Debian Jessie
  • 09:04 volans: deleted the 2 swift thumbs that were making swiftrepl stuck in a loop: T162122
  • 08:43 hoo: Ran scap pull on mwdebug1001 to revert local changes to Wikibase maintenance scripts
  • 08:15 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1034 after maintenance (duration: 00m 40s)
  • 07:44 marostegui: Migrate dbstore1002 enwiki.page and enwiki.categorylinks from TokuDB to InnoDB+compression - T159430
  • 06:56 marostegui: Stop replication on db1081 for maintenance - T161088
  • 06:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1081 - T161088 (duration: 00m 39s)
  • 06:55 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db1081 - T161088 (duration: 00m 39s)
  • 06:36 elukey: restart hhvm on mw1288 (hhvm-dump-debug in /tmp/hhvm.92520.bt.)
  • 06:33 elukey: restart hhvm on mw1223 (hhvm-dump-debug in /tmp/hhvm.2164.bt.)
  • 06:22 marostegui: Deploy schema change db2054 (s7) - https://phabricator.wikimedia.org/T160390
  • 06:22 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2054 - T160390 (duration: 00m 43s)
  • 06:07 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2061 - T160390 (duration: 00m 40s)
  • 03:03 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Apr 5 03:03:18 UTC 2017 (duration 5m 53s)
  • 02:57 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.19) (duration: 07m 22s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 08m 47s)
  • 01:27 tstarling@tin: Synchronized php-1.29.0-wmf.18/extensions/ParserMigration/includes: scap test only, no code changes (duration: 00m 39s)
  • 01:26 tstarling@tin: Synchronized php-1.29.0-wmf.18/extensions/ParserMigration/includes: scap test only, no code changes (duration: 00m 40s)
  • 01:17 demon@tin: Synchronized scap/plugins/clean.py: fixes (duration: 00m 41s)
  • 00:57 demon@tin: Finished scap: wmf.14 again, testing testing (duration: 26m 48s)
  • 00:30 demon@tin: Started scap: wmf.14 again, testing testing
  • 00:29 tstarling@tin: Synchronized php-1.29.0-wmf.18/extensions/ParserMigration/includes: scap test only, no code changes (duration: 01m 21s)
  • 00:08 tstarling@tin: Synchronized php-1.29.0-wmf.18/extensions/ParserMigration/includes/MigrationEditPage.php: for bug fix gerrit 346478 (duration: 00m 56s)

2017-04-04

  • 23:55 tstarling@tin: Synchronized php-1.29.0-wmf.18/extensions/ParserMigration: (no justification provided) (duration: 00m 39s)
  • 23:50 reedy@tin: Synchronized php-1.29.0-wmf.18/extensions/Quiz: Revert "Start implementing Quiz generation using TemplateParser" (duration: 00m 42s)
  • 23:31 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Prepare for related pages config change (T160076) and set $wgOresFiltersThresholds on plwiki and ptwiki (duration: 00m 41s)
  • 23:29 jynus: unscheduled restart of dbstore1002 T162212
  • 23:19 demon@tin: Finished scap: re-syncing old wmf.14-16 branches...cleaned up a little too much (duration: 44m 32s)
  • 22:34 demon@tin: Started scap: re-syncing old wmf.14-16 branches...cleaned up a little too much
  • 22:01 mobrovac: SCB all services updated to use the new service-runner DNS caching
  • 22:00 mobrovac@tin: Finished deploy [trending-edits/deploy@5cc3969]: Bump service-runner to pick up new DNS caching (duration: 06m 40s)
  • 21:55 mobrovac@tin: Finished deploy [graphoid/deploy@5fc26cb]: Bump service-runner to pick up new DNS caching (duration: 02m 15s)
  • 21:54 mobrovac@tin: Started deploy [trending-edits/deploy@5cc3969]: Bump service-runner to pick up new DNS caching
  • 21:53 mobrovac@tin: Started deploy [graphoid/deploy@5fc26cb]: Bump service-runner to pick up new DNS caching
  • 21:52 mobrovac@tin: Finished deploy [mobileapps/deploy@b93488f]: Bump service-runner to pick up new DNS caching (duration: 02m 43s)
  • 21:49 mobrovac@tin: Started deploy [mobileapps/deploy@b93488f]: Bump service-runner to pick up new DNS caching
  • 21:48 mobrovac@tin: Finished deploy [cxserver/deploy@b4184d3]: Bump service-runner to pick up new DNS caching (duration: 03m 37s)
  • 21:45 mobrovac@tin: Started deploy [cxserver/deploy@b4184d3]: Bump service-runner to pick up new DNS caching
  • 21:44 mobrovac@tin: Finished deploy [mathoid/deploy@4eb6d9d]: Bump service-runner to pick up new DNS caching (duration: 03m 27s)
  • 21:40 mobrovac@tin: Started deploy [mathoid/deploy@4eb6d9d]: Bump service-runner to pick up new DNS caching
  • 21:36 mobrovac@tin: Finished deploy [eventstreams/deploy@cf892f4]: Bump service-runner to pick up new DNS caching (duration: 02m 04s)
  • 21:33 mobrovac@tin: Started deploy [eventstreams/deploy@cf892f4]: Bump service-runner to pick up new DNS caching
  • 21:29 mobrovac@tin: Finished deploy [citoid/deploy@7dbbac8]: Bump service-runner to pick up new DNS caching (duration: 03m 13s)
  • 21:27 awight: Finished migrating Fundraising jobs to process-controlb
  • 21:26 mobrovac@tin: Started deploy [citoid/deploy@7dbbac8]: Bump service-runner to pick up new DNS caching
  • 21:20 jynus: applying mariadb MDEV#7383 patch on db1034 T159319
  • 21:18 mutante: running puppet across labvirt10* to replace cert
  • 21:12 mutante: revoked old labvirt-star.eqiad.wmnet cert - created new csr, signed it (CA: wmf_ca_2014_2017). deploying new labvirt-star.eqiad valid for 720 days (T162085)
  • 20:48 catrope@tin: Synchronized php-1.29.0-wmf.19/extensions/Echo/: T162173 (duration: 00m 43s)
  • 20:00 paravoid: rolling out a border-in4 ACL update across core routers (T160055)
  • 19:17 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.19
  • 18:57 awight: enabled pilot process-control job: banner history queue consumer
  • 18:55 demon@tin: Synchronized php: symlink repoint (duration: 00m 39s)
  • 18:55 awight: disabled banner history queue consumer
  • 18:51 demon@tin: Finished scap: wmf.19 bootstrap (duration: 35m 16s)
  • 18:16 demon@tin: Started scap: wmf.19 bootstrap
  • 17:53 andrewbogott: disabling puppet on labvirts to roll out a nova config change
  • 17:40 volans: stopped ircecho to avoid IRC spam
  • 16:03 hoo: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
  • 15:59 elukey: reimage analytics1052 (Hadoop Journal node) to Debian Jessie
  • 15:59 jynus: running ANALYZE on revision table for eswiki, cawiki on db1034
  • 15:56 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1034 for maintenance (duration: 00m 44s)
  • 14:39 moritzm: rebooting praseodymium to Linux 4.9
  • 14:34 moritzm: rebooting xenon to Linux 4.9
  • 14:27 moritzm: rebooting cerium to Linux 4.9
  • 14:06 elukey: reimage analytics1039 and 1051 to Debian Jessie
  • 13:11 akosiaris: add LVS IPs to the url-downloader blacklist now that all nodejs services no longer require it anymore. See https://gerrit.wikimedia.org/r/207490
  • 13:09 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY Enable interwikisorting on BETA wiktionaries (duration: 00m 44s)
  • 13:05 moritzm: installing ca-certificates updates from jessie point update
  • 13:00 ema: cache_upload: ban all objects with content-type ~ "^text" T162035
  • 12:19 ema: upgrade cp2003 to linux 4.9 T162029
  • 11:58 moritzm: installing e2fsprogs update from jessie point update
  • 11:53 elukey: reimage analytics10[36,37,38] to Debian Jessie
  • 11:46 marostegui: Deploy schema change db2061 (s7) - T160390
  • 11:46 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2061 - T160390 (duration: 00m 44s)
  • 11:39 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2068 - T160390 (duration: 00m 58s)
  • 09:40 moritzm: rebooting wtp1001 to Linux 4.9
  • 09:10 volans: restarted swiftrepl (repl_all.sh loop) on ms-fe1005
  • 08:47 moritzm: rebooting mw1265 to Linux 4.9
  • 08:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1015 - T159319 (duration: 00m 45s)
  • 07:54 moritzm: rebooting bast2001 to Linux 4.9
  • 07:35 elukey: reimage analytics103[234] to Debian Jessie
  • 06:43 marostegui: Deploy alter table on db2019 (codfw s4 master) - this will generate lag on codfw for s4 - T161683
  • 06:35 marostegui: Deploy schema change db2068 (s7) - T160390
  • 06:34 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2068 - T160390 (duration: 00m 44s)
  • 06:27 marostegui: Deploy schema change db1015 (s3) - https://phabricator.wikimedia.org/T159319
  • 02:39 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Apr 4 02:39:47 UTC 2017 (duration 5m 28s)
  • 02:34 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 14m 27s)
  • 01:31 reedy@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Disable LoginNotify on wikis that have no Echo T158878 (duration: 00m 44s)
  • 00:45 mutante: install1002/2002: sudo -i reprepro --delete clearvanished to remove precise distro after merging gerrit:345550

2017-04-03

  • 23:54 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Deploy ParserMigration extension T141586 (for real) (duration: 00m 44s)
  • 23:41 thcipriani@tin: Finished scap: SWAT: Deploy ParserMigration extension T141586 (l10nupdate only) (duration: 22m 24s)
  • 23:19 thcipriani@tin: Started scap: SWAT: Deploy ParserMigration extension T141586 (l10nupdate only)
  • 23:10 thcipriani@tin: Synchronized wmf-config: SWAT: Test LoginNotify on Beta cluster T158878 (duration: 00m 46s)
  • 22:39 volans: completed restart of swift-proxies in eqiad, ms-fe1005 was missing due to swiftrepl stuck/running
  • 22:37 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
  • 22:35 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
  • 22:06 mutante: power cycling lvs2002, it was down and console showed nothing
  • 20:47 bsitzmann@tin: Finished deploy [mobileapps/deploy@20ab197]: Update mobileapps to fdd4e31 (duration: 03m 05s)
  • 20:44 bsitzmann@tin: Started deploy [mobileapps/deploy@20ab197]: Update mobileapps to fdd4e31
  • 19:21 hashar: Finished deployment of project-logos optimization for T161999 / https://gerrit.wikimedia.org/r/#/c/346057/ . And purged the related logos
  • 19:18 hashar@tin: Synchronized static/images/project-logos: Optimize a few project logos - T161999 (duration: 00m 44s)
  • 19:16 andrewbogott: in testlabs, deleted ou=projects,dc=wikimedia,dc=org and ou=roles,dc=wikimedia,dc=org as per T126758
  • 19:15 mutante: phabricator/ops: adding ayounsi to WMF-NDA (project 61) and acl*operations-team (project 29) (T162073)
  • 18:37 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Configure Babel for elwikisource (T161593) (duration: 00m 44s)
  • 18:29 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Convert reference lists to 'responsive' on hewiki (T161804) (duration: 00m 52s)
  • 17:02 gehel@tin: Finished deploy [wdqs/wdqs@d7c367a]: (no justification provided) (duration: 01m 29s)
  • 17:01 gehel@tin: Started deploy [wdqs/wdqs@d7c367a]: (no justification provided)
  • 15:43 hoo: Updated email for "Lucie Kaffee" on wikitech from work address (wikimedia.de) to known volunteer address (upon request)
  • 14:54 marostegui: Deploy alter table to unify revision table across all the s3 wikis on db1015 - T159319
  • 14:49 ema: cache_upload: ban all objects with content-type: text/html T162035
  • 14:29 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1015 - T159319 (duration: 00m 44s)
  • 14:26 ariel@tin: Finished deploy [dumps/dumps@905a845]: fix stub recombines, broken by too aggressive 'cleanup' of local vars (duration: 00m 02s)
  • 14:26 ariel@tin: Started deploy [dumps/dumps@905a845]: fix stub recombines, broken by too aggressive 'cleanup' of local vars
  • 14:23 cwd: restarted jenkins to stop ArrayIndexOutOfBoundsException error
  • 14:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1086 - T160390 (duration: 00m 51s)
  • 13:38 zfilipin@tin: Synchronized php-1.29.0-wmf.18/extensions/cldr/: SWAT: Translate Atikamekw language name in French (duration: 00m 51s)
  • 13:29 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Add NS100 (Portal) to ladwiki, Add rollback user group in fawikisource (duration: 00m 47s)
  • 13:27 hashar: terbium: scap pull for ladwiki namespace additions
  • 13:15 moritzm: upgrading restbase-dev* to Linux 4.9
  • 13:09 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: enwiki: Temporarily disable Wikidata descriptions (T161805) (duration: 00m 45s)
  • 12:37 elukey: reimage analytics10[29,30,31] to Debian Jessie
  • 12:28 ema: banning 200px-Status_iucn3.1_LC_cs.svg.png from esams frontends T162035
  • 11:49 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
  • 11:45 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
  • 11:35 joal@tin: Finished deploy [analytics/refinery@cc73c40]: (no justification provided) (duration: 07m 23s)
  • 11:31 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1008.eqiad.wmnet
  • 11:28 joal@tin: Started deploy [analytics/refinery@cc73c40]: (no justification provided)
  • 11:28 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1008.eqiad.wmnet
  • 11:21 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1007.eqiad.wmnet
  • 11:18 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1007.eqiad.wmnet
  • 11:08 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
  • 11:05 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
  • 11:04 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
  • 10:43 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
  • 10:38 volans: upgrading swift-proxy in eqiad to use discovery URLs
  • 08:46 marostegui: Deploy alter table db1086 (s7) on revision table to unify PK and indexes - T160390
  • 08:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1086 - T160390 (duration: 00m 44s)
  • 07:39 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
  • 07:25 marostegui: Deploy alter table dbstore2001 (s7) on revision table to unify PK and indexes - T160390
  • 07:25 _joe_: rebooting copper to clean up at least partially the docker mess
  • 07:14 moritzm: switched default kernel for jessie installations to Linux 4.9
  • 07:06 _joe_: removing stale files on copper for docker, all local images will be wiped away
  • 07:03 moritzm: installing gnutls security updates on trusty
  • 06:51 marostegui: Deploy InnoDB compression on dewiki - db1070 - T150438
  • 06:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 to compress it - T153743 (duration: 00m 44s)
  • 06:40 _joe_: manually restarted replication for etcd
  • 06:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove db1057 entry - T160435 (duration: 00m 44s)
  • 06:21 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove db1057 entry - T160435 (duration: 00m 54s)
  • 06:12 marostegui: Remove partitions from metawiki.pagelinks (s7) on codfw master (db2029) this will generate lag on codfw - T153300
  • 05:59 marostegui: Resume pt-table-checksum on wikidata - T161294
  • 05:53 _joe_: powercycling mw2256, unresponsive to ping, blank console
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 09m 39s)

2017-04-02

  • 08:18 elukey: powercycle ms-be1016 (stuck in console, answers pings but not ssh)
  • 07:25 ariel@tin: Finished deploy [dumps/dumps@1ac3fb3]: var/method name cleanups, refactor, pregenerate page ranges for page content jobs, auto retry of failed page ranges (duration: 00m 03s)
  • 07:25 ariel@tin: Started deploy [dumps/dumps@1ac3fb3]: var/method name cleanups, refactor, pregenerate page ranges for page content jobs, auto retry of failed page ranges
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 09m 44s)

2017-04-01

  • 19:01 elukey: restart hhvm on mw1191 (dump debug in /tmp/hhvm.16619.bt.) - threads stuck in HPHP::Treadmill::getAgeOldestRequest
  • 02:37 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Apr 1 02:37:30 UTC 2017 (duration 5m 20s)
  • 02:32 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 13m 27s)

2017-03-31

  • 23:11 mutante: ruthenium: logrotate --force /etc/logrotate.d/parsoid (note this is existing file "parsoid" not new file "parsoid_testing") (T161920)
  • 20:18 elukey: stopping jobrunners on mw116[89] and restarting hhvm after https://gerrit.wikimedia.org/r/345881
  • 19:44 Reedy: Stop badge hacks from messing up the entire page on IE 11 on MonoBook T161869
  • 19:42 reedy@tin: Synchronized php-1.29.0-wmf.18/extensions/Echo: Stop badge hacks from messing up the entire page on IE 11 on MonoBook T161689 (duration: 00m 50s)
  • 19:16 mutante: ruthenium also deleting ancient "htmldumper" data, gwicke confirmed it's not needed anymore
  • 18:27 mutante: ruthenium mounting /dev/mapper/ruthenium--vg-tank into /srv/visualdiff/pngs | deleted "mysql" and "dumps" data that was on previously unmounted partition , subbu checked that wasn't needed anymore, we still need logrotate (T161920)
  • 18:14 mutante: ruthenium mounting /dev/mapper/ruthenium--vg-tank which wasn't used at all.. bam.. over 477GB of free space
  • 16:02 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=ms-fe2008.codfw.wmnet
  • 16:00 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=ms-fe2008.codfw.wmnet
  • 15:59 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=ms-fe2007.codfw.wmnet
  • 15:57 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=ms-fe2007.codfw.wmnet
  • 15:55 mobrovac@tin: Finished deploy [trending-edits/deploy@26b5eb4]: Config change: lower min_edits to 15 T160127 (duration: 06m 37s)
  • 15:55 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=ms-fe2006.codfw.wmnet
  • 15:52 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=ms-fe2006.codfw.wmnet
  • 15:49 mobrovac@tin: Started deploy [trending-edits/deploy@26b5eb4]: Config change: lower min_edits to 15 T160127
  • 15:44 volans@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=ms-fe2005.codfw.wmnet
  • 15:22 volans@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=imagescaler-rw,name=eqiad
  • 15:17 volans@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-rw,name=eqiad
  • 15:02 volans@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=ms-fe2005.codfw.wmnet
  • 15:01 oblivian@puppetmaster1001: conftool action : set/ttl=300; selector: dnsdisc=restbase-async
  • 15:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
  • 14:56 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=restbase-async
  • 14:55 _joe_: reducing ttl on the restbase-async discovery record, then flipping eqiad to active
  • 14:55 volans: deploying the use of discovery URL to swift-proxy hosts in codfw T160178#3136906
  • 14:09 _joe_: performing a rolling restart of changeprop after puppet runs on scb*
  • 14:00 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 13:23 elukey: restart hhvm on mw116[89] after https://gerrit.wikimedia.org/r/345829
  • 13:19 gehel: rolling restart of maps-test cluster for kernel upgrade
  • 13:09 moritzm: rebooting bromine to Linux 4.9
  • 12:10 moritzm: rebooting mwdebug* to Linux 4.9
  • 12:05 moritzm: rebooting pybal-test* to Linux 4.9
  • 11:22 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 after maintenance (duration: 00m 49s)
  • 10:47 akosiaris: uploaded jessie-wikimedia kubernetes_1.4.6-4 on apt.wikimedia.org/jessie-wikimedia
  • 09:59 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet
  • 09:56 elukey: set pooled=yes mw210[56789], mw2260 and mw2213 (and cleaned up old /srv/mediawiki dirs that were causing rsync spam in scap pull)
  • 09:52 marostegui: Adding rev_timestamp index to revision page db1066 (s1) - T132416
  • 09:49 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1066 for maintenance (duration: 00m 44s)
  • 09:47 elukey: restart hhvm on mw1197 - hhvm dump debug in /tmp/hhvm.14540.bt. - threads stuck in Treadmill::getAgeOldestRequest (HHVM 3.12)
  • 09:37 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1062 (duration: 00m 45s)
  • 09:35 godog: fix long-standing swift-account-server REPLICATE backtrace error on ms-be1022 - https://bugs.launchpad.net/swift/+bug/1424108
  • 09:21 godog: delete stray nginx error log with debug logging on thumbor1002
  • 08:28 moritzm: repooled mw1261 for more HHVM 3.18 debugging
  • 07:29 marostegui: Start pt-table-checksum on s5 wikidatawiki - T161294
  • 02:44 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Mar 31 02:44:11 UTC 2017 (duration 5m 30s)
  • 02:38 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 15m 54s)

2017-03-30

  • 23:58 aude@tin: Synchronized php-1.29.0-wmf.18/extensions/Wikidata: Fixes for special pages (duration: 02m 15s)
  • 23:07 catrope@tin: Synchronized php-1.29.0-wmf.18/extensions/ORES/modules/: T161706 (duration: 00m 51s)
  • 21:34 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 -> wmf.18
  • 21:16 awight: reenabled ingenico orphan rectifier (jenkins)
  • 21:08 awight: disable ingenico orphan rectifier (jenkins)
  • 21:07 demon@tin: Synchronized php-1.29.0-wmf.18/extensions/Echo/includes/model/Event.php: fix logging class reference (duration: 00m 47s)
  • 19:25 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.18
  • 18:59 MaxSem: Portals were not deployed: https://phabricator.wikimedia.org/T161832
  • 18:54 maxsem@tin: Synchronized portals/: (no justification provided) (duration: 00m 48s)
  • 18:45 maxsem@tin: Synchronized portals: (no justification provided) (duration: 00m 44s)
  • 18:45 maxsem@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 43s)
  • 18:38 maxsem@tin: Synchronized portals: (no justification provided) (duration: 00m 44s)
  • 18:29 maxsem@tin: Synchronized portals: (no justification provided) (duration: 00m 45s)
  • 18:28 maxsem@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 44s)
  • 18:24 maxsem@tin: Synchronized portals: (no justification provided) (duration: 00m 44s)
  • 18:24 godog: swift eqiad-prod add ms-be1028 -> ms-be1039 - T160640
  • 18:23 maxsem@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 44s)
  • 18:18 maxsem@tin: Synchronized portals: (no justification provided) (duration: 00m 44s)
  • 18:17 maxsem@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 45s)
  • 17:32 elukey: shutdown analytics1039 to apply new thermal paste - T132256
  • 16:17 godog: upgrade thumbor to 0.1.37 on thumbor100[12]
  • 16:03 _joe_: restarting hhvm on mw1191, stuck in HPHP::Treadmill::getAgeOldestRequest
  • 15:59 twentyafterfour@tin: Synchronized php-1.29.0-wmf.17/includes/: sync I7c5c0a refs T159319 (duration: 01m 41s)
  • 14:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove db1057 entry from s1 shard - T160435 (duration: 00m 44s)
  • 14:38 godog: run stress test (w/ bonnie) on new swift hw - T160640
  • 14:33 andrewbogott: upgrading nova-compute to 12.0.6 on all labvirts
  • 14:33 moritzm: rebooting restbase2001 to Linux 4.9
  • 14:30 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1090 - T17441 (duration: 00m 45s)
  • 14:22 kaldari@tin: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 48s)
  • 14:21 kaldari: sync InitialiseSettings.php to enable cookie blocking on English Wikipedia
  • 14:06 oblivian@tin: Synchronized wmf-config/ProductionServices.php: switch to discovery for cxserver,eventbus (duration: 00m 43s)
  • 14:01 oblivian@tin: Synchronized wmf-config/ProductionServices.php: switch to discovery for some records (duration: 00m 47s)
  • 13:46 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow eliminators and autoreviewers to move a file on ptwiki (T161532) Assign move-categorypages to sysops&bots only on nlwiki (T161551) Enable Multimedia Viewer at officewiki (T160420) (duration: 00m 44s)
  • 13:19 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: [cleanup] Remove expired rules (T161530) (duration: 00m 45s)
  • 12:49 moritzm: rebooting bast4001 for kernel update to 4.9
  • 12:18 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventbus,name=codfw
  • 12:15 hoo: Updated the Constraints table on Wikidata, per T160506.
  • 12:02 moritzm: installing glibc security updates on trusty
  • 11:55 moritzm: installing jbig2dec security updates
  • 10:03 moritzm: repooling mw1261 for additional test
  • 09:48 root@tin: Synchronized wmf-config/db-eqiad.php: Uniform maintenance message and indentation (duration: 00m 47s)
  • 09:34 root@tin: Synchronized wmf-config/db-codfw.php: Uniform maintenance message and indentation (duration: 00m 44s)
  • 09:06 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
  • 09:05 elukey: depooling mw1261 (hhvm-dump-debug in /tmp/hhvm.98736.bt.)
  • 08:38 moritzm: repooling mw1261 to reproduce hhvm deadlock with higher debug level
  • 08:13 marostegui: Convert UNIQUE keys to PK on db1090 (s2) - T17441
  • 08:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1090 - T17441 (duration: 00m 44s)
  • 07:43 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2021.codfw.wmnet
  • 07:41 gehel: pull elastic2021 back into active duty - T149006
  • 07:05 ema: upgrading twisted to 16.2.0 on lvs100[123] (eqiad primaries) T160433
  • 06:45 moritzm: installing apparmor security updates on trusty
  • 06:25 marostegui: Logging backwards for the record: restart mysql on db1047 for maintenance - T160454
  • 05:56 marostegui: Deploy schema change on db2014 - codfw master (this will generate lag on codfw) - T73563
  • 05:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 - T17441 (duration: 00m 45s)
  • 05:51 ema: upgrading twisted to 16.2.0 on lvs100[456] (eqiad secondaries) T160433
  • 03:03 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Mar 30 03:03:31 UTC 2017 (duration 5m 49s)
  • 02:57 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 13m 43s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 07m 20s)
  • 01:52 twentyafterfour: phd fixed on iridium. libphutil was out of sync with phd source
  • 01:11 twentyafterfour: running `puppet agent --test` on iridium
  • 00:10 twentyafterfour: Phabricator update completed.
  • 00:10 mutante: ruthenium low on disk space, because /srv/visualdiff/pngs (parsoid-vd-tests) is pretty large and /srv isn't a separate mount
  • 00:06 twentyafterfour: updating phabricator on iridium
  • 00:04 mutante: ruthenium - apt-get clean gets a little more disk space

2017-03-29

  • 23:46 reedy@tin: Synchronized php-1.29.0-wmf.18/extensions/Quiz: Fix undefined variable stateObject T161735 (duration: 00m 49s)
  • 23:43 reedy@tin: Synchronized wmf-config/CommonSettings.php: Dont use EP_NS in CommonSettings (duration: 00m 44s)
  • 21:23 krinkle@tin: Synchronized errorpages/: I15295835a1a (duration: 00m 44s)
  • 20:56 thcipriani@tin: Synchronized php-1.29.0-wmf.18/extensions/ProofreadPage/includes/page/ProofreadPagePage.php: Makes sure to always return a Title in ProofreadPagePage::findIndexTitle T161734 (duration: 00m 46s)
  • 20:35 halfak@tin: Finished deploy [ores/deploy@554ea12]: T160638 (duration: 18m 40s)
  • 20:31 arlolra: Updated Parsoid to b1b27146 (T161558, T160207, T153798)
  • 20:21 arlolra@tin: Finished deploy [parsoid/deploy@bc798dc]: Updating Parsoid to b1b27146 (duration: 07m 26s)
  • 20:16 halfak@tin: Started deploy [ores/deploy@554ea12]: T160638
  • 20:13 arlolra@tin: Started deploy [parsoid/deploy@bc798dc]: Updating Parsoid to b1b27146
  • 20:09 ppchelko@tin: Finished deploy [changeprop/deploy@ef62908]: Fix metrics for regex topics (duration: 00m 56s)
  • 20:08 ppchelko@tin: Started deploy [changeprop/deploy@ef62908]: Fix metrics for regex topics
  • 19:46 ppchelko@tin: Finished deploy [changeprop/deploy@1150cf5]: Config: Enabling regex-based topic subscription (duration: 01m 45s)
  • 19:44 ppchelko@tin: Started deploy [changeprop/deploy@1150cf5]: Config: Enabling regex-based topic subscription
  • 19:16 awight: re-run today's ingenico audit job
  • 19:15 awight: pick at paypal scab: re-run audit parser
  • 19:10 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 back to 1.29.0-wmf.17
  • 19:05 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.18
  • 18:45 bblack: varnish active/active deploy done ( https://gerrit.wikimedia.org/r/#/c/339667/ ) - all caches running the new code, puppet re-enabled, etc.
  • 18:43 hoo: Started a Wikidata TTL dump run on snapshot1007 using Zend (due to T161695).
  • 18:22 catrope@tin: Synchronized php-1.29.0-wmf.18/includes/page/WikiPage.php: T159319 (duration: 00m 44s)
  • 18:22 catrope@tin: Synchronized php-1.29.0-wmf.18/includes/Title.php: T159319 (duration: 00m 46s)
  • 18:10 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Save->Publish on Wikipedias except dewiki and enwiki (T131132); set wgOOUIEditPage false everywhere (duration: 00m 57s)
  • 17:58 awight: disabling PayPal audit parser
  • 17:56 ppchelko@tin: Finished deploy [changeprop/deploy@e4547cd]: Support regexed topics (duration: 00m 55s)
  • 17:55 ppchelko@tin: Started deploy [changeprop/deploy@e4547cd]: Support regexed topics
  • 17:31 godog: remove ge-3/0/27 from interface-range labs-instance-ports (now for ms-be1031)
  • 17:25 bblack: puppet disabled on all cp* ahead of careful deploy for https://gerrit.wikimedia.org/r/#/c/339667/
  • 17:12 mutante: removing parsoid-tests.wikimedia.org from DNS - replaced by more specific parsoid-rt-tests and parsoid-vd-tests
  • 17:12 nuria@tin: Finished deploy [eventlogging/analytics@2874077]: (no justification provided) (duration: 00m 03s)
  • 17:12 nuria@tin: Started deploy [eventlogging/analytics@2874077]: (no justification provided)
  • 17:11 elukey: restarting nginx on eqiad appservers to pick up the new certs
  • 16:55 marostegui: Stop eventlog syncs to db1047 and dbstore1002 for maintenance - T160454
  • 16:53 marostegui: Disable puppet on db1047 and dbstore1002 for maintenance - T160454
  • 16:51 elukey: upgrading ssl cert appservers.svc.eqiad.wmnet to include the new discovery endpoints
  • 16:51 _joe_: actually performing the parsoid rolling restart in codfw
  • 16:31 _joe_: rolling restart of parsoid in codfw
  • 14:32 moritzm: installing apparmor security updates on trusty
  • 14:31 elukey: upgrading ssl cert api.svc.eqiad.wmnet to include the new discovery endpoints
  • 14:14 andrewbogott: disabling puppet on labs hosts for a staged rollout of https://gerrit.wikimedia.org/r/#/c/345275/
  • 14:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091 - T17441 (duration: 00m 44s)
  • 13:49 elukey: upgrading ssl cert rendering.svc.eqiad.wmnet to include the new discovery endpoints
  • 13:08 reedy@tin: Synchronized wmf-config/CommonSettings.php: use wfLoadExtension for VisualEditor (duration: 00m 44s)
  • 12:53 elukey: reimage analytics1045 to Debian Jessie
  • 12:52 _joe_: depooling wtp1001 to test puppet/confd transfer of responsibilities
  • 11:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 - T17441 (duration: 00m 44s)
  • 11:30 hoo: Started a Wikidata JSON dump run on snapshot1007 using Zend (due to T161695).
  • 11:16 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 - T17441 (duration: 00m 44s)
  • 11:03 elukey: upgrading ssl cert appservers.svc.codfw.wmnet to include the new discovery endpoints
  • 11:01 moritzm: Linux 4.9 uploaded for jessie-wikimedia (along with new meta package linux-meta-4.9 and updated firmware)
  • 11:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T17441 (duration: 00m 44s)
  • 10:27 godog: reimage netmon1001 with jessie
  • 10:12 ema: emptying /srv/log/parsoid/main.log.1 (3.2G!) on ruthenium to reclaim some disk space
  • 10:11 elukey: upgrading ssl cert api.svc.codfw.wmnet to include the new discovery endpoints
  • 09:39 ema: upgrading twisted to 16.2.0 on lvs200[123] (codfw primaries) T160433
  • 08:54 ema: upgrading twisted to 16.2.0 on lvs200[456] (codfw secondaries) T160433
  • 08:39 ema: apt.w.o: set digest-algo to sha256 in gpg.conf T132325
  • 08:29 elukey: upgrading ssl cert rendering.svc.codfw.wmnet to include the new discovery endpoints
  • 07:57 marostegui: Convert s6 UNIQUE keys into PK on db1093 - T17441
  • 07:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 - T17441 (duration: 00m 54s)
  • 06:01 marostegui: Keep converting UNIQUE keys to PK on s4 - db1091 - T17441
  • 03:15 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Mar 29 03:15:16 UTC 2017 (duration 5m 53s)
  • 03:09 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.18) (duration: 14m 55s)
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 13m 41s)
  • 01:41 krinkle@tin: Synchronized errorpages/404.php: Match 404.html and default.html - Id58e25afbe (duration: 00m 44s)
  • 01:16 mutante: rsyncing librenms/torrus/smokeping app data from netmon1001 to gerrit2001. adding alias "syncit" to do it all at once (T125020)
  • 00:57 paravoid: Removing upload.wikimedia.org/index.html ("swift delete root index.html") from both eqiad/codfw

2017-03-28

  • 23:22 thcipriani@tin: Synchronized php-1.29.0-wmf.17/extensions/NavigationTiming/modules/ext.navigationTiming.js: SWAT: ext.NavigationTiming: Restore unsampled Save Timing T161368 (duration: 00m 45s)
  • 23:16 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable header version 2 on all wikis T160471 (duration: 00m 45s)
  • 22:45 urandom: T111113: Restarting Cassandra instances, eqiad row 'd' Done
  • 22:21 mutante: DNS - creating new language "dty" (T161529) - running "authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones" to trigger re-creation of zone files after change in langs.tmpl. (gerrit:345077) | https://www.ethnologue.com/language/dty
  • 22:19 mutante: DNS - creating new language "dty" (T160865) - running "authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones" to trigger re-creation of zone files after change in langs.tmpl. (gerrit:345077) | https://www.ethnologue.com/language/dty
  • 21:55 urandom: T111113: Restarting Cassandra instances, eqiad row 'd'
  • 21:55 urandom: T111113: Restarting Cassandra instances, eqiad row 'b' Done
  • 21:18 andrewbogott: upgraded nova-compute on labvirt1014 because it contains a long-awaited bugfix
  • 21:08 urandom: T111113: Restarting Cassandra instances, eqiad row 'b'
  • 21:08 urandom: T111113: Restarting Cassandra instances, eqiad row 'a' Done
  • 20:24 mutante: ms-fe1001 thru ms-fe1004 - scheduled last downtime for host and services in icinga - shutdown -h now, turn them off, revoke puppet certs, salt-keys... (T160986)
  • 20:22 mutante: mc1019 - puppet fail due to Failed resource /etc/redis/replica since 4 days
  • 20:21 urandom: T111113: Restarting Cassandra instances, eqiad row 'a'
  • 20:21 mutante: copper - puppet errors due to Failed resource /var/lib/docker/devicemapper ??
  • 20:19 mutante: mwdebug1002 - same, was low on disk space, 'apt-get clean' freed > 3GB
  • 20:18 mutante: mwdebug1001 - was low on disk space, 'apt-get clean' - freed about 4GB
  • 20:15 mutante: mw1261 - depooled
  • 20:14 mutante: mw1261 runs with HHVM 3.18 - which seems to have a bug leading to a deadlock every 4-5 hours
  • 20:14 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.29.0-wmf.18
  • 20:13 mutante: mw1261 HHVM crash as predicted by Moritz - ran sudo hhvm-dump-debug. Backtrace saved as /tmp/hhvm.79460.bt.
  • 20:06 mutante: ms-fe100[1-4] - disable/stop puppet, stop salt minion, decom (T160986)
  • 19:57 thcipriani@tin: Finished scap: testwiki to php-1.29.0-wmf.18 and rebuild l10n cache (duration: 40m 19s)
  • 19:37 mobrovac: restbase deploying d477f495
  • 19:33 urandom: T111113: Restarting Cassandra instances, codfw row 'd' Done
  • 19:17 thcipriani@tin: Started scap: testwiki to php-1.29.0-wmf.18 and rebuild l10n cache
  • 18:45 urandom: T111113: Restarting Cassandra instances, codfw row 'd'
  • 18:44 urandom: T111113: Restarting Cassandra instances, codfw row 'c' Done
  • 18:18 ppchelko@tin: Finished deploy [changeprop/deploy@1689d86]: Rename event field in logs (duration: 00m 52s)
  • 18:18 ppchelko@tin: Started deploy [changeprop/deploy@1689d86]: Rename event field in logs
  • 17:53 urandom: T111113: Restarting Cassandra instances, codfw row 'c'
  • 17:22 thcipriani: starting branch cut for 1.29.0-wmf.18
  • 17:07 godog: swift codfw-prod: bump ms-be2028 ms-be2039 object weight to 3000 - T158337
  • 17:06 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic2021.codfw.wmnet
  • 16:39 urandom: T111113: Restarting remaining Cassandra instances, rack 'b', codfw (restbase20{02,07,10})
  • 16:19 urandom: T111113: Restarting Cassandra on restbase2001 to apply mandatory client encryption (canary)
  • 15:56 gehel: banning elastic2021 to run same tests as elastic2020 - T149006
  • 14:41 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet
  • 14:40 marostegui: Convert UNIQUE keys into PK on db1091 (commonswiki) - T17441
  • 14:38 ppchelko@tin: Finished deploy [changeprop/deploy@bfbaa17]: Increase log level for processing failures (duration: 01m 07s)
  • 14:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 - T17441 (duration: 00m 43s)
  • 14:38 elukey: ran restart-hhvm on mw1242, hhvm threads stuck (dump debug in /tmp/hhvm.9008.bt.) - HHVM 3.12
  • 14:37 ppchelko@tin: Started deploy [changeprop/deploy@bfbaa17]: Increase log level for processing failures
  • 13:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1092 - T17441 (duration: 00m 43s)
  • 13:44 elukey: started hhvm on mw1261 (still depooled) - no hhvm process running
  • 13:29 RoanKattouw: Ran initUserPreference.php -s ores-enabled -t rcenhancedfilters and -s ores-enabled -t oresHighlight on plwiki and ptwiki
  • 13:22 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable RCFilters beta feature on plwiki and ptwiki T158336 (duration: 00m 43s)
  • 12:58 moritzm: depooled mw1261
  • 10:39 ema: upgrading twisted to 16.2.0 on lvs3001 and lvs3002 (esams primaries) T160433
  • 10:36 ema: upgrading twisted to 16.2.0 on lvs3003 and lvs3004 (esams secondaries) T160433
  • 10:27 marostegui: Convert dewiki UNIQUE keys into PK on db1092 - https://phabricator.wikimedia.org/T17441
  • 10:15 elukey: Switching hue.w.o's backend (cache misc) from analytics1027 to thorium - T159527
  • 10:10 moritzm: upgraded mw1262 to HHVM 3.18
  • 08:48 marostegui: Convert wikidatawiki UNIQUE keys into PK on db1092 - T17441
  • 08:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1092 - T17441 (duration: 00m 44s)
  • 08:29 akosiaris: enable IGMP snooping on all VLANs on asw2-d-eqiad. T133387
  • 07:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T17441 (duration: 00m 43s)
  • 07:18 moritzm: installing eject security updates on trusty hosts
  • 06:11 marostegui: Keep converting unique keys into PK on db1089 - T17441
  • 06:01 marostegui: Deploy schema change on s2.enwiktionary.templatelinks - on codfw master, this will generate lag on codfw slaves (which have been silenced) - T154097
  • 05:52 marostegui: Run pt-table-checksum on es2 - T161510
  • 02:39 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Mar 28 02:39:53 UTC 2017 (duration 5m 28s)
  • 02:34 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 12m 37s)
  • 00:36 reedy@tin: Synchronized private: Remove mwblocker.log (duration: 00m 44s)
  • 00:34 reedy@tin: Synchronized wmf-config/CommonSettings.php: Remove $wgProxyList (duration: 00m 43s)

2017-03-27

  • 23:06 ebernhardson@tin: Synchronized php-1.29.0-wmf.17/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT T160006, turning off cirrussearch AB test for sistersearch (duration: 00m 44s)
  • 22:00 bawolff: deployed patch T151735
  • 21:27 andrewbogott: disabling puppet on labvirt* and labcontrol* to stagger roll out of https://gerrit.wikimedia.org/r/#/c/344689/
  • 20:29 arlolra: Updated Parsoid to 6eaad376 (T160599, T161178, T133267)
  • 20:21 arlolra@tin: Finished deploy [parsoid/deploy@371ba4f]: Updating Parsoid to 6eaad376 (duration: 07m 06s)
  • 20:14 arlolra@tin: Started deploy [parsoid/deploy@371ba4f]: Updating Parsoid to 6eaad376
  • 19:57 mutante: ruthenium/varnish misc - remove parsoid-tests.wikimedia.org server_name / backend - replaced by parsoid-rt-test and parsoid-vd-tests
  • 19:56 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Linter: whitelist parsoid canaries too - https://gerrit.wikimedia.org/r/#/c/344998/ - T160573 (duration: 00m 44s)
  • 18:08 mobrovac@tin: Finished deploy [mobileapps/deploy@92f693c]: Remove the proxy from the config, deploying to scb2004 (duration: 00m 43s)
  • 18:07 mobrovac@tin: Started deploy [mobileapps/deploy@92f693c]: Remove the proxy from the config, deploying to scb2004
  • 18:05 mobrovac@tin: Finished deploy [mobileapps/deploy@92f693c]: Remove the proxy from the config (duration: 03m 29s)
  • 18:02 mobrovac@tin: Started deploy [mobileapps/deploy@92f693c]: Remove the proxy from the config
  • 17:19 mutante: tin/mira: welcome new mediawiki deployer 'musikanimal' (T161181)
  • 17:03 gehel@tin: Finished deploy [wdqs/wdqs@d07586c]: (no justification provided) (duration: 01m 26s)
  • 17:02 gehel@tin: Started deploy [wdqs/wdqs@d07586c]: (no justification provided)
  • 16:40 mobrovac: restbase deploying f53bec41
  • 16:34 _joe_: cleaned the bc cache on mw1261, restarted hhvm and repooled
  • 15:46 mobrovac@tin: Finished deploy [mobileapps/deploy@aed916b]: Add discovery.wmnet to no_proxy_list (duration: 04m 05s)
  • 15:42 mobrovac@tin: Started deploy [mobileapps/deploy@aed916b]: Add discovery.wmnet to no_proxy_list
  • 15:38 mobrovac@tin: Finished deploy [cxserver/deploy@40e86ad]: Add discovery.wmnet to no_proxy_list (duration: 02m 39s)
  • 15:35 mobrovac@tin: Started deploy [cxserver/deploy@40e86ad]: Add discovery.wmnet to no_proxy_list
  • 14:33 dcausse: rebuilding ttmserver index in elastic@codfw from wasat
  • 14:14 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Add autopatrolled group to svwiki (T161210) (duration: 00m 50s)
  • 13:40 dereckson@tin: Synchronized wmf-config/CommonSettings.php: no-op, to force resync (duration: 00m 43s)
  • 13:29 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Fix wgLogoHD 2.5x key (T161416) (duration: 00m 43s)
  • 13:26 dereckson@tin: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] enable more accurate regex timeout (T161095) (duration: 00m 44s)
  • 13:22 moritzm: repooled mw1261 (now that fix for lcfirst() issue from T161095 is deployed)
  • 13:20 dereckson@tin: Synchronized static/images/project-logos/: Add khw.wikipedia logos to static resources (T160865) (duration: 00m 43s)
  • 13:19 dereckson@tin: Synchronized wmf-config/: [es5 upgrade] step 5: restore normal operations (T157479, 2/2) (duration: 00m 49s)
  • 13:18 dereckson@tin: Synchronized tests/cirrusTest.php: [es5 upgrade] step 5: restore normal operations (T157479, 1/2) (duration: 00m 48s)
  • 13:08 dereckson@tin: Synchronized wmf-config/CirrusSearch-common.php: Updates and typo fixes to CirrusSearch-common.php (gerrit:344933) (duration: 00m 43s)
  • 12:47 marostegui: Run pt-table-checksum for a couple of hundred small wikis in es2 - T161510
  • 12:44 jynus: deploying semi-sync replication to all hosts on codfw T161007
  • 12:21 ladsgroup@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata/vendor/composer/installed.json: Third try for Update Wikidata - fix term validation (T161263) Part III (duration: 00m 43s)
  • 12:19 ladsgroup@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata/extensions/Wikibase/: Third try for Update Wikidata - fix term validation (T161263) Part II (duration: 01m 32s)
  • 12:19 godog: upgrade grafana to 4.2.0 on krypton T161193
  • 12:17 ladsgroup@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata/composer.lock: Third try for Update Wikidata - fix term validation (T161263) Part I (duration: 00m 44s)
  • 12:16 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging$ scap sync-file php-1.29.0-wmf.17/extensions/Wikidata/composer.lock 'Third try for Update Wikidata - fix term validation (T161263) Part I'
  • 12:02 _joe_: experimenting with cxserver config on scb2004
  • 12:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1053 T160415 - T73563 (duration: 01m 07s)
  • 11:51 marostegui: Deploy new index on db1040, s4 primary master table: commonswiki.image - T160415
  • 11:14 akosiaris: upgraded bacula-sd to 7.4.3+dfsg-1+sid1~bpo8+1 on heze as well
  • 11:03 akosiaris: performed bacula schema change on db1016 for database bacula
  • 11:00 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=ores
  • 10:59 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=.*-ro
  • 10:54 akosiaris: upgrade bacula director and storage daemon to 7.4.3
  • 10:47 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(kartotherian|search)
  • 10:20 hashar: Restarting Jenkins to drop the Throttle Concurrent Builds plugin - T158596
  • 10:16 ladsgroup@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata: Second try for Update Wikidata - fix term validation (T161263) (duration: 02m 05s)
  • 10:15 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging/php-1.29.0-wmf.17$ scap sync-dir php-1.29.0-wmf.17/extensions/Wikidata "Second try for Update Wikidata - fix term validation (T161263)"
  • 09:56 _joe_: rolling restart of restbase in codfw to pick up the new parsoid config
  • 09:54 ladsgroup@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata: Update Wikidata - fix term validation (T161263) (duration: 02m 22s)
  • 09:53 mforns@tin: Finished deploy [analytics/aqs/deploy@a5e1775]: (no justification provided) (duration: 01m 41s)
  • 09:52 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging/php-1.29.0-wmf.17$ scap sync-dir php-1.29.0-wmf.17/extensions/Wikidata "Update Wikidata - fix term validation (T161263)"
  • 09:52 mforns@tin: Started deploy [analytics/aqs/deploy@a5e1775]: (no justification provided)
  • 09:35 mforns@tin: Finished deploy [analytics/aqs/deploy@80a9de4]: (no justification provided) (duration: 01m 49s)
  • 09:33 mforns@tin: Started deploy [analytics/aqs/deploy@80a9de4]: (no justification provided)
  • 08:42 jynus: deploying semisync replication to all hosts (eqiad and codfw) on s6 T161007
  • 08:38 hashar@tin: Synchronized php-1.29.0-wmf.17/languages/classes/LanguageKk.php: Check for string initialization in lcfirst() for HHVM 3.18 - T161095 (duration: 00m 52s)
  • 08:16 marostegui: Deploy alter tables on db1089 (depooled) for a bunch of tables to convert UNIQUE keys into PK for testing - T17441
  • 08:01 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool es2014 after maintenance (duration: 00m 43s)
  • 07:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T17441 (duration: 00m 45s)
  • 07:17 elukey@puppetmaster1001: conftool action : set/pooled=active; selector: name=mw2256.codfw.wmnet
  • 06:26 marostegui: Deploy alter table s4 (commonswiki) db1053 - https://phabricator.wikimedia.org/T73563 https://phabricator.wikimedia.org/T160415
  • 06:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1053 T160415 - T73563 (duration: 00m 56s)
  • 06:18 _joe_: disabling puppet on authdns while merging a dns change
  • 06:06 marostegui: Resume pt-table-checksum on dewiki - T161294
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Mar 27 02:26:53 UTC 2017 (duration 5m 25s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 08m 22s)

2017-03-26

  • 10:06 _joe_: restarting apache2 on puppetmaster2002, passenger probably stuck
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Mar 26 02:26:05 UTC 2017 (duration 5m 25s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 07m 20s)

2017-03-25

  • 22:28 legoktm@tin: Synchronized wmf-config/: No-op labs only changes https://gerrit.wikimedia.org/r/#/c/344788/ (duration: 00m 52s)
  • 20:08 Krinkle: Ran mwscript deleteEqualMessages.php on public wikis (T45917) - deleted 5 pages across 5 wikis
  • 13:43 dereckson@tin: Synchronized wmf-config/throttle.php: Add throttle rule for BordeauxJS (T161402) (duration: 00m 50s)
  • 03:15 Krinkle: Re-create optimised indexes for xhgui in mongodb on tungsten per https://github.com/perftools/xhgui/tree/v0.7.0#installation (lost after T161196)
  • 02:36 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Mar 25 02:36:46 UTC 2017 (duration 5m 27s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 12m 22s)

2017-03-24

  • 20:03 krinkle@tin: Synchronized php-1.29.0-wmf.17/StartProfiler.php: touch - T161286 - (symlink) (duration: 00m 42s)
  • 19:55 krinkle@tin: Synchronized wmf-config/StartProfiler.php: T161286 - include hostname (duration: 00m 49s)
  • 19:33 krinkle@tin: Synchronized wmf-config/StartProfiler.php: touch - T161286 - hhvm cache maybe? (duration: 00m 43s)
  • 18:10 ejegg: updated CiviCRM from d3c439f to b6c8f3e
  • 17:50 ebernhardson: restart elasticsearch on relforge100[12] to test reindex api over https
  • 15:27 jynus: running unscheduled ALTER TABLE on arbcom_cswiki.archive T104756
  • 13:47 moritzm: installing freetype security updates on trusty
  • 12:30 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 T160415 - T73563 (duration: 00m 44s)
  • 12:13 marostegui: Start first run of pt-table-checksum on s5 (dewiki) - T161294
  • 11:18 godog: upgrade grafana to 4.2.0 on labmon1001 - T161193
  • 09:39 godog: pool prometheus100[34] - T148408
  • 08:23 marostegui: Deploy schema change s4 db2019 (codfw master) - T160415
  • 08:01 ema: upgrading twisted to 16.2.0 on lvs4001 and lvs4002 (ulsfo primaries) T160433
  • 07:49 marostegui: Deploy schema change s4 on db1069 and db1056 - T160415 - T73563
  • 07:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056, repool db1059 T160415 - T73563 (duration: 00m 43s)
  • 07:42 moritzm: installing git updates on trusty
  • 07:35 dcausse: cirrus: refresh comp suggest indices in elastic@codfw
  • 07:26 ema: upgrading twisted to 16.2.0 on lvs4003 and lvs4004 (ulsfo secondaries) T160433
  • 07:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1070, db1071 and db1082 - T137191 (duration: 00m 43s)
  • 06:10 Krinkle: Removing xhgui.results entries before 1-Dec-2016 finished. Running xhgui->command(compact=>results) now. T161196
  • 02:31 Krinkle: Reverted patch - https://gerrit.wikimedia.org/r/#/c/344569/
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 08m 35s)
  • 02:28 Krinkle: Reminder to incident doc writer: It was difficult to figure out what the last "real" patch was: the scap message for SAL is manually written (it does not say which commit in which repo), and git log contains noise from security patches. We need simple revert options from the flat git tree at /srv/mediawiki
  • 02:26 Krinkle: Reminder to incident doc writer: Logstash was (and is) not responsive, serving Kibana-rendered errors about logstash "Service unavailable"
  • 02:25 Krinkle: All apaches are back up
  • 02:24 krinkle@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata: revert (duration: 02m 34s)
  • 02:24 MaxSem: Killed l10nupdate on tin, was blocking emergency pushes
  • 02:22 Krinkle: Hard-killed all l10nupdate processes and rm'ed scap lock
  • 02:11 Krinkle: Removing xhgui.results entries from before 1 December 2016 in MongoDB on tungsten (T161196)
  • 01:45 mutante: bacula - on helium, attempt to start bacula-director process, attempt to fix permissions on key files as codified in director.pp
  • 01:40 catrope@tin: Finished scap: Wikidata cherry-picks (with i18n) (duration: 25m 03s)
  • 01:15 catrope@tin: Started scap: Wikidata cherry-picks (with i18n)
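
The 02:28 note above points out that sync messages are free text and have to be read by hand during an incident. As a minimal sketch, assuming entries keep the "HH:MM user@host: Synchronized path: message (duration: XXm YYs)" shape seen throughout this log, the fields can be pulled out mechanically. The function name `parse_sync_entry` and the returned field names are hypothetical and not part of scap or any SAL tooling.

```python
import re
from datetime import timedelta

# Shape of a "Synchronized" entry as it appears in this log, e.g.
#   02:24 krinkle@tin: Synchronized php-1.29.0-wmf.17/extensions/Wikidata: revert (duration: 02m 34s)
SYNC_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) "
    r"(?P<user>[^@\s]+)@(?P<host>[^:\s]+): "
    r"Synchronized (?P<path>\S+): "
    r"(?P<message>.*) "
    r"\(duration: (?P<min>\d+)m (?P<sec>\d+)s\)$"
)


def parse_sync_entry(line):
    """Return a dict for a 'Synchronized' entry, or None for any other entry.

    Hypothetical helper for illustration only.
    """
    # Drop the leading bullet and surrounding whitespace, if present.
    text = line.strip().lstrip("\u2022").strip()
    match = SYNC_RE.match(text)
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "user": match.group("user"),
        "host": match.group("host"),
        "path": match.group("path"),
        "message": match.group("message"),
        "duration": timedelta(
            minutes=int(match.group("min")), seconds=int(match.group("sec"))
        ),
    }
```

Entries that are not sync messages (for example "02:25 Krinkle: All apaches are back up") return None, so the helper can be mapped over a whole day's list and filtered.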

2017-03-23

  • 23:26 Krinkle: Removing xhgui.results entries from before 1 June 2016 (T161196)
  • 23:12 thcipriani@tin: Synchronized php-1.29.0-wmf.17/extensions/ORES: SWAT: Stats: Invert "false" thresholds so they are correct T161250 (duration: 00m 52s)
  • 23:05 Pchelolo: update RESTBase to 2536b25c7 - eqiad
  • 22:56 Pchelolo: update RESTBase to 2536b25c7 - staging
  • 22:39 Pchelolo: update RESTBase to 2536b25c7 - codfw
  • 21:36 krinkle@tin: Synchronized wmf-config/StartProfiler.php: (no justification provided) (duration: 00m 53s)
  • 21:05 ejegg: rolled back payments-wiki to 9622a4b
  • 21:00 ejegg: updated payments from 9622a4b to bb956bf
  • 20:16 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.17
  • 19:34 thcipriani@tin: Synchronized php: Swap symlink for 1.29.0-wmf.17 (duration: 00m 43s)
  • 19:11 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.17
  • 18:51 bblack: systemctl enable+start of lldpd on cp2009, cp1051, cp1061 (mysteriously dead and disabled)
  • 18:16 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Remove exception on Other Projects sidebar for Dutch Wikipedia (T159634) (duration: 00m 47s)
  • 18:04 Pchelolo: update RESTBase to 9d2b393fb - production
  • 17:52 Pchelolo: update RESTBase to 9d2b393fb - staging
  • 16:46 mobrovac@tin: Started restart [parsoid/deploy@0c22f72]: (no justification provided)
  • 16:45 _joe_: reenabling puppet on all jobqueue redises
  • 16:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)
  • 16:22 hashar: Merged operations/puppet.git Jenkins job in a single one that runs tox then rake - T160923
  • 16:10 urandom: T111113: Live-hacking client encryption to be non-optional, to verify cqlsh encryption, restbase1007-a.eqiad.wmnet
  • 16:07 mobrovac: restbase deploy 752ca4b7
  • 15:37 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1082 weight - T137191 (duration: 00m 43s)
  • 15:32 moritzm: upgrading restbase-test* to Linux 4.9
  • 14:59 akosiaris: enabling and running puppet on rdb200X fleet in a rolling restart scheme
  • 14:59 akosiaris: disabled puppet on rdb* fleet
  • 14:56 andrewbogott: dist-upgrading labvirt1001 and rebooting it a few times
  • 14:22 moritzm: installing exim4 updates from jessie point release
  • 14:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1082 with low weight - T137191 (duration: 00m 48s)
  • 13:53 dcausse: cirrus: refreshing comp suggest indices in elastic@eqiad to measure times
  • 12:59 marostegui: Deploy schema change s4 on db1064 https://phabricator.wikimedia.org/T160415 - https://phabricator.wikimedia.org/T73563
  • 12:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1059, depool db1064 T160415 - T73563 (duration: 00m 43s)
  • 12:27 moritzm: installing libxml2 security updates
  • 12:21 marostegui: Deploy schema change s4 on labsdb1003 https://phabricator.wikimedia.org/T160415 - https://phabricator.wikimedia.org/T73563
  • 12:05 jynus: converting es2014 tables back to uncompressed InnoDB T129350
  • 11:08 godog: codfw-prod: bump ms-be2028 ms-be2039 object weight to 2000 T158337
  • 11:01 godog: pool prometheus200[34] / depool prometheus200[12] - T148408
  • 11:01 jynus@tin: Synchronized wmf-config/db-codfw.php: Pool all es2XXX servers, depool es2014 for maintenance (duration: 00m 43s)
  • 10:59 hashar: Actually restarting Jenkins for email plugins upgrades
  • 10:20 hashar: Jenkins jobs got slightly blocked because I forgot to cancel the shutdown when jobs had to run.
  • 09:58 hashar: Jenkins: upgrading plugins email-ext and mailer
  • 09:14 hashar: Jenkins upgrading SSH Slaves plugin. Might cause disruption in CI
  • 08:47 moritzm: repooled mw1261 now that T161095 is deployed
  • 08:29 marostegui: Stop MySQL db1070 for maintenance - T137191
  • 08:06 moritzm: installing audiofile security updates
  • 07:37 marostegui: Deploy schema change s4 on db1059 and labsdb1001 T160415 - T73563
  • 07:36 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1068, depool db1059 T160415 - T73563 (duration: 00m 43s)
  • 07:08 marostegui: Stop MySQL db1082 for maintenance - https://phabricator.wikimedia.org/T137191
  • 06:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1082 - T137191 (duration: 00m 44s)
  • 03:14 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Mar 23 03:14:05 UTC 2017 (duration 5m 47s)
  • 03:08 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.17) (duration: 14m 39s)
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 13m 26s)
  • 01:46 mutante: added ottomata and milimetric to "wmf-deployers" in Gerrit web ui, both have existing (deployment resp. root) shell already (T161157)

2017-03-22

  • 22:59 RainbowSprinkles: gerrit: Quick service restart, picking up new config
  • 21:25 awight: reenabling Jenkins orphan rectifier job
  • 21:18 andrewbogott: rebooting labvirt1001 because it is being terrible. https://phabricator.wikimedia.org/T159835
  • 21:05 demon@tin: Synchronized wmf-config/InitialiseSettings-labs.php: No-op, beta (duration: 00m 47s)
  • 21:00 demon@tin: Synchronized wmf-config/CommonSettings-labs.php: No-op, beta (duration: 00m 43s)
  • 20:52 awight: disabling Ingenico orphan rectifier
  • 20:43 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: Group0 to php-1.29.0-wmf.17
  • 20:05 thcipriani@tin: Synchronized php-1.29.0-wmf.17/extensions/ZeroPortal/includes/ApiZeroPortal.php: Failure to parse json config should result in a usable error T161036 (duration: 00m 42s)
  • 20:04 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Remove redundant whitelist read list for grantswiki (duration: 00m 44s)
  • 19:55 thcipriani@tin: Synchronized php-1.29.0-wmf.17/extensions/Flow: Make sure topiclist queries always join against workflow table T121644 (duration: 00m 59s)
  • 19:45 thcipriani@tin: Synchronized php-1.29.0-wmf.17/includes/Revision.php: Make Revision::getRevisionText() cache the converted text (duration: 00m 44s)
  • 19:44 mutante: rsyncing /srv of netmon1001 to /srv/netmon1001 on gerrit2001 (T125020)
  • 19:37 jynus: deploying m2 dns additions on codfw
  • 19:12 jynus@tin: Synchronized wmf-config/db-eqiad.php: Pool db1094 with full weight (duration: 00m 43s)
  • 17:10 _joe_: restarted ocg on ocg1001, not serving http queries
  • 16:55 jynus: shutting down es2016's mariadb to clone to es2015
  • 15:41 hashar@tin: Synchronized php-1.29.0-wmf.16/languages/classes/LanguageKk.php: Check for string initialization in ucfirst() to make HHVM 3.18 happy - T161095 (duration: 00m 44s)
  • 15:40 hashar@tin: Synchronized php-1.29.0-wmf.16/languages/classes/LanguageAz.php: Check for string initialization in ucfirst() to make HHVM 3.18 happy - T161095 (duration: 00m 48s)
  • 15:36 hashar: Deploying LanguageAz.php and LanguageKk.php hotfix for HHVM 3.18 on mwdebug* and mw1261 - T161095
  • 15:34 hashar@tin: Synchronized php-1.29.0-wmf.17/languages/classes/LanguageKk.php: Check for string initialization in ucfirst() to make HHVM 3.18 happy - T161095 (duration: 00m 54s)
  • 15:33 hashar@tin: Synchronized php-1.29.0-wmf.17/languages/classes/LanguageAz.php: Check for string initialization in ucfirst() to make HHVM 3.18 happy - T161095 (duration: 00m 59s)
  • 15:25 ema: cp*: removed linux-image-amd64, linux-image-3.16.0-4-amd64 and linux-image-4.4.0-1-amd64 to reduce churn
  • 14:54 moritzm: rebooting elastic2001 to Linux 4.9
  • 14:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1087 original weight - T137191 (duration: 00m 44s)
  • 14:24 marostegui: Deploy schema change s4 to db1068 - https://phabricator.wikimedia.org/T160415 https://phabricator.wikimedia.org/T73563
  • 14:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1081, depool db1068 T160415 - T73563 (duration: 00m 43s)
  • 13:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1087 weight - T137191 (duration: 00m 47s)
  • 13:39 volans: stopped ircecho to avoid the message spam
  • 13:15 dcausse: eu swat done
  • 13:09 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable the completion suggester (duration: 00m 43s)
  • 13:07 bblack@puppetmaster1001: conftool action : set/ttl=275; selector: dnsdisc=appservers-rw
  • 12:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Enable db1087 for API - T137191 (duration: 00m 42s)
  • 12:29 dcausse: cirrus: reindexing lost writes (2017-03-21T13:30:00Z to 2017-03-21T17:50:00Z) during es5 upgrade in elastic@eqiad (T157479)
  • 12:26 marostegui: Deploy schema change on s4 to db1081 and labsdb1011 - T160415 T73563
  • 12:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1084, depool db1081 T160415 - T73563 (duration: 00m 43s)
  • 12:20 gehel: maps restarting kartotherian - T150354
  • 12:18 gehel: installing latest mapnik version on maps servers
  • 12:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1087 with low weight - T137191 (duration: 00m 43s)
  • 12:09 gehel: maps upgrade to nodejs 6 completed - T150354
  • 12:09 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 12:05 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1004.eqiad.wmnet
  • 12:04 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1003.eqiad.wmnet
  • 12:02 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1003.eqiad.wmnet
  • 12:01 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1002.eqiad.wmnet
  • 11:58 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1002.eqiad.wmnet
  • 11:57 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1001.eqiad.wmnet
  • 11:54 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1001.eqiad.wmnet
  • 11:53 gehel: maps codfw fully upgraded to nodejs 6, starting upgrade on maps eqiad - T150354
  • 11:51 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2003.codfw.wmnet
  • 11:51 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2004.codfw.wmnet
  • 11:46 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2004.codfw.wmnet
  • 11:41 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2002.codfw.wmnet
  • 11:34 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2002.codfw.wmnet
  • 11:33 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2001.codfw.wmnet
  • 11:27 gehel: maps2001.codfw.wmnet upgraded to nodejs6
  • 11:19 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2001.codfw.wmnet
  • 11:15 akosiaris: Enable IGMP snooping for private1-d-eqiad on asw2-d. T133387
  • 11:15 akosiaris: Enable IGMP snooping for private1-d-eqiad. T133387
  • 11:05 gehel: disabling puppet on all maps servers - T150354
  • 11:04 gehel: upgrade maps to nodejs 6 - T150354
  • 10:53 akosiaris: cr1-eqiad: set ae4 and members to enable again. T133387
  • 10:41 akosiaris: reboot asw2-d T133387
  • 10:31 dcausse: cirrus: rebuilding comp suggest indices in elastic@eqiad
  • 10:15 akosiaris: Upgrading asw2-d-eqiad to JunOS 14.1X53 (T133387)
  • 10:09 akosiaris: cr1-eqiad: set ae4 and members to disable. T133387
  • 09:55 moritzm: upgrading mw1261 to HHVM 3.18.1
  • 09:50 moritzm: upgrading mwdebug* to HHVM 3.18.1
  • 09:40 marostegui: Deploy alter table s4 (commonswiki) db1084 - T73563 T160415
  • 09:39 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1084 T160415 - T73563 (duration: 00m 43s)
  • 09:29 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091 T160415 - T73563 (duration: 00m 43s)
  • 09:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2002.codfw.wmnet
  • 08:46 marostegui: Stop MySQL db1070 to clone db1087 from it - T137191
  • 07:53 dcausse: rebuilding ttmserver index in elastic@eqiad to catch up on lost writes during es5 upgrade
  • 07:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 T160415 - T73563 (duration: 00m 43s)
  • 07:10 oblivian@puppetmaster1001: conftool action : set/ttl=300; selector: dnsdisc=.*
  • 07:05 marostegui: Stop MySQL db1087 - T137191
  • 06:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1087 - T137191 (duration: 00m 43s)
  • 06:43 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2037 T160415 - T73563 (duration: 00m 43s)
  • 06:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1092 weight - T137191 (duration: 00m 49s)
  • 05:44 _joe_: finished tests on citoid/dns discovery; restbase successfully detects the change
  • 05:18 _joe_: depooling temporarily citoid in eqiad from dns discovery
  • 02:40 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Mar 22 02:40:42 UTC 2017 (duration 5m 29s)
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 13m 20s)
  • 02:02 krinkle@tin: Synchronized errorpages/: minor tweaks - I60344bd519d (duration: 00m 54s)
  • 00:15 Dereckson: SWAT done.
  • 00:15 dereckson@tin: Synchronized php-1.29.0-wmf.16/extensions/CirrusSearch/: CompSuggest: Increase default limit from 50 to 255 + speed optimization (Gerrit:343962 + Gerrit:343966) (duration: 00m 55s)
  • 00:05 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Allow translationadmin self-add for beta.wikiversity admins (T160120) (duration: 00m 43s)
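
The maps nodejs 6 upgrade logged above (11:04 to 12:09) follows a one-host-at-a-time pattern: set pooled=no, upgrade, set pooled=yes, then move on to the next host. A minimal sketch of that loop follows. It does not call conftool; `set_pooled` and `upgrade` are hypothetical callbacks supplied by the caller, and stopping on the first failure is an assumption rather than something the log states.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    hosts: Iterable[str],
    set_pooled: Callable[[str, str], None],
    upgrade: Callable[[str], None],
) -> List[str]:
    """Upgrade hosts one at a time, depooling each while it is worked on.

    set_pooled(host, "yes"|"no") and upgrade(host) are hypothetical
    stand-ins for whatever tooling actually changes pool state and
    installs packages. If upgrade() raises, the host stays depooled and
    the roll-out stops (an assumption made for this sketch).
    """
    completed: List[str] = []
    for host in hosts:
        set_pooled(host, "no")   # take the host out of rotation
        upgrade(host)            # raises on failure, halting the loop
        set_pooled(host, "yes")  # put it back only after a clean upgrade
        completed.append(host)
    return completed
```

Because only one host is depooled at any moment, the remaining hosts keep serving traffic for the duration of the upgrade.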

2017-03-21

  • 23:57 eileen: update civicrm from 92e3b85 to d3c439f
  • 23:44 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Mapframe on sv.wikipedia (T161032) (duration: 00m 43s)
  • 23:29 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Translate on beta.wikiversity (T160120) (duration: 00m 45s)
  • 23:19 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Make rcenhancedfilters available as beta feature, enable on test wikis (Gerrit:343435 + Gerrit:343436) (duration: 00m 51s)
  • 22:45 mutante: lists: deactivate arbcom-ko per T160892 and Google translation of Korean talk pages
  • 22:44 Dereckson: Run namespaceDupes on pnbwiki (T159976)
  • 22:28 Dereckson: Create Translate tables on betawikiversity (T160120)
  • 20:59 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: Revert Group0 to 1.29.0-wmf.17
  • 20:56 demon@tin: Synchronized wmf-config/InitialiseSettings.php: logging for bad header stuff (duration: 00m 52s)
  • 20:33 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T161001 Turn off completion suggester until length error is fixed (duration: 00m 44s)
  • 20:29 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: Group0 to 1.29.0-wmf.17
  • 19:54 thcipriani@tin: Finished scap: testwiki to php-1.29.0-wmf.17 and rebuild l10n cache (duration: 51m 16s)
  • 19:35 mutante: phab2001 - same as iridium, phab search config change
  • 19:33 mutante: iridium - ran puppet after gerrit:343936 - phabricator config change to use cluster search applied
  • 19:22 chasemp: clean out admin-monitoring for nova-fullstack T160908
  • 19:10 mutante: ruthenium - dev API enabled in parsoid config for parsoid rt tests
  • 19:03 thcipriani@tin: Started scap: testwiki to php-1.29.0-wmf.17 and rebuild l10n cache
  • 18:18 jynus@tin: Synchronized wmf-config/db-eqiad.php: Increase weight of db1092 and db1094 (duration: 00m 42s)
  • 18:02 twentyafterfour: refreshing phabricator's elasticsearch index in eqiad
  • 17:56 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: [es5 upgrade] step 4: repool eqiad for writes (3/3) (duration: 00m 42s)
  • 17:54 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [es5 upgrade] step 4: repool eqiad for writes (2/3) (duration: 00m 42s)
  • 17:53 dcausse@tin: Synchronized wmf-config/CommonSettings.php: [es5 upgrade] step 4: repool eqiad for writes (1/3) (duration: 00m 42s)
  • 17:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 42s)
  • 17:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1092 weight - T137191 (duration: 00m 45s)
  • 17:13 thcipriani: starting branch cut for 1.29.0-wmf.17
  • 17:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1092 with low weight - T137191 (duration: 00m 42s)
  • 17:02 urandom: T111113: Rolling restart of RESTBase, eqiad, complete
  • 16:52 urandom: T111113: Rolling restart of RESTBase, eqiad
  • 16:41 urandom: T111113: Rolling restart of RESTBase, codfw, complete
  • 16:17 urandom: T111113: Enabling RESTBase client encryption on (remaining) codfw nodes
  • 16:11 urandom: T111113: Enabling RESTBase client encryption on restbase2001.codfw.wmnet (canary)
  • 15:56 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic2020.codfw.wmnet
  • 15:27 moritzm: removed "Directory Managers" group from LDAP (Bug T157131)
  • 15:01 bd808@tin: Synchronized php-1.29.0-wmf.16/extensions/OpenStackManager/special/SpecialNovaInstance.php: SpecialNovaInstance: Remove some totally useless domain code. (T160995) (duration: 00m 43s)
  • 14:58 gehel: elasticsearch upgrade on eqiad is completed - T157479
  • 14:50 moritzm: installing gnutls security updates on trusty (jessie already fixed)
  • 14:44 gehel: elasticsearch eqiad, full cluster restart after cleanup of known old indices - T157479
  • 14:39 gehel: deleting old v2 indices from each elasticsearch server - T157479
  • 14:34 gehel: deleting old v2 indices from elastic1030: azbwiki_general_first, vewikimedia_content_1415331110, vewikimedia_general_1415331150 - T157479
  • 14:07 gehel: upgrading elasticsearch eqiad to v5.x - T157479
  • 14:01 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2044, depool db2037 T160415 - T73563 (duration: 00m 42s)
  • 13:44 dcausse: eu SWAT done
  • 13:39 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [es5 upgrade] Enable completion suggester (duration: 00m 42s)
  • 13:30 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: [es5 upgrade] step 3: depool eqiad for writes (take 2) (3/3) (duration: 00m 41s)
  • 13:29 gehel: rolling restart of wdqs to load new configuration options
  • 13:29 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [es5 upgrade] step 3: depool eqiad for writes (take 2) (2/3) (duration: 00m 43s)
  • 13:27 dcausse@tin: Synchronized wmf-config/CommonSettings.php: [es5 upgrade] step 3: depool eqiad for writes (take 2) (1/3) (duration: 00m 42s)
  • 13:15 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: T157111 pagePreviews: Increase perf instrumentation sample (duration: 00m 58s)
  • 13:14 Reedy: Make that clear 2FA for RickinBaltimore per T160671
  • 13:12 Reedy: Clear centralauth for RickinBaltimore per T160671
  • 12:54 moritzm: installing r-base security updates
  • 12:47 gehel: running stress and bonnie on elastic2020 - T149006
  • 12:34 Dereckson: Created OATHAuth tables on projectcomwiki (T143138)
  • 12:27 Dereckson: Create account Superzerocool on projectcomwiki (bureaucrat, T143138)
  • 11:00 ema: upgrading twisted to 16.2.0 on lvs1007-12 T160433
  • 10:33 marostegui: Run pt-table-checksum on s6 (ruwiki) - https://phabricator.wikimedia.org/T160509
  • 09:42 moritzm: installing libevent security updates on remaining hosts in eqiad
  • 09:42 marostegui: Stop MySQL db1070 to clone db1092 from it - T137191
  • 09:14 akosiaris: enable bacula daemons on helium, everything looks ok
  • 09:09 moritzm: installing wireshark security updates
  • 09:06 hashar: CI deploying config hack "High priority test pipeline": https://gerrit.wikimedia.org/r/343318 - T160667
  • 08:43 gehel: shutting down elasticsearch on elastic2020, investigating T149006
  • 07:50 gehel: banning elastic2020 from cluster to investigate T149006
  • 07:36 marostegui: Stop mysql db1092 for maintenance - T137191
  • 07:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1092 - T137191 (duration: 00m 42s)
  • 07:22 marostegui: Run pt-table-checksum on s6 (jawiki) - T160509
  • 07:18 marostegui: Deploy schema change on db2044 and labsdb1009 (s4) - https://phabricator.wikimedia.org/T160415 - https://phabricator.wikimedia.org/T73563
  • 07:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2044 - T160415 - T73563 (duration: 00m 41s)
  • 06:49 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2051 - T160415 - T73563 (duration: 01m 07s)
  • 06:01 joal@tin: Finished deploy [analytics/refinery@c3a9139]: (no justification provided) (duration: 06m 39s)
  • 05:55 joal@tin: Started deploy [analytics/refinery@c3a9139]: (no justification provided)
  • 03:38 eileen: update civicrm from 21afe66 to 92e3b85
  • 03:10 eileen: update civicrm from 0ed1659 to 21afe66
  • 02:37 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Mar 21 02:37:33 UTC 2017 (duration 5m 22s)
  • 02:32 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 13m 22s)
  • 01:59 eileen: update civicrm from f454f16 to 0ed1659
  • 00:37 Amir1: ladsgroup@terbium:/srv/mediawiki/php-1.29.0-wmf.16$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=etwiki is done now (T159609)
  • 00:30 Amir1: ladsgroup@terbium:/srv/mediawiki/php-1.29.0-wmf.16$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=etwiki (T159609)
  • 00:28 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable ORES review tool in etwiki (T159609) (duration: 00m 42s)
  • 00:13 Amir1: mwscript maintenance/sql.php --wiki=etwiki extensions/ORES/sql/(ores_model|ores_classification).sql (T159609)
  • 00:04 Krinkle: mwscript deleteEqualMessages.php on public wikis (T45917)
  • 00:02 eileen: update civicrm from e058e8c to f454f16

2017-03-20

  • 23:59 mutante: phab2001 / iridium - running puppet after gerrit:343635 - switches phab search to codfw
  • 23:58 dereckson@tin: Synchronized php-1.29.0-wmf.16/extensions/CirrusSearch/includes/CompletionSuggester.php: Don't pass null suggest queries to elasticsearch (T160896) (duration: 00m 42s)
  • 23:54 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Restrict page images to lead section (T152115) (duration: 00m 43s)
  • 23:48 dereckson@tin: Synchronized php-1.29.0-wmf.16/extensions/CirrusSearch/includes/BuildDocument/Completion/SuggestBuilder.php: Gerrit:343754 Allow completion suggester to work with titles that look like integers (duration: 00m 45s)
  • 23:47 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on fr. en. it. la. no.wp + en.wikt (duration: 00m 46s)
  • 23:39 dereckson@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Gerrit:343781 Test ORES migration on ruwiki beta too (labs only, no-op in prod) (duration: 00m 42s)
  • 22:57 mutante: ruthenium: running puppet after gerrit:343782 added missing diffserver unit file. puppet run looked good: Visualdiff::Server[diffserver]/Service[diffserver]/ensure: ensure changed 'stopped' to 'running', systemctl status says failed though
  • 22:54 ppchelko@tin: Finished deploy [trending-edits/deploy@e4fa9b8]: Config: Set up 'trends_at' property T160127 (duration: 06m 20s)
  • 22:47 ppchelko@tin: Started deploy [trending-edits/deploy@e4fa9b8]: Config: Set up 'trends_at' property T160127
  • 22:45 ejegg: updated payments-wiki from f991f15 to 9622a4b
  • 22:38 ppchelko@tin: Finished deploy [trending-edits/deploy@5d3eb7f]: Do not purge articles that have trended T160127 (duration: 07m 57s)
  • 22:31 mutante: ruthenium - gerrit:343682 applied - puppet: OK nginx: OK diffserver service refresh: failed @ssastry
  • 22:30 ppchelko@tin: Started deploy [trending-edits/deploy@5d3eb7f]: Do not purge articles that have trended T160127
  • 20:52 mutante: DNS - new Wikipedias "khw" (Khowar) and "kbp" (Kabiye) created (T160868) (T160865) ( on ns0/ns1: authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to trigger template recreation after edit to langs.tmpl)
  • 20:47 mutante: DNS - ns2 - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to create new WP languages 'khw' and 'kbp'
  • 20:19 bsitzmann@tin: Finished deploy [mobileapps/deploy@815ebb5]: Update mobileapps to c0ab01d (duration: 07m 31s)
  • 20:14 reedy@tin: Synchronized php-1.29.0-wmf.16/includes/api/ApiQueryAllPages.php: Limit query=allpages filterredir if MiserMode T160916 (duration: 00m 42s)
  • 20:12 reedy@tin: Synchronized php-1.29.0-wmf.16/includes/specials/SpecialAllPages.php: Re-enable Special:AllPages, disable redirect filter if MiserMode T160916 (duration: 00m 42s)
  • 20:12 bsitzmann@tin: Started deploy [mobileapps/deploy@815ebb5]: Update mobileapps to c0ab01d
  • 19:45 mutante: lists: disabled wikimediaro-l due to inactivity (disabling lists is easy nowadays and also revertable): fermium: sudo /usr/local/sbin/disable_list <list name> | (T146563)
  • 19:42 mobrovac@tin: Finished deploy [changeprop/deploy@decb6a1]: (no justification provided) (duration: 00m 56s)
  • 19:41 mobrovac@tin: Started deploy [changeprop/deploy@decb6a1]: (no justification provided)
  • 18:39 thcipriani@tin: Synchronized wmf-config: SWAT: Revert "Revert "Turn off patrolling for FlaggedRevs in bswiki"" T158662 (duration: 00m 44s)
  • 18:28 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: pagePreviews: Enable by default on "stage 0" wikis T136602 (duration: 00m 42s)
  • 18:18 ariel@tin: Finished deploy [dumps/dumps@91d3215]: more default config fixes, flagged rev table config fix (duration: 00m 02s)
  • 18:18 ariel@tin: Started deploy [dumps/dumps@91d3215]: more default config fixes, flagged rev table config fix
  • 17:35 akosiaris: slow rolling restart of redis databases in codfw T159850
  • 17:22 ariel@tin: Finished deploy [dumps/dumps@80d88cd]: fix buglet due to new default config file (duration: 00m 02s)
  • 17:22 ariel@tin: Started deploy [dumps/dumps@80d88cd]: fix buglet due to new default config file
  • 17:09 gehel@tin: Finished deploy [wdqs/wdqs@e9e7c95]: (no justification provided) (duration: 01m 41s)
  • 17:07 gehel@tin: Started deploy [wdqs/wdqs@e9e7c95]: (no justification provided)
  • 16:48 mobrovac: restbase deploying e4c327b0
  • 15:59 hashar: Special:AllPages being blank has a public task: https://phabricator.wikimedia.org/T160916
  • 15:50 dcausse@tin: Synchronized wmf-config/CommonSettings.php: Revert: T157479 [es5 upgrade] step 3: depool eqiad for writes (1/3) (duration: 00m 42s)
  • 15:49 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: Revert: T157479 [es5 upgrade] step 3: depool eqiad for writes (2/3) (duration: 00m 42s)
  • 15:40 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: Revert: T157479 [es5 upgrade] step 3: depool eqiad for writes (3/3) (duration: 00m 42s)
  • 15:39 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: T157479 [es5 upgrade] step 3: depool eqiad for writes (3/3) (duration: 00m 41s)
  • 15:37 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: T157479 [es5 upgrade] step 3: depool eqiad for writes (2/3) (duration: 00m 46s)
  • 15:34 dcausse@tin: Synchronized wmf-config/CommonSettings.php: T157479 [es5 upgrade] step 3: depool eqiad for writes (1/3) (duration: 00m 45s)
  • 15:15 hashar@tin: Synchronized php-1.29.0-wmf.16/extensions/Translate: ElasticTTM: set the index when deleting docs (duration: 00m 53s)
  • 15:08 hashar@tin: Synchronized php-1.29.0-wmf.16/includes/specials/SpecialWatchlist.php: Restoring Watchlist: Fix form and preference overriding https://gerrit.wikimedia.org/r/#/c/343433/ (duration: 00m 51s)
  • 14:36 hashar@tin: Synchronized php-1.29.0-wmf.16/includes/specials/SpecialAllPages.php: Disable SpecialAllPages on all wikis. Temporary workaround (duration: 01m 08s)
  • 14:36 hashar: Disabled Special:AllPages on all wikis; it now shows a blank page instead. ( https://gerrit.wikimedia.org/r/#/c/343647/ )
  • 14:32 akosiaris: disable puppet on all rdb* nodes to shepherd https://gerrit.wikimedia.org/r/343027 into production. T159850
  • 14:28 elukey: (Correct one) Temporary hack for T160888 - moved /srv/mw-log/archive/api.log-20170224.gz to /srv/mw-log/archive/api_log_backup_elukey/ to avoid rsync timeouts to stat1002 (the file is big and close to being deleted for retention)
  • 14:27 elukey: Temporary hack for T160886 - moved /srv/mw-log/archive/api.log-20170224.gz to /srv/mw-log/archive/api_log_backup_elukey/ to avoid rsync timeouts to stat1002 (the file is big and close to being deleted for retention)
  • 14:22 hashar@tin: Synchronized php-1.29.0-wmf.16/includes/specials/SpecialWatchlist.php: reverts commit SpecialWatchlist.php 0d675d2 (duration: 00m 43s)
  • 13:55 jynus: shutting down es2015 for maintenance T160242
  • 13:41 zfilipin@tin: Synchronized wmf-config/: SWAT: Enable CollaborationKit on beta enwiki (T138325) (duration: 00m 44s)
  • 13:35 zfilipin@tin: Synchronized php-1.29.0-wmf.16/tests/phpunit/includes/specials/SpecialWatchlistTest.php: SWAT: Watchlist: Fix form and preference overriding (T160734) (duration: 00m 48s)
  • 13:34 zfilipin@tin: Synchronized php-1.29.0-wmf.16/includes/specials/SpecialWatchlist.php: SWAT: Watchlist: Fix form and preference overriding (T160734) (duration: 01m 01s)
  • 11:39 akosiaris: return rdb1007 client-output-buffer-limit config to initially configured value T159850
  • 10:09 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 after crash (duration: 00m 47s)
  • 09:42 godog: swift bump ms-be2028 -> ms-be2039 weight - T158337
  • 09:37 jynus: restarting db1094 for upgrade
  • 09:02 dcausse: refreshing ttm documents in elastic@codfw
  • 08:47 hashar: Jenkins: depooling / deleting Precise instances. T158652
  • 08:28 dcausse: cirrus: refreshing all comp suggest indices in elastic@codfw
  • 02:23 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Mar 20 02:23:38 UTC 2017 (duration 5m 25s)
  • 02:18 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 06m 44s)

2017-03-19

  • 18:40 ariel@tin: Finished deploy [dumps/dumps@8cff500]: generate json status files for use by downloaders (duration: 00m 02s)
  • 18:39 ariel@tin: Started deploy [dumps/dumps@8cff500]: generate json status files for use by downloaders
  • 10:43 ariel@tin: Finished deploy [dumps/dumps@87d748b]: dump magic words and namespace info (duration: 00m 02s)
  • 10:43 ariel@tin: Started deploy [dumps/dumps@87d748b]: dump magic words and namespace info
  • 02:24 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Mar 19 02:24:53 UTC 2017 (duration 5m 23s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 07m 29s)

2017-03-18

  • 20:16 chasemp: labstore1005 service nfs-exportd restart
  • 19:43 chasemp: test on labstore1004 nfs-exportd candidate /root/nfs-exportd-candidate.py --observer-pass xxxxxx --interval 0 --config-path /etc/nfs-mounts.yaml --exports-d-path /root/fake_export/ --debug
  • 18:38 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 after crash (duration: 01m 02s)
  • 18:20 jynus: powercycling db1094
  • 02:36 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Mar 18 02:36:00 UTC 2017 (duration 5m 23s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 11m 29s)
  • 00:06 mutante: lists: creating new list wikimedia-nys (Noongar language) (T159499)

2017-03-17

  • 23:48 mutante: lists: creating new list wikispecies-admin (T159625)
  • 23:36 catrope@tin: Synchronized php-1.29.0-wmf.16/extensions/VisualEditor/lib/ve: Fixes for T154123 T160479 T160190 T160197 (duration: 00m 42s)
  • 23:31 mutante: lists: making Steinsplitter and Zhuyifei1999 list admins of commons-poty (T160672)
  • 16:16 elukey: reimage restbase-dev1001.eqiad.wmnet
  • 14:01 marostegui: Deploy schema change on dbstore1001 and db2051 (s4) - T160415 - T73563
  • 14:00 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2051 - T160415 - T73563 (duration: 00m 42s)
  • 13:39 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2058 - T160415 - T73563 (duration: 01m 06s)
  • 12:46 chasemp: labsdb10[01|03] maintain-views --table user_groups --all-database --replace-all --debug
  • 12:44 chasemp: labsdb10[09|10|11] maintain-views --table user_groups --all-database --replace-all --debug
  • 11:33 elukey: reimage analytics1044 (Hadoop Worker node) to Debian Jessie
  • 10:58 akosiaris: reimage helium.eqiad.wmnet to jessie
  • 09:04 jynus: killing 11h-running query on db1089 from terbium (orphan process)
  • 08:32 marostegui: Deploy schema change on dbstore2002 and db2058 (s4) - T160415 T73563
  • 08:31 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2058 - T160415 - T73563 (duration: 00m 43s)
  • 08:00 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2065 - T160415 - T73563 (duration: 00m 44s)
  • 07:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase weight for db1070 - T157931 (duration: 00m 45s)
  • 02:39 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Mar 17 02:39:10 UTC 2017 (duration 5m 22s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 12m 12s)
  • 01:54 urandom: T111113: Rolling restarts of Cassandra complete
  • 01:12 urandom: T111113: Rolling restarts of Cassandra, eqiad, rack 'd'
  • 00:41 ebernhardson@tin: Synchronized php-1.29.0-wmf.16/resources/src/mediawiki.special/: SWAT: Fix search result percentage width when no interwiki sidebar shown (duration: 00m 42s)
  • 00:40 ebernhardson@tin: Synchronized php-1.29.0-wmf.16/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: enabled sister search AB test on 8 wikis (duration: 00m 43s)
  • 00:34 urandom: T111113: Rolling restarts of Cassandra, eqiad, rack 'b'
  • 00:23 urandom: T111113: Rolling restarts of Cassandra on restbase1016
  • 00:13 urandom: T111113: Rolling restarts of Cassandra on restbase1011
  • 00:03 urandom: T111113: Rolling restarts of Cassandra on restbase1010

2017-03-16

  • 23:46 reedy@tin: Synchronized php-1.29.0-wmf.16/extensions/CodeReview: Fix preg_ error again (duration: 00m 47s)
  • 23:25 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Enable PageViewInfo to group2 T125917 (duration: 00m 49s)
  • 23:24 urandom: T111113: Rolling restarts of Cassandra in codfw, rack 'd' *correction*
  • 23:24 urandom: T111113: Rolling restarts of Cassandra in codfw, rack 'b'
  • 22:34 urandom: T111113: Rolling restarts of Cassandra in codfw, rack 'a'
  • 21:50 urandom: T111113: Rolling restarts of Cassandra in codfw, rack 'b'
  • 21:36 urandom: T111113: Restarting Cassandra on restbase1007-{b,c} to enable (optional) client encryption
  • 21:19 urandom: T111113: Restarting Cassandra on restbase1007-a to enable (optional) client encryption
  • 21:17 ebernhardson: reindexing group2 in cirrussearch for codfw downtime during 2.x -> 5.x upgrade
  • 21:06 ejegg: updated new CiviCRM from cca5921 to e058e8c
  • 20:08 mutante: repooled elastic2010, depooled correct host elastic2020 instead (T149006)
  • 20:08 dzahn@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic2020.codfw.wmnet
  • 20:08 dzahn@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2010.codfw.wmnet
  • 20:06 mutante: depooled elastic2010 since it is powered-off/down. (set/pooled=inactive) - (T149006)
  • 20:05 dzahn@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic2010.codfw.wmnet
  • 20:05 twentyafterfour: restarted phd on iridium to fix workers dying
  • 19:26 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.16
  • 19:07 thcipriani@tin: Synchronized php-1.29.0-wmf.16/extensions/VisualEditor/lib/ve: SWAT: Update VE core submodule to wmf/1.29.0-wmf.16 HEAD (50a6323d7) T154123 T160479 (duration: 00m 44s)
  • 19:02 gehel: restart relforge to activate new plugins - T160674
  • 16:57 ebernhardson: started cirrus completion indices rebuild for group2 on wasat.codfw.wmnet
  • 16:48 ebernhardson: manually adjusted wikiversions on wasat.codfw.wmnet to point all wikis at wmf.16 to rebuild cirrus completion search indices before group2 rolls forward
  • 16:44 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase weight for db1070 - T157931 (duration: 00m 41s)
  • 16:32 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2065 - T160415 - T73563 (duration: 00m 42s)
  • 16:01 marostegui: Deploy schema change on s4 (commonswiki) https://phabricator.wikimedia.org/T73563 and https://phabricator.wikimedia.org/T160415
  • 16:00 elukey: racadm serveraction powerdown on mw2256 for hw maintenance
  • 15:53 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1092 - T160415 (duration: 00m 42s)
  • 15:44 godog: reboot ms-be1008 after disk swap to clear stuck mkfs.xfs
  • 15:44 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1092 - T160415 (duration: 00m 42s)
  • 15:27 otto@tin: Finished deploy [eventlogging/eventbus@75ab39c]: /v1/schemas/:schema_uri endpoint, T159179 (duration: 00m 14s)
  • 15:27 otto@tin: Started deploy [eventlogging/eventbus@75ab39c]: /v1/schemas/:schema_uri endpoint, T159179
  • 15:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1087 - T160415 (duration: 00m 42s)
  • 15:13 elukey: restart hhvm on mw1200, high load and queued requests - hhvm-dump-debug on /tmp/hhvm.27107.bt.
  • 15:09 elukey: restart hhvm on mw1207, high load and queued requests - hhvm-dump-debug on /tmp/hhvm.27441.bt.
  • 15:00 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1087 - T160415 (duration: 00m 42s)
  • 14:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1082 - T160415 (duration: 00m 41s)
  • 14:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1082 - T160415 (duration: 00m 42s)
  • 14:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 with low weight - T157931 (duration: 00m 45s)
  • 14:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 - T160415 (duration: 00m 43s)
  • 14:12 Dereckson: EU SWAT, round 2, done
  • 14:11 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Create Wikichanzo namespace for swwiki (T158041) (duration: 00m 42s)
  • 14:07 dereckson@tin: Synchronized wmf-config/throttle.php: Add Odia Wikipedia's 100 Women Editathon throttle rule (T160619) (duration: 00m 57s)
  • 13:52 Dereckson: Resume EU SWAT for two new changes
  • 13:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1045 - T160415 (duration: 00m 58s)
  • 13:38 marostegui: Shutdown es2015 for maintenance - T160242
  • 13:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 - T160415 (duration: 00m 42s)
  • 13:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T160415, Repool db1067 - T160435 (duration: 00m 42s)
  • 13:01 addshore: EU SWAT done
  • 12:59 addshore@tin: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule T160427 (lift of IP cap for RIT - March 25, 2017) (duration: 00m 43s)
  • 12:49 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: wmgUseInterwikiSorting true for wikidataclients T160465 T150183 (duration: 00m 42s)
  • 12:39 marostegui: Deploy schema change on s5 - T160415
  • 12:38 addshore@tin: Synchronized php-1.29.0-wmf.16/extensions/InterwikiSorting: Use ExtensionFunctions instead of BeforeInitialize hook T160465 (duration: 00m 44s)
  • 12:31 addshore@tin: Synchronized php-1.29.0-wmf.16/extensions/InterwikiSorting: Use ExtensionFunctions instead of BeforeInitialize hook T160465 (duration: 00m 43s)
  • 12:17 addshore@tin: Synchronized php-1.29.0-wmf.15/extensions/InterwikiSorting: Use ExtensionFunctions instead of BeforeInitialize hook T160465 (duration: 00m 43s)
  • 11:48 godog: repair prometheus' leveldb database archived_fingerprint_to_metric on bast3002, upgrade prometheus to latest version from jessie-backports
  • 11:26 moritzm: enabled BBR as TCP congestion control algorithm on cp1008
  • 11:04 joal@tin: Finished deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) (duration: 03m 30s)
  • 11:01 joal@tin: Started deploy [analytics/aqs/deploy@006bf8c]: (no justification provided)
  • 10:59 joal@tin: Finished deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) (duration: 02m 13s)
  • 10:56 joal@tin: Started deploy [analytics/aqs/deploy@006bf8c]: (no justification provided)
  • 10:12 volans: upgraded cumin to version 0.0.2 in the repository and on neodymium/sarin
  • 10:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 - T160415 (duration: 00m 41s)
  • 09:56 moritzm: installing libevent security updates
  • 09:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 - T160415 (duration: 00m 42s)
  • 09:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 - T160415 (duration: 00m 42s)
  • 09:46 moritzm: upgrading apache on cobalt/gerrit
  • 09:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 - T160415 (duration: 00m 47s)
  • 09:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 - T160415 (duration: 00m 42s)
  • 09:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1066 - T160415 (duration: 00m 42s)
  • 09:11 moritzm: upgrading apache on fermium/lists.wikimedia.org
  • 09:10 moritzm: upgrading apache on mendelevium/OTRS
  • 09:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072 - T160415 (duration: 00m 42s)
  • 09:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1072 - T160415 (duration: 00m 41s)
  • 08:57 godog: codfw-prod: add ms-be203[1-9] - T158337
  • 08:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073 - T160415 (duration: 00m 41s)
  • 08:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1073 - T160415 (duration: 00m 41s)
  • 08:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T160415 (duration: 00m 43s)
  • 08:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T160415 (duration: 00m 41s)
  • 08:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080 - T160415 (duration: 00m 46s)
  • 08:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1080 - T160415 (duration: 00m 47s)
  • 08:12 moritzm: upgrading apache on einsteinium/icinga.wikimedia.org
  • 07:51 marostegui: Deploy schema change on s1 - T160415
  • 07:36 marostegui: Deploy schema change on s7 - T160415
  • 07:08 marostegui: Starting pt-table-checksum on s6 (frwiki) - T160509
  • 03:01 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Mar 16 03:01:37 UTC 2017 (duration 5m 50s)
  • 02:55 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.16) (duration: 13m 39s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 08m 51s)
  • 00:14 eileen: updated civicrm from f1a3d64 to cca5921

2017-03-15

  • 23:36 twentyafterfour: train unblocked and wmf.16 is deployed to group1 wikis.
  • 23:32 twentyafterfour@tin: Synchronized php-1.29.0-wmf.16/extensions/ApiFeatureUsage/ApiFeatureUsageQueryEngineElastica.php: deploy I2d8603 refs T160578 T158997 (duration: 00m 42s)
  • 23:25 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Restrict page images to lead section on cawiki T152115 (duration: 00m 42s)
  • 23:17 thcipriani@tin: Synchronized wmf-config: SWAT: Set $wgOresExtension for I63b11eff3a4 T159763 (duration: 00m 44s)
  • 23:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy PageViewInfo to group1 T125917 (duration: 00m 43s)
  • 22:51 twentyafterfour@tin: Synchronized wmf-config/CirrusSearch-common.php: Deploy I4980da refs T160569 and T158997 (duration: 00m 42s)
  • 22:34 mutante: Cassandra test hosts: deploy break-fix gerrit:342912 , run puppet on cerium and praseodymium. on xenon puppet is disabled.
  • 21:54 twentyafterfour@tin: Synchronized wmf-config/CirrusSearch-common.php: Deploy I67d712 refs T160569 and T158997 (duration: 00m 42s)
  • 21:52 eileen: civicrm update from 639eb68 to f1a3d64
  • 21:29 twentyafterfour@tin: Synchronized wmf-config: deploy Iad9849 to fix T160569 and unblock the train refs T158997 (duration: 00m 49s)
  • 21:01 twentyafterfour@tin: Synchronized wmf-config: deploy I489c4a to fix T160569 and unblock the train refs T158997 (duration: 00m 45s)
  • 20:54 ladsgroup@tin: Finished deploy [ores/deploy@bc0bc74]: Mid-March deploy of ORES (T160279) (duration: 26m 46s)
  • 20:44 gehel: restarting postgresql on maps clusters - T160209
  • 20:38 urandom: T111113: Restarting xenon (RESTBase Staging) to enable client encryption (canary)
  • 20:34 jynus@tin: Synchronized wmf-config/db-eqiad.php: Move db1067 from s2 to s1 as a db1057 replacement (duration: 00m 42s)
  • 20:30 twentyafterfour: T160569 blocks the train until I can figure out what is causing it. The frequency is low so I haven't reverted to wmf.15, group 1 remains on wmf.16 refs T158997
  • 20:27 ladsgroup@tin: Started deploy [ores/deploy@bc0bc74]: Mid-March deploy of ORES (T160279)
  • 20:13 bsitzmann@tin: Finished deploy [mobileapps/deploy@fa43048]: Update mobileapps to bb8fcf2 (duration: 03m 51s)
  • 20:09 bsitzmann@tin: Started deploy [mobileapps/deploy@fa43048]: Update mobileapps to bb8fcf2
  • 20:05 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.16
  • 19:56 twentyafterfour@tin: Synchronized php-1.29.0-wmf.16/includes/specialpage/: deploy revert of 5b15728 (duration: 00m 44s)
  • 19:46 jynus: shutting down db1067 for maintenance (as a db1057 replacement) T160435
  • 19:16 mobrovac@tin: Finished deploy [changeprop/deploy@b68bf51]: Deploy producer fix for T159200 (duration: 00m 51s)
  • 19:15 mobrovac@tin: Started deploy [changeprop/deploy@b68bf51]: Deploy producer fix for T159200
  • 18:35 legoktm@tin: Synchronized php-1.29.0-wmf.16/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: mw.widgets.SearchInputWidget: Do not pass to TextInputWidget - T148471 (2/2) (duration: 00m 42s)
  • 18:34 legoktm@tin: Synchronized php-1.29.0-wmf.16/includes/widget/SearchInputWidget.php: mw.widgets.SearchInputWidget: Do not pass to TextInputWidget - T148471 (1/2) (duration: 00m 41s)
  • 18:32 legoktm@tin: Synchronized php-1.29.0-wmf.16/includes/libs/filebackend/SwiftFileBackend.php: Make sure Swift store operations close the source file handle - T159607 (duration: 00m 44s)
  • 18:25 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to group0 and small wikis - T148609 (duration: 00m 42s)
  • 18:21 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy PageViewInfo to group0 - T125917 (duration: 00m 42s)
  • 18:20 otto@tin: Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 06m 18s)
  • 18:19 ppchelko@tin: Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. T159200 (duration: 06m 11s)
  • 18:19 legoktm@tin: Synchronized wmf-config/logging.php: Use custom LogstashFormatter - T145133, T151290 (duration: 00m 42s)
  • 18:15 mobrovac@tin: Finished deploy [changeprop/deploy@614cb4b]: Deploy for switching to librdkafka 0.9.4 T159200 (duration: 00m 33s)
  • 18:15 mobrovac@tin: Started deploy [changeprop/deploy@614cb4b]: Deploy for switching to librdkafka 0.9.4 T159200
  • 18:14 mobrovac: restbase deploying f047dabb
  • 18:13 ppchelko@tin: Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. T159200
  • 18:13 otto@tin: Started deploy [eventstreams/deploy@eb8698e]: T159200
  • 18:12 ottomata: upgrading librdkafka on scb eqiad nodes T159200
  • 18:12 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Show 'Publish' not 'Save' on most public wikis - T131132 (duration: 00m 42s)
  • 18:08 mobrovac@tin: Finished deploy [changeprop/deploy@614cb4b]: Deploy to EQIAD canary for switching to librdkafka 0.9.4 T159200 (duration: 00m 20s)
  • 18:07 mobrovac@tin: Started deploy [changeprop/deploy@614cb4b]: Deploy to EQIAD canary for switching to librdkafka 0.9.4 T159200
  • 18:07 ppchelko@tin: Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. Canary on scb1001.eqiad.wmnet. T159200 (duration: 01m 07s)
  • 18:06 ppchelko@tin: Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0. Canary on scb1001.eqiad.wmnet. T159200
  • 18:06 otto@tin: Started deploy [eventstreams/deploy@eb8698e]: T159200
  • 17:55 ppchelko@tin: Finished deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0 in codfw. T159200 (duration: 03m 51s)
  • 17:53 mobrovac@tin: Finished deploy [changeprop/deploy@614cb4b]: Deploy to CODFW for switching to librdkafka 0.9.4 T159200 (duration: 01m 44s)
  • 17:52 otto@tin: Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 01m 35s)
  • 17:51 mobrovac@tin: Started deploy [changeprop/deploy@614cb4b]: Deploy to CODFW for switching to librdkafka 0.9.4 T159200
  • 17:51 ppchelko@tin: Started deploy [trending-edits/deploy@85be190]: Update to node-rdkafka 0.8.0 in codfw. T159200
  • 17:50 otto@tin: Started deploy [eventstreams/deploy@eb8698e]: T159200
  • 17:50 ottomata: upgrading librdkafka on scb in codfw T159200
  • 17:46 otto@tin: Finished deploy [eventstreams/deploy@eb8698e]: T159200 (duration: 00m 17s)
  • 17:46 otto@tin: Started deploy [eventstreams/deploy@eb8698e]: T159200
  • 17:43 mobrovac@tin: Finished deploy [changeprop/deploy@614cb4b]: Canary deploy for switching to librdkafka 0.9.4 T159200 (duration: 00m 53s)
  • 17:43 ppchelko@tin: Finished deploy [trending-edits/deploy@85be190]: Trending: Update to node-rdkafka 0.8.0. Canary on scb2001. T159200 (duration: 01m 21s)
  • 17:42 mobrovac@tin: Started deploy [changeprop/deploy@614cb4b]: Canary deploy for switching to librdkafka 0.9.4 T159200
  • 17:41 ppchelko@tin: Started deploy [trending-edits/deploy@85be190]: Trending: Update to node-rdkafka 0.8.0. Canary on scb2001. T159200
  • 17:21 demon@tin: Synchronized wmf-config/CommonSettings.php: Stop calling an idiot user an idiot (duration: 00m 42s)
  • 17:03 demon@tin: Synchronized wmf-config/: pruning old extensionmessages files (duration: 00m 49s)
  • 15:58 moritzm: upgraded jessie systems running HHVM in deployment-prep to 3.18.1+dfsg-1+wmf1
  • 15:47 moritzm: uploaded new HHVM 3.18 package with backported patch for stat_cache regression (T158176)
  • 15:45 marostegui: For the record: deployed schema change on s2 and s6 for image table (add an index) - T160415
  • 14:22 moritzm: installing chromium security update on osmium
  • 14:05 moritzm: uploaded python-phabricator 0.6.1-1~bpo8~trusty1 for trusty-wikimedia to apt.wikimedia.org (required for Phabricator support in offboarding script running on terbium (trusty))
  • 13:48 phuedx@tin: Synchronized wmf-config/InitialiseSettings.php: T160403: Add d to enwikisource's import list (duration: 00m 42s)
  • 13:37 phuedx@tin: Synchronized wmf-config/InitialiseSettings.php: T157111: pagePreviews: Enable perf instrumentation (duration: 00m 42s)
  • 13:18 phuedx@tin: Synchronized wmf-config/InitialiseSettings.php: 342456: Remove "editusercssjs". (duration: 02m 50s)
  • 13:14 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Enable Cognate on beta wiktionary sites T156241 Beta Only (again) (duration: 02m 45s)
  • 13:04 gehel: syncing puppet git repo on wdqs-puppet.wikidata-query.eqiad.wmflabs
  • 12:13 godog: deploy thumbor 0.1.36-1 on thumbor100*
  • 10:41 Dereckson: Run namespaceDupes.php for pnb.wiktionary (T159976): all looks good for this one
  • 10:37 Dereckson: Run namespaceDupes.php for pnb.wikipedia (T159976)
  • 10:34 ema: upgrade cp4001 (misc) and cp4011 (maps) to linux 4.9 T154934
  • 09:11 marostegui: Disable parallel replication on dbstore2002, dbstore2001, dbstore1002, dbstore1001 - T160407
  • 09:02 marostegui: Disable parallel replication on x1 slaves (db1029, db2033) - T160407
  • 08:27 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Enable Cognate on beta wiktionary sites T156241 Beta Only (duration: 02m 48s)
  • 08:26 moritzm: removed imagemagick 6.8.9.9-5+deb8u7+wmf1 from apt.wikimedia.org (the sharpen patch is folded into the new 6.8.9.9-5+deb8u8 security update)
  • 08:22 marostegui: Deploy alter table x1 testing parallel replication - T160407
  • 08:11 moritzm: installing imagemagick security updates
  • 07:26 marostegui: Enable parallel replication on x1 slaves - T160407
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 13m 33s)
  • 00:55 eileen: update civicrm from 31f19d6 to 639eb68
  • 00:41 maxsem@tin: Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/342778 (duration: 02m 46s)
  • 00:32 maxsem@tin: Synchronized php-1.29.0-wmf.16/extensions/RelatedSites/: Hide DMOZ links with https://gerrit.wikimedia.org/r/#/c/342753/ + https://gerrit.wikimedia.org/r/#/c/342768/ (duration: 02m 48s)
  • 00:27 maxsem@tin: Synchronized php-1.29.0-wmf.15/extensions/RelatedSites/: Hide DMOZ links with https://gerrit.wikimedia.org/r/#/c/342753/ + https://gerrit.wikimedia.org/r/#/c/342768/ (duration: 02m 48s)
  • 00:19 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/340697/2 (duration: 02m 53s)
  • 00:08 mutante: depooled mw2256 because it's down again (T155180)
  • 00:08 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet
  • 00:05 dzahn@puppetmaster1001: conftool action : get/pooled; selector: dc=eqiad,name=mw2256.codfw.wmnet

2017-03-14

  • 23:59 maxsem@tin: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/342148/ (duration: 02m 47s)
  • 23:55 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/342148/ (duration: 02m 47s)
  • 23:42 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: - (duration: 02m 50s)
  • 23:31 tgr@tin: Finished scap: T125917: Deploy PageViewInfo to testwiki (duration: 48m 58s)
  • 22:42 tgr@tin: Started scap: T125917: Deploy PageViewInfo to testwiki
  • 21:29 ebernhardson: reindexed search in group0 for Monday's codfw search downtime/upgrade
  • 20:45 mattflaschen@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Beta Cluster only (duration: 02m 50s)
  • 20:17 twentyafterfour: scap was unable to connect to mw2256.codfw.wmnet
  • 20:14 twentyafterfour@tin: Finished scap: full scap of new branch, move test wikis to 1.29.0-wmf.16 refs T158997 (duration: 56m 05s)
  • 19:18 twentyafterfour@tin: Started scap: full scap of new branch, move test wikis to 1.29.0-wmf.16 refs T158997
  • 19:14 ema: restarting pybal on lvs1010-11 T160405
  • 19:13 Reedy: Delete 2FA for User:Conny per request on IRC. Identity verified via Lydia_WMDE
  • 18:42 nuria@tin: Finished deploy [eventlogging/analytics@417c40f]: (no justification provided) (duration: 00m 02s)
  • 18:42 nuria@tin: Started deploy [eventlogging/analytics@417c40f]: (no justification provided)
  • 18:39 gehel: removing swap from elasticsearch servers - T158884
  • 18:37 ottomata: upgrading librdkafka to 0.9.4 and restarting varnishkafka on cache text hosts
  • 18:19 ottomata: upgrading librdkafka to 0.9.4 and restarting varnishkafka on cache upload hosts
  • 18:13 ottomata: upgrading librdkafka to 0.9.4 and restarting varnishkafka on cache misc hosts
  • 18:11 nuria@tin: Finished deploy [eventlogging/analytics@c3ccb4a]: (no justification provided) (duration: 00m 03s)
  • 18:11 nuria@tin: Started deploy [eventlogging/analytics@c3ccb4a]: (no justification provided)
  • 17:07 ottomata: upgrading librdkafka to 0.9.4 on cache misc and restarting varnishkafka
  • 16:29 jynus: no response from db1057 after powercycle - trying to hard reset it
  • 16:10 urandom: T111113: Restart Cassandra in RESTBase Staging to enable optional client encryption
  • 15:39 godog: shut ms-be2002 for idrac / bios troubleshooting T155689
  • 15:24 chasemp: silence toolschecker precise job start check in anticipation of removal
  • 15:18 twentyafterfour: preparing to branch 1.29.0-wmf.16 refs T158997
  • 14:36 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080 - T132416 (duration: 00m 40s)
  • 14:36 marostegui: Enabled parallel replication (5 threads) on db2033 (x1) - T160407
  • 14:20 chasemp: labsdb100[9|10|11] 'maintain-views --all-databases --table page --replace-all --debug'
  • 14:18 chasemp: labsdb1003 time maintain-views --all-databases --table page --replace-all --debug
  • 14:01 Dereckson: Purged portals URL
  • 13:56 dereckson@tin: Synchronized portals: Resync portals/ directory after touch (duration: 00m 42s)
  • 13:56 chasemp: labsdb1001 maintain-views --all-databases --table page --replace-all --debug
  • 13:46 dereckson@tin: Synchronized portals: Bump to e576c18522ff (duration: 00m 41s)
  • 13:45 dereckson@tin: Synchronized portals/prod/wikipedia.org/assets: Bump to e576c18522ff (duration: 00m 41s)
  • 13:18 elukey: started redis-cli --bigkeys -i 0.1 on rdb1008 (eqiad jobqueue slave)
  • 13:15 dereckson@tin: Synchronized portals: (no justification provided) (duration: 00m 41s)
  • 13:14 dereckson@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 41s)
  • 13:00 gehel: restarting elasticsearch on relforge1001 to test gelf appender
  • 12:41 elukey: reimage analytics1043 to Debian Jessie
  • 12:32 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1054 with full weight after warmup (duration: 00m 40s)
  • 12:28 jynus: stopping mariadb on db1057, preparing to backup and reimage
  • 12:24 addshore@tin: Synchronized dblists/: T150183 wmgUseInterwikiSorting true for all wikidata clients #1 #2 PT 4/4 (duration: 00m 41s)
  • 12:23 addshore@tin: Synchronized docroot/: T150183 wmgUseInterwikiSorting true for all wikidata clients #1 #2 PT 3/4 NOOP (duration: 00m 44s)
  • 12:19 addshore@tin: Synchronized wmf-config/CommonSettings.php: T150183 wmgUseInterwikiSorting true for all wikidata clients #1 #2 PT 2/4 (duration: 00m 41s)
  • 12:18 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T150183 wmgUseInterwikiSorting true for all wikidata clients #1 #2 PT 1/4 (duration: 00m 52s)
  • 09:15 addshore@tin: Synchronized dblists/interwikisorting.dblist: wmgUseInterwikiSorting true for wikidata clients, excluding wikipedias T150183 (duration: 00m 42s)
  • 08:38 elukey: moved some log files from /var/log/upstart/$logname.log.1 to /var/log/upstart/$logname.log.1.bis on labvirt1014, labtestvirt2001, labtestnet2001, labnet1001 to reduce cronspam
  • 08:15 moritzm: installing icu security updates on trusty (jessie already fixed)
  • 08:07 moritzm: installing icoutils security update on trusty (jessie already fixed)
  • 07:26 moritzm: installing python-imaging/pillow security updates on trusty (jessie already fixed)
  • 07:07 marostegui: Deploy alter table enwiki.revision db1080 - T132416
  • 07:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1080 - T132416 (duration: 00m 41s)
  • 07:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1083 - T132416 (duration: 00m 41s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 12m 36s)

2017-03-13

  • 23:32 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1054 after upgrade with low weight (duration: 00m 41s)
  • 22:29 bawolff@tin: Synchronized php-1.29.0-wmf.15/extensions/SemanticForms/includes/SF_ValuesUtils.php: Backport bb42c6f401b9 (duration: 00m 48s)
  • 21:40 bawolff: Deployed fix for T160266
  • 20:45 addshore: InterwikiSorting deploy (to group0) done
  • 20:43 addshore@tin: Synchronized wmf-config/CommonSettings.php: T150183 Enable InterwikiSorting on group0 #1 #2 PT 4/4 (duration: 00m 40s)
  • 20:42 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T150183 Enable InterwikiSorting on group0 #1 #2 PT 3/4 (duration: 00m 41s)
  • 20:41 addshore@tin: Synchronized docroot/noc/conf/interwikisorting.dblist: T150183 Enable InterwikiSorting on group0 #1 #2 PT 2/4 NOOP (duration: 00m 42s)
  • 20:39 addshore@tin: Synchronized dblists/interwikisorting.dblist: T150183 Enable InterwikiSorting on group0 #1 #2 PT 1/4 (duration: 00m 51s)
  • 18:37 dcausse@tin: Synchronized php-1.29.0-wmf.15/extensions/CirrusSearch/: Make incoming link counting compatible with 5.x (duration: 00m 53s)
  • 18:06 jynus: chowning /var/lib/git/operations/puppet to gitpuppet on labscontrol1002
  • 18:03 jynus: chowning /var/lib/git/operations/puppet to gitpuppet on labscontrol1001
  • 17:46 reedy@tin: Synchronized wmf-config/throttle.php: Throttle rule for event currently ongoing (duration: 00m 43s)
  • 17:29 gehel: re-configuring cluster settings after elasticsearch upgrade - T158680
  • 17:29 dcausse: done re-enabling writes to elastic@codfw (elastic5 upgrade)
  • 17:28 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [es5 upgrade] step 2: repool codfw and send wmf16 to codfw 3/3 (duration: 00m 41s)
  • 17:26 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: [es5 upgrade] step 2: repool codfw and send wmf16 to codfw 2/3 (duration: 00m 44s)
  • 17:24 dcausse@tin: Synchronized wmf-config/CommonSettings.php: [es5 upgrade] step 2: repool codfw and send wmf16 to codfw 1/3 (duration: 00m 46s)
  • 17:23 gehel@tin: Finished deploy [wdqs/wdqs@202a106]: (no justification provided) (duration: 01m 46s)
  • 17:22 gehel@tin: Started deploy [wdqs/wdqs@202a106]: (no justification provided)
  • 17:19 jynus: stopping mariadb at db1054 and preparing for backup and reimage
  • 17:18 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1003.eqiad.wmnet
  • 16:57 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1054 for upgrade (duration: 00m 53s)
  • 16:55 godog: outdated swift rings pushed in eqiad-prod, pushed again updated rings from git repo - T158337
  • 16:35 godog: add ms-be2028/29/30 to swift codfw-prod, initial add - T158337
  • 16:25 gehel: restarting elasticsearch on all codfw cluster after upgrade - T158680
  • 16:23 gehel: restarting elasticsearch on elastic2001 after upgrade - T158680
  • 16:06 gehel: upgrading plugins to 5.1.2 on elasticsearch codfw - T158680
  • 15:41 gehel: shutting down elasticsearch on codfw for v5.1.2 upgrade - T158680
  • 15:21 dcausse: elastic@codfw stopped receiving writes
  • 15:21 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [es5 upgrade] step 1: depool codfw for writes 2/2 (duration: 00m 44s)
  • 15:19 dcausse@tin: Synchronized wmf-config/CommonSettings.php: [es5 upgrade] step 1: depool codfw for writes 1/2 (duration: 00m 45s)
  • 14:33 marostegui: Deploy alter table enwiki.revision db1083 - T132416
  • 14:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1083 - T132416 (duration: 00m 41s)
  • 14:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T132416 (duration: 00m 41s)
  • 13:01 hashar@tin: Synchronized wmf-config/CommonSettings.php: +$wgAvailableRights[] = autoreviewrestore; (duration: 00m 41s)
  • 12:08 ema: restart pybal on lvs1003 to add swift-https_443
  • 12:05 moritzm: install libevent security updates
  • 11:56 elukey: reimage analytics1042 (Hadoop worker node) to Debian Jessie
  • 11:15 godog: bounce pybal on lvs1006 to try picking up swift https changes
  • 11:06 zeljkof: purge bswiki logo - T158815
  • 10:44 Dereckson: Update site statistics on gu.wikipedia (T160328)
  • 09:23 gehel: downgrading elasticsearch to v5.1.2 on relforge, a full reindex will be needed - T156150
  • 08:40 marostegui: Compress dewiki - db1070 - T153743
  • 08:31 marostegui: Stop replication on labsdb1009,10 and 11 - T153743
  • 08:30 marostegui: Stop MySQL on db1095 (sanitarium2) to take a backup - T153743
  • 08:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T153743 (duration: 00m 41s)
  • 08:08 marostegui: Deploy alter table s6 - db1050 (master) - T159414
  • 08:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1030 - T159414 (duration: 00m 41s)
  • 07:46 moritzm: upgrading apache on remaining mediawiki servers in eqiad
  • 07:24 marostegui: Deploy alter table enwiki.revision db1089 - T132416
  • 07:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T132416 (duration: 00m 41s)
  • 07:13 marostegui: Deploy alter table s6 revision table on db1030 - T159414
  • 07:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1030 - T159414 (duration: 00m 52s)
  • 06:52 elukey: powercycle mw2256, stuck in boot (looked in the console)
  • 02:28 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Mar 13 02:28:18 UTC 2017 (duration 5m 21s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 08m 57s)

2017-03-12

  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Mar 12 02:25:37 UTC 2017 (duration 5m 32s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 07m 25s)

2017-03-11

  • 08:39 jynus: powercycle es2015 - unresponsive
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 07m 41s)
  • 00:19 smalyshev@tin: Finished deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix (duration: 02m 15s)
  • 00:16 smalyshev@tin: Started deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix
  • 00:08 smalyshev@tin: Finished deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix (duration: 00m 16s)
  • 00:08 smalyshev@tin: Started deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix
  • 00:07 SMalyshev: going to deploy updater patch on wdq1003. The host is in maintenance, not a production deployment.

2017-03-10

  • 21:42 hashar: restarted Zuul
  • 20:06 gehel: restart kartotherian / tilerator(ui) on maps-test*
  • 20:06 gehel@tin: Finished deploy [kartotherian/deploy@76adf21]: (no justification provided) (duration: 00m 54s)
  • 20:05 gehel@tin: Started deploy [kartotherian/deploy@76adf21]: (no justification provided)
  • 20:03 gehel@tin: Finished deploy [tilerator/deploy@b501046]: (no justification provided) (duration: 00m 16s)
  • 20:03 gehel@tin: Started deploy [tilerator/deploy@b501046]: (no justification provided)
  • 19:57 gehel: restarting tilerator(ui) on maps-test2004
  • 19:57 gehel@tin: Finished deploy [tilerator/deploy@b501046]: (no justification provided) (duration: 00m 04s)
  • 19:57 gehel@tin: Started deploy [tilerator/deploy@b501046]: (no justification provided)
  • 19:47 gehel: restarting tilerator(ui) on maps-test2004
  • 19:47 gehel@tin: Finished deploy [tilerator/deploy@b501046]: (no justification provided) (duration: 00m 03s)
  • 19:47 gehel@tin: Started deploy [tilerator/deploy@b501046]: (no justification provided)
  • 19:45 gehel: failed tilerator deploy on maps-test2004
  • 19:45 gehel@tin: Finished deploy [tilerator/deploy@b501046]: (no justification provided) (duration: 01m 20s)
  • 19:44 gehel@tin: Started deploy [tilerator/deploy@b501046]: (no justification provided)
  • 19:36 ejegg: ran wmf_civicrm db updates through 7500 - Add benevity as a financial type for benevity imports.
  • 19:34 gehel: restart kartotherian on maps-test2004
  • 19:28 gehel@tin: Finished deploy [kartotherian/deploy@76adf21]: (no justification provided) (duration: 00m 23s)
  • 19:27 gehel@tin: Started deploy [kartotherian/deploy@76adf21]: (no justification provided)
  • 19:19 gehel: upgrading kartotherian on maps-test2004 - T150354
  • 19:07 MaxSem: Unmasked kartotherian on maps-test2004
  • 18:28 smalyshev@tin: Finished deploy [wdqs/wdqs@1f2973c]: Deploy new updater on 1003 for potential connection drop fix (duration: 00m 03s)
  • 18:28 smalyshev@tin: Started deploy [wdqs/wdqs@1f2973c]: Deploy new updater on 1003 for potential connection drop fix
  • 17:28 ottomata: installed librdkafka 0.9.4 via dpkg -i on cp1052 (cache text) and restarted varnishkafka in preparation for fleet upgrade next week
  • 17:24 ottomata: installed librdkafka 0.9.4 via dpkg -i on cp1058 (cache misc) and restarted varnishkafka in preparation for fleet upgrade next week
  • 16:44 papaul: oresrdb2002 - signing puppet certs, salt-key, initial run
  • 16:25 elukey: reboot mw22(5[1-9]|60) to enable mw-cgroup mountpoint
  • 15:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1030 - T159414 (duration: 02m 42s)
  • 15:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1022 - T159414 (duration: 00m 45s)
  • 15:03 marostegui: Stop slave db2033 for maintenance - T159707
  • 14:05 hashar: contint1001 and contint2001: Migrating git-daemon to systemd. Would stop zuul merger briefly
  • 13:58 elukey: added 3 new MW api-appservers (mw2251-53) and 7 new appservers (mw2254-60) to codfw
  • 13:35 hashar: Restarting Jenkins. Deadlocks in ssh connections. T160168
  • 07:28 moritzm: upgrading libarchive on trusty systems (jessie already fixed)
  • 07:13 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Added weight 1 for db1061 - T159414 (duration: 00m 40s)
  • 07:13 marostegui: Deploy alter table s6 revision table on db1022 - T159414
  • 07:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1022 - T159414 (duration: 00m 41s)
  • 04:29 mutante: codfw mw jobrunner: they start but then fail again shortly after: mw2248 jobrunner[67314]: [Fri Mar 10 04:23:07 2017] [hphp] [67314:7f6a34b746c0:0:000024] [] LightProcess::closeShadow failed due to exception: Failed in afdt::sendRaw: Broken pipe
  • 04:12 mutante: more codfw appservers ... - systemctl start jobchron, systemctl start jobrunner (both were failed but are now active (running))
  • 04:09 mutante: mw2155 - systemctl start jobchron, systemctl start jobrunner (both were failed but are now active (running))
  • 04:02 mutante: mw2249 systemctl start jobrunner - now Active: active (running)
  • 03:56 mutante: codfw appserver jobrunner service fail related to https://gerrit.wikimedia.org/r/#/c/259660/ ?
  • 03:54 mutante: codfw appservers showing "systemd degraded" alerts are failed jobrunner service unit. after puppet-agent "Mediawiki::Jobrunner/Package[jobrunner]/ensure) ensure changed..." ..then jobrunner.service: main process exited, code=exited, status=143/n/a
  • 02:51 AaronSchulz: Restarted job services for 5101424 (statsd batching) after monitoring mw1161
  • 02:39 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Mar 10 02:39:25 UTC 2017 (duration 5m 28s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 12m 17s)
  • 00:54 ppchelko@tin: Finished deploy [trending-edits/deploy@1673068]: Replayed events are purged based on current timestamp T160136 (duration: 06m 24s)
  • 00:48 ppchelko@tin: Started deploy [trending-edits/deploy@1673068]: Replayed events are purged based on current timestamp T160136
  • 00:39 ppchelko@tin: Finished deploy [trending-edits/deploy@a5716b9]: Replayed events are purged based on current timestamp T160136 (duration: 02m 23s)
  • 00:38 dereckson@tin: Synchronized php-1.29.0-wmf.15/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTargetLoader.js: ArticleTargetLoader: wikitext switch shouldn't require FullRestbaseURL (T158692) (duration: 00m 41s)
  • 00:37 ppchelko@tin: Started deploy [trending-edits/deploy@a5716b9]: Replayed events are purged based on current timestamp T160136
  • 00:31 ppchelko@tin: Finished deploy [trending-edits/deploy@a5716b9]: Replayed events are purged based on current timestamp T160136 (duration: 07m 17s)
  • 00:30 eileen: update CiviCRM from d20ed40 to 31f19d6
  • 00:24 ppchelko@tin: Started deploy [trending-edits/deploy@a5716b9]: Replayed events are purged based on current timestamp T160136
  • 00:22 dereckson@tin: Synchronized wmf-config/CommonSettings.php: Move NavigationTiming config to EventLogging section + Remove setting of unused $wgPercentHHVM (Gerrit:342147 and Gerrit:342149, no-op) (duration: 00m 40s)
  • 00:19 maxsem@tin: Finished deploy [tilerator/deploy@160f314]: https://gerrit.wikimedia.org/r/#/c/342153/ - revert submodule updates due to broken manik->libc dependency (duration: 00m 16s)
  • 00:19 maxsem@tin: Started deploy [tilerator/deploy@160f314]: https://gerrit.wikimedia.org/r/#/c/342153/ - revert submodule updates due to broken manik->libc dependency

2017-03-09

  • 22:50 mutante: prometheus1003/1004 - systemctl stop prometheus (as opposed to /etc/init.d/prometheus), as they are low on disk but are not in production yet
  • 22:49 maxsem@tin: Finished deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/ (duration: 00m 05s)
  • 22:48 maxsem@tin: Started deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/
  • 22:46 mutante: prometheus1003 - stopping service: [....] Stopping monitoring system and time series database: prometheus - Invalid --pidfile argument: '/var/run/prometheus/prometheus.pid' (Parent directory does not exist)
  • 22:46 maxsem@tin: Finished deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/ (duration: 00m 21s)
  • 22:45 maxsem@tin: Started deploy [tilerator/deploy@fb06c99]: https://gerrit.wikimedia.org/r/#/c/342140/
  • 22:18 maxsem@tin: Finished deploy [tilerator/deploy@367df80]: no-op (duration: 00m 22s)
  • 22:18 maxsem@tin: Started deploy [tilerator/deploy@367df80]: no-op
  • 22:00 mobrovac@tin: Finished deploy [trending-edits/deploy@57a654e]: Bump max_pages for T156411 (duration: 06m 07s)
  • 21:54 mobrovac@tin: Started deploy [trending-edits/deploy@57a654e]: Bump max_pages for T156411
  • 21:37 mutante: fluorine - puppet node clean, puppet node deactivate, salt-key -d, remove from Icinga.. (T159996)
  • 21:35 mutante: fluorine - shutdown -h now (decom) T159996
  • 20:09 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.15
  • 20:02 mutante: cobalt: remove crontab entry of user gerrit2 that created reviewer counts, gzip /var/www/reviewer-counts.json and moved to /root/ for backup (re: gerrit:341592) T54329
  • 19:53 reedy@tin: Synchronized php-1.29.0-wmf.15/extensions/ConfirmEdit: Fixup maintenance script (duration: 00m 43s)
  • 19:22 legoktm: foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php linter
  • 18:21 moritzm: rebooting cp1008 for upgrade to Linux 4.9
  • 17:50 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 17:45 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1004.eqiad.wmnet
  • 17:11 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1003.eqiad.wmnet
  • 17:11 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1002.eqiad.wmnet
  • 17:11 bblack: reboot lvs1001 (post-incident cleanup reboot)
  • 17:02 bblack: reboot lvs1004 (post-incident cleanup reboot)
  • 16:58 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1001.eqiad.wmnet
  • 16:10 elukey: remove Piwik/bohrium health check from Varnish cache misc (https://gerrit.wikimedia.org/r/#/c/342007/)
  • 15:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1085 - T159414 (duration: 00m 41s)
  • 15:07 reedy@tin: Synchronized php-1.29.0-wmf.15/extensions/ConfirmEdit: Fixup maintenance script (duration: 00m 43s)
  • 15:02 moritzm: installing nettle security updates
  • 14:42 zeljkof: EU SWAT finished
  • 14:39 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add HD logos for several projects (T150618) (duration: 00m 41s)
  • 14:38 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Add HD logos for several projects (T150618) (duration: 00m 42s)
  • 14:35 moritzm: removed cn=svn group from LDAP directory (Bug: T129788)
  • 14:25 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: [throttle] Add new throttle rule + remove expired rules (T159957) (duration: 00m 45s)
  • 14:15 addshore@tin: Synchronized wmf-config/CommonSettings.php: Don't show rdf2latex table hint with ElectronPdfService enabled T157432 (duration: 00m 49s)
  • 13:53 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 with normal weight after warmup (duration: 00m 40s)
  • 13:52 moritzm: removed cn=svnadm group from LDAP directory (Bug: T129788)
  • 13:46 moritzm: removed cn=trebuchet group from LDAP directory (Bug: T129788)
  • 13:43 gehel: invalidating Tasmania zoom level 10 tiles in varnish - T159631
  • 13:21 marostegui: Deploy alter table s6 revision table on db1085 - T159414
  • 13:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1085 - T159414 (duration: 00m 41s)
  • 13:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1088 - T159414 (duration: 00m 43s)
  • 12:34 moritzm: rebooting multatuli to Linux 4.9
  • 12:23 jynus: purging old rc rows from non-production database replicas
  • 11:24 marostegui: Stop replication db2033 - T159707
  • 10:49 marostegui: Deploy alter table s6 revision table on db1088 - T159414
  • 10:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1088 - T159414 (duration: 00m 41s)
  • 10:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T159414 (duration: 00m 42s)
  • 10:25 ema: service systemd-sysctl restart on lvs hosts
  • 08:21 marostegui: Deploy alter table s6 revision table on db1093 - T159414
  • 08:13 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 - T159414 (duration: 00m 49s)
  • 08:10 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 after maintenance with low weight (duration: 00m 43s)
  • 05:24 bblack: poweroff lvs1001 from idrac
  • 03:15 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Mar 9 03:15:39 UTC 2017 (duration 5m 53s)
  • 03:09 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 14m 35s)
  • 02:36 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 14m 34s)
  • 01:08 twentyafterfour: phabricator update complete.
  • 01:06 twentyafterfour: updating phabricator to tag release/2017-03-08/1
  • 00:54 mutante: iridium - tested stop/start of phd service with upstart, unlink /etc/init.d/phd which was the formerly used symlink to a phab php script
  • 00:41 mutante: iridium - re-enable puppet, convert to base::service_unit, phd restarting
  • 00:36 mutante: iridium - temp. disable puppet | phab1001 - converting service to base::service_unit (T137928)
  • 00:18 catrope@tin: Synchronized php-1.29.0-wmf.15/extensions/Echo/modules/styles/mw.echo.ui.NotificationBadgeWidget.less: Fix RTL popup alignment (T159999) (duration: 00m 42s)

2017-03-08

  • 22:10 legoktm: resuming running refreshLinks.php on small wikis
  • 21:43 arlolra@tin: Started restart [parsoid/deploy@0c22f72]: (no justification provided)
  • 21:41 legoktm@tin: Synchronized wmf-config/CommonSettings.php: Enable Linter on testwiki - T148609 (2/2) (duration: 00m 41s)
  • 21:39 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Linter on testwiki - T148609 (1/2) (duration: 00m 44s)
  • 21:38 legoktm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki linter
  • 21:31 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.15
  • 21:31 twentyafterfour@tin: Synchronized php-1.29.0-wmf.15/extensions/CodeReview/backend/CodeCommentLinker.php: deploy https://gerrit.wikimedia.org/r/#/c/341857/ (duration: 00m 46s)
  • 21:27 arlolra: Updated Parsoid to dec47257 (T59603)
  • 21:19 arlolra@tin: Finished deploy [parsoid/deploy@0c22f72]: Updating Parsoid to dec47257 (duration: 08m 19s)
  • 21:11 arlolra@tin: Started deploy [parsoid/deploy@0c22f72]: Updating Parsoid to dec47257
  • 19:54 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Reenable Collection on srn.wikipedia (T158467) (duration: 00m 46s)
  • 19:43 madhuvishy: Upgraded nslcd and libnss-ldapd in labstore100[1,2,4,5]
  • 19:36 reedy@tin: Synchronized php-1.29.0-wmf.14/extensions/ConfirmEdit: Maintenance script updates (duration: 00m 50s)
  • 17:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: db1070 ROW based replication comments - T153743 (duration: 00m 41s)
  • 17:28 Pchelolo: update RESTBase to 20e2c44c
  • 17:25 Pchelolo: update RESTBase to 20e2c44c: canary on restbase1007
  • 17:23 Pchelolo: update RESTBase to 20e2c44c: staging
  • 17:21 moritzm: installing Ubuntu imagemagick security updates (jessie already fixed)
  • 16:13 marostegui: Deploy alter table s6 revision table on dbstore1002 - T159414
  • 16:06 mobrovac@tin: Finished deploy [eventstreams/deploy@78e248c]: Deploy for T159486 (duration: 01m 48s)
  • 16:04 mobrovac@tin: Started deploy [eventstreams/deploy@78e248c]: Deploy for T159486
  • 15:37 moritzm: uploaded firmware-nonfree 20161130 for jessie-wikimedia/experimental to apt.wikimedia.org
  • 15:33 reedy@tin: Synchronized wmf-config/CommonSettings.php: Remove EducationProgram config back compat (duration: 00m 41s)
  • 15:32 reedy@tin: Synchronized wmf-config/flaggedrevs.php: Whitespace (duration: 00m 41s)
  • 15:29 moritzm: uploaded linux 4.9.13 for jessie-wikimedia/experimental to apt.wikimedia.org
  • 15:19 elukey: rebooting mw22(5[4-9]|60) as part of sanity check for T155180
  • 15:08 elukey: rebooting mw225[123] as part of sanity check for T155180
  • 14:42 zeljkof: EU SWAT finished
  • 14:42 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add HD logos for several projects (T150618) (duration: 00m 41s)
  • 14:41 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Add HD logos for several projects (T150618) (duration: 00m 44s)
  • 14:27 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update logo for bswiki (Bosnian Wikipedia) (T158815) (duration: 00m 41s)
  • 14:26 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Update logo for bswiki (Bosnian Wikipedia) (T158815) (duration: 00m 41s)
  • 14:16 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T159803) (duration: 00m 41s)
  • 13:42 marostegui: Deploy alter table s6 revision table on db1023 - T159414
  • 13:11 godog: make mwlog1001 the primary logging host, deprecate fluorine
  • 12:35 godog: add mwlog[12]001 to analytics-in4 term rsync-http-https - T123728
  • 11:35 moritzm: installing texlive-base security updates
  • 10:34 jynus: restarting labsdb1004's mariadb T159572
  • 10:31 marostegui: Shutdown postgresql on labsdb1007 for maintenance - T157359
  • 10:12 elukey: reimage analytics1041 to Debian Jessie
  • 09:51 gehel: re-enabled waterline import on maps[12]001 - T159631
  • 09:39 marostegui: Stop replication on db2033 - T159707
  • 09:07 ariel@tin: Finished deploy [dumps/dumps@e30fbd0]: run monitor.py relative to cwd, to pick up default config files (duration: 00m 02s)
  • 09:07 ariel@tin: Started deploy [dumps/dumps@e30fbd0]: run monitor.py relative to cwd, to pick up default config files
  • 09:00 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 - T153743 (duration: 00m 41s)
  • 08:36 moritzm: upgrading apache on mw1161-mw1208
  • 08:36 marostegui: Restart mysql on db1070 to change binlog to ROW - T153743
  • 08:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T153743 (duration: 00m 41s)
  • 07:27 marostegui: Start pt-table-checksum on plwiki (s2) - T154485
  • 07:19 marostegui: Deploy alter table s6 revision table on db1061 - T159414
  • 07:13 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1060 original weight - T158193 (duration: 00m 47s)
  • 03:40 krinkle@tin: Synchronized docroot/noc/: Fix conftool link (I2f34be0a5), Remove IE6 css (Iae8a356e2), add db-codfw.php (I9f02dee3c) (duration: 00m 42s)
  • 03:17 bblack: authdns back to normal (puppet enabled, do normal things!)
  • 03:09 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Mar 8 03:09:21 UTC 2017 (duration 5m 49s)
  • 03:03 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.15) (duration: 15m 08s)
  • 02:46 bblack: disabling puppet on production authdns caches (testing dns lint related bits)
  • 02:29 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 07m 53s)
  • 01:33 demon@tin: Synchronized scap/plugins/clean.py: no-op (duration: 00m 41s)
  • 00:35 mobrovac@tin: Finished deploy [electron-render/deploy@5ec5614]: (no justification provided) (duration: 00m 59s)
  • 00:34 mobrovac@tin: Started deploy [electron-render/deploy@5ec5614]: (no justification provided)
  • 00:33 mobrovac@tin: Finished deploy [electron-render/deploy@5ec5614]: (no justification provided) (duration: 04m 08s)
  • 00:29 mobrovac@tin: Started deploy [electron-render/deploy@5ec5614]: (no justification provided)
  • 00:27 mobrovac@tin: Finished deploy [electron-render/deploy@5ec5614]: Deploy for T159486 (duration: 04m 46s)
  • 00:27 mobrovac@tin: Finished deploy [mobileapps/deploy@d6202e4]: Deploy for T159486 (duration: 03m 52s)
  • 00:26 catrope@tin: Synchronized php-1.29.0-wmf.15/extensions/Echo/modules/ui/: Fix regression in Echo popup (duration: 00m 42s)
  • 00:23 mobrovac@tin: Started deploy [mobileapps/deploy@d6202e4]: Deploy for T159486
  • 00:23 mobrovac@tin: Started deploy [electron-render/deploy@5ec5614]: Deploy for T159486
  • 00:22 mobrovac@tin: Finished deploy [mathoid/deploy@83f80ee]: Deploy for T159486 (duration: 04m 53s)
  • 00:22 mobrovac@tin: Finished deploy [graphoid/deploy@485ca11]: Deploy for T159486 (duration: 04m 45s)
  • 00:20 mobrovac@tin: Finished deploy [electron-render/deploy@51cff8a]: Deploy for T159486 (duration: 03m 29s)
  • 00:19 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Modify add/remove groups for flood group on wikitech (duration: 00m 42s)
  • 00:18 mobrovac@tin: Started deploy [mathoid/deploy@83f80ee]: Deploy for T159486
  • 00:17 mobrovac@tin: Finished deploy [cxserver/deploy@7e22281]: Deploy for T159486 (duration: 02m 24s)
  • 00:17 mobrovac@tin: Started deploy [graphoid/deploy@485ca11]: Deploy for T159486
  • 00:17 mobrovac@tin: Started deploy [electron-render/deploy@51cff8a]: Deploy for T159486
  • 00:16 mobrovac@tin: Finished deploy [changeprop/deploy@99280e3]: Deploy for T159486 (duration: 01m 09s)
  • 00:16 mobrovac@tin: Finished deploy [trending-edits/deploy@88e2f74]: Deploy changes for T156666 T156680 T159486 T156411 (duration: 06m 58s)
  • 00:15 mobrovac@tin: Started deploy [cxserver/deploy@7e22281]: Deploy for T159486
  • 00:15 mobrovac@tin: Started deploy [changeprop/deploy@99280e3]: Deploy for T159486
  • 00:13 Reedy: Clear 2FA for "User:Steven Walling"; identity confirmed via facebook
  • 00:09 mobrovac@tin: Started deploy [trending-edits/deploy@88e2f74]: Deploy changes for T156666 T156680 T159486 T156411
  • 00:08 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Disable wgCiteResponsiveReferences by default for back-compat (T33597) (duration: 00m 41s)

2017-03-07

  • 23:38 mutante: gerrit restarting for config changes 341701, 341587
  • 22:45 papaul: ms-be2028-ms-be2039 - signing puppet certs, salt-key, initial run
  • 22:11 mobrovac@tin: Finished deploy [citoid/deploy@5a7e053]: Deploy for T158675 T103478 T159486 (duration: 02m 36s)
  • 22:08 mobrovac@tin: Started deploy [citoid/deploy@5a7e053]: Deploy for T158675 T103478 T159486
  • 22:02 mobrovac@tin: Finished deploy [zotero/translators@35da336]: Update translators for T158675 (duration: 00m 06s)
  • 22:01 mobrovac@tin: Started deploy [zotero/translators@35da336]: Update translators for T158675
  • 21:59 mobrovac@tin: Finished deploy [trending-edits/deploy@f855460]: (no justification provided) (duration: 04m 48s)
  • 21:54 mobrovac@tin: Started deploy [trending-edits/deploy@f855460]: (no justification provided)
  • 21:40 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.29.0-wmf.15 refs T158996
  • 21:30 twentyafterfour@tin: Finished scap: bump test wikis to 1.29.0-wmf.5 refs T158996 (duration: 53m 17s)
  • 21:23 mutante: mw1177 - service hhvm restart
  • 20:37 twentyafterfour@tin: Started scap: bump test wikis to 1.29.0-wmf.5 refs T158996
  • 20:29 mutante: iridium - re-enabling puppet, ssh-phab service converted to base::service_unit, upstart template moved but unchanged, service restarted just fine.
  • 20:27 mutante: phab2001 - phab-ssh service converted to base::service_unit and with working systemd unit file. 'systemctl status ssh-phab' is active (running) (T158434)
  • 20:26 ottomata: installing librdkafka 0.9.4 on cp1045 (cache misc host) via .deb package to try it with varnishkafka in prod (ping bblack, ema, just in case)
  • 20:23 mutante: iridium - temp disabled puppet - converting phab-ssh service to base::service_unit, systemd on phab2001, upstart on iridium
  • 19:23 twentyafterfour: branching 1.29.0-wmf15 refs T158996
  • 19:20 bblack: rebooting baham (ns1) AGAIN - low cpu frequencies issues like T147905 - checking bios/idrac stuff
  • 19:08 bblack: rebooting baham (ns1) - low cpu frequencies issues like T147905
  • 18:52 volans: rmmod acpi_pad on baham, was using 100% CPU T137647
  • 18:37 mobrovac: restbase deploy start of cd53670b
  • 16:58 akosiaris: temporarily re-increase the client-output-buffer-limit for rdb1007, phab task filing to follow
  • 16:40 akosiaris: decrease client-output-buffer-limit soft-limit back to normal values
  • 16:22 filippo@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe1008.eqiad.wmnet
  • 16:22 filippo@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe1007.eqiad.wmnet
  • 16:22 filippo@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe1006.eqiad.wmnet
  • 16:22 filippo@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe1005.eqiad.wmnet
  • 15:28 joal@tin: Finished deploy [analytics/aqs/deploy@e0da1bd]: (no justification provided) (duration: 06m 08s)
  • 15:22 joal@tin: Started deploy [analytics/aqs/deploy@e0da1bd]: (no justification provided)
  • 15:15 akosiaris: increase client-output-buffer-limit soft-limit to 500MB temporarily on rdb1007
  • 14:46 jynus: restart labsdb1004 for config and data check
  • 14:32 moritzm: uploaded HHVM 3.18 builds of hhvm-tidy, hhvm-luasandbox and hhvm-wikidiff2 to the experimental section of apt.wikimedia.org (Bug: T158176)
  • 14:03 reedy@tin: Synchronized docroot/: Fixup filebackend symlinks (duration: 00m 41s)
  • 13:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1060 weight - T158193 (duration: 00m 58s)
  • 12:53 marostegui: Just for the sake of having it logged: gtid_domain_id has been deployed in all the database servers - T149418
  • 12:53 elukey: analytics1040 back in service - testing the new Debian configuration
  • 12:39 marostegui: Deploy ALTER table on db2028 (codfw s6 master) on the revision table - T159414
  • 12:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 with less weight - T158193 (duration: 00m 40s)
  • 12:19 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2053 - T159414 (duration: 00m 43s)
  • 12:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2034 - T132416 (duration: 00m 50s)
  • 11:41 gehel: cleaning empty log file on elastic2001 (cronspam)
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=trendingedits'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=pdfrender'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=eventstreams'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=ores'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=cxserver'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=apertium'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=citoid'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=graphoid'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=mathoid'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2006.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=mobileapps'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=trendingedits'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=pdfrender'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=eventstreams'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=ores'])
  • 11:33 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=cxserver'])
  • 11:32 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=apertium'])
  • 11:32 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=citoid'])
  • 11:32 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=graphoid'])
  • 11:32 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=mathoid'])
  • 11:32 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: scb2005.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=mobileapps'])
  • 11:27 elukey: end of hacking on install1002 (puppet re-enabled)
  • 09:23 ema: cache_text, cache_upload: upgrading to varnish 4.1.5 T159424
  • 09:10 elukey: temporary live hacking analytics-flex.cfg partman config on install1002
  • 08:25 moritzm: installing systemd bugfix updates from jessie point release
  • 07:39 marostegui: Stop MySQL db1067 to clone db1060 from it - T158193
  • 07:16 marostegui: Deploy ALTER table on db2053 (s6) for the revision table - T159414
  • 07:16 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2053 - T159414 (duration: 00m 41s)
  • 05:22 Krinkle: foreachwikiindblist 'all - closed - private' deleteEqualMessages.php (T45917) - purge upstreamed translations from remaining wikis
  • 03:28 Krinkle: foreachwikiindblist closed deleteEqualMessages.php (T45917) - purge upstreamed translations from closed wikis
  • 02:28 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Mar 7 02:28:59 UTC 2017 (duration 5m 32s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 08m 19s)
  • 00:49 RainbowSprinkles: gerrit: coming back online now
  • 00:43 RainbowSprinkles: gerrit: taking offline for a minute or two for case-insensitive login conversion
  • 00:39 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: In CSP policy for foundationwiki, wikidata.org -> www.wikidata.org (duration: 00m 40s)
  • 00:19 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Add other WMF domains to foundationwiki CSP policy for Special:HideBanners (duration: 00m 40s)
  • 00:00 mobrovac: restbase restarting in labs for T158628

2017-03-06

  • 22:14 awight: update payments-wiki config to a591e4c
  • 21:51 mutante: bast3001 - powerdown (T159480), decom in progress
  • 21:48 mutante: bast3001 - schedule downtime for host and all services in Icinga, remove from puppet, salt .. (T159480)
  • 21:36 hashar@tin: Synchronized static/images/project-logos: [fixup] Fix up wrongly updated sr.wikibooks and bs.wiktionary logos - T159542 T159534 (duration: 00m 42s)
  • 21:02 matt_flaschen: populateContentModel.php --wiki=cawiki --ns=103 run for revision, archive, page . T159047 complete
  • 21:00 mattflaschen@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Flow for Viquiprojecte Discussió on cawiki (duration: 00m 40s)
  • 20:46 ottomata: removing old cdh packages from thirdparty component in apt
  • 20:34 gehel: reimport waterlines data on maps1001.eqiad.wmnet - T159631
  • 20:34 matt_flaschen: For T159047
  • 20:34 matt_flaschen: Ran (time mwscript extensions/Flow/maintenance/convertNamespaceFromWikitext.php --wiki=cawiki 'Viquiprojecte_Discussió') 2>&1|tee --append ~/2017-03-02_cawiki_convertNamespacesFromWikitext_Viquiprojecte_Discussió.log
  • 20:26 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Disable Cognate on beta wiktionary sites T156241 Beta Only (duration: 00m 46s)
  • 20:11 thcipriani@tin: Synchronized wmf-config: SWAT: Enable Cognate for beta wiktionaries T156241 beta-only change (duration: 00m 43s)
  • 20:05 ejegg: updated payments-wiki from 66d8125 to f991f15
  • 20:05 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create "flood" flag for labswiki (duration: 00m 40s)
  • 19:53 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Add "flow-create-board" to CommonSettings.php for global groups (duration: 00m 40s)
  • 19:52 gehel: restarting wdqs-updater on wdqs* servers to activate GC logs - T159248
  • 19:43 thcipriani: mwscript migrateUserGroup.php --wiki=trwiki 'technician' 'interface-editor' on terbium for T159636
  • 19:43 thcipriani@tin: Synchronized wmf-config: SWAT: Rename "technician" to "interface-editor" on trwiki T144638 (duration: 00m 46s)
  • 19:41 gehel@tin: Finished deploy [wdqs/wdqs@1f2973c]: (no justification provided) (duration: 01m 25s)
  • 19:39 gehel@tin: Started deploy [wdqs/wdqs@1f2973c]: (no justification provided)
  • 19:24 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 (duration: 00m 40s)
  • 18:22 elukey: analytics1040 has been silenced and it is not ready to work, need to fix its partman recipe
  • 18:15 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2060 - T159414 (duration: 00m 44s)
  • 18:04 gehel@tin: Finished deploy [wdqs/wdqs@7b77735]: (no justification provided) (duration: 01m 46s)
  • 18:03 demon@tin: Synchronized wmf-config/interwiki.php: Sync interwiki list, T159680 (duration: 00m 41s)
  • 18:02 gehel@tin: Started deploy [wdqs/wdqs@7b77735]: (no justification provided)
  • 15:01 hashar: restarting Jenkins
  • 14:59 addshore: EU SWAT done
  • 14:50 chasemp: labnet1001 'service nova-fullstack restart'
  • 14:44 addshore@tin: Synchronized wmf-config/extension-list-labs: Remove InterwikiSorting and add Cognate to extension-list-labs T150183 T156241 BETA ONLY (duration: 00m 39s)
  • 14:42 addshore@tin: Synchronized wmf-config/extension-list: Add InterwikiSorting extension to prod extension-list T150183 NOOP (duration: 00m 38s)
  • 14:39 addshore@tin: Synchronized wmf-config/db-labs.php: SWAT: Create extension1 db cluster for beta T156241 BETA ONLY (duration: 00m 39s)
  • 14:37 addshore@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Add a CSP policy to foundationwiki to prevent privacy breach T159386 (duration: 00m 39s)
  • 14:23 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change account creation throttle for idwiki to default (6) (duration: 00m 39s)
  • 14:15 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Translation memories multi-DC support T132076 2/2 (NOOP) (duration: 00m 42s)
  • 14:13 addshore@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Enable Translation memories multi-DC support T132076 1/2 (duration: 00m 50s)
  • 14:05 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Bs.wiktionary namespace changes T159538 (duration: 00m 40s)
  • 14:00 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: srwikibooks & bswiktionary logos T159534 T159542 2/2 (duration: 00m 39s)
  • 13:58 addshore@tin: Synchronized static/images/project-logos/: SWAT: srwikibooks & bswiktionary logos T159534 T159542 1/2 (duration: 00m 39s)
  • 13:23 godog: reenable puppet on graphite2001
  • 13:07 marostegui: Deploy ALTER table on db2060 (s6) for the revision table - T159414
  • 13:07 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2060 - T159414 (duration: 00m 39s)
  • 13:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2046 - T159414 (duration: 00m 50s)
  • 12:45 moritzm: upgrading apache on mw1209-mw1235
  • 12:44 moritzm: upgrading apache on graphite*
  • 11:49 moritzm: installing imagemagick security updates
  • 11:36 moritzm: upgrading apache on krypton
  • 11:30 moritzm: upgrading apache on planet.wikimedia.org
  • 11:05 elukey: reimage the first Hadoop worker node (an1040) to Debian Jessie
  • 10:46 moritzm: upgrading apache on mediawiki servers in codfw
  • 10:36 gehel: upgrade to elasticsearch 5.2.2 on relforge cluster - T156150
  • 10:24 elukey: (shamefully) replaced /etc/init.d/hadoop-hdfs-datanode script with "exit 0" to prevent the HDFS datanode daemon from starting on analytics1028 (broken disk) and leave the rest running (puppet included) - T159632
  • 10:12 gehel: postgresql upgrade on maps* (postgresql-9.4 postgresql-9.4-postgis-2.3 postgresql-9.4-postgis-2.3-scripts postgresql-client-9.4 postgresql-client-common postgresql-common postgresql-contrib-9.4)
  • 10:06 ariel@tin: Finished deploy [dumps/dumps@8521be0]: fix: retries of broken runs could except on uninited var (duration: 00m 01s)
  • 10:06 ariel@tin: Started deploy [dumps/dumps@8521be0]: fix: retries of broken runs could except on uninited var
  • 09:46 gehel: postgresql upgrade on maps-test* (postgresql-9.4 postgresql-9.4-postgis-2.3 postgresql-9.4-postgis-2.3-scripts postgresql-client-9.4 postgresql-client-common postgresql-common postgresql-contrib-9.4)
  • 09:14 ariel@tin: Finished deploy [dumps/dumps@04794df]: move default config into a file and clean up (duration: 00m 02s)
  • 09:14 ariel@tin: Started deploy [dumps/dumps@04794df]: move default config into a file and clean up
  • 09:09 gehel: killing stuck tilerator notification on maps-test2001 - T145534
  • 07:22 marostegui: Resume pt-table-checksum on plwiki (s2) - T154485
  • 06:59 marostegui: Deploy ALTER table on db2046 (s6) for the revision table - T159414
  • 06:46 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2046 - T159414 (duration: 00m 51s)
  • 02:24 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Mar 6 02:24:24 UTC 2017 (duration 5m 19s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 07m 15s)
  • 01:29 cwd: updated staging civicrm database and triggers

2017-03-05

  • 22:23 Reedy: Generating some more captchas again T159581
  • 10:19 elukey: disabled puppet on analytics1028 to prevent puppet from starting the HDFS daemon (T159632)
  • 02:24 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Mar 5 02:24:02 UTC 2017 (duration 5m 20s)
  • 02:18 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 07m 07s)

2017-03-04

  • 16:43 Reedy: Manually generating even more captchas (going up to 10k total) in screen as reedy on terbium T159581
  • 16:35 Reedy: Manually generating some more captchas T159581
  • 03:28 legoktm: pausing refreshLinks.php run due to increase in job queue
  • 03:05 mutante: planet2001 - and this time it just worked and i can't reproduce the issue. install finished. re-adding to puppet, signing certs...
  • 03:00 mutante: planet2001 - reinstalling once more (T159432)
  • 02:36 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Mar 4 02:36:25 UTC 2017 (duration 5m 19s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 12m 10s)
  • 00:52 mutante: conf2002 - ran "systemctl reset-failed" to fix Icinga alert about broken systemd state due to formerly existing but failed service etcdmirror-eqiad-wmnet. turns out you need this to remove missing units. found on http://serverfault.com/questions/606520/how-to-remove-missing-systemd-units (T131959)
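
The 00:52 entry above notes that `systemctl reset-failed` is what clears the failed state left behind by a unit that no longer exists. A minimal sketch of that step follows, assuming a systemd host. The function name `reset_failed_units` and the `SYSTEMCTL` override variable are hypothetical, introduced here only for illustration and not taken from the log.

```shell
#!/bin/sh
# Sketch: clear stale "failed" state, e.g. for units whose unit files were removed.
# reset_failed_units is a hypothetical helper name, not something from the log.
reset_failed_units() {
    # SYSTEMCTL can be overridden (e.g. SYSTEMCTL='echo systemctl') for a dry run.
    ctl="${SYSTEMCTL:-systemctl}"
    # Show which units are currently in the failed state before clearing them.
    $ctl list-units --state=failed --no-legend
    # With no unit argument, reset-failed clears the failed state of all units.
    $ctl reset-failed
}
```

Calling `reset_failed_units` as root on the affected host would list the failed units and then reset them. Passing a unit name to `systemctl reset-failed` instead would limit the reset to that one unit.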

2017-03-03

  • 23:23 RainbowSprinkles: phabricator: restarted apache 1 last time, removed hack
  • 23:19 mutante: icinga: for special external hosts benefactorevents and eventdonations, "submit passive check result for this host" -> "check_tcp -p 80" to avoid "crit hosts" that just don't respond to ICMP (http://www.htmlgraphic.com/nagios-check-host-without-ping/)
  • 23:12 RainbowSprinkles: phabricator: restarting apache real quick
  • 22:03 hashar: rebooting contint2001
  • 21:54 hashar: restarting Jenkins
  • 21:51 hashar: enabling puppet on contint1001 and puppet-run
  • 21:05 hashar: disabled puppet on contint1001
  • 20:26 mattflaschen@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Beta Cluster only (duration: 00m 40s)
  • 19:35 ebernhardson: restart elasticsearch on relforge1002 to update remote reindex whitelist
  • 19:33 ebernhardson: restart elasticsearch on relforge1001 to update remote reindex whitelist
  • 19:11 legoktm: running refreshLinks.php across small wikis
  • 18:43 addshore@tin: Synchronized php-1.29.0-wmf.14/extensions/RevisionSlider/modules/ext.RevisionSlider.css: T159428 Quick fix for misplaced tooltips on RTL wikis (duration: 00m 42s)
  • 17:35 hashar: CI is mostly recovered. It could not spawn instances anymore. The queue is being processed and will take a while to be completed. Check status on https://integration.wikimedia.org/zuul/ | T159543
  • 16:17 hashar: Stopped Jenkins from processing builds while instances are being recycled
  • 13:37 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2067 - T159414 (duration: 00m 50s)
  • 13:12 elukey: removed apache2 (rc state) and apache2-utils from analytics1027
  • 11:11 elukey@tin: Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 14s)
  • 11:11 elukey@tin: Started deploy [analytics/refinery@1440646]: (no justification provided)
  • 11:09 elukey@tin: Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 02s)
  • 11:09 elukey@tin: Started deploy [analytics/refinery@1440646]: (no justification provided)
  • 11:05 jynus: stopping mariadb and restarting db1051 for maintenance
  • 11:03 joal@tin: Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 01m 23s)
  • 11:02 joal@tin: Started deploy [analytics/refinery@1440646]: (no justification provided)
  • 10:53 marostegui: Start pt-table-checksum on plwiki (s2) - T154485
  • 10:48 joal@tin: Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 15m 33s)
  • 10:33 joal@tin: Started deploy [analytics/refinery@1440646]: (no justification provided)
  • 09:28 hashar: Restarting Jenkins (2)
  • 09:03 hashar: Restarting Jenkins
  • 08:27 moritzm: upgrading apache on bromine
  • 08:22 marostegui: Run pt-table-checksum on s2 (nowiki) - T154485
  • 08:20 marostegui: Deploy alter table s6 on db2067 - T159414
  • 08:13 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2067 - T159414 (duration: 00m 40s)
  • 07:30 moritzm: installing w3m security updates on trusty (jessie already fixed)
  • 04:39 mutante: planet2001 last log message was for T159432
  • 04:38 mutante: planet2001 - reinstall, boot into installer, scheduled downtime (T15943)
  • 04:16 legoktm: running refreshLinks.php on aawiki
  • 04:13 legoktm@tin: Synchronized php-1.29.0-wmf.14/maintenance/refreshLinks.php: Queue non-recursive updates - https://gerrit.wikimedia.org/r/340920 (duration: 00m 40s)
  • 03:27 awight: rerunning schema_update wmf_civicrm:7480
  • 03:26 awight: update civicrm from 133bde2 to d20ed40
  • 02:38 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Mar 3 02:38:40 UTC 2017 (duration 5m 19s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 13m 28s)
  • 01:45 awight: rerun schema change wmf_civicrm:7480
  • 01:34 Krinkle: terbium$ foreachwiki purgeModuleDeps.php (T158105)
  • 01:34 Krinkle: terbium$ foreachwikiindblist group0 purgeModuleDeps.php (T158105)
  • 01:33 Krinkle: terbium$ mwscript purgeModuleDeps.php --wiki test2wiki (T158105)
  • 01:28 awight: update civicrm from 0cab193 to 133bde2
  • 01:12 MaxSem: Restarted tilerator on codfw tileservers to catch latest code changes
  • 01:11 mattflaschen@tin: Synchronized php-1.29.0-wmf.14/autoload.php: resourceloader: Add purgeModuleDeps.php maintenance script (duration: 00m 39s)
  • 01:10 mattflaschen@tin: Synchronized php-1.29.0-wmf.14/maintenance/cleanupRemovedModules.php: resourceloader: Add purgeModuleDeps.php maintenance script (duration: 00m 40s)
  • 01:09 mattflaschen@tin: Synchronized php-1.29.0-wmf.14/maintenance/purgeModuleDeps.php: resourceloader: Add purgeModuleDeps.php maintenance script (duration: 00m 40s)
  • 01:02 ejegg: re-running fix for missing names
  • 00:42 ejegg: re-enabled CiviCRM de-dupe jobs
  • 00:41 ejegg: CiviCRM geocoding update finished, name fix failed on badly formatted comment
  • 00:35 mattflaschen@tin: Synchronized wmf-config/CirrusSearch-common.php: CirrusSearch: Enable super_detect_noop (duration: 00m 39s)
  • 00:16 mattflaschen@tin: Synchronized php-1.29.0-wmf.14/extensions/Flow/: Fix autoload data and script (duration: 00m 59s)

2017-03-02

  • 23:49 ejegg: running batched geocoding update and donor name fixes
  • 23:43 ejegg: updated civicrm from d012767 to 2d1de87
  • 23:42 ejegg: disabled dedupe jobs for civi update
  • 23:07 bblack: all authdns servers puppet re-enabled
  • 23:05 bblack@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-rw,name=eqiad
  • 23:05 bblack@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=appservers-rw
  • 22:55 Krinkle: Stopped statsd-mw-js-deprecate service on hafnium per https://gerrit.wikimedia.org/r/338929
  • 22:46 catrope@tin: Synchronized dblists/: T63729: disable Flow on metawiki (duration: 00m 58s)
  • 22:36 MaxSem: killed stuck updates on maps-test2001
  • 22:09 mutante: bast3002 - stop rsyncd, remove rsyncd config snippets (T156506)
  • 20:05 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.14
  • 19:58 demon@tin: Synchronized wmf-config/CommonSettings.php: Stacktraces are useful when cli scripts fail (duration: 00m 56s)
  • 19:58 bblack@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-rw,named=eqiad
  • 19:57 bblack@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=appservers-rw
  • 19:53 maxsem@tin: Finished deploy [tilerator/deploy@edb97c5]: Trying https://gerrit.wikimedia.org/r/#/c/340607/ once again (duration: 00m 04s)
  • 19:53 maxsem@tin: Started deploy [tilerator/deploy@edb97c5]: Trying https://gerrit.wikimedia.org/r/#/c/340607/ once again
  • 19:49 maxsem@tin: Finished deploy [tilerator/deploy@0fe5a1d]: Reverting to previous version
  • 19:49 maxsem@tin: Started deploy [tilerator/deploy@0fe5a1d]: Reverting to previous version
  • 19:46 maxsem@tin: Finished deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/ (duration: 00m 05s)
  • 19:46 maxsem@tin: Started deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/
  • 19:43 maxsem@tin: Finished deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/ (duration: 00m 03s)
  • 19:43 maxsem@tin: Started deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/
  • 19:42 maxsem@tin: Finished deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/ (duration: 00m 03s)
  • 19:42 maxsem@tin: Started deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/
  • 19:42 maxsem@tin: Finished deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/ (duration: 00m 23s)
  • 19:42 maxsem@tin: Started deploy [tilerator/deploy@edb97c5]: https://gerrit.wikimedia.org/r/#/c/340607/
  • 19:16 addshore@tin: Synchronized dblists/all-labs.dblist: Add beta hewiktionary T158628 2/2 NOOP (duration: 00m 39s)
  • 19:15 addshore@tin: Synchronized wikiversions-labs.json: Add beta hewiktionary T158628 1/2 NOOP (duration: 00m 42s)
  • 19:06 awight: reenabling donation and recurring queue consumers
  • 19:05 addshore@tin: Synchronized wmf-config/throttle.php: Add new rules for WMUK T159454 T159461 (duration: 00m 43s)
  • 19:04 awight: update civicrm from fb91fa8 to d012767
  • 18:22 demon@tin: Synchronized php-1.29.0-wmf.14/includes/changes/EnhancedChangesList.php: T159466 (duration: 00m 40s)
  • 17:51 bblack: disabling puppet on authdns prod machines for hacky discovery testing
  • 17:44 bblack@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=codfw
  • 17:44 bblack@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-rw,name=eqiad
  • 17:38 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad
  • 16:52 bblack: puppet re-enabled on authdns production boxes
  • 16:27 bblack: puppet disabled on authdns production boxes, for hacky testing of discovery-related commits
  • 16:00 jynus: restarting db1001 for kernel and mariadb upgrade
  • 15:49 moritzm: uploaded 6.8.9.9-5+deb8u7+wmf1 to apt.wikimedia.org (CMYK sharpen bugfix rebased on latest Debian update)
  • 15:42 moritzm: installing libfcgi-perl security updates
  • 14:47 phuedx@tin: Synchronized wmf-config/InitialiseSettings.php: T157700: Re-enable Page Previews instrumentation (duration: 00m 40s)
  • 14:37 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1008.eqiad.wmnet
  • 14:37 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1007.eqiad.wmnet
  • 14:37 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
  • 14:32 phuedx@tin: Synchronized portals: (no justification provided) (duration: 00m 41s)
  • 14:32 phuedx@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 40s)
  • 14:26 jynus: running alter table on db2040 T147747
  • 14:22 elukey@tin: Finished deploy [analytics/refinery@c3dd129]: (no justification provided) (duration: 02m 18s)
  • 14:20 elukey@tin: Started deploy [analytics/refinery@c3dd129]: (no justification provided)
  • 14:12 phuedx@tin: Synchronized wmf-config/InitialiseSettings.php: Remove Page Previews experiment config (duration: 00m 40s)
  • 14:10 phuedx@tin: Synchronized wmf-config/CommonSettings.php: Remove Page Previews experiment config (duration: 01m 06s)
  • 13:47 moritzm: removed obsolete kernels on ocg1002
  • 13:46 moritzm: removed obsolete kernels on eventlog1001
  • 13:03 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
  • 12:52 moritzm: installing shadow security updates on jessie hosts
  • 12:43 jynus: running ANALYZE table on revision at db1051 (depooled) T159319
  • 12:36 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 for maintenance (duration: 00m 42s)
  • 11:58 hashar: CI composer based builds are now ok. Only operations/mediawiki-config was impacted as far as I can tell.
  • 11:10 kartik@tin: Finished deploy [cxserver/deploy@5101090]: (no justification provided) (duration: 02m 24s)
  • 11:07 kartik@tin: Started deploy [cxserver/deploy@5101090]: (no justification provided)
  • 10:51 hashar: CI composer based builds are sometimes broken since composer got upgraded to 1.1.0. See https://phabricator.wikimedia.org/T159431
  • 10:23 moritzm: installing bind updates (we're using client-side libs/tools)
  • 10:04 moritzm: installing tiff security updates on trusty hosts (jessie already fixed)
  • 09:55 elukey: increased PHP memory_limit on bohrium for Piwik (T154558)
  • 09:26 moritzm: installing glibc updates from jessie point release
  • 09:24 hashar: Upgrading composer to 1.1.0 on CI instances
  • 09:08 moritzm: installing apache2 security updates on mw1262-mw1265
  • 08:51 jynus: running alter table on db2039 T147747
  • 08:45 jynus: running alter table on db2035 T147747
  • 08:27 marostegui: Start pt-table-checksum on itwiki (s2)  - T154485
  • 07:20 marostegui: Deploy alter table enwiki.revision db2016 (codfw master) - T132416
  • 07:09 marostegui: Resume pt-table-checksum on idwiki (s2) - T154485
  • 03:04 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Mar 2 03:04:16 UTC 2017 (duration 5m 49s)
  • 02:58 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 14m 52s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 09m 30s)
  • 01:52 eileen1: civicrm changed...
  • 00:48 mutante: tin/mira - you will notice in the output of keyholder status you will not see the paths in the "comment" column anymore. this is due to newer versions of openssh-client and caused our problem last time i attempted this. thanks to thcipriani's fix https://gerrit.wikimedia.org/r/#/c/312947/ we don't rely on this anymore and all is good, keyholder stays armed even after re-encrypting the
  • 00:44 mutante: tin - disarm/rearm keyholder after changing passphrases of all deployment keys to new passphrase (T154943)
  • 00:41 mutante: mira - disarm/rearm keyholder after changing passphrases of all other deployment keys (T154943)
  • 00:37 dereckson@tin: Synchronized wmf-config/interwiki.php: Update interwiki map (ref T159103) (duration: 00m 41s)
  • 00:23 mutante: mira - disarming keyholder, changed password of analytics deploy key - rearming to test changes for T154943

2017-03-01

  • 23:28 mutante: contint1002, contint2001: rm /usr/lib/ganglia/python_modules/diskstat.py*; rm /etc/ganglia/conf.d/diskstat.pyconf (re: gerrit 340657)
  • 21:44 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: (no justification provided) (duration: 00m 15s)
  • 21:44 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: (no justification provided)
  • 21:44 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: (no justification provided) (duration: 00m 15s)
  • 21:44 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: (no justification provided)
  • 21:43 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0 (duration: 02m 00s)
  • 21:41 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0
  • 21:41 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0 (duration: 03m 50s)
  • 21:40 demon@tin: Synchronized php-1.29.0-wmf.14/extensions/Echo/includes/model/Event.php: better logging and such (duration: 00m 40s)
  • 21:37 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0
  • 21:37 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0 (duration: 05m 14s)
  • 21:32 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0
  • 21:32 arlolra@tin: Finished deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0 (duration: 07m 39s)
  • 21:28 demon@tin: Synchronized php-1.29.0-wmf.14/extensions/CentralAuth/: Unbreak pending real fix (duration: 00m 49s)
  • 21:24 arlolra@tin: Started deploy [parsoid/deploy@32ca3fb]: Updating parsoid to 9f96b2a0
  • 21:04 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.14
  • 21:03 demon@tin: Synchronized php: Symlink swap (duration: 00m 39s)
  • 20:41 mutante: netmon1001, labsdb1006, labsdb1007, fluorine, helium - same fix as above; they were not covered by salt targeting as they are precise. That is all of them now. ubuntu.wikimedia.org does not appear in sources when checking *
  • 20:35 mutante: [neodymium:~] $ sudo salt --out=txt -b 10 -C 'G@lsb_distrib_codename:trusty' cmd.run "sed -i 's/ubuntu.wikimedia/mirrors.wikimedia/g' /etc/apt/sources.list && apt-get update" (https://phabricator.wikimedia.org/rOPUPe9da17d739233a4db197e947e627cf2a47ce6e6f#2080366)
  • 20:27 mutante: all trusty hosts via salt - fix APT sources list. replace ubuntu.wikimedia (deleted) with mirrors.wikimedia, apt-get update (re: https://phabricator.wikimedia.org/rOPUPe9da17d739233a4db197e947e627cf2a47ce6e6f)
  • 20:02 smalyshev@tin: Finished deploy [wdqs/wdqs@2b8ffef]: Bump memory limit for Java to 16g (duration: 03m 36s)
  • 19:59 smalyshev@tin: Started deploy [wdqs/wdqs@2b8ffef]: Bump memory limit for Java to 16g
  • 19:40 mutante: ocg1001, db1047, californium, db1051, rcs1002, db1041, iridium - fix APT sources list. replace ubuntu.wikimedia (deleted) with mirrors.wikimedia, apt-get update
  • 19:30 mutante: labsdb1001, labtestcontrol2001, labtestvirt2001 - fix APT sources list. replace ubuntu.wikimedia (deleted) with mirrors.wikimedia
  • 19:19 awight: applying civicrm db migration wmf_civicrm:7465
  • 19:18 awight: update civicrm from b3f6eef to 58c8c06
  • 19:07 mutante: terbium - install multiple pending package upgrades
  • 19:04 mutante: terbium - uses ubuntu.wikimedia.org in APT sources but that does not exist anymore. replaced 'ubuntu' with 'mirrors' globally, apt-get update
  • 18:35 thcipriani@tin: Synchronized README: test sync for scap 3.5.3-1 (duration: 00m 46s)
  • 17:54 jynus: autoremoving old kernels on terbium to make room on /boot
  • 17:52 jynus: running alter table on db2044 T147747
  • 15:47 joal@tin: Finished deploy [analytics/refinery@f4a5020]: (no justification provided) (duration: 02m 33s)
  • 15:45 marostegui: Resume pt-table-checksum on idwiki (s2) - T154485
  • 15:45 joal@tin: Started deploy [analytics/refinery@f4a5020]: (no justification provided)
  • 15:44 joal@tin: Finished deploy [analytics/refinery@b4a8fcc]: (no justification provided) (duration: 00m 13s)
  • 15:44 joal@tin: Started deploy [analytics/refinery@b4a8fcc]: (no justification provided)
  • 15:35 jynus: running alter table on db1034 T147747
  • 15:28 gehel: deploying on eqiad completed - T158782
  • 15:26 elukey@tin: Finished deploy [analytics/refinery@b4a8fcc]: (no justification provided) (duration: 02m 15s)
  • 15:23 elukey@tin: Started deploy [analytics/refinery@b4a8fcc]: (no justification provided)
  • 15:18 gehel: testing a few hosts on codfw looks good, deploying on eqiad - T158782
  • 15:10 gehel: mw1209 looks good, deploying on codfw - T158782
  • 15:05 gehel: mwdebug1001 looks good, deploying on mw1209 - T158782
  • 14:54 gehel: starting deployment of mediawiki apache config - T158782
  • 14:31 elukey@tin: Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 13s)
  • 14:30 elukey@tin: Started deploy [analytics/refinery@33db287]: (no justification provided)
  • 14:29 dcausse: EU SWAT Done
  • 14:27 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] cleanup old A/B test (duration: 00m 40s)
  • 14:27 elukey@tin: Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 24s)
  • 14:26 elukey@tin: Started deploy [analytics/refinery@33db287]: (no justification provided)
  • 14:12 dcausse@tin: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] Test disable super_detect_noop script (duration: 00m 47s)
  • 13:16 marostegui: run pt-table-checksum on idwiki - T154485
  • 12:43 moritzm: installing apache2 security updates on mw1261
  • 12:22 godog: upgrade thumbor to 0.1.13 on thumbor100[12]
  • 11:32 jynus: running alter table on db2037 T147747
  • 11:27 moritzm: upgrading nginx on meitnerium/archiva.wikimedia.org to 1.11.4 (using openssl 1.1)
  • 11:02 moritzm: uploaded lz4 0.0~r131 for jessie-wikimedia to apt.wikimedia.org (required by HHVM 3.18)
  • 09:33 jynus: running alter table on db1037 T147747
  • 09:20 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 after maintenance (duration: 00m 41s)
  • 09:14 marostegui: Deploy alter table s3 (all wikis) user_groups table - T155605
  • 08:49 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 after maintenance (duration: 00m 40s)
  • 08:18 moritzm: installing libgd2 security updates on trusty (jessie already fixed)
  • 07:05 marostegui: Deploy alter table enwiki.revision - dbstore2002 - T132416
  • 03:06 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Mar 1 03:06:24 UTC 2017 (duration 5m 46s)
  • 03:00 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.14) (duration: 13m 51s)
  • 02:29 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 08m 03s)
  • 01:00 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Remove DonationInterface loading as gone from master (primarily to unbreak beta) (duration: 00m 42s)
  • 00:59 eileen1: Update CiviCRM from 04b49b0 to b3f6eef
  • 00:59 reedy@tin: Synchronized wmf-config/CommonSettings.php: Remove DonationInterface loading as gone from master (primarily to unbreak beta) (duration: 00m 40s)
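
The 19:04 to 20:41 entries above replace the deleted ubuntu.wikimedia host with mirrors.wikimedia in APT sources. A minimal sketch of that substitution follows, run against a temporary file rather than /etc/apt/sources.list. The function name fix_apt_sources and the sample deb line are illustrative assumptions, not taken from the log, and the dot in the pattern is escaped here although the logged sed expression leaves it unescaped.

```shell
#!/bin/sh
# Hypothetical helper (not from the log): rewrite one sources.list-style file,
# replacing ubuntu.wikimedia with mirrors.wikimedia.
fix_apt_sources() {
    # Write to a sibling file and move it into place, avoiding sed -i,
    # whose syntax differs between GNU and BSD sed.
    sed 's/ubuntu\.wikimedia/mirrors.wikimedia/g' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demonstrate on a temporary file containing an assumed sample line.
tmp=$(mktemp)
printf 'deb http://ubuntu.wikimedia.org/ubuntu trusty main\n' > "$tmp"
fix_apt_sources "$tmp"
result=$(cat "$tmp")
rm -f "$tmp"
echo "$result"
```

Trying the substitution on a copy first makes it easy to confirm that only the host name changes before running it fleet-wide.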

2017-02-28

  • 22:29 mutante: (T157675) - delete salt keys - [neodymium:~] $ for mcnode in $(seq 2001 2016); do sudo salt-key -d mc${mcnode}.codfw.wmnet; done
  • 22:26 mutante: (T157675) - revoke puppet certs, deactivate nodes, rm from icinga. [puppetmaster1001:~] $ for mcnode in $(seq 2001 2016); do puppet node clean mc${mcnode}.codfw.wmnet && puppet node deactivate mc${mcnode}.codfw.wmnet ; done
  • 21:58 awight: update payments from 2a0c3b2 to 66d8125
  • 21:51 eileen1: update CiviCRM from a2875c5 to 04b49b0
  • 21:44 urandom: Updating RESTBase mobileapps tables (all remaining) to use time-windowed compaction
  • 21:40 maxsem@tin: Finished deploy [kartotherian/deploy@81db48c]: Second attempt at 81db48c (duration: 06m 39s)
  • 21:34 maxsem@tin: Started deploy [kartotherian/deploy@81db48c]: Second attempt at 81db48c
  • 21:23 MaxSem: Completely disabled kartotherian on maps-test2004, it just logs errors
  • 21:05 _joe_: manually installing nodejs on wasat T156922
  • 20:50 maxsem@tin: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/340357/2 (duration: 00m 40s)
  • 20:33 urandom: Updating RESTBase mobileapps tables (phase0) to use time-windowed compaction
  • 20:30 demon@tin: Synchronized wmf-config/wikitech.php: no moar forms on wikitech (duration: 00m 39s)
  • 20:03 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.14
  • 19:34 demon@tin: Synchronized php-1.29.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.geoFeatures.js: Roan made me do it (duration: 00m 39s)
  • 19:26 demon@tin: Finished scap: testwiki to wmf.14 + l10n bootstrap (duration: 55m 14s)
  • 19:04 urandom: Updating RESTBase mobileapps tables (wikimedia) to use time-windowed compaction
  • 18:31 demon@tin: Started scap: testwiki to wmf.14 + l10n bootstrap
  • 17:43 urandom: Updating RESTBase mobileapps tables (wikipedia) to use time-windowed compaction
  • 17:11 elukey: Analytics Hadoop cluster upgraded to CDH 5.10
  • 17:09 jynus: disabling replication lag alerts on db1026 (depooled)
  • 17:05 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1001.eqiad.wmnet
  • 17:04 gehel: restarting blazegraph on wdqs1001 - T159245
  • 15:47 jynus: running alter table on db1056 T147747
  • 15:30 gehel: depooling wdqs1001 due to instability
  • 15:29 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1001.eqiad.wmnet
  • 15:23 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 for maintenance (duration: 00m 40s)
  • 14:51 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1003.eqiad.wmnet
  • 14:46 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1053 after maintenance (duration: 00m 39s)
  • 14:35 elukey: start the Analytics Hadoop cluster upgrade (https://etherpad.wikimedia.org/p/analytics-cdh5.10)
  • 14:32 marostegui: run pt-table-checksum on eowiki (s2) - T154485
  • 14:08 phuedx@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Make Page Previews use RESTBase on Beta Cluster (duration: 00m 42s)
  • 14:02 reedy@tin: Synchronized php-1.29.0-wmf.13/extensions/Dashiki/extension.json: Register JsonConfigModels (duration: 00m 42s)
  • 13:57 jynus: running alter table on db1036 T147747
  • 13:23 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 after maintenance (duration: 00m 39s)
  • 13:03 Reedy: ran namespaceDupes on meta to fix some Config pages
  • 12:48 _joe_: flushed memcached in codfw, restarting hhvm on appserver to flush APC in order to test warmup script
  • 11:47 gehel: restarting wdqs-blazegraph on wdqs1003
  • 11:40 gehel: depooling wdqs1003 for investigation (high 5xx rate)
  • 11:40 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 11:22 jynus: running alter table on db1053 T147747
  • 11:18 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1053 for maintenance (duration: 00m 40s)
  • 10:56 elukey: restart zookeeper on conf1002
  • 10:53 marostegui: run pt-table-checksum on enwiktionary (s2) - T154485
  • 10:35 elukey: restart zookeeper on conf1003
  • 10:28 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 for maintenance (duration: 00m 39s)
  • 10:23 marostegui: run pt-table-checksum on enwikiquote (s2) - T154485
  • 10:09 marostegui: Deploy alter table s2 on all wikis for table user_groups - T155605
  • 10:00 elukey: restart zookeeper on conf1001
  • 09:47 jynus: running alter table on db1055 T147747
  • 09:43 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 for maintenance (duration: 00m 40s)
  • 09:38 marostegui: Deploy alter table s7 on all wikis for table user_groups - T155605
  • 09:06 jynus: running alter table on db2042 T147747
  • 09:03 marostegui: Deploy alter table s1 (enwiki).user_groups - T155605
  • 08:59 marostegui: run pt-table-checksum on cswiki (s2) - T154485
  • 08:43 moritzm: installing python-crypto security updates
  • 08:38 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 after maintenance (duration: 00m 40s)
  • 08:37 hashar: nodepool deleted alien instances 541585 541586 and 541587
  • 08:35 marostegui: Deploy alter table s6 (frwiki,jawiki,ruwiki).user_groups - T155605
  • 08:24 marostegui: run pt-table-checksum on bgwiktionary (s2) - T154485
  • 08:22 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 after maintenance (duration: 00m 41s)
  • 08:18 marostegui: Deploy alter table s5 wikidatawiki.user_groups - T155605
  • 08:15 marostegui: Deploy alter table s5 dewiki.user_groups - T155605
  • 07:41 marostegui: Deploy alter table s4.user_groups - T155605
  • 07:12 marostegui: run pt-table-checksum on bgwiki (s2) - T154485
  • 07:00 marostegui: Deploy alter table enwiki.revision db2034 - T132416
  • 02:35 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Feb 28 02:35:56 UTC 2017 (duration 5m 20s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 11m 40s)
  • 02:18 mutante: rsyncing prometheus metrics data from bast3001 to bast3002 (T156506)
  • 01:42 mutante: mw1198 - restart hhvm
  • 01:01 demon@tin: Synchronized scap/plugins/clean.py: No-op, more cleanups for clean.py (duration: 00m 42s)
  • 00:33 ebernhardson: restart elasticsearch on relforge1002, putting too much load on the machine got it stuck in a GC spiral with 1minute+ collections
  • 00:29 ebernhardson: restart elasticsearch on relforge1001, putting too much load on the machine got it stuck in a GC spiral with 1minute+ collections
  • 00:15 demon@tin: Synchronized php-1.29.0-wmf.13/extensions/MobileFrontend/resources/skins.minerva.base.styles/ui.less: Fix the incorrect magnify glass icon position in lang search (duration: 00m 39s)
  • 00:13 demon@tin: Synchronized php-1.29.0-wmf.13/extensions/Nuke/Nuke_body.php: Move back to old caller names (duration: 00m 43s)
  • 00:09 demon@tin: Synchronized wmf-config/CommonSettings.php: Enable editmyoptions right for all users on loginwiki (duration: 00m 41s)
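
The 22:26 and 22:29 entries above loop over mc2001 to mc2016 to clean up puppet and salt state. A dry-run sketch of the same loop follows, printing only the host names so the target list can be checked before any destructive command runs. The function name mc_hosts is an illustrative assumption; the seq range and the codfw.wmnet suffix are copied from the logged commands.

```shell
#!/bin/sh
# Dry run: list the hosts the logged loops iterate over (mc2001..mc2016)
# without calling salt-key or puppet.
mc_hosts() {
    for mcnode in $(seq 2001 2016); do
        echo "mc${mcnode}.codfw.wmnet"
    done
}

hosts=$(mc_hosts)
echo "$hosts"
```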

2017-02-27

  • 23:50 demon@tin: Finished scap: Enabling Dashiki on meta (duration: 20m 46s)
  • 23:29 demon@tin: Started scap: Enabling Dashiki on meta
  • 23:17 demon@tin: Synchronized scap/plugins/clean.py: no-op (duration: 00m 48s)
  • 22:37 otto@tin: Finished deploy [eventstreams/deploy@76c763e]: Deploying swagger-ui /?doc endpoint (duration: 01m 45s)
  • 22:36 otto@tin: Started deploy [eventstreams/deploy@76c763e]: Deploying swagger-ui /?doc endpoint
  • 22:34 otto@tin: Finished deploy [eventstreams/deploy@76c763e]: Deploying /?doc swagger-ui endpoint only to scb2001 (duration: 00m 17s)
  • 22:34 otto@tin: Started deploy [eventstreams/deploy@76c763e]: Deploying /?doc swagger-ui endpoint only to scb2001
  • 22:10 otto@tin: Finished deploy [eventstreams/deploy@2f73b52]: Deploying /?doc swagger-ui endpoint only to scb2001 (duration: 00m 18s)
  • 22:10 otto@tin: Started deploy [eventstreams/deploy@2f73b52]: Deploying /?doc swagger-ui endpoint only to scb2001
  • 21:42 bsitzmann@tin: Finished deploy [mobileapps/deploy@872a615]: Update mobileapps to c924126 (duration: 03m 14s)
  • 21:39 bsitzmann@tin: Started deploy [mobileapps/deploy@872a615]: Update mobileapps to c924126
  • 21:16 mutante: ganglia - switching esams aggregator to bast3002 - expect short gaps in esams graphs
  • 20:51 robh: disabled puppet on einsteinium for icinga update of config
  • 18:26 gehel: restarting wdqs-updater on all wdqs servers
  • 18:25 gehel@tin: Finished deploy [wdqs/wdqs@daca9b3]: (no justification provided) (duration: 01m 39s)
  • 18:24 gehel: redeploying wdqs (previous deploy was not latest version)
  • 18:24 gehel@tin: Started deploy [wdqs/wdqs@daca9b3]: (no justification provided)
  • 18:18 awight: update civicrm from 20660c4 to a2875c5
  • 18:14 gehel: restarting wdqs-updater on all wdqs servers
  • 18:14 gehel@tin: Finished deploy [wdqs/wdqs@62354ed]: (no justification provided) (duration: 00m 52s)
  • 18:13 gehel@tin: Started deploy [wdqs/wdqs@62354ed]: (no justification provided)
  • 18:12 gehel@tin: Finished deploy [wdqs/wdqs@62354ed]: log (duration: 00m 12s)
  • 18:12 ema: temporarily bumping timeout_idle to 120s on cache_misc T154558
  • 18:12 gehel@tin: Started deploy [wdqs/wdqs@62354ed]: log
  • 18:04 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1001.eqiad.wmnet
  • 16:08 jynus: starting schema change on db1051 T147747
  • 16:01 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 for maintenance (duration: 00m 40s)
  • 15:55 jynus: starting schema change on db2038 T147747
  • 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad
  • 14:42 Dereckson: Fix namespace dupes pages on ext.wikipedia (T158914)
  • 14:30 hashar: European SWAT done. Pushed https://gerrit.wikimedia.org/r/#/c/339446/ and https://gerrit.wikimedia.org/r/#/c/339348/
  • 14:29 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: New namespace aliases for itwikiversity - T158775 (duration: 00m 43s)
  • 14:13 moritzm: installed apache2 security updates on mwdebug*
  • 14:10 aude@tin: Synchronized wmf-config/Wikibase-production.php: Disable geo-shape datatype on wikidata for now (duration: 00m 41s)
  • 13:58 marostegui: Manually deploy gtid_domain_id on s2 - T149418
  • 13:06 elukey: restart zookeeper on conf2003
  • 12:39 elukey: restart zookeeper on conf2002
  • 12:14 _joe_: reissuing the certificate for etcd.codfw.wmnet due to a previous error
  • 12:00 elukey: rebooting mw2092 due to puppet errors for mw-cgroup - T151427
  • 11:58 volans: re-enabled icinga-wm
  • 11:37 ema: cp1052 repooled T148891
  • 11:19 elukey: zookeeper status report - new changes rolled out to druid nodes and conf2001 - conf1* and conf200[23] still pending, waiting for more metrics before proceeding
  • 11:09 volans: temporarily stopped ircecho (icinga-wm)
  • 11:04 ema: rebooting cp1052 into kernel 4.4.2-3+wmf8 T148891
  • 10:49 moritzm: uploaded apache2 2.4.10-10+deb8u8+wmf1 to apt.wikimedia.org (rebase of local patches on top of latest DSA)
  • 10:34 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: wme: Set ReadingDepth sampling rate to 0.1% - T155639 (duration: 00m 40s)
  • 10:31 elukey: limiting the Zookeeper Maximum heap size to 1G (https://gerrit.wikimedia.org/r/#/c/337797/) - setting applied gradually to Zookeeper on Druid and Conf* hosts
  • 10:11 _joe_: upgrading conftool to 0.4.0 across the cluster T149617
  • 10:03 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1045 for maintenance (duration: 00m 43s)
  • 09:19 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 after maintenance with full weight (duration: 00m 39s)
  • 08:42 _joe_: upload conftool 0.4.0 to trusty-wikimedia
  • 08:42 _joe_: promote conftool 0.4.0 to jessie-wikimedia main
  • 07:59 marostegui: Run pt-table-checksum on s2 (nlwiki) on revision table - T154485
  • 07:29 marostegui: Deploy alter table enwiki.revision - db2034 - T132416
  • 07:21 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2034 - T132416 (duration: 00m 40s)
  • 07:15 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2070 - T132416 (duration: 00m 40s)
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Feb 27 02:25:10 UTC 2017 (duration 5m 21s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 07m 23s)

2017-02-26

  • 17:10 Reedy: ran namespaceDupes for extwiki
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Feb 26 02:25:07 UTC 2017 (duration 5m 21s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 07m 26s)

2017-02-25

  • 20:06 elukey: depooled cp2017 (via local sudo -i depool command) since the host froze (it got back after a powercycle)
  • 19:54 elukey: powercycled cp2017, mgmt console stuck
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Feb 25 02:25:10 UTC 2017 (duration 5m 21s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 07m 20s)
  • 01:43 mutante: bast3002 - sign puppet cert, initial run with basic "bastion" role, to replace broken bast3001, but WIP, ganglia/prometheus roles not moved yet (T156506)

2017-02-24

  • 22:46 Krinkle: (terbium) sql --write mediawikiwiki 'DELETE FROM module_deps' (in batches of 500; 42292 rows affected) - per T158105.
  • 22:28 smalyshev@tin: Finished deploy [wdqs/wdqs@62354ed]: Deploy new updater on 1001 for timeout increase (duration: 00m 16s)
  • 22:27 smalyshev@tin: Started deploy [wdqs/wdqs@62354ed]: Deploy new updater on 1001 for timeout increase
  • 22:23 smalyshev@tin: Finished deploy [wdqs/wdqs@62354ed]: Deploy new updater on 2001 for testing (duration: 00m 26s)
  • 22:23 smalyshev@tin: Started deploy [wdqs/wdqs@62354ed]: Deploy new updater on 2001 for testing
  • 20:50 ebernhardson: restart elasticsearch on logstash1002
  • 20:05 demon@tin: Synchronized wmf-config/wikitech.php: (no justification provided) (duration: 00m 48s)
  • 19:30 Pchelolo: restarting RESTBase on xenon.eqiad.wmnet in staging
  • 17:15 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 after maintenance with low load (duration: 00m 40s)
  • 16:55 volans: manually cleaning ferm leftovers on dbproxy1011 - T158798
  • 15:35 ema: temporarily bumping timeout_idle to 60s on cache_misc T154558
  • 14:27 volans: re-started and re-armed keyholder after upgrade on: mira.codfw.wmnet,neodymium.eqiad.wmnet,sarin.codfw.wmnet,tin.eqiad.wmnet T158660 T158659
  • 10:41 ema: cache_misc: upgrading to varnish 4.1.5
  • 10:30 moritzm: installing imagemagick regression update for security update on trusty (the Debian update seems unaffected)
  • 10:23 moritzm: installing spice updates on trusty
  • 09:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T154485 (duration: 00m 40s)
  • 09:39 elukey: stop Redis and Memcached on mc2001->mc2016 as extra precautionary step before decom - T157675
  • 08:44 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1001.eqiad.wmnet
  • 08:16 volans: temporarily disabled puppet on neodymium and sarin to deploy Gerrit 339183 - T158753
  • 07:32 marostegui: Deploy alter table enwiki.revision on db2070 - T132416
  • 07:26 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2069 and depool db2070 - T132416 (duration: 00m 45s)
  • 02:32 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Feb 24 02:32:21 UTC 2017 (duration 5m 22s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 07m 02s)
  • 00:26 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Store goodfaith scores in the ORES tables T137966 (duration: 00m 40s)
  • 00:17 mobrovac: restbase deploying b477ab46

2017-02-23

  • 21:11 dereckson@tin: Finished scap: Full scap to deploy new l10n keys on wikitech (gerrit:339456), take two (duration: 22m 55s)
  • 20:49 dereckson@tin: Started scap: Full scap to deploy new l10n keys on wikitech (gerrit:339456), take two
  • 20:48 dereckson@tin: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap"; owner is "dereckson"; reason is "Full scap to deploy new l10n keys on wikitech (gerrit:339456)" (duration: 00m 00s)
  • 20:46 dereckson@tin: Started scap: Full scap to deploy new l10n keys on wikitech (gerrit:339456)
  • 20:04 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.13
  • 19:47 dereckson@tin: Synchronized php-1.29.0-wmf.13/extensions/WikimediaMessages/extension.json: Create user group messages for wikitech.wikimedia.org (T158417) (duration: 00m 39s)
  • 19:45 dereckson@tin: Synchronized php-1.29.0-wmf.13/extensions/WikimediaMessages/i18n/wikitech/: (no justification provided) (duration: 00m 43s)
  • 18:29 chasemp: labnodepool1001:~# service nodepool restart
  • 17:40 gehel: removing old prod indices from relforge1002 - T156150
  • 17:37 gehel: removing old prod indices from relforge1002 (jawikiprod_content, enprodwiki_content, ruwikiprod_content) - T156150
  • 16:33 paravoid: cleaning up openstack packages from einsteinium & tegmen
  • 16:19 gehel: starting upgrade relforge cluster to elasticsearch 5.2.1 - expect significant downtime - T156150
  • 16:19 gehel: unban relforge1001 - T156150
  • 15:45 gehel: banning relforge1001 from cluster to prepare for ES5 upgrade - T156150
  • 15:18 godog: roll-restart pybal in codfw to pick up swift https service
  • 15:08 marostegui: Power off dbstore1001 to change its disks and reimage - T153768
  • 14:42 addshore: addshore@tin scap clean 1.29.0-wmf.6 && scap clean 1.29.0-wmf.7 (to remove warning on scap pull on mwdebug1002, T157030)
  • 14:39 addshore: EU SWAT done
  • 14:39 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT T158832 Enable TwoColConflict on hewiki (duration: 00m 40s)
  • 14:29 addshore@tin: Synchronized php-1.29.0-wmf.13/extensions/ContentTranslation/ContentTranslation.hooks.php: SWAT T158297 Really disable europeana2802016 campaign (duration: 00m 39s)
  • 14:26 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT T156794 Enable v2 of Minerva's header on cawiki and itwiki (duration: 00m 42s)
  • 14:18 paravoid: upgrading grafana to 4.1 on krypton
  • 13:52 gehel: restart logstash on relforge1001 to test logging configuration - T158664
  • 13:03 ema: cache_maps: upgrading to varnish 4.1.5
  • 12:40 moritzm: installing libssh security updates on trusty (jessie already fixed)
  • 12:40 moritzm: installing libssh security updates (jessie already fixed)
  • 12:35 moritzm: installing tomcat updates
  • 09:39 elukey: increase cassandra system_auth replication from 6 to 12 on AQS
  • 09:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T154485 (duration: 00m 40s)
  • 09:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T154485 (duration: 00m 40s)
  • 09:06 _joe_: uploaded conftool 0.4.0 to jessie-wikimedia experimental
  • 08:54 marostegui: Stop pt-table-checksum on nlwiki.revision - T154485
  • 08:51 marostegui: Run pt-table-checksum on s2 (nlwiki) on revision table - T154485
  • 07:59 marostegui: Run pt-table-checksum on s2 (nlwiki) on logging table - T154485
  • 07:16 marostegui: Deploy alter table enwiki.revision db2069 - T132416
  • 07:14 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2062 and depool db2069 - T132416 (duration: 00m 42s)
  • 07:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1060 original load - T158194 (duration: 00m 40s)
  • 03:02 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Feb 23 03:02:10 UTC 2017 (duration 5m 47s)
  • 02:56 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 14m 38s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.12) (duration: 08m 40s)
  • 00:19 dereckson@tin: Synchronized wmf-config/throttle.php: Throttle rule for it.wikiversity (T158767) (duration: 00m 40s)
  • 00:18 Krinkle: mwscript deleteEqualMessages.php --wiki simplewikibooks (T45917)

2017-02-22

  • 23:46 ejegg: turned off 3DS requirement for Denmark on payments-wiki
  • 23:17 matt_flaschen: Exported https://meta.wikimedia.org/wiki/Talk:Flow/Developer_test_page to https://meta.wikimedia.org/wiki/Talk:Flow/Developer_test_page/Wikitext using extensions/Flow/maintenance/convertToText.php
  • 23:17 matt_flaschen: Migrated https://meta.wikimedia.org/wiki/Research_talk:ORES_paper to https://www.mediawiki.org/wiki/Talk:ORES/Paper using extensions/Flow/maintenance/dumpBackup.php and importDump.php
  • 22:53 Pchelolo: update RESTBase to 3340714f0
  • 22:52 jynus: stopping dbstore1001 mariadb in preparation for tomorrow's reimage T153768
  • 22:50 Pchelolo: update RESTBase to 3340714f0: canary on restbase1007
  • 22:46 Pchelolo: update RESTBase to 3340714f0: staging
  • 21:57 maxsem@tin: Finished deploy [kartotherian/deploy@81db48c]: Deploying https://gerrit.wikimedia.org/r/#/c/339093/ (duration: 15m 05s)
  • 21:42 maxsem@tin: Started deploy [kartotherian/deploy@81db48c]: Deploying https://gerrit.wikimedia.org/r/#/c/339093/
  • 20:30 demon@tin: Finished scap: group1 to wmf.13 (duration: 25m 39s)
  • 20:04 demon@tin: Started scap: group1 to wmf.13
  • 20:02 gehel@tin: Finished deploy [wdqs/wdqs@7768422]: (no justification provided) (duration: 02m 04s)
  • 19:59 gehel@tin: Started deploy [wdqs/wdqs@7768422]: (no justification provided)
  • 19:56 gehel: deploying latest wdqs version
  • 19:46 godog: roll-HUP rsyslog on mw1* to pick up DNS udplog change - T123728
  • 19:45 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Finish removing "shellmanagers" on Wikitech T158482 (duration: 00m 40s)
  • 19:37 thcipriani@tin: Synchronized php-1.29.0-wmf.13/extensions/Flow: SWAT: Import dump: support importing a board that exist in the farm T154830 (duration: 00m 56s)
  • 19:34 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Removing the "shellmanagers" group from Wikitech T158482 (duration: 00m 49s)
  • 19:14 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configuration changes for wikitech.wikimedia.org T158516 T158554 T158482 (duration: 00m 40s)
  • 18:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 with less weight - T158194 (duration: 00m 39s)
  • 18:17 Dereckson: Last two deployment entries were to rollback portals/ to last known state (T158782)
  • 18:17 dereckson@tin: Synchronized portals: (no justification provided) (duration: 00m 39s)
  • 18:17 dereckson@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 40s)
  • 16:29 gehel: reimage of relforge1001 starting
  • 16:21 marostegui: Shutdown db1060 for BBU replacement - T158194
  • 16:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 - T158194 (duration: 00m 40s)
  • 16:19 ema: cp3006 upgraded to varnish 4.1.5
  • 16:15 ema: cp4019 upgraded to varnish 4.1.5
  • 15:48 moritzm: installing tcpdump security updates on ubuntu systems (jessie already fixed for a while)
  • 15:43 jynus: stopping mariadb replication on db1026 for maintenance T147747
  • 15:21 marostegui: Restart MySQL on db1095 to apply new replication filters - https://phabricator.wikimedia.org/T154485
  • 15:16 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 for maintenance (duration: 00m 41s)
  • 15:11 marostegui: Restart MySQL on db1069 to apply new replication filters - https://phabricator.wikimedia.org/T154485
  • 14:50 zeljkof: finished EU SWAT
  • 14:49 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: New throttle rule (T158762) (duration: 00m 41s)
  • 14:30 gehel: resetting to usual values for low/high watermark on elasticsearch eqiad (75% / 80%)
  • 14:17 hashar: Nuked Jenkins workspaces for the job operations-puppet-typos
  • 14:17 zfilipin@tin: Synchronized dblists/compact-language-links.dblist: SWAT: Deploy Compact Language Links in Swedish Wikipedia (T157114) (duration: 00m 50s)
  • 14:17 gehel: temporarily raising high/low watermarks on elasticsearch eqiad to allow allocation of all shards
  • 14:04 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic1047.eqiad.wmnet
  • 12:38 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 with low load (3rd time a charm) (duration: 00m 39s)
  • 12:22 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 with low load (again) (duration: 02m 47s)
  • 12:18 dcausse: rebuild of translation memories index is done
  • 12:05 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 with low load (duration: 02m 49s)
  • 12:03 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(45|46).eqiad.wmnet
  • 11:48 paravoid: upgrading labmon1001 to grafana 4.1
  • 10:55 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(45|46|47).eqiad.wmnet
  • 10:54 moritzm: upgrading remaining mediawiki servers to HHVM 3.12.14
  • 10:54 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(35|37|39|43|44).eqiad.wmnet
  • 10:42 elukey: reinstall mw211[89] as MW videoscalers (trusty) and mw2243 as MW jobrunner
  • 10:05 filippo@tin: Synchronized wmf-config/ProductionServices.php: Move udp2log from fluorine to mwlog1001 - T123728 (duration: 00m 41s)
  • 10:01 hashar: enabling puppet on contint1001 and running it
  • 09:56 volans: restarting salt-master on neodymium after openssl upgrade
  • 09:37 ema: cache_text, cache_upload: libssl1.1 upgraded to 1.1.0e-1+wmf1, libevent-2.0-5 upgraded to 2.0.21-stable-2+deb8u1
  • 09:28 hashar: disable puppet on contint1001. Will use contint2001 as a canary
  • 09:14 marostegui: Run pt-table-checksum on s2.nlwiki over some tables - T154485
  • 09:04 dcausse: rebuilding translation memories index - ETA ~4hours (from terbium, logs in ~dcausse/ttm-refresh)
  • 09:02 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(35|39|43|44).eqiad.wmnet
  • 08:07 moritzm: upgrading openssl on redis clusters / various base service restarts
  • 07:44 gehel: restart elasticsearch on elastic1035
  • 07:43 gehel: truncating logs on elastic10(35|39|44)
  • 07:23 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2062 - T132416 (duration: 00m 40s)
  • 07:23 marostegui: Deploy alter table enwiki.revision db2062 - T132416
  • 07:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2055 - T132416 (duration: 00m 40s)
  • 03:10 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Feb 22 03:10:13 UTC 2017 (duration 5m 46s)
  • 03:04 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.13) (duration: 13m 54s)
  • 02:32 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.12) (duration: 11m 48s)
  • 01:03 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ORES review tool in cswiki T151611 (duration: 00m 39s)
  • 00:48 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Send "exception" channel to logstash Do not send "exception-json" channel to logstash T136849 (duration: 00m 40s)
  • 00:34 thcipriani@tin: Synchronized wmf-config: SWAT: Set $wgSoftBlockRanges T154698 PART II (duration: 00m 42s)
  • 00:33 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgSoftBlockRanges T154698 PART I (duration: 00m 40s)
  • 00:20 thcipriani@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Fix SiteConfiguration array merge syntax T157656 Fix Sentry URL scheme on beta Fix PageViewInfo config T158698 (beta-only changes) (duration: 00m 39s)
  • 00:17 thcipriani@tin: Synchronized php-1.29.0-wmf.12/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Turn off sister search AB test. T157942 (duration: 00m 39s)
  • 00:16 thcipriani@tin: Synchronized php-1.29.0-wmf.13/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Turn off sister search AB test. T157942 (duration: 00m 43s)
  • 00:13 smalyshev@tin: Finished deploy [wdqs/wdqs@7768422]: Deploy 2.1.5RC WAR on 2001 for testing (duration: 00m 25s)
  • 00:13 smalyshev@tin: Started deploy [wdqs/wdqs@7768422]: Deploy 2.1.5RC WAR on 2001 for testing
  • 00:05 demon@tin: Synchronized scap/plugins/clean.py: More code cleanup (duration: 00m 40s)

2017-02-21

  • 23:30 MaxSem: Kartotherian deploy did not happen
  • 23:22 demon@tin: Synchronized scap/plugins/clean.py: Code cleanup (duration: 00m 46s)
  • 23:21 demon@tin: scap aborted: scap/plugins/clean.py Code cleanup (duration: 00m 10s)
  • 23:21 demon@tin: Started scap: scap/plugins/clean.py Code cleanup
  • 22:01 mutante: carbon - removed from icinga, shutdown -h now (T158020)
  • 21:31 mutante: carbon - puppet node clean, node deactivate (T158020)
  • 21:10 demon@tin: Synchronized scap/plugins/prep.py: Completeness (duration: 00m 42s)
  • 20:48 Krinkle: (terbium) sql --write test2wiki 'DELETE FROM module_deps' (3687 rows affected, 0.01 sec) - per T158105.
  • 20:47 Krinkle: (terbium) sql --write testwiki 'DELETE FROM module_deps' (per T158105)
  • 20:44 mutante: carbon - backup /root data to install1002:/root/root-carbon/ before shutdown (T158020)
  • 20:36 mutante: rsyncing /home/ dirs excl. dot files, from carbon to install1002 (T158020)
  • 20:15 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(35|39|43|44).eqiad.wmnet
  • 20:08 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.13
  • 19:50 demon@tin: Finished scap: prime wmf.13 - testwiki plus l10n build (pt 3 because ugh) (duration: 17m 17s)
  • 19:32 demon@tin: Started scap: prime wmf.13 - testwiki plus l10n build (pt 3 because ugh)
  • 19:32 demon@tin: scap failed: RuntimeError 2 test canaries had check failures (rerun with --force to override this check) (duration: 15m 00s)
  • 19:17 demon@tin: Started scap: prime wmf.13 - testwiki plus l10n build (pt 2 because T156851)
  • 19:16 demon@tin: Finished scap: prime wmf.13 - testwiki plus l10n build (duration: 26m 15s)
  • 18:49 demon@tin: Started scap: prime wmf.13 - testwiki plus l10n build
  • 18:45 moritzm: installing PHP security updates on iridium (phabricator.wikimedia.org)
  • 18:36 ppchelko@tin: Finished deploy [changeprop/deploy@4706f9d]: Change-Prop: Make ORES return minified responses T157693 (duration: 00m 55s)
  • 18:35 ppchelko@tin: Started deploy [changeprop/deploy@4706f9d]: Change-Prop: Make ORES return minified responses T157693
  • 18:34 Pchelolo: changeprop deploy 4706f9da
  • 18:14 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(27|32|34|38|41).eqiad.wmnet
  • 18:12 godog: roll-restart nodepool on labnodepool1001 to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 18:12 godog: roll-restart zuul on cont1001 to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 18:04 godog: roll-restart eventstreams in codfw/eqiad to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 18:03 godog: roll-restart trendingedits in codfw/eqiad to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 17:58 demon@tin: Synchronized tests/multiversion/MWMultiVersionTest.php: No op in prod, completeness, etc (duration: 00m 40s)
  • 17:57 demon@tin: Synchronized multiversion/MWMultiVersion.php: Shut up dumb invalid hostname errors (duration: 00m 52s)
  • 17:50 godog: roll-restart ocg in codfw/eqiad to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 17:47 godog: roll-restart jmxtrans in codfw/eqiad on conf* to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 17:35 godog: roll-restart parsoid in codfw/eqiad to pick up statsd.eqiad.wmnet DNS changes - T157022
  • 17:35 Amir1: done restarting ores services
  • 17:20 Amir1: restarting ores uwsgi and celery services in scb nodes
  • 16:59 ema: cache_misc, cache_maps: libssl1.1 upgraded to 1.1.0e-1+wmf1, libevent-2.0-5 upgraded to 2.0.21-stable-2+deb8u1
  • 16:37 gehel: restarting elasticsearch on elastic1030
  • 16:34 gehel: truncating elasticsearch logs on elastic1023
  • 16:31 gehel: truncating elasticsearch logs on elastic1030
  • 16:18 dcausse: truncated main elastic log, daemon.log and syslog on elastic1023
  • 16:08 moritzm: restarting apache on uranium for openssl update
  • 16:06 dcausse: truncated main log file on elastic1030
  • 15:50 gehel: restarting wdqs-updater on wdqs1002
  • 15:40 elukey: restart eventlogging on kafka200[123] for openssl upgrades
  • 15:40 godog: restart navtiming ve asset-check statsd-mw-js-deprecate on hafnium to pick up statsd.eqiad.wmnet change - T157022
  • 15:39 elukey: restart jmxtrans on kafka[12]00[123] for T157022
  • 15:34 mobrovac@tin: Started restart [mobileapps/deploy@cd3b897]: Restarting for Graphite DNS switch T157022
  • 15:32 elukey: correction on my previous entry: restart eventlogging on kafka100[123] for openssl upgrades
  • 15:30 mobrovac@tin: Started restart [graphoid/deploy@da37386]: Restarting for Graphite DNS switch T157022
  • 15:22 elukey: restart eventlogging on kafka200[123] for openssl upgrades
  • 15:21 mobrovac@tin: Started restart [cxserver/deploy@0e4ae4f]: Restarting for Graphite DNS switch T157022
  • 15:20 moritzm: rolling restart of swift frontend servers to pick up openssl update
  • 15:19 mobrovac@tin: Started restart [citoid/deploy@95df861]: Restarting for Graphite DNS switch T157022
  • 15:18 mobrovac@tin: Started restart [mathoid/deploy@ba3217e]: Restarting for Graphite DNS switch T157022
  • 15:17 hashar: European SWAT complete
  • 15:17 hashar@tin: Synchronized php-1.29.0-wmf.12/extensions/UniversalLanguageSelector/UniversalLanguageSelector.hooks.php: Fix si