Server admin log/Archive 14

December 15

  • 01:09 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'adding CN banner Jimmy1'
  • 01:04 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php
  • 23:29 logmsgbot: tfinc synchronized php-1.5/extensions/GeoLite/GeoLite_body.php
  • 23:29 logmsgbot: tfinc synchronized php-1.5/extensions/GeoLite/GeoLite.php
  • 21:30 Mark: Stopped backend squid on sq20; broken disk drive /dev/sda
  • 20:55 Roan: Running namespaceDupes on kowiki for bug 20863
  • 18:44 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Add DPL to strategywiki'
  • 17:35 Mark: Shutdown will for decommissioning
  • 14:23 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21439: Add autopatrolled group on simplewiktionary'
  • 14:18 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21571: Allow sysops to add/remove flood flag for all users on simplewiki'
  • 14:13 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21560: Add rollbacker group on cawiki'
  • 11:47 RoanKattouw: Fixed three more broken renames on arwiki about 15 mins ago, but morebots doesn't like Arabic usernames :(
  • 11:36 RoanKattouw: Fixing broken rename from Mohammed Khalil to MK on arwiki
  • 10:23 tomaszf: outage window closed early for payments server. cc pipeline back up.
  • 09:00 tomasz: taking outage on cc payments server.
  • 03:22 tomaszf: bouncing mobile1 through cluster stop/start script
  • 00:16 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding Craigs banner and landing page to tracking.'

December 14

  • 22:47 mark: Migrated smokeping from will to streber
  • 22:29 mark: Migrated torrus data from will to streber and moved the torrus service IP
  • 20:08 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21601: space -> _ in ckbwiki namespace alias'
  • 19:27 mark: Moved torrus from will to streber, but pending RRD migration
  • 17:04 mark: Cleaned up configuration of br1-knams
  • 16:54 mark: Moved Rancid from will to streber
  • 16:15 mark: OS-installed Karmic on streber
  • 14:33 RoanKattouw: Running namespaceDupes on enwikinews for bug 21428
  • 14:33 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21428: Add WN: alias on enwikinews'
  • 14:29 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21516: Enable subpages in Template namespace on eswikibooks'
  • 14:24 RoanKattouw: Running namespaceDupes on zhwiki for bug 20641
  • 14:24 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20641: namespace aliases for zhwiki'
  • 14:18 RoanKattouw: Running namespaceDupes on kowiki for bug 20863
  • 14:16 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20863: Add Portal namespace on kowiki'
  • 11:47 logmsgbot: midom synchronized php-1.5/languages/classes/LanguageEt.php

December 13

  • 20:42 RoanKattouw_away: Corrected enwiki arbcom end time from 00:00 UTC on 14 Dec to 23:59 UTC on 14 Dec
  • 19:24 logmsgbot: catrope synchronized php-1.5/extensions/WikimediaMessages/WikimediaGrammarForms.php 'r60010'
  • 19:24 logmsgbot: catrope synchronized php-1.5/languages/classes/LanguageEt.php 'r60010'
  • 19:23 RoanKattouw: Updating Estonian grammar forms for bug 20332
  • 19:08 RoanKattouw: Running cleanupTitles on frwikisource for bug 20741
  • 19:04 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20332: new logos, sitenames for etwiktionary, etwikibooks, etwikiquote'
  • 18:38 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20280: add de, es, fr, it, pl as import sources on enwiki'
  • 10:07 hcatlin: Loss of several thins on mobile1, required cluster restart

December 12

  • 22:41 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20666: disallow anonymous page creation on idwiki'
  • 22:26 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20442: rename Wikipedia namespace on mhrwiki'
  • 22:24 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20442: change sitename for mhrwiki'
  • 22:20 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20280: add meta and nostalgia as import sources on enwiki'
  • 21:50 RoanKattouw: DB trouble on enwiki is over, thanks Domas
  • 21:49 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:38 logmsgbot: catrope synchronized php-1.5/wmf-config/db.php
  • 21:38 RoanKattouw: Trying master switch from db22 to db12 (s1) again
  • 21:36 RoanKattouw: Master switch failed because db22 is unresponsive
  • 21:36 logmsgbot: catrope synchronized php-1.5/wmf-config/db.php
  • 21:35 RoanKattouw: Switching master on enwiki from db22 to db12
  • 21:33 RoanKattouw: db22 (enwiki master) is down
  • 20:35 RoanKattouw: Clearing out some ghost entries in categorylinks on {de,en,fr,it,ru}wiki per bug 15152
  • 16:59 RoanKattouw: Strike my last, will do it later
  • 16:58 RoanKattouw: Clearing out some ghost entries in categorylinks on {en,de,fr,ru,it}wiki per bug 15152
  • 16:20 RoanKattouw: Fixing another rename (English peasant -> King of the North East) on enwiki
  • 15:05 RoanKattouw: Cleaning up another incomplete rename (Lucasbfr -> Luk) on enwiki. Running this one in batches because it's larger
  • 14:49 RoanKattouw: Cleaning up incomplete rename RS2007 -> RS1900 on enwiki (bug 12969)
  • 14:33 logmsgbot: midom synchronized php-1.5/includes/User.php 'livehack to check who and when invalidates caches'
  • 14:13 RoanKattouw: Running namespaceDupes on ptwiktionary
  • 00:02 logmsgbot: fred synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice_18_g-22_g-21_g.'

December 11

  • 23:55 logmsgbot: fred synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice46 to tracking.'
  • 21:17 Rob: srv217 mainboard replaced, needs OS reinstallation, then it's good to go
  • 21:17 Rob: fred fixed morebots, yay
  • 19:17 Rob: db27 back online, restarting mysql.
  • 18:38 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionReporting.i18n.php
  • 18:33 Rob: cleared errors on ms2 by reseating the power connections
  • 18:33 Rob: cleared errors on ms2
  • 18:29 Rob: db27 is down for hardware replacement
  • 18:26 RoanKattouw: Running namespaceDupes on ckbwiki
  • 18:25 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21601, bug 21415: Create Portal namespace and rename Wikipedia namespace on ckbwiki'
  • 18:23 Rob: shutting down mysql on db27, as it has to come down for hardware replacement
  • 18:12 logmsgbot: catrope synchronized php-1.5/extensions/ContributionReporting/FundraiserStatistics_body.php 'Deploying r59957'
  • 18:12 logmsgbot: catrope synchronized php-1.5/extensions/ContributionReporting/ContributionReporting.i18n.php 'Deploying r59957'
  • 18:03 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20996: import sources for pLwikt, not pTwikt'
  • 17:01 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21783: Only allow sysops to upload on svwiktionary'
  • 16:45 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21794: new logo for ruwikiversity'
  • 16:42 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21792: add autoreviewer group on ptwiki'
  • 16:36 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21783: Activate file uploads on svwiktionary'
  • 16:17 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'actually enable patrolling on ruwikiversity'
  • 16:15 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'fix broken apcond'
  • 16:14 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21629: automatically grant patrol, autopatrol to users with 1000+ edits on ruwikiversity'
  • 16:04 RoanKattouw: Running namespaceDupes on plwikisource
  • 16:04 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21034: WS: alias, change name of user talk namespace on plwikisource'
  • 15:58 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20996: import sources for plwikisource'
  • 12:39 logmsgbot: midom synchronized php-1.5/includes/api/ApiQueryUserContributions.php
  • 12:06 logmsgbot: midom synchronized php-1.5/includes/api/ApiQueryUserContributions.php
  • 08:21 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/LocalRepo.php 'reverted profiling'
  • 08:17 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/LocalRepo.php 'profiling file redirects'
  • 01:26 logmsgbot: midom synchronized php-1.5/includes/api/ApiQueryUserContributions.php
  • 00:37 Tim: killed long-running (34ks) API contributions query on db12, apparently filled up /a/tmp
  • 00:02 logmsgbot: aZaFred synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice_47-48 to tracking.'

December 10

  • 23:03 domas: enwiki is on 5.1
  • 23:03 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 18:49 Fred: ro.planet.w.o is now working
  • 17:05 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 16:05 Rob: srv217 is powered on, but not in LVS pool and is just running dell diagnostics, please do not touch.
  • 15:47 Rob: ms8 required controller board assembly to be reseated, system LOM now can control it properly.
  • 15:39 domas: btw, side effect of my maintenance is that job queue was unclogged, yay.
  • 15:19 Rob: bringing down srv217 to run diagnostics
  • 14:38 domas: s1 master switch to db22-bin.000001:106
  • 14:37 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:36 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:36 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:29 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:15 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:12 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 12:41 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 12:33 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 11:38 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 11:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking down one more server oh noes'
  • 11:30 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 11:29 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 11:18 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 04:55 atglenn: unpacking the last three months of incrementals on ms7
  • 02:19 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 00:34 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'stress testing hehehehehe'
  • 00:27 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 00:26 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 'increasing db12 load'
  • 00:24 Andrew: db12 recovered, db26 still barely coping with the load
  • 00:17 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 'unloading general load from db12, excessive lag >800s'
  • 00:09 Andrew: db12 seems not to be replicating at all.
    • Note that it actually was replicating, it was just slow due to excessive read load, added by domas at 23:06 when he balanced the loads incorrectly in db.php.
  • 00:09 Andrew: make that db12, not db22
  • 00:06 Andrew: s1 lagged, db12 282s, db26 33s -- no slave servers up-to-date enough to serve pages
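
The s1 entries above (00:06 through 00:26) turn on one rule: a replica only serves reads while its replication lag stays under some threshold, and piling read load onto a lagging replica makes it lag further. A minimal sketch of that selection rule follows. The 30-second `max_lag` threshold, the function name `pick_replica`, and the load weights are assumptions for illustration; this is not the actual MediaWiki load balancer code or the values in db.php.

```python
import random

def pick_replica(lags, loads, max_lag=30, rng=random):
    """Pick a replica for reads, skipping any whose lag exceeds max_lag.

    lags:  dict of host -> replication lag in seconds
    loads: dict of host -> relative read weight (hypothetical values)
    Returns a host name, or None when no replica is fresh enough.
    """
    # Keep only hosts that carry read load and are within the lag limit.
    usable = {
        host: weight
        for host, weight in loads.items()
        if weight > 0 and lags.get(host, float("inf")) <= max_lag
    }
    if not usable:
        return None
    hosts = list(usable)
    # Weighted random choice among the remaining hosts.
    return rng.choices(hosts, weights=[usable[h] for h in hosts], k=1)[0]

# Lag figures from the 00:06 entry; the weights are made up.
lags = {"db12": 282, "db26": 33}
loads = {"db12": 100, "db26": 100}

# Both replicas exceed the assumed 30 s threshold, so this prints None.
print(pick_replica(lags, loads))
```

Under these assumptions, setting a host's weight to 0 (as the 00:17 entry did for db12's general load) removes it from selection and gives it room to catch up.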

December 9

  • 23:53 logmsgbot: fred synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice_43-44-45 to tracking.'
  • 23:06 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking down one more server oh noes'
  • 22:59 Fred: fixed noc.wikimedia.org
  • 21:08 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db14 offline'
  • 20:54 atglenn: open support call with sun for ms8, won't power up (no POST), even after resetting ILOM
  • 17:27 logmsgbot: andrew ran sync-common-all
  • 17:18 Andrew: Updating production LiquidThreads installations to new beta version, branch of current alpha.
  • 17:08 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20253 Install Extension:NewUserMessage on strategy wiki'
  • 13:39 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.js 'Merge r59886'
  • 13:38 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r59885'
  • 13:08 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r59883'
  • 13:05 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/jquery/js2.combined.js 'merge r59882'
  • 12:50 logmsgbot: andrew ran sync-common-all
  • 12:39 Andrew: Updating LiquidThreads to trunk state, using sync-common-all to deploy to apaches (since scap crashes the cluster)
  • 00:04 RoanKattouw: Manually reallocating edits from User:Until It Sleeps to User:The Thing That Should Not Be on enwiki

December 8

  • 23:51 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21718: Allow sysops to add/remove abusefilter too on arwiki'
  • 23:46 logmsgbot: Big_Fred synchronized php-1.5/wmf-config/CommonSettings.php 'Added tracking for Notice_42'
  • 23:31 logmsgbot: catrope synchronized php-1.5/includes/DefaultSettings.php 'Deploying r59858'
  • 23:31 logmsgbot: catrope synchronized php-1.5/includes/Skin.php 'Deploying r59858'
  • 23:31 logmsgbot: catrope synchronized php-1.5/skins/vector/main-rtl.css 'Deploying r59858'
  • 23:31 logmsgbot: catrope synchronized php-1.5/skins/vector/main-ltr.css 'Deploying r59858'
  • 23:31 logmsgbot: catrope synchronized php-1.5/skins/common/ajaxwatch.js 'Deploying r59858'
  • 23:26 TrevorParscal: Running sync-file repeatedly to deploy r59858 (usability fixes)
  • 23:16 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21718: allow bureaucrats to add&remove abusefilter group on arwiki'
  • 23:12 logmsgbot: catrope synchronized php-1.5/wmf-config/abusefilter.php 'bug 21718: Restrict abusefilter-modify to abusefilter group on arwiki'
  • 22:13 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeText.php 'picking up r59852'
  • 19:09 logmsgbot: Big_Fred synchronized php-1.5/wmf-config/CommonSettings.php 'added geo-location for Italie.'
  • 16:43 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20216 AbuseFilter for itwikiquote'
  • 16:43 logmsgbot: andrew synchronized php-1.5/wmf-config/abusefilter.php 'Bug 20216 AbuseFilter for itwikiquote'
  • 16:20 Rob: running diagnostics on srv217, please do not try to put it into service.
  • 16:06 Rob: srv229 memory replaced, system restarted.
  • 15:58 Rob: shutting down srv229 to swap out faulty dimm
  • 15:46 Rob: sq43 disk replaced, system back online and in rotation
  • 15:31 Rob: replaced bad disk in sq43
  • 15:23 Rob: replaced bad disk in sq37, detected, restarted squid backend
  • 15:18 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 15:09 Rob: db27 back up with mysql running.
  • 15:07 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 15:03 Rob: mysql stopped normally on db27, restarting it.
  • 15:02 Rob: was able to hotswap the fan, shutdown was not needed
  • 14:58 Rob: stopping mysql on db27 to shut it down and swap out a bad fan assembly.
  • 14:56 Rob: disregard, did not receive part for db28, just db27
  • 14:52 Rob: db28 powered down already, pulling to replace bad fan
  • 11:12 rainman-sr: searchidx1 rebooted by domas, restarting indexing
  • 11:00 rainman-sr: searchidx1 has a "cp -lr" process stuck in I/O, cannot be killed, dmesg says "aacraid: Host adapter reset request. SCSI hang ?". Stopping indexing until someone sorts this out. Hardware check, reboot?
  • 01:38 domas: new s2 position db30-bin.000001:106
  • 01:28 domas: s2 is pure 5.1 now
  • 01:27 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 01:22 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 01:22 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 01:21 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'up up up'
  • 00:37 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 00:36 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Added Site_Notice41 to tracking.'
  • 00:19 logmsgbot: midom synchronized php-1.5/wmf-config/db.php

December 7

  • 22:16 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out one more server for a clone'
  • 21:47 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Lower $wgClickTrackThrottle to 1 on strategywiki, requested by Eugene and Nimish'
  • 19:37 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Lower $wgClickTrackThrottle from 1000 to 100'
  • 19:35 RoanKattouw: Cranking up click tracking ratio from 1:1000 to 1:100 now that we're only tracking toolbars
  • 19:34 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.php 'Deploy r59793, r59794: disable clicktracking for left navigation'
  • 19:33 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.js 'Deploy r59793, r59794: disable clicktracking for left navigation'
  • 19:29 RoanKattouw: Issues mentioned by Rob probably caused by running svn up as root. Please don't do that again
  • 19:26 Rob: spoke with Roan and Brion and updated the permissions of the wmf-deployment to put all files to wikidev group and all writable by that group
  • 19:24 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php '... and add w:en'
  • 19:22 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20552: Remove b: prefix from import sources on ptwikibooks'
  • 18:41 RoanKattouw: Running namespaceDupes on ttwiki for bug 21656
  • 18:40 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21656: Add Portal namespace on ttwiki'
  • 18:25 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21506: add import sources for plwiki'
  • 18:11 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21755: add commons as import source on testwiki'
  • 17:09 Fred: removed locke from wmnet zonefile since locke doesn't have internal address.
  • 17:04 logmsgbot: aaron ran sync-common-all
  • 16:27 logmsgbot: aaron ran sync-common-all
  • 16:10 aaron: nvm, autopromote was fine since the duplicate setting was supposed to be the new plwiktionary addition
  • 16:17 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'flaggedrevs for plwiktionary'
  • 16:10 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'removed duplicate ptwikisource config (autopromote setting must have been broken)'
  • 14:37 mark: Drive 0 SMART predicted failure on fuchsia
  • 14:36 mark: Power cycling fuchsia
  • 14:31 mark: Enabled knsq7 frontend squid in pybal
  • 14:31 RoanKattouw: Running namespaceDupes on nlwiki for bug 21722
  • 14:30 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21722: Add WP, H, P aliases for nlwiki'
  • 13:59 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Removed decommissioned servers, added in a few more spares'
  • 13:57 mark: Updated memcached list in puppet, weeded out old/down/decommissioned servers
  • 13:23 mark: Removed memcached from srv151-153, 156
  • 13:19 logmsgbot: root synchronized php-1.5/wmf-config/mc.php 'Replaced srv151,152,153,156 by new memcached nodes, added ~30 new spares'
  • 13:04 mark: Installed memcached on srv226..253
  • 11:42 Andrew: Ran sync-common on srv125, srv156 as they missed some of the most recent updates.
  • 11:37 logmsgbot: andrew synchronized php-1.5/wmf-config/ExtensionMessages.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:36 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/pages/ThreadPermalinkView.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:36 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/pages/TalkpageView.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/pages/SpecialNewMessages.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.js 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:30 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/i18n/Lqt.i18n.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:29 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:28 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/ThreadRevision.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:24 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/Thread.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:22 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/api/ApiThreadAction.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:22 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.i18n.php 'Updating LiquidThreads alpha to trunk state (r59781)'
  • 11:16 Andrew: Some root-owned .svn directories spread throughout wmf-deployment.
  • 11:13 Andrew: Updating LiquidThreads alpha to trunk state, this time by quickly sync-file-ing each individual file in a for loop so I don't bring down the site with scap
  • 10:12 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'full load on db30'
  • 00:51 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'reducing load for some inspection'

December 6

  • 23:39 domas: s2 slave load is being handled by single 5.1-wm-3139 server
  • 23:23 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'wheeeeeeeeee'
  • 23:12 domas: srv83 was not in sync groups, but alive and serving load..
  • 22:50 mark: ...on db30 :P
  • 22:50 domas: and yes, it was db30
  • 22:17 domas: s/bank8/bank9/
  • 22:15 domas: bank13 & bank8 MCE warnings ( http://p.defau.lt/?KWMB35Z13ysXpN6IHcca9A )
  • 22:13 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 20:22 mark: Puppetized NTP configuration on the Solaris servers
  • 19:37 mark: Installed CSW pkgutil and puppet on ms4, updated it on ms5
  • 18:00 apergos: resending incrementals for last three months from ms1 to ms7 with -I to get the intermediate snaps. using netcat, running in screen as root on both hosts
  • 15:54 apergos: cleaned up / on ms1, was out of space (tossed some old files from /root)
  • 01:20 mark: Disabled xinetd and extdist crontab on zwinger
  • 00:40 logmsgbot: mark synchronized php-1.5/wmf-config/CommonSettings.php 'Moved svn-invoker (ExtensionDistributor) from zwinger to fenari'
  • 00:27 mark: sq27 is flooding syslog; placed temporary firewall entry for syslog packets on nfs1

December 5

  • 03:26 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionStatistics_body.php 'picking up bugfix from r59753'
  • 00:46 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'adding CN Notice 22'
  • 00:44 atglenn: start transfer of incremental via zfs send (600gb?) from ms1 to file on ms4, in prep for nc to ms7 later, running in screen as root on ms1
  • 00:14 logmsgbot: Fred synchronized php-1.5/wmf-config/InitialiseSettings.php 'changed logo for usabilitywiki.'
  • 00:11 logmsgbot: Fred synchronized php-1.5/wmf-config/InitialiseSettings.php 'changed logo for usabilitywiki.'

December 4

  • 23:30 atglenn: started netcat of the bulk of the data from ms5 to ms7. running in screen as root on both hosts.
  • 23:21 atglenn: started ncat of (small piece of) image data from ms5 to ms7, running in screen as root on both hosts
  • 20:47 Rob: which doesn't work, damn.
  • 20:47 Rob: got sick of racktables.wikimedia.org not redirecting correctly, put in a rewrite for non ssl connections to ssl
  • 20:24 Fred: fixed nrpe on db20 and db7
  • 20:13 logmsgbot: root ran sync-common-all
  • 20:12 Rob: running sync-common-all to update configuration for support of flaggedrevs on plwiktionary
  • 19:20 Rob: srv144 removed from node groups & pybal, nagios resynced.
  • 19:19 Rob: srv144 is out of warranty and rebooting randomly, decommissioning.
  • 19:05 Fred: finished setup of srv245.
  • 19:02 Rob: srv126 removed from node groups and lvs. nagios restarted to exclude it.
  • 19:01 Rob: srv126 refuses to even post when benched, out of warranty, slating for immediate decommissioning
  • 19:00 Rob: srv144 reinstalling with a single hard disk, no more raid1
  • 18:50 Rob: swapped primary srv144 drive with old decommissioned spare. reinstalling OS, will reinstall packages and get online later.
  • 18:45 Rob: sq22 back online, all drives nominal, rebuilding cache and ensuring it is in rotation
  • 18:41 Rob: rebooted sq22
  • 18:38 Rob: rebooted srv144 and srv126
  • 18:36 Rob: srv245 package install failed. I do not have time to tinker with it while in the DC, I have other things that require my physical access to the machines. Leaving it alone for now to work on remotely.
  • 18:28 Rob: srv245 OS installed, setting up wikimedia-task-appserver
  • 18:06 Rob: srv245 was sitting idle with no OS, depooled from apaches. reinstalling system.
  • 17:57 Rob: rebooted srv83 per fred
  • 17:35 Fred: removed srv83 from the nodelist since it was causing ddsh to never finish executing.
  • 17:26 Fred: fixed broken apache. Seems like there is a machine down that is preventing normal sync-file from finishing... Looking into it.
  • 16:50 rainman-sr: stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine
  • 16:48 rainman-sr: searchidx1 had full disk, freed some 100gb of space by deleting logs and stuff laying around
  • 16:14 Rob: srv245 down and unresponsive, rebooting
  • 16:12 Rob: sq43's replacement disk is also bad (talk about bad luck), placing rma with dell. system will remain powered down for now.
  • 15:55 Rob: sq43 isn't seeing a replaced disk, rebooting and troubleshooting
  • 15:33 domas: 'arcconf setcache 1 logicaldrive 0 roff ' - disabling any read caching on db11-db30 RAIDs
  • 15:13 Rob: after tinkering with it with domas, it appears rebuild is indeed automatic. db21 rebuilding raid array
  • 15:09 Rob: db21 bad disk swapped out, rebuild should be automatic
  • 14:57 Rob: sq14 back up, rebuilding its cache
  • 14:54 Rob: sq13 primary disk dead, out of warranty
  • 14:53 Rob: swapping sdc in sq13 and sq14 to bring sq14 back online
  • 14:53 Rob: sq14 disk sdc dead, out of warranty.
  • 05:18 Tim: on fenari: running all pending renameUser jobs from enwiki
  • 03:37 Tim: Around 03:12, accidentally renamed enwiki's job table and so renamed it back a second later. This caused all slaves to stop due to a replication bug. Fixed now.
  • 03:25 Tim: testing fixJobQueueExplosion.php on commonswiki
  • 02:46 Tim: srv156 not responding to ssh, trying reboot
  • 01:13 Tim: restarting job runners
  • 01:13 logmsgbot: tstarling synchronized php-1.5/includes/HTMLCacheUpdate.php 'patching out all category backlink updates, major bug causing job queue to stall'
  • 00:12 Tim: granted access to root@fenari on all servers in the mysql node group

December 3

  • 23:46 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Allow bcrats to add and remove new arbcom group on nlwiki'
  • 23:40 RoanKattouw: Synced InitialiseSettings.php: allow bcrats to add and remove new arbcom group on nlwiki
  • 22:49 RoanKattouw: Importing 365 images into Commons as User:GeographBot, requested by Multichill
  • 22:39 RoanKattouw: Synced InitialiseSettings.php for bug 21238: self-removal of flood flag on plwiki
  • 22:33 RoanKattouw: Synced InitialiseSettings.php for bugs 20775 and 21719. sync-file is stalling on what seems to be an unresponsive server
  • 21:35 RoanKattouw: Running namespaceDupes on usabilitywiki for bug 21753
  • 21:35 RoanKattouw: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21753 Fix Multimedia talk NS on usabilitywiki'
  • 04:20 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php 'fixing conversion rate bugs'

December 2

  • 23:28 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'reenabling db18 and db25, also, attempting to overwrite stale db.php copies'
  • 23:25 Fred: massaged mc.php to retrieve working spare, and remove broken memcached nodes. all is now good in the land of memcache
  • 22:12 mark: Recovered torrus from deadlock
  • 21:00 Fred: rebooted srv194 (hung)
  • 20:48 Rob: removed bayle and khaldun from dsh, both are in rack running wipe with network pulled
  • 20:38 Fred: bart removed from nagios (well that sounds funny)
  • 20:36 Rob: khaldun is down forever! decommissioned and running wipe in rack with the network pulled
  • 20:35 Rob: isidore rebooted by accident due to power cable issues
  • 20:21 Rob: srv136 crashed with temp warnings, going to decommission it, rebooting to wipe and remove network
  • 20:15 Rob: bart decommissioned, unracked, wipe running on testbench with usbcdrom
  • 19:49 Rob: decommissioned, unracked srv66, srv51, srv81, srv118 (previously removed from pybal)
  • 19:39 Rob: decommissioned srv130, unracked
  • 19:20 Rob: srv122 decommissioned, wiped, unracked
  • 18:19 Rob: ms7/ms8 racked in sdtpa a2, network wired, dns setup, racktables updated, & LOM online
  • 18:18 Rob: serial connection to ps1-a4-sdtpa returned to normal
  • 18:05 Rob: ps1-a4-sdtpa temp losing its serial connection, stealing adapter to setup ms7/8
  • 18:04 Rob: added ms7/ms8 to dns for wmnet and mgmt nics
  • 16:20 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Deploy r59665'
  • 16:19 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/jquery/js2.combined.js 'Deploy r59665'
  • 15:40 Rob: rebooted the following per domas request: srv101 srv105 srv112 srv117 srv138 srv183 srv89 srv84 srv89 srv91 srv96 srv98 srv99
  • 15:02 mark: Shutting down mint for installation of a wifi card
  • 14:35 Andrew: Scapping to update LiquidThreads alpha
  • 14:34 Andrew: Updates of LiquidThreads alpha to trunk state in progress
  • 13:46 RoanKattouw: Remove all LU cache files and rebuild from scratch to mitigate problems with root-owned cache files
  • 13:41 RoanKattouw: Purging LocalisationUpdate hashes and running the update script to ensure we didn't miss anything in the Nov 19 - Dec 1 blackout
  • 05:28 Tim: edited authz on mediawiki svn to deny write access to root
  • 01:17 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'new banner for cn'
  • 01:04 Tim: restarted apache on srv211 and srv127

December 1

  • 20:35 logmsgbot: catrope synchronized php-1.5/includes/DefaultSettings.php 'Deploying r59608 (clicktracking for old toolbar)'
  • 20:34 logmsgbot: catrope synchronized php-1.5/skins/common/edit.js 'Deploying r59608 (clicktracking for old toolbar)'
  • 15:20 apergos: temporarily turned off upload-by-url on test.wikipedia, it interferes with testing the external page retrieval extension for fundraising
  • 01:00 Tim: (45 minutes ago) removed non-stewards from the arbcom admin list
  • 00:26 logmsgbot: tstarling synchronized php-1.5/extensions/SecurePoll/includes/ballots/RadioRangeBallot.php

November 30

  • 23:30 brion: restarted parser test runner on wikitech [@brion fixme: set up init script]
  • 22:54 Fred: rebooting pascal since it once again went down.
  • 21:21 RoanKattouw: Fred moved l10nupdate cronjob on hume to /etc/cron.d/l10nupdate , now runs as brion instead of root
  • 21:21 RoanKattouw: Unlocked /h/w/l10n/trunk/extensions , had been locked since Nov 19th blocking L10nUpdates to extensions
  • 20:06 rainman-sr: set search limit to 51 per Roan
  • 20:06 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.php 'Raise style version for r59598, r59599'
  • 20:05 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.js 'r59599 for real this time'
  • 20:04 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.js 'r59599'
  • 20:00 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.js 'r59598'
  • 19:55 rainman-sr: converting searchidx1 back to indexer, things seem to be back to normal now
  • 19:46 atglenn: enabled ExternalPages on test.wikipedia (fundraising extension) after adding it by hand to extension-list and ExtensionMessages.php
  • 19:30 rainman-sr: doing a delayed restart of all searchers
  • 19:10 mark: Setup puppet to enforce desired JVM alternatives on the search servers
  • 17:00 rainman-sr: all search servers (except search11) should be switched back from 64 bit java (java-6-openjdk) to 32 bit java (ia32-java-6-sun)
  • 16:33 rainman-sr: temporarily stop indexing and put searchidx1 in en.wp rotation, will back out when peak times are over
  • 15:28 mark: Puppet removed syslog-ng on nfs1/2 as well. Restored.
  • 14:49 mark: Replacing stock syslogd with rsyslog globally across the cluster (with puppet). rsyslog is the default in Ubuntu 9.10 onwards
  • 10:35 mark: Using streber for karmic install testing
  • 10:35 mark: Using brewster for karmic install testing
  • 10:19 mark: Created karmic-wikimedia APT repository
  • 02:27 rainman-sr: added search11 to en.wp search rotation.. we're getting peak-level traffic at 2am on Sunday?!

November 29

  • 21:32 mark: Unmounted /home on ms5
  • 20:58 mark: Disabled NIS on ms5
  • 20:54 mark: Carefully starting puppet management of ms5 (Solaris)... just ssh host key exchanging for now
  • 20:42 mark: Installed Blastwave puppet package (and deps) on ms5
  • 20:39 mark: Installed Blastwave pkgutil on ms5
  • 20:37 mark: Uninstalled coolstack CSKruntime and CSKruby packages on ms5
  • 20:16 mark: Installed coolstack CSKruntime and CSKruby packages on ms5
  • 19:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 19:25 domas: s3 replication switched from db11-bin.215:36889084 to db17-bin.001:79
  • 19:22 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 19:22 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:40 mark: Fixed puppet config for searchidx1
  • 14:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding db1 and db7 to s6'
  • 14:16 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Per Domas, do bug 21510 a little bit more performance-friendly'
  • 14:12 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'bug 21510: Second attempt'
  • 14:04 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'backing out last change, breaks'
  • 14:03 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'bug 21510: Temp raise acct create throttle on eswiki for UEM IP for wiki week'
  • 13:54 mark: Installed puppet on db20, added new db20 node block in site.pp
  • 13:37 Tim: loaded arbcom election configuration into SecurePoll on enwiki
  • 13:28 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db1 for db7 init'
  • 13:19 RoanKattouw: Running l10nupdate on hume for debugging
  • 12:37 domas: s6 start db29-bin.001:10246135
  • 12:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'welcome s6'
  • 10:54 domas: enwiki db is now served by 32g/16disk servers only... what is our next upgrade? :-)
  • 10:53 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db7 from enwiki duty, it is too tiny...'
  • 04:05 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb
  • 04:03 Tim: updating TrustedXFF to r59540
  • 03:20 Fred: dist-upgrade on fileserv1 (new kernel)
  • 03:13 Fred: apt-get upgrade on fileserv1...

November 28

  • 22:36 Fred: customized morebots startup script a tad, restarted it as user werdnum instead of root...

November 27

  • 11:27 Raymond_: Run of ParserTests on mw.o/CodeReview is broken. Last run: Nov 25th
  • 06:37 Fred: adjusted site.pp to remove the nfs-home stanza for Bayes as it shouldn't be there.
  • 06:02 Tim: on fenari: running SecurePoll/cli/wm-scripts/makeArbcomList.php to generate arbcom election voter list
  • 01:39 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'second attempt'
  • 01:37 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'removing clicktracking right (which does nothing) from the sysop group'
  • 00:29 Tim: fixed catrope's umask and ran: find -user catrope -exec chmod g+w {} \;
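
The `find -user catrope -exec chmod g+w {} \;` step in the 00:29 entry can be sketched in Python as follows. This is only an illustration of the same idea, not what was run: the function name `add_group_write` is hypothetical, and unlike `find` it skips symlinks instead of passing them to `chmod`.

```python
import os
import stat


def add_group_write(root, uid):
    """Add the group-write bit to root and everything below it owned by uid.

    Returns the list of paths whose mode was actually changed.
    """
    # Like `find`, start with the root directory itself.
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))

    changed = []
    for path in paths:
        st = os.lstat(path)
        # Skip symlinks (chmod would follow them) and entries owned by others.
        if stat.S_ISLNK(st.st_mode) or st.st_uid != uid:
            continue
        mode = stat.S_IMODE(st.st_mode)
        if not mode & stat.S_IWGRP:
            os.chmod(path, mode | stat.S_IWGRP)
            changed.append(path)
    return changed
```

Calling `add_group_write(".", uid)` with the numeric uid of the affected account repairs files that already exist; the umask fix mentioned in the same entry is what stops new files from being created without group write.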

November 26

  • 23:54 Tim: removed broken extensions from extension-list: MWSearchUpdateHook.php and LiquidThreads_alpha
  • 23:47 RoanKattouw: Scapped, logmsgbot seems to be broken again in that it doesn't report scaps with the !log prefix
  • 23:36 Tim: on fenari, did a source install of parsekit using the pecl tool, no package was available
  • 23:21 Tim: installing MediaWiki dependencies on fenari
  • 23:17 Tim: on fenari, added /home/wikipedia/bin to $PATH using /etc/profile
  • 23:14 RoanKattouw: Deploying r59465 to make LocalisationUpdate work again for renamed usability messages
  • 17:55 mark: Removed NIS on yongle
  • 17:23 mark: Removing wikimedia-nis-client on all search servers
  • 17:01 mark: Running apt-get upgrade on all search servers
  • 16:43 mark: Removed wikimedia-nis-client on srv225
  • 16:41 mark: Removed wikimedia-nis-client on spence
  • 16:38 mark: Removed wikimedia-nis-client on hume
  • 16:33 mark: Removed wikimedia-nis-client on fenari
  • 16:32 mark: Removed wikimedia-nis-client on amane
  • 16:27 mark: Removed wikimedia-nis-client on srv124
  • 16:27 mark: Replacing all NIS clients by puppet managed users
  • 15:08 mark: Set domain of fenari back to 'wikimedia.org' from 'pmtpa.wmnet'. Why was it changed? Quite a few things got confused...
  • 15:04 mark: Made NFS /home mounting puppet managed on all relevant servers
  • 13:18 mark: Running apt-get dist-upgrade on pdf1, in part to upgrade libpoppler (PDF rendering library)
  • 13:18 mark: Removed rogue apt repositories on pdf1
  • 02:24 Tim: started apache on spence

November 25

  • 23:46 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php
  • 18:16 Fred: rebooted wikitech linode / added to Nagios for basic monitoring.
  • 16:28 mark: Starting NFS failover test on nfs1 & nfs2
  • 14:57 mark: Setup syslog-ng server on nfs1/nfs2 with puppet. Flipped syslog-ng.pmtpa.wmnet CNAME to point to the NFS service ip

November 24

  • 23:50 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice30_bold'
  • 23:00 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'deebee seventeen and twenteeseven'
  • 22:13 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding db21 and db29 back'
  • 21:20 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db17 and db29 too'
  • 21:16 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db21 and db27, AND NO PUPPETS'
  • 21:14 mark: Setup logrotate for /home/w/logs/ on both nfs1/nfs2 (of course, with puppet ;-)
  • 19:07 mark: Resized JFS filesystem of /home on nfs2 to fit the new LV size on nfs1/nfs2
  • 19:03 mark: Moved cronned rsync of /home from db20 to nfs1 and nfs2, managed by puppet
  • 18:46 mark: MediaWiki logging, broken by yesterday's NFS service IP migration, restored (using puppet)
  • 18:15 mark: Puppetised rsyncd setup on nfs1/nfs2
  • 18:00 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Flood flag for strategywiki bug 20347'
  • 01:43 Tim: enabled write-back caching on db14 despite broken controller battery, due to excessive lag and MASTER_POS_WAIT()
  • 00:24 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'It turns out that this time FiveFacts is actually referenced as Five_Facts_About_Wikipedia... who knew...'
  • 00:10 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'It turns out that this time FiveFacts is actually referenced as Five_Facts_About_Wikipedia... who knew...'
  • 00:01 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding Notice_[36,37,28] and FiveFacts landing page'

November 23

  • 21:42 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding RU and IL for GEOIP location.'
  • 21:27 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding RU and IL for GEOIP location.'
  • 20:53 river: moving ptolemy to TS vlan to reinstall it as the TS OSM db
  • 20:11 mark: Finished migrating NFS off db20 to a DRBD cluster of nfs2 (primary) and nfs1 (secondary)
  • 19:14 RoanKattouw: prototype got stuck, rebooted it. Linode sucks and we should not use it ever again
  • 16:53 mark: Re-signed puppet cert for fenari... why was it gone?
  • 12:39 domas: spence was taken down by runaway rrdtool process, http://p.defau.lt/?8FAf9FtfQTbkSQKCcP_qmw
  • 11:00 Andrew: Scapping to prod message updates
  • 10:49 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/pages/NewUserMessagesView.php 'Merge r59353'
  • 10:35 Andrew: Scapping
  • 10:34 Andrew: Updating LiquidThreads to trunk state on alpha only
  • 08:28 domas: sq14 has failed sdc
  • 02:05 Tim: fixed clear-profile so that it works on fenari and other ubuntu hosts
  • 01:54 logmsgbot: tstarling synchronized php-1.5/index.php 'test'
  • 01:54 Tim: set up fenari's /root/.ssh/config like the one on zwinger, with its user known hosts file pointing to the global one
  • 01:41 Tim: Moved logmsgbot to fenari. Opened up its UDP port to zwinger and added firewall rules to /etc/rc.local for honour-system access control.
  • 01:21 Tim: commented out the nagios-specific nickserv stuff in ircecho, so that logmsgbot doesn't ghost nagios-wm
  • 01:11 Tim: installed udprec at fenari:/usr/local/bin/udprec
  • 00:55 Tim: on fenari, symlinked ddsh -> dsh (dancer's) so that the sync scripts can work

November 22

  • 14:15 domas: powercycled locke after 10h downtime
  • 10:45 RoanKattouw: Reports that file deletion and undeletion on Commons is broken
  • 03:35 rainman-sr: rebuilding mwsuggest index for en.wp, which throws strange lucene exceptions; hopefully a fresh build is going to solve it
  • 01:24 rainman-sr: deployed latest lucene-search to test liquidthreads integration

November 21

  • 21:03 domas: sq13 needs reboot and probably hardware maintenance
  • 21:03 domas: sq43 and sq37 need hdd replacements.
  • 20:46 domas: squid peermonitor is receiving 504s from peers...
  • 20:45 domas: firewalled too.
  • 20:44 domas: cleaned up sq13 crap in remote logs on db20, was full / fs

November 20

  • 22:11 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding 2009_Notice35 to whitelist'
  • 22:10 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Adding Notice30_EML to whitelist'
  • 19:13 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Added Help_Us_Change_the_World to allowed targets'
  • 18:44 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Added Help_Us_Change_the_World to allowed targets'
  • 18:41 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Enabling tracking of 2009_Notice30_EML'
  • 13:56 mark: OS-installed nfs1
  • 13:09 mark: Renamed auth1 to nfs1 in DNS
  • 13:00 mark: apt-get upgrade and reboot on fenari
  • 12:48 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Re-enabling ClickTracking with throttle 1:1000'
  • 12:46 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.php 'Deploying r59284'
  • 12:46 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.js 'Deploying r59284'
  • 04:51 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'load test'
  • 04:49 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'load test'

November 19

  • 23:56 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'whitelisting 2009_Notice_30_EML'
  • 22:56 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Enabling GeoIP for HU.'
  • 21:08 mark: Shutting down bayle for decommissioning
  • 21:04 mark: Stopped pdns recursor on bayle
  • 20:20 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fix error by Aaron allowing anybody to edit foundationwiki, which has wgRawHTML on...'
  • 19:57 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'NS_MEDIAWIKI edit rights for all users on foundationwiki'
  • 17:35 mark: Upgrading pdns-recursor on lily to 3.1.7.1
  • 17:33 mark: Moved service ip 208.80.152.131 (first DNS resolver ip) from bayle to dobson
  • 17:25 mark: Deployed new PowerDNS recursor on mchenry, with puppet
  • 15:14 mark: Deploying new sudoers file on search servers with puppet, correcting sudo access to /usr/bin/jstack for rainman
  • 15:08 mark: Removed stale spamassassin lock files on lily
  • 14:58 mark: Rebooting lily
  • 14:41 mark: Finished upgrade on lily
  • 14:34 mark: Running apt-get dist-upgrade on lily
  • 14:13 mark: Stopped authoritative PowerDNS server on bayle
  • 13:58 mark: Rebooting dobson
  • 13:56 mark: Moved ns0.wikimedia.org service ip from bayle to dobson
  • 13:29 mark: Removed firewall entry on ns1
  • 13:29 mark: Rolled out new PowerDNS and packages on linne with puppet
  • 13:22 mark: Temporarily filtered all incoming DNS packets on ns1 for upgrade
  • 11:57 tomaszf: flipping payments back on .. all is well after updates
  • 10:42 tomaszf: taking outage on payments for updates
  • 10:39 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable ClickTracking again'
  • 10:38 tomaszf: upgrading activemq version on erzurumi to 5.3
  • 10:37 Andrew: Some JS failures reported, because of ClickTracking extension, which seems to think it is a good idea to hijack all links to API redirects, which doesn't work so great for links with href=javascript:doSomething()
  • 00:25 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 're-enabling ClickTracking hooks'
  • 00:24 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'disabled Special:ClickTracking'
  • 00:23 logmsgbot: tstarling synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.hooks.php 'r59230'
  • 00:06 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Whitelisted new banners.'

November 18

  • 22:44 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php
  • 22:00 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Deploy r59222'
  • 21:59 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/jquery/js2.combined.js 'Deploy r59222'
  • 21:41 Andrew: Scapping to deploy updates.
  • 21:41 Andrew: ... and merging r59218 into LiquidThreads
  • 21:40 Andrew: Updating LiquidThreads_alpha to trunk state, and merging rrm: cannot remove directory `.svn/tmp/prop-base': Permission denied
  • 21:14 logmsgbot: catrope ran sync-common-all
  • 21:12 RoanKattouw: Deploying r59214, r59217
  • 20:57 RoanKattouw: Deploying r59212
  • 20:33 logmsgbot: catrope ran sync-common-all
  • 20:31 RoanKattouw: sync-common-all throwing lots of rsync errors "failed to set times on dirname/.svn : Operation not permitted"
  • 20:30 RoanKattouw: Deploying usability fixes for real this time
  • 20:04 RoanKattouw: Deploying usability fixes
  • 20:03 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/FundraiserStatistics_body.php
  • 20:02 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/FundraiserStatistics.css
  • 19:50 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php 'Remove debugging code'
  • 19:31 RoanKattouw: Non-roots can't compile texvc on at least srv124, which breaks sync-common on that server and scap for non-roots
  • 19:28 RoanKattouw: Syncing srv124 (test.wikipedia.org)
  • 19:12 RoanKattouw: Running svn up on test for real this time, in /home instead of /apache
  • 18:56 RoanKattouw: Running svn up on test for usability deployment
  • 17:59 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php 'Debugging'
  • 17:58 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'More debugging'
  • 17:23 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php '-m Debugging code'
  • 06:24 Tim: running logrotate -f on db20
  • 05:39 Tim: logrotate on db20 has been broken since October 23, now the root partition is full. Fixing.
  • 02:55 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'disabling the ClickTracking extension due to scary SQL escaping practices'
  • 00:01 Tim: on ms1, changed the max-age for centralnotice/images to 3600, max-age for the rest of centralnotice remains at 300.
  • 00:01 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.js 'Merged r59190'

November 17

  • 23:59 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.js 'Merged r59189'
  • 23:36 Tim: deploying squid configuration (second attempt)
  • 23:32 Tim: squid conf had a syntax error in it, all upload squids crashed, mostly recovered now
  • 23:21 Tim: deploying an aggressive squid refresh_pattern (ignore-reload) for http://upload.wikimedia.org/centralnotice/images
  • 20:50 Andrew: Scapping to deploy LiquidThreads alpha updates, let us hope the site does not go down again :)
  • 19:58 domas: frontend squids occasionally have 100% cpu usage
  • 19:57 domas: http://p.defau.lt/?QsA_sTZ_ykf0R_t0dnNtuQ <--- ipvs state during the overload
  • 19:55 domas: hawthorn hitting %si capacity limits after +50% fundraiser image surge
  • 18:01 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 're-adding db26, ixia'
  • 02:13 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php
  • 01:16 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionReporting.php 'picking up 2009 time range'

November 16

  • 23:27 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'updating for 2009 figures'
  • 22:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 22:02 Fred: enabled raw data gathering from the squids on Support pages (on Locke)
  • 19:41 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db18 and db27 for reimaging'
  • 19:03 Fred: root=fred above
  • 19:03 logmsgbot: root synchronized php-1.5/wmf-config/CommonSettings.php 'Added CH and NL landing page for geo-location'
  • 17:24 domas: increased file/connection limits on ms3
  • 14:39 Andrew: To clarify, software update was NOT reverted, as the issue was due to the scap itself.
  • 14:27 Andrew: Site seems to be mostly back up by now.
  • 14:24 Andrew: Site seems to be up again, mostly.
  • 14:18 Andrew: Looks like the error was caused by the scap pushing 4-cpu apaches into swap, causing memcached nodes to fall over, resulting in higher than normal database traffic
  • 14:12 Andrew: Reverting software update
  • 14:12 Andrew: Reports of slowness and down-ness
  • 14:07 Andrew: Scapping to apply r59136, r59135, r59133, r59127
  • 12:24 logmsgbot: andrew synchronized php-1.5/extensions/ProofreadPage/ProofreadPage.php 'Merge r58865 and r59070 from trunk -- fixes for deferred updates.'
  • 12:12 Andrew: scapping to update LiquidThreads alpha
  • 07:35 tomaszf: adding backup cron on db10 for civi and grosley for /srv

November 15

  • 12:41 mark: Restarted NIS client on fenari

November 14

  • 16:13 river: removed db26 from rotation for toolserver dump
  • 16:12 logmsgbot: kate synchronized php-1.5/wmf-config/db.php
  • 02:49 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php
  • 02:14 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php
  • 01:59 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php
  • 01:58 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionReporting.i18n.php
  • 01:54 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php

November 13

  • 23:21 logmsgbot: tfinc synchronized php-1.5/extensions/GeoLite/GeoLite.php
  • 22:55 logmsgbot: tfinc synchronized php-1.5/extensions/GeoLite/GeoLite_body.php
  • 21:45 Andrew: Scapping to update LiquidThreads alpha to LiquidThreads trunk state
  • 17:06 mark: Deployed new sudoers file on the search servers
  • 15:53 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/Dispatch.php 'Updating alpha version to trunk state'
  • 15:52 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/ParserFunctions.php 'Updating alpha version to trunk state'
  • 12:48 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Use the newly setup proxy for upload-by-url'
  • 00:31 logmsgbot: robh synchronized php-1.5/cache/interwiki.cdb 'added outreach to the interwiki map'

November 12

  • 23:45 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'GB not UK for GeoLite'
  • 23:36 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php 'Adjustment'
  • 23:31 Andrew: Trying LiquidThreads alpha fix again, scapping.
  • 20:30 logmsgbot: tfinc synchronized php-1.5/extensions/GeoLite/GeoLite.php
  • 20:10 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php
  • 20:09 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'tinkering with skwikiquote logo settings to force change'
  • 20:06 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'tinkering with skwikiquote logo settings to force change'
  • 19:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '21431 Set logo for Bengali Wikibooks'
  • 19:10 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21262 Set q:sk: logo'
  • 19:03 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting ru.wikiversity to the default English logo until they have a localized version'
  • 18:57 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Adding back in ImportSources for new ru.wikiversity'
  • 18:39 logmsgbot: midom synchronized php/cache/interwiki.cdb
  • 18:28 Rob: bad php error pushed by me, rolled it back, things should come back up now
  • 18:28 logmsgbot: root ran sync-common-all
  • 18:21 logmsgbot: root ran sync-common-all
  • 16:59 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'LiquidThreads_alpha stuff doesn't seem to have taken effect, turned it off for lqt labs for now, will have another stab later'
  • 16:50 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php 'Fix name of LiquidThreads alpha variable (wmgLiquidThreadsAlpha)'
  • 16:49 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fix name of LiquidThreads alpha variable (wmgLiquidThreadsAlpha)'
  • 16:47 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'commit live hack in r58954'
  • 16:29 Andrew: Re-scapping to update localisation for LiquidThreads alpha (which is technically a "new extension")
  • 16:26 Andrew: tomasz has left some non-group-writeable dirs around in our checkout, please change your umask: drwxr-xr-x 3 tfinc wikidev 32 Nov 9 06:44 GeoLite
  • 16:21 Andrew: Which one is used is controlled by a new variable, wmgLiquidThreads_alpha -- true to use the alpha. Only activated on liquidthreads_labswikimedia. Scapping to apply updates.
  • 16:20 Andrew: Split LiquidThreads to LiquidThreads_alpha and LiquidThreads, LiquidThreads_alpha being the most recent version.
  • 15:37 Rob: bayes back online, let the statistic-crunching resume =]
  • 15:29 Rob: updating dns to reflect new mgmt ip for bayes
  • 03:29 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'turning $wgHtml5 back off again due to bugs'
  • 01:37 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling $wgHtml5'
  • 00:15 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php

November 10

  • 23:16 Tim: fixed DB errors in ContributionTrackingStatistics by granting SELECT access on drupal.contribution_tracking and civicrm.public_reporting to the contrib_tracking user
  • 19:02 Rob: renamed auth1 to nfs1, replaced disks with 1tb drives, connected eth2 on nfs1 to eth2 on nfs2
  • 17:00 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding db27'
  • 02:52 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php

November 9

  • 23:15 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disabling GlobalUsage due to major scaling issues, discussion on private-l'
  • 22:56 logmsgbot: tstarling synchronized docroot/secure/keys.html
  • 22:55 logmsgbot: tstarling synchronized docroot/secure/keys.txt
  • 20:57 Rob: wordpress upgraded for techblog. by upgraded we mean new install and manual copy of existing uploaded files and customizations.
  • 20:51 Rob: techblog accessibility still spotty. Wordpress upgrades are sometimes painful.
  • 20:12 Rob: messing with updating wordpress for techblog. It failed before so techblog will be up and down during the next few minutes
  • 20:11 Fred: raised the number of apache threads on Singer to accommodate secure traffic.
  • 19:29 mark: Disconnected drbd resource nfshome on nfs2: RAID1 resync was still running, waiting for it to finish before attempting DRBD resync
  • 19:15 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php
  • 18:59 mark: Stopped NFS kernel server on db20 (yes, that's /home)
  • 18:34 mark: Turned off swap on db20: block device needed for DRBD external metadata
  • 17:04 mark: OS-installed nfs2
  • 16:40 mark: Running restart of puppetd across the cluster
  • 16:35 mark: Running backup of db20 /home
  • 16:28 mark: Running dist-upgrade on locke
  • 16:00 mark: fsck repaired /var/backup on mchenry, filesystem remounted
  • 15:55 mark: Added tridge to the ALL node group
  • 15:49 mark: dist-upgrade and reboot of tridge
  • 15:42 mark: Running new fsck on mchenry while the system is up
  • 15:41 mark: /var/backups filesystem on mchenry is corrupt
  • 15:22 mark: Reset DRAC password of sanger
  • 15:18 mark: Rebooting mchenry
  • 15:16 mark: Running apt-get dist-upgrade on mchenry
  • 15:06 mark: Rebooting sanger
  • 15:04 mark: Running apt-get dist-upgrade on sanger
  • 14:41 mark: Shutdown bart for decommissioning
  • 14:36 mark: Deployed cron job to restart puppet daily
  • 14:35 mark: Deployed automatic updating of GeoIP database across the cluster
  • 08:14 brion: killed duplicate wikibugs irc bot connection, stopping double bug reporting in #mediawiki
  • 03:17 Tim: updating SecurePoll to r58802

November 8

  • 22:10 mark: Added the esams ganglia clusters to the Florida grid (why the hierarchy?)
  • 21:52 domas: dropped one snapshot on ixia, other still up.
  • 21:52 Andrew: domas says he did some magic with i/o on ixia, replag back to zero.
  • 21:51 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning GlobalUsage back on'
  • 21:41 mark: Experimentally installed unattended-upgrades on server sockpuppet
  • 21:41 mark: Reboot srv156
  • 21:38 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Deactivating GlobalUsage to check impact on ixia replag.'
  • 21:24 Andrew: ixia still has huge replication lag, refreshGlobalImageLinks definitely not still running.
  • 18:31 mark: Unmounted /home on thistle, ixia, lomaria, db1 and db11
  • 17:26 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/Stabilization_body.php 'deployed r58786'
  • 16:51 brion: taking load off ixia to let it catch up; it's still lagged on GlobalUsage inserts
  • 16:51 logmsgbot: brion synchronized php-1.5/wmf-config/db.php
  • 16:09 Bryan: Correction: --start-image=Symbol_neutral_vote.svg --start-page=15088846 --maxlag=5
  • 16:04 Bryan: Parameters to restart refreshGlobalimagelinks.php with on enwiki: --start-image Symbol_neutral_vote.svg --start-page=15088846 --maxlag 5
  • 16:02 brion: stopped andrew's refreshGlobalimagelinks.php job on hume pending fixes and someone to babysit it
  • 15:58 brion: ixia lag from GlobalUsage population script; Bryan poking at it to make it behave nicer
  • 15:55 brion: commonswiki | Connect | 510 | NULL | INSERT /* GlobalUsage::setUsage */ INTO `globalimagelinks` (gil_wiki,gil_page,gil_page_namespace,g |
  • 15:50 brion: circa 8 minutes replag on ixia :(
  • 15:30 brion: adding missing 'zh' variant URL alias (used for default or something? there's a tab for it)
  • 15:25 brion: updating all main.conf wiki entries w/ zh variant alias entries (most missing from all but wikipedia.org and wikisource.org) for bugzilla:19019
  • 13:32 mark: Shutdown khaldun
  • 13:04 brion_: restarted parser test loop on wikitech [fixme: make an init script]
  • 13:01 brion_: parser tests not running since oct 27; looks like wikitech was rebooted then. checking....
  • 10:24 logmsgbot: andrew synchronized php-1.5/extensions/GlobalUsage/SpecialGlobalUsage.php 'Deploy r58752'
  • 10:21 logmsgbot: andrew synchronized php-1.5/extensions/GlobalUsage/GlobalUsage_body.php 'Deploy r58752'
  • 09:50 Andrew: Set up globalusage for all wikis
  • 09:48 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 09:28 apergos: cleaned out some space from /tmp and ran apt-get clean on srv218, these should probably become a cron job
  • 09:27 logmsgbot: andrew synchronized php-1.5/includes/ImagePage.php 'Merge r58692'
  • 09:23 Andrew: scapping to update GlobalUsage
  • 09:20 brion: srv218: rsync: write failed on "/apache/common/php-1.5/wmf-config/InitialiseSettings.php": No space left on device (28)
  • 09:19 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding TimedText namespace on Commons for mdale subtitle awesomeness'
  • 08:43 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Activated LiquidThreads on MediaWiki.org'
  • 08:39 Andrew: Sticking LiquidThreads in opt-in mode on mediawiki.org
  • 05:54 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'putting db1 and db21 live with jafrru dataset'
  • 03:58 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db1 and db21 for new image load'
  • 01:04 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'putting db21 back live'
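
The `--maxlag=5` parameter in the 16:04 and 16:09 entries suggests throttling the population job on replication lag. A minimal sketch of that pattern is below. It assumes the option means "pause while lag exceeds N seconds"; it is not the code of refreshGlobalimagelinks.php, and `run_throttled`, `get_lag` and `process` are hypothetical names.

```python
import time


def run_throttled(batches, get_lag, process, max_lag=5.0, wait=1.0, sleep=time.sleep):
    """Process batches one at a time, pausing while replication lag is too high.

    get_lag  -- callable returning the current replica lag in seconds (assumed)
    process  -- callable that writes one batch to the master (assumed)
    Returns the number of batches processed.
    """
    done = 0
    for batch in batches:
        # Let the replicas catch up before issuing more writes.
        while get_lag() > max_lag:
            sleep(wait)
        process(batch)
        done += 1
    return done
```

Checking lag before every batch keeps a long-running insert job from pushing a replica such as ixia minutes behind, at the cost of the job taking longer to finish.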

November 7

  • 22:14 mark: Started puppet on sq44
  • 22:12 mark: Unstuck puppet on srv149
  • 22:02 mark: Fixed puppet on ixia
  • 21:48 mark: Installed puppet on yongle
  • 21:26 mark: Rolling dist-upgrade finished
  • 21:12 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'removing db21 from rotation - will be used for migration source'
  • 18:35 mark: Moved planet.wikimedia.org DNS CNAMEs to singer
  • 18:35 mark: Setup planet.wikimedia.org instance on singer
  • 17:09 mark: Moved noc.wikimedia.org CNAME to fenari
  • 17:09 mark: Setup noc.wikimedia.org instance on fenari, using puppet
  • 14:57 mark: Doing rolling dist-upgrade of all application servers
  • 14:13 mark: Removed optsview APT repository on srv225
  • 14:12 mark: Removed optsview APT repository on srv224
  • 13:36 mark: Manually upgrading puppet across the cluster to help itself get past a catalog bug
  • 13:03 mark: Setup puppet to automatically upgrade wikimedia-task-appserver on all application servers; in this case it will roll out package php5-geoip across the cluster
  • 09:12 logmsgbot: ariel synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable Collection on strategy wiki per bug 21361'

November 6

  • 23:45 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php 'dropping 2008 theme'
  • 23:39 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php 'dropping 2008 theme'
  • 23:19 mark: Replaced the Webrick puppetmaster setup on sockpuppet by an Apache/Mongrel setup with 4 children to deal with scalability problems
  • 21:29 mark: Increased active record DB connection pool size from 5 to 50... by just editing the ruby library file
  • 18:54 mark: db10 replication caught up with db9
  • 18:08 mark: Set up puppet to run "apt-get update" on every run (for now, may make that somewhat less aggressive if needed)
  • 17:10 logmsgbot: andrew synchronized php-1.5/wmf-config/ExtensionMessages.php
  • 17:08 mark: Setup puppet to exchange all hostkeys between all servers
  • 17:07 logmsgbot: andrew synchronized php-1.5/includes/LinksUpdate.php
  • 17:05 Andrew: Scapping
  • 17:05 Andrew: Set up GlobalUsage on test.wikipedia.org
  • 16:55 mark: Set up puppet to upgrade itself everywhere
  • 16:40 mark: Fixed permission problem in the MySQL data dir, restarted replication on db10
  • 16:37 mark: Fixed LVM problem on db10
  • 16:01 mark: Rebooting db10: OOM errors
  • 15:59 mark: Replication db9->db10 broken again
  • 15:56 mark: Set up MySQL database 'puppet' on db9/db10

November 5

  • 19:59 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/donate_interface/donate_interface.i18n.php
  • 19:51 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/donate_interface/donate_interface.php
  • 18:48 mark: Setup puppet Stored Configurations with Stomp queuing; made a custom /usr/local/sbin/puppetqd since it was missing in the puppetmaster package.
  • 17:58 Rob: transcode1-3 drac setup, servers racked, not in racktables yet, no os yet.
  • 17:42 mark: Upgraded distribution of sockpuppet to 9.10 (Karmic)
  • 17:35 Rob: pushing dns changes for mgmt on transcode1-3
  • 16:54 mark: Upgraded puppetmaster on sockpuppet to version 0.25.1
  • 15:35 mark: Killed all Nagios processes on bart
  • 15:28 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disabled InitialiseSettings.php again on all wikis, except testwiki. Pending HTTP proxy setup and security review.'
  • 15:19 mark: Moved secure.wikimedia.org service IP from bart to singer
  • 15:13 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabling upload by URL for sysops on all wikis'
  • 15:09 mark: Moved NameVirtualHost entries to /etc/apache2/conf.d/namevirtualhost on singer, to stop Apache from complaining
  • 15:06 mark: Set up secure.wikimedia.org vhost on singer, SSL proxying to the apaches
  • 14:49 logmsgbot: mark synchronized php-1.5/wmf-config/CommonSettings.php 'Add singer to the SquidServers list'

November 4

  • 23:03 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php
  • 21:09 mark: Stopped most services on khaldun, to prepare for decommissioning
  • 21:06 mark: Setup csw5-pmtpa to use brewster as dhcp ip helper
  • 21:05 mark: Stopped and deinstalled dhcp3-server on khaldun
  • 21:04 mark: Setup new dhcpd on brewster using puppet. Host entries are now factored out to files /etc/dhcp3/linux-host-entries*
  • 18:48 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Disabling paypal portion until contrib tracking works correctly'
  • 18:46 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.body.php
  • 17:21 RoanKattouw: Running cleanupTitles on frwikisource
  • 14:50 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/donate_interface/donate_interface.php
  • 14:31 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.php
  • 13:54 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.php
  • 13:54 RoanKattouw: Updating Special:BrokenRedirects, Special:DoubleRedirects and Special:OrphanedPages on frwikisource; community request related to mass cleanup
  • 13:47 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/paypal_gateway/paypal_gateway.php 'trying again as cluster didn't pickup change'
  • 13:38 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/paypal_gateway/paypal_gateway.php
  • 13:38 logmsgbot: tfinc synchronized php-1.5/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.body.php
  • 13:24 RoanKattouw: tomaszf scapped

November 3

  • 23:47 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb
  • 23:23 Tim: deploying trusted XFF update from r58506
  • 23:12 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'removing db28'
  • 23:10 Tim: restarted udp2log on locke, sampled-1000.log was empty
  • 19:32 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/NewMessagesController.php 'Deploy r58494'
  • 16:12 mark: Recovered torrus from its deadlocked state
  • 15:24 mark: Setup APT repository and preseeding repository on brewster, using puppet
  • 15:21 mark: Moved apt.wikimedia.org DNS CNAME from khaldun to brewster
  • 15:17 mark: Shutdown BGP session to AS 16150; no connectivity to wikitech.wikimedia.org
  • 15:14 mark: Added --timeout=30 to the wget invocations in the post-commit script on svn.wikimedia.org
  • 14:17 Andrew: Scapping
  • 14:09 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'swapped dead memcache from srv217 to srv156'
  • 14:06 Andrew: Will scap to update LiquidThreads in a few minutes

November 2

  • 19:30 logmsgbot: aaron synchronized php-1.5/includes/FileDeleteForm.php 'deployed r58429'
  • 14:12 Fred: We are running out of memcache spare boxes... Need some cleanup of the "downed memcached"
  • 14:12 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'swapped dead memcache from srv146 to srv217'

November 1

  • 19:41 rainman-sr: started search indexer on searchidx1
  • 19:21 mark: Renamed db6 to locke in DNS and racktables
  • 19:04 mark: Powercycled frozen server searchidx1
  • 18:59 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Changed proxy host for file-by-url testing on test.wikipedia.org from khaldun to brewster'
  • 18:59 mark: Stopped oprofile on srv218 and removed all samples; its disk was full
  • 18:54 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:53 mark: Setup Squid proxy on brewster, and setup selective proxying for security.ubuntu.com through it on all hosts, using puppet
  • 17:44 mark: Set up tftp server on brewster (using puppet)
  • 16:54 mark: Added ubuntu.wikimedia.org DNS entry
  • 16:45 mark: Set up Ubuntu mirror on brewster using puppet
  • 16:45 mark: Fixed Domas's IPC cleanup cron job to not run if there are no stale ipc semaphores
  • 12:48 river: allocated vlan 301 to the toolserver, 10.23.1.0/24 (we're out of public IPs)

October 31

  • 11:18 logmsgbot: catrope synchronized php-1.5/includes/MagicWord.php 'Removing MagicWordArray live hack; problem was found and fixed yesterday'
  • 09:47 Raymond_afk: ParserTests for CodeReview does not run. Last run for r58231 2009-10-27
  • 00:48 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r58380'

October 30

  • 23:11 RoanKattouw: Rebuilt l10ncache on srv194, seems to have fixed it. Presumably caused by an incomplete run of a sync script caused by permissions errors (see private-l)
  • 23:00 Fred: depooled srv194 while testing / fixing
  • 22:08 Fred: actually stopped apache on srv194 since the restart did not help. Investigating....
  • 22:07 Fred: restarted apache on srv194 since it was throwing a lot of "is not a valid magic thingie for xxxx"
  • 20:05 Fred: scapped as root to resolve cache problems on some boxes.
  • 20:02 logmsgbot: fvassard ran sync-common-all
  • 19:50 RoanKattouw: Scapping in the hopes of resolving the MagicWordsArray exception; looks like it could be a problem with localisation cache
  • 19:37 logmsgbot: catrope synchronized php-1.5/includes/MagicWord.php 'Live hack for debugging: make MagicWordArray::parseMatch parameter not found error more verbose'
  • 18:36 logmsgbot: catrope synchronized php-1.5/extensions/LocalisationUpdate/LocalisationUpdate.class.php 'Deploy r58359 (LocalisationUpdate fix for languages with a '-' in their name)'
  • 18:33 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Per bug 19865, enable the enhanced toolbar by default on enwikinews and disable the OptIn extension'
  • 16:39 RoanKattouw: sync-common-all threw lots of errors like srv175: rsync: failed to set times on "/usr/local/apache/common-local/wmf-deployment/includes/.svn": Operation not permitted (1)
  • 16:38 logmsgbot: catrope ran sync-common-all
  • 16:36 RoanKattouw: Scapping so srv194 will receive updates
  • 15:34 apergos: cleared out some files in /tmp and ran apt-get clean on srv194 to get back about 1gb space on /
  • 14:08 RoanKattouw: @wikimediatech feed on identi.ca has been broken for the past 3 days
  • 14:01 RoanKattouw: srv194 has full disk, causing syncs to fail; throws PHP exception "MagicWordArray::parseMatch: parameter not found"
  • 11:47 rainman-sr: searchidx1 ssh dead, shows 100% I/O, disk dead?

October 29

  • 22:49 logmsgbot: catrope synchronized php-1.5/includes/upload/UploadFromUrl.php 'Re-sync for srv226'
  • 22:48 RoanKattouw: Scapping again for previous deployment because srv226 wouldn't let me log in at that time
  • 20:27 logmsgbot: catrope synchronized php-1.5/includes/upload/UploadFromUrl.php 'Deploy r58339 (upload API fix)'
  • 18:51 mark: Cleaned up lighttpd.conf on khaldun
  • 17:27 mark: OS-installed brewster.wikimedia.org
  • 07:53 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'removing tidy from wmf wiki since it's really in CommonSettings.php'
  • 07:51 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'testing disabling tidy on wmfwiki .. now one level higher '
  • 07:44 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'testing disabling tidy on wmf'
  • 06:15 domas: deployed ipc semaphore cleanup cron via puppet
  • 05:51 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'adding new donate skin for wmf wiki'
  • 05:44 logmsgbot: tfinc synchronized php-1.5/extensions/skins/Donate/Donate.class.php

October 28

  • 20:58 Tim: deploying r58282, SecurePoll update
  • 18:17 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding vector as default skin for outreachwiki'
  • 18:00 mark: Found an upload squid in the text squid configuration list, causing these 404s - removed
  • 17:56 mark: Removed stripping of caching headers from frontend squids - needed for debugging
  • 17:50 logmsgbot: mark synchronized live-1.5/404.php
  • 17:46 logmsgbot: mark synchronized docroot/wikipedia.org/w/404.php
  • 17:28 Fred: (re)started apache on srv219,234
  • 16:44 rainman-sr: reverted r58235 on search1/4 because it seems to confuse pybal and make it depool them randomly
  • 16:33 mark: Increased memory cache of text frontend squids from 10 to 50 MB
  • 02:26 Tim: deploying r58239, LiquidThreads revert
  • 01:44 Tim: trying to use debugging symbols on srv189, shutting down apache there temporarily

October 27

  • 21:14 logmsgbot: tstarling synchronized wmf-deployment/cache/interwiki.cdb
  • 21:13 Tim: fixed outreach.wikimedia.org
  • 21:12 logmsgbot: tstarling ran sync-common-all
  • 21:10 hcatlin: pushed fix to native-iphone-app html... cleared memcached and.... a *momentary meltdown* occurred
  • 21:07 RoanKattouw: Last scap deployed r58216
  • 21:06 logmsgbot: tstarling ran sync-common-all
  • 20:59 RoanKattouw: Running LocalisationUpdate/update.php on test
  • 20:58 RoanKattouw: running svn up on test
  • 20:54 Tim: deployed r58215
  • 20:46 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Fix config for strategywiki and usabilitywiki so the enhanced toolbar is enabled by default again'
  • 20:18 logmsgbot: midom synchronized php-1.5/includes/memcached-client.php 'bumping up timeout to 0.1s up from 0.05s - as we do have some megabyte sized objects....'
  • 19:32 mark: Ran "deluser catrope" across the cluster to prompt puppet to recreate
  • 19:30 mark: Fixed admins.pp in puppet, "managehome" attribute had disappeared
  • 17:16 logmsgbot: midom synchronized php-1.5/languages/LanguageConverter.php
  • 10:14 logmsgbot: midom synchronized php-1.5/StartProfiler.php
  • 10:08 logmsgbot: midom synchronized php-1.5/languages/LanguageConverter.php 'oops, this is not entirely right, livehacking for now'
  • 09:58 logmsgbot: midom synchronized php-1.5/languages/LanguageConverter.php 'push locking change live'
  • 07:45 domas: rolled live memcached changes, read/write timeouts down from 1s to 50ms, connect timeouts from 3x10ms with backoff to 2x10ms with no backoff, and fixed some host blacklist bug.
  • 07:42 logmsgbot: midom synchronized php-1.5/includes/memcached-client.php 'HERE WE GO MEMCACHED FIXES'
  • 06:05 domas: fixed perms in survey.wikimedia.org's /srv/org/wikimedia/survey/tmp/ , as well as set display_errors to off, in case there's more incompetence around ;-)
  • 01:39 rainman-sr: turned back on highlighting on en/de/fr, turned off interwiki search on smaller wikis ... we need more servers to cope with increase in traffic on large wikis
  • 01:11 atglenn: disabled search2 from lvs3 pybal config at rainman's request (it had load 21)
  • 01:01 rainman-sr: could someone please remove search2 from lvs3 search group ASAP
  • 00:15 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/SpecialNewMessages.php 'Deploy r58176'
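
Note on the 07:45 entry: the timeout policy described there (two connect attempts of 10ms each with no backoff, then a 50ms read/write timeout) can be sketched as follows. This is an illustration only, not the code in memcached-client.php; `connect_with_retries` and the `connector` parameter are hypothetical names, and the read/write value was later raised to 0.1s per the 20:18 entry.

```python
import socket

CONNECT_ATTEMPTS = 2     # "2x10ms with no backoff"
CONNECT_TIMEOUT = 0.010  # seconds per connect attempt
IO_TIMEOUT = 0.050       # read/write timeout in seconds


def connect_with_retries(host, port,
                         attempts=CONNECT_ATTEMPTS,
                         connect_timeout=CONNECT_TIMEOUT,
                         io_timeout=IO_TIMEOUT,
                         connector=socket.create_connection):
    """Connect with a fixed number of tries and no sleep between them.

    `connector` is injectable (an assumption for testability); it must
    accept (address, timeout=...) like socket.create_connection.
    """
    last_error = None
    for _ in range(attempts):
        try:
            sock = connector((host, port), timeout=connect_timeout)
        except OSError as exc:
            # No backoff: go straight to the next attempt.
            last_error = exc
            continue
        # Once connected, switch to the read/write timeout.
        sock.settimeout(io_timeout)
        return sock
    raise ConnectionError(
        f"giving up on {host}:{port} after {attempts} attempts"
    ) from last_error
```

A caller that gets `ConnectionError` would treat the host as down for that request rather than wait longer.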

October 26

  • 20:45 Andrew: scapping to update LiquidThreads
  • 20:16 Andrew: Going to update LiquidThreads to trunk state in a few minutes
  • 16:08 rainman-sr: overloads all around, turned off en/de/fr wiki highlighting so that searches don't time out
  • 11:10 hcatlin: reworked mobile1's config so that it's more standardized and more of the config is in the repo
  • 08:53 domas: updated nagios to reflect changed server roles
  • 08:43 domas: dewiki is now separate cluster, s5, replication switch over done at http://p.defau.lt/?kfvvlNOc4TkJ_6SCAVe6mg
  • 08:42 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'dewiki readwrite'
  • 08:40 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'restructuring s2dewiki into s5'
  • 08:38 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'dewiki read-only'
  • 07:57 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'entirely separating dewiki slaves'
  • 06:54 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out db4 for copy to db23'
  • 05:45 logmsgbot: midom synchronized php-1.5/wmf-config/db.php

October 25

  • 15:23 domas: converting usability initiative tables to InnoDB...
  • 13:23 domas: set up snapshot rotation on db10
  • 12:36 hcatlin: mobile1: created init.d/cluster to correct USR1 sig problem, fully updated sys ops on wikitech
  • 12:03 domas: Mark, I'm sure you'll like that! ;-p~
  • 12:02 domas: started sq43 without /dev/sdd COSS store (manual conf hack)
  • 11:54 domas: removed ns3 from nagios, added ns1
  • 11:45 domas: bounced ns1 too, was affected by selective-answer leak (same number as ns0, btw, 507!) ages ago, just not noticed by nagios. this seems to resolve some slowness I noticed a few times.
  • 11:41 domas: bounced pdns on ns0, was affected by selective-answer leak

October 24

  • 16:49 rainman-sr: decreasing maximal number of search hits per request (e.g. page) to 50
  • 16:40 apergos: re-enabled zfs replication from ms1 to ms5, set to 20 minute intervals now, keeping an eye on it to see if we have failures in running to completion
  • 13:28 rainman-sr: finished restructuring en.wp, continuing with normal incremental search updates
  • 11:50 domas: removed hardy-backports from fenari sources.list, added bzr ppa to sources.list.d/bzrppa.list

October 23

  • 23:37 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb
  • 23:31 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb
  • 23:24 Tim: updating TrustedXFF (bolt browser)
  • 22:36 domas: db28 has multiple fan failures (LOM is finally able to do something :) - still needs datacenter ops
  • 22:20 domas: db28 is a toast, needs cold restart by datacenter ops, LOM not able to do anything
  • 22:20 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db28 dead'
  • 11:17 domas: Fixed skip-list of cached query pages, was broken for past two months :)
  • 10:54 logmsgbot: midom synchronized php-1.5/thumb.php 'removing livehack'
  • 10:52 domas: rotating logs becomes difficult when they become too big, so they continue to grow indefinitely! db20 / nearly full, loooots of /var/log/remote ;-)
  • 10:39 domas: who watches the watchers? :) rrdtool process on spence was using 8G of memory. :-) http://p.defau.lt/?NOGuiw1ht_9_4KmtD3r4lA
  • 10:24 domas: semaphore leaks made some apaches fail, failed apache in rendering farm was not depooled, thus having 404 handler serve plenty of "can't connect to host" broken thumbs.
  • 10:12 domas: apparently there're intermittent connection failures from ms4 to scalers
  • 09:56 logmsgbot: midom synchronized php-1.5/thumb.php 'error header livehack'
  • 04:04 domas: noticed intermittent network failure inside pmtpa - most of input on this has been one stalling SSH session and spence failure
  • 04:01 domas: switched the jobs table on db22 with an empty one; the old one had just a few noop entries and five million invalidated rows... hit an interesting (but probably easy to fix) performance problem in the mtr_memo_release/mtr_commit code inside MySQL :)
  • 03:16 Fred: restarted powerdns on ns2 to kill some zombies with a double tap :p

October 22

  • 21:48 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/language/FlaggedRevs.i18n.php 'deploy r58038'
  • 16:25 Andrew: Updating LiquidThreads to trunk state again
  • 13:34 Andrew: Updating LiquidThreads to trunk state, scapping.

October 21

  • 22:30 Tim: upgraded libpoppler2 on all apaches
  • 21:20 Tim: updating ubuntu mirror
  • 20:50 Tim: apt-get upgrade on pdf1 for USN-850-1
  • 19:59 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 13:37 rainman-sr: restarting search incremental update on all wikis, will sync to search servers when updates catch up

October 20

  • 20:16 RoanKattouw: Brion synced r57957
  • 19:59 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '21023 Allow bureaucrats to remove sysop flag on Simple English Wikiquote'
  • 15:54 hcatlin: Deployed changes to S60. Stopping redirect for Nokia Series60, because it needs more work.
  • 14:38 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/api/ApiThreadAction.php 'Merge r57945'
  • 14:32 Andrew: Updating LiquidThreads again, scapping
  • 14:03 Andrew: Updating LiquidThreads, scapping
  • 12:20 hcatlin: Updated Common.js on en.wiki to add in redirects for the new batch of supported devices
  • 11:57 hcatlin: deploying updated software to mobile1
  • 04:39 Tim: fixing static.wikipedia.org, broken due to lack of DumpHTML extension in wmf-deployment
  • 00:42 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19865 en.Wikinews would like to make Vector the default skin'
  • 00:37 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php 'Bug 20361'

October 19

  • 23:12 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/api/ApiThreadAction.php 'Merge r57930'
  • 23:05 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.js 'Merge r57928'
  • 23:05 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/api/ApiThreadAction.php 'Merge r57928'
  • 23:04 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Merge r57928'
  • 22:52 rainman-sr: starting complete rebuild of en.wp index
  • 22:31 atglenn: turned off replication on ms1 to ms5 (it was failing), running manually to catch up
  • 22:27 Fred: restarted apache on srv[105,117,173]
  • 22:19 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.js 'Merge r57924'
  • 21:56 Andrew: Scapping
  • 21:44 Andrew: Updating LiquidThreads to trunk state again for bugfixes and ajax improvements.
  • 19:50 rainman-sr: stopping search incremental updates to rebuild en.wp index for increased efficiency
  • 19:39 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 19:37 logmsgbot: tstarling synchronized php-1.5/includes/User.php 'r57910'
  • 19:36 logmsgbot: tstarling synchronized php-1.5/includes/DefaultSettings.php 'r57910'
  • 18:54 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:47 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'rate limit exclusion for seminar'
  • 14:50 logmsgbot: andrew synchronized php-1.5/wmf-config/mc.php 'Swap out srv156'
  • 14:29 Andrew: Wikitech is insanely slow :)
  • 14:27 Andrew: Scapping
  • 14:25 Andrew: Updating LiquidThreads installation to trunk state.
  • 13:33 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Deployed AbuseFilter to LiquidThreads Labs Site'

October 17

  • 06:45 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deploy r57856'
  • 05:12 logmsgbot: aaron synchronized php-1.5/includes/api/ApiQueryWatchlist.php 'deployed r57852'
  • 04:57 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r57850'
  • 04:57 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r57850'
  • 04:27 logmsgbot: aaron synchronized php-1.5/extensions/CodeReview/backend/CodeTestSuite.php 'deployed r57848'
  • 04:22 logmsgbot: aaron synchronized php-1.5/includes/api/ApiQueryWatchlist.php 'deployed r57846'
  • 03:58 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/RevisionReview_body.php 'deployed r57843'

October 16

  • 19:41 Andrew: Scapping
  • 19:38 Andrew: updating LiquidThreads to trunk state (r57832)
  • 16:55 logmsgbot: andrew synchronized php-1.5/extensions/UsabilityInitiative/js/js2.combined.min.js
  • 05:53 Tim: disabled password auth for sshd on mayflower, to simplify debugging of authentication failures for new users
  • 01:05 domas: fixed locke, brought udp2log up in screen, and collector somewhere else :)
  • 00:59 domas: next time someone leaves NIC above HDD in boot order....
  • 00:47 domas: locke stalled, connected to DRAC via db6.mgmt, runs installer after reboot...

October 15

  • 20:53 tomaszf: adding payments.wikimedia.org to loudon
  • 17:04 RoanKattouw: Getting LQT-related DB errors on prototype, trying update.php
  • 16:02 Tim: updated extdist snapshot
  • 10:50 logmsgbot: aaron synchronized php-1.5/includes/LogEventsList.php 'deployed r57767'
  • 09:29 logmsgbot: andrew synchronized php-1.5/includes/filerepo/ArchivedFile.php 'Deployed r57755, fixes for fatal errors introduced in r57602 by blindly copy/pasting code.'
  • 09:27 logmsgbot: andrew synchronized php-1.5/includes/filerepo/OldLocalFile.php
  • 07:12 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/Stabilization_body.php 'done, reverted'
  • 06:39 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/Stabilization_body.php
  • 06:39 aaron: investigating small flaggedpages tracking bug
  • 04:43 apergos: manually removed the oldest dumps for the fr, ru, en, es, pt wikis to get a bit more room on storage2 (until the monitor catches up with the config change on new dumps)
  • 04:08 Tim: running apt-get upgrade on mayflower
  • 04:01 apergos: changed number of dumps we keep from 10 to 9 since storage2 was low on space (99% = 75G left)
  • 03:07 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php
  • 03:04 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/StablePages_body.php 'deployed r57735, r57736'
  • 03:03 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php 'deployed r57735, r57736'
  • 03:03 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevsXML.php 'deployed r57735, r57736'
  • 03:03 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r57735, r57736'
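
Note on the 04:01 and 04:43 entries: keeping only the newest N dumps per wiki amounts to the selection below. This is a sketch under assumptions, not the dump monitor's own logic; it assumes each dump is identified by a sortable date string such as YYYYMMDD, and `dumps_to_remove` is a hypothetical name.

```python
def dumps_to_remove(dump_dates, keep=9):
    """Return the dump identifiers that fall outside the newest `keep`.

    Assumes identifiers sort chronologically (e.g. 'YYYYMMDD' strings).
    """
    ordered = sorted(dump_dates)
    if keep <= 0:
        return ordered
    # Everything except the last `keep` entries is eligible for removal.
    return ordered[:-keep]
```

Dropping `keep` from 10 to 9 therefore frees exactly one dump per wiki that already had ten.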

October 14

  • 23:26 Fred: created http://wikitech-static.wikimedia.org/ as a static dump of wikitech (read faster, but read-only)
  • 21:27 brion: r57727 wmf-deployment loaded but scap delayed due to usability test (?!)
  • 20:56 logmsgbot: tfinc synchronized php-1.5/extensions/FundraiserPortal/FundraiserPortal.php 'pushing changes from r57629'
  • 20:54 logmsgbot: tfinc synchronized php-1.5/extensions/FundraiserPortal/Templates/Tourmaline.css
  • 20:54 logmsgbot: tfinc synchronized php-1.5/extensions/FundraiserPortal/Templates/Sapphire.css
  • 20:54 logmsgbot: tfinc synchronized php-1.5/extensions/FundraiserPortal/Templates/RubyText.css
  • 20:54 logmsgbot: tfinc synchronized php-1.5/extensions/FundraiserPortal/Templates/Ruby.css
  • 19:46 mark: Deployed squid config to sq43 which was running with an outdated squid.conf. This would explain the weird 404 thumbnail problem, because sq43 was contacting eiximenis which was an upload squid but is now a text squid...
  • 18:00 Fred: rebooting sq43.
  • 16:49 Fred: sq47 has a bad sdd drive. Adjusted the squid config to reflect.
  • 16:17 Fred: restarted a couple of downed apaches and Squids.
  • 16:08 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'removed srv144 and replaced it with srv215.'
  • 12:20 hcatlin: deploying WAP support (thedj) and Nokia Series60 support to mobile1
  • 12:17 mark: Removed full copy snapshot 'copy-snap' on ms3
  • 09:26 Andrew: Scapping to deploy LiquidThreads updates

October 13

  • 22:48 rainman-sr: turn off interwiki search for "other wikis" to help out with en.wp overload during peak times
  • 16:32 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/LiquidThreads.php 'Merge r57673'
  • 16:29 Andrew: (Note: In opt-in mode per-page.)
  • 16:29 Andrew: Deployed LiquidThreads to the strategy wiki
  • 16:27 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 16:23 Andrew: Preparing to deploy LiquidThreads to strategy wiki, in opt-in mode per page.
  • 16:15 Andrew: Enabled LiquidThreads on test.wikipedia.org, came across some fun interactions with drafts and turned it off again.
  • 15:53 logmsgbot: andrew synchronized php-1.5/extensions/OAI/OAIRepo_body.php '-m Merge r57672'
  • 15:52 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/LiquidThreads.php '-m Merge r57672'
  • 14:52 logmsgbot: andrew synchronized php-1.5/extensions/OAI/OAIRepo_body.php 'Merge r57666'
  • 14:51 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/LiquidThreads.php 'Merge r57667'
  • 14:50 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Hooks.php 'Merge r57667'
  • 13:38 mark: Removed yaseo from Squid configs, and set CARP weight of eiximenis to 40
  • 11:41 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Forgot the talk namespace'
  • 11:40 Andrew: Added 'Multimedia' namespace to usability wiki due to private request from somebody working on the Ford grant.
  • 11:39 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 11:10 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Fix for dumping/search'
  • 06:53 domas: off to Japan (until Oct22) \o/

October 12

  • 23:19 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'flaggedrevs lab config'
  • 16:39 hcatlin: ...to mobile1
  • 16:38 hcatlin: Pushed out Akan language support.
  • 16:09 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/TalkpageView.php 'Deployed r57656'
  • 16:08 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deployed r57655'
  • 16:07 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.i18n.php 'Deployed r57655'
  • 14:24 hcatlin: updated code on mobile1. a couple minutes of downtime due to mailer config errors.
  • 14:22 Andrew: Deployed r57651 (LiquidThreads)
  • 14:22 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php

October 11

  • 20:17 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'patrol for autopatrolled group on frwiki'
  • 14:02 mark: Cleaned up BGP sessions on br1-knams
  • 13:06 hcatlin: updated mobile1 with one-hour expiration of cache to see if that helps anything.
  • 03:23 Tim: restarted thin on mobile1, apparently it didn't reconnect to memcached at all when memcached came back up
  • 03:10 Tim: in light of bug 20653 (complete lack of cache invalidation), reduced memcached memory on mobile1 to 5GB, to reduce the effective expiry time. Will monitor CPU but past cache clear events suggest it won't be a problem.

October 10

  • 21:40 domas: nuking htmlCacheUpdate jobs for Biography_articles_with_listas_parameter (a hidden category)...
  • 20:39 logmsgbot: midom synchronized php-1.5/includes/LogEventsList.php
  • 18:16 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r57620'
  • 18:15 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php 'deployed r57620'
  • 09:05 domas: resolving job queue constipation: alter table job drop key job_cmd, add key job_cmd (job_cmd,job_namespace,job_title,job_params(128));
  • 02:18 logmsgbot: aaron synchronized php-1.5/includes/LogEventsList.php 'deployed r57602'
  • 02:17 logmsgbot: aaron synchronized php-1.5/includes/filerepo/OldLocalFile.php 'deployed r57602'
  • 02:17 logmsgbot: aaron synchronized php-1.5/includes/filerepo/ArchivedFile.php 'deployed r57602'

October 9

  • 21:55 hcatlin_: restarted the mobile1 cluster
  • 21:37 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'crank wgClickTrackThrottle to 1:1 to see what happens'
  • 21:18 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'bump ClickTracking throttle to 1:100 edit reqs'
  • 21:10 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 21:10 brion: turning on ClickTracking at 1:1000 sampling sitewide
  • 21:03 logmsgbot: andrew synchronized php-1.5/includes/ChangesList.php
  • 20:44 rainman-sr: running /home/rainman/scripts/build-new liquidthreads_labswikimedia on searchidx1
  • 17:57 brion: installing DB tables for ClickTracking ext on all wikis
  • 17:15 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deploy r57572'
  • 15:58 Andrew: Updating LiquidThreads to trunk state @r57567
  • 14:12 Rob: pushed updates to some blog software on blog and techblog
  • 05:43 Fred: restarted thin on mobile1.
  • 05:39 apergos: restarted memcached on mobile1 (it had died), ... thin server is being an expletive deleted though
  • 05:16 Fred: mobile1 doesn't have a swap partition for some reason...
  • 05:15 Fred: mobile1 went OOM
  • 05:07 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'autopromote logging in RC for plwiki'
  • 00:04 brion: cranked ClickTracking to 1:1 on test
  • 00:03 brion: testing ClickTracking on testwiki: 1/1000 sampling

October 8

  • 23:52 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb
  • 23:50 Tim: updating TrustedXFF
  • 23:35 logmsgbot: brion synchronized php-1.5/extensions/UsabilityInitiative/ClickTracking/ClickTracking.php
  • 20:03 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'profiling host change'
  • 19:56 logmsgbot: andrew synchronized php-1.5/includes/Revision.php 'Deploy r57530, fix for leaking of oversighted data'
  • 19:46 logmsgbot: andrew synchronized php-1.5/includes/Revision.php 'Deploy r57530, fix for leaking of oversighted data'
  • 18:55 brion: fixed thumb rendering on test.wikipedia; caveat that thumb.php reqs are sent to synced copy, not nfs master
  • 18:54 logmsgbot: brion ran sync-common-all
  • 18:50 brion: sync-common-all to update MWVersion.php w/ hackaround for test thumbs.php
  • 18:50 logmsgbot: brion ran sync-common-all
  • 18:45 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '21059 Enable patrol rights on commons now with autopatrolled'
  • 18:40 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:39 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:39 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 18:25 mark: Installed server dobson
  • 17:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '21059 Enable patrol rights on commons'
  • 16:31 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deployed r57525'
  • 16:14 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deploy r57520-3'
  • 15:15 Fred: rebooting srv156
  • 15:14 Andrew: 4-cpu apaches seem to be out of memory and swapping
  • 15:04 Andrew: Scapping
  • 15:03 Andrew: Applying LiquidThreads schema change to liquidthreads_labswikimedia and updating/deploying LiquidThreads to r57518 state.
  • 14:52 mark: Redeployed ntp server on linne using puppet
  • 14:32 mark: DNS cleanup: removed all traces of yaseo, and 66.230.200.0/24
  • 14:26 mark: Unbound 66.230.200.10/24 range from csw5-pmtpa interface ve5.
  • 14:24 mark: Removed ip 66.230.200.234 from zwinger
  • 13:38 mark: Restarting all ServerTech CDUs for NTP configuration change to take effect
  • 07:16 apergos: shot the pr_query_count queries on dbs 17 18 19 21 25 as domas did his sync, they are all cleaned up now
  • 07:02 logmsgbot: midom synchronized php-1.5/extensions/ProofreadPage/ProofreadPage.php 'wmf:r57507'
  • 06:00 Fred: restarted srv100-139-150 for good measure since apache was not running.

October 7

  • 22:51 AaronSchulz: running populateLogSearch.php to fill suppression log search filter data
  • 22:39 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php 'deployed r57501'
  • 21:30 Fred: powercycling pdf1 -> gone unresponsive even from serial console
  • 21:29 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialContributions.php
  • 21:27 brion: pdf1 unreachable
  • 20:12 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'deleted content rights for checkusers on frwiki'
  • 20:11 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'deleted content rights for checkusers on frwiki'
  • 19:41 Fred: ganglia 3.1.2 has been installed on spence. Configuration still in progress, but mostly done. http://ganglia3.wikimedia.org
  • 19:28 Rob: resynced nagios for all the removed and decommissioned servers
  • 19:28 Rob: decommissioned srv142 & srv143. wiped, unracked
  • 19:12 Rob: srv85 decommissioned
  • 19:09 Rob: srv148 dead, decommissioned, removed from dsh, lvs, and server roles
  • 19:04 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'removed second level for now'
  • 19:00 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'update flaggedrevs_labs wiki config'
  • 18:59 Rob: decommissioning srv51, removed from server roles, node groups, lvs, and wiping in rack
  • 18:56 Rob: decommissioned srv48, srv49, srv50. removed from node groups, server roles, wiped, unracked.
  • 18:50 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r57480'
  • 18:40 mark: Installed puppet on grosley
  • 18:39 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialContributions.php 'sync r57479 spacing fix for contribs tools links'
  • 18:34 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'update flaggedrevs_labs wiki config'
  • 18:33 brion: cleared flaggedrevs, flaggedpage_config tables on flaggedrevs_labwikimedia: prep for new config test
  • 18:28 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r57475'
  • 18:28 mark: Manually updated ntp.conf on isidore
  • 18:18 logmsgbot: brion synchronized php-1.5/includes/OutputPage.php 'r57473 fix for fatals in edge case'
  • 18:12 mark: Manually updated ntp.conf on srv1, srv2 and bart
  • 18:06 logmsgbot: brion synchronized php-1.5/includes/HistoryPage.php 'r57471 perm fix for show/hide revs box on history page'
  • 17:46 logmsgbot: brion synchronized php-1.5/includes/OutputPage.php 'sync r57468 fix -- mdale changes to script loading broke wgStylePath again'
  • 17:46 mark: Installed puppet on loudon and locke
  • 17:43 mark: Installed puppet on sq1
  • 17:41 mark: Installed puppet on tridge
  • 17:28 brion: js2-borne code is breaking wikibits.js again; mdale please make sure $wgStylePath is respected
  • 17:22 brion: during update some brief borkage with missing stylesheets. inconsistency in file update order? seems ok now
  • 16:58 Rob: decommissioning srv48, srv49, srv50. removed from server_roles, node groups, running wipe in rack
  • 16:52 Rob: srv81 crashed, decommissioning, removed from node groups, server roles, lvs, wiping in rack
  • 16:35 Rob: pulled srv44-srv47 from rack, decommissioned
  • 16:17 Rob: srv31 already wiped, unracked
  • 16:15 mark: Installed puppet on singer
  • 16:14 Rob: srv31 still racked, but is crashed, decommissioning.
  • 16:12 mark: Installed puppet on streber
  • 16:10 Rob: decommissioning srv137, pulled from rack, wiped, pulled from nodegroups and lvs
  • 16:09 mark: Installed puppet on mchenry and sanger
  • 16:08 mark: Removed puppet again on isidore; it's ubuntu 7.04 and really needs to be replaced soon
  • 16:06 mark: Installed puppet on isidore
  • 16:05 mark: Installed puppet on pdf1
  • 16:00 mark: Installed puppet on bayle and bayes
  • 15:59 mark: Installed puppet on db20
  • 15:58 Rob: negate that, not depooling.
  • 15:58 Rob: depooling sq43 until later today when i can poke at it.
  • 15:55 Rob: srv131 decommissioning, removed from lvs, node groups and server roles, wiping in rack.
  • 15:49 mark: Installed puppet on spence and williams
  • 15:47 mark: Installed puppet on lvs2-4
  • 15:42 Rob: redeployed sq43
  • 15:41 mark: Installed puppet on storage1-3
  • 15:35 Rob: srv130 was down, out of warranty, decommissioned, removed from node groups and lvs, wiping in rack
  • 15:32 mark: Shutdown srv9 for decommissioning
  • 15:31 mark: Installed puppet on amane and erzurumi
  • 15:25 mark: Installed puppet on fenari and mobile1
  • 15:20 mark: Installed puppet on ms2 and ms3
  • 15:19 Rob: srv122 decommissioning, pulled from nodes and lvs, wiping in rack
  • 15:10 mark: Installed puppet on hume
  • 15:10 Rob: srv90 pulled, decommissioned, wiped, and out of node groups and nagios
  • 15:07 Rob: lucene group in nagios sync was halting it, since no hosts exist in that group anymore. commented out group in nagios conf.php and synced.
  • 15:00 Rob: server rose decommissioning, removed from nodegroups, pulled from rack, removed from server roles
  • 14:57 Rob: srv118 running wipe, removed network connection.
  • 14:56 Rob: srv118 down, reboots for no good reason, out of warranty, decommissioning, removed from LVS, nodegroups, wiping in rack
  • 14:43 Rob: decommissioned coronelli, pulled from rack, removed from server roles, removed from node groups
  • 14:39 Rob: decommissioned maurus, pulled from rack, removed from server roles, removed from node groups
  • 02:20 Tim: hard reset of mobile1, was not responding on ssh or serial
  • 02:18 brion: mobile1 been down for a bit; Tim is poking at remote console to reboot it
  • 01:54 brion: testing svn up on test

October 6

  • 23:24 logmsgbot: brion synchronized php-1.5/extensions/OggHandler/OggHandler.php 'update default search path for oggThumb'
  • 22:49 logmsgbot: brion synchronized php-1.5/extensions/OggHandler/OggHandler_body.php 'update with oggThumb support'
  • 22:49 logmsgbot: brion synchronized php-1.5/extensions/OggHandler/OggHandler.php
  • 17:24 Fred: Loudon.w.o has been commissioned for use by the Usability team for a couple days/weeks.
  • 15:47 logmsgbot: robh ran sync-common-all
  • 15:11 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20973 Additional transwiki import sources on Incubator'
  • 15:06 Rob: updating dns for ro.planet
  • 14:49 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20986 Please change the Bengali Wikibooks sitename'
  • 14:39 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabling uploads on vewikimedia'
  • 14:34 logmsgbot: robh ran sync-common-all
  • 14:31 logmsgbot: robh ran sync-common-all
  • 13:53 mark: Fixed srv237
  • 12:42 mark: Fixed situation for home-mounted apaches in puppet
  • 11:57 mark: scap
  • 11:53 mark: Shut down coronelli, rose, maurus: these should all be decommissioned
  • 11:44 mark: Fixed up search node group
  • 11:31 mark: Made puppet manage ganglia on all core dbs
  • 11:16 mark: Installed puppet on adler as well ;)
  • 11:10 mark: Installed puppet on all core databases

October 5

  • 23:20 mark: Deployed net NTP client configuration on all application servers and all squids (via puppet)
  • 20:21 Andrew: Deploying LiquidThreads updates
  • 19:50 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'boring'
  • 18:58 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'load test'
  • 17:37 domas: fixed the slack at db10. what were sysadmins doing?!
  • 17:36 mark: Set up NTP server on linne, to replace zwinger

October 4

  • 18:35ish aaron: deployed r57261
  • 15:31 domas: implemented a mutex for update-special-pages jobs
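
Note on the 15:31 entry: the log does not say how the mutex for update-special-pages jobs was implemented. One common way to keep two runs of the same job from overlapping is an advisory file lock. The sketch below assumes a Unix host; the function names and the lock path are hypothetical, not taken from the actual scripts.

```python
import fcntl
import os


def try_lock(path):
    """Return an fd holding an exclusive lock on `path`, or None if it is held."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def run_exclusive(path, job):
    """Run job() only if nobody else holds the lock; return True if it ran."""
    fd = try_lock(path)
    if fd is None:
        return False
    try:
        job()
    finally:
        release(fd)
    return True
```

A cron wrapper could call `run_exclusive("/tmp/update-special-pages.lock", job)` (path is an assumption) and simply exit when it returns False, so a slow run is never joined by a second one.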

October 3

  • 13:29 hcatlin: cleared out cache and logs on mobile1

October 2

  • 20:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/db.php 'adding db22 back to the pool'
  • 20:34 logmsgbot: jeluf synchronized php-1.5/wmf-config/db.php 'take db22 out of the pool for a few minutes'
  • 20:26 JeLuF: killed two long running sql queries on db22
  • 17:19 atglenn: updated puppet files: don't manage homedirs on test.wikipedia
  • 14:57 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/NewUserMessagesView.php
  • 14:57 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/SpecialNewMessages.php

October 1

  • 20:33 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20926'
  • 19:13 Andrew: Handy hint: To recache extension messages, you have to scap with the extension in extension-list *before* you turn it on on aawiki. Otherwise the require_once() fails because it's already been included.
  • 19:12 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php
  • 19:10 Andrew: Fixing localisation NavigableTOC bug
  • 18:32 logmsgbot: brion synchronized php-1.5/wmf-config/extension-list
  • 16:03 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Merge r57230'
  • 15:53 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Fix infinite loop, deploy r57227'
  • 15:18 mark: Made www.nl.wikimedia.org work as a redirect
  • 15:18 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Add missing / :))'
  • 15:13 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deploy r57225'
  • 14:47 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/DeletionController.php 'Deploy r57224.'
  • 14:23 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deploy r57220'
  • 14:23 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/DeletionController.php 'Deploy r57220'
  • 14:22 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deploy r57220'
  • 13:00 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'r57215'
  • 12:55 logmsgbot: andrew synchronized php-1.5/includes/db/Database.php
  • 12:47 domas: andrew keeps failing!!!! :-)
  • 12:45 logmsgbot: andrew synchronized php-1.5/includes/db/Database.php
  • 12:42 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php
  • 12:40 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php
  • 12:40 logmsgbot: andrew synchronized php-1.5/includes/db/Database.php
  • 12:38 logmsgbot: andrew synchronized php-1.5/includes/db/Database.php
  • 10:59 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/NewUserMessagesView.php 'Deploy 57207'
  • 10:58 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deploy 57207'
  • 10:29 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/TalkpageView.php 'Deploy 57202'
  • 10:28 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Hooks.php 'Deploy 57202'
  • 10:20 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php 'Deploy r57200'
  • 10:19 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deploy r57200'
  • 09:26 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Thread.php
  • 09:04 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/NewMessagesController.php 'Deploy r57193'
  • 08:51 domas: mod_rewrite's shm-based mutex consumed all shm semaphore array slots on multiple apache servers, causing them all to die
  • 08:23 Andrew: Scapping again with fixed scap script.
  • 08:13 Andrew: scap script invocation of mergeMessageFileList.php was broken, was trying to invoke mergeMessageList.php (which doesn't exist). Fixed.
  • 07:52 Andrew: Scapping to fix LiquidThreads missing messages. Solution was in a post to private-l by Tim, subject "[Private-l] Increased bandwidth usage on internal network since yesterday's scap", (add the new extension to $IP/wmf-config/extension-list)
  • 02:59 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r57186'
  • 02:12 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r57183'

September 30

  • 23:06 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Deploy r57169'
  • 21:35 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'depooled srv123 and replaced it with srv214'
  • 21:26 brion: memcached 10.0.2.123:11000 down
  • 21:23 atglenn: hard restart of apache on srv115, fred shot the localisation cache rebuilds also.. it's happier now
  • 21:12 brion: apache2ctl via dsh
  • 21:10 brion: apache-restart-all is still broken for puppet hosts
  • 20:59 tomaszf: running static html dump on wikitech.linode
  • 20:50 Andrew: Uh, make that [andrew@zwinger ~]$ FANOUT=16 dsh -f -N mediawiki-installation php /apache/common/wmf-deployment/maintenance/rebuildLocalisationCache.php --force --threads=8
  • 20:50 Andrew: Running to fix LiquidThreads localisation issue: srv167: bash: /apache/common/wmf-deployment/maintenance/rebuildLocalisationCache.php: Permission denied
  • 19:48 Andrew: Scapping to deploy updated LiquidThreads localisations
  • 19:43 Andrew: Running l10nupdate on zwinger to update for LiquidThreads localisation.
  • 19:37 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/TalkpageView.php 'Deploy r57145'
  • 19:36 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/pages/ThreadPermalinkView.php 'Deploy r57145'
  • 19:31 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Turning on LQT on liquidthreads.labs.wikimedia.org'
  • 19:21 logmsgbot: andrew ran sync-common-all
  • 19:18 Andrew: scapping to create liquidthreads_labswikimedia
  • 19:10 Andrew: Setting up LiquidThreads labs wiki liquidthreads.labs.wikimedia.org
  • 18:58 logmsgbot: aaron synchronized php-1.5/includes/specials/SpecialRevisiondelete.php 'deployed r57131'
  • 17:27 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting duallicense on strategyappswiki'
  • 16:27 Fred: restarting pdns on Lilly (too many defunct processes)
  • 16:10 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php 'deployed r57112'
  • 15:32 Rob: fixed the issue regarding centralauth, first spotted by inability to create new accounts by email
  • 15:32 logmsgbot: robh ran sync-common-all
  • 14:30 Rob: had a duplicate 'strategyappwiki' database for the strategyappswiki by mistake, backed it up and dropped it, as strategyappSwiki is the correct database already in place.
  • 13:45 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Duplicate namespace id 110 for Participants/Participans_talk, only search the former'
  • 13:41 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable default searching of Participants namespaces on strategyappswiki'

September 29

  • 21:33 tomaszf: added ActiveAbstract to wmf-deployment and ran sync-common on srv225. xml snapshots are now generating abstracts properly
  • 19:34 logmsgbot: fvassard ran sync-common-all
  • 19:09 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable MWReleases on mediawiki.org'
  • 19:09 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 19:06 brion: cli scripts now loading AdminSettings.php again. Ideally should be restructured a bit
  • 19:05 logmsgbot: brion synchronized php-1.5/wmf-config/PrivateSettings.php
  • 18:28 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'Adding config parameters for fi.wikimedia.org'
  • 18:28 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:14 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable UserDailyContribs'
  • 17:47 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db25 back'
  • 17:29 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialUserrights.php 'regression fix'
  • 17:09 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r57062'
  • 16:55 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/UnreviewedPages_body.php 'deployed r57058'
  • 16:54 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/OldReviewedPages_body.php 'deployed r57058'
  • 16:18 Fred: restarted apache on srv114-168. It appears the servers went OOM
  • 14:29 mark: reloading asw-a5-sdtpa
  • 14:16 mark: reloading asw-a5-sdtpa
  • 14:04 mark: asw-a5-sdtpa switch reload went wrong, was asking for some license PROM. Reloaded previous software
  • 13:40 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'wait for db25 resync'
  • 13:38 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'wait for db25 resync'
  • 13:18 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php 'enabling a few other items on flaggedrevstestwiki weeee'
  • 13:11 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php 'enabling deleterevision for sysops on flaggedrevslabswiki'

September 28

  • 23:38 brion: UserDailyContribs needs fixing for deployment on mysql 4.0 :( easy fix :)
  • 23:35 brion: testing UserDailyContribs on test
  • 22:55 brion: setting up UserDailyContribs tables...
  • 21:43 atglenn: moved default kernel to the last entry in grub on ms6, see if that fixes the "boot in failsafe mode cause we feel like it" problem.
  • 21:41 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/maintenance/updateStats.inc 'deploy 57027'
  • 21:14 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/maintenance/updateStats.inc 'deployed r56315'
  • 20:39 domas: mobile1 was by the way serving only 404s, service was down
  • 20:32 domas: mobile1 was out of disk space, fixed/cleaned up, disabled access log, etc. it is really really sad when services are in such shape...

September 27

  • 16:07 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r56982'
  • 16:07 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r56982'
  • 04:54 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r56980'
  • 03:13 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/RevisionReview_body.php
  • 03:12 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php
  • 03:12 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php
  • 03:12 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deploy r56890, r56892, r56972'
  • 00:57 logmsgbot: midom synchronized php-1.5/includes/specials/SpecialMostlinked.php
  • 00:35 logmsgbot: midom synchronized php-1.5/includes/specials/SpecialMostlinked.php

September 26

  • 23:51 domas: db3 just... rebooted?
  • 21:35 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deploy r56958, r56959'

September 25

  • 22:03 Fred: double-escaped command for image_scaler cleanup in puppet.
  • 17:35 Fred: hard drive sdd on sq27 has gone bad, causing issues with 2nd Tier squid.
  • 17:16 Fred: puppetized cron job on the appserver cluster for LocalisationUpdate.
  • 15:16 logmsgbot: tstarling synchronized php-1.5/includes/Title.php 'r56919'
  • 15:05 Tim: automated the task, ran fixCleanupTitles/revertCleanupTitles.php to revert another 558 page moves
  • 13:19 Tim: repairing some more cleanupTitles.php damage from yesterday, using manual (text processed) queries to revert followed by namespaceDupes.php to fix

September 24

  • 23:38 brion: restarting rebuildLocalisationCache job @ 8 threads per box / 16 boxes at a time, with --force
  • 23:29 logmsgbot: brion synchronized php-1.5/extensions/LocalisationUpdate/LocalisationUpdate.class.php
  • 23:27 brion: running rebuildLocalisationCache on all boxen, dsh 8 at a time to avoid killing things
  • 23:05 brion: tested rebuildLocalisationCache on srv235
  • 23:02 logmsgbot: brion synchronized php-1.5/maintenance/Maintenance.php 'fix for maint scripts memory_limit'
  • 22:39 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'try general rollout of LU, see what happens'
  • 22:20 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabled LU on incubator, wikinews/books/tionary/versity/source'
  • 21:49 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'removing stub LUDependency def'
  • 21:39 atglenn: upgraded ilom firmware on ms6, rebooted again, updated racktables with sn and ip addrs, back to normal now
  • 19:04 logmsgbot: brion synchronized php-1.5/extensions/LocalisationUpdate/update.php 'rem mem limit'
  • 18:57 brion: attempting a run of LocalisationUpdate updater script...
  • 18:56 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'LU on on test & aa'
  • 18:55 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'update LU config'
  • 18:29 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19707 Set up a logo for mhr.wikipedia'
  • 18:25 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20552 Add sources and enable transwiki import for pt.wikibooks'
  • 17:17 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 15829 Better Malayalam translation for Namespace'
  • 16:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20761 Set Portal namespace alias to NS_Portal in Bengali wikipedia'
  • 16:51 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20271 Give bureaucrats the ability to remove sysop rights at simplewiki'
  • 16:45 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20513 enable email on notify for watchlist on strategywiki'
  • 16:38 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20367 Set WP namespace alias to NS_PROJECT in Bengali wikipedia'
  • 08:05 logmsgbot: midom synchronized php-1.5/extensions/ConfirmEdit/FancyCaptcha.class.php
  • 05:42 Tim: [[Talk:Portal:...]] pages on bnwiki were corrupted due to half-completed namespace change (bug 20314) and cleanupTitles.php run. Fixed and deployed r56868.
  • 04:30 Tim: running cleanupTitles.php on all wikis
  • 04:25 Tim: running cleanupTitles.php on frwikinews
  • 04:23 Tim: updated maintenance directory to r56864.
  • 03:59 logmsgbot: aaron synchronized php-1.5/wmf-config/CommonSettings.php
  • 03:54 logmsgbot: aaron synchronized php-1.5/wmf-config/CommonSettings.php 'added feedback size'
  • 03:48 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'set feedback size default'
  • 03:45 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'cleaned up feedback vars a bit'
  • 03:41 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'cleaned up feedback vars a bit'
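
Note on the 23:27 and 23:38 entries: the rebuild was run through dsh with a limited fanout so that only a few boxes were busy at once. The idea of capping concurrency across a host list can be sketched in Python as follows. This is an illustration only, not how dsh works internally; `run_on_hosts` and `fake_runner` are hypothetical names, and the runner here executes nothing remotely.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, runner, fanout=16):
    """Call runner(host) for every host, with at most `fanout` in flight."""
    if fanout < 1:
        raise ValueError("fanout must be >= 1")
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # map() yields results in the same order as the input hosts.
        return dict(zip(hosts, pool.map(runner, hosts)))


def fake_runner(host):
    """Stand-in for a remote command; a real runner might shell out to ssh."""
    return f"{host}: ok"


if __name__ == "__main__":
    hosts = [f"srv{n}" for n in range(100, 140)]
    results = run_on_hosts(hosts, fake_runner, fanout=8)
    print(len(results), "hosts done")
```

A lower fanout trades total run time for less simultaneous load on shared resources.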

September 23

  • 22:11 tomaszf: copying zwinger ssh keys to fenari
  • 21:09 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'strategyappswiki readerfeedback threshold set'
  • 19:39 Rob: slated erzurumi for internal use as it was only running pdf export, which now exists on pdf1
  • 19:36 Rob: updated dns for erzurumi internal ip
  • 16:35 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'reverted hack'
  • 16:30 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'Created aliases for RUWIKINEWS for Project and Portal.'
  • 16:24 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'zhwikisource test'
  • 16:15 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'Reducing $wgFeedDiffCutoff due to reports of OOM'
  • 10:18 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding ixia'
  • 00:59 logmsgbot: aaron synchronized php-1.5/extensions/ReaderFeedback/specialpages/RatedPages_body.php 'deployed r56490, r56496'

September 22

  • 23:37 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialUpload.php 'testing second fix to commonist upload bug'
  • 22:59 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialUpload.php 'testing possible upload bot workaround'
  • 22:11 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:39 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'disable l10nupdate on test as well'
  • 21:36 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'emergency hack -- empty LUDependency class'
  • 21:34 brion: hitting rebuildLocalisationCache
  • 21:33 brion: LocalisationUpdate pollutes the localisation cache and breaks when you disable it
  • 21:31 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'LU back off to confirm cpu usage bug'
  • 21:20 brion: attempting to run updates on LocalisationUpdate
  • 21:18 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabling LocalisationUpdate pre-update; make sure it dont explode'
  • 21:01 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'l10nupdate settings'
  • 21:00 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'l10nupdate settings'
  • 20:50 brion: setting up l10nwiki databases on each DB cluster to hold LocalisationUpdate tables
  • 20:30 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'testing config var for db stuff'
  • 19:58 apergos: rebooting ms6, try to unstick ipmi interface
  • 18:23 brion: running LocalisationUpdate from current trunk for testwiki
  • 18:18 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'Testing LocalisationUpdate on test.wikipedia again...'
  • 18:15 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'thanks brion'
  • 18:06 brion: ok, bad proofread queries killed on most/all s3 boxen
  • 17:53 logmsgbot: brion synchronized php-1.5/extensions/ProofreadPage/ProofreadPage.php
  • 17:37 brion: removing db query fix for ProofreadPage; runs too slow. may need version revert
  • 17:33 logmsgbot: brion synchronized php-1.5/extensions/ProofreadPage/ProofreadPage.php 'testing db query fix, fixed'
  • 17:32 logmsgbot: brion synchronized php-1.5/extensions/ProofreadPage/ProofreadPage.php 'testing db query fix'
  • 17:31 brion: we found a buglet in ProofreadPage (ambiguous join in query), should be resolved shortly
  • 16:46 brion: setting up pr_index tables for ProofreadPage update
  • 16:44 brion: noting that DB admin credentials aren't currently configured right on live deployment. (AdminSettings needs migration?)
  • 16:42 logmsgbot: brion synchronized php-1.5/maintenance/sql.php
  • 12:48 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out ixia for hardy upgrade'
  • 10:27 mark: Switched back traffic, so far so good
  • 08:41 mark: Capacity test, DNS scenario knams-down
  • 06:48 mark: Redistributed LVS weights of upload squids
  • 06:39 mark: Redistributed LVS weights of text squids
  • 00:39 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialMovepage.php 'testing fix for image redir bug'

September 21

  • 21:20 logmsgbot: andrew synchronized php-1.5/includes/upload/UploadBase.php 'Deploy r56734'
  • 21:02 Andrew: Deployed r56631 on test, waiting for test to sync it
  • 19:44 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'removed deleterevision ability on sysops on simplewiki due to issues listed in 20186 and 18780'
  • 19:40 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'removed deleterevision ability on sysops due to issues listed in 20186 and 18780'
  • 19:04 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'wgAllowImageMoving on by default'
  • 18:57 logmsgbot: andrew synchronized php-1.5/extensions/CodeReview/ui/CodeRevisionView.php 'Deploy r56728, blocking/codereview interaction bugfix'
  • 18:36 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20632 Activate collection extension on itwiki'
  • 18:34 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19819 Activate revisiondelete for sysops on simple wiki'
  • 18:32 Fred: srv217 disabled in LVS since it is unstable.
  • 18:06 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 19697 Activate RevisionDelete for sysops on dewiki'
  • 17:47 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20331 Enable RevisionDelete on Polish Wikipedia'
  • 17:38 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'updates for bug 20314 adding portal namespace to bnwiki, 20322 setting local time on vecwiki, and 20324 setting some namespacealiases on vecwiki'
  • 16:48 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20505 added namespaces to trwikinews'
  • 16:44 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'updated logo for ptwiki per bug 20659'
  • 16:35 Rob: that's arwikiquote, hit enter too soon.
  • 16:35 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 19060 logochange on arwiki'
  • 16:33 Rob: spoke too soon, forgot to sync some files, restarting apache processes once more
  • 16:33 Rob: Fred found a number of apaches not in node lists, added them back in, then i ran updates so hopefully strategyappswiki will work properly without redirections
  • 16:21 logmsgbot: andrew synchronized php-1.5/maintenance/cleanupTable.inc
  • 16:20 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20334 correction'
  • 16:18 mark: Installed puppet on sq16-40, which increases maximum FDs from 32k to 64k
  • 16:15 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20527 Adding Project namespace in eowiki'
  • 16:12 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'bug 20291 disable obsolete MakeSysop and MakeBot extensions'
  • 16:06 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20334 namespace additions and aliases for viwikisource'
  • 15:57 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding ixia again'
  • 15:47 Rob: done resyncing that file over and over for this moment, stratappswiki has proper settings now
  • 15:47 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding all three had an issue for stratappswiki, adding them back one at a time'
  • 15:45 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding all three had an issue for stratappswiki, adding them back one at a time'
  • 15:44 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding all three had an issue for stratappswiki, adding them back one at a time'
  • 15:43 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding three new readerfeedback rating options for strategyappswiki'
  • 15:42 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding three new readerfeedback rating options for strategyappswiki'
  • 11:36 Andrew: cleanupTitles.php stopped because I installed new internet, rerunning, this time in a screen on zwinger.
  • 10:15 Andrew: Running cleanupTitles.php on all wikis
  • 07:06 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'removing ixia, it got lagged as wasnt part of lag tests'
  • 02:58 tomaszf: decreasing retention rate on storage2 to 10 instead of 15 for snapshots
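
Note on the 02:58 entry: lowering snapshot retention from 15 to 10 amounts to keeping only the 10 newest snapshots and deleting the rest. The log does not show the tool or naming scheme used on storage2, so the following is only a minimal sketch under assumptions: the function name `snapshots_to_prune` and the date-stamped snapshot names are hypothetical, and names are assumed to sort chronologically.

```python
def snapshots_to_prune(names, keep=10):
    """Return the snapshot names to delete, oldest first.

    Assumes (hypothetically) that names sort chronologically,
    e.g. 'snap-2009-09-21'. The `keep` newest names are retained.
    """
    ordered = sorted(names)
    if keep <= 0:
        return ordered          # nothing is retained
    if len(ordered) <= keep:
        return []               # already within the retention limit
    return ordered[:-keep]      # everything except the `keep` newest


if __name__ == "__main__":
    # Fifteen hypothetical daily snapshots, matching the old retention of 15.
    existing = [f"snap-2009-09-{day:02d}" for day in range(1, 16)]
    for name in snapshots_to_prune(existing, keep=10):
        print("would delete", name)
```

With 15 existing snapshots and `keep=10`, the five oldest are selected for deletion.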

September 20

  • 20:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 're-adding ixia'
  • 16:13 domas: apparently LVM doesn't like snapshot volumes getting full - either wrong policy set, or a bug
  • 16:08 domas: LVM snapshots on ixia deadlocked i/o: "[5284280.064801] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception " - meminfo: http://p.defau.lt/?eytJOh1FNy_Fnf9dwyQyjg
  • 16:06 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 01:49 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'removing feasibility and impact readerfeedback categories from strategyappswiki'

September 19

  • 16:29 logmsgbot: aaron synchronized php-1.5/includes/filerepo/ArchivedFile.php 'deployed r56641'
  • 15:00 domas: saper is doing package installs on ptolemy, consult apt logs for details
  • 13:57 mark: Installed puppet on sq16
  • 13:56 mark: Fixed puppet site.pp
  • 13:42 hcatlin: mobile stats redeployed and page-error fixed: http://stats.m.wikipedia.org/
  • 12:29 hcatlin: reconfigured nginx on mobile1 to use the new cap sources as the static file source instead of the deprecated /srv/wikimedia folder
  • 12:20 hcatlin: pushing up a fix to mobile1 to correct mw-headline changes made to skin.php that broke sectioning on mobile

September 18

  • 17:57 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'updated logo location for specieswiki.'
  • 17:32 Rob: replaced bad hdd/3 in sq43
  • 17:32 Rob: replaced bad hdd in db18
  • 17:32 Rob: replaced bad hdd in srv148
  • 17:32 Rob: replaced bad power supply on srv245
  • 13:57 logmsgbot: andrew synchronized php-1.5/includes/HTMLForm.php 'Merge r56184, bugfix for preferences regression caused by new Html class with different parameter ordering.'
  • 11:42 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding reinstalled db5'
  • 10:39 Andrew: (UsabilityInitiative toolbar brokenness). Also, turning the toolbar back on.
  • 10:39 Andrew: Scapping to deploy r56592.
  • 10:02 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'ixia readded, copy is done from snapshot anyway hehe'
  • 09:26 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'taking out ixia for db5 copy'
  • 09:11 domas: nuked db5 disk array, rebuilding system with raid10
  • 08:56 domas: (threshold for autocommit is 30 minutes)
  • 08:54 domas: wmf-config autocommit is in midom@fenari crontab
  • 05:30 logmsgbot: tstarling synchronized php-1.5/includes/LocalisationCache.php 'added profiling sections for isExpired and initLanguage'
  • 05:12 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'fixed /tmp/mediawiki references'
  • 05:10 Tim: removed the old /tmp/mediawiki cache directories, were using about 1.4 GB per server
  • 05:08 Tim: remerged the l10n cache directories back into a single directory, /tmp/mw-cache. Fixed lack of local message cache (probably caused huge memcached traffic).
  • 05:06 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 05:06 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php
  • 01:50 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/api/ApiReview.php 'deployed r56576'
  • 01:49 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/RevisionReview_body.php 'deployed r56576'

September 17

  • 21:37 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'Disabling new edit toolbar until problems resolved'
  • 20:58 brion: removing software update sitenotice; most problems cleared out by now
  • 20:58 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 20:21 logmsgbot: aaron synchronized php-1.5/includes/LogPage.php 'deployed r56564'
  • 20:07 logmsgbot: andrew synchronized php-1.5/includes/upload/UploadFromStash.php
  • 19:17 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.php 'deployed r56553'
  • 19:16 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.class.php 'deployed r56553'
  • 19:16 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedRevs.hooks.php 'deployed r56553'
  • 17:35 logmsgbot: aaron synchronized php-1.5/extensions/ReaderFeedback/specialpages/RatingHistory_body.php 'deploy r56545'
  • 17:24 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialSearch.php 'fix ordering of search creation link'
  • 17:08 mark: Increased uplink capacity of asw-a5-sdtpa to csw1-sdtpa to 2x 1 Gbps trunk
  • 15:38 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/FlaggedArticle.php 'deployed 56529'
  • 15:29 logmsgbot: andrew synchronized php-1.5/includes/Skin.php '-m Deploying r56523, fix for hidden category issue'
  • 15:12 logmsgbot: andrew synchronized php-1.5/includes/Title.php 'Deploy r56521, fix for user css/js permissions check'
  • 15:02 logmsgbot: andrew synchronized php-1.5/languages/LanguageConverter.php 'Deploy r56520, fix for unconverted text causing blank pages'
  • 15:02 logmsgbot: aaron synchronized php-1.5/languages/LanguageConverter.php 'NOCC bug fix'
  • 14:36 Andrew: also deploying r56517, tooltip fixes
  • 14:35 Andrew: Scapping to deploy r56515 and r56516, fixes to CentralAuth global groups broken by a userrights API module.
  • 14:07 logmsgbot: andrew synchronized php-1.5/includes/EditPage.php 'Merge r56478, which we forgot to deploy'
  • 14:02 logmsgbot: andrew synchronized php-1.5/includes/parser/Parser.php
  • 13:01 logmsgbot: andrew synchronized php-1.5/extensions/CentralAuth/CentralAuthHooks.php 'Deploy r56509, regression spewing exceptions in CentralAuth'
  • 12:32 domas: db5 RAID has issues (1s await to start with)
  • 12:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 12:02 domas: mark confirmed network issue!!!!! I am happy!!!!!
  • 09:55 logmsgbot: andrew synchronized php-1.5/extensions/AbuseFilter/AbuseFilter.hooks.php 'Deploy r56500, fix for AbuseFilter regression'
  • 09:37 logmsgbot: andrew synchronized php-1.5/includes/Xml.php 'Deploy r56497'
  • 02:25 logmsgbot: aaron synchronized php-1.5/includes/specials/SpecialUserrights.php 'fix fatal - declare $wgOut'
  • 02:16 logmsgbot: aaron synchronized php-1.5/includes/templates/Userlogin.php
  • 02:08 logmsgbot: aaron synchronized php-1.5/includes/templates/Userlogin.php
  • 01:53 logmsgbot: aaron synchronized php-1.5/extensions/ReaderFeedback/specialpages/RatedPages_body.php 'Fixed usage of wrong messages'
  • 01:50 logmsgbot: aaron synchronized php-1.5/extensions/ReaderFeedback/specialpages/RatingHistory_body.php 'Fixed usage of wrong messages'
  • 01:45 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/ProblemChanges_body.php 'query tweaks'
  • 01:25 logmsgbot: aaron synchronized php-1.5/extensions/FlaggedRevs/specialpages/ProblemChanges_body.php 'fix index name'
  • 01:24 logmsgbot: brion synchronized php-1.5/extensions/FlaggedRevs/specialpages/ProblemChanges_body.php
  • 01:19 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 01:18 logmsgbot: brion synchronized php-1.5/includes/api/ApiUpload.php
  • 01:08 logmsgbot: brion synchronized php-1.5/extensions/FlaggedRevs/specialpages/ProblemChanges_body.php 'index fix'
  • 01:08 logmsgbot: brion synchronized php-1.5/includes/Linker.php 'fix HTML validity ("Array" in portal output)'
  • 01:01 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'wgHtml5 off for now to reduce new vars'
  • 01:01 logmsgbot: andrew synchronized php-1.5/includes/specials/SpecialContributions.php 'Deploy r56469'
  • 01:00 logmsgbot: andrew synchronized php-1.5/includes/User.php 'Deploying r56473'
  • 00:49 logmsgbot: brion synchronized php-1.5/includes/parser/Parser.php 'fix missing page problem'
  • 00:39 logmsgbot: brion synchronized php-1.5/includes/specials/SpecialContributions.php
  • 00:28 atglenn: tossed old oprofiled.log from srv187, back to 3gb free now
  • 00:25 logmsgbot: brion synchronized php-1.5/languages/LanguageConverter.php
  • 00:22 atglenn: apt-get clean on srv187, got back .5gb, / was full
  • 00:10 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'update sitenotice'

September 16

  • 23:53 logmsgbot: brion synchronized php-1.5/languages/classes/LanguageZh.php
  • 23:50 logmsgbot: brion synchronized php-1.5/languages/classes/LanguageZh.php
  • 23:41 logmsgbot: midom synchronized php-1.5/includes/LocalisationCache.php 'profiling'
  • 23:40 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'tweaking wgCachedirectory to /tmp/mediawiki/$site-$lang'
  • 23:32 brion: running a manual apache2ctl restart batch as apache-restart scripts are broken on many hosts (sudo)
  • 23:32 domas: where is my attribution :-)
  • 23:30 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting wgCacheDirectory to /tmp/mediawiki'
  • 23:28 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding wgCacheDirectory'
  • 23:19 brion: starting scap to wmf-deployment r56456
  • 23:07 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'updating with sitenotice for soft updates'
  • 22:03 brion: running cleanupTitles on testwiki
  • 21:53 brion: ok fixed the readerfeedback config i think
  • 21:08 brion: svn up'ing -- do not scap until things are confirmed working
  • 20:22 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'stratwiki api final fix i hope'
  • 20:21 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php 'removing private wiki overrides for api use'
  • 20:03 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'stratwiki api tinkering'
  • 19:57 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabling writeapi on stratappswiki'
  • 18:14 domas: set up 5xx logging (without upload and old query interface) at locke:/a/squid/5xx.log
  • 17:59 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 17:57 aZaFred_: snapshot[1..3] have been puppetized
  • 17:56 brion: mediawiki-installation group troubles have been worked out. thx rob & fred!
  • 17:56 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'updating w/ config preps for code update'
  • 17:39 brion: sync system is currently broken. bogus digits (9, 7, 6, 8) and not-quite-set-up snapshot* machines in mediawiki-installation group
  • 17:10 Rob: removed some outdated security plugins on blogs, updated some others
  • 15:51 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 15:44 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Updating logo for stratappswiki'
  • 14:54 mark: Moving traffic back to Europe - florida squids overloaded
  • 14:39 mark: Capacity test, DNS scenario knams-down
  • 14:37 Rob: moved masters from db13 to db15 with some major assistance (basically did it himself ;) from tim
  • 14:34 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php
  • 14:22 Rob: script had some issues, Tim is debugging
  • 14:22 Rob: yep, switching masters because db13 raid battery is dead.
  • 14:20 logmsgbot: robh synchronized php-1.5/wmf-config/db.php 'switching masters from db13 to db15'
  • 14:18 logmsgbot: robh synchronized php-1.5/wmf-config/db.php

September 15

  • 23:13 brion: applying patch-log_user_text.sql to newly created wikis: mhrwiki strategywiki uawikimedia cowikimedia ckbwiki pnbwiki mwlwiki acewiki trwikinews flaggedrevs_labswikimedia readerfeedback_labswikimedia strategyappswiki strategyappwiki
  • 23:10 brion: adding stub l10n_cache table to all wikis
  • 23:02 brion: checking to confirm log_page/log_user_text update is applied on all wikis
  • 23:01 tomaszf: installed memcache on sage.knams
  • 21:31 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'changing settings for readerfeedback on stratapps'
  • 21:12 atglenn: /home nfs mounted and added to fstab on srv124, new test.wikipedia
  • 21:09 domas: where is my attribution ;-D
  • 21:08 Rob: test.wikipedia.org fixed by mounting nfs
  • 20:55 Rob: setup new private wiki, added to dns as well as configuration files
  • 20:38 logmsgbot: robh ran sync-common-all
  • 19:58 Rob: servers running wipe were burdening the logging host. added drop rules to iptables on db20 to refuse those servers access since ssh doesn't work with wipe destroying things
  • 19:24 Rob: depooled srv124 to use as test.wikipedia.org, then updated squid config and pushed
  • 19:09 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'oh well'
  • 17:14 Rob: db12 is back online with mysql running
  • 16:41 Rob: installed python-pyfribidi on pdf1
  • 16:06 aZaFred_: snapshot[1..3] wikimedia-task-appserver install completed. Added hosts to dsh nodegroup for mediawiki-installation so common updates get pushed to them
  • 16:04 Rob: srv245 bad powersupply, swapped with on site spare
  • 15:59 Rob: rebooting srv245 to fix its drac access
  • 15:33 Rob: some odd invalid entries in dsh nodes, removed.
  • 15:24 Rob: running wipe on srv35, srv52, srv54, srv56
  • 15:20 Rob: srv66 running wipe
  • 15:12 mark: Started MySQL on ms2 and restarted replication
  • 15:11 mark: Redistributed the spare drives on ms2 back into the spare pool (/dev/md1)
  • 15:08 Rob: shutting down db12 for raid battery swap
  • 15:04 mark: Swapped drive c3t6d0 in ms2, readded it to /dev/md14 and moved back the spare /dev/sdao into the spare pool (/dev/md1)
  • 14:42 mark: Shutting down MySQL on ms2
  • 14:33 Rob: removed a number of decommissioned servers from nagios
  • 14:30 Rob: wipe running on srv44, srv45, srv47
  • 14:26 Rob: srv31, srv32, srv33 running wipe in screen sessions, do not try to use them ;]
  • 14:22 Rob: srv30-srv80 to be decommissioned, wiping the drives with them in rack now. mark already depooled from apache and memcached
  • 14:19 Rob: srv145 is back up and ok
  • 14:07 Rob: srv145 coming back up, my bad
  • 13:53 Rob: srv52 hdd died.
  • 13:51 domas: cron jobs work way better, if one figures how to set permissions right (like, executable? :)
  • 00:03 atglenn: grrr.. that would be ms4.
  • 00:03 atglenn: testing mailiferr (workaround for no MAILTO on solaris) on ms5 for hourly snaps

September 14

  • 23:26 atglenn: rerunning the rsync list of changed files again on ms6 (last run was borked). it's in screen as root
  • 23:23 aZaFred_: added nagios/ganglia monitoring for snapshot[1..3]
  • 22:32 atglenn: created a directory /export/upload/wikipedia/common/thumb with no write perms on ms1 so that static html dump ext doesn't try to write in it
  • 21:39 aZaFred_: upgrading Ubuntu on Sage to 8.04 LTS
  • 21:12 tomaszf: running timings for en static html snapshot from hume
  • 19:50 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 19:47 Rob: left out the virtual host, resyncing and restarting apaches
  • 19:44 logmsgbot: robh ran sync-common-all
  • 18:21 Rob: added volunteer.wikimedia.org to dns and pushed authdns-update
  • 18:03 Rob: added flagrevs stats update to hume crontab
  • 16:24 brion: restarted apache on wikitech

September 13

  • 21:03 mark: Converted aufs cache dir into COSS on eiximenis
  • 20:41 mark: Increased cache dir object size split from 512kb to 1MB
  • 17:45 domas: readded srv211 to pybal apaches list, either someone removed it by mistake, or didn't document the reason of removal
  • 16:57 domas: rolling out php5 5.2.4-2ubuntu5.7wm1 and php5-apc 3.1.3p1-1wm1 live on all apaches
  • 15:32 domas: rolling out limited php 5.2.4-2ubuntu5.7wm1 testing
  • 15:07 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'show draft by default on iawiki'
  • 08:58 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Set $wgFlaggedRevsAutoconfirm on dewiki'

September 12

  • 11:08 _3^3: srv230-srv250 are running now APC-3.1.3p1-1wm1
  • 08:53 _3^3: APC 3.1 is ~3-5% faster than 3.0 under load. Will roll out limited deployment for longer-term stability tests.
  • 08:29 mark: Rerouted pmtpa->esams traffic via hgtn-leaseweb

September 11

  • 19:14 domas: xcache was both slower and less stable (got corrupted cache within seconds, can we blame that on -O3? :))
  • 18:56 domas: (xcache is configured to have ttl of 600 seconds on all php objects.. ;-)
  • 18:55 domas: installed php5-xcache on srv250 (manual comment in apc.ini though...)
  • 18:39 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling Collection on cswiki per Bug 20436'
  • 06:25 domas: added api.php to wpstatic acl

September 10

  • 21:36 tomaszf: starting [a-z]* sync of commons images from ms5->belgrade
  • 19:30 tomaszf: starting [0-9]* sync of commons images from ms5->belgrade

September 9

  • 20:16 domas: unmounted /mnt/scratch on amane, /dev/sdb is gone long time ago.
  • 19:05 aZaFred_: modified puppetized cron job on the scalers to escape parenthesis in the find command
  • 18:14 domas: restarted bunch of evil apaches
  • 10:04 domas: noticed hawthorn flapping at http://torrus.wikimedia.org/torrus/Network?token=T0484
  • 05:00 Tim: restarted apache on srv108, APC cache corruption

September 8

  • 18:25 atglenn: made sure all scalers have /a/magick-tmp and are using it
  • 15:49 logmsgbot: midom synchronized php-1.5/includes/db/Database.php 'removing the verbose comment'
  • 12:25 logmsgbot: midom synchronized php-1.5/includes/db/Database.php 'verbosity hook yet again'
  • 10:42 logmsgbot: midom synchronized php-1.5/includes/Linker.php 'mft:55983 wmf:56025'
  • 10:32 logmsgbot: midom synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'wmf56024, trunk55392'
  • 10:21 logmsgbot: midom synchronized php-1.5/includes/db/Database.php
  • 10:09 logmsgbot: midom synchronized php-1.5/includes/db/Database.php 'sql comment hack'

September 7

  • 15:10 mark: Set AS 16265 weight to 100 on csw1-esams to balance traffic better
  • 14:27 mark: Restarted apache on srv171
  • 14:26 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Replace sub-srv81 nodes by spares'
  • 14:17 mark: Depooled all apaches below srv81 in LVS
  • 13:55 mark: Prepared srv43-srv47 for decommissioning
  • 13:51 mark: Converted srv100 to /home-less puppet image scaler
  • 13:44 mark: Depooled srv43-srv47 as image scalers in LVS
  • 13:32 mark: Converted srv221-srv224 to /home-less with puppet
  • 13:23 mark: Converted srv220 to /home-less with puppet
  • 13:14 mark: Installed cron script to remove gs temp files on srv219, using puppet
  • 13:14 mark: Converted srv219 to /home-less with puppet
  • 12:34 mark: Fixed ganglia configuration in puppet

September 6

  • 21:33 mark: Puppetised srv171-180
  • 21:17 mark: Puppetised srv161-170
  • 21:12 mark: Puppetised srv151-160
  • 20:54 mark: Restarted all job runners to use the local mediawiki copy instead of /home
  • 20:53 logmsgbot: mark synchronized php-1.5/maintenance/jobs-loop.sh 'Update path to use wmf-deployment branch'
  • 20:41 mark: Puppetised srv211-218
  • 20:38 mark: Puppetised srv201-srv210
  • 20:34 mark: Puppetised srv191-srv200
  • 20:30 mark: Puppetised srv181-srv190
  • 20:22 mark: Not puppetising srv151-160 because of job queue runners
  • 20:15 mark: Puppetised srv141-srv150
  • 20:11 mark: Puppetised srv131-srv140
  • 20:07 mark: Puppetised srv121-srv130
  • 20:02 mark: Puppetised srv111-srv120
  • 19:57 mark: Puppetised srv103-srv110
  • 19:46 mark: Puppetised srv102
  • 19:45 mark: Puppetised srv101
  • 19:13 mark: Converted eiximenis into a text squid
  • 16:38 logmsgbot: mark synchronized php-1.5/wmf-config/CommonSettings.php 'Added eiximenis to the Squid list, and removed obsolete entries'
  • 01:05 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'rollback group for simplewiktionary'

September 5

  • 23:47 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'desirability rating on strategywiki'
  • 23:36 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'rollback group for simplewiktionary'
  • 21:10 domas: amane has I/O failures
  • 20:13 mark: Pooled application servers srv226-253
  • 17:41 domas: svn switch --relocate http://svn.wikimedia.org/ svn+ssh://svn.wikimedia.org/
  • 16:34 apergos: cleared out some stuff in /tmp on srv220-224 to free up more space. this is only a TEMPORARY FIX
  • 16:29 hcatlin: Setup autoclearing logs on mobile1
  • 15:46 logmsgbot: midom synchronized php-1.5/includes/GlobalFunctions.php 'removing extended profiling'
  • 15:31 apergos: apt-get clean on a few (srv220-224) hosts to buy us a tiny bit of disk space
  • 13:13 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'added Proposal NS to default search on strategywiki'
  • 13:00 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'fixed ruwikiquote config'
  • 12:59 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'fixed ruwikiquote config'
  • 12:32 mark: Ganglia tmpfs filesystem on zwinger ran out of memory, removed some obsolete RRDs in there, and in the on-disk copy
  • 11:18 mark: Restarted confused gmetad_pmtpa on zwinger
  • 11:15 mark: Rebooting all new apache servers
  • 11:15 mark: Updating Nagios with the new servers failed. Could someone please update the docs?
  • 11:06 mark: Running apt-get dist-upgrade on all new servers
  • 10:54 mark: scap
  • 10:53 mark: Added new servers to dsh node groups
  • 10:34 mark: Installed puppet on srv227-srv253, signed all certificates
  • 10:30 mark: OS-installed srv227-srv253, except broken srv245
  • 08:27 mark: Installed srv226 as mediawiki application server
  • 08:15 mark: Removed 29 GB of access.log in /usr/logs/ on mobile1 (why is it there and not logrotated?). Restarted thin, memcached. Test URL does not work, but the site itself does seem to work. Please check/update the docs!

September 4

  • 22:27 brion: rerunning parser tests for revs that had negatives to clear out the false failures
  • 22:09 brion: added 1gig swap file to wikitech to see if that helps let the parser tests complete
  • 17:22 brion: started rsyncd on searchidx1 per robert's note on broken updates
  • 03:24 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'fixed wiki name typo'
  • 01:30 atglenn: turned on automatic zfs replication from ms1 to ms5; copying over incrementals every ten minutes out of cron

September 3

  • 21:35 atglenn: huh guess I should log this, gmond installed on ms5 (from old ts repo spec, v3.1 can't coexist with v3.0x), fred started collecting stats on it (yesterday)
  • 17:48 atglenn: first incremental zfs rep from ms1 to ms5, still using cat on the receiving end, mbuffer + zfs recv is just too slow
  • 16:43 mark: Installed gmond on mobile1
  • 16:39 mark: Power cycled mobile1
  • 16:37 brion: mobile1 not responding to ping, needs a reboot
  • 02:34 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'removed PovWatch'
  • 02:12 rainman-sr: rsyncd died for some mysterious reason on searchidx1, needs restarting

September 2

  • 22:01 atglenn: started zfs recv on ms5 from local copy of images data (local copy arrived complete yesterday pm)
  • 21:22 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'let bureaucrats remove Confirmed group'
  • 17:32 aZaFred_: fixed a couple of problems with planet on bart (fr and zh causing the python script to crash and stop updating the other "planets")

September 1

  • 22:42 logmsgbot: aaron synchronized php-1.5/wmf-config/abusefilter.php 'log detail rights for autoconfirmed on dewiki'
  • 21:31 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'removed unused flood flag added to eswikinews; eswikibooks was the correct wiki'
  • 20:54 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Re-sync for srv220'
  • 20:16 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php
  • 20:12 aaron: also renamed huwiki 'confirmed' group to 'trusted' (bug 19885)
  • 20:12 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Cleaned-up redundant settings and a few instances of improper setting of autopromote array'
  • 18:39 Fred: Added Ganglia stats for /export usage on export2
  • 18:39 Fred: Added Ganglia stats for /export usage
  • 18:27 Fred: rebooted srv156 after it went OOM
  • 13:56 mark: Power cycled mayflower
  • 09:07 domas: lily pdns was dead probably because of my tampering with defunct processes
  • 02:27 river: restarted pdns_recursor on lily as it had died somehow

August 31

  • 23:32 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'rollback group for arwiki'
  • 23:24 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'autopatrol for rollbackers on svwiki'
  • 22:02 logmsgbot: aaron synchronized php-1.5/wmf-config/abusefilter.php 'log detail rights for autoconfirmed on zhwiki'
  • 21:55 logmsgbot: aaron synchronized php-1.5/wmf-config/abusefilter.php 'itwiki abusefilter settings'
  • 20:26 logmsgbot: aaron synchronized php-1.5/wmf-config/abusefilter.php 'itwiki abusefilter settings'
  • 20:08 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'rollback, autoreviewer, and confirmed groups for zh_yuewiki'
  • 19:49 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'flood flag for eswikibooks'
  • 12:18 domas: restarted srv183 via drac

August 30

  • 19:49 aZaFred_: rebooting srv159
  • 19:47 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'srv159 is dead. replacing with srv218'
  • 19:44 aZaFred_: srv159 <-> srv218 in mc.php

August 28

  • 19:09 brion: raising upload file size limit on wikitech wiki so my presentation uploads :P
  • 17:41 logmsgbot: midom synchronized php-1.5/includes/OutputHandler.php 'per Tim and tests, set gz level to 6'
  • 17:39 domas: TimStarling was nearly right - compression level impact on mediawiki output: http://spreadsheets.google.com/pub?key=t-EjyzEfh0t39hoQbf-a5fw&output=html
  • 17:12 domas: comparison of failing vs working APC oprofiles: http://p.defau.lt/?1BWDb3VOalQ7d9zMaKhWcg
  • 17:11 domas: srv194 seems to be deciding not to use APC, somewhere at apc_cache_busy() or somewhere near - same problem seen few times before on other servers
  • 15:23 domas: compression that is, not encoding ;-)
  • 15:22 logmsgbot: midom synchronized php-1.5/includes/OutputHandler.php 'bumped up encoding from 3 to 9'
  • 15:12 Tim: added mobile1 to the tiertwo ACL so that it gets X-Cache headers
  • 15:09 logmsgbot: midom synchronized php-1.5/includes/AutoLoader.php 'removing profiling hook'
  • 14:50 logmsgbot: midom synchronized php-1.5/includes/OutputHandler.php 'ooooops'
  • 14:49 logmsgbot: midom synchronized php-1.5/includes/OutputHandler.php 'compression profiling hook'
  • 14:33 mark: Set up blocking of unnecessary http headers in squid
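
Note on the OutputHandler.php entries above (compression level 3, then 9, then 6): the trade-off being tested is output size versus CPU time at each gzip level. The linked spreadsheet holds the actual measurements; the following is only a minimal sketch of how such a comparison could be run, using Python's `zlib` rather than PHP. The helper name `compressed_sizes` and the sample payload are hypothetical stand-ins, not real MediaWiki output.

```python
import time
import zlib


def compressed_sizes(payload: bytes, levels=(3, 6, 9)) -> dict:
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


# Hypothetical stand-in for a rendered page; real tests would use actual output.
sample = b'<li><a href="/wiki/Example">Example</a> some article text</li>\n' * 2000

if __name__ == "__main__":
    for level in (3, 6, 9):
        start = time.perf_counter()
        size = len(zlib.compress(sample, level))
        elapsed = time.perf_counter() - start
        print(f"level {level}: {size} bytes, {elapsed * 1000:.2f} ms")
```

Results depend heavily on the payload, so no particular numbers should be read into this sketch; it only shows the shape of the measurement.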

August 27

  • 20:50 brion: rerunning past commit parser tests as i deleted them all by mistake :P
  • 20:22 brion: re-disabling pdfhandler on wikitech, something ain't rendering right yet
  • 20:15 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 20:14 rainman_: move eswiki highlighting from search4 to search11 to balance out the load a bit better
  • 20:01 brion: installing PdfHandler on wikitech; updating to current wmf-deployment
  • 19:52 brion: re-running parsertests for rev 55603 and above after updating php packages
  • 19:29 brion_: attempting to apt-get upgrade on wikitech, see if that helps crashy tests
  • 18:25 mark: Restarted squid frontend on eiximenis
  • 17:53 Tim: fixing missing blobs tables on ms3 cluster rc1
  • 17:44 Tim: running recompressTracked.sh on hume
  • 17:40 mark: Shutdown MySQL on ms2 for reboot
  • 17:28 mark: Replication on ms2 caught up with master ms3
  • 17:08 mark: Recovered data on ms2 by rebuilding raid-arrays, started mysql, started replication
  • 16:44 logmsgbot: midom synchronized php-1.5/wmf-config/InitialiseSettings.php 'remove optin for performance reasons'
  • 16:43 logmsgbot: midom synchronized php-1.5/wmf-config/InitialiseSettings.php 'remove optin for performance reasons'
  • 16:28 mark: Installed gmond on mayflower
  • 16:20 mark: Power cycled mayflower again
  • 16:06 mark: Recovering RAID arrays on ms2
  • 15:43 brion: updating to wmf-deployment r55629 -- fixes to api, (un)delete storage, and collection
  • 15:33 logmsgbot: tstarling synchronized php-1.5/includes/Revision.php 'r55628'
  • 15:26 logmsgbot: midom synchronized php-1.5/includes/AutoLoader.php 'profiling'
  • 15:10 domas: bunch of servers in srv190+ range have intermittent network problems, leading to connection failures, etc
  • 14:58 logmsgbot: tstarling synchronized php-1.5/includes/Revision.php 'testing archive issue pre-commit'
  • 10:03 mark: Power cycled mayflower (svn)

August 26

  • 21:56 Tim: observing overload on s4 slaves due to long-running ApiQueryBacklinks queries, will kill
  • 21:29 brion: update to r55616 -- fixlets and api limit tweak
  • 18:11 logmsgbot: midom synchronized php-1.5/includes/AutoLoader.php 'profiling hook'
  • 17:08 brion: rerunning parser tests for r55603; temporary failure on the test box

August 25

  • 15:52 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'mwahahaha'
  • 15:47 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 15:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'load testing, muhaha'
  • 15:40 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'putting storage1 to cluster22'
  • 14:21 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable PdfHandler sitewide; forgot to do this earlier :D'

August 24

  • 00:45 domas: CPU 100% on apache cluster for a few minutes, reason unknown :) (that's what 'unbalanced load' probably means, hehe)
  • 00:12 brion: unbalanced load; db16 reporting too many connected threads

August 23

  • 23:32 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20363: namespace fixes'
  • 23:14 Fred: HD in db18 Enclosure 0 Slot 21 failed and needs to be replaced.
  • 23:12 Fred: HD in db21 - Enclosure0 Slot 7 needs to be replaced.
  • 22:55 Fred: restarted apache on srv140
  • 22:55 Fred: restarted memcached on srv197
  • 16:43 Fred: restarted memcached on srv211
  • 00:50 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php '20338 Enable Collection on Bengali Wikipedia'
  • 00:18 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'fix eswikibooks english import source (bug 20352)'

August 22

  • 00:48 atglenn: next approach: zfs send (ms1) -> cat > filename (ms5). running in screen.
  • 00:43 atglenn: stopped zfs send/receive from ms1 -> ms5: 3 GB in two hours = fail :-P

August 21

  • 22:16 atglenn: starting copy of image data to ms5 (consider this timing tests, not the real thing): zfs send running in screen session
  • 20:51 aZaFred_: restarted memcached on srv212
  • 20:36 logmsgbot: fvassard synchronized php-1.5/wmf-config/db.php 'commenting out the rc2 cluster since it doesnt seem to be used anymore...'
  • 20:01 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'Removed srv52 and added srv219 in its place since srv52 is not behaving...'
  • 17:48 aZaFred_: fixed nrpe config on adler as it was set not to accept any connection not coming from 127.0.0.1
  • 17:37 aZaFred_: switching Nagios from bart to spence (its new home) and decommissioning Nagios on bart.
  • 16:34 aZaFred_: changed settings for sq43 since /dev/sdd disappeared.
  • 16:15 aZaFred_: restarted memcached on srv89
  • 15:09 hcatlin: Mobile1 just had af, az, bn, ca, bg, and drb languages deployed.

August 20

  • 23:02 domas: opensearch is causing most of 'revision' SQL load (probably text too). search overall accounts for more than 25% of database::query time :-)
  • 19:48 logmsgbot: midom synchronized php-1.5/includes/Revision.php 'removing profiling hooks'
  • 19:07 logmsgbot: midom synchronized php-1.5/includes/Revision.php 'profiling hooks'
  • 18:42 brion: running featured article imports on readerfeedback & flaggedrevs test wikis
  • 18:25 brion: pulling enwiki featured articles to copy into readerfeedback & flaggedrevs test wikis
  • 18:18 domas: Just logging that SUL and the job queue were broken because of a bogus entry in dblist
  • 18:18 domas: DNS was unhappy!
  • 18:09 brion: nuked bogus 'en_flaggedrevs_labswikimedia' entry from all.dblist; per domas it broke job queue
  • 18:00 logmsgbot: brion synchronized php-1.5/wmf-config/flaggedrevs.php 'adding flaggedrevs_wikimedia config'
  • 17:41 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'Added default languages for flaggedrevs.labs and readerfeedback.labs'
  • 17:22 Fred: synched db-list file for readerfeedback/flaggedrevs test wikis
  • 17:22 Fred: removed srv52 from dsh nodelist as it was causing sync* to fail (amongst other things)
  • 16:58 brion: poking at readerfeedback/flaggedrevs test wikis setup with fred
  • 16:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '20300 - Fix arwikisource namespaces'
  • 12:25 logmsgbot: andrew synchronized php-1.5/wmf-config/abusefilter.php 'Remove abusefilter-private right from abuse filter users on nlwiki, it leaks IP addresses to unidentified users'

August 19

  • 21:00 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'labs settings'
  • 20:53 Rob: added dns for readerfeedback and flaggedrevs lab sites
  • 20:47 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'flaggedrevslabswiki'
  • 18:57 Rob: and bug 19019
  • 18:57 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting wgVariantArticlePath for bug 8532'
  • 17:29 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 17:14 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'rob undoing changes he made'
  • 17:08 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 17:05 logmsgbot: robh synchronized wmf-deployment/wmf-config/InitialiseSettings.php
  • 15:39 mark: Shutting down MySQL on ms2, rebooting the machine
  • 15:13 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'db/mc file renames'
  • 15:12 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 15:12 logmsgbot: midom synchronized php-1.5/wmf-config/mc.php
  • 15:07 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php
  • 14:27 apergos: snapshots enabled on ms1 about ten hours ago after a small disaster with image deletion affecting wps: ch, cdo, ce, ceb (restored from ms6 and elsewhere, thumbs not impacted)
  • 12:11 mark: Repeated cache dir changes for SSD testing on sq49... they disappeared last week, how come? Did someone overwrite the upload-settings squid conf file?
  • 10:40 rainman-sr: updated to latest mwdumper on searchidx1 to fix en.wp index update breakage
  • 04:38 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Give unreviewedpages right to Coders'

August 18

  • 23:16 logmsgbot: brion synchronized php-1.5/languages/messages/MessagesBg.php 'touching for cache test'
  • 22:24 brion: configured wikitech-l and mediawiki-l to accept jpeg, png, and gif attachments
  • 20:14 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#20300 arwikisource namespaces'
  • 19:34 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'bump style version'
  • 18:31 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#20301 bnwiki sitename'
  • 12:12 domas: ruby processes were running out of FDs because of new logging code - fds weren't closed, leading to 1024 limit hit in ~16 hours :)
  • 11:19 domas: mobile1 was serving 502's. mobile1 is a fucked up mess too. restarted merb via /etc/init.d/thin start, seems to have brought mobile1 up
  • 00:44 atglenn: removing the rest of the (now unused) thumb data from ms1. yay!
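
Note on the 12:12 entry above (logging code leaving file descriptors open until the 1024 limit was hit): a minimal sketch of the usual fix, written in Python rather than the Ruby the mobile processes used. The function names and the temporary log path are hypothetical and only illustrate the pattern of closing the handle after every write; this is not the code that was deployed.

```python
import os
import tempfile


def append_log_line(path, line):
    # The with-block closes the descriptor on exit, even if write() raises,
    # so repeated calls never accumulate open files.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def count_lines_after_appends(count):
    # Hypothetical demo: append `count` lines to a temp file, then count them.
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)  # mkstemp returns an open descriptor; release it right away
    try:
        for i in range(count):
            append_log_line(path, f"request {i}")
        with open(path, encoding="utf-8") as handle:
            return sum(1 for _ in handle)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # More appends than a 1024-descriptor limit would allow if handles leaked.
    print(count_lines_after_appends(2000))
```

Because each call opens and closes its own handle, the number of open descriptors stays constant no matter how long the process runs.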

August 17

  • 23:26 atglenn: add fred to root alias on mchenry
  • 23:04 atglenn: enabling zfs snapshots for thumbs on ms4 (hourly/daily/weekly/monthly)
  • 21:40 logmsgbot: brion synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php 'update r55206 contrib tracking timestamp'
  • 20:14 atglenn: moving thumbs out of the way on ms1 (here we go again)... nothing should be using these any more...right?
  • 20:04 atglenn: move outdated stuff from /export/upload/scripts on ms1 to /root/export-scripts-old (including rcs dir; we have current stuff in svn now)
  • 19:33 Fred: enabled ganglia data gathering of export/thumbs .
  • 18:52 atglenn: tweak thumbhandler on ms4 to handle multiple / after the hostname
  • 18:11 atglenn: remove cache server stuff from thumbhandler.php on ms4, we don't have thumb cache servers any more
  • 17:54 atglenn: change error handler on ms1 from thumbhandler to 404.php since we don't serve thumbs from there now
  • 15:25 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Increment wgStyleVersion to placate Simetrical'
  • 15:16 logmsgbot: tstarling synchronized php-1.5/skins/monobook/KHTMLFixes.css 'r55187'
  • 13:37 Tim: removed the /wmf-config/ from sync-file so that you don't have to prefix every filename with ".."
  • 13:33 logmsgbot: tstarling synchronized php-1.5/wmf-config/../includes/RawPage.php 'r55180'
  • 13:33 logmsgbot: tstarling synchronized php-1.5/wmf-config/../api.php 'r55180'
  • 13:32 logmsgbot: tstarling synchronized php-1.5/wmf-config/../index.php 'r55180'
  • 13:32 logmsgbot: tstarling synchronized php-1.5/wmf-config/../includes/WebRequest.php 'r55180'

August 16

  • 21:08 domas: set db8 read_only=0 :)
  • 21:06 domas: commonswiki split off to separate database cluster s4 (db8,db3,db5,ixia), split off log position: db8-bin.005:650239113
  • 21:06 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'commons ro end'
  • 21:05 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php
  • 21:04 logmsgbot: midom synchronized php-1.5/wmf-config/CommonSettings.php 'commons going readonly'
  • 11:02 domas: sq45 I/O errors from start: http://p.defau.lt/?87cfQAK1ybCk8SALdYPV7g
  • 10:53 domas: sq45 sdd was misbehaving, got better after restart
  • 10:47 domas: powercycled sq45

August 15

  • 20:27 mark: Started MySQL on ms2
  • 19:18 logmsgbot: root ran sync-common-all
  • 19:08 JeLuF: closed advisorywiki (bug 19855)
  • 19:08 logmsgbot: jeluf ran sync-common-all
  • 19:03 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#19777 tkwiktionary and trwiktionary namespace settings'
  • 17:18 JeLuF: Added a check to sync-common-file and scap that stops the script if SSH_AUTH_SOCK is not set (missing ssh-agent). See bug 20080
  • 17:03 mark: Set up udev rules for Solaris-like hard drive symlinks, partitioned unused drives in ms2, set up RAID-1 arrays over unused drives (now syncing)
  • 17:01 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#20194 favicon for strategywiki'
  • 16:58 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#20232 arcwiki sitename change'
  • 16:51 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#20240 mwlwiki logo'
  • 16:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#19661 Namespace alias'
  • 14:38 mark: Rebooted sq1
  • 14:37 mark: Deinstalled squid on sq1, it's for varnish testing
  • 12:39 mark: Shutdown MySQL on ms2
  • 07:38 domas: verified that morebots is functional :)
  • 07:36 domas: fixed main instance replication for ms3->storage1
  • 07:36 domas: restarted morebots, was hanging somewhere in the ether

August 14

  • 04:13 aaron: synchronized php-1.5/wmf-config/flaggedrevs.php 'Disabled flaggedrevs autopromote on ukwiktionary'
  • 18:29 atglenn: tweak thumb url regexp for upload squids and make live
  • 17:50 atglenn: turn logging back off on sq9, found em
  • 17:42 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc-pmtpa.php 'Replaced srv122 with srv212 since srv122 is down.'
  • 17:42 atglenn: turn on access logging temporarily on sq9 to track down some thumb requests to ms1
  • 09:55 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable optin on strategy, usability. There is no point.'

August 13

  • 22:20 atglenn: redeploy sq1 change, since its files don't get refreshed by the generate script :-P
  • 21:45 Rob: sq43 depooled due to bad hdd
  • 21:40 Rob: rebooting sq43 for possible bad hdd check
  • 20:52 atglenn: sq43 looking for a cache dir on /dev/sdd (which it doesn't have), leaving nagios to whine while I look into it
  • 20:41 atglenn: restarted backend squid on sq43
  • 20:32 atglenn: pushed out ms4 change to all pmtpa upload squids
  • 20:23 aZaFred_: rebooted clematis after it became partially unresponsive.
  • 19:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Big 20226 set logo for pnbwiki'
  • 19:37 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 20094 fixed'
  • 19:07 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19777 Set sitename of Turkmen wiktionary'
  • 19:04 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20191 Remove autopatrol usergroup from Arabic Wikipedia'
  • 19:01 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20222 Turkish Wikinews Translations of Site Variables'
  • 18:58 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19923 Flood flag for es.wikibooks'
  • 18:48 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20094 please create Auteur namespace at fr.wikisource'
  • 18:40 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable upload on trwikinews per bug 20215'
  • 18:35 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fixing language of ua.wikimedia.org'
  • 18:32 atglenn: deploy ms4 update to sq13 (live squid)... see how it goes
  • 17:32 brion: set usermatchmode=search in Bugzilla config, should help in assigning bugs/CCs
  • 15:32 mark: Rebooting ms2
  • 12:22 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Limit patrolling to $wgFlaggedRevsPatrolNamespaces on huwiki'
  • 11:13 Andrew: Scapping to get a consistent state.
  • 11:13 Andrew: turned HoneypotIntegration off. Somebody went and removed useful options from HttpFunctions, which are needed to maintain the data.
  • 10:50 logmsgbot: andrew synchronized php-1.5/wmf-config/secret-projects.php 'HoneypotIntegration secret data'
  • 10:42 Andrew: Setting up a test setup of HoneypotIntegration extension on testwiki
  • 07:48 logmsgbot: tstarling synchronized php-1.5/wmf-config/db-pmtpa.php 'removed ms2 from cluster22'
  • 00:50 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'adding width and height to wikimedia-project footer icon per bug 20203'

August 12

  • 23:56 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'ptwiki 500k celebration logo per bug 20207'
  • 22:19 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Configured reader feedback on strategy wiki'
  • 22:10 logmsgbot: aaron synchronized php-1.5/wmf-config/CommonSettings.php 'Configured reader feedback on strategy wiki'
  • 22:09 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Configured reader feedback on strategy wiki'
  • 22:00 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Configured reader feedback & Proposal namespace for strategywiki'
  • 20:50 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fix wgFeedbackNamespaces'
  • 20:49 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Fix wgFeedbackNamespaces'
  • 20:48 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fix wgFeedbackNamespaces'
  • 20:46 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Activating ReaderFeedback on strategywiki'
  • 20:45 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Activating ReaderFeedback on strategywiki'
  • 20:33 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'rollback for enwikinews Editors'
  • 19:38 Rob: running authdns-update for all the new project langcodes added just now
  • 19:33 logmsgbot: robh ran sync-common-all
  • 19:06 logmsgbot: robh ran sync-common-all
  • 18:31 logmsgbot: robh ran sync-common-all
  • 16:26 aZaFred_: upgraded wordpress to 2.8.4 on techblog and blog to address newly released security vulnerability.
  • 14:38 logmsgbot: andrew synchronized php-1.5/skins/vector/main-ltr.css '-m Updated skins/vector/main-?t?.css to r54861, fixes for r53975 which broke because it was only partially merged into wmf-deployment.'
  • 14:37 logmsgbot: andrew synchronized php-1.5/skins/vector/main-rtl.css '-m Updated skins/vector/main-?t?.css to r54861, fixes for r53975 which broke because it was only partially merged into wmf-deployment.'
  • 13:36 logmsgbot: mark synchronized php-1.5/wmf-config/db-pmtpa.php 'Depool ms2'
  • 13:33 mark: Shutting down MySQL on ms2
  • 06:26 logmsgbot: midom synchronized php/cache/trusted-xff.cdb 'adding UAE proxies'
  • 02:48 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Allow blocked users to edit own talk page on es.wikibooks'
  • 02:27 Aaron|notebook: Note: Confirmed group is completely add/removeable on arwiki by sysops, not just self-add/removeable; this is how it should be
  • 02:19 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'made Confirmed group self-add/removeable on arwiki for sysops'
  • 02:07 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'add Portal to reviewable namespaces for enwikinews'
  • 01:55 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Sync the file that actually changed'
  • 01:51 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'flood flag for plwiki'
  • 01:26 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'rollback for editors on plwiki'

August 11

  • 23:38 atglenn: new backend cache file on sq1 for testing
  • 22:52 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'reenable EditWarning now bug is fixed'
  • 22:28 atglenn: sq4 back in
  • 21:56 atglenn: take sq4 out of upload pool temporarily for thumb testing
  • 20:56 AaronSchulz: Set $wgFlaggedRevTags/$wgFlaggedRevValues on dewikiquote per bug 19179
  • 20:44 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php
  • 18:53 tomaszf: pulling optin_survey table from enwiki on db28 for Roan
  • 18:12 domas: started rsync to storage1 from ms3, in root@storage1's screen
  • 18:04 mark: Prepared storage1 for emergency copy of ms3 data
  • 18:01 RobH_A90: srv122 shutdown due to broken cooling fan
  • 16:30 RobH_A90: db12 back up, restarted mysql
  • 16:13 mark: Shutting down MySQL on ms2
  • 16:03 domas: db12 is now down for raid battery replacement (thanks cobi for noticing that)
  • 16:01 RobH_A90: shutting down db12 for raid battery replacement
  • 15:59 RobH_A90: shutting down mysql on db12 for raid battery replacement
  • 15:58 logmsgbot: robh synchronized php-1.5/wmf-config/db-pmtpa.php 'depooling db12 to replace raid controller battery'
  • 15:53 RobH_A90: sq49 shutdown and disks 3/4 replaced with ssd for mark
  • 12:39 domas: powercycled srv156/srv159, seems like job memory use skyrocketed, and conflicted with apache/memcached/puppet/etc
  • 07:39 domas: db3 and db8 had same server_id, so, replication worked, just with reconnect after every event.. :)
  • 07:25 Domas: db3 and db8 had the same server ID, fixed.
  • 06:59 logmsgbot: tstarling synchronized php-1.5/wmf-config/db-pmtpa.php 'repooling db8, not as broken as I thought'
  • 06:41 Tim: restarting the slave thread on db3 since the error log on db13 shows an error flood due to that slave "Start client, asynchronous binlog_dump to slave_server(100236)"
  • 06:27 Tim: depooled db8 due to broken replication
  • 06:27 logmsgbot: tstarling synchronized php-1.5/wmf-config/db-pmtpa.php
  • 06:20 Tim: investigating replication issues on db8, tried restarting slave thread, no improvement.
    • Shows "Slave: received 0 length packet from server, apparent master shutdown"
  • 05:48 Tim: disabled EditWarning due to bug 20171
  • 05:47 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php
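
Note on the 07:25 and 07:39 entries above (db3 and db8 sharing a server_id, so replication reconnected after every event): a minimal sketch of a check that would flag such a collision. The host-to-ID mapping is hypothetical input, for example collected by hand from each host's MySQL configuration; the IDs shown are invented and are not the values used on these servers.

```python
from collections import defaultdict


def find_duplicate_server_ids(server_ids):
    """Return {server_id: [hosts]} for every ID claimed by more than one host."""
    hosts_by_id = defaultdict(list)
    for host, server_id in server_ids.items():
        hosts_by_id[server_id].append(host)
    return {
        server_id: sorted(hosts)
        for server_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Invented IDs for illustration only.
    example = {"db3": 7, "db8": 7, "db13": 9}
    for server_id, hosts in find_duplicate_server_ids(example).items():
        print(f"server_id {server_id} is shared by: {', '.join(hosts)}")
```

Running a check like this whenever a slave is cloned from another host (as db8 was from db3 on August 10) would catch a copied server_id before replication starts.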

August 10

  • 23:46 logmsgbot: fvassard synchronized php-1.5/wmf-config/db-pmtpa.php 'Adding db8 to commonwiki pool.'
  • 23:43 logmsgbot: fvassard synchronized php-1.5/wmf-config/db-pmtpa.php 're-pooled db3'
  • 23:11 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php
  • 22:37 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20001 -- activate fancy edit toolbar by default on strategy wiki'
  • 22:33 Andrew: Updated interwiki cache for strategy wiki interwiki prefix.
  • 22:32 logmsgbot: andrew ran sync-common-all
  • 20:59 Fred: shutting down db3 for replication to db8.
  • 20:58 logmsgbot: fvassard synchronized php-1.5/wmf-config/db-pmtpa.php
  • 14:37 Tim: deployed r54721
  • 14:35 logmsgbot: tstarling synchronized wmf-deployment/cache/trusted-xff.cdb

August 9

  • 12:42 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php
  • 07:03 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php
  • 06:07 domas: restarted few segfaulting apaches......... ... ..... . ..

August 7

  • 23:20 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'updating sr wikinews logo for 10k event per dungodung'
  • 22:09 domas: brought up db18 after some maintenance hibernation
  • 22:09 domas: db8 raid toasted
  • 22:01 logmsgbot: brion synchronized php-1.5/wmf-config/flaggedrevs.php 'add unreviewedpages, autoreview to coders on MWW'
  • 21:54 logmsgbot: brion synchronized php-1.5/wmf-config/flaggedrevs.php 'let coders review and validate on FR pages on mediawikiwiki'
  • 21:33 logmsgbot: brion synchronized php-1.5/wmf-config/flaggedrevs.php
  • 21:30 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 21:27 brion: syncing dblist (sync-dblist now fixed... i hope)
  • 21:21 logmsgbot: brion synchronized php-1.5/wmf-config/flaggedrevs.php 'testing flaggedrevs limited config on MW wiki'
  • 20:11 Rob: removed UsernameBlacklist from CommonSettings.
  • 20:09 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php
  • 19:59 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 19152 Enable flood flag on Simple English Wikiquote'
  • 19:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '15685 Enable Extension:Nuke on all but largest wikis - cept its already on enwiki, so now its on EVERYWHERE'
  • 19:16 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'fix _ missing from wgMetaNamespace on strategy wiki'
  • 19:14 Rob: changed wording in flaggedrevs for dewikiquote since the non-english name was crashing the project.
  • 19:12 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php
  • 19:09 Andrew: Started spamming board elections notices to eligible voters in a screen on zwinger.
  • 19:01 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19521 Enable the flood flag on the English Wikinews'
  • 18:50 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20034 Reconfiguration of flagged revs on en.Wikibooks'
  • 18:50 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php '20034 Reconfiguration of flagged revs on en.Wikibooks'
  • 18:49 brion: updated vector & usability skin per trevor's recent updates
  • 18:48 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 18:46 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 18:22 brion: some problems with upload serving being unusually slow. we're looking into it
  • 18:16 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19813 Enable the rollback usergroup at Simple English Wikiquote'
  • 18:11 logmsgbot: robh synchronized php-1.5/wmf-config/abusefilter.php 'Updates to bug 19772'
  • 17:26 brion: added some left padding to header and mid-content sections on techblog so it doesn't look like crap
  • 17:11 logmsgbot: robh ran sync-common-all
  • 17:11 Rob: enabled abuse filter on itwiki per bug 19772 Activate abuse filter on itwiki
  • 17:04 logmsgbot: robh ran sync-common-all
  • 16:56 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19401 please rename WikiSaurus namespace on de.wikt'
  • 16:48 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20110 suppressredirect right gone missing on en.wikibooks'
  • 07:32 domas: restarted puppetd (forgot yesterday) everywhere
  • 07:07 domas: cleaned up/signed certs for srv101 and srv122
  • 05:56 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php 'db11 load wasnt 0 for frja'
  • 02:26 brion: added a news item on our old sourceforge project directing folks to mediawiki.org. Might help discourage people from discovering and installing 1.9.3 :)
  • 00:24 logmsgbot: brion synchronized wmf-deployment/extensions/UsabilityInitiative/OptIn/OptIn.hooks.php
  • 00:20 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'adding usebetatoolbar to pref stats tracking'

August 6

  • 22:00 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'enable EditWarning sitewide'
  • 21:41 logmsgbot: brion synchronized php-1.5/wmf-config/InitialiseSettings.php 'enable beta ui opt-in sitewide'
  • 21:21 brion: setting up tables for opt-out stats and prefstats on all wikis...
  • 20:31 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling Abusefilter on fawiki per bug 19642'
  • 20:28 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19636 Enable for gdwiki'
  • 20:16 Rob: added more blogs to planet per requests.
  • 19:57 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '19993 Change autoconfirmed settings on plwiki'
  • 19:56 Rob: rolled back autoconfirmed change to huwiki until consensus is reached.
  • 19:56 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php
  • 19:52 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php 'Bug 19941 Hungarian Wikinews Licence Change'
  • 19:45 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '20066 Logo change for li.wiktionary'
  • 19:43 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'REVERTING Bug 20045 Logo change'
  • 19:42 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20045 Logo change for pih.wikipedia'
  • 19:39 Rob: enabled flaggedrevs on arwiki per bug 19332, kicked over to aaron because the auto promote scripts no longer function.
  • 19:36 logmsgbot: robh ran sync-common-all
  • 19:34 logmsgbot: robh ran sync-common-all
  • 19:18 Fred: added regex match for
    to all http checks on Nagios. (getting a better idea of when things are actually partially broken)
  • 18:52 logmsgbot: ariel synchronized wmf-deployment/wmf-config/CommonSettings.php 'switch to ms4 for thumbs everywhere'
  • 18:47 Rob: gave autoconfirmed users the same rights as confirmed on flaggedrevs in huwiki (fixing the issue of the new confirmed group overriding the old settings for huwiki.)
  • 18:46 logmsgbot: robh ran sync-common-all
  • 18:41 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 19315 Set =false on Hungarian Wikipedia'
  • 18:37 Rob: Enabled abusefilter with default settings on huwiki per bug 19109 Enable AbuseFilter in Hungarian Wikipedia
  • 18:37 logmsgbot: robh ran sync-common-all
  • 18:18 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php '14716 Grant noratelimit right to the editor group in the Hungarian Wikipedia'
  • 17:41 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php 'ok now actually correct order. lucene, then titlekey'
  • 16:34 brion: moving TitleKey initialization before Lucene per robert's rec (fixes lucene prefix search for enwiki)
  • 16:34 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 16:28 brion: reenabled TitleKey on enwiki (disabled by hidden hack in lucene config). rebuilding index
  • 16:27 logmsgbot: brion synchronized php-1.5/wmf-config/lucene.php
  • 15:27 Rob: correction, flashing the raid bios, not system bios
  • 15:27 logmsgbot: robh synchronized php-1.5/wmf-config/db-pmtpa.php 'db18 down for flashing bios'
  • 12:53 domas: cluster is full of hanging brion's rsyncs back from Jul17 :)
  • 12:49 domas: puppetd memleaking memory in hundreds of megabytes...
  • 12:04 domas: srv101 had corrupted PHP tree, was giving out 'Override this function.' message to everyone. sync-common'ed it, needs sysadmin investigation.

August 5

  • 21:59 brion: prepping wmf-deployment update on test, includes updates to vector & usabilityinitiative ext
  • 15:02 domas: cleaned up space on db20, filtered out db8 syslog stream

August 4

  • 22:23 atglenn: testing move of thumb repo to ms4, on test.wikipedia
  • 21:48 atglenn: shut down hadoop on hume, no longer in use
  • 19:32 brion: scapping wmf-deployment r54386 (adds wgMainPageTitle, needed for mobile redir)
  • 19:07 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php 'disable db8'
  • 15:52 Rob: updated software for blogs
  • 14:15 RobH_A90: dobson drives swapped back to original for comparison ssd testing

August 3

  • 23:28 Fred: updated wikimedia-task-appserver on srv[92,110,130,131] as they were running old versions.
  • 23:25 brion: provisional setup for auto-triggering of parser tests on trunk commits up. (needs init scripts etc)
  • 23:11 Fred: removed db4 from mysql group since mysql is not installed on it.
  • 23:09 Fred: fixed RAID check on db2
  • 22:59 Fred: added nagios (web) and ganglia monitoring to Grosley
  • 22:49 Fred: reconciled the LVS config and what is in the node-list for the Apaches group.
  • 21:31 domas: added db17 and db18 back to s3 rotation
  • 21:31 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php
  • 19:36 Rob: pushed updated file for flaggedrevs, if there are issues, old file is named flaggedrevs.messy.bak in the wmf-config. If not needed within a few hours, it will be archived.
  • 19:35 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php 'pushing aarons updated file'
  • 16:16 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 19639 import sources for vecwiki'
  • 15:29 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Logo updates on bugs 19675 19898 20016'
  • 15:24 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 19884 dewikisource alias'
  • 15:22 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 16919'
  • 15:19 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 19400'
  • 15:18 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 19709 rolling back'
  • 15:17 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 19709'
  • 08:54 logmsgbot: midom synchronized php-1.5/wmf-config/db-pmtpa.php 'enabling db26...'

August 2

  • 20:01 mark: Started frontend squid on sq23
  • 16:12 hcatlin: mobile1 ran out of disk b/c of log file size. Fixed now.
  • 15:44 mark: Eiximenis frontend seems limited at 26 k FDs somehow, reduced its LVS load again
  • 14:23 mark: Installed gmond on sq50
  • 14:09 mark: Increased COSS cache dir sizes on sq50 from 15 to 35 GB, aufs from 20 to 24 GB
  • 13:47 mark: Increased LVS weight on eiximenis from 45 to 60 (4x other squids)
  • 13:47 mark: Increased COSS cache dir size on eiximenis from 65 GB to 128 GB
  • 07:41 brion: running parsertests on trunk checkins from r54052 on (not yet auto triggered)
  • 07:20 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 07:20 logmsgbot: brion synchronized php-1.5/wmf-config/secret-projects.php
  • 07:07 logmsgbot: brion synchronized php-1.5/wmf-config/secret-projects.php 'add key for codereview test uploads'
  • 07:04 brion: updating CodeReview extension, starting test of the testing integration :)

August 1

  • 20:17 mark: Added DNS entries for <language>.m.wikipedia.org (autogenerated from langlist)
  • 20:11 mark: Running apt-get upgrade on bayle
  • 14:56 mark: Increased LVS weight of eiximenis from 30 to 45 (3x the load of other squids)
  • 04:54 Rob: updated blogs software with more plugins for security
  • 00:44 logmsgbot: kate synchronized php-1.5/wmf-config/db-pmtpa.php 'removing db30 for commons dump for TS'

July 31

  • 22:46 tomaszf: running log splitter on searchidx1 to cleanup udplogger searchqueries
  • 21:08 Fred: bounced squid on sq23
  • 21:08 Fred: bounced squid on sq43
  • 20:50 Fred: and finally srv174
  • 20:49 Fred: and srv32 / srv97
  • 20:48 Fred: and srv207
  • 20:47 Fred: bounced apache on srv129
  • 20:46 Fred: added missing srv130 to apaches nodelist and synched common on it.
  • 18:46 Fred: re-synched nagios to get rid of 'false' mysql alerts
  • 18:42 Fred: deployed wikimedia-base (0.20) everywhere.
  • 17:40 Fred: updated wikimedia-base to include acct and enable SAR in order to do post-crash analysis when necessary. New version: 0.20. (Also changed dependency from ntp-simple to ntp as the ntp-simple package does not exist anymore)
  • 15:17 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Reenabled CentralNotice'
  • 15:15 logmsgbot: kate synchronized php-1.5/wmf-config/db-pmtpa.php 'remove db26 to dump s1 for TS'
  • 14:15 brion: removing stray readonly message from db.php on s3 (but not on s3frja... god our config's weird)
  • 14:15 logmsgbot: brion synchronized php-1.5/wmf-config/db-pmtpa.php
  • 14:14 brion: noting for toolserver repl fix: old s3 pos was db18-bin.090 454738665
  • 14:10 brion: disabling general readonly
  • 14:10 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 14:09 brion: s3 reset to master_host='db11', master_log_file='db11-bin.001', master_log_pos=79
  • 14:06 brion: setting db11 up as temp master on s3
  • 14:04 mark: Restarted MySQL on db18, running recovery
  • 14:01 brion: prepping a manual master switch
  • 13:50 logmsgbot: brion synchronized php-1.5/wmf-config/db-pmtpa.php 'disabled down db18'
  • 13:49 brion: removing down db18
  • 13:47 logmsgbot: brion synchronized php-1.5/wmf-config/CommonSettings.php
  • 13:28 logmsgbot: mark synchronized php-1.5/wmf-config/db-pmtpa.php
  • 13:27 Andrew: Note that sync-file wmf-config/db-pmtpa.php does not work; use sync-file db-pmtpa.php instead
  • 13:27 Andrew: Updated SwitchSettings.php for new location of db.php (wmf-config/db-pmtpa.php)
  • 13:26 logmsgbot: andrew synchronized php-1.5/wmf-config/db-pmtpa.php 'test'
  • 12:40 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'temporarily disabled CentralNotice'
  • 12:32 Rob: scheduled reboot of csw5-pmtpa took place at 8am; traffic between esams and pmtpa is very high while traffic and squid cache misses normalize out.
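
The 14:09 entry records the coordinates the s3 slaves were reset to. A minimal sketch of turning such coordinates into a MySQL `CHANGE MASTER TO` statement, assuming that statement was the mechanism used (the log does not say); the helper name is hypothetical:

```python
def change_master_sql(host, log_file, log_pos):
    """Build a CHANGE MASTER TO statement from replication coordinates.

    Hypothetical helper for illustration; it only formats the statement
    and does not connect to any server.
    """
    if not isinstance(log_pos, int) or log_pos < 0:
        raise ValueError("log_pos must be a non-negative integer")

    def quote(value):
        # Escape backslashes and single quotes for a SQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={log_pos};"
    )


# Values taken from the 14:09 log entry.
print(change_master_sql("db11", "db11-bin.001", 79))
```

Recording the old position as well (as the 14:14 entry does for db18) is what lets a downstream replica such as the toolserver be repointed later.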

July 30

  • 20:26 mark: Pooled eiximenis as frontend upload squid
  • 19:10 logmsgbot: brion ran sync-common-all
  • 19:10 brion: fred fixed login perms on 101, 122. running a sync-common-all to resync deployment
  • 19:02 brion: srv101, srv122 serving HTTP but not taking updates via ssh
  • 18:29 brion: installing texvc build & run deps on wikitech box for parser testing
  • 16:07 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 20015'
  • 00:12 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php 'disable temp req logging on en'
  • 00:11 logmsgbot: brion synchronized wmf-deployment/api.php 'reenable api to test'
  • 00:08 logmsgbot: brion synchronized wmf-deployment/api.php 'temp disable api to test'
  • 00:08 brion: connections still fill up with api disabled
  • 00:07 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php
  • 00:07 logmsgbot: brion synchronized wmf-deployment/wmf-config/InitialiseSettings.php
  • 00:04 brion: setting up some experimental req url logging on backend for enwiki issues

July 29

  • 23:43 brion: temporarily bumping max connections to 6k on db16
  • 23:39 brion: new boxes need to be fixed so apache-restart family of scripts work, or else the scripts replaced
  • 18:17 Rob: srv131 back online
  • 18:17 Rob: srv118 and srv130 back online
  • 18:15 Rob: rebooted srv131 for lockup
  • 18:13 Rob: rebooting srv118 & srv130, locked up
  • 18:11 Rob: srv110 and srv113 back online
  • 18:08 Rob: rebooted srv110, srv113, both locked up
  • 16:18 river: removed stale fingerprint for ns1.wikimedia.org on bayle
  • 14:01 Rob: shutting down mysql on db12, when done will shut down system and replace bad battery on controller
  • 13:30 Rob: upgraded dobson memory for squid ssh testing
  • 13:24 Rob: shutdown eiximenis, upgraded ram
  • 04:36 Tim: running svn cleanup on locked directories as indicated by find -name lock
  • 04:19 Tim: resharing ms1:/export/upload with root allowed, so that the above command actually works
  • 04:15 Tim: fixing ExtensionDistributor permissions with: find -not -user extdist -exec chown extdist {} \;
  • 04:11 Tim: reduced the number of svn-invoker threads on zwinger to 1, to hopefully avoid permanent "lock held" errors (bug 19889)
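
The 04:15 entry fixes ownership with `find -not -user extdist -exec chown extdist {} \;`. A minimal sketch of the selection half of that command, useful as a dry run before changing anything; the function name is hypothetical, and it assumes a POSIX system where files carry a numeric owner uid:

```python
import os


def paths_not_owned_by(root, uid):
    """List root and everything below it whose owner uid differs from uid.

    Mirrors the selection done by `find ROOT -not -user USER`; it does not
    change ownership. Symlinks are inspected themselves, not followed.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    return [path for path in paths if os.lstat(path).st_uid != uid]


# Dry run against the current directory for the current user.
for path in paths_not_owned_by(".", os.getuid()):
    print(path)
```

As the 04:19 entry notes, the real command only worked once the NFS share allowed root access, since changing ownership to another user requires root on the exporting server.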

July 28

  • 21:35 brion: updating to wmf-deployment branch r53906 - adds search API output details
  • 18:06 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting skin to vector on strategywiki'
  • 17:58 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting subpages and lang for strategywiki'
  • 16:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#19908 - remove some redundant settings for wmgUseCollection'
  • 16:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#19908 - remove some redundant settings for wmgUseCollection'
  • 16:54 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#19908 - remove some redundant settings for wmgUseCollection'
  • 09:40 Tim: updating SecurePoll via branch

July 27

  • 23:03 atglenn: installed sudo and added george wilson (sun guy) to sudoers on ms1
  • 20:41 mark: Shut down BGP transit session to AS 13680 on seemingly faulty blade 8 in csw5-pmtpa
  • 18:05 atglenn: added account for george wilson (sun zks dev) on spence and ms1
  • 17:52 tomaszf: adding one more worker to snapshots taking total to eight
  • 17:05 mark: Shutdown BGP peer 8342 (RTComm) to see if it resolves the reachability issues from Russia

July 26

  • 21:54 hcatlin_: just pushed UI tweaks to the mobile server

July 25

  • 21:17 RoanKattouw: Reenabled Apache on prototype with a disallow-all robots.txt; it seems GoogleBot was DoSing us
  • 21:12 RoanKattouw: Stopping Apache on prototype, because it gets stuck in OOM death or whatever it is after about 5 minutes every time
  • 20:46 mark: Preffed traffic to 15169 (Google) via 13680
  • 20:05 RoanKattouw: Rebooted prototype

July 24

  • 21:51 hcatlin: pushed new mobile code to support multiple notices
  • 20:08 brion: added $wgUseXVO = true to default config in InitialiseSettings so it'll be there when bug 19845 fix goes live
  • 19:11 tomaszf: starting three xml snapshot threads
  • 18:59 tomaszf: pointing storage2 to /export/dumps instead of /export/archive
  • 17:06 domas: removed henbane & amaryllis from nagios
  • 10:54 logmsgbot: midom synchronized php-1.5/wmf-config/../includes/specials/SpecialSearch.php 'livehack per rainmans suggestion'
  • 05:31 Tim: download.wikimedia.org was totally broken, just showing a few empty directories. Changed the document root on storage2 to /export/archive/public, where there appear to be some useful files

July 23

  • 22:16 mark: Decommissioned all yaseo servers, wiped their disks
  • 20:35 mark: Updated the glue record for ns1.wikimedia.org
  • 20:20 mark: Changed IP of ns1.wikimedia.org to 208.80.152.142 (a svc ip on linne)
  • 19:58 mark: Installed linne.wikimedia.org as auth DNS server
  • 19:12 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php 'restore FancyCaptcha now that image crisis is diverted'
  • 15:38 Rob: setup strategywiki for the strategy planning whatever
  • 15:37 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 15:32 logmsgbot: robh ran sync-common-all

July 22

  • 21:20 Fred: installing wikimedia-task-appserver on srv122. Incoming reboot
  • 15:11 Tim: fixed firewall on browne to deny RC->IRC UDP packets from outside the local network
  • 09:57 logmsgbot: midom synchronized php-1.5/wmf-config/../StartProfiler.php
  • 09:51 domas: apparently someone decided that our profiling is not useful and should be disabled? :)

July 21

  • 23:56 Fred: rebooted pascal (for realz this time)
  • 23:15 tomaszf: fred is pulling backups from ms4 onto storage2.
  • 23:07 Fred: rebooting pascal as he fell over again
  • 22:45 tomaszf: adding snapshot1,2,3 to DHCP
  • 22:03 mark: Increased large object cache dir size to 120 GB on eiximenis
  • 18:28 domas: srv122 booted into netinstall, apparently
  • 17:39 Rob: updated both blog and techblog to newest stable release of wordpress
  • 16:36 brion: internal UDP logging broken since 17 July; looks like udp2log isn't running on db20 since reboot?
  • 16:21 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php 'death to captcha'
  • 16:15 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'death to the uplaoder group'
  • 16:06 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php
  • 16:00 Rob: updated sync-file script for new file locations
  • 16:00 logmsgbot: robh synchronized php-1.5/wmf-config/CommonSettings.php
  • 15:50 Rob: removing old whygive blog data from dns and archiving the database.
  • 15:07 Rob: updated planet with http://meta.wikimedia.org/wiki/Planet_Wikimedia#Requests_for_inclusion
  • 14:43 mark: Increased COSS cache dirs on pmtpa upload squids
  • 11:30 domas: for i in $(ssh db20 findevilapaches); do ssh $i invoke-rc.d apache2 restart; done \o/
  • 11:29 domas: killed brion's sync processes on zwinger, hanging since July17 :)
  • 09:15 domas: mgmt-restarted srv156

July 20

  • 22:18 mark: Rebooted pascal
  • 21:15 mark: Doubled cache dir sizes on eiximenis, upped carp load from 20 to 30
  • 18:09 hcatlin: restarted mobile1 cluster to load in new software
  • 15:55 Fred: bounced apache on srv193
  • 07:18 Tim: re-enabled CentralNotice
  • 07:17 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 07:13 apergos: enough data removed from ms1 to feel safe for a few days; started mass copy of remaining thumbs to ms4 in prep for complete repo switchover (running in root screen on ms1)
  • 04:48 Tim: copying up all available MW release files from my laptop
  • 04:21 Tim: mounted ms4:/export/dumps on zwinger
  • 04:16 Tim: changed export options for ms4:/export/dumps to allow root access for the local subnet
  • 01:19 hcatlin: On mobile1 we are now gzipping the log files after rotation in /srv/wikimedia-mobile/logs

July 19

  • 22:02 Fred: restarted memcached on srv159
  • 20:43 mark: eiximenis backend squid pooled
  • 20:10 mark: Restarted deadlocked powerdns on bayle
  • 19:14 mark: Installed eiximenis with a Squid OS install
  • 18:58 mark: Moved eiximenis to vlan 100 (squids)
  • 18:55 mark: Changed eiximenis' IP into 208.80.152.119 for Squid testing
  • 17:41 hcatlin: Mobile1's web stack just got switched from Phusion Passenger to Nginx/Thin/Rack.

July 18

  • 15:23 apergos: some thumb directories on ms4 created at request of img scalers were created with owner root and perms 700... fixing
  • 03:55 river: ms5 is ready
  • 01:20 atglenn: continuing with removals of thumbs on ms1. 789G free now, need to reach about 1450 before we can just "maintain". but we're gaining on it.
  • 00:22 brion: set up temporary data dump index, copied the dvd index (it's just offsite links). still need to track some MW releases
  • 00:07 brion: recovering MediaWiki 1.6 through 1.10 release files and re-uploading them...

July 17

  • 23:42 brion: added a 404 page and recovered index.php for our temp download.wikimedia.org
  • 22:05 brion: set wikitech to use vector skin by default :D
  • 22:03 Andrew: Fixed morebots, which was relying on a fragile version check. Just deleted it :)
  • 20:43 brion: fixed paths for noc.wikimedia.org/conf file highlighting
  • 20:38 domas: ms2 has broken disks..
  • 20:31 brion: We're going to see about setting up the previously-idle ms5 so we can get our thumbnailing on
  • 20:01 brion: rob's poking raid rebuild on storage2 (dumps server)
  • 19:03 RobH_A90: eiximenis and dobson pulled for solid state drive testing, do not use for other tasks
  • 18:28 logmsgbot: brion synchronized wmf-deployment/wmf-config/InitialiseSettings.php 'enabling vector for rtl'
  • 18:25 atglenn: started mass-moving thumbnail dirs out of the way and replacing them with symlinks to ms4
  • 18:25 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php 'bump style version'
  • 18:25 logmsgbot: brion ran sync-common-all
  • 18:24 brion: running sync-common-all for UI updates. need to poke the style ver too :)
  • 18:08 brion: svn up'ing wmf-deployment for test.wikipedia.org. Merged UI fixes from usability team
  • 18:03 Fred: spun a few more apache servers into image scalers: srv219..srv224.
  • 17:28 rainman-sr: putting new location of initialisesettings to lsearch-global-2.1.conf so the incremental updater works again
  • 17:20 Fred: srv224 is now an image_scaler. Adjusted on lvs3, ganglia and dsh's node_list.
  • 17:14 Fred: db20 back online
  • 16:50 Fred: rebooting db20 as it is in a "state"
  • 16:45 brion: looks like we've lost internal /home NFS, which makes some of our internal services very unhappy. investigating...
  • 16:43 brion: ganglia out.
  • 13:44 apergos1: doing next round of removals on ms1 (/export/upload/wikipedia/en/thumb/2) to keep ahead of the game
  • 04:15 apergos: starting removal of /export/upload/wikipedia/en/thumb/1 on ms1 (moved away and symlink to ms4 done already) for more space
  • 03:54 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php 'Disabling sitenotice from maintenance'
  • 03:29 brion: reenabling uploads & image deletion/undeletion
  • 03:29 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php
  • 03:28 brion: remounting ms1 on apaches
  • 00:49 atglenn: only about 1gb gain on each so doing all of /export/upload/wikipedia/en/thumb/0
  • 00:39 atglenn: removing more directories in /export/upload/wikipedia/en/thumb/0 on ms1 and replacing with symlinks to ms4
  • 00:30 logmsgbot: brion synchronized wmf-deployment/includes/specials/SpecialUpload.php
  • 00:30 logmsgbot: brion synchronized wmf-deployment/includes/ImagePage.php
  • 00:27 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php
  • 00:20 brion: temporarily disabling image delete/rename during maintenance

July 16

  • 23:56 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Disabling uploads and setting captcha to not-fancy.'
  • 23:29 atglenn: removing the images in /export/upload/wikipedia/en/thumb/0/00 on ms1 (real dir is a symlink to ms4) to get back some space
  • 22:51 atglenn: sym link back in place, let's see what happens
  • 22:47 atglenn: reverting temporarily while we resolve mount issues for the ms4 share
  • 22:40 atglenn: ...whether the image scalers will fall over if we force them to do (some) regeneration.
  • 22:37 atglenn: on ms1, /export/upload/wikipedia/en/thumb/0/00 symlinked to (shared from ms4) /mnt/thumbs/wikipedia/en/thumb/0/00 to test
  • 21:29 brion: robots.php for robots.txt generation now also working. yay!
  • 21:28 logmsgbot: brion synchronized live-1.5/robots.php
  • 21:28 brion: extract2.php now fixed up for new deployment; portal pages ok (www.wikipedia.org)
  • 21:27 logmsgbot: brion synchronized live-1.5/robots.php
  • 21:26 logmsgbot: brion synchronized extract2.php
  • 21:26 logmsgbot: brion synchronized extract2.php
  • 21:22 logmsgbot: brion synchronized extract2.php
  • 21:18 logmsgbot: brion ran sync-common-all
  • 21:18 brion: rsync messed up the php-1.5 directory to symlink translation. retrying as root
  • 21:14 logmsgbot: brion synchronized extract2.php
  • 21:13 logmsgbot: brion synchronized extract2.php
  • 21:13 atglenn: started copy of thumbnails to ms4, symlinks going in on ms1 (but no data removal yet)
  • 21:11 logmsgbot: brion synchronized live-1.5/extract2.php
  • 21:10 logmsgbot: brion synchronized live-1.5/robots.php
  • 21:09 logmsgbot: brion ran sync-common-all
  • 21:08 brion: attempting to replace the old php-1.5 dir with wmf-deployment symlink
  • 21:02 logmsgbot: brion synchronized wmf-deployment/wmf-config/InitialiseSettings.php 'I think touching the new master InitialiseSettings will fix it'
  • 21:01 logmsgbot: brion synchronized wmf-deployment/includes/GlobalFunctions.php 'mkdir error trackdown hack'
  • 20:54 logmsgbot: brion synchronized wmf-deployment/wmf-config/missing.php
  • 20:52 logmsgbot: brion synchronized wmf-deployment/wmf-config/CommonSettings.php
  • 20:52 logmsgbot: brion synchronized wmf-deployment/wmf-config/reporting-setup.php
  • 20:48 brion: switching all sites to wmf-deployment branch
  • 20:48 logmsgbot: brion synchronized live-1.5/MWVersion.php
  • 19:06 Tim: copying ExtensionDistributor stuff to ms4:/export/ext-dist, from root screen on ms1
  • 19:01 brion: Now running test.wikipedia.org, www.mediawiki.org, and meta.wikimedia.org on new deployment checkout
  • 19:01 logmsgbot: brion synchronized live-1.5/MWVersion.php
  • 18:58 logmsgbot: brion ran sync-common-all
  • 18:39 Tim: restarted xinetd on zwinger
  • 18:24 logmsgbot: tstarling synchronized php-1.5/CommonSettings.php
  • 17:57 brion: also restarted 186, 196 which had some funkiness in php err log
  • 17:56 brion: srv186 also bad sudo
  • 17:55 brion: srv171 has some borkage; sudo config is broken can't run apache-restart as user
  • 17:52 logmsgbot: brion ran sync-common-all
  • 17:51 brion: running updated sync-common-all friendly to non-NFS boxes
  • 17:49 brion: swapped private SVN-managed /home/wikipedia/bin into place
  • 15:09 apergos: removing the last of our snapshots on ms1 :-( getting us a little more space
  • 14:47 apergos: disabled snapshots on ms1 in preparation for move of thumbnails to ms4
  • 14:38 brion: updated wikibugs-l list config to allow bugzilla-daemon@wikimedia.org to post
  • 14:34 brion: restarted wikibugs bot
  • 14:27 brion: ms1 performance seems to be sucking again
  • 14:17 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'adjusting throttle temporarily for outreach event'
  • 11:55 RoanKattouw: ExtensionDistributor repeatedly reported broken in the past 48 hrs
  • 07:08 Fred: traffic profile switched back to normal. Esams is back to normal.
  • 06:11 hcatlin: Mobile1 has returned to normal function.
  • 05:58 hcatlin: Error after restarting mobile1 stopped stats logging from working. Stats will be low for July 15th and higher for July 16th. Parsing of the 6 hour log file (about 1GB) might slow server for next few minutes until caught up.
  • 04:24 Rob: outage for esams servers started at approx 3:20 gmt
  • 04:15 Rob: still waiting on esams to update us about the rack(s), moving traffic to pmtpa
  • 00:59 tomaszf: started backup for latest xml snapshots from storage2 to ms4

July 15

  • 22:30 Rob: updated dns for new snapshot servers because tomasz did not want to be in charge of dump servers.
  • 22:10 brion: checking around for 0-byte files (not thumbs) to see if we can recover
  • 21:33 atglenn: verified that zfs patch is in place on ms4 (it got sucked in during river's update yesterday)
  • 21:26 logmsgbot: brion synchronized php-1.5/CommonSettings.php 'Restore fancy captcha mode'
  • 21:16 logmsgbot: I_am_not_root synchronized php-1.5/CommonSettings.php 're-enabling Uploads and removing site notice.'
  • 21:01 atglenn: rebooting ms1 after applying zfs patch. *cross fingers*
  • 20:51 logmsgbot: brion synchronized php-1.5/CommonSettings.php
  • 20:51 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php
  • 20:42 brion: reenabled captcha in simple mode (no images; math q)
  • 20:37 brion: captcha system broken while images are offline, need to disable it temporarily
  • 20:18 brion: updated http://en.wikipedia.org/wiki/MediaWiki:Uploaddisabledtext & http://commons.wikimedia.org/wiki/MediaWiki:Uploaddisabledtext
  • 19:43 logmsgbot: fvassard synchronized php-1.5/CommonSettings.php 'Disabling Uploads while ms1 gets fixed (again with an s after upload).'
  • 19:40 logmsgbot: fvassard synchronized php-1.5/CommonSettings.php 'Disabling Uploads while ms1 gets fixed.'
  • 19:40 atglenn: bringing solaris up to current patch level on ms1
  • 19:34 brion: Ok, we're going to temporarily shut off uploading and unmount the uploads dir while we muck about with ms1.
  • 19:14 brion: dropping export/upload@daily-2009-07-11_03:10:00
  • 19:08 brion: restarting web server on ms1, see if that resets some connections to the backend scalers
  • 19:05 brion: restarting nfsd on ms1
  • 18:58 brion: dropping zfs snapshot export/upload@daily-2009-07-09_03:10:00
  • 18:25 RobH_A90: drac and physical setup done for dump1,2,3, will install remotely
  • 17:52 RobH_A90: updated dns for new dump processing servers public and management ips
  • 17:41 Fred: bounced apache on srv45
  • 17:37 Fred: bounced apache on srv47
  • 17:09 RobH_A90: pdf1 is not coming back, working on it
  • 16:56 RobH_A90: shutting down pdf1 and mobile1 to move their power too, weee
  • 16:55 RobH_A90: shutting down spence to move
  • 16:50 RobH_A90: shutting down singer to move its power, blogs and other associated services will be offline for approx. 5 minutes
  • 16:47 Andrew: Restarting apache on prototype
  • 16:46 RobH_A90: shutting down grosley for power move
  • 16:45 RobH_A90: all these power moves are to add the new dump processing servers to the rack
  • 16:45 RobH_A90: shutting down fenari for power move
  • 16:43 RobH_A90: shut down eiximenis and erzurumi to move their power
  • 16:34 RobH_A90: shutting down some servers and moving power around in a4-sdtpa
  • 16:17 Andrew: Changed morebots to tell you through a channel message instead of a private notice when the logging is successful.
  • 15:54 Fred: kernel updated on wikitech from 2.6.18.8 to 2.6.29 (latest available on linode)
  • 15:49 Andrew: Fixed auto-submission of honeypot data, was broken because it needed my perl include path.
  • 15:40 Fred: rebooting wikitech to install new kernel
  • 14:04 Ariel: stopped apaches on image scalers, stopped nfs on ms1, restarting nfs and apaches...
  • 13:52 Ariel: removing more snapshots on ms1 (lockstat showed it hung up in metaslab_alloc again)

July 14

  • 23:15 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'fix fix to enwiki confirmed gruop :D'
  • 22:24 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'fix to confirmed group for en'
  • 20:44 logmsgbot: brion synchronized wmf-deployment/cache/trusted-xff.cdb
  • 20:41 logmsgbot: brion synchronized wmf-deployment/cache/trusted-xff.cdb
  • 20:40 logmsgbot: brion synchronized wmf-deployment/AdminSettings.php
  • 20:22 Fred: restarted a bunch of dead apaches
  • 20:10 brion: doing a sync-common-all w/ attempt to put test.wikipedia on wmf-deployment branch
  • 19:50 logmsgbot: robh synchronized php-1.5/InitialiseSettings.php
  • 19:11 logmsgbot: robh synchronized php-1.5/InitialiseSettings.php 'bug 19611 forgot one thing'
  • 19:09 logmsgbot: robh synchronized php-1.5/InitialiseSettings.php 'bug 19611'
  • 19:08 logmsgbot: robh synchronized php-1.5/InitialiseSettings.php 'bug 19611'
  • 14:16 domas: dropped all june snapshots on ms1, thus providing some relief
  • 01:52 river: patched ms4 in preparation for upload copy

July 13

  • 21:31 Rob: pushing dns update to fix management ips for new apaches
  • 19:05 Fred: added storage3 to ganglia monitor.
  • 18:50 logmsgbot: brion synchronized php-1.5/abusefilter.php 'Disable dewiki missingsummary, mysteriously in abusefilter section. Per bug 19208'
  • 16:30 Fred: installed wikimedia-nis-client on srv66 and mounted /home.
  • 16:28 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'fixing wikispecies RC-IRC prefix to species.wikimedia'
  • 16:27 brion: test wiki was apparently moved from dead srv35 to srv66, which has new NFS-less config. thus fail since test runs from nfs
  • 16:24 brion: test wiki borked; reported down for several days now :) investigating
  • 15:12 logmsgbot: midom synchronized php-1.5/db.php 'db26 raid issues'
  • 14:55 logmsgbot: midom synchronized php-1.5/db.php 'db3 and db5 coming live as commons servers'
  • 14:13 domas: dropped few more snapshots, as %sys was increasing on ms1...
  • 11:16 domas: manually restarted plethora of failing apaches (direct segfaults and other possible APC corruptions, leading to php OOM errors)
  • 09:50 logmsgbot: tstarling synchronized php-1.5/includes/specials/SpecialBlockip.php
  • 09:00 Tim: restarted apache2 on image scalers
  • 08:39 logmsgbot: tstarling synchronized php-1.5/includes/Math.php 'statless render hack'
  • 08:05 Tim: killed all image scalers to see if that helps with ms1 load
  • 08:00 Tim: killed waiting apache processes
  • 07:35 logmsgbot: midom synchronized php-1.5/mc-pmtpa.php
  • 07:24 logmsgbot: midom synchronized php-1.5/mc-pmtpa.php 'swapping out srv81'
  • 04:11 Tim: fixed /opt/local/bin/zfs-replicate on ms1 to write the snapshot number before starting replication, to avoid permanent error "dataset already exists" after failure
  • 02:16 brion: -> https://bugzilla.wikimedia.org/show_bug.cgi?id=19683
  • 02:12 brion: sync-common script doesn't work on nfs-free apaches; language lists etc not being updated. Deployment scripts need to be fixed?
  • 02:03 brion: srv159 is absurdly loaded/lagged wtf?
  • 01:58 brion: reports of servers with old config, seeing "doesn't exist" for new mhr.wikipedia. checking...
  • 01:16 brion: so far so good; CPU graphs on image scalers and ms1 look clean, and I can purge thumbs on commons ok
  • 01:10 brion: trying switching image scalers back in for a few, see if they go right back to old pattern or not
  • 01:03 brion: load on ms1 has fallen hugely; outgoing network is way up. looks like we're serving out http images fine... of course scaling's dead :P
  • 00:59 brion: stopping apache on image scaler boxes, see what that does
  • 00:49 brion: attempting to replicate domas's earlier temp success dropping oldest snapshot (last was 4/13): zfs destroy export/upload@weekly-2009-04-20_03:30:00
  • 00:45 brion: restarting nfs server
  • 00:44 brion: stopping nfs server, restarting web server
  • 00:40 brion: restarting nfs server on ms1
  • 00:36 brion: doesn't seem so far to have changed the NFS access delays on image scalers.
  • 00:31 brion: shutting down webserver7 on ms1
  • 00:23 brion: investigating site problem reports. image server stack seems overloaded, so intermittent timeouts on nfs to apaches or http/squid to outside
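
The 04:11 entry describes reordering zfs-replicate so the snapshot number is written before replication starts. A minimal sketch of why that ordering avoids a permanent "dataset already exists" error, assuming the script numbers snapshots from a counter kept in a state file (the log does not describe its internals); all names here are hypothetical:

```python
from pathlib import Path


def replicate_once(state_file, take_snapshot, send_snapshot):
    """Run one replication cycle and return the snapshot number used.

    The counter is persisted BEFORE any work starts, so a run that fails
    partway never causes the next run to reuse the same snapshot number.
    """
    state_file = Path(state_file)
    last = int(state_file.read_text()) if state_file.exists() else 0
    number = last + 1
    state_file.write_text(str(number))  # record first, then do the work
    take_snapshot(number)               # may raise if the name already exists
    send_snapshot(number)               # may fail midway (network, load, ...)
    return number
```

If the counter were written only after a successful send, a failure in `send_snapshot` would leave snapshot N on disk with the counter still at N-1, and every later run would try to create N again and fail the same way.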

July 12

  • 20:30 domas: dropped few snapshots on ms1, observed sharp %sys decrease and much better nfs properties immediately
  • 20:05 domas: we seem to be hitting issue similar to http://www.opensolaris.org/jive/thread.jspa?messageID=64379 on ms1
  • 18:55 domas: zil_disable=1 on ms1
  • 18:34 mark: Upgraded pybal on lvs3
  • 18:16 mark: Hacked in configurable timeout support for the ProxyFetch monitor of PyBal, set the renderers timeout at 60s
  • 17:58 domas: scaler stampedes caused scalers to be depooled by pybal, thus directing stampede to other server in round-robin fashion, all blocking and consuming ms1 SJSWS slots. of course, high I/O load contributed to this.
  • 17:55 domas: investigating LVS-based rolling scaler overload issue, Mark and Tim heading the effort now ;-)
  • 17:54 domas: bumped up ms1 SJSWS thread count
  • 11:00 domas: hehehehehe, disabled peer verification on zwinger for now:
      Issuer: C=US, ST=Florida, L=Tampa, O=Wikimedia Foundation Inc., OU=Operations, CN=srv1.pmtpa.wmnet
       Validity
           Not Before: Jul  8 08:03:52 2006 GMT
           Not After : Jul 12 08:03:52 2009 GMT
  • 08:43 tomaszf: rebooted wikitech due to out of memory
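
The 17:58 entry describes a stampede: overloaded scalers were depooled, which pushed their load onto the remaining scalers. A toy model of that cascade, assuming load is split evenly across pooled servers and that a monitor depools one server whenever per-server load exceeds capacity; this is an illustration only and not PyBal's actual logic:

```python
def simulate_depool_cascade(num_servers, capacity, offered_load):
    """Return (pooled_servers, load_per_server) for each step of the cascade.

    Toy model: each step, if the even share of load exceeds a server's
    capacity, one server is treated as timed out and is depooled.
    """
    history = []
    pooled = num_servers
    while pooled > 0:
        per_server = offered_load / pooled
        history.append((pooled, per_server))
        if per_server <= capacity:
            break  # the pool is stable at this size
        pooled -= 1  # one overloaded server fails its check and is depooled
    return history


# Slightly over capacity: every depool makes things worse for the rest.
for pooled, load in simulate_depool_cascade(6, 10, 66):
    print(pooled, load)
```

In this model a pool that is only slightly over capacity collapses entirely, which is consistent with the 18:16 change that raised the renderers' monitor timeout to 60s so slow servers were not depooled.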


  • Jul 12 14:17:32 <TimStarling> !log reduced MaxClients on wikitech.wikimedia.org from 150 to 5
  • Jul 12 14:06:33 <domas> !log srv1 certificate expired
  • Jul 12 11:31:58 <tomaszf> !log rebooted wikitech due to out of memory
  • Jul 12 11:07:58 <tomaszf> !log rebooting wikitech
  • Jul 12 08:41:30 <logmsgbot> !log tstarling synchronized php-1.5/InitialiseSettings.php
  • Jul 12 08:40:31 <logmsgbot> !log tstarling synchronized php-1.5/includes/ImagePage.php
  • Jul 12 08:40:15 <logmsgbot> !log tstarling synchronized php-1.5/includes/DefaultSettings.php
  • Jul 12 08:39:55 <TimStarling> !log merging and deploying r53130, will disable archive thumbnails and see if it has an impact on ms1 load
  • Jul 12 00:31:07 <logmsgbot> !log midom synchronized php-1.5/db.php
  • Jul 11 22:17:15 <logmsgbot> !log andrew synchronized php-1.5/InitialiseSettings.php
  • Jul 11 22:15:46 <werdna> !log Still very slow, going to disable CentralNotice again
  • Jul 11 22:07:30 <RoanKattouw> !log wikitech.wikimedia.org is down
  • Jul 11 20:40:26 <logmsgbot> !log tstarling synchronized php-1.5/InitialiseSettings.php 're-enabling CentralNotice'
  • Jul 11 19:32:06 <TimStarling> !log killed waiting processes again
  • Jul 11 19:24:11 <TimStarling> !log killed all processes in the rpc_wait state, to buy us some time
  • Jul 11 19:12:06 !log Reverted cache_mem reduction on upload squids; the cause of memory pressure is a memleak
  • Jul 11 19:07:47 <TimStarling> !log apaches took a while to restart due to some shell processes hanging on to listening *:80 filehandles while waiting for NFS, should be fixed now
  • Jul 11 19:03:02 !log Restarting memory leaking frontend squids in upload pmtpa cluster
  • Jul 11 18:57:48 <TimStarling> !log restarting apaches
  • Jul 11 18:56:15 !log Reduced cache_mem from 3000 to 2000 MB on pmtpa upload cache squids

July 11

  • 15:45 mark: Rebooting sq1
  • 15:31 Tim: rebooting ms1
  • 14:54 Tim: disabled CentralNotice temporarily
  • 14:54 logmsgbot: tstarling synchronized php-1.5/InitialiseSettings.php 'disabling CentralNotice'
  • 14:53 logmsgbot: tstarling synchronized php-1.5/InitialiseSettings.php 'disabling CentralAuth'
  • 14:36 Tim: restarted webserver7 on ms1
  • 14:22 Tim: some kind of overload, seems to be image related
  • 10:09 logmsgbot: midom synchronized php-1.5/db.php 'db8 doing commons read load, full write though'
  • 09:22 domas: restarted job queue with externallinks purging code, <3
  • 09:22 domas: installed nrpe on db2 :)
  • 09:22 logmsgbot: midom synchronized php-1.5/db.php 'giving db24 just negligible load for now'
  • 08:38 logmsgbot: midom synchronized php-1.5/includes/parser/ParserOutput.php 'livemerging r53103:53105'
  • 08:37 logmsgbot: midom synchronized php-1.5/includes/DefaultSettings.php

July 10

  • 21:21 Fred: added ganglia to db20
  • 19:58 logmsgbot: azafred synchronized php-1.5/CommonSettings.php 'removed border=0 from wgCopyrightIcon'
  • 18:58 Fred: synched nagios config to reflect cleanup.
  • 18:52 Fred: cleaned up the node_files for dsh and removed all decommissioned hosts.
  • 18:36 mark: Added DNS entries for srv251-500
  • 18:18 logmsgbot: fvassard synchronized php-1.5/mc-pmtpa.php 'Added a couple spare memcache hosts.'
  • 18:16 RobH_DC: moved test to srv66 instead.
  • 18:08 RobH_DC: turning srv210 into test.wikipedia.org
  • 17:56 Andrew: Reactivating UsabilityInitiative globally, too.
  • 17:55 Andrew: Scapping, back-out diff is in /home/andrew/usability-diff
  • 17:43 Andrew: Apply r52926, r52930, and update Resources and EditToolbar/images
  • 16:44 Fred: reinstalled and configured gmond on storage1.
  • 15:08 Rob: upgraded blog and techblog to wordpress 2.8.1
  • 13:58 logmsgbot: midom synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix\!'
  • 12:40 Tim: prototype.wikimedia.org is in OOM death, nagios reports down 3 hours, still responsive on shell so I will try a light touch
  • 11:07 logmsgbot: tstarling synchronized php-1.5/mc-pmtpa.php 'more'
  • 10:58 Tim: installed memcached on srv200-srv209
  • 10:51 logmsgbot: tstarling synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more'
  • 10:48 Tim: mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned
  • 10:37 Tim: installed memcached on srv120, srv121, srv122, srv123
  • 10:32 Tim: found rogue server srv101, missing puppet configuration and so skipping syncs. Uninstalled apache on it.

July 9

  • 23:56 RoanKattouw: Rebooted prototype around 16:30, got stuck around 15:30
  • 21:43 Rob: srv35 (test.wikipedia.org) is not posting, I think it's dead, Jim.
  • 21:35 Rob: decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly
  • 20:04 Rob: removed decommissioned servers from node groups, getting error on syncing up nagios.
  • 20:03 Rob: updated dns for new apache servers
  • 19:54 Rob: decommissioned all old apaches in rack pmtpa b2
  • 16:22 Tim: creating mhrwiki (bug 19515)
  • 13:27 domas: db13 controller battery failed, s2 needs master switch eventually

July 8

  • 15:48 domas: frontend.conf changes: fixed cache-control headers for /w/extensions/ assets, did some RE optimizations %)
  • 13:31 logmsgbot: midom synchronized php-1.5/InitialiseSettings.php 'disabling usability initiative on all wikis, except test and usability. someone who enabled this and left at this state should be shot'

July 7

  • 19:06 Fred: adjusted www.wikipedia.org apache conf file to remove a redirect-loop to www.wikibooks.org. (bug #19460)
  • 17:34 Fred: found the cause of Ganglia issues: Puppet. Seems like the configuration of the master hosts gets reverted to being deaf automagically...
  • 17:05 Fred: ganglia fixed. For some reason the master cluster nodes were set to Deaf mode... (ie the aggregator couldn't gather data from them).
  • 15:02 logmsgbot: robh synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia'
  • 03:37 Fred: fixing ganglia. Expect disruption
  • 00:27 tomaszf: starting six worker threads for xml snapshots
  • 00:12 Fred: srv142 and srv55 will need manual power-cycle.
  • 00:10 Fred: Rolling reboot has finally completed.

July 6

  • 23:57 Fred: restarted ganglia since it is acting up...
  • 23:54 tomaszf: restarting all xml snapshots due to kernel upgrades
  • 18:49 Rob: upgraded spam detection plugins on blog and techblog
  • 18:47 Fred: starting rolling reboot of servers in Apaches cluster.
  • 17:53 tomaszf: cleaning out space on storage2. lowering retention for xml snapshots to 10
  • 17:53 Fred: upgrading kernel on cluster. This will take a while!
  • 17:46 Fred: rebooting srv220 to test kernel update.

July 3

  • 12:51 logmsgbot: andrew synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.'
  • 11:45 mark: Kicked iris so it would boot
  • 10:11 logmsgbot: andrew synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system'
  • 05:51 Tim: restarted squid instances on sq28
  • 05:47 Tim: restarted squid instances on sq2
  • 05:46 Tim: started squid backend on sq10 and sq23, sq24, sq31, restarted frontend on most of those to reduce memory usage
  • 05:35 Tim: restarted squid backend on sq16, was reporting "gateway timeout" apparently for all requests. Seemed to fix it. Will try that for a few more that nagios is complaining about.

July 2

  • 21:38 Rob: sq24 won't accept ssh, depooling.
  • 21:34 Rob: rebooting sq21
  • 21:26 Rob: ran changes to push dns back to normal scenario
  • 19:52 mark: Power outage at esams, moving traffic
  • 19:44 Andrew: Knams down, Rob is looking into it
  • 19:41 Andrew: Reports of problems from Europe
  • 19:25 Andrew: running sync-common-all to deploy mobileRedirect.php to fix hcatlin's mobile redirect/cookie bug
  • 19:22 logmsgbot: andrew synchronized live-1.5/mobileRedirect.php
  • 17:15 mark: Rebooted srv159
  • 16:13 Fred: shutting srv217 back down as it is not supposed to be up due to a faulty timer causing issues.
  • 16:12 Fred: rebooted srv217. Was unpingable.
  • 14:09 Andrew: Started sending updates of spam.log to Project Honeypot folks every 5 minutes, in my crontab on hume.
  • 11:20 logmsgbot: andrew synchronized php-1.5/skins/common/shared.css 'Live-merging r52669, r52684 at rainman's request, search fixes.'
  • 11:18 logmsgbot: andrew synchronized php-1.5/includes/specials/SpecialSearch.php 'Live-merging r52669, r52684 at rainman's request, search fixes.'
  • 00:03 logmsgbot: brion synchronized php-1.5/CommonSettings.php
  • 00:02 logmsgbot: brion synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'de-merge broken r52664'

July 1

  • 23:40 brion: poking in tweaks to search and updates to vector
  • 23:22 logmsgbot: brion synchronized php-1.5/CommonSettings.php 'bump wgStyleVersion'
  • 23:21 logmsgbot: brion synchronized php-1.5/skins/vector/main-rtl.css
  • 23:21 logmsgbot: brion synchronized php-1.5/skins/vector/main-ltr.css
  • 23:10 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'set vector skin, new toolbar on for usability wiki'
  • 23:07 mark: Kicked pascal
  • 23:05 logmsgbot: brion synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.php 'bumping the js ver no'
  • 23:01 logmsgbot: brion synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.js
  • 22:59 logmsgbot: brion synchronized php-1.5/extensions/WikimediaMessages/WikimediaMessages.i18n.php 'to 52659'
  • 22:57 logmsgbot: brion synchronized php-1.5/extensions/UsabilityInitiative/EditToolbar/EditToolbar.i18n.php
  • 22:44 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'enabling UsabilityInitiative (for optional EditToolbar)'
  • 22:43 logmsgbot: brion synchronized php-1.5/CommonSettings.php 'disabling EditWarning pending addl talk'
  • 22:40 brion-codereview: updating UsabilityInitiative ext to r52657 in prep for enabling new toolbar option
  • 22:10 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'Enabling new search UI formatting sitewide'
  • 22:02 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'fixing the RTL disable for vector'
  • 21:58 logmsgbot: brion synchronized php-1.5/InitialiseSettings.php 'Vector should now be available in prefs for non-RTL sites'
  • 21:57 logmsgbot: brion synchronized php-1.5/CommonSettings.php 'vector config tweak'
  • 21:42 brion-codereview: updating Vector to current
  • 19:38 logmsgbot: midom synchronized php-1.5/db.php
  • 16:13 Fred: bayes is running out of memory on a regular basis. Enabled process accounting / sar to gather more data.
  • 15:48 Fred: rebooting Bayes as it locked up again.
  • 11:48 logmsgbot: tstarling synchronized php-1.5/InitialiseSettings.php 'trying a lower value for $wgMaxMsgCacheEntrySize'
  • 11:19 domas: cleaned up srv100
  • 11:18 domas: noticed that imagemagick tempfiles are currently created in /u/l/a/c-l/p/ :)
  • 09:24 domas: pinned mysqlds on half of cores on 8-core boxes: for i in {11..30}; do ssh db$i 'taskset -pc 0-15:2 $(pidof mysqld)' ; done
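
Note on the 09:24 entry: the CPU list `0-15:2` passed to `taskset -pc` uses the range-with-stride form, which selects every second CPU in the range 0 to 15. The following is a minimal sketch of how such a list expands, not the tool's actual implementation; the helper names `expand_cpu_list` and `cpu_mask` are hypothetical and exist only for illustration.

```python
from typing import List


def expand_cpu_list(spec: str) -> List[int]:
    """Expand a taskset-style CPU list such as "0-15:2" or "0,5,8-11".

    Hypothetical helper for illustration; it does no validation beyond
    what int() performs.
    """
    cpus = set()
    for part in spec.split(","):
        # Split an optional ":stride" suffix off the range.
        rng, _, stride = part.partition(":")
        # Split the range into its first and (optional) last CPU.
        start, _, end = rng.partition("-")
        first = int(start)
        last = int(end) if end else first
        step = int(stride) if stride else 1
        cpus.update(range(first, last + 1, step))
    return sorted(cpus)


def cpu_mask(cpus: List[int]) -> int:
    """Build the equivalent affinity bitmask (bit n set means CPU n allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    selected = expand_cpu_list("0-15:2")
    print(selected)
    print(hex(cpu_mask(selected)))
```

Under this reading, the logged command restricts each mysqld process to the even-numbered CPUs 0 through 14, which matches the "half of cores" wording in the entry.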

Archives