Server admin log/Archive 16

From Wikitech

October 31

  • 17:21 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25719 - Add missing slash in timezone'

October 30

  • 23:05 apergos: test of logging (sorry)
  • 21:22 mark: Deploying a sudoers file for NRPE using Puppet
  • 20:48 mark: Running apt-get upgrade on db17
  • 20:48 mark: Pushed updated wikimedia-raid-utils package into the APT repository, with a newer arcconf that should work on Lucid
  • 15:53 atglenn: powercycled mobile2, it was unresponsive to ssh and pings, ganglia showed no activity
  • 03:05 domas: ms1 can't snapshot either, I suspect kernel bugs. we either have to roll back to 2.6.28 or move forward, or actually try rebuilding filesystems from scratch with new kernels...

October 29

  • 23:21 domas: lol repaired myisam tables on db9, call if data has been lost, hehe
  • 22:58 domas: resynced srv154, was running with months old configuration/code.
  • 22:58 domas: was db22 disabled silently by someone? or not reenabled? :) reenabled now...
  • 22:55 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 18:33 apergos: restarted torrus on streber, after reports that it was not responding
  • 17:46 apergos: domas ran "reset-mysql-slave db18" (from fenari) which clears out *all* old relay logs, and restarts the slaves.
  • 17:34 apergos: removed some old relay logs from /a/sqldata on db18 to get space back, it was at 95%
  • 15:22 RoanKattouw: Followers on Twitter: view missing entries between Sep 2 and today at http://identi.ca/wikimediatech
  • 15:22 RoanKattouw: Re-established identi.ca->Twitter bridge for wikimediatech, broken since September 2
  • 15:21 RobH: repaired the sessions table, rt is now happy
  • 15:09 RobH: rt is being odd, looking into it

October 28

  • 21:34 RobH: powercycled sq69, ran puppet, it's back online
  • 21:24 RobH: sq69 is borked, powercycling
  • 17:51 Ryan_Lane: running checksetup.pl on kaulen for bugzilla
  • 17:50 Ryan_Lane: running mysqlcheck --auto-repair on bugzilla database on db9 for the bugs_fulltext table
  • 16:26 atglenn: as soon as someone is available we need to repair a table for bugzilla (probably due to db9 restart): Table './bugzilla3/bugs_fulltext' is marked as crashed and should be repaired [for Statement "INSERT INTO bugs_fulltext (bug_id, short_desc)
  • 15:23 atglenn: reenabled logging for fundraising on locke
  • 14:50 atglenn: I see a lot of ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO) after reboot of db9... not awake enough to try to look at it; services seem to be running ok
  • 14:46 atglenn: powercycled db9, it was unreachable by ssh, ganglia showed load and wait_cpu through the roof
  • 14:46 RoanKattouw: db9 back after having been powercycled by Ariel
  • 14:18 RoanKattouw: db9 down. Responds to ping but doesn't respond to anything else

October 27

  • 18:32 mark: Shutdown yongle for decommissioning
  • 17:49 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25668 - Allow bureaucrats to de-sysop and de-bureaucrat on fiwikimedia'
  • 17:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25659 - Activation of Special'
  • 17:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25659 - Activation of Special'
  • 17:29 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25652 - Allow importing from frwiki on arzwiki'
  • 11:06 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set $wgOldChangeTagsIndex=false on fawikinews'

October 26

  • 22:34 RobH: removed borked tenwiki deployments entirely from cluster, time to start over
  • 18:33 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'readding ms1'
  • 18:04 RobH: dropping out all the tenwikipedia stuff again, internet hates this wiki
  • 18:00 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'rob makes typos some more'
  • 17:59 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'rob makes typos'
  • 17:30 RoanKattouw: Removed 10wikipedia from all.dblist and ran sync-dblist
  • 17:08 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'sync cuz i said so'
  • 17:03 RobH: kicking srv183 around
  • 16:56 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'srv183 fell over'
  • 16:51 logmsgbot: robh ran sync-common-all
  • 16:26 RobH: sync-docroot run, nothing blew up, yay
  • 14:34 apergos: powercycled mobile3. hey in case anyone ever needs to know this again, mobile3 is a dell r410 with the new drac 6.
  • 12:47 hcatlin: mobile3 is not responding to anything. needs restart. attempting to contact ops.

October 25

  • 21:50 Ryan_Lane: pdns on ns0 not responding, restarting
  • 21:47 Ryan_Lane: fixing test-payments.tesla entry in dns
  • 19:09 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25604 - Project namespace creation on frwiktionary'
  • 19:05 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25627 - Please create various namespaces for fawikinews'
  • 18:54 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '22283 - Add "Author" namespace to Telugu Wikisource'
  • 18:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25635 - Please set the Crimean Tatar Wikipedia logo'
  • 18:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25636 - Abuse filter in Persian wikinews'
  • 18:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25576 - Install AbuseFilter on simplewiki'
  • 18:34 logmsgbot: jeluf ran sync-common-all
  • 18:32 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php 'bug 25634 - Please install flagged revisions on fawikinews'
  • 18:30 logmsgbot: jeluf synchronized flaggedrevs.dblist
  • 16:09 apergos: after restart of mysqld on db9, it now accepts requests from singer, blogs appear to be back
  • 15:41 apergos: powercycled singer, it was unresponsive to ssh, huge cpu spikes

October 24

  • 20:44 mark: Shutting down mayflower for decommissioning
  • 20:40 mark: Changed root password on lily, iris, ragweed, mayflower and sage
  • 20:37 mark: Changed root password on browne and srv2
  • 19:31 RobH: installed python-openssl on spence for testing
  • 19:11 tomaszf: removing downtime with init script on loudon
  • 18:21 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 18:20 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:44 Ryan_Lane: reinstalling ersch for openstack testing
  • 17:29 mark: Changed root password hash in puppet; being deployed to all systems in the next 30-60 minutes
  • 17:21 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 16:56 Ryan_Lane: reinstalling alsted, to try stable openstack from scratch
  • 16:43 Ryan_Lane: rebooting ersch and alsted
  • 14:26 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:25 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 12:34 mark: Restarted varnish on knsq5
  • 12:33 mark: Restarted varnish on knsq1
  • 12:31 mark: Powercycled knsq4
  • 12:31 mark: Restarted varnish on knsq2
  • 12:31 mark: Removed puppet lock on knsq2

October 23

  • 20:48 domas: powercycled srv154, saw mgmt network problems yet again
  • 20:30 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning off CentralNotice for qualitywiki'
  • 18:32 RobH: modifying the sync-common-all to take out logging while I am futzing with it.
  • 18:27 logmsgbot: root ran sync-common-all
  • 18:24 logmsgbot: root ran sync-common-all
  • 18:21 logmsgbot: root ran sync-common-all
  • 18:19 logmsgbot: root ran sync-common-all
  • 18:17 logmsgbot: root ran sync-common-all
  • 18:13 domas: increased maxclients 10->20 on scalers
  • 18:10 logmsgbot: root ran sync-common-all
  • 17:37 Ryan_Lane: updating the dns entry for svn.wikimedia.org
  • 17:37 Ryan_Lane: rsync'd /svnroot to formey
  • 17:37 Ryan_Lane: remounted /svnroot ro on mayflower
  • 17:37 Ryan_Lane: moving svn from mayflower to formey
  • 17:03 domas: restarted apaches that failed to start after apt-get upgrade :(
  • 17:01 logmsgbot: root ran sync-common-all
  • 16:58 RobH: working on rsync settings and running crap as root for testing, disregard
  • 16:57 tomaszf: upgrading activemq to 5.4.1 on erzurumi
  • 16:51 logmsgbot: tstarling synchronized php-1.5/includes/memcached-client.php 'profiling memcached compression'
  • 16:48 logmsgbot: tstarling synchronized php-1.5/includes/memcached-client.php 'profiling memcached compression'
  • 16:46 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/backend/RepoStats.php 'r75268'
  • 16:36 logmsgbot: root ran sync-common-all
  • 16:33 domas: upgrading scalers to new imagemagick and whatever is pending
  • 16:20 RobH: having to run the sync as root to fix permissions with roan, not normal use
  • 16:19 logmsgbot: root ran sync-common-all
  • 16:10 logmsgbot: root ran sync-common-all
  • 15:22 mark: Deployed wikimedia.vcl (varnish) which fixes a memleak in the geoip code
  • 14:27 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeRevisionView.php 'r75239'
  • 14:09 Ryan_Lane: updated nova-* on ersch and alsted for stable release - rebooted
  • 11:15 mark: Unstuck srv220 and srv222

October 22

  • 17:25 RobH: at this time no issues are apparent and site access appears normal again.
  • 17:25 RobH: Reported issues with accessing cluster hitting esams, possible network issues, further investigation shows no server errors. If network links continue to cause issues, we may redirect traffic stateside
  • 02:36 apergos: shot some hung converts on the image scalers, that should hold us for awhile

October 21

  • 21:10 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Whitelisting new landing pages'
  • 20:37 Ryan_Lane: adding account for jdavis on mchenry and sanger
  • 18:31 mark: Powercycled sq67 and sq68
  • 18:10 mark: Unstuck srv219, 223, 224
  • 18:05 AaronSchulz: ran runBatchedQuery.php to finish enwiki rename 'Voice of All' => 'Aaron Schulz' on archive/revision tables
  • 17:58 Ryan_Lane|food: enabled eth0 interface on test-payments, and changed the IP to the new IP
  • 17:56 RobH: added dns entry for test-payments on prototype server
  • 16:47 logmsgbot: catrope synchronized php-1.5/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php 'r75158'
  • 15:26 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 15:14 RoanKattouw: Commented out srv154 in /etc/dsh/groups/mediawiki-installation because it's down and blocking sync-file
  • 15:13 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/WikiEditor/WikiEditor.combined.min.js 'r75156'
  • 01:02 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Readding payments sidebar links after outage'

October 20

  • 22:03 mark: Removed squid.conf log line sending packets to yaseo
  • 21:53 RobH: tired of misdated cron emails from dataset1, fixed the date again, it will go off date on its own eventually
  • 21:52 Ryan_Lane: restarting opendj on nfs1
  • 21:23 Ryan_Lane: adding nis indexes on sanger, and restarting the ldap server
  • 21:22 Ryan_Lane: updating opendj on sanger
  • 21:22 Ryan_Lane: updating opendj on nfs1
  • 21:21 Ryan_Lane: restarting the ldap servers one at a time
  • 21:21 Ryan_Lane: creating nis indexes in the ldap server on nfs1 and nfs2
  • 21:09 Ryan_Lane: upgrading opendj on nfs2
  • 21:05 Ryan_Lane: updating opendj package in repository to opendj_2.4.0-7
  • 20:40 logmsgbot: laner synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disabling local uploads for outreachwiki, and enabling subpages on namespace 0'
  • 20:19 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php 'Bug 24304 - Reconfigure English Wikibooks'
  • 18:10 mark: Punched hole in loudon firewall for squid stats from streber
  • 17:46 Ryan_Lane: setting svn.wikimedia.org to 60 seconds ttl in dns
  • 17:25 mark: Added loudon to torrus (CDN/Squid)
  • 17:24 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'taking outage on payments server'

October 19

  • 21:36 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding new Appeal pages'
  • 20:52 Ryan_Lane: restarting varnish on all bits
  • 17:12 JeLuF: uninstalled broken php5-xdebug package on srv181
  • 15:23 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php
  • 15:22 logmsgbot: catrope synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php
  • 15:08 JeLuF: killed old convert jobs on srv100
  • 14:17 JeLuF: stopped job runner on srv 258-289, checking impact on ext stores
  • 10:59 JeLuF: added srv258-289 as job runners. We now have 69 active job runners
  • 10:34 JeLuF: added srv181-190 as job runners. We now have 38 active job runners
  • 06:44 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning central notice off for closed advisory wiki'

October 18

  • 20:10 Ryan_Lane: adding ldap management scripts to formey via puppet class ldap::client::sdk
  • 15:44 JeLuF: Set default charset for http://noc.wikimedia.org/conf/ to utf-8
  • 14:52 JeLuF: killed stuck runJobs.php tasks on srv{150-170}
  • 14:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/db.php 'removed db28 from s5'
  • 14:32 logmsgbot: jeluf synchronized php-1.5/wmf-config/db.php 'removed db13 from s2'
  • 13:44 RoanKattouw: Removed trailing ?> from mc.php, InitialiseSettings.php, lucene.php, wgConf.php, secure.php, checkers.php
  • 06:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25557 - Index namespace on Telugu Wikisource'
  • 06:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25554 - Change the "upload file" link on jawiki'

October 17

  • 22:06 apergos: edited /etc/apache2/envvars in place and set APACHE_ULIMIT_MAX_FILES to 2000, hoping that will address the "too many open files" issue
  • 21:29 hcatlin: mobile3 is having file limit issues. Turned off apache so that is de-pooled. Issue description sent to Ariel for RT submission.
  • 18:17 mark: Restarted all varnish processes on sq67-70
  • 18:10 mark: authdns-scenario normal
  • 18:07 apergos: stopped and restarted varnish on knsq1, 4... loads back to normal there now
  • 18:05 mark: Powercycled sq69
  • 18:00 Ryan_Lane: restarting varnish on esams bits
  • 17:50 JeLuF: Setting scenario to esams-down.
  • 17:47 JeLuF: restarted varnish on knsq1, load average was 5149.87
  • 15:39 logmsgbot: jeluf ran sync-common-all 'Bug 25056 - Create new wiki for Persian Wikinews'
  • 15:15 logmsgbot: jeluf ran sync-common-all 'Bug 25130 - Create Wikipedia in Komi-Permyak'
  • 14:42 logmsgbot: jeluf ran sync-common-all 'Bug 25307 - Create Wikipedia in Banjar'
  • 14:03 logmsgbot: jeluf ran sync-common-all 'Bug 25544 - Create a new Wikipedia in Hill Mari'
  • 12:59 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25345 - outreachwiki import sources'

October 16

  • 16:44 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeRevisionView.php 'r74785'
  • 16:43 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 16:42 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'r74788'

October 15

  • 20:30 RoanKattouw: Ran a DB query to set status of 'new' pre-r50000 revs to 'old'. 8,327 revs affected
  • 00:50 Ryan_Lane: add ldap::client::nss configuration to formey, so it can see ldap users
  • 00:48 Ryan_Lane: add all mayflower users and keys to wmf ldap
  • 00:47 Ryan_Lane: add sudo and openssh-lpk schema to nfs1 and nfs2 ldap servers

October 14

  • 20:36 Ryan_Lane: also added rkattouw to the engineering alias
  • 20:34 Ryan_Lane: adding ^demon and myself to the engineering alias
  • 20:00 Ryan_Lane: adding ldap::client::wmf-cluster include to formey in puppet. Adding openldap client as well.
  • 19:59 Ryan_Lane: enabling opendj replication on nfs1 and nfs2
  • 18:01 hcatlin: mobiles are acting up. keep an eye on them!
  • 17:20 mark: Fixed mobile by restarting apache on mobile2
  • 17:15 hcatlin: LVS is down for mobile... not sure why
  • 17:05 hcatlin: mobile deploy is complete!
  • 15:52 hcatlin: deploying updates to mobile cluster
  • 13:13 mark: Deployed logrotate file for squid-frontend
  • 12:34 mark: Fixed gmond on sq61
  • 12:31 mark: restarted scaling on srv222

October 13

  • 23:20 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'White listing new Appeal pages'
  • 22:55 Ryan_Lane: added ldap::client::trust and ldap::client::openldap classes to nfs1 and nfs2, so that they can manage their own ldap servers
  • 19:51 Ryan_Lane: commented out #monitor_service { "disk space":... line in misc::nfs-server::home::monitoring class in misc-servers.pp; was already defined in base.pp.
  • 19:36 Ryan_Lane: installing opendj on nfs1 and nfs2
  • 16:20 Ryan_Lane: reinstalling opendj and java on sanger. recreating the opendj instance.
  • 16:12 Ryan_Lane: fixed sanger's ip address in /etc/hosts
  • 14:02 mark: Killed ircd on browne - hopefully people are connecting to irc.wikimedia.org :P
  • 14:01 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Move UDP IRC RC stream from browne to ekrem'
  • 10:45 mark: Powercycled sq68
  • 10:43 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'Depooling db19, ECC errors'
  • 10:42 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'Repooling db15, db18, db19'

October 12

  • 21:16 Ryan_Lane: make that starting opendj instance on sanger
  • 21:15 Ryan_Lane: starting opends instance on sanger
  • 21:15 Ryan_Lane: created opendj instance on sanger
  • 19:38 Ryan_Lane: adding new package version of opendj to repository
  • 19:14 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Updating white list of banners'
  • 18:19 RobH: db19 memory error, rebooting and will work on it later
  • 18:15 RobH: mobile2 moved and back online
  • 17:22 RobH: leaving pdf2 for now, just mobile2 (rack needs to be wired)
  • 17:22 RobH: mobile2 coming down for relocation; pdf2 coming down for relocation
  • 15:59 Ryan_Lane: updating opendj package in repository to a hardy built version
  • 15:45 RobH: mysql restarted on db15
  • 15:44 RobH: restarted mysql on db18 db19
  • 15:39 RobH: db15 db18 db19 moved to b2-sdtpa wired and booting.
  • 15:36 Ryan_Lane: with puppet
  • 15:36 Ryan_Lane: installing opendj on sanger
  • 15:27 Ryan_Lane: adding opendj package to repository
  • 15:07 RobH: db15 db18 db19 commented out of db.php and coming down for relocation.
  • 15:07 logmsgbot: robh synchronized php-1.5/wmf-config/db.php 'commented out db15 db18 db19 for their relocation'
  • 14:43 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/api/ApiCodeRevisions.php 'r74661'
  • 14:41 RobH: db22 had bad fan, fan replaced, opening case with oracle for replacement of the bad fan (used on site spare)
  • 14:36 RobH: db22 unexpected power loss due to power cables binding when checking fan failure, is s4 slave
  • 13:46 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'r74655'
  • 13:46 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'r74655'
  • 13:00 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeRevisionListView.php 'r74654'
  • 01:22 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting outreach wiki notice name'
  • 01:21 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'setting outreach wiki notice name'

October 11

  • 19:07 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Whitelisting banners for thursday test'
  • 17:36 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeAuthorListView.php 'r74641'
  • 17:13 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeCommentsListView.php 'r74636'
  • 17:08 logmsgbot: catrope synchronized php-1.5/languages/Language.php 'r74634'
  • 16:58 logmsgbot: catrope synchronizing Wikimedia installation... Revision: 74633
  • 16:57 RoanKattouw: Running scap to deploy CodeReview update (r74630, r74633)
  • 07:46 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix for jQuery 1.4.2 upgrade'
  • 07:45 logmsgbot: catrope synchronized php-1.5/skins/common/jquery.min.js 'jQuery 1.4.2 upgrade'
  • 07:45 logmsgbot: catrope synchronized php-1.5/skins/common/jquery.js 'jQuery 1.4.2 upgrade'
  • 00:19 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Re-adding donate sidebar links'

October 10

  • 19:51 mark: Put European load back on esams
  • 19:41 mark: Restarted Varnish on knsq4
  • 19:26 mark: Restarted pdns on nescio
  • 19:26 mark: Temporarily moved european traffic to pmtpa to stabilize esams
  • 18:49 mark: Set backup static routes to LVS service IPs on csw5-pmtpa
  • 18:31 mark: Restarted PyBal on lvs3, its BGP session had no prefixes announced
  • 18:26 Ryan_Lane: repooling srv100
  • 18:25 Ryan_Lane: killing convert and apache processes on the srv219-224. restarting apache on them.
  • 18:15 Ryan_Lane|away: depooling srv100

October 9

  • 19:25 Ryan_Lane|away: killed convert ulimit4 and apache on srv 100 and 219. restarted apache

October 8

  • 22:36 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Taking outage on payments'
  • 02:51 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php 'Updating fall back returnto'

October 7

  • 21:53 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'white listing new banners'
  • 19:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25332 - Logo update for Azerbaijani Wikibooks'
  • 19:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25323 - Logo for Azerbaijani Wikisource'
  • 19:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25218 - please update the wiktionary logo of mg.wiktionary.org'
  • 19:38 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25333 - Portal namespace for Azerbaijani Wikipedia'
  • 19:30 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php
  • 19:29 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25422 - Set $wgDisableHardRedirects = false; on wikimediafoundation.org'
  • 19:15 logmsgbot: jeluf ran sync-common-all
  • 19:07 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25423 - AbuseFilter for the Portuguese Wiktionary'
  • 19:05 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php
  • 18:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25420 - Enabling $wgBlockAllowsUTEdit for Persian Wikipedia'
  • 17:56 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionTracking/ContributionTracking_body.php 'Picking up 1click tracking changes'
  • 00:49 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Fixing for anonymous views'

October 6

  • 20:46 Ryan_Lane: killing convert, apache2, and ulimit4.sh on srv219-224; restarting apache on them
  • 20:26 Ryan_Lane: applying memcached::disabled class to srv154, srv155, srv157, srv158, and srv170
  • 20:26 Ryan_Lane: added new memcached::disabled class to puppet
  • 20:04 Ryan_Lane: restarted varnish on all bits servers to reset memory
  • 18:12 Ryan_Lane: rebuilding alsted for openstack
  • 18:03 Ryan_Lane: rebuilding ersch, trying openstack again
  • 16:43 mark: Moved irc.wikimedia.org CNAME from browne to ekrem
  • 16:09 mark: Setup modified ratbox ircd on server ekrem, and linked it to the ircd on browne
  • 15:06 Ryan_Lane: relabeled slot1 for Wikimedia-Daily tapes on tridge to fix backup errors
  • 10:27 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 00:35 logmsgbot: midom synchronized php-1.5/wmf-config/db.php

October 5

  • 23:19 RobH: lvs1 moved to b1-sdtpa, no network only mgmt
  • 23:15 RobH: db16 isnt in cluster, no mysql start, db20 mysql isnt started
  • 23:12 RobH: mysql restarted db17
  • 23:12 RobH: db16, db17, db20 in new rack locations.
  • 21:44 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeTemplate.php 'Picking up fixes for geo'
  • 21:43 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes for geo'
  • 21:42 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerListLoader.php 'Picking up fixes for geo'
  • 21:41 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/centralnotice.js 'Picking up fixes for geo'
  • 21:40 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.db.php 'Picking up fixes for geo'
  • 21:38 RobH: db20 wants to go with db16/17, it's shutting down too
  • 21:34 logmsgbot: robh synchronized php-1.5/wmf-config/db.php 'commenting out db17'
  • 21:31 RobH: db16 and db17 going offline to move
  • 21:27 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.i18n.php 'Picking up fixes for geo'
  • 21:08 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:01 RobH: db11 was already moved, oops, rebooting.
  • 20:50 logmsgbot: tfinc synchronized php-1.5/skins/common/images/closewindow.png 'Picking up new close button image'
  • 20:41 RobH: db11 and db14 also moving
  • 20:41 logmsgbot: robh synchronized php-1.5/wmf-config/db.php 'once more with my actual changes'
  • 20:39 logmsgbot: robh synchronized php-1.5/wmf-config/db.php 'if this doesnt work its not my fault'
  • 20:34 RobH: moving db13 & db14, they are shutting down for a short duration
  • 18:17 RobH: db11 shutting down
  • 18:05 RobH: storage1 back online
  • 17:35 mark: Setup ports 3/35 to 3/44 for db11 to db20 respectively
  • 17:31 domas: s3 switched to db27-bin.001:79
  • 17:31 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:31 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:28 RobH: moved dataset1 network connection, all good
  • 16:32 RobH: dataset on the brain, storage1/2
  • 16:32 RobH: dataset1 shutting down for relocation into b1-sdtpa, dataset2 already offline, moving as well
  • 16:15 logmsgbot: catrope synchronized php-1.5/wmf-config/checkers.php 'Remove no-UA exemption for yongle'
  • 16:14 mark: Setup csw1-sdtpa:11/2 for amaranth
  • 16:12 mark: Setup csw1-sdtpa:11/1 for dataset1
  • 16:06 domas: switched commons master to 5.1, pos: db31-bin.000001:106
  • 16:06 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 16:06 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 16:02 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 16:02 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/js/ArticleAssessment.combined.min.js 'r74312'
  • 15:52 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db27 back'
  • 15:52 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/js/ArticleAssessment.combined.min.js 'r74310'
  • 15:33 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db31 in, db27 out'
  • 15:25 domas: fixed space on srv178, conf changes made apparmor audit log blow up
  • 15:02 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db31 mnt'
  • 15:00 mark: Restarted Apache on singer
  • 14:55 mark: Migrated the Apple Dictionary bridge from yongle to ekrem, and moved the search.wikimedia.org CNAME accordingly
  • 14:51 mark: ekrem reboot test
  • 13:37 mark: Had puppet install php5-curl on ekrem
  • 13:29 mark: Migrated the old mobile (WAP) site from yongle to ekrem and updated the Squid config
  • 11:56 mark: Installed Lucid on new server ekrem
  • 11:23 mark: Reused public ip address of isidore for new server ekrem
  • 10:53 mark: Reinstalled knsq1 as bits varnish server and had it join the esams bits cluster
  • 09:34 mark: Removed AS43821 outbound double prepend on BGP session with AS13030
  • 09:30 mark: Moving European bits traffic back to the European cluster

October 4

  • 19:52 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 19:52 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/js/ArticleAssessment.combined.min.js 'r74268'
  • 19:51 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'r74268'
  • 11:57 mark: Stopped PyBal on amslvs4 as a test
  • 11:47 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Replace srv87'
  • 10:53 mark: Setup BGP state monitoring for csw5-pmtpa and csw1-sdtpa as well
  • 10:48 mark: Setup BGP state monitoring for csw1-esams and csw2-esams
  • 10:36 mark: Restarted Apache on srv223
  • 10:24 mark: Fixed Nagios by removing db34 from the listed hosts section in conf.php
  • 09:47 mark: Downpreffed AS13030 transit to local-pref 90 on br1-knams

October 3

  • 19:25 logmsgbot: catrope synchronized php-1.5/wmf-config/checkers.php
  • 18:59 logmsgbot: catrope synchronized php-1.5/wmf-config/checkers.php 'Block search abuse reported by Robert'
  • 17:43 mark: Restarted apache on srv219-221, 224
  • 17:38 mark: Stopped IPVS backup sync daemon on amslvs3
  • 17:37 mark: Rebooting amslvs1
  • 17:37 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:32 mark: Stopped PyBal on amslvs1
  • 17:30 mark: Stopped IPVS backup sync daemon on amslvs4
  • 17:27 mark: Rebooting amslvs2
  • 17:19 mark: Stopped PyBal on amslvs2
  • 17:14 mark: Drive failed in thistle
  • 17:09 mark: Started temporary LVS state syncing between amslvs1->amslvs3 and amslvs2->amslvs4, preparing for reboot of amslvs1-2
  • 16:51 mark: Rebooted (backup) LVS servers amslvs3 and amslvs4
  • 16:36 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding db28 to s5'
  • 16:20 mark: Selectively updated pybal/bgp.py on lvs2-4 and restarted PyBal
  • 16:13 mark: Upgraded PyBal to r0.1+74215 on amslvs1-4
  • 14:27 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 14:12 domas: db22 was first wikimedia slave initialized using xtrabackup, hehe
  • 14:08 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'adding db22 as commons 5.1 slave'
  • 14:06 mark: Restored nameservers order for mobile servers
  • 13:36 mark: lvs3 service ips were unreachable from outside the subnet for some reason. Restarting PyBal (and therefore the BGP session to the router) fixed it
  • 13:22 mark: Started pdns recursor on dobson again
  • 13:19 mark: Once again stopped pdns-recursor on dobson
  • 13:18 mark: Restarted Apache on the mobile servers one by one, to put the new resolv.conf in effect
  • 13:10 mark: Swapped nameservers for the mobile servers, so they don't crash the mobile site while I reinstall the primary recursor
  • 13:02 logmsgbot: catrope synchronized php-1.5/wmf-config/checkers.php 'Log no-UA requests for exempt IPs to /h/w/l/nouaexempt.log'
  • 12:45 mark: Mobile site broke with missing primary DNS recursor, restarted it
  • 12:44 mark: Stopped DNS recursor on dobson again
  • 12:20 mark: Installed DNS recursors on all LVS servers to reduce SPOFfiness
  • 11:46 domas: doing some DB maintenance
  • 11:44 mark: Installed nscd on lvs4 and restarted PyBal
  • 11:30 mark: Restarted pybal on lvs4
  • 11:25 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 11:21 mark: Shutdown primary pmtpa DNS recursor (dobson)
  • 11:20 logmsgbot: catrope synchronized php-1.5/skins/common/wikibits.js 'r74198'

October 2

  • 17:01 apergos: we were hoping that a reboot of dataset1 would clear up the ipv6 + lighty issue that it had after os upgrade; no dice. this needs further investigation. ipv6 left disabled in lighty conf for now.
  • 16:43 RobH: dataset1 relocation and drive work done, system is back online
  • 16:20 RobH: dataset1 moved, boots back online, still working on its degraded disk replacement
  • 15:55 RobH: rebooted scs-a1-sdtpa since its web interface isnt accessible, not sure if its the device or the network
  • 15:06 RobH: dataset1 downtime window starts now

October 1

  • 20:03 RoanKattouw: Batch-deleted ~1200 categories on cswiktionary, requested by Danny B
  • 15:54 mark: Restarted apache on srv219-223, and killed stuck convert processes
  • 13:20 mark: Removed unix accounts brion and fvassard (with puppet)

September 30

  • 23:14 Ryan_Lane: deleting /srv-old on prototype
  • 23:11 Ryan_Lane: added a new 50G disk to prototype. rsync'd data from /srv to new mount. moved /srv to /srv-old. mounted new drive on /srv. rsync'd data from /srv-old to /srv.
  • 21:27 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'White listing new banners'
  • 17:13 awjr: upgraded kernel on grosley.wikimedia.org to 2.6.24-28
  • 00:57 Ryan_Lane: added mail alias ryan -> rlane
  • 00:55 Ryan_Lane: added mail alias carrie -> csmith on mchenry

September 29

  • 22:59 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning CN back on as default'
  • 22:55 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fix for db lookup'
  • 22:55 Ryan_Lane: rebooting ersch and alsted
  • 22:54 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Turning test wikipedia to non infrastructure mode'
  • 22:41 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'disabling cn'
  • 22:39 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes from 73993'
  • 22:38 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fixes from 73993'
  • 22:36 logmsgbot: tfinc synchronized php-1.5/includes/Skin.php 'Picking up fixes from 73993'
  • 21:31 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fix for message translation'
  • 21:29 logmsgbot: tfinc synchronized php-1.5/includes/MessageCache.php 'Picking up fix for message translation'
  • 21:28 logmsgbot: tfinc synchronized php-1.5/includes/parser/CoreParserFunctions.php 'Picking up fix for message translation'
  • 21:27 logmsgbot: tfinc synchronized php-1.5/includes/parser/ParserOptions.php 'Picking up fix for message translation'
  • 19:21 mark: Permanently doubled net.ipv4.tcp_max_tw_buckets to 360000, as the Varnish systems are constantly close to the limit (visible in /proc/net/sockstat)
  • 19:04 mark: ...on all squid and varnish servers, in puppet sysctl.conf
  • 19:03 mark: Increasing net.ipv4.tcp_max_syn_backlog = 262144
  • 18:57 mark: Experimentally doubled net.ipv4.tcp_max_tw_buckets to 360000 on sq70 (manually, so not in sysctl.conf)
  • 16:45 Ryan_Lane: make that the *new* tesla cluster subnet
  • 16:44 Ryan_Lane: updating dhcpd.conf on brewster to include the tesla subnet (153.192)
  • 16:29 mark: Setup IP helper-address to brewster for vlan 103 / ve 7 on csw5-pmtpa
  • 02:31 Ryan_Lane: unmounted old data partition and replaced it with the newer, larger data partition and updated the fstab on tridge.
  • 00:51 Ryan_Lane: destroyed resized data1 logical volume, and recreated the jfs filesystem on tridge. copying the data from /data to new partition; will remount data as new partition and update fstab
  • 00:49 awjr: Upgraded kernel on CC server to 2.6.32-24
  • 00:49 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Re-enabling CC server on variable page post kernel upgrade'
  • 00:49 Ryan_Lane: install kvm-pxe on alsted
  • 00:49 Ryan_Lane: configured eth1 with vlan tag 103, and added it to a new bridge, br103 on alsted and ersch
  • 00:36 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Taking outage on CC server for kernel upgrade'
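
Note on the 19:21 entry above: the TIME_WAIT count is the `tw` field on the `TCP:` line of /proc/net/sockstat, and it can be compared against the tcp_max_tw_buckets limit. The following is a rough sketch of that check, assuming the usual alternating name/value layout of the `TCP:` line; the function names and the sample numbers are made up for illustration.

```python
def time_wait_count(sockstat_text):
    """Extract the 'tw' (TIME_WAIT) counter from /proc/net/sockstat text."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # Fields alternate name/value: inuse 35 orphan 0 tw 3 ...
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["tw"]
    raise ValueError("no TCP line found in sockstat text")


def tw_usage(sockstat_text, max_tw_buckets):
    """Fraction of the TIME_WAIT bucket limit currently in use."""
    return time_wait_count(sockstat_text) / max_tw_buckets


if __name__ == "__main__":
    # Hypothetical snapshot, not taken from any real server.
    sample = (
        "sockets: used 294\n"
        "TCP: inuse 35 orphan 0 tw 170000 alloc 45 mem 4\n"
        "UDP: inuse 10 mem 2\n"
    )
    for limit in (180000, 360000):
        print(limit, round(tw_usage(sample, limit), 3))
```

On a live Linux host the same functions could be fed the contents of /proc/net/sockstat and the integer in /proc/sys/net/ipv4/tcp_max_tw_buckets.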

September 28

  • 21:57 Ryan_Lane: changing testvm1 to tesla namespace. adding new origin for tesla
  • 21:52 Ryan_Lane: adding testvm1.tesla to dns
  • 19:58 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Switching from landing page 9 to 1 for donate sidebar link'
  • 02:00 Ryan_Lane: added puppet vim syntax highlighting to root's .vim directory on sockpuppet
  • 01:17 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling centralnotice for all projects'
  • 01:12 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling dewiki for centralnotice'
  • 01:05 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Re-enabling cswiki for centralnotice'
  • 01:03 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up autoload fix'
  • 00:57 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up db class fix'
  • 00:51 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerListLoader.php 'Picking up db class fix'
  • 00:51 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up db class fix'
  • 00:44 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'picking up new cc form variants for thursday fundraiser'
  • 00:35 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialBannerListLoader.php 'Turning on debugging for CN'
  • 00:27 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning cs wiki centralnotice off'
  • 00:10 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning central notice back on for en wikiversity'

September 27

  • 23:38 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Turning central notice back on for cs wiki'
  • 23:03 logmsgbot: tfinc synchronizing Wikimedia installation... Revision: 73744
  • 20:23 Ryan_Lane: updated puppet configuration to require php version 5.2.4-2ubuntu5.12wm1 for all apaches
  • 20:21 Ryan_Lane: testing updated puppet config for php on srv212
  • 20:18 mark: Changed MIME type for geoiplookup.wikimedia.org in wikimedia.vcl (using puppet)
  • 20:17 Ryan_Lane: updated puppet configuration to update php on all apaches
  • 20:12 mark: Powercycled sq67
  • 20:06 Ryan_Lane: testing updated php packages deployed from puppet on srv 211
  • 20:05 Ryan_Lane: updating wikis on prototype
  • 19:36 Ryan_Lane: reinstalled wikimedia-task-appserver on srv210
  • 19:27 Ryan_Lane: testing updated php from repository on srv210
  • 18:45 Ryan_Lane: pushing updated php to the repository

September 26

  • 21:13 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'disabled wgFlaggedRevsComments on hewikisource'
  • 17:47 mark: Sending esams bits traffic to bits.pmtpa
  • 17:28 mark: Sending esams bits traffic to text.esams
  • 17:28 mark: Sending esams bits traffic to bits.esams
  • 17:18 RobH: aware of bits issues, looking into them
  • 07:40 apergos: (25/09/2010 10:17:43 UTC) restarted apache on mobile3. hope that holds us for awhile. lvs was depooling alternately mobile3 and then mobile2
  • 00:42 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Re-adding cc donate sidebar links'

September 25

  • 19:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 're-deployment'
  • 15:56 apergos: restarted apache on mobile2, it had fallen over
  • 11:41 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/api/ApiArticleAssessment.php 'r73744'

September 24

  • 23:19 Ryan_Lane: rebuilding ersch for open nebula testing
  • 22:53 rainman-sr: disk was full on searchidx1, cleared up some disc space by deleting some old backup indexes
  • 22:42 RobH: bayes back online
  • 22:40 Ryan_Lane: enabling https on owa1 and owa2
  • 22:39 Ryan_Lane: changing lvs scheduler on lvs1.tesla.usability.wikimedia.org to use wlc, since session stickiness is no longer needed
  • 22:28 Ryan_Lane: added egencia to email whitelist on mchenry
  • 22:15 RobH: updating firmware on bayes, leave it alone
  • 22:07 Ryan_Lane: starting testing of open nebula on alsted
  • 22:06 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Removing payments links for outage on loudon'
  • 21:32 JeLuF: restarted apache on srv211, was non-responsive.
  • 20:48 Ryan_Lane: installed php5 packages on test wikipedia, restarting apache
  • 20:41 Ryan_Lane: testing php5_5.2.4-2ubuntu5.12wm1 packages on test.wikipedia.org
  • 19:58 RobH: bayes is goin down for rack relocation, weeeeee
  • 18:58 Ryan_Lane: installing ganglia on loudon
  • 14:45 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 14:45 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/PrefSwitch.js 'r73655'
  • 14:44 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/ArticleAssessmentPilot.hooks.php 'r73658'
  • 14:44 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/ArticleAssessmentPilot.php 'r73658'
  • 14:44 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/js/ArticleAssessment.combined.min.js 'r73658'
  • 14:42 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Add settings for r73658; will work once that's deployed'
  • 14:08 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Temp fix for missing jQuery UI CSS in ArticleAssessment feedback dialog. Proper software-side fix underway'
  • 12:17 mark: Doing another reboot test of streber
  • 12:11 mark: Started lighttpd on streber
  • 12:10 mark: Ran apt-get upgrade && reboot on linne
  • 12:05 mark: Rebooting streber for kernel upgrade

September 23

  • 23:41 Ryan_Lane: upgraded drupal on prototype.wikimedia.org
  • 22:49 Ryan_Lane: started profile-collector on spence... properly this time
  • 22:38 Ryan_Lane: started profile-collector on spence
  • 22:24 Ryan_Lane: added cgi module to apache config on fenari so that report.py will run
  • 21:29 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Whitelisting banners for thursday test'
  • 21:12 Ryan_Lane: fixed apache configuration on noc. highlight.php is working again.
  • 20:57 tomaszf: ran sync-docroot to pick up new version of bannerImpression.php r46 fixing firefox mime type issues
  • 20:33 mark: Deployed ^/wiki/Special:Banner(Controller|ListLoader) exemption from header_replace Cache-Control on the text frontend squids (ACL wpstatic)
  • 20:25 logmsgbot: tfinc synchronized php-1.5/extensions/VariablePage/VariablePage.i18n.php 'Picking up fixes for tracking'
  • 20:24 logmsgbot: tfinc synchronized php-1.5/extensions/VariablePage/VariablePage.alias.php 'Picking up fixes for tracking'
  • 20:24 logmsgbot: tfinc synchronized php-1.5/extensions/VariablePage/VariablePage.body.php 'Picking up fixes for tracking'
  • 20:23 logmsgbot: tfinc synchronized php-1.5/extensions/VariablePage/VariablePage.php 'Picking up fixes for tracking'
  • 20:21 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding new config option in prep for variable page push'
  • 20:05 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25180 - upload disabled on zhwikinews'
  • 20:03 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25142 - Set $wgGroupPermissions['*']['createpage'] = true; zhwiki (Allowed IP createpage)'
  • 20:00 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25266 - Please set the Tatar Wiktionary logo'
  • 19:25 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '17627 - Allow autoconfirmed users to patrol on ar Wikisource'
  • 18:36 RobH: resetting the SP on following servers to fix the fact no LEDs are lit on the front panels, yet servers are up (known sun lom issue for us) db21,db23,db24,db29 all SPs back online and LEDs working properly
  • 16:55 RobH: ignore storage2 flapping, it is offline, trying to fix it enough to recover things from it.
  • 15:07 hcatlin: deploying mobile changes... UDP fix
  • 06:56 Tim: rebooting bayes, Erik Z doesn't seem to be running any scripts on it at the moment
  • 06:45 Tim: rebooting tarin, tridge and williams for kernel upgrades
  • 05:53 Tim: rebooting sockpuppet, stafford, spence for kernel upgrade

September 22

  • nimishg: created user 'file_mover' on locke and dataset1 for automating file movement between these machines
  • nimishg: added logging on locke to stress-test packet loss for 24 hrs
  • 23:51 logmsgbot: laner synchronized php-1.5/wmf-config/InitialiseSettings.php 'Changing wikispecies logo for bug 25026'
  • 18:24 Ryan_Lane: disabled firewall on alsted, commencing more openstack work
  • 18:18 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable ArticleAssessment on enwiki for Category:Article_Assessment_Pilot'
  • 18:17 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Kill dark launch hack'
  • 18:11 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 18:10 logmsgbot: catrope synchronized php-1.5/extensions/ArticleAssessmentPilot/js/ArticleAssessment.combined.min.js 'r73553'
  • 17:41 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php
  • 17:39 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php
  • 17:33 logmsgbot: root synchronized php-1.5/wmf-config/InitialiseSettings.php 'Adding Sequence namespace to commons per Erik and mdale's request'
  • 17:22 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Enable AA for me only on enwiki'
  • 17:18 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable ArticleAssessment on enwiki, but without a trigger category'
  • 17:10 RoanKattouw: Dark-launching ArticleAssessment+SimpleSurvey on enwiki, such that it's only visible to me
  • 16:51 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'bug 25243: Change Image: to File: in filerepo config to save HTTP redirects'
  • 15:48 RobH: added memcached roles to all apaches srv150+ that are not image scalers. Once puppet updates them all, they will be tested and moved into the spares group in mc.php
  • 15:06 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'removed ms1 from ES rotation to dump for TS'
  • 14:29 logmsgbot: catrope synchronizing Wikimedia installation... Revision: 73534
  • 14:28 RoanKattouw: Running scap to deploy updates to ArticleAssessment and SimpleSurvey. Still only enabled on testwiki for now
  • 14:26 logmsgbot: catrope synchronized php-1.5/includes/DefaultSettings.php 'r73514'
  • 14:25 logmsgbot: catrope synchronized php-1.5/includes/api/ApiMain.php 'r73514'
  • 11:53 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Reenable action=parse on Commons now ImageAnnotator is disabled'
  • 08:46 Tim: deploying shortened error messages (Wikimedia_en) on all squids, since they all seem to have the updated package now
  • 08:42 Tim: fixed RCS/permissions mess in /home/wikipedia/conf/squid
  • 08:23 logmsgbot: tstarling synchronized php-1.5/includes/api/ApiMain.php 'r73511'
  • 08:23 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'disable action=parse on commons'
  • 08:20 logmsgbot: tstarling synchronized php-1.5/includes/DefaultSettings.php 'r73511'
  • 01:43 Ryan_Lane: installed openstack components on alsted; firewalled off for now

September 21

  • 20:32 RoanKattouw: Applying PrefSwitch schema change (pss_user_text field) on enwiki
  • 20:01 RoanKattouw: Ran prefswitch.pss_user_text SQL patch on testwiki DB
  • 19:45 RoanKattouw: Applying SQL patch for ArticleAssessment to testwiki DB
  • 19:31 logmsgbot: catrope synchronizing Wikimedia installation... Revision: 73472
  • 19:30 RoanKattouw: Running scap to deploy ArticleAssessment and SimpleSurvey extensions. Not enabled anywhere yet
  • 19:16 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/PrefSwitch.classes.php 'r73471'
  • 19:14 JeLuF: fixed broken installation of wikimedia-task-appserver on fenari (missing package libapache2-mod-php5 was installed automatically)
  • 19:11 logmsgbot: catrope synchronized php-1.5/includes/api/ApiBase.php 'r73469'
  • 19:04 logmsgbot: catrope synchronized php-1.5/skins/vector/main-rtl.css 'Deploying r72939, which apparently never got done'
  • 19:02 logmsgbot: catrope synchronized php-1.5/skins/vector/main-ltr.css 'Deploying r72939, which apparently never got done'
  • 18:36 Ryan_Lane: changed eiximenis's ufw configuration to only forward port 9000 to port 80 if the destination address is eiximenis
  • 18:30 Ryan_Lane: changing firewall rules on eiximenis
  • 16:11 mark: Restarted sq32 and sq33
  • 15:39 logmsgbot: tstarling synchronized php-1.5/includes/api/ApiMain.php 'disabling action=parse'
  • 15:21 mark: restarted backend squid on sq31
  • 15:16 mark: Lowered cache_mem from 2500 to 2000 on sq31-33
  • 09:35 Tim: on grosley: tried to fix broken hudson-labs apt source, apt-get dist-upgrade
  • 09:31 Tim: apt-get dist-upgrade on sanger, bayes
  • 09:19 Tim: on mchenry: apt-get dist-upgrade, and chmod 755 /var/spool/exim4 so that NRPE works again
  • 09:03 Tim: configured kaulen to get its packages from the local mirror, and removed the o=Wikimedia pin to make it not use the PHP packages which were compiled on hardy and which conflict with everything
  • 08:28 Tim: apt-get dist-upgrade on brewster, ersch, erzurumi, formey
  • 08:21 Tim: on alsted: apt-get dist-upgrade
  • 07:13 Tim: on hume: apt-get dist-upgrade
  • 05:05 logmsgbot: tstarling synchronized php-1.5/wmf-config/secure.php 'removed downtime notice'
  • 04:58 Tim: on yongle: fixed broken sources.list, upgraded 187 packages
  • 04:54 Tim: upgraded packages on tarin, tridge, williams
  • 04:42 Tim: on streber: removed python-pysnmp2, conflicted with python-pysnmp4 and apparently wasn't being used
  • 04:28 Tim: upgraded packages on stafford, streber. python-pysnmp-common on streber is broken
  • 04:14 Tim: upgraded packages on singer
  • 04:07 Tim: upgrading packages on spence
  • 04:03 Tim: upgrading packages on sockpuppet
  • 03:53 logmsgbot: laner synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 03:53 Tim: scheduled reboot of singer at 05:00 using atd
  • 03:50 logmsgbot: tstarling synchronized php-1.5/wmf-config/secure.php 'secure.wikimedia.org scheduled reboot notice'
  • 03:46 logmsgbot: laner synchronizing Wikimedia installation... Revision: 72920
  • 03:22 Ryan_Lane: updated the kernel on oldusability.wikimedia.org linode server, and rebooted
  • 02:55 Tim: upgrading kernel on wikitech linode to latest 2.6 paravirt, will reboot
  • 02:45 logmsgbot: laner synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'removing debug info'
  • 02:44 logmsgbot: laner synchronized php-1.5/extensions/CentralNotice/SpecialBannerListLoader.php 'removing debug info'
  • 02:43 logmsgbot: laner synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'removing debug info'
  • 02:39 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'removing debug info'
  • 02:13 logmsgbot: tfinc synchronizing Wikimedia installation... Revision: 72920
  • 02:12 Tim: rebooting locke and sanger for kernel upgrade
  • 02:09 Tim: rebooting hume for kernel upgrade
  • 01:33 logmsgbot: tfinc ran sync-common-all
  • 01:12 Tim-away: fixed syntax error in InitialiseSettings.php, which caused the site to go down

September 20

  • 22:23 Ryan_Lane: finished kernel upgrades on tesla VMs
  • 20:54 Ryan_Lane: moved mysql data from /var/lib/mysql to /a/sqldata on prototype
  • 18:33 Ryan_Lane: Correction: previous ufw logs were for eiximenis
  • 18:33 Ryan_Lane: added ufw application configuration for etherpad. Added allow rules for OpenSSH and Etherpad in ufw config. Enabled ufw.
  • 18:32 Ryan_Lane: added ufw prerouting rules for 80->9000 for etherpad.
  • 18:32 Ryan_Lane: restarted etherpad on eiximenis
  • 17:46 Ryan_Lane: upgrading kernel on all tesla VMs
  • 16:30 Ryan_Lane: running database recovery on torrus on streber, and restarting the service
  • 14:29 Tim: rebooting mchenry for kernel upgrade
  • 13:39 Tim: rebooting ersch, erzurumi, formey and kaulen for new kernel
  • 12:34 Tim: rebooting brewster and eiximenis for new kernel
  • 12:25 Tim: updated browne /etc/motd
  • 08:49 Tim: rebooting alsted for new kernel
  • 07:35 Tim: rebooting fenari to get new kernel
  • 07:03 Tim: upgrading everything on fenari

September 17

  • 20:56 RobH: db7 & db8 no longer under warranty, replaced failed 73gb 10k with 146gb 10k drive on site. raid rebuilding on both
  • 20:35 RobH: locke no longer under warranty, replaced failed 73gb 10k with 146gb 10k drive on site. raid rebuilding, closing rt#114
  • 19:36 RobH: db28 coming down for fanboard replacement (not hot swappable since its the controller)
  • 19:30 RobH: all SP back online for search1-search12 reporting error free.
  • 19:25 RobH: resetting the SP on search1-search12 to make them forget the spare powersupply they really do not have.
  • 19:16 RobH: full power reset cleared drac errors, noted for followup, system back online.
  • 19:07 RobH: srv206 has all kinds of errors, working on it, ignore nagios flaps.
  • 18:49 RobH: db34 online and ready to be setup into database deployments.
  • 00:15 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Fixing placement of VariablePage extension'

September 16

  • 21:39 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'adding enwiki to variable page extension'
  • 21:38 Ryan_Lane: fixed puppet issue with ganglia on memcache servers (a bad puppet file had previously been pushed)
  • 21:35 logmsgbot: tfinc ran sync-common-all
  • 21:27 logmsgbot: tfinc synchronized php-1.5/extensions/VariablePage/VariablePage.i18n.php 'Adding variable page extension for donate link'
  • 17:36 Ryan_Lane: took sq33 out of lvs rotation
  • 16:02 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25142: Redisable anon page creation, there was no consensus'
  • 14:23 RobH: sq60,sq73,sq75 coming back into service
  • 14:13 RobH: fixed sq33, working on backend on sq60,sq73,sq75, all having cache cleaned
  • 13:54 RobH: taking another look at sq33, pulled from service
  • 13:49 RobH: working on srv206
  • 13:46 RobH: sq73 & sq75 up
  • 13:35 RobH: sq38 & sq60 back in service
  • 13:29 RobH: rebooted sq38 & sq60 to bring them back online
  • 12:45 mark: Powercycled amssq52
  • 12:37 mark: knsq3 declared dead
  • 11:43 mark: Inserted extra switch fabric and spare 4x 10G line card modules into csw1-esams

September 15

  • 21:16 Ryan_Lane: doing initial repo copies from mayflower to formey
  • 21:16 Ryan_Lane: added svn users to formey.wikimedia.org
  • 18:43 mark: Moving back bits.esams traffic
  • 18:21 mark: Moving traffic for bits.esams to bits.pmtpa
  • 18:08 mark: Restarted varnish on knsq5
  • 18:07 mark: Changed LVS scheduler from 'wlc' to 'wrr' for bits on amslvs1
  • 18:02 mark: Restarting pybal on amslvs1, with proxyfetch disabled for varnish/bits
  • 17:47 mark: Restarted varnish servers on knsq2, 4 and 5
  • 17:37 mark: temporarily disabled the proxyfetch monitor on amslvs1 to stabilize
  • 17:06 mark: Restarted pdns instances on dobson
  • 16:59 mark: Shutdown powerdns (auth) and pdns-recursor on dobson, preparing for reinstall
  • 16:19 mark: Temporarily making ns1 the DNS master, to rebuild dobson
  • 15:52 mark: Fixed regular expressions in puppet site.pp
  • 14:48 mark: ms2 upgrade to Lucid is complete
  • 14:47 mark: Replaced apparmor profile on ms2 by newer one from ms1, restarted replication
  • 13:25 mark: Starting ms2 upgrade to Lucid
  • 01:09 Ryan_Lane: finished rebuilding ersch
  • 00:58 Ryan_Lane: rebuilding ersch
  • 00:57 Ryan_Lane: finished rebuilding alsted
  • 00:39 Ryan_Lane: rebuilding alsted
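
Note on the 18:07 scheduler change above: 'wrr' (weighted round-robin) hands out connections in a fixed weighted rotation, while 'wlc' (weighted least-connection) prefers the server with the fewest active connections relative to its weight. The following is a simplified illustration of the two selection rules, not the actual IPVS implementation; the server names, weights and connection counts are hypothetical.

```python
from itertools import cycle, islice


def wrr_order(weights):
    """Return an endless iterator over servers in a simple weighted rotation.

    A server with weight 2 appears twice per round, and so on.
    Connection counts play no role in the choice.
    """
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def wlc_pick(weights, active):
    """Pick the server with the lowest active-connections-to-weight ratio."""
    candidates = [name for name, w in weights.items() if w > 0]
    return min(candidates, key=lambda name: active.get(name, 0) / weights[name])


if __name__ == "__main__":
    weights = {"cache-a": 2, "cache-b": 1}   # hypothetical pool
    print(list(islice(wrr_order(weights), 6)))
    print(wlc_pick(weights, {"cache-a": 10, "cache-b": 2}))
```

One visible difference: under the least-connection rule, a server that has just restarted and holds no connections is chosen for every new connection until it catches up, whereas the round-robin rule keeps the rotation regardless of load.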

September 14

  • 22:55 awjr: Rolled back PHP 5.3.2 to PHP 5.2.10 on payments.wikimedia.org due to outstanding PHP bug 50394
  • 20:58 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'disabled $wgFlaggedRevsComments on huwiki (unused)'
  • 20:06 awjr: restarting payments.wikimedia.org post ubuntu 10.04 upgrade
  • 19:41 awjr: performing upgrade from ubuntu 8.04->10.04 on payments.wikimedia.org
  • 19:06 mark: Restarted pdns on dobson/ns0
  • 19:05 mark: Fixing wikimedia.org MX records in DNS
  • 18:59 RobH: mail server issues are being investigated.
  • 18:48 Ryan_Lane: removed public IP address to test-payments
  • 18:45 awjr: Restarting test-payments post upgrade
  • 18:40 Ryan_Lane: added public IP address to test-payments
  • 18:37 awjr: Performing Ubuntu upgrade from 8.04 -> 10.04 on test-payments on tesla
  • 18:37 RobH: db34 is running memtest86, relocating the memory moved it, who knows if its due to reseating or relocation? hence memtest is in progress.
  • 18:07 RobH: db34 is truly the spawn of the devil. after two trips back to repair, it now appears not only did it need a new mainboard, new cpu, new mainboard again for good measure, new power supply logic board, it also now has two dimm errors. pulling it down to migrate the memory around and confirm its the dimms.
  • 18:00 RobH: db34 system board replaced, and also bad cpu replaced, back in rack and installing os.
  • 17:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25128 - Books namespace for fiwiki'
  • 17:53 Ryan_Lane: delegated corp.wikimedia.org to ns1.corp.wikimedia.org and ns2.corp.wikimedia.org
  • 17:49 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25160 - Please create an "autopatrolled" usergroup at metawiki'
  • 17:46 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25142 - Set $wgGroupPermissions['*']['createpage'] = true; zhwiki (Allowed IP createpage)'
  • 17:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25166 - patroller on Fawiki'
  • 17:39 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 17:36 JeLuF: closed swwikibooks (bug 25170)
  • 17:36 logmsgbot: jeluf ran sync-common-all
  • 17:10 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25125 - Please update logo for Kurdish Wikipedia'
  • 11:43 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'PagedTiffHandler'
  • 11:42 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'enabling PagedTiffHandler on all wikis instead of just commons'
  • 11:41 logmsgbot: fixed logmsgbot UDP listener
  • 09:58 rainman-sr: search10-19 show down in ganglia and nagios with message "Backend Squid HTTP", although they are alive and well and not squids. Could someone look into this?
  • 02:37 Tim: on locke: filtered mobile packets out of packet-loss input

September 13

  • 20:32 Ryan_Lane: rebuilt formey as 10.04
  • 20:30 RobH: disregard that I cannot spell disregard.
  • 20:30 RobH: disreagrd formey popping up and down, Ryan_Lane is working on it.
  • 18:17 RoanKattouw: logmsgbot on IRC but broken
  • 18:17 RoanKattouw: Synced CommonSettings.php to bump style version appendix
  • 18:17 RoanKattouw: Synced extensions/UsabilityInitiative/WikiEditor/WikiEditor.combined.min.js for r72920
  • 14:33 RoanKattouw: Started logmsgbot as catrope. Filed RT #217 for cleanup of logmsgbot processes
  • 14:28 RoanKattouw: Synced CommonSettings.php for bug 25106 (allow .sla uploads on outreachwiki)
  • 14:27 RoanKattouw: logmsgbot MIA
  • 11:14 hcatlin: deploying a "perm disable" bugfix for m.
  • 11:06 hcatlin: deployed m. language update
  • 07:31 Tim: deploying 72894

September 12

  • 07:43 domas: reinstalling db20

September 11

  • 15:21 RobH: seems srv224 has been stuck on conversions and required manual restart of apache before.
  • 15:20 RobH: srv224 apache stuck on some conversion, wont restart, offlined in lvs but is now back online
  • 02:53 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'removed redundant settings'

September 10

  • 23:51 RobH: bringing back online srv154, it should start up, run puppet, and put itself back into service
  • 23:50 RobH: hey look, the nagios-wm change worked, it logs to both channels, all is well.
  • 23:46 RobH: srv154 may flap, its intentional, i assure you
  • 23:43 RobH: nagios-wm reports in multiple channels, woot
  • 23:21 RobH: tired of messing with nagios bot, its back online
  • 23:13 RobH: trying to bounce the nagios bot to make it report in different channels.
  • 20:23 Ryan_Lane: manually modified /etc/gmond.conf on srv218 temporarily to use an actually existing directory for ganglia plugin configuration
  • 20:22 Ryan_Lane: manually added memcache ganglia plugin to srv218 temporarily for testing before pushing to all memcache servers
  • 20:21 Ryan_Lane: added patched python-memcache package to srv218 for memcache ganglia plugin testing
  • 20:21 Ryan_Lane: created test-payments.tesla.usability.wikimedia.org for payment processing testing (no public IP)
  • 19:06 RobH: mobile2 back online and in lvs pool, mobile pool optimal (as optimal as a 3 server cluster can be)
  • 19:01 RobH: mobile2 unresponsive to ssh, shows partially up in lvs4, wont respond to serial console, rebooting it
  • 18:59 RobH: mobile3 back online.
  • 18:59 RobH: restarted apache/memcached on mobile3
  • 16:17 RobH: db20 back online, but has one power supply while other is replaced
  • 15:38 RobH: db20 all kinds of messed up, ignore nagios flaps.
  • 15:38 RobH: db16 running memory tests until Monday, leave it alone

September 9

  • 21:32 RobH: knsq21, knsq22 online, rt#5 resolved
  • 20:31 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding thursday banners to whitelist for contrib stats'
  • 20:17 RobH: knsq21 & knsq22 coming down for reinstallation, set to false in pybal
  • 20:14 RobH: knsq17-knsq20 reinstalled and back in service
  • 19:53 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23081 - Remove "makesysop" right from bureaucrats and stewards groups'
  • 19:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25011 - Need Namespace name change in tamil wikinews'
  • 19:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24865 - [[special'
  • 19:35 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23186 - Enable upload in portuguese Wikipedia'
  • 19:32 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24852 - New namespace for WikiProject on jawiki'
  • 19:16 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25022 - Add NewUserMessage extension on ru.wikiversity'
  • 18:47 RobH: test test
  • 18:47 RobH: test test
  • 17:44 RobH: knsq14, knsq15 both back online and in lvs pool, still working on knsq16, knsq17
  • 17:00 RobH: knsq14-knsq17 reinstalled, getting added and updated by puppet, then they will be put back into service
  • 16:24 RobH: knsq12 back in lvs pool, set knsq14-17 to false for reinstallation
  • 16:11 hcatlin: deployed XSS fix to m.wiki. thanks to gcouprie and tstarling.
  • 15:52 RobH: holding off on that, knsq12 is not up and online, fixing.
  • 15:51 RobH: knsq8 in cluster, setting knsq14-knsq17 to false, then taking them down for reinstallation
  • 15:49 RobH: knsq8 fixed. setting to true in lvs config
  • 15:45 RobH: knsq8 being fixed and pushed into service, then resuming reinstallations for rt#5 knsq14-knsq22
  • 13:44 Tim: edited "permanently disable" link out of live m.wikipedia.org code to fix XSS
  • 00:45 ^demon: blocked empty commit summaries in subversion, bug 25025

September 8

  • 21:57 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24849 - Enable the Collection extension for creating books on the Finnish Wikipedia [fiwiki]'
  • 21:52 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24936 - Localizing "Portal" namespace in si.wikipedia'
  • 21:48 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24929 - Make Author namespace for srwikisource'
  • 21:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24825 - Create "Livro'
  • 21:39 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24576 - New namespaces for ml.wikisource'
  • 21:28 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24394 - Install AbuseFilter on Hindi Wikipedia'
  • 21:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25103 - Create "Pielikums" namespace on lv.wiktionary'
  • 21:18 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24852 - New namespace for WikiProject on jawiki'
  • 21:10 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '25082 - Grant abusefilter-log-detail to only sysop and abusefilter group on jawiki '
  • 18:08 domas: testing testing
  • 18:07 RobH: domas added support for multiple channels into morebots and its now back, yayyyyy
  • 17:54 RobH: killing morebots to restart it, i may end up breaking it completely and having to revert my changes
  • 13:34 hcatlin: m. deployment complete and everything is looking good
  • 13:33 hcatlin: deploying TStarling's UDP patch for mobile
  • 05:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'Remove BOM'
  • 05:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '24394 - Install AbuseFilter on Hindi Wikipedia'
  • 05:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '25082 - Grant abusefilter-log-detail to only sysop and abusefilter group on jawiki '
  • 05:29 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '25051 - Localize namespaces on newiki'

September 7

  • 21:55 RobH: sq69, sq70 both back online and in lvs pool
  • 21:51 Ryan_Lane: repooled sq68 in pybal bits config
  • 21:49 mark: Powercycled sq68
  • 21:48 RobH: rebooting sq69, sq70
  • 21:41 Ryan_Lane: set sq68 to false in pybal's bits configuration
  • 19:56 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesFrr.php 'r72545'
  • 19:46 apergos: restarted en wiki full history (with prefetch this time).
  • 19:29 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'adding in srv258-289 into memcached spares, all test online'
  • 18:47 Ryan_Lane: split pdf1-3 and mobile1-3 away from misc cluster in ganglia; made two new clusters (pdf and mobile)
  • 18:33 apergos: starting xml full history dumps stage (parallelized) for en wiki on snapshot2 in screen session from /backups-testing
  • 17:38 Ryan_Lane: changed password for Utilisateur:Bot de paille on frwiki
  • 14:37 mark: Powercycled mobile1
  • 05:13 apergos: started meta-current phase of xml dumps for en wiki on snapshot2 in screen session
  • 03:39 Tim: installed packet loss monitor on locke

September 6

  • 20:23 apergos: started parallelized pages-articles xml dump for en wiki on snapshot2 in screen session.
  • 11:23 apergos1: started parallel stubs dump for en wiki on snapshot2 in screen session
  • 05:49 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/i18n/Lqt.namespaces.php
  • 05:49 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.namespaces.php
  • 05:35 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 72153
  • 05:32 Andrew: Running l10nupdate, scap and installing LiquidThreads on huwiki

September 5

  • 22:53 domas: why is srv193 not in prod?
  • 22:52 logmsgbot: midom synchronized php-1.5/wmf-config/mc.php 'srv123 was not replaced mc.php, despite previous SAL messages'
  • 21:04 logmsgbot: midom synchronized php-1.5/wmf-config/InitialiseSettings.php 'rolling back logo for axis of eee... nevermind'
  • 04:46 apergos: started the abstracts dump, it works out to 11 jobs in parallel. (screen session on snapshot2) if this seems to bog down the db servers too much, someone can shoot it

September 4

  • 23:13 apergos: starting enwiki xml dump run one stage at a time, with parallelization of later stages. running table dumps now from snapshot2 screen session, using /backups-testing directory as work area
  • 07:16 apergos: tried restarting lighty with tcp6 enabled on dataset1 now that tcp6 connections are closed... no dice. back to ipv4 only
  • 00:34 Ryan_Lane: added initial ldap configuration to puppet.

September 3

  • 23:44 Ryan_Lane: added philippe to the wmffundraising alias
  • 18:56 RobH: srv123 pulled, wiping. also all memcached in updated pool test fine.
  • 18:56 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'removing srv123 from memcached due to decommissioning'
  • 18:38 RobH: rt#54 db16 seems to not be throwing errors... installs ok, passed all VTS testing and memtesting. reinstalled the OS
  • 11:45 tomaszf: leaving lighttpd in non ipv6 mode till currently open sockets clear
  • 10:39 tomaszf: disabling ipv6 for lighttpd on dataset1 as its not working post upgrade
  • 10:20 tomaszf: upgrading to ubuntu 10.04 LTS on dataset1
  • 01:44 Ryan_Lane: added pre-commit and post-commit hooks to puppet's svn to check syntax on pre-commit, and svn up on post-commit
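
The pre-commit check from the 01:44 entry can be sketched roughly as follows. This is an illustration only, not the hook installed on the puppet repository: the function names, the `.pp` filter, and the injected `check` callable are hypothetical. A real hook would obtain the changed paths and file contents from `svnlook` and hand them to a Puppet syntax checker.

```python
def manifests(changed_paths):
    """Keep only Puppet manifests (.pp) from the list of changed paths."""
    return [p for p in changed_paths if p.endswith(".pp")]


def pre_commit(changed_paths, check):
    """Return (path, error) pairs; an empty list means the commit may proceed.

    `check` is a caller-supplied function (hypothetical here) mapping a path
    to an error message, or to None when the syntax is valid.
    """
    failures = []
    for path in manifests(changed_paths):
        error = check(path)
        if error is not None:
            failures.append((path, error))
    return failures


# Stand-in checker for demonstration; it flags one made-up file.
def fake_check(path):
    return "syntax error" if path == "manifests/broken.pp" else None


result = pre_commit(
    ["manifests/site.pp", "manifests/broken.pp", "files/motd"], fake_check
)
print(result)
```

Subversion rejects a commit when its pre-commit hook exits with a non-zero status, so a wrapper around this sketch would exit non-zero whenever the returned list is non-empty.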

September 2

  • 23:09 Ryan_Lane: restarted mysql on db10
  • 23:09 Ryan_Lane: set expire_logs_days to 14 on db10
  • 23:04 Ryan_Lane: fixed password issue for debian-sys-maint on db10
  • 22:56 Ryan_Lane: commented out skip_slave_start in the my.cnf on db10
  • 22:54 Ryan_Lane: restarted db10 (specifically the mysql database)
  • 22:54 Ryan_Lane: restarted db10
  • 19:38 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'adding new thursday test banner to whitelist'
  • 16:53 RobH: reinstalling storage2
  • 16:09 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'Attempt to fix DB error'
  • 16:06 RobH: db16 is running VTS, DO NOT TOUCH THIS SERVER
  • 15:51 RobH: db34 unracked, going back for repair with Oracle Field Engineer.
  • 15:02 RobH: ignore errors from db16, db28, db34. Today is sun server troubleshooting day.
  • 14:58 RobH: db28 is not in rotation, taking it down to continue hardware testing/repair
  • 14:25 RobH: srv206 back online and in cluster.
  • 14:25 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'moved srv206 from down to spare'
  • 13:49 RobH: srv206 offline, looking into it.

September 1

  • 22:42 mark: Shutdown zwinger
  • 22:10 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'Fix fatal'
  • 21:58 Ryan_Lane: created article assessment wiki on prototype
  • 21:31 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'Attempted fix for skin overwriting problem'
  • 21:19 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'Attempted fix for DB issue'
  • 20:56 Ryan_Lane: removed unneeded files from /var/tmp on prototype to free up space
  • 20:43 Ryan_Lane: svn upped bugzilla/skins/custom/buglist.css and bugzilla/skins/custom/show_bug.css
  • 20:42 Ryan_Lane: added an older version of PatchReader to kaulen for bugzilla
  • 20:37 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeText.php 'Picking up changes from r72135'
  • 20:37 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeLocal.php 'Picking up changes from r72135'
  • 20:36 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'Picking up changes from r72135'
  • 20:36 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/rebuildLocalTemplates.php 'Picking up changes from r72135'
  • 20:36 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/rebuildTemplates.php 'Picking up changes from r72135'
  • 20:35 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/NoticePage.php 'Picking up changes from r72135'
  • 19:07 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'Fix fatal in global opt-out'
  • 18:56 RoanKattouw: Switchover is all done, everyone's free to touch stuff again as far as I'm concerned
  • 18:55 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Vector switchover, thumb size change. enable global opt-out'
  • 18:54 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Vector switchover, thumb size change. enable global opt-out'
  • 18:54 RoanKattouw: Deploying config for Vector switchover now
  • 18:49 RobH: all new api servers mgmt accessible, still need OS installation.
  • 18:48 RobH: pushing dns change, there was a typo in the zonefile for srv290 mgmt
  • 18:37 RoanKattouw: Put config update for Vector switch in wmf-config/ and synced to test. DO NOT SYNC THIS until I confirm it's working
  • 18:24 logmsgbot: catrope synchronized php-1.5/includes/UserRightsProxy.php 'r72132'
  • 18:22 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/SpecialPrefSwitch.php 'r72127'
  • 18:21 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/PrefSwitch.i18n.php 'r72127'
  • 18:21 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/PrefSwitch/PrefSwitch.php 'r72127'
  • 18:20 logmsgbot: catrope synchronized php-1.5/skins/Vector.php 'r72127'
  • 18:20 RoanKattouw: Deploying r72127 and r72132 with individual syncs
  • 18:03 RoanKattouw: Setting default skin to Vector on testwiki
  • 17:53 RoanKattouw: Testing r72127 on testwiki
  • 17:46 RobH: srv280-srv289 back online. all power balancing in b5-sdtpa complete.
  • 17:22 RobH: srv270-srv279 back online, srv280-srv289 is next (and the last batch)
  • 17:06 RobH: also taking down srv279 in last batch
  • 17:03 RobH: srv264-srv269 back online, now bringing down srv270-srv278
  • 16:49 RobH: morebots needs better parsing to read my mind.
  • 16:49 RobH: srv258-srv263 back online. Now bringing down srv264-srv269
  • 16:31 RobH: srv258-srv263 are being shutdown to have their power rebalanced within the rack. They will be back online shortly.

August 31

  • 23:40 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeTemplate.php 'Picking up all changes in r72057'
  • 23:40 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'Picking up all changes in r72057'
  • 23:39 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.i18n.php 'Picking up all changes in r72057'
  • 23:39 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.db.php 'Picking up all changes in r72057'
  • 23:04 Ryan_Lane: added wikidev group to mchenry and sanger
  • 22:02 mark: Puppetised jjones account with new ssh key
  • 22:01 Ryan_Lane: modified ssh configuration on cluster to disable password login, except while inside of the cluster.
  • 20:14 logmsgbot: catrope synchronizing Wikimedia installation... Revision: 72043
  • 20:14 RoanKattouw: Running scap to deploy Siebrand's recent commits to 1.16wmf4 (messages files and Names.php)
  • 19:37 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24902 - tawikinews'
  • 19:13 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24957 - enable editinterface user group on hindi wiki and grant bureacrats to add/remove this group'
  • 19:11 Ryan_Lane: added review.tesla.usability.wikimedia.org to forward and reverse DNS
  • 18:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24882 - Make NS aliases in Korean Wikinews.'
  • 17:35 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 're-enabling MobileRedirect.js'
  • 17:34 logmsgbot: tstarling synchronized php-1.5/extensions/WikimediaMobile/MobileRedirect.js 're-enabling'
  • 17:15 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'bumping mobile JS version'
  • 17:12 Tim: disabling mobile redirect since it's been down for a long time now
  • 17:11 logmsgbot: tstarling synchronized php-1.5/extensions/WikimediaMobile/MobileRedirect.js
  • 15:21 Tim: fixed mobile1 and mobile2 temporarily with killall -9 -u deploy && /etc/init.d/apache2 start
  • 15:01 hcatlin: rolled back the latest m. code release because of catastrophic cluster failure. hoping for recovery.
  • 14:52 Tim: mobile mostly down, restored hcatlin's sudo access on mobile1
  • 09:20 logmsgbot: root synchronizing Wikimedia installation... Revision: 71885
  • 09:19 logmsgbot: tstarling synchronizing Wikimedia installation... Revision: 71885
  • 09:11 Tim: added --ignore-externals to /h/w/b/l10nupdate to avoid SSL certificate warning when it attempts to update geshi
  • 09:07 Tim: running LocalisationUpdate/update.php
  • 04:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24956 - enable flood flag user on hindi wiki and grant bureacrats to add or remove this group'
  • 04:52 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php 'Bug 24842 - Enable flagged revisions in category namespace on Hungarian Wikipedia'
  • 04:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24957 - enable editinterface user group on hindi wiki and grant bureacrats to add/remove this group'
  • 04:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23186 - Enable upload in portuguese Wikipedia'

August 30

  • 23:57 mark: Changed root/enable passwords on mr1-pmtpa, msw1-sdtpa, msw1-pmtpa, asw3-pmtpa(asw-b3-pmtpa)
  • 23:48 mark: Changed root/enable passwords on mr1-pmtpa
  • 23:47 mark: Changed root/enable passwords on asw-a2-pmtpa, asw-b1-pmtpa, asw-b2-pmtpa, asw-b4-pmtpa, asw-c3-pmtpa, asw-c4-pmtpa
  • 23:40 mark: Changed root/enable passwords on asw-b-sdtpa, asw-a4-sdtpa, asw-a5-sdtpa
  • 23:33 mark: Changed root/enable passwords on csw1-esams, csw2-esams, br1-knams
  • 23:19 mark: Changed root/enable passwords on csw1-sdtpa and csw5-pmtpa
  • 20:47 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Enabling jQuery on every page view'
  • 20:47 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling jQuery on every page view'
  • 20:38 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24947 - ReaderFeedback Extension requested for Turkish news'
  • 20:21 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '[Bug 24942] Disable file uploading for the Hungarian Wiktionary'
  • 19:51 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'hiwki flaggedrevs'
  • 19:45 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php 'setting hiwiki flaggedrevs reviewer group change assign from sysop to crat'
  • 19:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24942 - Disable file uploading for the Hungarian Wiktionary'
  • 19:40 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24846 - Activation of the 'autopatrolled' group on nn WP'
  • 19:35 logmsgbot: jeluf synchronized closed.dblist
  • 19:34 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php 'setting hiwiki flaggedrevs to same as enwiki'
  • 19:27 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24458 - Enable subpages on frwikisource'
  • 19:25 Ryan_Lane: added owa.tesla.usability.wikimedia.org to forward and reverse dns
  • 18:52 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php
  • 18:47 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php
  • 17:39 logmsgbot: robh synchronized php-1.5/wmf-config/flaggedrevs.php
  • 17:28 RobH: pushed changes for bug 24622
  • 17:28 logmsgbot: robh ran sync-common-all
  • 05:14 logmsgbot: tstarling synchronized php-1.5/extensions/PagedTiffHandler/PagedTiffHandler_body.php 'r71927'
  • 04:42 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 're-enabling PagedTiffHandler'
  • 02:56 logmsgbot: tstarling synchronized php-1.5/extensions/PagedTiffHandler/PagedTiffHandler_body.php 'r71924'

August 28

  • 14:35 mark: Restarted pdns on ns2
  • 12:14 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 71783
  • 12:14 Andrew: Finished testing AbuseFilter log entry deletion, rolling out to all wikis with scap
  • 12:04 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Testing abusefilter rights on testwiki'
  • 11:57 Andrew: Setting up for rollout of abusefilter log entry deletion. Will be assigned to the oversight group
  • 11:40 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'LQT for sewikimedia and svwikisource'
  • 11:32 Andrew: deploying LiquidThreads to svwikimedia and svwikisource
  • 01:14 brion: reset list admins & password for WikiPT per IRC request due to old admin being inactive. Set to the admins from WikimediaPT.

August 27

  • 21:18 logmsgbot: jeluf synchronized langlist 'Bug 16077 - Remove nomcom from langlist'
  • 21:14 logmsgbot: jeluf synchronized php-1.5/languages/messages/MessagesFrr.php 'Bug 24663 - linktrail, merge changes from TRUNK'
  • 20:45 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24877 - Enable File Uploading in Korean Wikinews'
  • 20:41 logmsgbot: jeluf synchronized php-1.5/cache/interwiki.cdb 'Updating interwiki cache'
  • 20:35 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24878 - Change Korean Wikinews Logo'
  • 20:34 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24791 - Enable subpage for "Templat" namespace in id.wikisource'
  • 20:30 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24867 - Set logo ckb wiki'
  • 20:28 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24930 - Change the logo image in Sinhala Wikipedia'
  • 20:26 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24958 - Update default old en-logo of frrwiki to new localized logo version'
  • 20:21 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24933 - Allow bureacrats to remove sysop on hindi wiki'
  • 18:15 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24663 - Create North Frisian Wikipedia (wp/frr)'
  • 15:19 RoanKattouw: convertUserOptions.php runs finished on s2 and s3 (s4-6 finished yesterday after ~5 hrs), s1 still running
  • 12:21 apergos: image uploads were broken since the PagedTiffHandler ext was enabled. Part of its image checking logic was being applied to all images, not just tifs. Disabled for now, uploads working again
  • 12:19 logmsgbot: ariel synchronized php-1.5/wmf-config/CommonSettings.php 'disabling PagedTiffHandler pending fixes'
  • 09:00 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php '$wgTiffUseTiffinfo=true'
  • 08:53 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'enabling PagedTiffHandler'
  • 08:46 logmsgbot: root synchronizing Wikimedia installation... Revision: 71783
  • 08:46 logmsgbot: tstarling synchronizing Wikimedia installation... Revision: 71783
  • 08:45 Tim: deploying r71783: latest PagedTiffHandler changes
  • 08:24 Tim: added "umask 002" to tomasz's .bashrc, so that he doesn't break the subversion working copy
  • 05:57 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding landing page five'
  • 00:19 logmsgbot: tfinc synchronized php-1.5/includes/Pager.php 'Picking up fixes in r71098'
  • 00:18 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/TemplatePager.php 'Picking up fixes in r71739'

August 26

  • 21:51 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding new banners for thursday test'
  • 21:16 Ryan_Lane: rebooting ci.tesla
  • 20:44 Ryan_Lane: redoing the backend IP scheme for the tesla nodes. Shouldn't affect anything unless you are logged into a node through the backend.
  • 20:42 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'picking up GMT fix for CN'
  • 19:50 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.i18n.php 'CN Phase1 Deployment'
  • 19:49 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/rebuildTemplates.php 'CN Phase1 Deployment'
  • 19:47 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/centralnotice.js 'CN Phase1 Deployment'
  • 19:47 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/centralnotice.css 'CN Phase1 Deployment'
  • 19:46 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/TemplatePager.php 'CN Phase1 Deployment'
  • 19:46 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeText.php 'CN Phase1 Deployment'
  • 19:46 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeTemplate.php 'CN Phase1 Deployment'
  • 19:45 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeLocal.php 'CN Phase1 Deployment'
  • 19:45 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialCentralNotice.php 'CN Phase1 Deployment'
  • 19:44 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/NoticePage.php 'CN Phase1 Deployment'
  • 19:44 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'CN Phase1 Deployment'
  • 19:43 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/CentralNotice.db.php 'CN Phase1 Deployment'
  • 19:18 Ryan_Lane: reconfiguring tesla's backend networks. Selenium grid nodes may be down periodically
  • 18:57 Fred: rebooting pdf1. Hard down.
  • 18:45 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 18:44 logmsgbot: catrope synchronized php-1.5/skins/common/wikibits.js 'r71723'
  • 18:43 RoanKattouw: Commented out srv217 in mediawiki-installation node list cause it's down
  • 16:28 RoanKattouw: Starting six instances of convertUserOptions.php, one for each DB cluster, on hume
  • 11:17 mark: Discarding mail to bugzilla-daemon@wikimedia.org on mchenry

August 25

  • 23:30 RoanKattouw: Running convertUserOptions.php on eswiki, metawiki and enwikinews as a test run
  • 16:28 RoanKattouw: Starting import of another batch of geograph images: 1.1GB in 9,665 files
  • 06:15 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'fawiki milestone logo attempt 3'
  • 06:13 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'fawiki milestone logo attempt 2'
  • 06:02 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'reverting fawiki change, source image deleted'
  • 05:52 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'milestone logo for fawiki'

August 24

  • 19:18 Fred: puppetized Ryan Lane's account and privileges.
  • 18:53 Fred: mobile3 has been deployed and is now in rotation.
  • 18:01 Fred: adding mobile3.w.o to DNS
  • 17:43 Fred: starting deployment of Mobile3
  • 17:39 RobH: racked mobile3, handing off to fred to install
  • 15:34 mark: Reran Nagios sync script to get MySQL monitoring back
  • 13:07 mark: Removed unused IPs in the squid+lvs and public services subnets from DNS
  • 08:51 logmsgbot: tstarling synchronized php-1.5/includes/media/Bitmap.php 'r71547'
  • 08:19 logmsgbot: tstarling synchronized php-1.5/includes/media/Bitmap.php 'live patch to chain -resize with -thumbnail to fix bug 24824'

August 23

  • 13:19 Tim: on ms4: fastcgi re-enabled
  • 13:06 mark: Finished removing all 2009 daily zfs snapshots on ms4
  • 12:08 Tim: on ms4: disabling fastcgi to try to get basic cached service
  • 12:03 Tim: on ms4: increasing max thread count to 256
  • 12:00 mark: Removing all the 2009-10 daily zfs snapshots on ms4
  • 11:46 mark: Removing all the 2009-09 daily zfs snapshots on ms4
  • 11:36 mark: Removing all the 2009-08 daily zfs snapshots on ms4
  • 11:33 mark: Removed oldest daily thumbs zfs snapshot on ms4
  • 10:43 Tim: on ms4: restarting webserver7 with fcgi re-enabled, reduced thread pool count to 4
  • 10:42 mark: Added swap2 to /etc/vfstab on ms4
  • 10:36 mark: zfs create -V 4gb rpool/swap2; swap -a /dev/zvol/dsk/rpool/swap2 (on ms4)
  • 10:11 Tim: trying to start webserver7 on ms4, will see if it crashes it again
  • 09:56 mark: svcadm disable puppetd on ms4
  • 09:11 Tim: ms4 back up, after some mucking around with /etc/vfstab
  • 08:48 Tim: ms4 timeout on http, squid serving "cannot forward", will reboot
  • 08:44 Tim: ms4 not responding to ssh, giving "stub start error" on http, trying serial console, very slow
  • 03:05 domas: 'ps -ef | grep php-cgi | awk '$3==1 { print $2 }' | xargs kill; rm /tmp/https-ms4-5351d5c9/stub.pid' to recover from ms4 fastcgi death, not sure what are the causes yet
  • 00:50 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'oops, wrong cluster'
  • 00:49 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
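
The 03:05 recovery one-liner above selects php-cgi processes whose parent PID is 1 (that is, processes orphaned after the FastCGI stub died) and kills them. A minimal Python sketch of the selection step only, assuming the `ps -ef` column order UID, PID, PPID; the function name and the sample output are hypothetical and not taken from ms4:

```python
def orphaned_pids(ps_output, pattern="php-cgi"):
    """Return PIDs of lines matching `pattern` whose parent PID is 1.

    Mirrors `grep pattern | awk '$3==1 { print $2 }'`: the pattern is
    matched against the whole line, column 3 is the PPID, column 2 the PID.
    """
    pids = []
    for line in ps_output.splitlines():
        if pattern not in line:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "1":
            pids.append(int(fields[1]))
    return pids


# Hypothetical sample output for illustration.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
webservd  2101     1  0 02:10 ?        00:00:03 /usr/bin/php-cgi
webservd  2102  2050  0 02:10 ?        00:00:01 /usr/bin/php-cgi
root      2200     1  0 02:11 ?        00:00:00 /usr/sbin/sshd
"""

print(orphaned_pids(SAMPLE))  # prints [2101]
```

Only the first php-cgi line is selected: the second still has a live parent, and sshd does not match the pattern. The sketch deliberately stops short of killing anything or removing the stale stub.pid file.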

August 22

  • 20:55 apergos: should probably debug the ms4 issue but I will sleep soon. yet another restart.
  • 18:25 apergos: restarted webserver7 on ms4
  • 11:16 apergos: reports that new thumbnails are not being generated on demand. See http://commons.wikimedia.org/w/index.php?title=Special:NewFiles&until=20100822111026. hard restarted apaches on the scalers.
  • 10:02 apergos: had some converts on File:Approximated_*_Inscribed_in_a_Circle.gif stuck again on the image scalers, shot them

August 20

  • 05:44 logmsgbot: andrew synchronized php-1.5/languages/Language.php 'Deploying r71329'
  • 05:43 Andrew: not deploying r71327, actually deploying r71329 because r71327 does not work on 1.16wmf4
  • 05:36 Andrew: deploying r71327

August 19

  • 22:39 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'fixing case on banner bnames'
  • 21:42 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding new banners and appeal pages'
  • 21:34 RobH: bug 24664 for mk chapter done
  • 21:30 logmsgbot: robh ran sync-common-all
  • 21:20 RobH: pushed live project ko.wikinews.org, no apache or dns changes needed since ko langcode was already in dns
  • 21:18 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 21:17 logmsgbot: robh ran sync-common-all
  • 20:59 RobH: created new project frr.wikipedia.org, dns, apache, etc..
  • 20:53 logmsgbot: robh ran sync-common-all
  • 20:40 mark: Downpreffed AS16265 transit routes to local-pref 90
  • 20:28 RobH: pushed dns changes and apache changes for the bookshelf project url, bug # 24872
  • 20:28 mark: Turned up AS157 transit on 10G link e1/3 on br1-knams
  • 14:34 Tim: killed hung convert on all image scalers

August 18

  • 20:03 RobH: sq57 drive replaced, but raid didnt work (seems like grub wasnt copied to both drives) leaving offline for now, will investigate later
  • 19:42 RobH: sq57 set to false in lvs, replacing bad disk.
  • 18:33 RobH: kicking around db16, trying to fix it
  • 10:37 mark: Restored VRRP priorities to original state
  • 10:33 mark: authdns-scenario normal
  • 10:30 mark: Enabled ve1 on csw1-esams
  • 02:31 Tim: also edited /etc/gai.conf on fenari to prefer IPv4, to fix ExtensionDistributor
  • 02:28 Tim: edited /etc/gai.conf on kaulen to avoid broken IPv6 connection to mayflower, so CR will start working again

August 17

  • 22:57 mark: Shutdown ve1 on csw1 to force VRRP backup
  • 22:53 mark: Packet loss, authdns-scenario esams-down
  • 22:48 mark: authdns-scenario normal
  • 22:43 mark: Configured all VRRP instances on csw1-esams to have priority 1, to reliably stay in backup mode
  • 22:11 RobH: dns changed to route traffic to tampa
  • 22:10 river: authdns-scenario esams-down
  • 20:13 RobH: set srv278 to false in lvs, taking it down for hardware testing per rt#24
  • 18:06 rainman-sr: disabling interwiki search on all wikis, not only en.wp until we figure out what is going on
  • 16:49 rainman-sr: search11 is fully up with all features, and seems to work fine .. will keep an eye on it
  • 16:43 RobH: srv230 online
  • 16:41 rainman-sr: all of search up, still fiddling with search11 to see why it gave strange I/O spikes during the batch2 migration
  • 16:37 RobH: investigating srv230.
  • 16:32 RobH: srv230 back online with memory replacement, synced and back in cluster
  • 16:31 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php
  • 16:29 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'Returning all search values to normal, should restore full search functionality.'
  • 16:22 rainman-sr: bringing up search5,12, 13-20
  • 16:21 RobH: shutting down srv230 to swap out bad memory
  • 15:50 RobH: search13-search20 relocated to b3-sdtpa. All servers are online, working to bring search back to full deployment.
  • 14:36 rainman-sr: search5,12 will also show offline because they run parts of the services that are temporarily disabled
  • 14:30 RobH: search13-search20 will show offline during their relocation, approx until 16:00 if all things go well
  • 14:09 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'Disabling wgEnableLucenePrefixSearch on projects while search servers are relocated'
  • 14:04 rainman-sr: starting relocation of second batch of search servers
  • 13:05 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/i18n/Lqt.namespaces.php
  • 13:05 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.namespaces.php
  • 12:19 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 71190
  • 12:19 Andrew: running scap to update LiquidThreads localisation
  • 11:01 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Add LiquidThreads to ptwikibooks'
  • 10:34 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/LiquidThreads.php
  • 10:34 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/i18n/Lqt.namespaces.php
  • 10:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/i18n/Lqt.i18n.php
  • 10:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/LiquidThreads.php
  • 10:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.namespaces.php
  • 10:33 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/i18n/Lqt.i18n.php
  • 05:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php '22109 - add fourth level subdomains of wikimedia to '
  • 05:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24808 - Localize Wikipedia name for newiki'
  • 05:34 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24355 - Namespace changes - si.wikipedia'
  • 05:31 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24255 - Change meta namespace names on siwikibooks'
  • 05:24 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24579 - Reconfigure Turkish Wikibooks'
  • 05:07 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24822 - XML-import for de-wiktionary'
  • 05:05 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24823 - alias for de-wiktionary'
  • 04:09 Tim: wikimedia-task-appserver on srv224 was half-configured because parentless convert processes stuck in a deadlock were holding the *:80 listen filehandle open, preventing apache from starting. Killed the convert processes and reinstalled.
  • 04:05 Tim: deployment of the new wikimedia-task-appserver led to nagios alerts about half-configured status on 16 servers due to a warning in rsync sync-common about file permissions. Fixed by deleting Mark's vim swapfile and reinstalling.
  • 03:36 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'disabling PagedTiffHandler, no identify present on non-scalers so upload fails'
  • 03:26 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'enabling PagedTiffHandler'
  • 03:23 logmsgbot: tstarling synchronizing Wikimedia installation... Revision: 71190
  • 03:22 Tim: doing another scap to update ExtensionMessages.php
  • 03:16 logmsgbot: root synchronizing Wikimedia installation... Revision: 71190
  • 03:14 Tim: doing svn up/scap to r71188 for PagedTiffHandler and supporting changes

August 16

  • 21:44 RobH: dns update for login vmhost on tesla, rt#108
  • 20:38 Fred: rebooting srv217 ->oom
  • 19:15 logmsgbot: mark synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'Reenabling the disabled section, as rainman has disabled interwiki results in the lucene backend'
  • 18:51 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24304 - Reconfigure English Wikibooks'
  • 18:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '24304 - Reconfigure English Wikibooks'
  • 18:49 rainman-sr: restarting search1,3,4,9
  • 18:46 logmsgbot: root synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'Temporarily disable part of MWSearch'
  • 18:45 JeLuF: archived admin logs, moved January-June 2010 to Archive 15
  • 18:45 logmsgbot: root synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'Temporarily disable part of MWSearch'
  • 18:43 logmsgbot: root synchronized php-1.5/extensions/MWSearch/MWSearch_body.php 'Temporarily disable part of MWSearch'
  • 18:38 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '17627 - Allow autoconfirmed users to patrol on ar Wikisource'
  • 18:08 Fred: MWSearch extension is having issues...
  • 17:36 Fred: API servers all started segfaulting. Restarting apache for the time being
  • 04:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '24304 - Reconfigure English Wikibooks'
  • 04:48 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '24304 - Reconfigure English Wikibooks'

August 15

  • 22:42 domas: resynced db11 and db17 from db27, db33 from ixia, db19 from db1, with accompanying BIOS flashing to 3.0 and OS reinstalls. decommissioned ixia and db1
  • 22:40 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'drumrolllll'
  • 22:04 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:56 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:49 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 21:39 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24520 - Transwiki import source for ml.wikiquote.org'
  • 20:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24458 - Enable subpages on frwikisource'
  • 20:39 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24623 - Enable 'eliminator' flag on ptwiki'
  • 20:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24304 - Reconfigure English Wikibooks'
  • 20:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '24304 - reconfigure enwikibooks'
  • 20:35 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '24304 - reconfigure enwikibooks'
  • 20:32 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 20:14 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24623 - Enable 'eliminator' flag on ptwiki'
  • 20:09 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace'
  • 20:04 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 19:54 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24777 - Request for a patrolling function on the Nynorsk (nn) Wikipedia'
  • 19:48 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24535 - Enable "mark as patrolled" feature in hindi wiki'
  • 19:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24394 - Install AbuseFilter on Hindi Wikipedia'
  • 19:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24336 - Change Simplewiki's (EN) autoconfirm time/edit rates'
  • 19:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24790 - Localize Wikipedia sitename in devanagari'
  • 19:33 JeLuF: fixed zero byte thumbnail of commons:Shkval_head.jpg
  • 19:19 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24789 - Enable AbuseFilter for ja.wikipedia'
  • 19:18 logmsgbot: jeluf synchronized php-1.5/wmf-config/abusefilter.php '24789 - Enable AbuseFilter for ja.wikipedia'
  • 19:14 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 18:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 18:24 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 17:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 16:23 logmsgbot: midom synchronized php-1.5/wmf-config/db.php

August 14

  • 20:15 mark: Decommissioning srv150
  • 19:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24789 - Enable AbuseFilter for ja.wikipedia'
  • 19:52 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24626 - Add an "autopatrolled" status for frwiktionary'
  • 15:37 mark: dobson has failed RAID1 array member /dev/sda. Running long SMART self test on /dev/sda
  • 14:18 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'Add ms2 and ms1 to clusters rc1 and cluster22'
  • 14:06 mark: FLUSH TABLES WITH READ LOCK on ms1 for testing
  • 13:59 mark: Stopping mysql on ms1 as monitoring test
  • 13:59 mark: Granted SELECT on mysql.* to nagios on ms3
  • 10:57 mark: Removed oldest LVM snapshot on ixia
  • 09:43 mark: Fixed apparmor profile /etc/apparmor.d/usr.sbin.mysqld on ms1, restarted mysql under apparmor
  • 09:39 mark: START SLAVE on ms1, catching up with ms3
  • 09:38 mark: RESET SLAVE on db5
  • 09:37 mark: STOP SLAVE on db5
  • 09:35 mark: Stopped apparmor on ms1
  • 08:41 Andrew: Leaving as-is for now, hoping somebody with appropriate permissions can fix it later.
  • 08:40 Andrew: STOP SLAVE on db5 gives me ERROR 1045 (00000): Access denied for user: 'wikiadmin@208.80.152.%' (Using password: NO)
  • 08:34 Andrew: Slave is supposedly still running on db5. Assuming Roan didn't stop it when he switched masters a few days ago. Going to text somebody to confirm that stopping is correct course of action.
  • 08:24 Andrew: db5 can't be lagged, it's the master ;-). Obviously something wrong with wfWaitForSlaves.
  • 08:19 Andrew: db5 lagged 217904 seconds
  • 05:09 Andrew: Ran thread_pending_relationship and thread_reaction schema changes on all LiquidThreads wikis
  • 05:06 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 70933
  • 05:04 Andrew: About to update LiquidThreads production version to the alpha.
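
Note on the 08:19 entry: the reported lag of 217904 seconds is easier to interpret as a duration and a start time. The sketch below is only illustrative arithmetic; the year 2010 and the variable names are assumptions, not part of the log.

```python
from datetime import datetime, timedelta

# Lag reported by wfWaitForSlaves for db5 in the 08:19 entry.
lag = timedelta(seconds=217904)

# Time of the log entry; the year 2010 is an assumption.
reported_at = datetime(2010, 8, 14, 8, 19)

# Moment at which the (stale) replication state on db5 would have stopped advancing.
lag_started = reported_at - lag

print(lag)          # 2 days, 12:31:44
print(lag_started)  # 2010-08-11 19:47:16
```

The computed start falls on the evening of August 11, close to the db5 entries logged that day; the log itself does not state whether that is the cause.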

August 13

  • 22:03 mark: API logins on commons (only) are reported broken
  • 21:45 mark: Set correct $cluster variable for reinstalled knsq* squids
  • 21:03 mark: Increased cache_mem from 1000 to 2500 on sq33, like the other API backend squids
  • 20:58 mark: Stopping backend squid on sq33
  • 20:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24769 - Import source addition for tpi.wikipedia.org'
  • 17:46 Fred: and srv100
  • 17:45 Fred: restarted apache on srv219 and srv222
  • 15:57 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned from the down list'
  • 15:56 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list'
  • 15:53 RobH: srv146 removed from puppet and nodelists, slated for wipe, decommissioned.
  • 15:47 mark: Sent srv146 to death using echo b > /proc/sysrq-trigger. It had a read-only filesystem and is therefore decommissioned.
  • 15:38 mark: Restarted backend squid on sq33
  • 15:36 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list'
  • 15:25 mark: Reinstalled sq32 with Lucid
  • 15:01 mark: Removed sq86 and sq87 from API LVS pool
  • 14:55 mark: sq80 had been down for a long time. Brought it back up and synced it
  • 14:54 rainman-sr: all of the search cluster restored to pre-relocation configuration
  • 14:34 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11'
  • 13:55 mark: /dev/sda on sq57 is busted
  • 13:54 RobH: removed search17 from search_pool_3
  • 13:50 mark: Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal
  • 13:44 mark: powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746]
  • 13:42 mark: sq58 was down for a long long time. Brought it back up and synced it
  • 13:37 RobH: added search7 back into search_pool_3, kept search17 in as well
  • 13:27 RobH: changed search_pool_3 back from search7 to search17 since it failed
  • 13:25 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use'
  • 12:45 mark: API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well
  • 12:40 mark: Shutdown puppet and backend squid on sq32
  • 11:41 mark: Corrected changed hostname for api.svc.pmtpa.wmnet in text squid config files
  • 11:37 mark: Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs
  • 11:32 mark: Reinstalled sq31 with Lucid
  • 10:25 mark: Shutting down backend squid on sq31 to see the load impact
  • 10:18 mark: Setup backend request statistics for the API on torrus
  • 09:15 rainman-sr: bringing up search1-12 and doing some initial index warmups
  • 01:53 RobH: searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning. The other half will have to move on a different day, 12 hours in the datacenter is long enough.
  • 01:40 RobH: finished moving searchidx1 and search1-12, bringing them back up now

August 12

  • 23:10 RobH: shutting down searchidx1, search1-12 for move
  • 22:40 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'swapped search13 and search18 for migration'
  • 22:37 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'reverting so search13 and search18 can change roles'
  • 22:22 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'changes back in place to migrate searchidx1 and search1-10'
  • 22:19 RobH: puppet updated on all search servers, confirmed all have all three lvs ip addresses
  • 21:55 mark: Configured puppet to bind all LVS service IPs to all search servers
  • 21:54 RobH: reverted search_pool changes on lvs
  • 21:54 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'rolling it back'
  • 21:48 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'changing settings for migration of searchidx1 and search1-search12'
  • 21:43 RobH: changing lvs3 search pool settings for server relocations
  • 20:33 logmsgbot: robh synchronized php-1.5/wmf-config/lucene.php 'commented out wgEnableLucenePrefixSearch for search server relocation'
  • 19:30 RobH: srv281 reinstall done but not online as puppet has multiple package issues, leaving out of lvs
  • 19:09 RobH: srv230 is on, but set to false in lvs. do not push back into rotation until after new memory arrives and is installed tomorrow (rt#69)
  • 18:59 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'updating without srv230'
  • 18:53 RobH: srv230 coming down for memory testing
  • 18:49 RobH: set srv230 to false in lvs, need to test memory
  • 18:04 RobH: reinstalling srv281
  • 17:59 RobH: nix that, srv125 was ex-es, leaving those for now.
  • 17:58 RobH: pulling srv103 & srv125 for wipe (pulling stuff with temp warnings first)
  • 17:53 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'removed srv103, replacing it with srv244'
  • 17:47 RobH: pulling srv95 for wipe
  • 17:38 RobH: srv110 removed from lvs3 config
  • 17:36 mark: Removed all apaches up to srv150 from the appserver LVS pool on lvs3
  • 17:21 Fred: restarting apache on webservers (220,221,222,224)
  • 16:45 RobH: wipe running on adler and amane, and they have been removed from puppet and dsh node groups
  • 16:12 logmsgbot: jeluf synchronized docroot/bits/index.html
  • 15:41 mark: Setup ports ge-2/0/0 to ge-2/0/20 for search servers on asw-b-sdtpa
  • 15:03 mark: Shutdown BGP session to AS1257 130.244.6.249 on port 2/5 of br1-knams, preparing for cable move
  • 13:08 mark: Recovered backend squid on knsq11
  • 12:53 mark: Reassembling RAID arrays md0 and md1 on knsq11
  • 12:40 mark: Running apt-get upgrade && reboot on amssq31
  • 11:17 mark: Shutdown knsq1 and knsq11 for swapping drives
  • 09:34 logmsgbot: catrope synchronized php-1.5/extensions/TitleBlacklist/TitleBlacklist.hooks.php 'r70933'
  • 09:08 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24710: Enable $wgVectorShowVariantName on srwiki'
  • 06:00 logmsgbot: jeluf synchronized php-1.5/cache/interwiki.cdb 'Updating interwiki cache'
  • 05:42 JeLuF: Bug 24736 - Update wikimania.wikimedia.org to point to wm2011
  • 02:43 RobH: dataset1 back online, serving http, any jobs in process are borked (sorry ariel!)
  • 02:39 RobH: dataset1 unresponsive to physical console, serial console, had to do a hard reset
  • 02:05 RobH: dataset1 crashed while querying its raid controller about a bad disk, en route to dc to fix.

August 11

  • 21:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24561 - Quiz on Polish Wikibooks'
  • 21:53 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '21375 - new wiki as internal working space for the fiwiki arbcom'
  • 21:48 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24486 - Create Appendix namespace on the Luxembourgish Wiktionary'
  • 21:46 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24374 - Create new usergroups at commons'
  • 21:28 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24460 - Set up transwiki import for lb.wiktionary'
  • 21:10 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24626 - Add an "autopatrolled" status for frwiktionary'
  • 21:06 RobH: db16 will remain offline until replacement parts arrive from Sun. rt#54
  • 21:02 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24570 - Request'
  • 20:42 mark: Fixed ganglia mess on ms1
  • 20:37 mark: Started rsync of ms2:/a to ms1:/a
  • 20:32 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24685 - Author, Index, Page namespace for id.wikisource'
  • 20:30 mark: FLUSH TABLES WITH READ LOCK on ms2
  • 20:27 mark: Readded spare /dev/mdak1 to /dev/md1 on ms1. Why do spares go missing all the time...
  • 20:26 mark: Upgraded ms1 to Lucid, rebooted it
  • 20:26 RobH: working on db16
  • 20:24 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24719 - Extension'
  • 20:07 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 20:07 domas: promoted db5 to slave on s4
  • 19:53 mark: Upgrading ms1 to Lucid
  • 19:44 mark: Readded missing spare drive to /dev/md1 on ms1
  • 17:25 logmsgbot: robh synchronized php-1.5/wmf-config/mc.php 'removed srv95 as it has temp warnings and is going to go away soon.'
  • 16:41 RobH_dc: pulled network on srv110 and started wipe, byebye
  • 16:35 RobH_dc: db19 eth1 disconnected per dc tasks
  • 16:24 RobH_dc: ms1 power cable was messing it up; rerouted the cable to be securely in place and the system is now operating normally (no more sudden shutdowns, hopefully). There was no evidence of hardware failure in the logs, only power issues, so this should fix it.
  • 16:02 RobH_dc: working on ms1
  • 15:22 RobH_dc: bad disk replaced in db7, raid is currently rebuilding, system still online.
  • 15:10 RobH_dc: pulled hdd5 from db7 for replacement
  • 15:02 mark: Shutdown clematis for decommissioning
  • 14:18 RobH: knsq11,knsq12,knsq13 are post os reinstall, pre squid deployment config, will finish them in a bit
  • 14:07 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 24441 - Enable Rollback in Quechua Wikipedia'
  • 13:33 RobH: knsq11-knsq13 coming down for reinstallation
  • 12:23 Tim: deployed non-threaded version of imagemagick on all image scalers
  • 11:43 logmsgbot: tstarling synchronized php-1.5/includes/media/Bitmap.php 'OMP_NUM_THREADS=1'
  • 11:21 mark: Reconfigured wikimedia-lvs-realserver on hume, so wikimedia-task-appserver install succeeds
  • 11:19 logmsgbot: tstarling synchronized php-1.5/includes/media/Bitmap.php 'reduced magick memory limit from 100M to 50M to stop hanging with vsize limit 300M'
  • 10:46 mark: Removed pattern check from nagios check_http
  • 09:42 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php
  • 09:38 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php
  • 09:35 Tim: rebooting srv223, went OOM and mostly died
  • 09:32 logmsgbot: tstarling synchronized php-1.5/includes/media/Bitmap.php 'temporary patch to stop scalers going OOM'
  • 09:19 Tim: temporarily increased memory limit on the image scalers, since the new convert tends to hang when it runs out of memory instead of crashing nicely
  • 09:17 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'more memory for image scalers'
  • 08:56 Tim: upgrading imagemagick on image scalers to 6.6.2.6-1wm1, package recently committed to svn
  • 02:48 Tim: on techblog, disabled WP_DEBUG since it was messing up the admin panels with E_NOTICE messages
  • 02:42 Tim: disabled WP-SpamFree on techblog due to bug 19540

August 10

  • 23:12 Fred: upgraded Tridge to Lucid. Now rebooting.
  • 22:04 RobH: knsq10 back online
  • 20:59 RobH: knsq10 reinstalling
  • 20:44 RobH: knsq9 online
  • 19:37 RobH: handed off knsq8 to mark, reinstalling knsq9
  • 19:02 ^demon: disabled svn post-commit hook for parser tests, long-since broken
  • 18:57 mark: Stopping backend squid on amssq60 for testing
  • 15:24 RobH: knsq8 reinstalled, not yet online, will push online shortly
  • 14:56 mark: Setup RT on rt.wikimedia.org (streber)
  • 14:32 RobH: knsq30 online and in cluster, knsq8 coming down for work
  • 14:18 RobH: updated wordpress versions on blog.wikimedia.org and techblog.wikimedia.org
  • 13:35 RobH: finishing install on knsq30
  • 12:50 Tim: installed schroot on stafford, for hardy versions of uupdate etc.
  • 11:19 mark: Fixed broken hourly cron job mw-serve
  • 11:18 mark: Changed su www-data into su mwlib in cleanup cronjob on pdf1
  • 10:23 mark: Removed broken daily system health report on srv178
  • 10:22 mark: Removed broken daily system health report on db4
  • 07:13 logmsgbot: andrew synchronized php-1.5/extensions/CommunityApplications/SpecialCommunityApplications.php 'Merge r70798'
  • 07:13 logmsgbot: andrew synchronized php-1.5/extensions/CommunityApplications/CommunityApplications.i18n.php 'Merge r70798'
  • 04:02 RobH: knsq30 set to false in pybal, install half done, will finish tomorrow morning.
  • 02:44 RobH: knsq29 online and in cluster
  • 02:30 RobH: knsq30 reinstalling
  • 00:09 RobH: knsq28 back online
  • 00:03 RobH: knsq27 back online
  • 00:03 RobH: knsq29 reinstalling

August 9

  • 23:34 RobH: knsq28 reinstalling
  • 23:32 RobH: knsq26 online
  • 23:32 RobH: knsq25 online
  • 23:12 RobH: continuing reinstallation, ignore errors for knsq27, reinstalling
  • 22:33 RobH: knsq23, knsq24 back online, knsq25, knsq26 still being reinstalled, knsq27-30 still online not yet reinstalled
  • 21:51 mark: Added Nagios router interfaces check for br1-knams (using puppet)
  • 21:11 mark: Unmounted /dev/sda6 (/a) on srv171, replaced it by /dev/mapper/nonredundant-data (LV with the same data and more space)
  • 21:02 RobH: knsq24, knsq25, knsq26, knsq27 coming down for reinstall and puppetfication
  • 20:49 RobH: knsq23 reinstall done and pushed back into cluster
  • 18:40 mark: Running apt-get upgrade on db9
  • 18:29 mark: Fixed ganglia mess on sq45
  • 18:24 mark: Powercycled sq45
  • 18:18 mark: Added a new MegaCli64 to wikimedia-raid-utils, made check-raid.py use it instead (we have all 64 bit servers anyway), and deployed the new package to the repository. Puppet will upgrade it everywhere.
  • 16:26 Fred: fixed DPKG issue on transcode... another one of those conflicting gmond installs
  • 16:14 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24735: Sanitize private/fishbowl config'
  • 16:03 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24732: Portal and Book namespaces for yowiki'
  • 13:59 mark: srv110 decommissioned itself
  • 13:55 RobH: knsq23 coming down for reinstallation
  • 13:31 mark: Changed broken HTTP nagios check for Squid on brewster into a TCP port check
  • 13:28 mark: Stopped MySQL on srv171, created LVM PV,VG and LV on unused drive /dev/sdb. Copying MySQL data onto it.
  • 13:09 mark: START SLAVE on srv171 to get rid of relay binlogs
  • 12:56 mark: Shutdown db3 for decommissioning
  • 12:56 RoanKattouw: Mark 12:53 Shutdown db2 for decommissioning
  • 12:56 RoanKattouw: 12:52 mark synchronized php-1.5/wmf-config/db.php 'Remove db3 from rotation, decommissioning'
  • 12:55 RoanKattouw: Mark 12:47 Power cycled pdf3, out of memory
  • 12:55 RoanKattouw: Mark 12:44 Restarted Apache on srv91
  • 12:55 RoanKattouw: Mark 12:39 Relaxed NTP peers check for dobson and linne (NTP servers)
  • 12:55 RoanKattouw: Mark 12:36 Shutdown adler for decommissioning
  • 12:54 RoanKattouw: Mark 12:18 Made disk space on mchenry by DELETING LOTS OF OLD BACKUPS
  • 12:53 RoanKattouw: Restarted morebots

August 8

August 7

  • 11:58 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'New master: db18, r/w'
  • 11:57 mark: Changed master for s3 to db18 on db11, db27, db25
  • 11:49 mark: New master db18 log position: db18.bin.001 pos 79
  • 11:32 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'Setting s3 to read-only'
  • 11:21 mark: Stopping mysql on db17
  • 11:21 mark: For reference, SHOW SLAVE STATUS on db18 before the switch:
      Master_Log_File: db17-bin.368
  Read_Master_Log_Pos: 650276717
       Relay_Log_File: db18-relay-bin.048
        Relay_Log_Pos: 650276247
Relay_Master_Log_File: db17-bin.368
     Slave_IO_Running: No
    Slave_SQL_Running: Yes
  • 11:02 RoanKattouw: All s3 slaves down, master serving all read load and getting overloaded
  • 11:00 RoanKattouw: db17 (s3 master) has full disk
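
Note on the 11:21 entry: the SHOW SLAVE STATUS excerpt above can be read programmatically. This is a minimal sketch, assuming the vertical `Key: value` layout shown in the log; `parse_slave_status` and `stopped_threads` are hypothetical helper names, not tools used on the cluster.

```python
# Excerpt copied from the 11:21 log entry (db18, before the master switch).
SAMPLE = """\
      Master_Log_File: db17-bin.368
  Read_Master_Log_Pos: 650276717
       Relay_Log_File: db18-relay-bin.048
        Relay_Log_Pos: 650276247
Relay_Master_Log_File: db17-bin.368
     Slave_IO_Running: No
    Slave_SQL_Running: Yes
"""


def parse_slave_status(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def stopped_threads(status):
    """Return the names of the replication threads that are not running."""
    threads = ("Slave_IO_Running", "Slave_SQL_Running")
    return [name for name in threads if status.get(name) != "Yes"]


status = parse_slave_status(SAMPLE)
print(status["Master_Log_File"], status["Read_Master_Log_Pos"])
print(stopped_threads(status))
```

For the excerpt above, only the IO thread is reported as stopped, which matches the 11:00 entry about db17's full disk. The excerpt does not include Exec_Master_Log_Pos, so the sketch deliberately makes no claim about whether the SQL thread had applied everything that was fetched.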

August 6

  • 22:59 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 22:59 logmsgbot: catrope synchronized php-1.5/includes/GlobalFunctions.php 'r70605'
  • 22:58 logmsgbot: catrope synchronized php-1.5/skins/vector/main-rtl.css 'r70605'
  • 22:56 logmsgbot: catrope synchronized php-1.5/skins/vector/main-ltr.css 'r70605'
  • 15:43 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesCs.php 'r70573'
  • 15:08 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24688: Namespace aliases for kowiki'
  • 10:45 logmsgbot: catrope synchronized php-1.5/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php 'r70550'
  • 10:11 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Add foundationwiki addgroups in correct section'

August 5

  • 21:14 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24678: Set $wgAddGroups, $wgRemoveGroups on foundationwiki'
  • 14:30 RobH: replacing certificate file on sanger

August 4

  • 22:57 rainman-sr: search1 somehow got stuck, restarting
  • 22:42 Fred: restarted Nagios bot
  • 16:28 mark: Reverted Fred's automatic security upgrades in puppet
  • 16:25 mark: base::puppet and base::apt were not being included on every Linux host, fixed the case statement in the Puppet base class
  • 11:35 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24652: Allow bureaucrats to add/remove communityapps group on officewiki'

August 3

  • 22:58 mark: Changed /etc/default/exim4 to make exim listen on SMTP
  • 22:56 mark: s/srv9/grosley/ on /etc/exim4/exim4.conf on grosley
  • 21:20 RobH: grosley exim config was overwritten when converted to puppet control (default install uses simple exim setup now). defined host specifically, adding in ganglia details and removing exim control from puppet
  • 20:21 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24652: Archive namespace for officewiki'
  • 20:10 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 20:09 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/WikiEditor/WikiEditor.combined.min.js 'r70409'
  • 15:59 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesEo.php 'r70387'
  • 14:24 RobH: knsq7 coming down for reinstall
  • 04:57 Andrew: scap
  • 04:57 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 70064
  • 04:55 Andrew: Preparing to update LiquidThreads alpha to trunk with r70106 and r70100 unmerged.
  • 01:30 mark: Restored exim config on williams (OTRS)
  • 00:35 RobH: otrs emails still bouncing, working on it.
  • 00:31 RobH: exim was sitting idle? on williams for otrs delivery after complaining of a failed database connection. restarted exim, it appears to be working on the delivery backlog now, will check back on it in 30 minutes or so

August 2

  • 23:55 guillom: afaict, all emails sent to OTRS are being rejected with the "retry time not reached for any host after a long failure period" message. The issue seems to have started a few hours ago.
  • 22:34 tomaszf: upgrading civicrm to 3.1.6
  • 18:20 RobH: pulling knsq6 for reinstallation and such
  • 17:47 RobH: sq33-sq40 reinstalled, online, serving requests
  • 15:54 RobH: reinstalling sq34-sq40
  • 14:55 mark: Installed gmetad on streber for collecting ganglia information in local RRDs
  • 14:34 mark: Upgraded streber to Ubuntu 10.04
  • 14:24 RobH: restarted pdns on linne
  • 14:20 RobH: updated dns for tesla host ci
  • 12:46 mark: Fixed ganglia package mess on search12-20
  • 12:10 mark: Depooled all text squids from the bits.esams LVS pool
  • 12:02 mark: Reinstalled knsq2 and knsq5 with Ubuntu 10.04, set them up as Varnish bits caches, and pooled them in LVS
  • 11:50 mark: Fixed ganglia mess on kaulen
  • 11:43 mark: Added missing Wikimedia APT repository to kaulen. Why was it not there? Was this host installed in some nonstandard way?
  • 11:35 mark: Set up exim::simple-mail-sender classes for kaulen
  • 11:35 mark: Fixed puppet for snapshot*, base classes were not included
  • 11:19 mark: Turned off knsq3; broken HBA and out of warranty
  • 10:55 mark: Started puppetd on spence inside gdb

August 1

  • 17:21 mark: Reinstalled linne.wikimedia.org with Lucid
  • 15:41 mark: Fixed the haproxy for puppetmaster on brewster, was broken by the upgrade
  • 15:19 mark: Upgraded brewster to Ubuntu 10.04 Lucid
  • 15:18 mark: Removed Wikimedia repository default pinning on brewster, as it's doing more harm than good
  • 15:07 mark: Explicitly install and deinstall gmond / ganglia-monitor packages in puppet, depending on the ubuntu version
  • 14:52 mark: Removed broken amanda backup client from brewster. Broken package install/dpkg state, no logs, no documentation

July 31

  • 17:31 mark: Setup bits.esams varnish cluster, pooled knsq4 (varnish) with all the text squids
  • 17:06 mark: Reinstalled knsq4 with lucid, redeploying it for varnish
  • 16:24 mark: Depooled knsq1-knsq4 in squid config

July 30

  • 20:00 domas: hotfixed db12 build to have faster mysqldumps
  • 19:01 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 24466 reversion'
  • 18:29 RobH: gracefulled apaches a few times, had a rogue typo
  • 17:45 RobH: dns push successful, all nameservers still online and up to date
  • 17:45 RobH: updated dns for movementroleswiki
  • 17:43 logmsgbot: robh ran sync-common-all
  • 16:51 RobH: added movement roles information to apache, still setting up other stuff for it
  • 05:53 Tim: svn up/scap r70064

July 28

  • 15:26 Rob: running updates on sockpuppet
  • 13:08 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 70064
  • 12:57 Andrew: about to update LiquidThreads alpha to trunk state
  • 07:54 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'adding 1.16 branch to ExtensionDistributor'
  • 05:01 apergos: added bayes to list of clients for exports on dataset1; added /data to fstab on bayes for stats use
  • 04:23 logmsgbot: tstarling synchronized php-1.5/includes/api/ApiBase.php
  • 04:22 logmsgbot: tstarling synchronized php-1.5/includes/api/ApiMain.php
  • 04:22 Tim: deploying r70063 and r70064 to fix API fatals
  • 02:09 logmsgbot: root synchronizing Wikimedia installation... Revision: 70061
  • 02:04 Tim: doing svn up/scap to r70061, to get the API cache header fix

July 27

  • 23:27 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24466'
  • 21:39 mark: Added wikimedia-base to the standard packages list in puppet
  • 21:34 Rob: pushed hu language file change on survey.wikimedia.org
  • 21:16 Rob: singer is being a pain, ssh isn't running, and so forth... working on it
  • 19:07 Rob: racktables physical audit of pmtpa done (ignore the bottom of rack a5, will remove them later today when I have a damned mouse)
  • 18:49 Rob: physical audit pmtpa row b complete
  • 18:40 Rob: physical audit pmtpa row c complete
  • 18:18 Rob: uplink successfully moved for asw-b5-sdtpa
  • 18:17 Rob: moving the uplink
  • 18:17 Rob: all the apaches in sdtpa-b5 are migrated from old to new asw-b5-sdtpa.
  • 18:05 Rob: bugzilla admin note: disregard the email bounces for the scireview domain user, their mail server will be fixed shortly
  • 17:58 Rob: srv284 fixed, drac online, needs setup
  • 17:54 Rob: srv281 was shutdown when I came into the DC. popped case to use as base for rebuilding another system. need to investigate its initial shutdown
  • 17:17 mark: Reinstalled sq68
  • 16:41 atglenn: started up several more workers on snapshot3 doing xml dumps
  • 14:43 Tim: running svn cleanup on some ExtensionDistributor working copies
  • 14:29 Tim: cleaning up old ext-dist tarballs, removing all that were older than a month
  • 11:48 mark: Stracing puppet on spence
  • 08:01 apergos: started one thread of xml dumps from screen session on snapshot3 as root, if these look good tomorrow we'll crank up more of 'em

July 26

  • 18:29 Rob: moved up to srv268, all working
  • 18:25 Rob: moved srv258-srv261 to new asw-b5-sdtpa ports, all seems shiny
  • 17:59 Rob: hooked up additional ports for sq67/sq68
  • 17:17 mark: Installed Lucid on sq69 and sq70
  • 14:41 mark: Moved db19 back to vlan 2
  • 14:34 mark: Pooled sq68 in the bits.pmtpa varnish LVS pool
  • 14:06 mark: Installed lucid on sq68 and deployed varnish for bits.pmtpa
  • 12:22 mark: Started puppet on spence
  • 05:21 Tim: removed a log on search12 to give it a tiny bit more space

July 25

  • 21:55 mark: Pooled sq67 (Varnish) in the bits.pmtpa LVS pool along with the text squids
  • 21:54 mark: Moved bits.pmtpa.wikimedia.org DNS from 208.80.152.2 (text squids) to 208.80.152.118 (dedicated LVS service)
  • 21:21 mark: Started puppetd on spence
  • 20:50 mark: Setup Nagios monitoring for bits.pmtpa
  • 20:33 mark: Added LVS service for bits.pmtpa.wikimedia.org on lvs4
  • 19:53 mark: Added LVS service ip 208.80.152.118 (bits.pmtpa.wikimedia.org) to all text squids and sq67
  • 19:18 mark: Fixed ganglia varnish monitoring on sq67
  • 13:01 domas: thumbs get i/o errors across multiple clients
  • 10:42 domas: xmltypecheck loop filled / with error logging on srv243 ( http://p.defau.lt/?KtZMFW9x7xaEa2eDwuqMEw ) - had to do some cleanup. didn't livehack anything yet.

July 24

  • 21:31 mark: Deployed varnish with configuration for bits.wikimedia.org on sq67 - not active yet
  • 03:59 apergos: running xml dump file consistency checking script on snapshot2 and snapshot3 in screen as root, expect these to run all night, maybe through the next day as well

July 23

  • 19:17 Rob: synced for Bug 24470 - Enable NewUserMessage extension on lv.wikipedia
  • 19:17 logmsgbot: root synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 18:21 tomaszf: moving otrs-783-529404994.sql to tridge to free up space
  • 18:21 kaldari: Backing up dev_civicrm database in preparation for drupal upgrade
  • 11:33 RoanKattouw: Started Apache on srv278, had died mysteriously
  • 09:30 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24505: Editprotected group for hiwiki'
  • 02:02 Rob: dig tested against all three nameservers after update, all nominal
  • 01:59 Rob: pushing dns update to fix pointer for bugs.wikimedia.org
  • 01:30 Tim: running some statistics queries on db38

July 22

  • 19:31 Rob: mail server works, bypass expiration issue, rob will fix when all of the employees are not hitting the mail server.
  • 18:13 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'put ixia back'
  • 17:50 Rob: wmf mail server cert expired, we know, working on replacing it now
  • 17:29 mark: Fixed ganglia data sources by qualifying all hostnames in gmetad_pmtpa.conf; this must have been broken by the resolv.conf change last week
  • 17:15 logmsgbot: catrope synchronized php-1.5/skins/MonoBook.php 'r69735'
  • 17:15 logmsgbot: catrope synchronized php-1.5/skins/Vector.php 'r69735'
  • 17:11 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Hide Navigable TOC preference'
  • 17:10 atglenn: xml dumps stopped in preparation for removal of bad dumps and xml code fix push
  • 17:09 atglenn: snapshot3 added back to sync cluster in preparation for xml dumps fixes
  • 16:50 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'put db27 back'
  • 16:37 mark: Installed Lucid on sq67
  • 16:26 mark: Made Lucid the default distribution for new installs
  • 16:21 mark: Fixed sq73
  • 15:56 mark: Fixed ganglia cron job on spence.
  • 15:46 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'put db7 back'
  • 15:29 RoanKattouw: Looks like that resolved the zombie issue on srv86
  • 15:15 mark: Fixed puppet for app servers
  • 15:05 RoanKattouw: Defunct Apache process running on srv86, Apache won't start. Need root to kill zombie
  • 15:04 RoanKattouw: Started Apache on srv270, srv163, had died for some reason
  • 15:04 RoanKattouw: Deployed r69728, r69729 (UsabilityInitiative, Vector updates) about 30 minutes ago
  • 15:03 RoanKattouw: <RoanKattouw> !log Deploying r69728, r69729 (UsabilityInitiative, Vector updates) to test.wikipedia.org
  • 15:03 RoanKattouw: !log Upgraded Locke to Ubuntu 10.04 Lucid
  • 15:03 RoanKattouw: <logmsgbot> !log kate synchronized php-1.5/wmf-config/db.php 'removed ixia to dump s4 for TS'
  • 15:03 RoanKattouw: <logmsgbot> !log kate synchronized php-1.5/wmf-config/db.php 'removed db7 to dump s6 for TS'
  • 15:03 RoanKattouw: <logmsgbot> !log kate synchronized php-1.5/wmf-config/db.php 'removing db27 to dump s3 for TS'
  • 15:02 RoanKattouw: Started morebots as catrope. werdnum has a defunct instance running that I can't kill

July 21

  • 20:02 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/ui/CodeRevisionListView.php 'r69705'
  • 19:44 Rob: all 5 new misc servers have mgmt working, will allocate network port in a bit and get them loaded up
  • 19:43 mark: Setup logging for Special:Book on locke
  • 19:42 Rob: all dns servers back online and happy
  • 19:38 Rob: grosley back online providing services
  • 19:30 Rob: grosley was down due to my dns issue, rebooting it, sorry about that.
  • 19:29 Rob: restarted pdns on linne
  • 19:25 Rob: fixed letter reversal in dns, repushed
  • 19:13 Rob: linne pdns restarted
  • 19:06 Rob: updating dns for 5 new misc servers mgmt
  • 16:35 Rob: lots more time wasted diagnosing evident mainboard failures on srv284, hopefully replacement will soon be en route
  • 16:10 mark: Done basic setup of asw-b-sdtpa, added to RANCID and torrus
  • 14:27 Rob: ms1 is back, passing it off to someone who doesn't hate it
  • 14:06 Rob: working on ms1, no touchy
  • 09:56 logmsgbot: catrope synchronized php-1.5/includes/api/ApiLogin.php 'r69661'
  • 02:23 Tim: increasing nagios retry count for NTP from 8 to 15

July 20

  • 15:55 Rob: pushed blog and techblog updates on existing plugins, core wordpress, but not themes (because I don't feel like rehacking our css)
  • 15:41 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 24374 - Create new usergroups at commons'
  • 15:35 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 24364 - Install Extension:Collection for PDF export on foundation wiki'
  • 15:16 Rob: synced for bug 24449 disable file talk pages on cs wikis (since they all have upload disabled)
  • 15:15 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 14:34 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'noboard_chapterswikimedia logo'
  • 14:16 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Change dbname noboardwiki -> noboard_chapterswikimedia'
  • 14:12 logmsgbot: robh ran sync-common-all
  • 13:54 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'communityapplications is too long for user_groups table'
  • 13:48 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 69504
  • 13:48 Andrew: scapping to deploy CommunityApplications extension
  • 13:47 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'New group for viewing community applications'
  • 13:42 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'New group for viewing community applications'
  • 13:32 logmsgbot: robh ran sync-common-all
  • 12:21 logmsgbot: andrew synchronized php-1.5/extensions/CommunityHiring/SpecialCommunityHiring.php 'r69606'
  • 12:20 logmsgbot: andrew synchronized php-1.5/extensions/CommunityHiring/CommunityHiring.php
  • 12:20 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Configuration changes for r69606'
  • 11:53 Andrew: Adding CommunityHiring tables to officewiki database

July 19

  • 20:05 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 24448 - w:cs: upload settings'
  • 18:58 aZaFred_OSCON: started slave on db10 (sync had stopped on July 8th)
  • 16:59 Rob: upped tfinc email quota because I was tired of seeing the alerts to postmaster
  • 13:09 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24440: New namespaces for dawikisource'
  • 10:20 logmsgbot: midom synchronized wmf-deployment/cache/trusted-xff.cdb
  • 08:32 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 24435: Enable $wgBlockAllowsUTEdit on hiwiki'
  • 07:32 Tim: changing max retry count on NTP nagios monitoring to avoid constant flapping
  • 06:45 Tim: restarted ntpd on mchenry, wasn't responding on IPv4

July 18

  • 21:26 domas: fixed opensearch caching, unbroke API
  • 21:18 logmsgbot: midom synchronized php-1.5/includes/api/ApiMain.php 'fix broken API caching'
  • 15:10 logmsgbot: mark synchronizing Wikimedia installation... Revision: 69504
  • 14:59 mark: Making puppet upgrade wikimedia-raid-utils on all servers
  • 12:58 logmsgbot: catrope synchronized php-1.5/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php 'r69504'
  • 12:51 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'New groups for hiwiki (bugs 24416, 24417, 24418, 24419)'
  • 08:13 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'db13 and db15 go live'

July 17

  • 22:01 mark: Apparently package wikimedia-raid-utils truncated /etc/sudoers on many hosts; having puppet put a proper sudoers file back on the application servers
  • 21:42 mark: Fixed gmond mess on search12
  • 21:03 RoanKattouw: Importing another batch of ~10,500 files (~22GB) concurrently with the first one
  • 19:38 RoanKattouw: Import is running in a screen on fenari
  • 19:38 RoanKattouw: Importing ~10,500 files (aggregate size ~21GB) with importImages.php for commons:User:OrdnanceSurveyBot
  • 18:49 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesLad.php 'r69487'
  • 18:48 mark: Having puppet install wikimedia-raid-utils on all servers
  • 18:12 mark: srv278 rebooted for no apparent reason
  • 17:21 mark: Fixed ganglia mess on search19
  • 16:19 mark: Having puppet install NRPE on all internal servers by default
  • 15:52 logmsgbot: root synchronized php-1.5/wmf-config/lucene.php 'Use fixed LVS service search-pool3 instead of search7 directly'
  • 15:31 mark: Removed fscking nscd from spence
  • 15:20 mark: Added internal LVS services to pmtpa.wmnet DNS
  • 15:11 mark: Doing reboot test of nescio (esams DNS)
  • 15:08 mark: Fixed puppet exchange of ssh hostkeys which has been broken for a while
  • 14:30 mark: Truncated all tables in the puppet db on db9
  • 12:30 mark: Implemented domain search list in resolv.conf for fenari (use sparingly!)
  • 12:24 mark: Put /etc/resolv.conf under Puppet management on all servers. Setting timeout option to 3s, to avoid PyBal depools due to 5s timeout when the primary resolver is down.
  • 11:38 mark: Removed rkhunter and chkrootkit on bayes. What is the point with 2 year old software? Just creating more cron spam? :)
  • 10:06 mark: Fixed degraded raid on nescio
  • 09:53 mark: Removed X.org and gdm from bayes. Why was it installed?
  • 09:30 domas: doing maintenance on db13 and db15 (BIOS, OS, MySQL upgrades, resync of data)

July 16

  • 21:05 mark: Removed old-style DNS monitoring from Nagios conf.php, now fully Puppet managed
  • 21:03 mark: Deployed authoritative DNS on nescio.esams, and moved the service IP
  • 19:32 mark: Deployed PowerDNS recursor on nescio, and moved the 91.198.174.6 service ip to it
  • 19:16 mark: Installed lucid on new server nescio.esams
  • 13:49 mark: Fixed entry in upload.wikimedia.org georecord, sent to text squids for a few ip ranges
  • 12:27 mark: Pointed country code 'eu' (127.0.255.1 or 65281) to esams in geodns; since geobackend uses a signed short, I had to mask it to 0x7fff / 32513 in the director map
  • 10:49 mark: Shutdown fuchsia for decommissioning
  • 07:57 logmsgbot: catrope synchronizing Wikimedia installation... Revision: 69416
  • 07:56 RoanKattouw: Running scap to deploy r69416
  • 01:48 logmsgbot: tstarling synchronized php-1.5/includes/diff/DifferenceInterface.php 'r69414'
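
The arithmetic behind the 12:27 geodns entry can be checked with a short sketch. It assumes, based only on the numbers quoted in that entry, that the country code is the last two octets of the 127.0.x.y address read as a 16-bit value; the function names are hypothetical. Clearing the top bit (a mask of 0x7fff) is what yields the quoted 32513, which fits in a signed short.

```python
def country_code_key(ip):
    """Read the last two octets of a dotted-quad address as one 16-bit number.

    Assumption for illustration: 127.0.a.b maps to a * 256 + b.
    """
    octets = [int(part) for part in ip.split(".")]
    return (octets[2] << 8) | octets[3]


def fit_signed_short(value):
    """Clear bit 15 so the value stays within the positive range of a signed short."""
    return value & 0x7FFF


eu_key = country_code_key("127.0.255.1")   # 0xFF01
eu_masked = fit_signed_short(eu_key)       # 0x7F01
print(eu_key, eu_masked)
```

A signed 16-bit integer tops out at 32767, so 65281 cannot be stored as a positive value without this kind of masking.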

July 15

  • 19:23 mark: Replaced all occurrences of 'rr.wikimedia.org' with 'text.wikimedia.org' in DNS
  • 19:14 mark: Updated IP of deprecated record rr.esams.wikimedia.org
  • 19:10 mark: Started PyBal on amslvs1 with a new config; it automatically picked up the traffic for both text.esams (91.198.174.232) and bits.esams (91.198.174.233)
  • 19:07 mark: Stopped PyBal on amslvs1, BGP and OSPF did an automatic failover of bits.esams (91.198.174.233) to amslvs3
  • 18:59 mark: Removed IP 91.198.174.2 (old text squids service ip) from amslvs1. Anyone still using the old IP after weeks will now be unable to reach our sites.
  • 18:56 mark: Depooled knsq1-knsq7 in PyBal
  • 17:38 Fred: fixed nfs mounts on Bayes.
  • 15:35 apergos: chowned /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions tree to extdist. ExtensionDistributor apparently working now
  • 15:01 apergos: running svn cleanup on /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions as extdist user
  • 12:34 logmsgbot: tstarling synchronizing Wikimedia installation... Revision: 69381
  • 12:18 Tim: svn up/scap to r69380
  • 05:13 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace'

July 14

  • 23:44 Fred: re-added cron job to periodically save rrds on our ganglia server. (cron job seems to have vanished for some reason)
  • 17:59 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Favicon for wikimaniateamwiki per Guillaume'
  • 16:06 Fred: restarted apache on mobile1 (had begun to return 500)
  • 14:07 mark: Fixed memcached on srv110
  • 12:19 mark: Fixed ganglia and puppet on stafford
  • 11:54 mark: Migrated DNS monitoring to puppet
  • 10:31 mark: Migrated ZFS RAID nagios check to puppet
  • 10:14 mark: Migrated monitoring of lucene to puppet
  • 09:37 mark: Migrated monitoring of image scalers to puppet
  • 08:49 Tim: using stafford for some pbuilder experimentation

July 13

  • 22:02 mark: Migrated monitoring of application servers to Puppet
  • 20:29 mark: Fixed puppet on ms4
  • 20:16 mark: Hacked up nagios conf.php to not create host entries for most servers (now in puppet), except special cases
  • 19:58 mark: Hacked up nagios conf.php to not create host entries
  • 16:51 mark: Migrated Squid Nagios monitoring to puppet, commented some functionality in nagios conf.php
  • 15:51 mark: Split puppet nagios config over multiple files

July 12

  • 16:54 Fred: changed LONGQUERIES check threshold
  • 16:08 Fred: restarting morebots since it had died.
  • 16:08 Fred: restarting Nagios since it was down.
  • 14:29 mark: Added "cfg_file=/etc/nagios/puppet_hosts.cfg" to nagios.cfg
  • 13:25 JeLuF: added disk space monitoring for apaches
  • 12:51 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24306 - Create namespaces for Lithuanian Wiktionary'
  • 12:48 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace'
  • 12:46 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace'
  • 12:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24344 - Namespace changes - si.wiktionary'
  • 11:45 JeLuF: fixed broken ganglia-metrics installation on srv146 (chown gmetric /var/log/gmetricd/gmetricd.log)
  • 11:41 JeLuF: added DPKG status monitoring for all app servers to nagios. Reports all packages that are not in state 'rc' or 'ii'.
  • 10:43 JeLuF: lots of false alerts from nagios due to missing SSL setup for NRPE. Working on it.
  • 09:53 JeLuF: changed puppet config to install nrpe on all app servers
  • 09:28 JeLuF: replacing opsview-nrpe agents by nagios-nrpe agents (image_scalers, some other apaches). Most apaches already use nagios-nrpe
  • 07:40 Tim: set up NRPE disk space monitoring on ms4, discovered that /mnt2 is full
  • 04:54 Tim: updated NFS host/service groups to monitor the actual NFS servers, not a random collection of miscellaneous ex-NFS servers
  • 04:46 Tim: installed NRPE on nfs1 and nfs2
  • 04:08 Tim: adding rendering, m, bits.esams, recursor0, recursor1, recursor0.esams to nagios
  • 04:02 Tim: added forward DNS entry for recursor0.esams, modified reverse DNS entry resolver0.esams -> recursor0.esams
  • 03:55 Tim: fixed reverse DNS entries for recursor0 and recursor1, were set incorrectly to non-existent hostnames "resolver0" and "recursor1"
  • 03:36 Tim: renamed db6.mgmt to locke.mgmt
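
The 11:41 entry describes a check that reports every package not in state 'rc' or 'ii'. The log does not show how the Nagios check was written; the following is a minimal sketch of the same filtering idea applied to the text output of `dpkg -l`, with a hypothetical function name and made-up package names in the usage example.

```python
def unexpected_package_states(dpkg_list_output, ok_states=("ii", "rc")):
    """Return (package, state) pairs whose dpkg state is not in ok_states.

    Expects the plain-text output of `dpkg -l`. Header and separator lines
    are skipped because their first field is not a short alphabetic code.
    """
    problems = []
    for line in dpkg_list_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        state, name = fields[0], fields[1]
        # Package rows start with a 2-3 letter state code such as "ii" or "iU".
        if not (state.isalpha() and 2 <= len(state) <= 3):
            continue
        if state not in ok_states:
            problems.append((name, state))
    return problems


sample = """\
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name            Version      Description
+++-===============-============-================================
ii  example-good    1.0          installed and configured
rc  example-removed 0.9          removed, config files remain
iU  example-stuck   2.0          unpacked but not configured
iF  example-broken  3.1          configuration failed
"""
print(unexpected_package_states(sample))
```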

July 10

  • 14:14 rainman-sr: search7 disk was full, deleting some old unnecessary indexes
  • 12:50 Fred: applied security updates on all machines running Karmic or Lucid (per USN-959-1)

July 9

  • 18:07 domas: forgot to log, rebooted locke, put startup stuff to rc.local, maybe Tim changed it afterwards, hehe. beer is good too.
  • 15:31 Rob: wikimania2011wiki is now using vector
  • 15:31 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php
  • 12:48 logmsgbot: robh ran sync-common-all
  • 01:06 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/RepoGroup.php
  • 01:04 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/RepoGroup.php
  • 01:04 logmsgbot: root synchronized php-1.5/includes/filerepo/RepoGroup.php
  • 01:03 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/RepoGroup.php
  • 00:59 logmsgbot: tstarling synchronized php-1.5/includes/filerepo/RepoGroup.php

July 8

  • 22:27 apergos: powercycled db9 from drac after shutdown failed
  • 22:20 Fred: re-imaging srv225 back to normal until wikimedia-task* can be ported to lucid.
  • 22:15 apergos: rebooting db9, mysqld was defunct but the port was in use so couldn't restart it the nice way
  • 17:06 mark: Set temporary 91.198.174.0/24 null0 route on br1-knams, to investigate prefix announcement problems
  • 16:10 Rob: updated puppet to add zak to the mortals admin group and allowed access to shell on fenari as non-root
  • 04:10 Tim: starting upload of BnF images, using importImages.php in screen on fenari

July 7

  • 21:38 Fred: RIP sfoservices. (box not booting at all anymore)
  • 17:10 Fred: re-imaging srv225 to the apache cluster.
  • 16:26 mark: Fixed puppet on srv193
  • 15:56 mark: Fixed horrible gmond mess on searchidx1
  • 15:42 mark: Fixed puppet on srv255
  • 15:30 mark: Mounted /mnt/upload6 on srv255
  • 14:23 mark: Fixed /home backup on nfs1/nfs2 to tridge
  • 07:57 Rob: srv193 is refusing to take my updates, removed it from pybal so it doesn't serve out-of-date information
  • 07:53 logmsgbot: robh ran sync-common-all
  • 07:50 Rob: updated dns for wikimania wiki
  • 07:38 Rob: adding wikimaniawiki apache support, syncing lots of apaches and docroots.
  • 01:49 Tim: downloading the DjVu files via rsync/ssh for http://www.wikimedia.fr/wikim%C3%A9dia-france-signe-un-partenariat-avec-la-bnf

July 6

  • 13:43 mark: Fixed puppet on nfs1 and nfs2
  • 11:11 mark: Removed config cache on srv110
  • 10:32 mark: Fixed puppet on srv110
  • 10:26 mark: Stopped apache on srv110
  • 00:28 Tim: restarted mailman on lily
  • 00:23 Tim: killed all mailman processes on lily in an attempt to save it from swap death (swapping severely since 00:07)
  • 00:12 Tim: fixed stale /home on searchidx1 and restarted indexer
  • 00:02 Tim: codereview-proxy is up now. Pinging CR update API for all recent revisions

July 5

  • 23:55 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'new URL for codereview proxy'
  • 23:33 Tim: changed CNAME for codereview-proxy to kaulen
  • 23:20 Tim: moving codereview-proxy to kaulen to replace isidore (which is down)
  • 22:53 Tim: on srv124: remounted /home to fix test.wikipedia.org
  • 18:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '24010 - id.wikipedia requesting FlaggedRevs'
  • 17:01 mark: Did BGP soft clear outbound on all AMS-IX sessions; no prefixes were being announced as of two weeks ago
  • 16:01 mark: Made puppet ensure apache is running on the app servers; running "sync-common" upon start
  • 15:38 mark: Fixed puppet on srv145
  • 11:39 mark: Remounted /home on hume
  • 11:30 logmsgbot: root synchronizing Wikimedia installation... Revision: 68850
  • 11:29 logmsgbot: mark synchronized php-1.5/wmf-config/CommonSettings.php 'CommonSettings.php out of sync on a few apaches'
  • 09:17 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24264 - Create a namespace aliases on zhwiki'
  • 06:26 ronabop: Kudos to the team who rebuilt a multi-hundred node system under extreme pressure.
Was there extreme pressure? :-) --domas
  • 06:23 Tim: fixed broken ircd auth configuration, irc.wikimedia.org now working again
  • 05:13 Tim: on browne: reinstalled udprec to fix IRC server
  • 04:16 Tim: switched nagios monitoring for search to less flappy TCP connection check instead of HTTP
  • 03:58 domas: s3 pos: db17-bin.321:0
  • 03:58 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 's3 rw'
  • 03:54 Tim: fixed search monitoring in nagios
  • 03:47 Tim: started lsearchd on search1-20
  • 03:44 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'rw s4 s4'
  • 03:42 Tim: fixed search1: just needed /home remounted
  • 03:38 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 03:32 domas: new repl positions, s2: db30-bin.000015:1227, s4: db16-bin.019:0
  • 03:17 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 03:13 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 03:04 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 's3 fake master r/o'
  • 02:54 Tim: mysql status: s2 and s4 have replication broken with "Client requested master to start replication from impossible position". s3: still waiting for innodb recovery on master. Other clusters good.
  • 02:49 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php
  • 02:48 Tim: on db8: read_only=1 again and setting wiki to r/o
  • 02:46 Tim: on db8: read_only=0, started up r/o (s4)
  • 02:42 Tim: putting s2 into read-only mode due to replication issues
  • 02:35 RobH_: search server defaults to sitting on grub screen for search13-search20, will fix later, for now they are booting back up.
  • 02:30 Tim: fixed m.wikipedia.org on lvs4
  • 02:26 RobH_: search13 back up, working on the others
  • 02:24 mark: Moved bits.pmtpa to point to Text squids in DNS
  • 02:08 Tim: starting mysqld on a lot of DB servers
  • 02:08 RobH_: seems like a power outage, not an AC issue.
  • 02:07 RobH_: email back online
  • 01:52 Tim: on nfs1: river fixed the filesystem with fsck
  • 01:33 Tim: (about 5 minutes ago) started mysqld on db17
  • 01:26 Tim: on lvs4: removed dead squids from text list
  • 01:16 Tim: started mysql on db8
  • 01:15 Tim: started mysqld on db5
  • 01:11 Tim: power went off briefly again, lvs4 came back up properly this time, starting pybal on it again
  • 00:59 Tim: got LVS set up and working on lvs4
  • 00:56 Tim: s/nfs4/lvs4
  • 00:55 Tim: got nfs4 back online

July 3

  • 01:25 logmsgbot: midom synchronized php-1.5/wmf-config/db.php
  • 00:40 logmsgbot: midom synchronized php-1.5/wmf-config/db.php

July 2

  • 22:17 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'style version'
  • 22:14 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 68850
  • 21:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24119 - Change the logo image in Sinhala wiktionary'
  • 21:46 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24119 - Change the logo image in Sinhala wiktionary'
  • 21:42 logmsgbot: andrew synchronized php-1.5/skins/common/shared.css
  • 21:19 logmsgbot: andrew synchronized php-1.5/wmf-config/ExtensionMessages.php
  • 21:17 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 68850
  • 20:27 Fred: added replag nagios check for all slave DBs.
  • 19:56 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24025 - Enable NewUserMessage extension on ko.wikipedia'
  • 19:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24067 - New namespace for gl.wiktionary'
  • 19:40 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24048 - Change sitename for Osetian Wikipedia.'
  • 19:28 mark: Slave SQL thread stopped on db40 due to lock wait timeout (?), restarted
  • 17:58 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ParserFunctions.php
  • 17:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24157 - Enable RevisionMove on testwiki'
  • 11:30 JeLuF: mark: !log Upgraded asw-b3-sdtpa, asw-b4-sdtpa and asw-b5-sdtpa to newer JunOS
  • 00:54 Fred: starting MTA on list server again...
  • 00:51 Fred: Cause of spam: spammers using wiki@wikimedia.org as originating address. result: we are getting hit by the responses.
  • 00:46 Fred: purging mail queue for spam 'replies' on list server.
  • 00:21 Fred: mailing list server down while fixing.

July 1

  • 22:58 tomaszf: setting watchdog timer on db9 to 60sec for process kill
  • 21:08 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '17531 - Add subpage feature to the article namespace on nowikimedia'
  • 21:05 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '17510 - Give patroller-group access to suppressredirect on nowiki'
  • 20:57 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24065 - Add some interwiki links in Special'
  • 20:55 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24018 - Change the logo image in Sinhala wikibooks'
  • 20:53 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24138 - Logo of the fr.wikisource'
  • 20:49 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24205 - New namespace for de.arbcom'
  • 20:28 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/backend/CodeRevision.php 'r68850'
  • 20:27 logmsgbot: catrope synchronized php-1.5/extensions/CodeReview/CodeReview.php 'r68850'
  • 18:11 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ParserFunctions.php
  • 18:09 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
  • 18:09 Rob: every single access switch is now accessible via serial mgmt
  • 18:08 Rob: only switch in pmtpa-row a is asw-a2-pmtpa, which was not responsive to serial, fixed scs settings and properly labeled the port
  • 18:07 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'r68842'
  • 18:06 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/Vector/Vector.combined.min.js 'r68842'
  • 17:35 Rob: pmtpa-rowc switches connected to scs-c1-pmtpa
  • 17:03 Rob: asw-b1,asw-b2,asw-b3,asw-b4,asw-b5-pmtpa connected to scs-c1-pmtpa
  • 16:30 mark: Rob moved management fiber on pmtpa end from csw5-pmtpa:8/24 to msw1-pmtpa:0/1/1
  • 16:05 Rob: moved the primary serial mgmt interface of csw5-pmtpa from scs-ext to scs-c1-pmtpa.mgmt.pmtpa.wmnet, Also ran the permanent connections to the same serial console for msw1-pmtpa and mr1
  • 14:04 mark: Configured Exim to bypass spamd for wiki@wikimedia.org recipient
  • 13:51 mark: Restarted exim4 and spamassassin on mchenry
  • 07:51 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Second attempt at restoring thumb size on svwiki'
  • 07:48 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Change thumb size on svwiki back to 250px'
  • 04:49 Tim: on singer: configured "php_admin_flag engine off" in all planet vhosts
  • 03:24 Tim: added a user account for myself on filesrv1, in tech group, I figure I've been here long enough to deserve one

Archives