19:47 hashar: manually fixed some permissions that prevented automatic deployment of mediawiki-config. It has been broken since roughly Nov 21st at 7pm UTC.
November 19
16:17 hashar: applying role::ci::slave::labs::common class on deployment-parsoid2
16:07 manybubbles: rebuilding elasticsearch indexes to suck up configuration changes
16:03 manybubbles: running puppet on elasticsearch machines and restarting elasticsearch to suck up new configuration
November 18
11:20 hashar: Cleaned out Parsoid submodule: sudo su - mwdeploy then cd /home/wikipedia/common/php-master/extensions/Parsoid && git reset --hard origin/master && cd .. && git submodule update --init Parsoid
November 15
18:41 manybubbles: rebuilding Cirrus search indexes to have the 2 replicas like production
14:51 manybubbles: rebuilding search indexes using jobs for testing
14:09 hashar: rebooting both apaches
14:08 hashar: rebooting sql and sql02
14:05 hashar: upgrading mysql on -sql
November 14
22:54 hashar: upgrading packages on -jobrunner08
20:38 manybubbles: updating search indexes in labs
00:44 MaxSem: Rebooting deployment-solr, jetty (or java?) is FUBAR
14:14 hashar: deleting and reinstalling Parsoid node modules dependencies
14:13 hashar: changing Parsoid from 4 months old cdbfdbb to 986c1e7
13:47 hashar: upgrading varnish on all caches.
November 7
09:39 hashar: rebooting apache33 for kernel upgrade
09:38 hashar: rebooting apache32 for kernel upgrade
09:19 hashar: reenabling puppet on deployment-apache33
09:15 hashar: deleted sudo policy 'webadmins'; it only had petrb in it, with no specific access.
09:14 hashar: removed sudo group 'admin', removing root access from any volunteers
09:08 hashar: Restarted bits varnish to clear out the cache.
November 6
12:09 hashar: apt-get dist-upgrade on deployment-eventlogging
11:38 hashar: upgrading packages on deployment-parsoid2
November 5
21:39 hashar: applying role::logging::mediawiki::errors on deployment-fluoride. Should get a listener of some sort on port 8423 to receive fatal/exceptions
16:16 hashar: fixed up mediawiki/extensions.git which still had the deleted extension WikibaseDatabase . That has been blocking code updates since Oct 30th.
October 28
13:16 manybubbles: restarted elasticsearch nodes to pick up new config
09:47 hashar: mobile varnish frontend cache is not starting anymore : /usr/lib/x86_64-linux-gnu/varnish/vmods/libvmod_netmapper.so: cannot open shared object file: No such file or directory bug 55662
10:16 hashar: git directory of mediawiki/extensions was borked following the NFS migration. Fixing it up manually
10:05 hashar: stopped udp2log, started udp2log-mw
10:04 hashar: rebooting deployment-bastion
10:04 hashar: Jenkins jobs failing, jenkins-deploy user apparently can't write to its home dir /home/jenkins-deploy/workspace
October 7
10:48 hashar: applied iptables rules for bug 45868 on deployment-apache{32,33} and jobrunner08
10:05 hashar: applied iptables NAT rules on deployment-bastion bug 45868
October 4
19:32 MaxSem: Created table bug_54847_password_resets on all wikis
October 3
13:22 manybubbles: finished rebuilding search indexes after cirrussearch update
00:38 manybubbles: rebuilding search indices after cirrussearch update
September 30
08:16 hashar: upgrading and restarting memcached on memc0 and memc1 to let them limit their memory at 15GB instead of 89G bug 52378
September 24
13:38 manybubbles: indices finished rebuilding some time last night.
September 23
16:26 manybubbles: rebuilding search indices after new index config deployment
September 20
13:39 manybubbles: rebuilt most search indices in beta, but commonswiki crashed late last night so it is half rebuilt. Filing bug.
September 19
19:26 manybubbles: elasticsearch filled up the system disk on its hosts so I moved its data to /mnt with a symlink.
18:37 manybubbles: rebuilding search indices after a few merges in cirrussearch
September 17
20:05 hashar: upgrading PHP on bastion, jobrunner and apaches from 5.3.10-1ubuntu3.7+wmf1 to 5.3.10-1ubuntu3.8+wmf1
19:00 manybubbles: upgraded elasticsearch in beta to 0.90.4
18:08 manybubbles: upgrading elasticsearch in beta to 0.90.4 so we can make sure it works so we can use some new features in it
September 10
16:29 hashar: rebooted bastion after some nfs outage. Stopped udp2log, started udp2log-mw
September 7
01:07 manybubbles: rebuilding search indices on beta after lots of updates
September 3
14:59 hashar: upgrading PHP5 ( 5.3.10-1ubuntu3.7+wmf1 ) on deployment-apache32, deployment-apache33 and deployment-jobrunner08
14:55 hashar: upgrading PHP5 package on deployment-bastion
August 26
17:47 manybubbles: rebuilding search indices to unbreak CirrusSearch....
August 20
18:47 manybubbles: rebuild search indices after some changes to indexing code.
August 19
19:05 manybubbles: rebuilding the search indices to pick up some recent changes
August 12
19:10 manybubbles: rebuild search indices for updates
18:45 manybubbles: rebuilding all search indices using updates
18:45 manybubbles: unstuck CirrusSearch so it'd update.
August 8
17:41 manybubbles: simplewiki's search index has completed building. All search indices should now be up to date.
15:53 manybubbles: reindexed all wikis to add accent squashing. simplewiki is still rebuilding but I reindexed what was complete and started the rebuild again so it'd pick up accent squashing.
11:55 manybubbles: all search indices have finished building except simplewiki
August 7
20:31 manybubbles: rebuild all the small search indices. waiting on enwiki, enwikivoyage, simplewiki, and commonswiki.
20:08 manybubbles: rebuild search indices after large-ish code change to CirrusSearch
07:27 andrewbogott: rebooted deployment-memc1 and deployment-memc0 (not at the same time) while freeing up space on virt servers.
12:46 manybubbles|away: enwikivoyage's search index finished building overnight. dewikivoyage seems to have stalled out. I'm going to profile it. simplewiki is still running and will need some love to finish more quickly.
09:02 hashar: rebooted both memcached instances to be able to log in to them. Apt upgrading both of them
08:57 hashar: Deleting deployment-cache-upload03 , replaced by the fully puppetized instance deployment-cache-upload04
08:57 hashar: Deleting the old squid instance since we run varnish cache for text nowadays
July 31
21:37 ^d: Fixing permissions on /mnt/upload7/wikivoyage to be like the other domains
22:05 ^d: Memcached moved off of the apache instances to their own dedicated hosts (-memc0 and -memc1). Should have a lot more memc storage now.
July 29
23:44 hashar: fixed up timeline on beta, it never worked there. Thanks ^demon !
13:29 hashar: rebuilding l10n cache, has been broken for a while
July 26
21:21 hashar: applying misc::syslog-server on deployment-bastion to make it a syslog server bug 36748
July 24
20:41 hashar: restarted memcached on both apache boxes. Might clear their caches.
20:40 hashar: apt-get upgrading apache32 and apache33. Running puppet on them
11:30 hashar: manually running sync-site-resources : su - apache -s /bin/bash then /usr/local/bin/sync-site-resources
July 23
09:03 hashar: restarted varnish text cache
July 22
07:57 hashar: deleting deployment-varnish-t3 , used as a mobile cache, now replaced by deployment-cache-mobile01
07:56 hashar: deleting deployment-puptest , unused, no class applied
July 19
19:55 hashar: rebooting deployment-cache-text01.pmtpa.wmflabs , can't access it
July 18
12:41 hashar: Text cache was not in wgSquidNoPurge, which caused all requests to be interpreted as coming from the text cache, causing miscellaneous issues (such as throttling account creation for everyone).
14:57 hashar: restored /data/project/apache/common-local/php-master/extensions/Diff/Diff.php , which got deleted somehow by git
July 10
08:57 hashar: rebooting -sql instance to make it use NFS as /home
08:06 hashar: shutting down deployment-cache-upload03
08:04 hashar: migrating upload.beta.wmflabs.org from cache-upload03 (lucid/squid) to cache-upload04 (precise/varnish)
July 9
19:34 hashar: Attempting to reboot a bunch of instances that prevent ssh access because /home is borked: uploadtest08 uploadtest07 -cache-upload04 -cache-text01 parsoid2 cache-mobile01 deployment-sql02 cache-upload03
17:42 aude: added Yuvipanda to the project
July 8
08:27 hashar: rebooting deployment-cache-text1 , maybe I can get ssh access this way
08:26 hashar: Set $wgLoadScript to point to bits instead of the wiki local docroot. 70322
15:18 hashar: deleting deployment-searchidx02 , not being used
13:38 hashar: restarted mw-cgroup upstart service on apaches box. That recreated the wgCgroup directory /sys/fs/cgroup/memory/mediawiki
13:10 hashar: removed iptables 'nat' rule from deployment-upload
13:10 hashar: pointed deployment-upload thumb handler to the varnish cache text instead of squid. Done by editing /data/project/upload7/scripts/thumb-handler.php
12:58 hashar: installing iptables on deployment-upload
12:57 hashar: Updating the iptables rule that works around the NAT issue in beta. Applied on deployment-searchidx01 and deployment-upload : iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.1.133 See also bug 45868
July 1
22:05 hashar: upgraded packages on deployment-eventlogging
18:57 hashar: deleted deployment-nginx-test , not needed anymore, nginx proxies for mobile are working
14:42 hashar: Migration to the new mobile instance was tracked by bug 49469
14:40 hashar: shutting down deployment-varnish-t3 (replaced by deployment-cache-mobile01)
14:40 hashar: binding mobile IP address 208.80.153.143 to deployment-cache-mobile01
14:38 hashar: rebooting deployment-varnish-t3
14:35 hashar: updated puppet repository on deployment-varnish-t3 and running puppet there
14:34 hashar: applying role::protoproxy::ssl::beta on deployment-cache-mobile01 (intended to replace varnish-t3 for mobile caching)
14:14 hashar: rebooting deployment-cache-mobile01
13:54 hashar: attempting to enable HTTPS on the varnish text cache by applying role::protoproxy::ssl::beta
12:28 hashar: restarted both apaches. Beta has been down for a couple hours due to an NFS issue on labstore3.
08:43 hashar: Shutting down deployment-squid , service migrated to deployment-cache-text01 (varnish).
08:36 hashar: Switching the text cache traffic from deployment-squid to deployment-cache-text1 by reassociating the public IP 208.80.153.219
June 26
09:22 hashar: Squid restarted properly, that fixed some stalled resource loader entries that were causing some outdated Javascript modules to be served. Fixed at least an inconsistency such as bug 49911
22:01 hashar: clearing memcached , that might cleanup some resource loader cache causing bug 49911 "nab collapse missing in beta"
15:42 hashar: restarted lucene-search-2 on searchidx01
15:37 hashar: upgrading -searchidx01 and refreshing puppet manifests
June 20
20:45 hashar: Jasper Deng joined in AbuseFilter manager group :)
20:23 hashar: VisualEditor self updated on beta, it was stuck due to a misconfiguration in gerrit bug 49846
June 19
08:36 hashar: Fixing up the abuse filter central DB to point to 'labswiki' instead of the non existent 'metawiki' 69461. Suggested by Steinsplitter :)
22:13 hashar: Applying MaxSem 'misc::beta::sync-site-resources' to deployment-bastion. That syncs .css articles from production to beta!
June 17
16:25 hashar: Apache was down on apache32. Restarted it as well as on apache33. Solved bug 49700
16:22 hashar: varnish-t3 (mobile cache): cleaned up operations/puppet local repo and re ran puppet. Still blocked :/ bug 49700
11:16 hashar: created /data/project/apache/uncommon/master , owned by mwdeploy:mwdeploy and mode 0755.
June 12
08:04 hashar: Creating deployment-cache-upload04 using a Precise image. The aim is to replace deployment-cache-upload03 which runs Lucid (see also bug 49470)
May 30
08:27 hashar: Added Nikerabbit to the project. Will setup solr for translate
12:24 hashar: creating a dumb proxy blocker touch /data/project/apache/common-local/php-master/../wmf-config/mwblocker.log
12:20 hashar: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en special wikidatawiki wikidata.beta.wmflabs.org
12:20 hashar: attempting to install wikidata
May 20
18:30 hashar: Added Krenair to the project
08:21 hashar: removing thumbnails from the Gluster shared directory: cd /data/project/upload7 && find -maxdepth 3 -wholename '*/thumb'|xargs -n1 -P4 rm -v -fR
May 16
14:53 hashar: restarted job service on jobrunner08 : /etc/init.d/mw-job-runner restart . It was missing /usr/local/apache/common . 64057 and 64065 fix it by using a symlink to /data/project/apache just like on apache webservers.
May 14
12:53 hashar: deleting deployment-lucene, we are using search01 and searchidx01
12:49 hashar: rebooting -cache-upload03 for kernel upgrade
12:49 hashar: rebooting -sql for kernel / mysql upgrade
20:01 hashar: applying role::labsnfs::client on -bastion
19:45 hashar: applying the very recent `role::labsnfs::client` class on deployment-integration
19:43 hashar: Upgraded puppet manifests on deployment-integration and running puppet.
19:21 hashar: Migrating homes to the new NFS server
18:27 hashar: rsyncs to the NFS server are completed. There are most probably still some tiny files that need to be copied though
16:46 hashar: Mounted new NFS server on /srv/project on instances: apache32, apache33, video05 and jobrunner08
16:01 hashar: Clearing out years old backup from /data/project such as copy of extensions, databases dumps and some old instances backups.
15:28 hashar: Copying l10n cache to the new NFS server: rsync -av /home/wikipedia/common/php-master/cache /srv/project/apache/common/php-master
15:11 hashar: syncing upload data from the Gluster share to labnfs server: rsync -avv /data/project/upload7 /srv/project
13:59 hashar: bastion: created NFS mount point thanks to Coren. echo 1 >/sys/module/nfs/parameters/nfs4_disable_idmapping ; mount -t nfs -o nfsvers=4,port=0,hard,rsize=65535,wsize=65536 labnfs.pmtpa.wmnet:/deployment-prep/project /srv/project
12:41 hashar: Refreshed most extensions and running mw-update-l10n
April 29
21:00 hashar: updated MobileFrontend manually to 9356d00ac5
April 26
15:56 MaxSem: Enabled GeoData cronjobs
April 24
12:41 hashar: on searchidx01 iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.0.17 (see bug 45868 )
April 23
08:32 MaxSem: Deployed GeoData
April 22
20:49 hashar: Manually updating all mw extensions to make sure everything works fine.
April 21
21:17 hashar: beta is up again. Apache2 could not start because the error log file was not accessible ( bug 47479 )
20:33 hashar: Apache down on both apaches instances
April 19
19:50 hashar: The l10n cache was stalled since Mar 22 13:08 at least. The files were owned by `mwdeploy`; it seems something changed and they are now owned by `l10nupdate`. So I ran: chown l10nupdate -R /home/wikipedia/common/php-master/cache/l10n/
19:46 hashar: Attempted to update the l10n cache (sudo -u mwdeploy mw-update-l10n ), got a permission denied on /home/wikipedia/common/php-master/cache/l10n
19:43 hashar: Gluster is broken on beta. Extensions are no longer updating, nor can the l10n update run. bug 47425
13:32 hashar: switching udp2log on bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start (see bug 38995 )
13:31 hashar: rebooting deployment-bastion too : gluster issue
13:26 hashar: Cluster is back up :-]
13:25 hashar: rebooting both apaches.
13:24 hashar: Gluster failure again /data/project/apache/conf/ has some files missing: www.wikipedia.conf en2.conf wikimedia.conf
13:23 hashar: apache2: Syntax error on line 324 of /etc/apache2/apache2.conf: Syntax error on line 9 of /etc/apache2/wmf/all.conf: Could not open configuration file /etc/apache2/wmf/www.wikipedia.conf: No such file or directory
13:20 hashar: apt-get upgraded apache32 and apache33 . Note that apache is down on them.
13:19 hashar: no pages being served. Most probably a PHP Fatal error
11:40 hashar: Resetting the extensions checkout. Been broken for a few days because of extension renaming.
March 22
21:11 hashar: Search is back! Turns out that lucene-search2 service was not running on deployment-search01 despite puppet ensure => running on the service :( See also bug 46459
21:03 hashar: Starting lucene-search-2 on deployment-search
15:00 hashar: manually update puppet sources on -search01 and -searchidx01
14:40 hashar: manually refreshing extensions on -bastion
14:39 hashar: Updated mediawiki/extensions.git which was lacking the Thanks extension 55263
13:34 hashar: -squid shows a ton of stalled `redirector` processes. Killed them all.
March 14
22:20 hashar: deployment-bastion is now a jenkins slave of the production Jenkins machine
22:13 hashar: manually installing openjdk-7-jre on -bastion
22:02 hashar: Successfully added jenkins-deploy to deployment-prep.
21:59 hashar: adding jenkins-deploy to the project
21:37 hashar: removing restrictions from deployment-bastion . authorized_keys is not read when in labs :] (thx Ryan)
21:16 hashar: -bastion changed restricted_to to (project-deployment-prep) (jenkins)
21:15 hashar: on -bastion: Added group restrictions and set variable restricted_to = (project-jenkins) (jenkins) thanks ryan
21:02 hashar: creating jenkins homedir manually on -bastion
20:38 hashar: applying jenkins::user to deployment-bastion
19:03 hashar: rebooting -bastion to find out whether the security rule is applied
19:01 hashar: updated security rule to allow TCP port 22 connection from gallium.wikimedia.org [208.80.154.135/32]
04:22 hashar: killed job runners on jobrunner08 and restarted service
04:22 hashar: Restarted apache on apache32,33
04:20 hashar: Upgrading apache32, apache33, video05 and jobrunner08
02:13 hashar: Trying out geoip module from 53714 on deployment-integration
March 13
04:49 hashar: rebooting deployment-integration
04:47 hashar: rebooting deployment-lucene
March 12
20:39 Chad: added port 1099 to search engine security group to allow RMI messaging to go through
March 11
05:44 hashar: Running MediaWiki update.php on all databases
March 8
20:15 hashar: The search backend is apparently working now !!! bug 34250
00:46 hashar: upgrading all instances
March 7
23:43 hashar: OAI repository set up on beta !!! bug 45814
23:42 hashar: for squid login=PASSTHRU replaced by login=PASS. Reloaded squid.
23:41 hashar: reloading squid
23:41 hashar: setup squid to pass the WWW-Authorization headers to the Apache. Done by configuring login=PASSTHRU for each cache_peer (*crosses fingers*)
22:15 hashar: Set up an OAI repository user for lucene search. Password in puppet.
22:04 hashar: Restored mysql admin password on deployment-sql
21:57 hashar: stopping mysql server on -sql
21:31 hashar: Creating OAI repositories on sql and sql02 master databases
09:18 hashar: Beta is broken in some random and creative ways AGAIN. /home on bastion is corrupted, some instances do not let us connect anymore, apache docroot disappeared.
February 1
10:21 hashar: nslcd probably points to a wrong LDAP or has a faulty DNS configuration. Can't login on it anymore :/
10:12 hashar: rebooting the varnish-t3 instance, nslcd can't resolve somepath
January 31
15:49 hashar: Deleting out /data/project/squid1 which has been migrated to /mnt/squid_cache. The gluster volume for data-project is corrupted on beta so we don't want to use it anymore.
15:46 hashar: stopping squid, migrating ufs cache from /data/project/squid1 (gluster) to /mnt/squid_cache
15:42 hashar: cleaned out deployment-squid:/mnt/ (had an old enwiki dump and some squid files)
15:19 hashar: restarting squid process on deployment-squid
15:18 hashar: starting apache2 on -apache32
14:53 petan: restarted squid and rebooted apache32
January 30
15:17 hashar: removing -cache-bits-02 (been replaced a long time ago by -cache-bits-03)
January 21
12:42 hashar: -varnish-t3 : removing /dev/sda* entries from /etc/fstab , applying 44709 ps 6 and rerunning puppet
12:28 hashar: applying role::cache::mobile on deployment-varnish-t3
11:27 hashar: created deployment-varnish-t3 , deleted deployment-varnish-t2
10:48 hashar: moved 208.80.153.143 from deployment-varnish-t to deployment-varnish-t2 (IP is in DNS as *.m.beta.wmflabs.org )
10:38 hashar: creating deployment-varnish-t2 to replace broken deployment-varnish-t
10:25 hashar: re-rebooting deployment-varnish-t
09:58 hashar: Rebooting deployment-varnish-t from labsconsole. I guess there is a mount for /dev/sda* :(
09:51 hashar: rebooting deployment-varnish-t to find out how well it goes on restart :-]
21:49 hashar: renamed /data/project/apache/common-local to common-local.pre-git-deploy
January 14
22:58 hashar: renamed php-1.21wmf{6,7} with a -back prefix. Created symbolic links to the git-deploy slots: ln -s /srv/deployment/mediawiki/slot1 php-1.21wmf6 and /srv/deployment/mediawiki/slot0 php-1.21wmf7
17:08 wm-bot: this is a creepy log with | and such shitty chars $@#% 6346 w@#%^@# 6bla
January 10
08:29 Ryan_Lane: deployed all repos to destination hosts
08:29 Ryan_Lane: made deployment-bastion a git-deploy deployment host
08:18 hashar: removed misc::deployment::scripts from -bastion, already provided by misc::deployment::scap_scripts
08:09 hashar: put back role::beta::autoupdater on -bastion
January 9
21:47 hashar: running puppet on apache boxes to get the new role::applicationserver::appserver::beta class
21:40 hashar: migrating apaches box to the new role::applicationserver::appserver::beta (replaces both appserver and imagescaler)
20:40 hashar: removing the phased out imagescaler::labs from apaches in favor of role::applicationserver::imagescaler
20:20 hashar: Migrated Apache box to use role::applicationserver::appserver instead of the old (and no more existent) role::applicationserver
16:04 jeremyb: the recent (today at least, but probably most of the earlier ones too) !logs from wm-bot are really from hashar. in case you were looking for the source.
10:36 wm-bot: updating mediawiki config to latest master
December 21
20:18 wm-bot: rebooting some instances so they get the new /home
December 20
08:58 wm-bot: manually updated git puppet repo on deployment-video05
December 19
20:52 hashar: Granted MaxSem and Mgrover sysadmin rights. They are WMF contractors going to work on setting up MobileFrontend on beta.
11:37 wm-bot: finally had GettingStarted extension installed.
10:37 hashar: /home/wikipedia/common/php-master/extensions/.git/FETCH_HEAD gave I/O error. I have deleted it and reran git pull + git submodule update --init aka : UPDATED ALL EXTENSIONS TO THEIR LATEST master VERSION.
10:32 wm-bot: removing live hack on UserMerge extension (attempted to grant some user right to bureaucrat, that should be done in CommonSettings.php )
10:31 wm-bot: manually running 'git submodule update --init' under extensions directory to find out what is going on there
10:10 wm-bot: rebooting apache32 and apache33 to get new /home
09:50 wm-bot: updating mediawiki-config
09:46 hashar: rebooting -bastion to get the new /home
December 4
12:27 hashar: Apache boxes seem to be running again. Had to manually restart apache on apache33.
08:52 hashar: Apache32 is somehow up
08:42 hashar: on apache33 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache33
08:32 hashar: rebooting apache32 so all its services know about /var/log :-]
08:30 hashar: on apache32 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache32
November 6
15:32 hashar: fixed up beta by re-pulling the EventLogging extension, which was in a weiiiiird state
15:12 hashar: Reset all extensions to their latest master version...
14:51 hashar: blank pages on beta are caused by the EventLogging extension being required although it is not pulled.
14:11 hashar_: configured apaches to send their error logs to /home/wikipedia/logs (conf file is /data/project/apache/conf/wmflabs-logging.conf )
10:45 wm-bot: stashing change in /h/w/c
10:14 hashar: made mwdeploy gitconfig file to support color + added the 'git lg' and 'git lg2' aliases which gives a nice + concise log
08:45 hashar: manually running the beta auto updater from -bastion instance
November 5
23:11 hashar: applied the new role::beta::autoupdater class to -bastion.
21:00 hashar: changing ownership of all files in /home/wikipedia/common to mwdeploy:mwdeploy as per deployment-bastion GID/UID. Running as root in screen 27986.pts-1.i-00000390 .
20:56 hashar: rebooting jobrunner06 to ensure that wmf-beta-autoupdater is gone
20:37 hashar: uninstalling beta updater from jobrunner06 , will be deployed on -bastion
20:27 Damianz: csteipp: ran git pull of master, kept local dblist conflicts
October 17
20:43 hashar: applying nfs::upload::labs to apache32 and 33. It is no longer applied by the role::applicationserver class (prod applies nfs::upload directly on nodes)
20:30 hashar: moving /data/project/upload6 to /data/project/upload7 to match production. See bug 41121
13:44 hashar: -sql02 removed ganglia from host and reran puppet.
13:41 hashar: Added CSteipp and Reedy as sudoers
12:03 hashar: Fixed assets on bits. The static-master symbolic links got removed at some point. See 28337
09:13 labs-logs-bottie: petrb: OSB doesn't seem to be installed properly, investigating
09:08 labs-logs-bottie: petrb: deployed OSB to enwiki
09:07 labs-logs-bottie: petrb: inserted OSB to update-extensions.sh and extensions
August 30
21:22 hashar: Deployed the automatic code updater on beta. It is running on deployment-integration, service is wmf-beta-autoupdate managed by puppet to always run.
20:55 wm-bot: applying beta::scripts to deployment-integration
20:38 hashar: trying out 22116 on deployment-integration (that is the beta auto updater)
16:39 wm-bot: updating all extensions and core to their latest master version
15:00 hashar: deployment-bastion does not let us log in despite being a fresh instance. Logged as bug 38846
14:47 hashar: rebooting -bastion
12:48 hashar: Shutting down deployment-nfs-memc for a while, will see if it is still needed around or if we can safely delete it (see bug 38084). All data should be on /data/project .
12:43 hashar: Recreating deployment-bastion using a Precise image and s1.small (1CPU, 1GB RAM, 80G storage)
12:40 hashar: Deleting -bastion , was corrupted.
July 28
03:38 labs-logs-bottie: j: only have one commons wgForeignFileRepos: wikimediacommons at commons.wikimedia.beta.wmflabs.org (/data/project/apache/common/wmf-config/filebackend-wmflabs.php)
01:09 Platonides: testwiki is now showing random captchas
01:09 Platonides: moved /mnt/upload6/captcha/random to /mnt/upload6/private/captcha/random
00:22 Platonides: generating random-challenge captchas at /mnt/upload6/captcha/random
22:50 Platonides: Filesystem corruption signs in deployment-bastion, most debconf backends /usr/share/perl5/Debconf/FrontEnd are zeroed files. This explains some of the earlier apt-get problems, and maybe also bug 38747 (aka. magic tail -f fix)
22:49 Damianz: Thumbnails are currently broken - thumbs folder seems empty, images are there as expected - beta quirk or need rebuilding?
22:22 Damianz: DocumentRoot got wiped, broken all the sites - fixed the broken symlink on bastion for /data/project/apache to /usr/local/apache. Also fixed common-local to common symlink.
21:56 Platonides: apt-get removed wikimedia-task-appserver in deployment-bastion :(
21:42 Platonides: removed generated captcha files
21:33 Platonides: installed in deployment-bastion the packages joe, python-imaging and wamerican
20:29 labs-logs-bottie: j: deployment-bastion: removing all deployment-nfs-memc entries from /etc/fstab
13:14 hashar: restarted job runner on jobrunner06
13:11 hashar: removing all deployment-nfs-memc entries from /etc/fstab
09:51 hashar: bug 38749 jobrunner06 : removed /usr/local/apache and /mnt/upload6 empty dir. Downgraded PHP manually. Rerunning puppet.
08:46 hashar: migrating jobrunner06 to use the /data/project for uploads
17:00 labs-logs-bottie: j: clear messages after updating localization(cache/l10n) to get new messages in TMH: php MWScript.php ../php/extensions/WikimediaMaintenance/clearMessageBlobs.php --wiki=aawiki
14:43 hashar: rebooting bits02
14:39 hashar: dist-upgrade on cache-bits02, will reboot after that (bits.beta.wmflabs.org will be disabled while it reboots)
14:24 hashar: fixed puppet on cache-bits02 : ln -s /var/lib/git/operations/puppet/modules /etc/puppet/modules . That was an empty directory, which prevented puppet from finding the modules and made it break when trying to install ntp::client
13:48 hashar: rebooting apache33 so it can fsck /dev/vdb
00:02 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=commonswiki --threads=2
July 24
21:58 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=aawiki --threads=2
18:25 hashar: Instances send their syslog again! To deployment-dbdump for now 14090
15:42 hashar: on deployment-integration, applied 15545 patchset 7 to test out the symlinks from /data/project/upload6 to /mnt/upload6 .
15:41 hashar: on deployment-integration, applied 15545 patchset 7 to te
15:32 hashar: rerunning rsync with --delete : root@deployment-nfs-memc:/mnt/export# rsync -a --progress --delete --inplace /mnt/export/upload6 /data/project
15:26 hashar: root@deployment-nfs-memc:/mnt/export# rsync -a --progress --inplace /mnt/export/upload6 /data/project
15:00 hashar: banned another /22 at squid level.
14:45 hashar: banned, at squid level, a crawler hosted on OVH. Just added the IP to squid.conf blacklist :)
13:42 hashar: Ran dist-upgrade on deployment-dbdump and rebooting. Will break udp2log loggers.
13:40 hashar: Rebooting all apaches
13:35 hashar: Running "apt-get dist-upgrade" on apache{32,33} to fix PHP5 using ubuntu packages instead of wmf packages. Upgrade kernel.
July 23
16:03 hashar: hopefully half fixed the udp2log on deployment-dbdump . Need several changes in the puppet files though cause the udp2log-mw init script seems to conflict with the udp2log one :/
13:41 hashar: rebooting -dbdump to make sure everything works fine :D
13:40 hashar: udp2log restored on beta!!! Still in /home/wikipedia/logs/ and logged by deployment-dbdump
13:11 hashar: applying role::logging::mediawiki to -dbdump (will bring log2udp)
19:33 hashar: Updated ArticleFeedback and ArticleFeedbackv5 to latest master. Dropped their tables and ran the updater ( mwscript update.php --quick --wiki=enwiki ). Solves bug 38422 - trash and redo ArticleFeedbackv5 on beta enwiki. See http://en.wikipedia.beta.wmflabs.org/wiki/Special:ArticleFeedbackv5
16:04 hashar: Created upload.beta.wmflabs.org to point to deployment-cache-upload01 ( 208.80.153.242 )
15:46 hashar: just logging the rsync command: root@deployment-nfs-memc:/data/project/upload6# rsync -a --progress --inplace /mnt/export/upload6 /data/project/upload6
15:41 hashar: started rsync of /mnt/export/upload6 to /data/project/upload6
14:55 hashar: deployment-cache-bits02 now serves bits.beta.wmflabs.org using pending gerrit changes 15445 and 13304 (using varnish configuration from production)
10:19 hashar: Easy fix for the leap second bug: /etc/init.d/ntp stop; date `date +"%m%d%H%M%C%y.%S"`; /etc/init.d/ntp start
10:06 Hydriz: ukwiki was giving errors regarding flaggedrevs's flaggedpages table not existing. Fixed it by running mwscript update.php ukwiki.
07:50 hashar: Gave access to the cluster to Hydriz
July 1
17:43 hashar: manually rebooted most servers
10:00 hashar: rebooted apache31 due to leap second bug. Stopped mysql on apache30 which was using 100%CPU
06:59 hashar: rebooting all boxes
June 29
15:07 hashar: Removing thumbnails that have not been accessed for the last 15 days : sudo find . -atime +15 -wholename '*/thumb/*' -exec rm {} \;
15:02 hashar: deleted .nfs** files in /mnt/export/upload6/
08:58 hashar: restarted jobrunner service (had a wrong path pointing to common-backup)
June 28
16:07 hashar: apache conf uses site.conf :-(((( need to puppetize that one day
15:47 hashar: updating mediawiki-config to latest master
14:10 hashar: running puppet on apache{30,31}. The /etc/sudoers conflict has been merged in :)
11:03 hashar: migrated all wikis from php-trunk to php-master by editing wikiversions.dat. Refreshed wikiversions.cdb and renamed ExtensionMessages-trunk.php to ExtensionMessages-master.php
10:59 hashar: set group write on deployment-nfs-memc:/mnt/export/apache/common-local; this will let us rewrite the wikiversions.cdb file
June 27
10:40 hashar: made deployment-mc use the 'memcached' puppet class. Now uses 2000MB apparently
10:25 hashar: removed memcached from deployment-nfs-memc , it is running on deployment-mc nowadays.
10:21 hashar: rebooting deployment-mc for kernel upgrade
10:07 hashar: updating packages on deployment-cache-bits
09:42 hashar: deleted deployment-thumbproxy instance. We are not going to replicate the production thumbnailing architecture
20:16 hashar: deleted deployment-syslog instance. It is of no use till we have a way to set up a syslog server on labs, see bug 36748 (syslog-ng conflict with rsyslog from base::??? puppet class)
20:13 hashar: Removed misc::mediawiki-logger from deployment-feed. Was replaced by some new udp2log system I can't understand. So for now, -feed is locally hacked and does not rely on puppet anymore.
19:50 hashar: deployment-feed removed wireshark then ran 'apt-get autoremove' , various X11 packages got removed. Now up to 262MB free.
19:46 hashar: deployment-feed removed some old kernels apt-get remove --purge linux-image-2.6.32-318-ec2 linux-image-2.6.32-342-ec2 linux-image-2.6.32-38-virtual linux-image-2.6.32-34-virtual
19:39 hashar: deployment-feed is now out of disk space :-(
19:17 hashar: Removed role::cache::bits from deployment-cache-bits. Only works in production.
11:06 hashar: Files migrated. A copy of the old common is in /usr/local/apache/common-back
10:12 hashar: migrating beta to use operations/mediawiki-config
08:29 hashar: restarted the job runner on jobrunner05 several times. It eventually started working again :-(
07:50 hashar: restarted udp2log
07:48 hashar: killing python demux on deployment-feed
June 25
15:14 hashar: Deleted InitialiseSettingsDeploy.php (no longer used). Replaced by InitialiseSettings-wmflabs.php
15:00 hashar: updating Ubuntu on deployment-transcoding
June 23
13:02 labs-logs-bottie: petrb: deploying 37852
June 22
13:18 labs-logs-bottie: petrb: updated /usr/local/apache/common/wmf-config/InitialiseSettingsDeploy.php to match the feed I just made :)
11:17 hashar: Created /etc/wikimedia-realm file containing 'labs' on -dbdump, -apache30, -apache31 and -jobrunner05. Related puppet change is https://gerrit.wikimedia.org/r/#/c/12377/
10:07 hashar: bug 37116 removed deployment-nfs-memc cronjob /var/fs which did some nasty recursive file changes. Has been disabled since May 25th anyway.
09:08 hashar: Deleting hostname mobile.beta.wmflabs.org and releasing 208.80.153.244
20:41 hashar: rebooting jobrunner05 following some package installs made earlier by puppet
20:39 hashar: manually running puppet on jobrunner-05
May 26
18:14 beta-logmsgbot: hashar: killed webtranscode job on commons
05:59 beta-logmsgbot: hashar: Edited squid.conf to limit memory to 1G and restarted squid
05:42 beta-logmsgbot: hashar: squid was killed by linux OOM!
May 25
15:18 beta-logmsgbot: hashar: deleted jobrunner06 (precise), we just need one precise instance which will be jobrunner05 for now
14:45 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner06
14:42 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner05
14:27 beta-logmsgbot: hashar: installed jobrunner05 and 06 using Ubuntu precise. Should let us get a 0.27 ffmpeg installation for bug 37043
08:38 beta-logmsgbot: root: on dbdump, deleted /etc/logrotate.d/mw-udp2log . Most probably in conflict with the one from deployment-feed which hosts the udp2log process
08:35 beta-logmsgbot: root: gzipped /home/wikipedia/logs/archive/*20120525 see bug 37012 :-(
08:23 beta-logmsgbot: hashar: killed stuck jobs on jobrunner 02 and 03. Restarted loop.
May 24
11:52 beta-logmsgbot: hashar: Rewrote log command to use dologmsg and the new beta-logmsgbot
11:51 beta-logmsgbot: hashar: yeah I do log
11:47 hashar: Moving /bin/log to /usr/local/bin/log
11:45 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
09:43 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
09:42 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit`
07:05 hashar: killed some stalled jobs on jobrunner02
02:56 labs-logs-bottie: jeremyb: foo
May 23
19:54 hashar: rebooting apache20 following installation of imagescaler puppet class
19:49 hashar: rebooting apache23 following installation of imagescaler puppet class
19:46 hashar: rebooting apache22 following installation of imagescaler puppet class
19:15 hashar: rebooting apache21 following installation of imagescaler puppet class
18:48 hashar: Adding puppet class 'imagescaler' on all deployment-apacheXX instances in an attempt to fix thumbnails
18:36 labs-logs-bottie: hashar: relocalisation cache done 367/367 languages rebuilt
18:28 labs-logs-bottie: hashar: running `mwscript rebuildLocalisationCache.php --wiki=aawiki` for bug 36806
16:22 labs-logs-bottie: hashar: delete all 3 webVideoTranscode jobs from enwiki database
16:10 hashar: rebooted jobrunner03 to check everything works fine there
15:41 hashar: deleting jobrunner01, it is crashed beyond repair. Will create a new one named jobrunner03
15:27 hashar: rebooting jobrunner01 to see how it goes
14:56 hashar: stopped job runner on jobrunner01, unmounted /mnt/upload6 and /mnt/
14:37 hashar: running puppet on job runner to check change 8584 & 8585 worked
May 22
13:49 hashar: Deleting jobrunner03 and 04, not going to need them after all
13:24 hashar: deleting refreshLinks2 jobs from enwiki database
12:52 hashar: deleting deployment-jobrunner{3,4}: installation failed, I got permission denied. Will recreate them using same hostname
11:55 hashar: create two more job runner instances
10:09 hashar: Removed deployment-webs instance which was meant to emulate the HTTPS access. Hacky and low priority for now, we will need to set up an nginx proxy one day to properly replicate the production infrastructure.
09:39 labs-logs-bottie: hashar: rebooting jobrunner02 just to be sure it is properly loaded up
09:30 labs-logs-bottie: hashar: jobrunner logs are available in /home/wikipedia/logs/runJobs.log now
21:11 hashar: deployment-nfs-memc : fix user rights for upload6 : chown apache /mnt/export/upload6
21:11 hashar: On deployment-nfs-memc : added apache (uid 48) entry in /etc/passwd
21:11 hashar: Adding Faidon and Platonides to default CC list of "deployment-prep (beta)" component
21:11 hashar: In Bugzilla, I have removed Petr Bena as a default assignee of bugs opened for "deployment-prep (beta)" component. Default is now "Nobody", Petr is on CC. That will make bug triage a bit easier.
08:58 hashar: rerebooting deployment-feed
08:53 hashar: Looks like -feed is dead : EXT3-fs: INFO: recovery required on readonly filesystem.
08:49 hashar: rebooting deployment-feed
08:22 hashar: installing iotop on deployment-nfs-memc
May 20
02:02 hashar: on -nfs-memc, running 'chown -R 48 /mnt/export/upload6' so files get owned by user apache on apaches and job runner boxes
01:05 hashar: might have disabled IRC notification by setting wgRC2UDPAddress in InitialiseSettingsDeploy.php
00:46 labs-logs-bottie: hashar: jobrunner01 seems to start catching up with jobs
00:42 labs-logs-bottie: hashar: Created a dumb `aawiki` database
10:23 hashar: updated enwiki database using 'mwscript update.php enwiki --quick'
May 17
10:29 labs-logs-bottie: hashar: after all, made /mnt/export/upload6 world writable "sudo chmod -R 777 *"
10:20 labs-logs-bottie: hashar: Fixed filerepo backend by using chown -R www-data:depops /mnt/export/upload6/wikibooks
10:15 labs-logs-bottie: hashar: Fixed cluster which was giving blank page. Root cause was wmfUdp2logDest which must be <IP address:port> (i.e. must not use a hostname)
09:43 labs-logs-bottie: hashar: removed Draft extension for now.
May 16
16:46 hashar: cleaned up more of CommonSettings.php today. Moved some hacks to disable features as settings in InitialiseSettingsDeploy.php . See git log.
19:48 labs-logs-bottie: hashar: restarted udp2log on deployment-feed, lots of zombie python processes there
12:22 hashar: replaced most occurrences of /mnt/upload with /mnt/upload6
09:38 labs-logs-bottie: hashar: Applying apache::service to dbdump
08:53 labs-logs-bottie: petrb: updating to head
07:56 hashar: Deleted all of deployment-nfs-memc:/mnt/export/upload-back , it contained only thumbs
07:51 hashar: cleaning out deployment-nfs-memc:/mnt/export/upload-back from thumb, lock dirs and related
May 14
19:30 labs-logs-bottie: hashar: Fixed X-Forwarded-For IP not being recognized
19:30 labs-logs-bottie: hashar: fo
19:30 hashar: Fixed IP :-D
May 11
19:13 hashar: Removed OnlineStatusBar extension. It is not in Gerrit / WMF
19:02 hashar: Removing misc::mediawiki-logger from dbdump, it is on 'feed'
18:35 hashar: restarting udp2log on dbdump
18:27 hashar: deleting symlinks in /home/wikipedia to /data/project : breaks logging
18:21 hashar: Replaced extensions with a fresh clone of mediawiki/extensions.git
17:33 hashar: restarted squid several times to fix some minor typos in conf
16:39 hashar: cloning mediawiki/extensions.git which has all extensions as submodules
16:37 hashar: updated MediaWiki up to 05e656a (aka master)
09:09 labs-logs-bottie: petrb: fixed nrpe on boxes where it was failing, we need to insert motd to puppet
04:37 jeremyb: [~2 hrs ago] 02:43:33 < hashar_> !deployment-prep setting up "apache20" instance by using only puppet. We will see what happens :-D
02:28 hashar_: deleted all remaining deployment-apache instances : You don't have enough free space in /var/cache/apt/archives/. So we really want to use m1.large , not m1.tiny pretending to save disk space :-D
01:28 hashar_: Created upload2.beta.wmflabs.org to be the entry point for the "new" thumbnailing infrastructure
01:24 hashar_: moving upload.beta.wmflabs.org from the non-working instances back to the main entry point
May 10
21:46 hashar: Creating a syslog server instance. I have a VERY nasty conflict between misc::syslog-server and misc::mediawiki-logger which try to install conflicting packages ( syslog-ng / rsyslog )
20:36 Krinkle: fixing a few php notices and general logic problems in wmf-config
20:24 hashar: running 'apt-get install --reinstall apache2.2-common' to attempt to fix /var/log/apache2 rights (root:arm)
20:24 hashar: deployment-imagescaler01 apache does not log anymore :-(
10:09 labs-logs-bottie: petrb: fixed the missing NOT FOUND error page
May 9
18:56 Ryan_Lane: added hostnames for associated IPs
18:56 Ryan_Lane: allocated three more IPs for upload, bits, and mobile
18:18 hashar: Made logs from wgCommandLine scripts go to /home/wikipedia/logs/cli.log instead of /home/wikipedia/logs/catchall.log
May 8
01:49 Ryan_Lane: rebooting deployment-squid
01:43 Ryan_Lane: restarting squid on deployment-squid
May 4
10:48 mutante: added class nfs::home::wikipedia to puppet group list in "beta-labs"
10:46 mutante: added myself to admin groups to add/change puppet groups
10:36 hashar: Creating 5 new m1.large instances hosting apaches and named deployment-apacheXXX
May 3
18:24 hashar: installed on dbdump misc::syslog-server
15:53 labs-logs-bottie: hashar: adding misc::mediawiki-logger and misc::scripts classes to deployment-dbdump
15:29 labs-logs-bottie: hashar: running puppet on apaches to have them send their syslog to deployment-dbdump (bug 36246)
11:13 labs-logs-bottie: petrb: removing logrotate from all apaches; it broke the central log
06:02 jeremyb: [deployment-prep, deployment-nfs-memc] ran `for u in catrope hashar jeremyb johnduhart krinkle mah petrb platonides werdna; do sudo usermod -a -G depops $u; done`; krinkle was unable to modify files in wmf-config and I thought I saw why he couldn't but couldn't see why I could. turned out the groups on nfs-memc were the important ones and I was there. synced the 2 boxes with each other and added krinkle to the list. some other deployment-prep boxes have different depops groups. (one empty with a different gid than the rest. one is same gid but just has petrb)
05:03 Krinkle: [deployment-dbdump] apt-get purged 'ack'; - On ubuntu ack is "ack-grep" which was already installed
04:48 jeremyb: [deployment-dbdump] (that was to address complaints about beta simplewiki appearing in #simple.wikipedia on irc.wikimedia.org)
04:47 jeremyb: [deployment-dbdump] changed all refs to IPs of prod hosts nfs-home and ekrem to be deployment-feed instead. and committed that to the local repo. (again not pushed anywhere yet)
04:44 jeremyb: [deployment-dbdump] did a checkpoint `git commit -a` on deploymentprep-conf (/usr/local/apache/common) (locally not pushed anywhere) because there were lots of changes on disk but not in the repo. but didn't add any new files to the repo. (so there's still stuff reported uncommitted by `git status`)
May 2
15:14 labs-logs-bottie: petrb: making syslog on apaches be /data/project/apaches_log
11:52 labs-logs-bottie: petrb: changed sudo policies on web to test if puppet override it
11:52 labs-logs-bottie: j: add transcoding settings to CommonSettings.php again
11:46 labs-logs-bottie: petrb: purged stuff on transcoding and freed some 119468kb
11:08 labs-logs-bottie: hashar: install dsh package on deployment-dbdump
09:54 labs-logs-bottie: hashar: /usr/local/apache/conf is now an independent git repository
April 30
13:31 labs-logs-bottie: petrb: rebooting web5
13:27 labs-logs-bottie: petrb: web4 reboot for same reason
15:41 hashar: made deployment-web host a wikimedia-task-appserver , had to create some apache2 configuration placeholder. Apache2 does launch but it is not working though (timeout)
15:40 petan|wk: I told hashar to log stuff, if he won't, slap him
13:18 hashar: added apache::service on deployment-web host
12:18 labs-logs-bottie: petrb: moved the log file storage to gluster
12:18 labs-logs-bottie: petrb: updated git and committed all changes
March 21
03:22 mutante: mysqld on deployment-sql is stopped - did not start it though after i heard petan is working on corrupted db's
03:06 mutante: added myself as a member just to see the instance names and check for the sql server...
March 20
16:16 labs-logs-bottie: petrb: it seems that corruption of db is worse than I expected, need to restore a backup that is a few months old
16:12 labs-logs-bottie: petrb: mysql is back up
15:41 labs-logs-bottie: petrb: taking the sql server down, I found a bunch of corrupted db's, rollback is necessary
15:41 labs-logs-bottie: j: install php-pear on deployment-web3/4/5 required by TMH
March 19
08:18 labs-logs-bottie: root: restoring sql tables from backup
March 15
20:17 labs-logs-bottie: root: restored ok
20:14 labs-logs-bottie: root: restoring database from backup
10:16 labs-logs-bottie: petrb: failed auth on db server; reboot was required
09:53 labs-logs-bottie: petrb: scheduling auto replication of sql server
March 14
14:37 labs-logs-bottie: petrb: switching en.wikipedia to older previous v
March 11
22:14 Damianz: Increased nofile on deployment-squid and added max_filedesc option to squid config. Also installed squidclient.
04:36 Ryan_Lane: also deployment-web3
04:35 Ryan_Lane: also deployment-web
04:34 Ryan_Lane: make that deployment-web5
04:34 Ryan_Lane: rebooting deployment-web, it OOM'd
March 9
14:12 labs-logs-bottie: petrb: rebooting -nfs
13:52 labs-logs-bottie: root: updated apt on webs1
13:34 j^: add ppa:j/timedmediahandler and install ffmpeg on web3 and web5
March 6
22:21 labs-logs-bottie: petrb: some instances will need to reboot, however site seems to be ok now
22:09 labs-sexy-bottie: petrb: updating svn
22:07 labs-sexy-bottie: petrb: fixed squid a bit
15:23 labs-sexy-bottie: petrb: test
14:53 labs-sexy-bottie: root: disabling bot for a while
03:11 Andrew: facepalm: apparently all reboots are failing, so this will be down until Ryan brings it all back up tomorrow
02:57 Andrew: rebooting a few hosts, there is something seriously wrong with fetching resources at the moment
March 5
15:48 labs-sexy-bottie: petrb: temporarily disabled ssl server
15:48 labs-sexy-bottie: petrb: reconfigured squid
15:36 labs-sexy-bottie: petrb: restarted servers
15:24 labs-sexy-bottie: petrb: temporarily changed code of localsettings to debug site
15:04 labs-sexy-bottie: petrb: inserted new wiki to sul
15:01 labs-sexy-bottie: petrb: please ignore some of the previous lines in log we were just testing bot
14:58 labs-sexy-bottie: petrb: updated live
14:58 labs-sexy-bottie: petrb: meh
14:36 labs-sexy-bottie: petrb: created a new log system, just type log message to log your change on prep
14:35 labs-sexy-bottie: petrb: this is test :o
March 4
12:00 Andrew: Finished deployment of het deploy, added a new ee.
March 1
16:35 Platonides: Installed dpkg-dev on deployment-dbdump
16:03 Platonides: Installed joe on deployment-dbdump
16:03 petan|wk: platonides needs to check the project name
February 27
08:20 petan: creating 2 more web servers to handle load
08:19 petan: rebooting both web servers, starting with web1
February 23
08:52 petan|wk: fixing the squid
February 22
01:03 Ryan_Lane: reconfiguring the web server instances to remove puppet classes that no longer exist
February 17
02:34 Andrew: Moving /usr/local/apache/common/live to /usr/local/apache/common/live-hom and symlinking live to live-hom
01:56 Andrew: running afl_rev_id patch on all wikis
01:52 Andrew: installing ack (source code search tool) on dbdump
00:59 petan: if anything is broken, it was me
00:51 petan: I broke it!
00:27 petan: switched to HEAD
February 16
10:11 j^: add video/webm to /etc/mime.types on web/webs1/web2
February 13
10:08 petan|wk: removing the puppetized memcached
08:54 petan|wk: removing some extensions from config which are missing in latest branch
January 30
18:46 petan: configuring some boxes for cluster to handle high load
January 29
01:13 hexmode: oom reboot -web
January 27
13:30 j^: install upstart script /etc/init/timedmediahandler.conf on deployment-transcoding and start service
13:05 j^: touch /etc/wikimedia-image-scaler on deployment-transcoding; transcoding needs more wgMaxShellMemory too
12:47 Platonides: updated /usr/local/apache/common/live/extensions/TimedMediaHandler to r110117 per j^request
10:11 j^: add-apt-repository ppa:j/timedmediahandler and update ffmpeg on deployment-web to support frame extraction from WebM videos
06:35 j^: update ffmpeg on deployment-transcoding (new security release from ppa)
January 26
00:00 petan: configured new firewall rule irc
January 25
23:52 petan: linked /usr/local/apache/common-local to /usr/local/apache/common
23:06 petan: updating svn
22:28 petan: reverted unlogged changes made to config which broke whole site
10:03 j^: installed ffmpeg on deployment-web (required by TMH to extract stills)
January 24
20:52 petan: created db user oren and new database for temporary wiki
13:53 petan|wk: reconfigured new instance and fixed some issues on puppet, no logs in sal regarding it
00:51 hexmode: svn up * updatedata
January 23
19:31 hexmode: restart memcache on nfs-memc
19:06 hexmode: aptitude update deployment-web
January 22
09:16 petan: configured nfs to listen for backup server
01:01 petan: configured firewall for backup instance
00:45 petan: creating a backup instance in -prepbackup project for online backup of mysql from deployment project + fs backup
00:30 petan: updating /live to head
00:21 petan: installed timedmediahandler (trunk) to commons
January 16
23:41 hexmode: to solve the trusted XFF problem, I installed tinycdb and created a 0-length file in the right place
20:35 Ryan_Lane: released unused IP address from project
January 15
15:45 petan: ran live/extensions/TrustedXFF/generate.php
11:20 petan: updated to latest head all wikis
January 14
18:49 petan: enabled global blocking
18:21 johnduhart: Removed myself from the project.
14:32 petan: separated common to own deployment file
14:30 hexmode: enabled webfonts for mywiki properly in InitialiseSettingsDeploy.php
14:25 johnduhart: Updated wmf-config/InitialiseSettings.php from production
14:25 johnduhart: Reverted change to wmf-config/InitialiseSettings.php
14:24 hexmode: enabled webfonts for mywiki
January 13
21:34 petan: assigning new dns
21:33 petan: moved deployment to beta.wmf...
13:07 petan|w: installed jdk on search
January 12
18:49 petan: installed all requested sw on search
18:46 petan: mounted conf files
18:38 petan: installed updates on new instances and rebooting them
06:48 Ryan_Lane: added nfs mounts to the fstab for deployment-web
06:47 Ryan_Lane: remounted /mnt/upload on deployment-web as nfs rather than nfs4
06:47 Ryan_Lane: modified export options on deployment-nfs-memc; removed nfs4 specific options, and removed other options not necessary for our environment.
00:38 johnduhart: Unmounted /mnt/export from /tmp on -web
15:55 johnduhart: Dropped all wikis except simplewiki
15:53 petan|work: restarted memcached
15:24 johnduhart: Created enwiki and enwikibooks
15:20 johnduhart: Restarted memcached
15:17 johnduhart: Created simplewiki
15:04 petan|work: restarted memcached
15:02 johnduhart: Ran update.php on metawiki
14:59 johnduhart: Creating metawiki
14:55 petan|work: disabled current site
14:55 johnduhart: Creating centralauth db
14:52 johnduhart: DROPing new configuration tables, will recreate
14:38 johnduhart: Forget the metawiki dump
14:09 petan|work: created backup of broken db of meta and replaced it with auth db
13:52 petan|work: test is done, restored test SUL to previous state
13:43 petan|work: created backup of central auth and replaced the testing SUL with current data, merged with current SUL so that we can use same logins on all sites