Nova Resource:Deployment-prep/SAL/Archive 1

December 11

10:44 hashar: deleted deployment-search01 and deployment-searchidx01 , Beta cluster has been migrated to ElasticSearch over the summer.

December 10

23:18 bd808: Freed 4.2G on deployment-jobrunner08.pmtpa.wmflabs by deleting files in /tmp
23:17 bd808: deployment-jobrunner08.pmtpa.wmflabs is out of disk on /
21:11 hashar: used git fetch && git reset --hard on Flow extension. Just to be sure

December 9

09:52 hashar: added dan-nl so he can look at MediaWiki log files when playing with glamtoolset
09:44 hashar: deleted old jobs from commonswiki job queue (up to timestamp 20130315031930)

December 8

20:55 bd808: chmod -R a+w deployment-bastion:/data/project/upload7/private/gwtoolset

December 6

22:22 hashar: rebooting deployment-cache-text1 (aka text varnish)
22:11 hashar: upgrading packages on text cache / running puppet and rebooting it.
21:15 MaxSem: Debugging apache on deployment-apache33, may look hung
11:24 hashar: upgrading varnish on deployment-parsoidcache
11:18 hashar: made sure puppet agent is enabled on varnish caches and reran it manually
11:13 hashar: shutting down deployment-search01 and deployment-searchidx01. They were for Lucene search. We use Elastic search now.
11:12 hashar: upgrading varnish on deployment-cache-mobile01
11:12 hashar: upgrading varnish on deployment-cache-bits03
11:09 hashar: upgrading varnish on deployment-cache-text1

December 3

22:42 hashar: upgrading packages on deployment-cache-bits03 && rebooting
22:36 hashar: rebooting deployment-cache-mobile01
22:36 hashar: Parsoid got broken since last Friday (serving pages from production …) bug 57926
22:34 hashar: rerunning puppet continuously on deployment-cache-mobile01 + apt-get upgrade of varnish

November 22

19:48 hashar: mwscript update.php --wiki=labswiki --quick (for OAuth database updates)
19:47 hashar: manually fixed some permissions that prevented automatic deployment of mediawiki-config. It had been broken since roughly Nov 21st at 7pm UTC.

November 19

16:17 hashar: applying role::ci::slave::labs::common class on deployment-parsoid2
16:07 manybubbles: rebuilding elasticsearch indexes to suck up configuration changes
16:03 manybubbles: running puppet on elasticsearch machines and restarting elasticsearch to suck up new configuration

November 18

11:20 hashar: Cleaned out Parsoid submodule: sudo su - mwdeploy then cd /home/wikipedia/common/php-master/extensions/Parsoid && git reset --hard origin/master && cd .. && git submodule update --init Parsoid

November 15

18:41 manybubbles: rebuilding Cirrus search indexes to have the 2 replicas like production
14:51 manybubbles: rebuilding search indexes using jobs for testing
14:09 hashar: rebooting both apaches
14:08 hashar: rebooting sql and sql02
14:05 hashar: upgrading mysql on -sql

November 14

22:54 hashar: upgrading packages on -jobrunner08
20:38 manybubbles: updating search indexes in labs
00:44 MaxSem: Rebooting deployment-solr, jetty (or java?) is FUBAR

November 11

14:18 hashar: Flow was no longer functional due to some backtrace in Parsoid daemon (bug 56781). Solved by upgrading Parsoid, reinstalling its dependencies and restarting it. Test page is http://en.wikipedia.beta.wmflabs.org/wiki/Talk:Flow_QA
14:14 hashar: deleting and reinstalling Parsoid node modules dependencies
14:13 hashar: changing Parsoid from 4 months old cdbfdbb to 986c1e7
13:47 hashar: upgrading varnish on all caches.

November 7

09:39 hashar: rebooting apache33 for kernel upgrade
09:38 hashar: rebooting apache32 for kernel upgrade
09:19 hashar: reenabling puppet on deployment-apache33
09:15 hashar: deleted sudo policy 'webadmins'; it only had petrb in it with no specific access.
09:14 hashar: removed sudo group 'admin', removing root access from any volunteers
09:08 hashar: Restarted bits varnish to clear out the cache.

November 6

12:09 hashar: apt-get dist-upgrade on deployment-eventlogging
11:38 hashar: upgrading packages on deployment-parsoid2

November 5

21:39 hashar: applying role::logging::mediawiki::errors on deployment-fluoride. Should get a listener of some sort on port 8423 to receive fatal/exceptions
16:16 hashar: fixed up mediawiki/extensions.git which still had the deleted extension WikibaseDatabase. That has been blocking code update since Oct 30th.

October 28

13:16 manybubbles: restarted elasticsearch nodes to pick up new config

October 19

20:29 wm-bot: petrb: did mwscript changePassword.php --wiki enwiki --user PiRSquared --password mooh

October 15

21:04 hashar: -bastion rebooted, restarted udp2log : /etc/init.d/udp2log stop; /etc/init.d/udp2log-mw start
21:03 hashar: rebooting deployment-bastion for NFS config fix.

October 14

22:00 wm-bot: hashar: made /data/project/logs group writable, it belongs to nemobis :/
10:12 hashar: purged varnishhtcpd on deployment-upload04 to make it start again.
09:53 hashar: rebooting all varnish caches ( deployment-cache-text1 deployment-cache-upload04 deployment-cache-bits03 deployment-cache-mobile01 )
09:47 hashar: mobile varnish frontend cache is not starting anymore : /usr/lib/x86_64-linux-gnu/varnish/vmods/libvmod_netmapper.so: cannot open shared object file: No such file or directory bug 55662

October 11

10:48 hashar_: beta is back up :-]
10:30 hashar: resynchronizing mediawiki/extensions.git submodules.
10:16 hashar: git directory of mediawiki/extensions was borked following the NFS migration. Fixing it up manually
10:05 hashar: stopped udp2log, started udp2log-mw
10:04 hashar: rebooting deployment-bastion
10:04 hashar: Jenkins jobs failing, jenkins-deploy user apparently can't write to its home dir /home/jenkins-deploy/workspace

October 7

10:48 hashar: applied iptables rules for bug 45868 on deployment-apache{32,33} and jobrunner08
10:05 hashar: applied iptables NAT rules on deployment-bastion bug 45868

October 4

19:32 MaxSem: Created table bug_54847_password_resets on all wikis

October 3

13:22 manybubbles: finished rebuilding search indexes after cirrussearch update
00:38 manybubbles: rebuilding search indices after cirrussearch update

September 30

08:16 hashar: upgrading and restarting memcached on memc0 and memc1 to let them limit their memory at 15GB instead of 89G bug 52378

September 24

13:38 manybubbles: indices finished rebuilding some time last night.

September 23

16:26 manybubbles: rebuilding search indices after new index config deployment

September 20

13:39 manybubbles: rebuilt most search indices in beta but commonswiki crashed late last night so it is half rebuilt. filing bug.

September 19

19:26 manybubbles: elasticsearch filled up the system disk on its hosts so I moved its data to /mnt with a symlink.
18:37 manybubbles: rebuilding search indices after a few merges in cirrussearch

September 17

20:05 hashar: upgrading PHP on bastion, jobrunner and apaches from 5.3.10-1ubuntu3.7+wmf1 to 5.3.10-1ubuntu3.8+wmf1
19:00 manybubbles: upgraded elasticsearch in beta to 0.90.4
18:08 manybubbles: upgrading elasticsearch in beta to 0.90.4 so we can make sure it works so we can use some new features in it

September 10

16:29 hashar: rebooted bastion after some nfs outage. Stopped udp2log, started udp2log-mw

September 7

01:07 manybubbles: rebuilding search indices on beta after lots of updates

September 3

14:59 hashar: upgrading PHP5 ( 5.3.10-1ubuntu3.7+wmf1 ) on deployment-apache32, deployment-apache33 and deployment-jobrunner08
14:55 hashar: upgrading PHP5 package on deployment-bastion

August 26

17:47 manybubbles: rebuilding search indices to unbreak CirrusSearch....

August 20

18:47 manybubbles: rebuild search indices after some changes to indexing code.

August 19

19:05 manybubbles: rebuilding the search indices to pick up some recent changes

August 12

19:10 manybubbles: rebuild search indices for updates
18:45 manybubbles: rebuilding all search indices using updates
18:45 manybubbles: unstuck CirrusSearch so it'd update.

August 8

17:41 manybubbles: simplewiki's search index has completed building. All search indices should now be up to date.
15:53 manybubbles: reindexed all wikis to add accent squashing. simplewiki is still rebuilding but I reindexed what was complete and started the rebuild again so it'd pick up accent squashing.
11:55 manybubbles: all search indices have finished building except simplewiki

August 7

20:31 manybubbles: rebuild all the small search indices. waiting on enwiki, enwikivoyage, simplewiki, and commonswiki.
20:08 manybubbles: rebuild search indices after large-ish code change to CirrusSearch
07:27 andrewbogott: rebooted deployment-memc1 and deployment-memc0 (not at the same time) while freeing up space on virt servers.

August 4

22:57 wm-bot: platonides: test
22:56 wm-bot: platonides: git reset --hard to restore /data/project/apache/common-local/php-master/extensions/Translate/specials/SpecialManageGroups.php (bug 52534)
22:54 wm-bot: platonides: --help

August 3

17:59 manybubbles: looks like simplewiki's search index finally finished. party time.

August 1

16:27 manybubbles: building search index for commonswiki and the other wikis that aren't in the main section of http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix
12:46 manybubbles|away: enwikivoyage's search index finished building overnight. dewikivoyage seems to have stalled out. I'm going to profile it. simplewiki is still running and will need some love to finish more quickly.
09:02 hashar: rebooted both memcached instances to be able to log on them. Apt upgrading both of them
08:57 hashar: Deleting deployment-cache-upload03 , replaced by the fully puppetized instance deployment-cache-upload04
08:57 hashar: Deleting the old squid instance since we run varnish cache for text nowadays

July 31

21:37 ^d: Fixing permissions on /mnt/upload7/wikivoyage to be like the other domains
21:24 manybubbles: dewikivoyage and enwikivoyage are still building. simplewiki crashed. https://bugzilla.wikimedia.org/show_bug.cgi?id=52353
21:24 manybubbles: built and populated search indices for all wikis except dewikivoyage, enwikivoyage, and simplewiki.
20:45 manybubbles: building search indices for beta
20:44 hashar: manybubbles never logs anything.
11:00 hashar: Migrated beta code updater from shell to python. https://integration.wikimedia.org/ci/job/beta-code-update/

July 30

22:05 ^d: Memcached moved off of the apache instances to their own dedicated hosts (-memc0 and -memc1). Should have a lot more memc storage now.

July 29

23:44 hashar: fixed up timeline on beta, it never worked there. Thanks ^demon !
13:29 hashar: rebuilding l10n cache, has been broken for a while

July 26

21:21 hashar: applying misc::syslog-server on deployment-bastion to make it a syslog server bug 36748

July 24

20:41 hashar: restarted memcached on both apache boxes. Might clear their caches.
20:40 hashar: apt-get upgrading apache32 and apache33. Running puppet on them
11:30 hashar: manually running sync-site-resources : su - apache -s /bin/bash then /usr/local/bin/sync-site-resources

July 23

09:03 hashar: restarted varnish text cache

July 22

07:57 hashar: deleting deployment-varnish-t3 , used as a mobile cache, now replaced by deployment-cache-mobile01
07:56 hashar: deleting deployment-puptest , unused, no class applied

July 19

19:55 hashar: rebooting deployment-cache-text01.pmtpa.wmflabs , can't access it

July 18

12:41 hashar: Text cache was not in wgSquidNoPurge, that caused all requests to be interpreted as coming from the text cache, causing miscellaneous issues (such as throttling account creation for everyone).

July 17

08:22 hashar: beta jenkins jobs statuses are now listed on the CI main page at https://integration.wikimedia.org

July 16

23:58 ori-l: applying 'role::eventlogging' to i-00000733
14:57 hashar: Retriggering the database updating job manually https://integration.wikimedia.org/ci/job/beta-update-databases/
14:57 hashar: restored /data/project/apache/common-local/php-master/extensions/Diff/Diff.php , which got deleted somehow by git

July 10

08:57 hashar: rebooting -sql instance to make it use NFS as /home
08:06 hashar: shutting down deployment-cache-upload03
08:04 hashar: migrating upload.beta.wmflabs.org from cache-upload03 (lucid/squid) to cache-upload04 (precise/varnish)

July 9

19:34 hashar: Attempting to reboot a bunch of instances that prevent ssh access because /home is borked: uploadtest08 uploadtest07 -cache-upload04 -cache-text01 parsoid2 cache-mobile01 deployment-sql02 cache-upload03
17:42 aude: added Yuvipanda to the project

July 8

08:27 hashar: rebooting deployment-cache-text1 , maybe I can get ssh access this way
08:26 hashar: Set $wgLoadScript to point to bits instead of the wiki local docroot. 70322
08:08 hashar: rebooting deployment-cache-upload04

July 3

22:33 aude: repopulated sites table
08:25 hashar: Created a beta logo for enwiki. Purged the related URL manually on deployment-cache-upload03 using squidclient -p 80 -m PURGE http://upload.beta.wmflabs.org/wikipedia/en/b/bc/Wiki.png

July 2

15:18 hashar: deleting deployment-searchidx02 , not being used
13:38 hashar: restarted mw-cgroup upstart service on apaches box. That recreated the wgCgroup directory /sys/fs/cgroup/memory/mediawiki
13:10 hashar: removed iptables 'nat' rule from deployment-upload
13:10 hashar: pointed deployment-upload thumb handler to the varnish cache text instead of squid. Done by editing /data/project/upload7/scripts/thumb-handler.php
12:58 hashar: installing iptables on deployment-upload
12:57 hashar: Updating the iptables rule that works around the NAT issue in beta. Applied on deployment-searchidx01 and deployment-upload : iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.1.133 See also bug 45868

July 1

22:05 hashar: upgraded packages on deployment-eventlogging
18:57 hashar: deleted deployment-nginx-test , not needed anymore, nginx proxies for mobile are working
14:42 hashar: Migration to the new mobile instance was tracked by bug 49469
14:40 hashar: shutting down deployment-varnish-t3 (replaced by deployment-cache-mobile01)
14:40 hashar: binding mobile IP address 208.80.153.143 to deployment-cache-mobile01
14:38 hashar: rebooting deployment-varnish-t3
14:35 hashar: updated puppet repository on deployment-varnish-t3 and running puppet there
14:34 hashar: applying role::protoproxy::ssl::beta on deployment-cache-mobile01 (intended to replace varnish-t3 for mobile caching)
14:14 hashar: rebooting deployment-cache-mobile01
13:54 hashar: attempting to enable HTTPS on the varnish text cache by applying role::protoproxy::ssl::beta
12:28 hashar: restarted both apaches. Beta has been down for a couple hours due to an NFS issue on labstore3.
08:43 hashar: Shutting down deployment-squid , service migrated to deployment-cache-text01 (varnish).
08:36 hashar: Switching the text cache traffic from deployment-squid to deployment-cache-text1 by reassociating the public IP 208.80.153.219

June 26

09:22 hashar: Squid restarted properly, that fixed some stalled resource loader entries that were causing some outdated Javascript modules to be served. Fixed at least an inconsistency such as bug 49911
09:17 hashar: recreated squid swap directories with `squid -z`, restarting squid
09:14 hashar: stopping squid and pruning cache

June 24

22:01 hashar: clearing memcached , that might cleanup some resource loader cache causing bug 49911 "nab collapse missing in beta"
15:42 hashar: restarted lucene-search-2 on searchidx01
15:37 hashar: upgrading -searchidx01 and refreshing puppet manifests

June 20

20:45 hashar: Jasper Deng joined in AbuseFilter manager group :)
20:23 hashar: VisualEditor self updated on beta, it was stuck due to a misconfiguration in gerrit bug 49846

June 19

08:36 hashar: Fixing up the abuse filter central DB to point to 'labswiki' instead of the nonexistent 'metawiki' 69461. Suggested by Steinsplitter :)

June 18

22:15 hashar: Running /usr/local/bin/sync-site-resources 68309
22:13 hashar: Applying MaxSem 'misc::beta::sync-site-resources' to deployment-bastion. That syncs .css articles from production to beta!

June 17

16:25 hashar: Apache was down on apache32. Restarted it as well as on apache33. Solved bug 49700
16:22 hashar: varnish-t3 (mobile cache): cleaned up operations/puppet local repo and re ran puppet. Still blocked :/ bug 49700
11:16 hashar: created /data/project/apache/uncommon/master , owned by mwdeploy:mwdeploy and mode 0755.

June 12

08:04 hashar: Creating deployment-cache-upload04 using a Precise image. The aim is to replace deployment-cache-upload03 which runs Lucid (see also bug 49470)

May 30

08:27 hashar: Added Nikerabbit to the project. Will setup solr for translate

May 28

18:52 hashar: Fixed up the Wikidata wiki http://wikidata.beta.wmflabs.org/wiki/Wikidata:Main_Page bug 47827

May 24

12:24 hashar: creating a dumb proxy blocker touch /data/project/apache/common-local/php-master/../wmf-config/mwblocker.log
12:20 hashar: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en special wikidatawiki wikidata.beta.wmflabs.org
12:20 hashar: attempting to install wikidata

May 20

18:30 hashar: Added Krenair to the project
08:21 hashar: removing thumbnails from the Gluster shared directory: cd /data/project/upload7 && find -maxdepth 3 -wholename '*/thumb'|xargs -n1 -P4 rm -v -fR

May 16

14:53 hashar: restarted job service on jobrunner08 : /etc/init.d/mw-job-runner restart . It was missing /usr/local/apache/common ; 64057 and 64065 fix it by using a symlink to /data/project/apache just like on apache webservers.

May 14

12:53 hashar: deleting deployment-lucene, we are using search01 and searchidx01
12:49 hashar: rebooting -cache-upload03 for kernel upgrade
12:49 hashar: rebooting -sql for kernel / mysql upgrade
12:47 hashar: rebooting -squid for kernel upgrade
12:46 hashar: upgrading bunch of boxes.
12:28 hashar: refreshing l10n cache
12:28 hashar: fixed l10n cache ownership: chown -R l10nupdate:l10nupdate /data/project/apache/common-local/php-master/cache/l10n/
11:58 hashar: l10n cache is broken since Apr 30th 15:21 UTC
11:44 hashar: We somehow have HTTPS on beta now! https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page (still have to fix up the cert names though).
11:20 hashar: deployment-squid applying role::protoproxy::ssl::beta
11:15 hashar: deployment-varnish-t3 applying role::protoproxy::ssl::beta
11:14 hashar: deployment-varnish-t3 : updating local puppet repo
10:43 hashar: deployment-cache-bits03 + role::protoproxy::ssl::beta (should give us https on bits.beta.wmflabs.org)

May 9

21:32 hashar: added mattflaschen as a sysadmin

May 7

08:38 hashar: Created deployment-nginx-test to try out the nginx manifests for SSL.

May 6

09:36 hashar: Adding ArielGlenn as a member/sysadmin

May 3

01:29 hashar: bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
01:18 Coren: rebooted deployment-bastion after manual workaround for a broken puppet run

May 1

12:43 hashar: migrating jobrunner08 and video05 to the NFS server
12:43 hashar: updated puppet manifests on -video05 34ad3d6..32fef26

April 30

21:35 hashar: Fixed the git path in mediawiki/extensions.git local copy of -bastion
21:04 hashar: both apaches are now serving content from the NFS cluster.
20:58 hashar: Migrating apache-33 to use the new NFS server
20:58 hashar: apache-32 running with NFS went from 560ms to 260ms when serving pages \O/
20:49 hashar: Recreated wikiversions.cdb on bastion for the new NFS home dir
20:08 hashar: migrating apache32 to new NFS server
20:06 hashar: root@deployment-bastion:~# /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
20:01 hashar: applying role::labsnfs::client on -bastion
19:45 hashar: applying the very recent `role::labsnfs::client` class on deployment-integration
19:43 hashar: Upgraded puppet manifests on deployment-integration and running puppet.
19:21 hashar: Migrating homes to the new NFS server
18:27 hashar: rsyncs to the NFS server are completed. There are most probably still some tiny files that need to be copied though
16:46 hashar: Mounted new NFS server on /srv/project on instances: apache32, apache33, video05 and jobrunner08
16:01 hashar: Clearing out years old backup from /data/project such as copy of extensions, databases dumps and some old instances backups.
15:28 hashar: Copying l10n cache to the new NFS server: rsync -av /home/wikipedia/common/php-master/cache /srv/project/apache/common/php-master
15:11 hashar: syncing upload data from the Gluster share to labnfs server: rsync -avv /data/project/upload7 /srv/project
13:59 hashar: bastion: created NFS mount point thanks to Coren. echo 1 >/sys/module/nfs/parameters/nfs4_disable_idmapping ; mount -t nfs -o nfsvers=4,port=0,hard,rsize=65535,wsize=65536 labnfs.pmtpa.wmnet:/deployment-prep/project /srv/project
12:41 hashar: Refreshed most extensions and running mw-update-l10n

April 29

21:00 hashar: updated MobileFrontend manually to 9356d00ac5

April 26

15:56 MaxSem: Enabled GeoData cronjobs

April 24

12:41 hashar: on searchidx01 iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.0.17 (see bug 45868 )

April 23

08:32 MaxSem: Deployed GeoData

April 22

20:49 hashar: Manually updating all mw extensions to make sure everything works fine.

April 21

21:17 hashar: beta is up again. Apache2 could not start because the error log file was not accessible ( bug 47479 )
20:33 hashar: Apache down on both apaches instances

April 19

19:50 hashar: The l10n cache was stalled since Mar 22 13:08 at least. The files were owned by `mwdeploy` seems something changed and they are now owned by `l10nupdate` So I ran: chown l10nupdate -R /home/wikipedia/common/php-master/cache/l10n/
19:46 hashar: Attempted to update the l10n cache (sudo -u mwdeploy mw-update-l10n ) got a permission denied on /home/wikipedia/common/php-master/cache/l10n
19:43 hashar: Gluster is broken on beta. Extensions are no longer updating, nor can the l10n update run. bug 47425
19:38 hashar: root@deployment-bastion:~# /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
19:37 hashar: Rebooting bastion. Seems GlusterFS can not allocate memory ( bug 47425 )
19:18 hashar: manually updating mediawiki extensions
11:52 hashar: Successfully added Mark Bergsma to deployment-prep.
09:00 hashar: Updating puppet repositories on search01 and searchidx01. Running puppet on both of them.

April 18

13:12 hashar: update mobile cache (varnish-t3) to latest puppet manifests.

April 16

16:02 hashar: Updating mobile cache to use some mark change 59401
09:35 hashar: applying role::cache::mobile to deployment-cache-mobile01 (that will replace deployment-varnish-t3 eventually)
09:25 hashar: Updating mobile cache (deployment-varnish-t3) to patchset 47567/9 . Some puppet changes got merged in this morning :-]

April 15

14:03 hashar: Rebooting the fresh deployment-cache-upload-test6 instance
12:49 hashar: -cache-upload03 refreshing local puppet repo

April 10

18:24 ^demon|sick: ran mergeMessageList.php for php-master wikis
13:33 hashar: Restarted the database updating job https://integration.wikimedia.org/ci/job/beta-update-databases/374/
13:32 hashar: switching udp2log on bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start (see bug 38995 )
13:31 hashar: rebooting deployment-bastion too : gluster issue
13:26 hashar: Cluster is back up :-]
13:25 hashar: rebooting both apaches.
13:24 hashar: Gluster failure again /data/project/apache/conf/ has some files missing: www.wikipedia.conf en2.conf wikimedia.conf
13:23 hashar: apache2: Syntax error on line 324 of /etc/apache2/apache2.conf: Syntax error on line 9 of /etc/apache2/wmf/all.conf: Could not open configuration file /etc/apache2/wmf/www.wikipedia.conf: No such file or directory
13:20 hashar: apt-get upgraded apache32 and apache33 . Note that apache is down on them.
13:19 hashar: no pages being served. Most probably a PHP Fatal error
13:13 hashar: reran Jenkins job https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/ . Some git failures happened in /home/wikipedia/common .
06:45 hashar: searchidx01 : restarted lucene-search-2 might have been killed by OOM killer (see bug 46459)
06:39 hashar: search01 : restarted lucene-search-2 , was not listening on port 8123.

April 8

20:51 hashar: deployment-search01 : /usr/bin/java -Xmx2000m :-]
20:50 hashar: Changing lucene-search-2 memory usage from 20G to 2G by manually editing /etc/init.d/lucene-search-2 (see bug 46459 )
20:00 hashar: deployment-search01 updating local puppet repo c345581..7d036cb
19:57 hashar: deployment-searchidx01 updating local puppet repository 81f5a93..7d036cb

March 29

12:00 hashar: MobileFrontend should not let user login again (bug 46649); the issue was most probably caused by the lack of commonswiki on beta.
11:53 hashar: restoring commonswiki on beta 56593.
10:40 hashar: rebooting jobrunner08 and bastion. High network use too.
10:38 hashar: rebooting both apaches instances. They consume ton of network, most probably related to Gluster

March 25

19:46 hashar: -bastion : restarting puppet. Restarting beta autoupdater.
15:01 hashar: Updated database enwiki
14:26 hashar: getting lazy, dropping Central Notice tables from enwiki and rerunning updater.
14:16 hashar: Attempting to fix central notice database schema for enwiki
11:46 hashar: removing local hack made to ArticleFeedback data/maintenance/DataModelPurgeCache.php
11:44 hashar: /home/wikipedia/common/php-master/extensions : git remote update && git reset --hard origin/master && git submodule update --init
11:40 hashar: Resetting the extensions checkout. Been broken for a few days because of extension renaming.

March 22

21:11 hashar: Search is back! Turns out that lucene-search-2 service was not running on deployment-search01 despite puppet ensure => running on the service :( See also bug 46459
21:03 hashar: Starting lucene-search-2 on deployment-search
15:00 hashar: manually update puppet sources on -search01 and -searchidx01
14:40 hashar: manually refreshing extensions on -bastion
14:39 hashar: Updated mediawiki/extensions.git which was lacking the Thanks extension 55263
14:39 hashar: I have setup a Jenkins job to automatically update mediawiki-config. Dashboard is https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/

March 21

22:37 hashar: -bastion : stopping udp2log that prevents udp2log-mw from running :/ See bug 38995
22:35 hashar: both apaches give out Error 500 so beta is now serving blank pages.

March 19

09:23 hashar: created sudo policy for jenkins-deploy user. That is the user for the Jenkins slave running on deployment-bastion

March 15

13:41 hashar: -squid killed nrpe and restarted it (just to be sure)
13:39 hashar: -squid started puppet service
13:38 hashar: -squid ran puppet manually which deployed the new redirector from https://gerrit.wikimedia.org/r/53935
13:34 hashar: -squid shows a ton of stalled `redirector` processes. Killed them all.

March 14

22:20 hashar: deployment-bastion is now a jenkins slave of the production Jenkins machine
22:13 hashar: manually installing openjdk-7-jre on -bastion
22:02 hashar: Successfully added jenkins-deploy to deployment-prep.
21:59 hashar: adding jenkins-deploy to the project
21:37 hashar: removing restrictions from deployment-bastion . authorized_keys is not read when in labs :] (thx Ryan)
21:16 hashar: -bastion changed restricted_to to (project-deployment-prep) (jenkins)
21:15 hashar: on -bastion: Added group restrictions and set variable restricted_to = (project-jenkins) (jenkins) thanks ryan
21:02 hashar: creating jenkins homedir manually on -bastion
20:38 hashar: applying jenkins::user to deployment-bastion
19:03 hashar: rebooting -bastion to find out whether the security rule is applied
19:01 hashar: updated security rule to allow TCP port 22 connection from gallium.wikimedia.org [208.80.154.135/32]
04:22 hashar: killed job runners on jobrunner08 and restarted service
04:22 hashar: Restarted apache on apache32,33
04:20 hashar: Upgrading apache32, apache33, video05 and jobrunner08
02:13 hashar: Trying out geoip module from 53714 on deployment-integration

March 13

04:49 hashar: rebooting deployment-integration
04:47 hashar: rebooting deployment-lucene

March 12

20:39 Chad: added port 1099 to search engine security group to allow RMI messaging to go through

March 11

05:44 hashar: Running MediaWiki update.php on all databases

March 8

20:15 hashar: The search backend is apparently working now !!! bug 34250
00:46 hashar: upgrading all instances

March 7

23:43 hashar: OAI repository set up on beta !!! bug 45814
23:42 hashar: for squid login=PASSTHRU replaced by login=PASS. Reloaded squid.
23:41 hashar: reloading squid
23:41 hashar: setup squid to pass the WWW-Authorization headers to the Apache. Done by configuring login=PASSTHRU for each cache_peer (*crosses fingers*)
22:15 hashar: Set up an OAI repository user for lucene search. Password in puppet.
22:04 hashar: Restored mysql admin password on deployment-sql
21:57 hashar: stopping mysql server on -sql
21:31 hashar: Creating OAI repositories on sql and sql02 master databases
21:11 hashar: updating mediawiki-config fc22500..71e689a

March 6

20:21 hashar: creating deployment-searchidx02 which has 16GB of RAM. deployment-searchidx01 does not have enough RAM :(
19:59 hashar: rebooting apache33 : gluster mount is corrupted
19:28 hashar: regenerating lucene prefixes
19:02 hashar: refreshed wikiversions.cdb
18:37 hashar: rebooting search indexer
00:47 hashar: Trying to import enwiki database on the lucene search deployment-searchidx01 : sudo -u lsearch /a/search/lucene.jobs.sh import-db enwiki

March 5

19:01 hashar: Log
00:59 hashar: reinstall the search box packages to make sure they use /a as a mount of /dev/vdb
00:09 hashar: rebooting deployment-search01, stuck somehow

March 4

17:47 hashar: removing all 'aft%' tables to make sure ArticleFeedbackv5 database schema is valid bug 45318
17:43 hashar: set a dummy value for wmgTranslationNotificationUserPassword

March 1

20:39 hashar: Search boxes are now having 51677 patchset 5 applied. Still have to figure out how Lucene works though
19:17 hashar: Applying puppetmaster::self to both search boxes
18:35 hashar: Created deployment-search01 and deployment-searchidx01
18:12 hashar: deleting -dbdump, migrated udp2log on -bastion

February 27

17:34 hashar: updating mediawiki-config 8d1aac9..10bda3a
17:22 hashar: foreachwikiindblist /home/wikipedia/common/all-labs.dblist update.php --quick --quiet

February 26

22:48 hashar: running database update for enwiki
18:50 hashar: adding ram and demon to the project

February 25

17:45 hasharMeeting: mwscript update.php --wiki=testwiki

February 19

15:54 hashar: applied role::cache::text on deployment-cache-text01

February 18

18:03 hashar: running apt-get dist-upgrade on -cache-text01 , -sql04 and -sql03
18:02 hashar: running apt-get dist-upgrade on -cache-upload04
17:57 hashar: applying role::cache::upload to -cache-upload04
15:31 hashar: mobile redirection is more or less in place on beta. Browsing with a mobile agent will redirect to the mobile version.
14:32 hashar: apaches giving errors cause wikidatawiki is not configured
14:22 hashar: wikidatawiki is missing oh no beta dead again
14:13 hashar: fixed puppet on -squid, it was blocked by attempting to apply a non existent class: generic::package::git-core
14:08 hashar: applying the new squid::redirector class to deployment-squid so we can handle mobile redirects
11:35 hashar: Deleting -mc instance, memcached is now on apaches
11:27 hashar: rebooting -bastion again
11:26 hashar: got Gluster client upgraded on -bastion
11:21 hashar: rebooting -bastion
11:18 hashar: migrating memcached from -mc to the apaches boxes. 49261
09:07 hashar: Running update.php on all databases.

February 15

17:17 labs-logs-bottie: petrb: rebooting bastion to fix some issues with mw
17:14 hashar: running "mw-update-l10n --verbose" on -bastion as mwdeploy user
15:37 hashar: puppet properly starts the apache2 service. Fixed bug 38996
15:20 hashar: rebooting apaches box to find out whether apache2 service is brought up 47398

February 14

14:32 hashar: git maintenance override. Now running: git submodule foreach 'git repack -a -d --depth=250 --window=250'
14:30 hashar: doing some git maintenance: cd /home/wikipedia/common/php-master/extensions ; git submodule foreach 'git gc --aggressive && git repack -a'

February 13

19:41 hashar: starting apaches manually
19:37 hashar: rebooting both apaches. Gluster seems to be stalled
19:32 hashar: rebooting bastion
19:31 hashar: restarting squid
19:20 hashar: updating mediawiki-config to latest master : Updated 70fec38..7c4810c
15:48 hashar: on bastion: stopped puppet and wmf-beta-autoupdater , running git pull manually in php-master/extensions
15:22 hashar: Running mw-update-l10n manually as user mwdeploy. Should regenerate the l10n cache
15:21 hashar: chown -R l10nupdate:l10nupdate /data/project/apache/common-local/php-master/cache/l10n
15:21 hashar: Mutante merged a sudo right change that would unblock the beta autoupdater ( see https://gerrit.wikimedia.org/r/#/c/47795/ )

February 11

11:02 hashar: Granted anth1y sudo access on the project so he can play with Lucene
11:02 hashar: Added anth1y to the project, he is interested in Lucene / Swift stuff :-D

February 4

14:24 hashar: Started the over long l10n cache rebuild in a screen on deployment-bastion
14:09 hashar: -bastion applying misc::deployment::scap_scripts
14:07 hashar: -bastion removing role::deployment::deployment_servers::labs
13:35 hashar: Applying role::memcached to apache32 and apache33
13:24 hashar: manually updating extensions to make sure the beta autoupdater works properly
13:19 hashar: the infamous beta auto updater is back in action on deployment-bastion
13:07 hashar: starting apache2 on apache32 and apache33
13:06 hashar: multiversion/refreshWikiversionsCDB
13:05 hashar: refreshing /home/wikipedia/common from latest master (no more newdeploy branch)
13:04 hashar: REVERTED GIT-DEPLOY!!!!! rm /data/project/apache/common-local (symlink) and restored backup: mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local
13:00 hashar: rebooting apache32 (locked / can't login)
12:52 hashar: rebasing /srv/deployment/mediawiki/common
12:41 hashar: -dbdump : stopping udp2log, starting udp2log-mw
09:24 hashar: upgrading / rebooting all instances
09:18 hashar: Beta is broken in some random and creative ways AGAIN. /home on bastion is corrupted, some instances do not let us connect anymore, apache docroot disappeared.

February 1

10:21 hashar: nslcd probably points to a wrong LDAP or has a faulty DNS configuration. Can't login on it anymore :/
10:12 hashar: rebooting the varnish-t3 instance, nslcd can't resolve somepath

January 31

15:49 hashar: Deleting out /data/project/squid1 which has been migrated to /mnt/squid_cache. The gluster volume for data-project is corrupted on beta so we don't want to use it anymore.
15:46 hashar: stopping squid, migrating ufs cache from /data/project/squid1 (gluster) to /mnt/squid_cache
15:42 hashar: cleaned out deployment-squid:/mnt/ (had an old enwiki dump and some squid files)
15:19 hashar: restarting squid process on deployment-squid
15:18 hashar: starting apache2 on -apache32
14:53 petan: restarted squid and rebooted apache32

January 30

15:17 hashar: removing -cache-bits-02 (been replaced a long time ago by -cache-bits-03)

January 21

12:42 hashar: -varnish-t3 : removing /dev/sda* entries from /etc/fstab , applying 44709 ps 6 and rerunning puppet
12:28 hashar: applying role::cache::mobile on deployment-varnish-t3
11:27 hashar: created deployment-varnish-t3 , deleted deployment-varnish-t2
10:48 hashar: moved 208.80.153.143 from deployment-varnish-t to deployment-varnish-t2 (IP is in DNS as *.m.beta.wmflabs.org )
10:38 hashar: creating deployment-varnish-t2 to replace broken deployment-varnish-t
10:25 hashar: re rebooting deployment-varnish-t
09:58 hashar: Rebooting deployment-varnish-t from labsconsole. I guess there is a mount for /dev/sda* :(
09:51 hashar: rebooting deployment-varnish-t to find out how well it goes on restart :-]

January 18

21:20 hashar: deployment-varnish-t : apt-get upgrade
21:18 hashar: running puppet on cache-bits03 to find out whether role::cache::bits cleanly apply there.

January 16

02:07 Reedy: Created geo_tags tables on all deployment-prep wikis

January 15

21:49 hashar: ln -s /srv/deployment/mediawiki/common /data/project/apache/common-local
21:49 hashar: renamed /data/project/apache/common-local to common-local.pre-git-deploy

January 14

22:58 hashar: renamed php-1.21wmf{6,7} with a -back prefix. Created symbolic links to the git-deploy slots: ln -s /srv/deployment/mediawiki/slot1 php-1.21wmf6 and /srv/deployment/mediawiki/slot0 php-1.21wmf7
22:56 hashar: updating mediawiki-config fd29e6a..329113f

January 11

17:08 wm-bot: this is a creepy log with | and such shitty chars $@#% 6346 w@#%^@# 6bla

January 10

08:29 Ryan_Lane: deployed all repos to destination hosts
08:29 Ryan_Lane: made deployment-bastion a git-deploy deployment host
08:18 hashar: removed misc::deployment::scripts from -bastion, already provided by misc::deployment::scap_scripts
08:09 hashar: put back role::beta::autoupdater on -bastion

January 9

21:47 hashar: running puppet on apache boxes to get the new role::applicationserver::appserver::beta class
21:40 hashar: migrating apaches box to the new role::applicationserver::appserver::beta (replaces both appserver and imagescaler)
20:40 hashar: removing the phased out imagescaler::labs from apaches in favor of role::applicationserver::imagescaler
20:20 hashar: Migrated Apache box to use role::applicationserver::appserver instead of the old (and no more existent) role::applicationserver
16:04 jeremyb: the recent (today at least, but probably most of the earlier ones too) !logs from wm-bot are really from hashar. in case you were looking for the source.
16:02 hashar: enwiktionary beta now running 1.21wmf6 http://en.wiktionary.beta.wmflabs.org/wiki/Special:Version
15:59 wm-bot: cp php-master/cache/trusted-xff.cdb php-1.21wmf7/cache/
15:59 wm-bot: cp php-master/cache/trusted-xff.cdb php-1.21wmf6/cache/
15:56 hashar: Refreshing the TrustedXFF cache: cd /home/wikipedia/common/php-master/extensions/TrustedXFF && mwscript extensions/TrustedXFF/generate.php --wiki=aawiki ../../cache/trusted-xff.cdb
15:46 hashar: Running mw-update-l10n on deployment-bastion in screen 16609.pts-0.i-00000390
15:32 wm-bot: -video05 : restarted puppet and puppetmaster, killed stalled puppet processes. Rerunning puppet manually
15:04 hashar: -video05: running apt-get upgrade
15:04 wm-bot: -video05 : trimmed /var/log/glusterfs/data-project.log file
14:55 wm-bot: reloaded udp2log-mw on -dbdump
14:33 hashar: Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 42951

January 7

15:04 wm-bot: apt updated and upgraded apache32, apache33 and jobrunner08
14:27 hashar: apache32 / apache33 filling is logged as bug 43703. This is caused by gluster client log files not being rotated which is bug 41104
14:24 wm-bot: manually emptied out /var/log/glusterfs/data-project.log on apache32 and apache33.
14:20 wm-bot: apache32 and apache33 have disk full again.
11:01 hashar: Fixed up extension static assets ( bug 43692 ).
10:36 wm-bot: updating mediawiki config to latest master

December 21

20:18 wm-bot: rebooting some instances so they get the new /home

December 20

08:58 wm-bot: manually updated git puppet repo on deployment-video05

December 19

20:52 hashar: Granted MaxSem and Mgrover sysadmin rights. They are WMF contractors going to work on setting up MobileFrontend on beta.
11:37 wm-bot: finally had GettingStarted extension installed.
10:37 hashar: /home/wikipedia/common/php-master/extensions/.git/FETCH_HEAD gave I/O error. I have deleted it and reran git pull + git submodule update --init aka : UPDATED ALL EXTENSIONS TO THEIR LATEST master VERSION.
10:32 wm-bot: removing live hack on UserMerge extension (attempted to grant some user right to bureaucrat, that should be done in CommonSettings.php )
10:31 wm-bot: manually running 'git submodule update --init' under extensions directory to find out what is going on there
10:10 wm-bot: rebooting apache32 and apache33 to get new /home
09:50 wm-bot: updating mediawiki-config
09:46 hashar: rebooting -bastion to get the new /home

December 4

12:27 hashar: Apache boxes seems to be running again. Had to manually restart apache on apache33.
08:52 hashar: Apache32 is somehow up
08:42 hashar: on apache33 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache33
08:32 hashar: rebooting apache32 so all its service knows about /var/log :-]
08:30 hashar: on apache32 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache32

November 6

15:32 hashar: fixed up beta by re-pulling EventLogging extension which was in a weiiiiird state
15:12 hashar: Reset all extensions to their latest master version...
14:51 hashar: blank pages on beta are caused by the EventLogging extension being required although it is not pulled.
14:11 hashar_: configured apaches to send their errors log in /home/wikipedia/logs (conf file is /data/project/apache/conf/wmflabs-logging.conf )
10:45 wm-bot: stashing change in /h/w/c
10:14 hashar: made mwdeploy gitconfig file to support color + added the 'git lg' and 'git lg2' aliases which gives a nice + concise log
08:45 hashar: manually running the beta auto updater from -bastion instance

November 5

23:11 hashar: applied the new role::beta::autoupdater class to -bastion.
21:00 hashar: changing ownership of all files in /home/wikipedia/common to mwdeploy:mwdeploy as per deployment-bastion GID/UID. Running as root in screen 27986.pts-1.i-00000390 .
20:56 hashar: rebooting jobrunner06 to ensure that wmf-beta-autoupdater is gone
20:37 hashar: uninstalling beta updater from jobrunner06 , will be deployed on -bastion
20:16 hashar: rebooting deployment-dbdump (stalled git processes)

November 2

16:14 hashar: started mw-job-runner on jobrunner08
15:51 hashar: applying role::applicationserver::jobrunner to jobrunner08
13:47 hashar: created a second job runner instance: deployment-jobrunner07
13:18 labs-logs-bottie: petrb: rebooting -bastion to install updates
13:12 anomie: Rebooted deployment-dbdump to clear up hung processes, hopefully clear up NFS weirdness

October 31

09:06 wm-bot: running apt-get upgrade on -dbdump

October 30

15:47 wm-bot: running apt-get upgrade on -cache-upload3 and -sql
15:45 wm-bot: running apt-get upgrade on apaches boxes + squid

October 29

22:11 hashar: Added anomie as a sysadmin (for sudo) and netadmin

October 28

01:43 jeremyb: 23:35:40 < beta-logmsgbot> !log deployment-prep csteipp: running mwscript update.php dewikivoyage to update from 1.13 import
01:43 jeremyb: 23:31:42 < beta-logmsgbot> !log deployment-prep csteipp: added dewikivoyage config to db-wmflabs.php (temp hack)
01:43 jeremyb: 23:30:46 < beta-logmsgbot> !log deployment-prep csteipp: added dewikivoyage to all-wmflabs.dblist
01:43 jeremyb: 23:23:35 < beta-logmsgbot> !log deployment-prep csteipp: ran mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki de wikivoyage dewikivoyage de.wikivoyage.beta.wmflabs.org

October 23

10:00 labs-logs-bottie: j: update DocumentRoot of upload.beta.wmflabs.org to /mnt/upload7 in /usr/local/apache/conf/upload.conf (fixes 403 for videos)

October 22

23:14 hashar: getting to bed :-]
23:14 hashar: Applying database updates to all wiki (in a screen on -dbdump)
23:14 hashar: Started a manual l10n rebuild in a screen on -dbdump
23:13 hashar: Fixed the jobrunner spamming dberror.log (the all-wmflabs.dblist contained databases from production)
23:13 hashar: log

October 21

17:50 beta-logmsgbot: csteipp: reimporting enwikivoyage database

October 19

20:27 Damianz: csteipp: ran git pull of master, kept local dblist conflicts

October 17

20:43 hashar: applying nfs::upload::labs to apache32 and 33. It is no longer applied by the role::applicationserver class (prod applies nfs::upload directly on nodes)
20:30 hashar: moving /data/project/upload6 to /data/project/upload7 to match production. See bug 41121
13:44 hashar: -sql02 removed ganglia from host and reran puppet.
13:41 hashar: Added CSteipp and Reedy as sudoers
12:03 hashar: Fixed assets on bits. The static-master symbolic links got removed at some point. See 28337
11:22 hashar: updated mediawiki-config : 6bbf8f2..7caabad
11:18 hashar: emptied /var/log/glusters/data-deployment.log huge files on several instances
10:34 hashar: deployment-jobrunner06 / was filled out by Gluster logs /var/log/glusterfs/data-project.log filled it all :(
10:28 labs-logs-bottie: j: create new videoscaler instance deployment-video05 this time with sql access
08:31 hashar: -sql02 : manually installed mysql server using /mnt/mysql as datadir.
08:17 hashar: removed role::db::core from -sql02 : class is not meant for labs :-]
08:04 hashar: attempting to deploy role::db::core class on deployment-sql02

October 15

14:40 labs-logs-bottie: root: 4% freed of /home weeeee
14:37 labs-logs-bottie: root: moving /home/wikipedia/logs/ to /data/project/logs
14:32 labs-logs-bottie: root: moving /home/johnduhart/ to /data/project/old/h/johnduhart/
10:14 labs-logs-bottie: j: create new videoscaler instance deployment-video04

October 9

22:01 hashar: applying misc::beta::scripts on a precise instance: -jobrunner06
20:46 hashar: moving misc::beta::scripts from -integration to -dbdump
20:39 hashar: Updating all DB following the merge of content handler patch : foreachwiki update.php --quick

September 24

18:43 wm-bot: rebooting -jobrunner06 and
18:41 wm-bot: stopped beta auto updater on -integration and running apt-get dist-upgrade
18:39 wm-bot: stopped jobrunner and killed -9 PHP processes on -jobrunner06
18:39 hashar: running dist-upgrade on -jobrunner06
13:25 Damianz: abcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmnopqrstuvqxyzabcdefghijklmno
13:25 hashar: shut down deployment-cache-bits02
13:24 Damianz: migrating bits from deployment-cache-bits02 to deployment-cache-bits03
13:23 hashar: log me please
13:22 wm-bot: foo

September 14

17:25 wm-bot: updated mediawiki-config : Updating 4d12ee3..2c14daf
17:22 wm-bot: removed a live hack enabling AFTv5 on all wikis

September 13

23:26 hashar: migrate -dbdump misc::scripts to misc::deployment::scripts

September 7

16:04 wm-bot: updating mediawiki configuration

August 31

13:04 wm-bot: ran git pull in /home/wikipedia/common/php/extensions , bringing up a ton of forgotten exts. Will update autoupdater
12:52 wm-bot: rebuilding l10n cache
12:52 labs-logs-bottie: petrb: fixing repo for OSB
09:22 labs-logs-bottie: petrb: rebuilding localization
09:13 labs-logs-bottie: petrb: OSB doesn't seem to be installed properly, investigating
09:08 labs-logs-bottie: petrb: deployed OSB to enwiki
09:07 labs-logs-bottie: petrb: inserted OSB to update-extensions.sh and extensions

August 30

21:22 hashar: Deployed the automatic code updater on beta. It is running on deployment-integration, service is wmf-beta-autoupdate managed by puppet to always run.
20:55 wm-bot: applying beta::scripts to deployment-integration
20:38 hashar: trying out 22116 on deployment-integration (that is the beta auto updater)
16:39 wm-bot: updating all extensions and core to their latest master version
09:00 labs-logs-bottie: petrb: php multiversion/MWScript.php changePassword.php --wiki enwiki --user Petrb --password needed to change
08:12 labs-logs-bottie: petrb: /home/wikipedia/common/php git pull

August 29

08:05 wm-bot: deployment-integration : update puppet git repo to latest master

August 28

15:03 labs-logs-bottie: petrb: updated puppet
14:58 labs-logs-bottie: petrb: we have a new bastion :D
14:58 labs-logs-bottie: petrb: fixed mounts
10:20 labs-logs-bottie: root: rebooting bastion
10:14 labs-logs-bottie: root: test
07:48 wm-bot: migrated Apaches boxes from applicationserver::labs to role::applicationserver
07:41 wm-bot: restarted apache process on Apaches boxes

August 27

15:43 wm-bot: l10n cache rebuild
13:31 wm-bot: rebuilding list of extension messages and rebuilding localization cache
08:14 wm-bot: reverted live hacks made to ConfirmEdit extension. Fix saved in file PlatonidesPatch and in git stash
08:05 hashar: Updating all extensions to latest master
08:05 hashar: updating MediaWiki core : Updating d47c1e9..1c00630

August 20

23:37 j^: replace deployment-video02 with deployment-video03 using rolepuppet class from git (role::jobrunner::videoscaler)

August 17

11:03 Platonides: Logging captcha results at /data/project/bug39446.log with live hack for bug 39446

August 3

15:57 hashar: cache-bits02 would need 13304 to be merged in manually whenever the change is completed. Make sure to report any issue to mark :-]
15:31 hashar: updating cache-bits02 puppet to gerrit 13304/21
13:47 hashar: deployed 13304 PS 14 on cache-bits02
13:38 hashar: cache-bits02 : cd /var/lib/git/operations/puppet sudo GIT_SSH=/var/lib/git/ssh git fetch origin refs/changes/04/13304/14 && git checkout -b 13304/14 FETCH_HEAD (aka deploying 13304 patchset 14)
10:21 wm-bot: running 'git submodule foreach git gc --aggressive' in screen 2042
10:11 wm-bot: rebooting swapped -dbdump
10:10 wm-bot: Probably killed -dbdump by launching 4 instances of git gc --aggressive :-(((((((((((
10:09 wm-bot: hashar is going to kill us all
10:07 wm-bot: running massive git gc --aggressive on all extensions
09:54 wm-bot: Updating extensions to latest master: git submodule foreach git pull
09:54 wm-bot: hashar :P
09:53 hashar: Updating mediawiki core to latest master: Updating 0c1471c..d47c1e9 (fast forwarded)
09:49 hashar: On extensions: git submodule foreach git checkout master
09:13 hashar: deleting deployment-bastion, it lacks a DNS entry bug 38846
09:12 hashar: running git gc --aggressive in /home/wikipedia/common/php-master
08:48 hashar: making sure perms are correct in /data/project/apache/common-local/php-master : chmod -R g+w . ; chown mwdeploy:svn -R * .*
08:45 hashar: On -dbdump /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start bug 38995
08:35 hashar: running puppet on Apache32 / 33
08:12 hashar: dist-upgrading all instances to get the latest GlusterFS version (3.3.0)
08:11 hashar: Dist upgrading apache32 and 33 and rebooting them
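
The cache-bits02 entries above deploy a pending Gerrit change by fetching a ref such as refs/changes/04/13304/14. Gerrit builds that ref from the last two digits of the change number, the change number, and the patchset number. A minimal sketch of the naming rule follows; the function name gerrit_change_ref is made up for illustration and is not part of any tool mentioned in this log.

```shell
# Hypothetical helper: print the Gerrit ref for a change number and patchset.
# Layout: refs/changes/<last two digits of change>/<change>/<patchset>
gerrit_change_ref() (
    change=$1
    patchset=$2
    # change % 100 gives the last two digits; %02d zero-pads them (4 -> 04).
    printf 'refs/changes/%02d/%s/%s\n' "$((change % 100))" "$change" "$patchset"
)
```

For example, `git fetch origin "$(gerrit_change_ref 13304 14)" && git checkout -b 13304/14 FETCH_HEAD` reproduces the fetch logged at 13:38, assuming the remote is named origin as it is there.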

August 2

18:06 andrewbogott: Migrated all instances to new hardware
15:14 hashar: yeah we lost udp2log again! -dbdump : /etc/init.d/udp2log-mw restart
07:58 hashar: bug 38748 deleting unused/corrupted deployment-wmsearch instance. (had stuff like: -bash: /usr/bin/groups: cannot execute binary file. Connection to deployment-wmsearch.pmtpa.wmflabs closed.)

July 30

15:00 hashar: deployment-bastion does not let us log in despite being a fresh instance. Logged as bug 38846
14:47 hashar: rebooting -bastion
12:48 hashar: Shutting down deployment-nfs-memc for a while, will see if it is still needed around or if we can safely delete it (see bug 38084). All data should be on /data/project .
12:43 hashar: Recreating deployment-bastion using a Precise image and s1.small (1CPU, 1GB RAM, 80G storage)
12:40 hashar: Deleting -bastion , was corrupted.

July 28

03:38 labs-logs-bottie: j: only have one commons wgForeignFileRepos: wikimediacommons at commons.wikimedia.beta.wmflabs.org (/data/project/apache/common/wmf-config/filebackend-wmflabs.php)
01:09 Platonides: testwiki is now showing random captchas
01:09 Platonides: moved /mnt/upload6/captcha/random to /mnt/upload6/private/captcha/random
00:22 Platonides: generating random-challenge captchas at /mnt/upload6/captcha/random

July 27

23:01 Platonides: Running time python captcha.py --output /mnt/upload6/private/captcha --font /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf --count 500 --dirs 3 --key "$(grep -Po '(?<=wmgCaptchaSecret = (["'"'"'])).*(?=\1)' /data/project/apache/common-local/wmf-config/PrivateSettings.php)" --wordlist <( < /usr/share/dict/american-english tr '[A-Z]' '[a-z]' | grep -E '^.{4,5}$' | grep -vE '(.)\1$' | grep -vE '^(.)\1' | LANG=C gre
22:50 Platonides: Filesystem corruption signs in deployment-bastion, most debconf backends /usr/share/perl5/Debconf/FrontEnd are zeroed files. This explains some of the earlier apt-get problems, and maybe also bug 38747 (aka. magic tail -f fix)
22:49 Damianz: Thumbnails are currently broken - thumbs folder seems empty, images are there as expected - beta quirk or need rebuilding?
22:22 Damianz: DocumentRoot got wiped, broken all the sites - fixed the broken symlink on bastion for /data/project/apache to /usr/local/apache. Also fixed common-local to common symlink.
21:56 Platonides: apt-get removed wikimedia-task-appserver in deployment-bastion :(
21:42 Platonides: removed generated captcha files
21:33 Platonides: installed in deployment-bastion the packages joe, python-imaging and wamerican
20:29 labs-logs-bottie: j: deployment-bastion: removing all deployment-nfs-memc entries from /etc/fstab
13:14 hashar: restarted job runner on jobrunner06
13:11 hashar: removing all deployment-nfs-memc entries from /etc/fstab
09:51 hashar: bug 38749 jobrunner06 : removed /usr/local/apache and /mnt/upload6 empty dir. Downgraded PHP manually. Rerunning puppet.
08:46 hashar: migrating jobrunner06 to use the /data/project for uploads

July 26

19:27 labs-logs-bottie: j: delete broken deployment-video01 instance
18:42 labs-logs-bottie: j: new instance deployment-video02 with videoscaler with access to gluster instead of nfs
16:15 hashar: applying nfs::apache::labs on -dbdump to get /usr/local/apache from /data/project
16:12 hashar: -dbdump umounted deployment-nfs-memc:/mnt/export/apache on /usr/local/apache
13:56 hashar: archiving -nfs-memc:/mnt/export : root@deployment-nfs-memc:/mnt# mv export /data/project/deployment-nfs-memc_mnt-export_backup
13:46 hashar: seems to work fine with /data/project now.
13:46 hashar: rsync finished for both apache and upload6. Remounting and restarting apaches
13:35 hashar: rsync from nfs-memc:/mnt/export/upload6 to /data/project/upload6 completed. YEAHHH
13:32 hashar: on -dbdump, unmounted /mnt/upload (from nfs-memc). Please use the /mnt/upload6 -> /data/project/upload6
13:29 hashar: root@deployment-nfs-memc:/mnt/export# rsync -a --progress --delete --inplace /mnt/export/apache /data/project
13:22 hashar: applying nfs::upload::labs to -dbdump so it uses /data/project/upload6 at /mnt/upload6
13:15 hashar: manually umount /mnt/upload6 on apaches
13:14 hashar: stopping apache backends
13:10 hashar: Manually running puppet for 15545 which should fix bug 38084 uses /data/project instead of NFS instance
09:35 hashar: Regenerating captcha with the new shared key fixed bug 38699
09:26 hashar: Deleted all captchas in /mnt/upload6/private/captcha and regenerate them using 1000 and dirs=3
09:24 hashar: forgot to set a directory level. python php-master/extensions/ConfirmEdit/captcha.py --wordlist=/usr/share/dict/words --font=/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf --key=******* --output=/mnt/upload/private/captcha --count=1000 --dirs=3
09:22 hashar: Set $wmgCaptchaSecret in the local file common/wmf-config/PrivateSettings.php and used that value in captcha.py
09:21 hashar: On dbdump, regenerating captcha using: python php-master/extensions/ConfirmEdit/captcha.py --wordlist=/usr/share/dict/words --font=/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf --key=********* --output=/mnt/upload/private/captcha --count=1000

July 25

17:00 labs-logs-bottie: j: clear messages after updating localization(cache/l10n) to get new messages in TMH: php MWScript.php ../php/extensions/WikimediaMaintenance/clearMessageBlobs.php --wiki=aawiki
14:43 hashar: rebooting bits02
14:39 hashar: dist-upgrade on cache-bits02, will reboot after that (bits.beta.wmflabs.org will be disabled while it reboots)
14:24 hashar: fixed puppet on cache-bits02 : ln -s /var/lib/git/operations/puppet/modules /etc/puppet/modules . That was an empty directory, which prevented puppet from finding the modules and made it break when trying to install ntp::client
14:19 hashar: root@deployment-cache-bits02:/var/lib/git/operations/puppet(git:13304/10)# git fetch anonymous refs/changes/04/13304/12 && git checkout -b 13304/12 FETCH_HEAD
13:48 hashar: rebooting apache33 so it can fsck /dev/vdb
00:02 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=commonswiki --threads=2

July 24

21:58 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=aawiki --threads=2
18:25 hashar: Instances send their syslog again! To deployment-dbdump for now 14090
15:42 hashar: on deployment-integration, applied 15545 patchset 7 to test out the symlinks from /data/project/upload6 to /mnt/upload6 .
15:32 hashar: rerunning rsync with --delete : root@deployment-nfs-memc:/mnt/export# rsync -a --progress --delete --inplace /mnt/export/upload6 /data/project
15:26 hashar: root@deployment-nfs-memc:/mnt/export# rsync -a --progress --inplace /mnt/export/upload6 /data/project
15:00 hashar: banned another /22 at squid level.
14:45 hashar: banned, at squid level, a crawler hosted on OVH. Just added the IP to squid.conf blacklist :)
13:42 hashar: Ran dist-upgrade on deployment-dbdump and rebooting. Will break udp2log loggers.
13:40 hashar: Rebooting all apaches
13:35 hashar: Running "apt-get dist-upgrade" on apache{32,33} to fix PHP5 using ubuntu packages instead of wmf packages. Upgrade kernel.

July 23

16:03 hashar: hopefully half fixed the udp2log on deployment-dbdump . Need several changes in the puppet files though cause the udp2log-mw init script seems to conflict with the udp2log one :/
13:41 hashar: rebooting -dbdump to make sure everything works fine :D
13:40 hashar: udp2log restored on beta!!! Still in /home/wikipedia/logs/ and logged by deployment-dbdump
13:11 hashar: applying role::logging::mediawiki to -dbdump (will bring log2udp)
09:13 hashar: updating MediaWiki extensions
09:11 hashar: updated mediawiki/core: Updating ef3132f..f8de6a7
09:10 hashar: updating core + extensions to their latest master versions
09:09 hashar: Updated mediawiki-config Updating 96ba09e..66ca8b0

July 18

14:38 hashar: New / rebooted instances are no longer accessible : bug 38473 - instances can not boot / reboot anymore
13:46 hashar: deleting upload01 (screwed somehow)
13:46 hashar: creating deployment-cache-upload03 to replace upload01
13:44 hashar: deployment-cache-upload01 seems screwed : waiting for metadata service at http://169.254.169.254/2009-04-04/meta-data/instance-id . Failed DHCP acquisition ? => rebooting
13:08 hashar: deployment-cache-upload01 : running apt-get upgrade / dist-upgrade and rebooting
10:22 hashar: copying apache dir to /data/project . Run as root@deployment-nfs-memc in a screen session
09:02 hashar: adding nfs::apache::labs and nfs::upload::labs to deployment-integration
08:59 hashar: Applying 15545 to deployment-integration
08:26 hashar: Created deployment-integration to be used as a puppetmaster::self host

July 17

21:26 Platonides: installed python-imaging and wamerican on deployment-dbdump
21:12 beta-logmsgbot: petrb: updating ArticleFeedbackv5 extension
19:51 hashar: 369 languages rebuilt out of 369
19:45 hashar: rebuilding l10n cache: mwscript rebuildLocalisationCache.php --wiki=aawiki --threads=2
19:39 hashar: beta broken by PAGEID magic word introduced with 0a7cf03 / I11d42ca7 9858
19:32 hashar: running git bisect of core 80fbb70..ef3132f
19:26 hashar: upgrading MediaWiki core 80fbb70..ef3132f
19:20 hashar: updated AFTv5: f97811f..d3bd97f
19:04 hashar: updated robots.txt to specify a user-agent. Will definitely prevent Google from killing beta :)
18:54 hashar: squid resumed. The swap files got corrupted somehow, needed to delete them entirely to start again. Squid storing again.
18:40 hashar: -squid bah doing rm -fR /data/project/squid1/*
18:39 hashar: installed `tree` on deployment-squid
18:38 hashar: removing swap files in /data/project/squid1
18:36 hashar: Squid is bugged as hell : 2012/07/17 18:36:13| Store rebuilding is -0.1% complete and looping
18:21 beta-logmsgbot: hashar: rebooting squid glusterfs gone wild apparently
14:45 hashar: Blacklisted user agents matching /.*Googlebot.*/
13:45 hashar: Manually restarted apaches
13:44 hashar: Imported all.conf apache conf from production
13:29 hashar: err: /Stage[main]/Mediawiki::Sync/Exec[mw-sync]: Failed to call refresh: Command exceeded timeout at /etc/puppet/manifests/mediawiki.pp:24
13:28 hashar: All apaches are dead :/
09:26 hashar: Adding class role::applicationserver::jobrunner
09:20 hashar: sync upload6 dirs again. root@deployment-nfs-memc:$ rsync -a --progress --inplace /mnt/export/upload6 /data/project/upload6
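
The rsync entries in this log use two destination forms for the same source: /data/project (July 24) and /data/project/upload6 (July 11 and July 17). With rsync, a directory source without a trailing slash is copied as a directory into the destination, while a source with a trailing slash copies only its contents. A minimal sketch of that rule follows; rsync_landing_dir is a hypothetical helper that only computes the resulting path and never calls rsync.

```shell
# Hypothetical helper: print where the contents of a directory source end up
# for `rsync -a SRC DST`. It only models the trailing-slash rule.
rsync_landing_dir() (
    src=$1
    dst=${2%/}   # ignore one trailing slash on the destination
    case $src in
        */) printf '%s\n' "$dst" ;;                            # contents go straight into DST
        *)  printf '%s/%s\n' "$dst" "$(basename "$src")" ;;    # SRC becomes a subdirectory of DST
    esac
)
```

Under this rule, `rsync_landing_dir /mnt/export/upload6 /data/project/upload6` prints /data/project/upload6/upload6, whereas the /data/project form lands in /data/project/upload6. Running the real command with `--dry-run` first shows which layout will result.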

July 16

19:36 hashar: rebuilding localisation cache
19:33 hashar: Updated ArticleFeedback and ArticleFeedbackv5 to latest master. Dropped their tables and ran the updater ( mwscript update.php --quick --wiki=enwiki ). Solves bug 38422 - trash and redo ArticleFeedbackv5 on beta enwiki. See http://en.wikipedia.beta.wmflabs.org/wiki/Special:ArticleFeedbackv5
15:55 hashar: Updated ArticleFeedbackv5 778f089..ccbc585

July 11

16:04 hashar: Created upload.beta.wmflabs.org to point to deployment-cache-upload01 ( 208.80.153.242 )
15:46 hashar: just logging the rsync command: root@deployment-nfs-memc:/data/project/upload6# rsync -a --progress --inplace /mnt/export/upload6 /data/project/upload6
15:41 hashar: started rsync of /mnt/export/upload6 to /data/project/upload6
14:55 hashar: deployment-cache-bits02 now serves bits.beta.wmflabs.org using pending gerrit changes 15445 and 13304 (using varnish configuration from production)
12:51 hashar: created deployment-cache-upload01 a Lucid instance to serve http://upload.beta.wmflabs.org/
12:49 hashar: deleting cache-upload2 , need a Lucid instance.
12:48 hashar: cache-upload2 set squid_coss_disks to vdb
12:44 hashar: applying role::cache::upload to deployment-cache-upload02
12:27 hashar: Creating deployment-cache-upload02 to replace deployment-cache-upload and serve http://upload.beta.wmflabs.org/
12:21 hashar: Deleting deployment-cache-bits which was corrupted and replaced it by deployment-cache-bits02
12:18 hashar: moved bits.beta.wmflabs.org from deployment-cache-bits to deployment-cache-bits02
12:09 hashar: Enabling puppetmaster::self on deployment-cache-bits02 to get Varnish config (13304)
09:55 hashar: Applying role::cache::bits::labs to deployment-cache-bits02
09:42 hashar: Manually started mw-job-runner on jobrunner06
09:41 hashar: pointed common/php from 'php-1.20wmf6' to 'php-master'
09:25 hashar: updated mediawiki-config on dbdump to latest version
08:55 hashar: add applicationserver::homeless to jobrunner06
08:16 hashar: Squid updated to use apache 32 and 33. Deleted Apaches 30 & 31
08:01 hashar: Despooling apaches 30 and 31, spooling apaches 32 and 33

July 10

19:22 hashar: rebooting apache32 and apache33 (puppet run finished)
19:02 hashar: running puppets -tv on apache32 and apache33. Should make them able to serve Apache traffic after reboot.
17:32 hashar: creating apache32 and apache33 to replace the corrupted apache30 and apache31 instances

July 5

12:15 hashar: Did some documentation work on Deployment/Overview
11:20 hashar: added a bunch of spammers in /home/wikipedia/common/wmf-config/mwblocker.log which would block them

July 3

19:43 hashar: bug 38118 fixed http://deployment.wikimedia.beta.wmflabs.org/ (was missing the docroot, submitted to gerrit https://gerrit.wikimedia.org/r/14099 )
19:15 hashar: touched wmf-config/mwblocker.log
19:05 hashar: no machine sends its syslog anywhere, some change got lost during the merge. See https://gerrit.wikimedia.org/r/14090
18:44 hashar: Successfully deleted instance deployment-deb, but failed to remove deployment-deb DNS entry. ohno
09:56 hashar: Asked reboot for deployment-transcoding through labsconsole

July 2

20:50 hashar: restarting squid to purge whole cache (yeah I know that is lame)
20:47 hashar: Removed Hydriz from deployment-prep. Messed up the whole dblist files :-D Contact me! ;)
20:46 hashar: set back robots.txt to disallow /
20:28 hashar: deployed a hack in mw config using https://gerrit.wikimedia.org/r/#/c/13932/ . That is a simple git fetch, pending review.
19:47 hashar: hacking to get files on bits.beta
10:19 hashar: Easy fix for the leap second bug: /etc/init.d/ntp stop; date `date +"%m%d%H%M%C%y.%S"`; /etc/init.d/ntp start
10:06 Hydriz: ukwiki was giving errors regarding flaggedrevs's flaggedpages table not existing. Fixed it by running mwscript update.php ukwiki.
07:50 hashar: Gave access to the cluster to Hydriz
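
The 10:19 entry stops ntp, sets the clock to the time it already shows, and starts ntp again. The format string `%m%d%H%M%C%y.%S` expands to MMDDhhmmCCYY.ss, the positional form `date` accepts when setting the clock. A minimal dry-run sketch follows; the helper names `clock_stamp` and `leap_second_fix` are hypothetical, and the function only prints the commands instead of running them:

```shell
#!/bin/sh
# Dry-run sketch of the leap second workaround from the 10:19 entry.
# clock_stamp and leap_second_fix are hypothetical helper names.

# Build the MMDDhhmmCCYY.ss timestamp that `date` accepts for setting the clock.
clock_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

# Print the three commands in order; nothing is stopped, set, or started.
leap_second_fix() {
    stamp=$(clock_stamp)
    echo "/etc/init.d/ntp stop"
    echo "date $stamp"
    echo "/etc/init.d/ntp start"
}

leap_second_fix
```

Running the printed commands for real needs root, which is why the sketch stays a dry run.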

July 1

17:43 hashar: manually rebooted most servers
10:00 hashar: rebooted apache31 due to leap second bug. Stopped mysql on apache30 which was using 100%CPU
06:59 hashar: rebooting all boxes

June 29

15:07 hashar: Removing thumbnails that have not been accessed in the last 15 days : sudo find . -atime +15 -wholename '*/thumb/*' -exec rm {} \;
15:02 hashar: deleted .nfs** files in /mnt/export/upload6/
08:58 hashar: restarted jobrunner service (had a wrong path pointing to common-backup)
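
The 15:07 cleanup can be wrapped so that it lists the candidates before anything is deleted. This is only a sketch: `prune_thumbs` is a hypothetical helper, `-path` is used as the portable spelling of `-wholename`, and `-type f` is added so directories are never handed to `rm`. It assumes the filesystem actually records access times; on a `noatime` mount the selection would be unreliable.

```shell
#!/bin/sh
# prune_thumbs ROOT DAYS [--delete]
# Hypothetical helper: lists (or, with --delete, removes) regular files under
# any thumb/ directory below ROOT whose last access is more than DAYS days old.
prune_thumbs() {
    root=$1
    days=$2
    action=${3:-}
    if [ "$action" = "--delete" ]; then
        # Batch the matches into as few rm invocations as possible.
        find "$root" -type f -path '*/thumb/*' -atime +"$days" -exec rm -- {} +
    else
        # Dry run: only print what would be removed.
        find "$root" -type f -path '*/thumb/*' -atime +"$days" -print
    fi
}

# Usage (the path is a placeholder):
#   prune_thumbs /path/to/uploads 15            # review the list first
#   prune_thumbs /path/to/uploads 15 --delete   # then delete
```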

June 28

16:07 hashar: apache conf uses site.conf :-(((( need to puppetize that one day
15:47 hashar: updating mediawiki-config to latest master
14:10 hashar: running puppet on apache{30,31}. The /etc/sudoers conflict has been merged in :)
11:03 hashar: migrated all wikis from php-trunk to php-master by editing wikiversions.dat. Refreshed wikiversions.cdb and renamed ExtensionMessages-trunk.php to ExtensionMessages-master.php
10:59 hashar: set group write on deployment-nfs-memc:/mnt/export/apache/common-local , which lets us rewrite the wikiversions.cdb file
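
The wikiversions.dat edit in the 11:03 entry amounts to swapping one version directory for another on every line. The sketch below assumes each line has the form `<dbname> <version-dir>`, optionally followed by further fields; that layout and the helper name `retarget_version` are assumptions, not taken from the log. The rewritten content goes to stdout so it can be reviewed before the real file is replaced, and wikiversions.cdb still has to be refreshed afterwards, as the entry notes.

```shell
#!/bin/sh
# retarget_version FILE OLD NEW
# Hypothetical helper: prints FILE with the second field changed from OLD to
# NEW on every line where it matches exactly; other lines pass through as-is.
retarget_version() {
    file=$1
    old=$2
    new=$3
    awk -v old="$old" -v new="$new" '$2 == old { $2 = new } { print }' "$file"
}

# Usage:
#   retarget_version wikiversions.dat php-trunk php-master > wikiversions.dat.new
#   diff wikiversions.dat wikiversions.dat.new   # review, then move into place
```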

June 27

10:40 hashar: made deployment-mc use the 'memcached' puppet class. Now uses 2000MB apparently
10:25 hashar: removed memcached from deployment-nfs-memc , it is running on deployment-mc nowadays.
10:21 hashar: rebooting deployment-mc for kernel upgrade
10:07 hashar: updating packages on deployment-cache-bits
09:42 hashar: deleted deployment-thumbproxy instance. We are not going to replicate the production thumbnailing architecture
09:16 hashar: -transcoding : dpkg --purge linux-image-2.6.32-37-virtual linux-image-2.6.32-318-ec2 linux-image-2.6.32-34-virtua
09:14 hashar: upgrading deployment-transcoding

June 26

20:16 hashar: deleted deployment-syslog instance. It is of no use till we have a way to setup syslog server on labs bug 36748 (syslog-ng conflict with rsyslog from base::??? puppet class)
20:13 hashar: Removed misc::mediawiki-logger from deployment-feed. Was replaced by some new udp2log system I can't understand. So for now, -feed is locally hacked and does not rely on puppet anymore.
19:50 hashar: deployment-feed removed wireshark then ran 'apt-get autoremove' , various X11 packages got removed. Now up to 262MB free.
19:46 hashar: deployment-feed removed some old kernels apt-get remove --purge linux-image-2.6.32-318-ec2 linux-image-2.6.32-342-ec2 linux-image-2.6.32-38-virtual linux-image-2.6.32-34-virtual
19:39 hashar: deployment-feed is now out of disk space :-(
19:17 hashar: Removed role::cache::bits from deployment-cache-bits. Only works in production.
18:32 hashar: Uninstalled the pecl PHP parsekit extension, manually installed php5-parsekit package instead bug 37076
11:06 hashar: Files migrated. A copy of the old common is in /usr/local/apache/common-back
10:12 hashar: migrating beta to use operations/mediawiki-config
08:29 hashar: restarted the job runner on jobrunner05 several times. It eventually started working again :-(
07:50 hashar: restarted udp2log
07:48 hashar: killing python demux on deployment-feed

June 25

15:14 hashar: Deleted InitialiseSettingsDeploy.php (no longer used). Replaced by InitialiseSettings-wmflabs.php
15:00 hashar: updating Ubuntu on deployment-transcoding

June 23

13:02 labs-logs-bottie: petrb: deploying 37852

June 22

13:18 labs-logs-bottie: petrb: updated /usr/local/apache/common/wmf-config/InitialiseSettingsDeploy.php to match the feed I just made :)
11:17 hashar: Created /etc/wikimedia-realm file containing 'labs' on -dbdump, -apache30, -apache31 and -jobrunner05. Related puppet change is https://gerrit.wikimedia.org/r/#/c/12377/
10:07 hashar: bug 37116 removed deployment-nfs-memc cronjob /var/fs which did some nasty recursive file changes. Has been disabled since May 25th anyway.
09:08 hashar: Deleting hostname mobile.beta.wmflabs.org and releasing 208.80.153.244

June 21

15:36 hashar: Closed bug 37217 - thumbnail extraction for videos needs newer ffmpeg
15:36 hashar: Closed bug 37500 - migrates Apaches boxes to precise
15:02 hashar: updating MediaWiki to 80fbb70 (latest master)

June 20

13:46 hashar: apache-31 : readding applicationserver::labs and imagescaler::labs
13:40 hashar: upgrading packages on -squid
13:38 hashar: updating package on -dbdump

June 18

15:37 hashar: running apt-get upgrade on apache30 and apache31

June 16

19:27 beta-logmsgbot: hashar: updating WikimediaMaintenance to get commits 1887339 913bcb8

June 14

15:46 hashar: redeleting deployment-apache20

June 13

16:42 hashar: squid: depooling apache 20 - 24, pooling apache 30 & 31
16:37 hashar: Disabled CheckUser extension again

June 12

21:11 hashar: Rebooting apache30 and 31 so they apply pending package updates. Off for today.
21:07 hashar: Configuring apache30 and 31 to use applicationserver::labs and imagescaler::labs

June 4

22:04 hashar: Made myself a steward using a database query on `labswiki` : insert into user_groups VALUES (183,'steward');
21:57 hashar: Reenabled the CheckUser extension on beta labs so we can actually use the checkuser audit function ;-)

June 3

13:20 labs-logs-bottie: petrb: installing git on a bastion

June 2

15:36 hashar: We ran out of beer, see bug 37307
13:21 labs-logs-bottie: petrb: disabling checkuser per request from Ryan
10:20 labs-logs-bottie: root: setting up bastion
09:16 hashar: Rebooting -dbdump; it could not mount some NFS export and was waiting for user input.

June 1

16:37 hashar: Created deployment-deb instance to build packages :D

May 31

08:18 hashar: Squid answering again :-D
08:00 hashar: Rebooting -squid using nova web interface
07:53 beta-logmsgbot: hashar: restart failed, puppet dead, various squid related process in zombie mode --> rebooting deployment-squid
07:47 beta-logmsgbot: hashar: restarting squid, seems stalled
07:43 beta-logmsgbot: hashar: restarted udp2log daemon on -feed (15 to 20 python <defunct> processes there)

May 30

19:26 hashar: jobrunner05 is happy again. Hurrah
19:23 hashar: updating mediawiki/core to master 58f390e to finish job loop fix
19:23 hashar: updating mediawiki/core to master
19:09 hashar: jobrunner05 CPU usage is due to some job infinite loop. Working on it.
18:35 hashar: Sara made ganglia available on Ubuntu Precise and hence jobrunner05 shows up at http://ganglia.wmflabs.org/latest/?c=deployment-prep&h=deployment-jobrunner05&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
14:54 hashar: Updating mediawiki/core to 9780085 (aka just https://gerrit.wikimedia.org/r/#/c/9397/ which fixes a wrong class name in job system)
14:39 hashar: Migrating apaches from imagescaler class to imagescaler::labs
14:30 beta-logmsgbot: hashar: migrating apache boxes from applicationserver::homeless to the new applicationserver::labs
09:17 hashar: Restarted update.php in a screen session
09:15 beta-logmsgbot: hashar: foreachwiki update.php --quiet --quick
08:54 beta-logmsgbot: hashar: updating extensions
08:53 beta-logmsgbot: hashar: HEAD is now at 8c65834 Add new message 'brackets' and use it to kill some hardcoded []s.
08:49 beta-logmsgbot: hashar: updating core to 8c65834
08:43 hashar: bug 37199 going to upgrade core / extensions to latest master

May 29

21:17 hashar: Fixed Amazon Elastic Cloud ban. Properly fixing bug 37173 hopefully
20:41 hashar: rebooting jobrunner05 following some package installs made earlier by puppet
20:39 hashar: manually running puppet on jobrunner-05

May 26

18:14 beta-logmsgbot: hashar: killed webtranscode job on commons
05:59 beta-logmsgbot: hashar: Edited squid.conf to limit memory to 1G and restarted squid
05:42 beta-logmsgbot: hashar: squid was killed by linux OOM!

May 25

15:18 beta-logmsgbot: hashar: deleted jobrunner06 (precise), we just need one precise instance which will be jobrunner05 for now
14:45 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner06
14:42 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner05
14:27 beta-logmsgbot: hashar: installed jobrunner05 and 06 using Ubuntu precise. Should let us get a 0.27 ffmpeg installation for bug 37043
08:38 beta-logmsgbot: root: on dbdump, deleted /etc/logrotate.d/mw-udp2log . Most probably in conflict with the one from deployment-feed which hosts the udp2log process
08:35 beta-logmsgbot: root: gzipped /home/wikipedia/logs/archive/*20120525 see bug 37012 :-(
08:23 beta-logmsgbot: hashar: killed stuck jobs on jobrunner 02 and 03. Restarted loop.

May 24

11:52 beta-logmsgbot: hashar: Rewrote log command to use dologmsg and the new beta-logmsgbot
11:51 beta-logmsgbot: hashar: yeah I do log
11:47 hashar: Moving /bin/log to /usr/local/bin/log
11:45 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
09:43 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
09:42 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit`
07:05 hashar: killed some stalled jobs on jobrunner02
02:56 labs-logs-bottie: jeremyb: foo

May 23

19:54 hashar: rebooting apache20 following installation of imagescaler puppet class
19:49 hashar: rebooting apache23 following installation of imagescaler puppet class
19:46 hashar: rebooting apache22 following installation of imagescaler puppet class
19:15 hashar: rebooting apache21 following installation of imagescaler puppet class
18:48 hashar: Adding puppet class 'imagescaler' on all deployment-apacheXX instances in an attempt to fix thumbnails
18:36 labs-logs-bottie: hashar: relocalisation cache done 367/367 languages rebuilt
18:28 labs-logs-bottie: hashar: running `mwscript rebuildLocalisationCache.php --wiki=aawiki` for bug 36806
16:22 labs-logs-bottie: hashar: delete all 3 webVideoTranscode jobs from enwiki database
16:10 hashar: rebooted jobrunner03 to check everything works fine there
15:41 hashar: deleting jobrunner01, it is crashed beyond repair. Will create a new one named jobrunner03
15:27 hashar: rebooting jobrunner01 to see how it goes
14:56 hashar: stopped job runner on jobrunner01, unmounted /mnt/upload6 and /mnt/
14:37 hashar: running puppet on job runner to check change 8584 & 8585 worked

May 22

13:49 hashar: Deleting jobrunner03 and 04, not going to need them after all
13:24 hashar: deleting refreshLinks2 jobs from enwiki database
12:52 hashar: deleting deployment-jobrunner{3,4}: installation failed, I got permission denied. Will recreate them using same hostname
11:55 hashar: create two more job runner instances
10:09 hashar: Removed deployment-webs instance which was meant to emulate the HTTPS access. Hacky and low priority for now, we will need to set up an nginx proxy one day to properly replicate the production infrastructure.
09:39 labs-logs-bottie: hashar: rebooting jobrunner02 just to be sure it is properly loaded up
09:30 labs-logs-bottie: hashar: jobrunner logs are available in /home/wikipedia/logs/runJobs.log now
09:25 hashar: Fixed udp2log not being able to add new log files in /home/wikipedia/log , that dir needs to be writable by the udp2log user! See https://gerrit.wikimedia.org/r/8442 | https://bugzilla.wikimedia.org/37014
08:54 hashar: purged all logs from /home/wikipedia/logs/archive/ just to be safe
08:41 hashar: restarted udp2log on -feed (again)
08:23 hashar: started job loop on deployment-job-runner02
07:49 hashar: installing jobrunner2
05:11 hashar: creating a second job runner instance deployment-jobrunner02 . Will apply puppet classes later on.
03:47 labs-logs-bottie: hashar: (Bug 36870) deleting deployment-web{,3,4,5}

May 21

21:11 hashar: deployment-nfs-memc : fix user right for upload6 : chown apache /mnt/export/upload6
21:11 hashar: On deployment-nfs-memc : added apache (uid 48) entry in /etc/passwd
21:11 hashar: Adding Faidon and Platonides to default CC list of "deployment-prep (beta)" component
21:11 hashar: In Bugzilla, I have removed Petr Bena as a default assignee of bugs opened for "deployment-prep (beta)" component. Default is now "Nobody", Petr is on CC. That will make bug triage a bit easier.
08:58 hashar: rerebooting deployment-feed
08:53 hashar: Looks like -feed is dead : EXT3-fs: INFO: recovery required on readonly filesystem.
08:49 hashar: rebooting deployment-feed
08:22 hashar: installing iotop on deployment-nfs-memc

May 20

02:02 hashar: on -nfs-memc, running 'chown -R 48 /mnt/export/upload6' so files get owned by user apache on apaches and job runner boxes
01:05 hashar: might have disabled IRC notification by setting wgRC2UDPAddress in InitialiseSettingsDeploy.php
00:46 labs-logs-bottie: hashar: jobrunner01 seems to start catching up with jobs
00:42 labs-logs-bottie: hashar: Created a dumb `aawiki` database

May 19

22:33 Platonides: Password changed for Platonides
22:33 Platonides: http://ee-prototype.wikipedia.beta.wmflabs.org/ fails with No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.
11:46 labs-logs-bottie: petrb: creating aa wiki

May 18

17:39 labs-logs-bottie: j: add /apache symlink on deployment-transcoding
16:52 hashar: added 'aawiki' to all.dblist and made a symbolic link to it named wmflabs.dblist
16:42 hashar: added fake 'aawiki' entry to wikiversions.dat
16:37 hashar: started mediawiki job runner on -jobrunner01
16:27 hashar: Removed apache:: puppet class, uses the application:: ones instead
15:24 hashar: Well MaxSem fixed mobile Frontend :-D
15:22 hashar: rewound MobileFrontend to before 9db8dc94b1b83999931fca3d0edf5e22ab1effb3 ( https://gerrit.wikimedia.org/r/#/c/7795/ )
15:05 hashar: Running: foreachwiki update.php --quiet --quick
15:00 labs-logs-bottie: hashar: rebooting jobrunner01
14:55 hashar: updating all extensions
14:48 hashar: /home/wikipedia/common/php-trunk now tracks mediawiki/core.git , branch master. So a simple 'git pull' will update it!
14:47 hashar: updating MediaWiki
14:31 hashar: adding puppet class applicationserver::jobrunner
11:51 hashar: puppet running again on -syslog \o/
10:55 hashar: ran apt-get clean on -syslog
10:46 hashar: On -feed, ran apt-get clean
10:42 labs-logs-bottie: hashar: update.php script ran on all wikis
10:23 hashar: Seems like mwmultiversion is back in function again :-]
10:23 hashar: Running 'foreachwiki update.php --quick'
10:23 hashar: updated enwiki database using 'mwscript update.php enwiki --quick'

May 17

10:29 labs-logs-bottie: hashar: after all, made /mnt/export/upload6 world writable "sudo chmod -R 777 *"
10:20 labs-logs-bottie: hashar: Fixed filerepo backend by using chown -R www-data:depops /mnt/export/upload6/wikibooks
10:15 labs-logs-bottie: hashar: Fixed cluster which was giving blank page. Root cause was wmfUdp2logDest which must be <IP address:port> (aka not use a hostname)
09:43 labs-logs-bottie: hashar: removed Draft extension for now.

May 16

16:46 hashar: cleaned up more of CommonSettings.php today. Moved some hacks to disable features as settings in InitialiseSettingsDeploy.php . See git log.
04:55 hashar: bug 36871 - deleting bz-dev instance

May 15

21:13 labs-logs-bottie: hashar: Managed to get wikiversions.cdb to be rebuilt using /home/wikipedia/common/multiversion/refreshWikiversionsCDB
20:59 labs-logs-bottie: hashar: Cloning 1.20wmf2 and 1.20wmf3 in independent repos just like in production
20:58 labs-logs-bottie: hashar: opened several bugs, prepared for MWMultiversion
20:36 labs-logs-bottie: hashar: Installed multiversion using svn checkout https://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/mwmultiversion/multiversion
20:13 hashar: manually created /home/wikipedia/logs/archive from deployment-feed (pending https://gerrit.wikimedia.org/r/7746 )
19:48 labs-logs-bottie: hashar: restarted udp2log on deployment-feed, lots of zombie python processes there
12:22 hashar: replaced most occurrences of /mnt/upload to /mnt/upload6
09:38 labs-logs-bottie: hashar: Applying apache::service to dbdump
08:53 labs-logs-bottie: petrb: updating to head
07:56 hashar: Deleted all of deployment-nfs-memc:/mnt/export/upload-back , it contained only thumbs
07:51 hashar: cleaning out deployment-nfs-memc:/mnt/export/upload-back from thumb, lock dirs and related

May 14

19:30 labs-logs-bottie: hashar: Fixed X-Forwarded-For IP not being recognized
19:30 labs-logs-bottie: hashar: fo
19:30 hashar: Fixed IP :-D

May 11

19:13 hashar: Removed OnlineStatusBar extension. It is not in Gerrit / WMF
19:02 hashar: Removing misc::mediawiki-logger from dbdump, it is on 'feed'
18:35 hashar: restarting udp2-log on dbdump
18:27 hashar: deleting symlinks in /home/wikipedia to /data/project : breaks logging
18:21 hashar: Replaced extensions with a fresh clone of mediawiki/extensions.git
17:33 hashar: restarted squid several times to fix some minor typos in conf
16:39 hashar: cloning mediawiki/extensions.git which has all extensions as submodules
16:37 hashar: updated MediaWiki up to 05e656a (aka master)
09:09 labs-logs-bottie: petrb: fixed nrpe on boxes where it was failing, we need to insert motd to puppet
04:37 jeremyb: [~2 hrs ago] 02:43:33 < hashar_> !deployment-prep setting up "apache20" instance by using only puppet. We will see what happens :-D
02:28 hashar_: deleted all remaining deployment-apache instances : : You don't have enough free space in /var/cache/apt/archives/. So we really want to use m1.large , not m1.tiny pretending to save disk space :-D
01:28 hashar_: Created upload2.beta.wmflabs.org to be the entry point for the "new" thumbnailing infrastructure
01:24 hashar_: moving upload.beta.wmflabs.org from the non working instances back to the main entry point

May 10

21:46 hashar: Creating a syslog server instance. I have a VERY nasty conflict between misc::syslog-server and misc::mediawiki-logger which tries to install conflicting packages ( syslog-ng / rsyslog )
20:36 Krinkle: fixing a few php notices and general logic problems in wmf-config
20:24 hashar: running 'apt-get install --reinstall apache2.2-common' to attempt to fix /var/log/apache2 rights (root:arm)
20:24 hashar: deployment-imagescaler01 apache does not log anymore :-(
10:09 labs-logs-bottie: petrb: fixed the missing NOT FOUND error page

May 9

18:56 Ryan_Lane: added hostnames for associated IPs
18:56 Ryan_Lane: allocated three more IPs for upload, bits, and mobile
18:18 hashar: Made logs from wgCommandLine script to be redirected to /home/wikipedia/logs/cli.log instead of /home/wikipedia/logs/catchall.log

May 8

01:49 Ryan_Lane: rebooting deployment-squid
01:43 Ryan_Lane: restarting squid on deployment-squid

May 4

10:48 mutante: added class nfs::home::wikipedia to puppet group list in "beta-labs"
10:46 mutante: added myself to admin groups to add/change puppet groups
10:36 hashar: Creating 5 new m1.large instances hosting apaches and named deployment-apacheXXX

May 3

18:24 hashar: installed on dbdump misc::syslog-server
15:53 labs-logs-bottie: hashar: adding misc::mediawiki-logger and misc::scripts classes to deployment-dbdump
15:29 labs-logs-bottie: hashar: running puppet on apaches to have them send their syslog to deployment-dbdump (bug 36246)
11:13 labs-logs-bottie: petrb: removing logrotate from all apaches it broke central log
06:43 labs-logs-bottie: hashar: bug 36441, added ErrorDocument 404
06:02 jeremyb: [deployment-prep, deployment-nfs-memc] ran `for u in catrope hashar jeremyb johnduhart krinkle mah petrb platonides werdna; do sudo usermod -a -G depops $u; done`; krinkle was unable to modify files in wmf-config and I thought i saw why he couldn't but couldn't see why I could. turned out the groups on nfs-memc were the important ones and I was there. synced the 2 boxes with each other and added krinkle to the list. some other deployment-prep boxes have different depops groups. (one empty with a different gid than the rest. one is same gid but just has petrb)
05:03 Krinkle: [deployment-dbdump] apt-get purged 'ack'; - On ubuntu ack is "ack-grep" which was already installed
04:59 Krinkle: [deployment-dbdump] apt-get installed 'ack'
04:48 jeremyb: [deployment-dbdump] (that was to address complaints about beta simplewiki appearing in #simple.wikipedia on irc.wikimedia.org)
04:47 jeremyb: [deployment-dbdump] changed all refs to IPs of prod hosts nfs-home and ekrem to be deployment-feed instead. and committed that to the local repo. (again not pushed anywhere yet)
04:44 jeremyb: [deployment-dbdump] did a checkpoint `git commit -a` on deploymentprep-conf (/usr/local/apache/common) (locally not pushed anywhere) because there were lots of changes on disk but not in the repo. but didn't add any new files to the repo. (so there's still stuff reported uncommitted by `git status`)

May 2

15:14 labs-logs-bottie: petrb: making syslog on apaches be /data/project/apaches_log
11:52 labs-logs-bottie: petrb: changed sudo policies on web to test if puppet override it
11:52 labs-logs-bottie: j: add transcoding settings to CommonSettings.php again
11:46 labs-logs-bottie: petrb: purged stuff on transcoding and freed some 119468kb
11:08 labs-logs-bottie: hashar: install dsh package on deployment-dbdump
09:54 labs-logs-bottie: hashar: /usr/local/apache/conf is now an independent git repository

April 30

13:31 labs-logs-bottie: petrb: rebooting web5
13:27 labs-logs-bottie: petrb: web4 reboot for same reason
13:26 labs-logs-bottie: petrb: same for web
13:24 labs-logs-bottie: petrb: rebooting web3 broken /data/project/
11:57 labs-logs-bottie: petrb: fixed transcoding
11:28 j^: reboot deployment-transcoding(i-00000105)

April 25

17:01 labs-logs-bottie: j: Changing uid and group of apache user from 48 to 33 to match www-data on web3,web4,web5
16:44 Platonides: With the uid change to deployment-web, it is now writing into /data/project/errors.log
16:42 hashar: Apaches no longer log anything. This is because rsyslog sends logs to a blackhole :-D
16:27 Platonides: Changing uid and group of apache user from 48 to 33 to match www-data
15:20 hashar: deleting testswarmmysqlconf , it is of no use :-/
14:43 hashar: Creating temporary instance to test a MySQL puppet snippet
13:01 labs-logs-bottie: hashar: I am a hero
13:00 labs-logs-bottie: petrb: test
12:24 hashar: added a basic puppet skeleton in manifests/labs/beta/ with https://gerrit.wikimedia.org/r/5790 (test branch)
11:39 hashar: manually purged debian package `ack`, installed `ack-grep`
11:22 hashar: puppet finished migration of web{3,4,5} to apache::service
11:19 hashar: changed squid visible name to squid001.beta.wmflabs.org
11:02 hashar: migrate web{3,4,5} from webserver::php5 to apache::service
10:47 hashar: Cleaning out squid peers list
10:11 hashar: made /etc/ a git repository on deployment-squid and committed existing /etc/squid/
10:03 hashar: adding generic::packages::git-core on deployment-squid so we can track

April 24

23:21 labs-logs-bottie: petrb: disabled $wmgEnableCaptcha
18:54 hashar: wrap -dbdump motd to 80 chars
15:42 hashar: deployment-web host does work! :-]
15:41 hashar: made deployment-web host a wikimedia-task-appserver , had to create some apache2 configuration placeholder. Apache2 does launch but it is not working though (timeout)
15:40 petan|wk: I told hashar to log stuff, if he won't, slap him
13:18 hashar: added apache::service on deployment-web host
12:18 labs-logs-bottie: petrb: moved the log file storage to gluster
12:18 labs-logs-bottie: petrb: updated git and committed all changes

March 21

03:22 mutante: mysqld on deployment-sql is stopped - did not start it though after i heard petan is working on corrupted db's
03:06 mutante: added myself as a member just to see the instance names and check for the sql server...

March 20

16:16 labs-logs-bottie: petrb: it seems that corruption of db is worse than I expected, need to restore a backup that is a few months old
16:12 labs-logs-bottie: petrb: mysql is back up
15:41 labs-logs-bottie: petrb: getting sql server down I found a bunch of corrupted db's, rollback is necessary
15:41 labs-logs-bottie: j: install php-pear on deployment-web3/4/5 required by TMH

March 19

08:18 labs-logs-bottie: root: restoring sql tables from backup

March 15

20:17 labs-logs-bottie: root: restored ok
20:14 labs-logs-bottie: root: restoring database from backup
10:16 labs-logs-bottie: petrb: failed auth on db server; reboot was required
09:53 labs-logs-bottie: petrb: scheduling auto replication of sql server

March 14

14:37 labs-logs-bottie: petrb: switching en.wikipedia to older previous v

March 11

22:14 Damianz: Increased nofile on deployment-squid and added max_filedesc option to squid config. Also installed squidclient.
04:36 Ryan_Lane: also deployment-web3
04:35 Ryan_Lane: also deployment-web
04:34 Ryan_Lane: make that deployment-web5
04:34 Ryan_Lane: rebooting deployment-web, it OOM'd

March 9

14:12 labs-logs-bottie: petrb: rebooting -nfs
13:52 labs-logs-bottie: root: updated apt on webs1
13:34 j^: add ppa:j/timedmediahandler and install ffmpeg on web3 and web5

March 6

22:21 labs-logs-bottie: petrb: some instances will need to reboot, however site seems to be ok now
22:09 labs-sexy-bottie: petrb: updating svn
22:07 labs-sexy-bottie: petrb: fixed squid a bit
15:23 labs-sexy-bottie: petrb: test
14:53 labs-sexy-bottie: root: disabling bot for a while
03:11 Andrew: facepalm: apparently all reboots are failing, so this will be down until Ryan brings it all back up tomorrow
02:57 Andrew: rebooting a few hosts, there is something seriously wrong with fetching resources at the moment

March 5

15:48 labs-sexy-bottie: petrb: temporarily disabled ssl server
15:48 labs-sexy-bottie: petrb: reconfigured squid
15:36 labs-sexy-bottie: petrb: restarted servers
15:24 labs-sexy-bottie: petrb: temporarily changed code of localsettings to debug site
15:20 labs-sexy-bottie: petrb: fixed broken memc :o
15:04 labs-sexy-bottie: petrb: inserted new wiki to sul
15:01 labs-sexy-bottie: petrb: please ignore some of the previous lines in log we were just testing bot
14:58 labs-sexy-bottie: petrb: updated live
14:58 labs-sexy-bottie: petrb: meh
14:36 labs-sexy-bottie: petrb: created a new log system, just type log message to log your change on prep
14:35 labs-sexy-bottie: petrb: this is test :o

March 4

12:00 Andrew: Finished deployment of het deploy, added a new ee.

March 1

16:35 Platonides: Installed dpkg-dev on deployment-dbdump
16:03 Platonides: Installed joe on deployment-dbdump
16:03 petan|wk: platonides needs to check the project name

February 27

08:20 petan: creating 2 more web servers to handle load
08:19 petan: rebooting both web servers, starting with web1

February 23

08:52 petan|wk: fixing the squid

February 22

01:03 Ryan_Lane: reconfiguring the web server instances to remove puppet classes that no longer exist

February 17

02:34 Andrew: Moving /usr/local/apache/common/live to /usr/local/apache/common/live-hom and symlinking live to live-hom
01:56 Andrew: running afl_rev_id patch on all wikis
01:52 Andrew: installing ack (source code search tool) on dbdump
00:59 petan: if anything is broken, it was me
00:51 petan: I broke it!
00:27 petan: switched to HEAD

February 16

10:11 j^: add video/webm to /etc/mime.types on web/webs1/web2

February 13

10:08 petan|wk: removing the puppetized memcached
08:54 petan|wk: removing some extensions from config which are missing in latest branch

January 30

18:46 petan: configuring some boxes for cluster to handle high load

January 29

01:13 hexmode: oom reboot -web

January 27

13:30 j^: install upstart script /etc/init/timedmediahandler.conf on deployment-transcoding and start service
13:05 j^: touch /etc/wikimedia-image-scaler on deployment-transcoding; transcoding needs more wgMaxShellMemory too
12:47 Platonides: updated /usr/local/apache/common/live/extensions/TimedMediaHandler to r110117 per j^request
10:11 j^: add-apt-repository ppa:j/timedmediahandler and update ffmpeg on deployment-web to support frame extraction from WebM videos
06:35 j^: update ffmpeg on deployment-transcoding (new security release from ppa)

January 26

00:00 petan: configured new firewall rule irc

January 25

23:52 petan: linked /usr/local/apache/common-local to /usr/local/apache/common
23:06 petan: updating svn
22:28 petan: reverted unlogged changes made to config which broke whole site
10:03 j^: installed ffmpeg on deployment-web (required by TMH to extract stills)

January 24

20:52 petan: created db user oren and new database for temporary wiki
13:53 petan|wk: reconfigured new instance and fixed some issues on puppet, no logs in sal regarding it
00:51 hexmode: svn up * updatedata

January 23

19:31 hexmode: restart memcache on nfs-memc
19:06 hexmode: aptitude update deployment-web

January 22

09:16 petan: configured nfs to listen for backup server
01:01 petan: configured firewall for backup instance
00:45 petan: creating a backup instance in -prepbackup project for online backup of mysql from deployment project + fs backup
00:30 petan: updating /live to head
00:21 petan: installed timedmediahandler (trunk) to commons

January 16

23:41 hexmode: to solve the trusted XFF problem, I installed tinycdb and created a 0-length file in the right place
20:35 Ryan_Lane: released unused IP address from project

January 15

15:45 petan: ran live/extensions/TrustedXFF/generate.php
11:20 petan: updated to latest head all wikis

January 14

18:49 petan: enabled global blocking
18:21 johnduhart: Removed myself from the project.
14:32 petan: separated common to own deployment file
14:30 hexmode: enabled webfonts for mywiki properly in InitialiseSettingsDeploy.php
14:25 johnduhart: Updated wmf-config/InitialiseSettings.php from production
14:25 johnduhart: Reverted change to wmf-config/InitialiseSettings.php
14:24 hexmode: enabled webfonts for mywiki

January 13

21:34 petan: assigning new dns
21:33 petan: moved deployment to beta.wmf...
13:07 petan|w: installed jdk on search

January 12

18:49 petan: installed all requested sw on search
18:46 petan: mounted conf files
18:38 petan: installed updates on new instances and rebooting it
06:48 Ryan_Lane: added nfs mounts to the fstab for deployment-web
06:47 Ryan_Lane: remounted /mnt/upload on deployment-web as nfs rather than nfs4
06:47 Ryan_Lane: modified export options on deployment-nfs-memc; removed nfs4 specific options, and removed other options not necessary for our environment.
00:38 johnduhart: Unmounted /mnt/export from /tmp on -web

January 11

23:02 hexmode: svn up live
03:19 johnduhart: Added production commons to ForeignFileRepos http://en.wikipedia.deployment.wmflabs.org/wiki/File:BradPittBAR08.jpg
02:28 johnduhart: Moved config to http://config.deployment.wmflabs.org/viewfile.php?file=CommonSettings.php
01:44 johnduhart: updated databases and interwiki
01:41 johnduhart: setup flagged revisions

January 10

23:06 petan: checks done
23:04 petan: disabled sql for fs checks
22:30 petan: created nfs:/mnt/export/backup use it for all files which aren't versioned
21:44 petan: deleting big db's expect db lags ^^
21:35 petan: maintenance on simple
21:17 petan: created squid box
19:05 petan: tweaked memcached and flushed cache
16:48 johnduhart: installing wikimedia-task-appserver on -web
16:29 johnduhart: live and extensions recheckedout into a new folder
16:04 johnduhart: Site broken, currently recreating live folder
12:27 johnduhart: Running updatedata (very slowly)
12:19 johnduhart: Fixed wmf-config permissions on -nfs-memc
05:32 johnduhart: thumbnails now working
05:30 johnduhart: Installed imagemagick on web
05:11 johnduhart: Adding apache config for upload.deployment.wmflabs.org
05:11 johnduhart: Made a quick stab at upload config
05:00 johnduhart: Mounted that export onto -web
05:00 johnduhart: Created nfs export /mnt/upload on deployment-nfs-memc
04:37 johnduhart: unmounted deployment-nfs-memc:/mnt/export on /mnt from deployment-web
03:32 johnduhart: Last update solves an issue where CentralAuth would make 70+ queries per page
03:31 johnduhart: Updated databases
03:26 johnduhart: svn up'd live
03:04 johnduhart: interwiki now works

January 9

21:29 johnduhart: Updated databases
21:29 johnduhart: Created test.wikimedia
21:25 petan: reconfigured global permissions
20:38 petan: created hi wiki + de wiki
20:00 petan: creating commons, de wiki, en_wiktionary etc. etc...
18:57 petan: configured ip back to -web and removed temporary ip
18:53 petan: disassociated 208.80.153.219
18:50 petan: turned off -test definitely and reconfiguring ip
18:47 petan: moved all stuff to -web, reconfiguring IP
18:18 petan: moving configuration of apache to web
16:40 petan|work: reconfigured apache on test
16:15 johnduhart: svn up /usr/local/apache/common/live
15:59 johnduhart: Ran update on metawiki
15:58 johnduhart: Recreated metawiki enwiki enwikibooks
15:57 johnduhart: Imported centralauth
15:55 johnduhart: Dropped all wikis except simplewiki
15:53 petan|work: restarted memcached
15:24 johnduhart: Created enwiki and enwikibooks
15:20 johnduhart: Restarted memcached
15:17 johnduhart: Created simplewiki
15:04 petan|work: restarted memcached
15:02 johnduhart: Ran update.php on metawiki
14:59 johnduhart: Creating metawiki
14:55 petan|work: disabled current site
14:55 johnduhart: Creating centralauth db
14:52 johnduhart: DROPing new configuration tables, will recreate
14:38 johnduhart: Forget the metawiki dump
14:09 petan|work: created backup of broken db of meta and replaced it with auth db
13:52 petan|work: test is done, restored test SUL to previous state
13:43 petan|work: created backup of central auth and replaced the testing SUL with current data, merged with current SUL so that we can use same logins on all sites
13:19 petan|work: updated svn
12:44 johnduhart: Ran update.php on metawiki
12:32 johnduhart: Created enwikibooks http://en.wikibooks.deployment.wmflabs.org/wiki/Main_Page
08:12 johnduhart: Starting import of metawiki
07:37 johnduhart: This only affects my new configuration though
07:36 johnduhart: WARNING: somehow new users are showing up on some feed outside of labs, and were picked up by a monitoring robot. wtf.
07:15 johnduhart: Importing simplewikibooks
07:10 johnduhart: SiteMatrix is now working http://meta.wikimedia.deployment.wmflabs.org/wiki/Special:SiteMatrix
07:09 johnduhart: Created simple wikibooks http://simple.wikibooks.deployment.wmflabs.org/
07:09 johnduhart: Adding wikibooks configuration
06:31 johnduhart: Installing git on deployment-test
06:23 johnduhart: Created simplewiki http://simple.wikipedia.deployment.wmflabs.org/wiki/Main_Page
05:50 johnduhart: Downloaded metawiki dump onto dbdump, extracting now. poor labs.
05:36 johnduhart: Recreated hiwiki, had a bad JohnTest account. Central auth now works fully
05:15 johnduhart: Running update.php on hiwiki
05:14 johnduhart: Importing prefstats table to hiwiki
04:41 johnduhart: Centralauth is working :)
04:28 johnduhart: Imported db schema to centralauth
04:25 johnduhart: Created centralauth database centralauth
04:18 johnduhart: Created metawiki http://meta.wikimedia.deployment.wmflabs.org/
04:18 johnduhart: http://meta.wikimedia.deployment.wmflabs.org/
04:09 johnduhart: Created meta docroot and added favicons to meta and wp
04:08 johnduhart: Adding remnant.conf to httpd.conf
04:07 johnduhart: Adding metawiki apache config to /usr/local/apache/conf/remnant.conf
04:03 johnduhart: Lot of configuration done, much tuning to come. Missing.php now works http://nope.wikipedia.deployment.wmflabs.org/wiki/Main_Page
01:24 johnduhart: Installed mysql-client on deployment-test

January 8

23:45 johnduhart: Installed php5-cli on deployment-test
23:21 johnduhart: Enabled rewrite rules on deployment-test
20:39 petan: created new group for /mnt/www
19:33 petan: clearing cache
18:47 hexmode: set up relative paths for config files
16:45 petan: moved memcached to deployment-nfs-memc and update global
14:04 petan: created temporary memcached instance for data
12:17 petan: restarted apache to fix some problem with config
00:11 hexmode: set config for timeline
00:11 hexmode: "apt-get install ploticus ttf-freefont" for timeline
00:04 jeremyb: booting memcached fixed it
00:02 jeremyb: booting memcache in case that fixes interwiki
00:02 jeremyb: [auth] added bugzilla to interwiki.sql and ran it, doesn't seem to be working

January 7

23:55 hexmode: Fix path for PoolCounter: PoolCounter.php -> PoolCounterClient.php
23:52 hexmode: Fix path for OAI: OAI.php -> OAIRepo.php
23:31 hexmode: set up git for /var/www/global
23:08 petan: disabled LQT on other wikis
23:07 petan: changed configuration of ajax
23:07 petan: fixed InitialiseSettings
22:42 petan: updated all wikis to latest head
22:36 petan: disabled LQT because it's broken
22:34 petan: ran update on auth db
22:10 petan: removed puppetized memcached, because its configuration sucks

January 5

20:26 petan: importing MW ns full history to en_wikipedia
16:48 petan|w: created instance for dumps, current instance is overloaded
15:53 petan|w: import now running using mwimport

January 4

20:27 petan: opened port 80 for wide net
20:26 petan: registered deployment.wmflabs.org
20:24 petan: allocated ip 208.80.153.215
20:22 mutante: raised floating IP quota to 1
16:10 petan|work: created instances for apache and mysql
16:08 petan|work: configured firewall for webserver
16:06 mutante: added members MarkAHershberger & Petrb - added them to sysadmin and netadmin roles
16:02 mutante: added new project deployment-prep for hexmode and petan