Nova Resource:Deployment-prep/SAL/Archive 1

December 11

  • 10:44 hashar: deleted deployment-search01 and deployment-searchidx01. The beta cluster was migrated to Elasticsearch over the summer.

December 10

  • 23:18 bd808: Freed 4.2G on deployment-jobrunner08.pmtpa.wmflabs by deleting files in /tmp
  • 23:17 bd808: deployment-jobrunner08.pmtpa.wmflabs is out of disk on /
  • 21:11 hashar: used git fetch && git reset --hard on Flow extension. Just to be sure
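
The /tmp cleanup logged above was done by hand. A minimal sketch of how deletion candidates could be listed first, assuming GNU find is available; the function name list_stale_files and the 7-day default are hypothetical and do not come from the log.

```shell
#!/bin/bash
# Hypothetical helper: list regular files under a directory that were last
# modified more than N days ago, largest first. Nothing is deleted.
list_stale_files() {
    local dir="$1" days="${2:-7}"
    # -xdev stays on one filesystem; -printf is a GNU find extension
    # that prints the size in bytes, a tab, then the path.
    find "$dir" -xdev -type f -mtime +"$days" -printf '%s\t%p\n' | sort -rn
}
```

Usage would be along the lines of `list_stale_files /tmp 7`, reviewing the output before removing anything.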

December 9

  • 09:52 hashar: added dan-nl so he can look at MediaWiki log files when playing with glamtoolset
  • 09:44 hashar: deleted old jobs from commonswiki job queue (up to timestamp 20130315031930)

December 8

  • 20:55 bd808: chmod -R a+w deployment-bastion:/data/project/upload7/private/gwtoolset

December 6

  • 22:22 hashar: rebooting deployment-cache-text1 (aka text varnish)
  • 22:11 hashar: upgrading packages on text cache / running puppet and rebooting it.
  • 21:15 MaxSem: Debugging apache on deployment-apache33, may look hung
  • 11:24 hashar: upgrading varnish on deployment-parsoidcache
  • 11:18 hashar: made sure puppet agent is enabled on varnish caches and reran it manually
  • 11:13 hashar: shutting down deployment-search01 and deployment-searchidx01. They were for Lucene search. We use Elasticsearch now.
  • 11:12 hashar: upgrading varnish on deployment-cache-mobile01
  • 11:12 hashar: upgrading varnish on deployment-cache-bits03
  • 11:09 hashar: upgrading varnish on deployment-cache-text1

December 3

  • 22:42 hashar: upgrading packages on deployment-cache-bits03 && rebooting
  • 22:36 hashar: rebooting deployment-cache-mobile01
  • 22:36 hashar: Parsoid got broken since last Friday (serving pages from production …) bug 57926
  • 22:34 hashar: rerunning puppet continuously on deployment-cache-mobile01 + apt-get upgrade of varnish

November 22

  • 19:48 hashar: mwscript update.php --wiki=labswiki --quick (for OAuth database updates)
  • 19:47 hashar: manually fixed some permissions that prevented automatic deployment of mediawiki-config. It had been broken since roughly Nov 21st at 7pm UTC.

November 19

  • 16:17 hashar: applying role::ci::slave::labs::common class on deployment-parsoid2
  • 16:07 manybubbles: rebuilding elasticsearch indexes to suck up configuration changes
  • 16:03 manybubbles: running puppet on elasticsearch machines and restarting elasticsearch to suck up new configuration

November 18

  • 11:20 hashar: Cleaned out Parsoid submodule: sudo su - mwdeploy then cd /home/wikipedia/common/php-master/extensions/Parsoid && git reset --hard origin/master && cd .. && git submodule update --init Parsoid

November 15

  • 18:41 manybubbles: rebuilding Cirrus search indexes to have the 2 replicas like production
  • 14:51 manybubbles: rebuilding search indexes using jobs for testing
  • 14:09 hashar: rebooting both apaches
  • 14:08 hashar: rebooting sql and sql02
  • 14:05 hashar: upgrading mysql on -sql

November 14

  • 22:54 hashar: upgrading packages on -jobrunner08
  • 20:38 manybubbles: updating search indexes in labs
  • 00:44 MaxSem: Rebooting deployment-solr, jetty (or java?) is FUBAR

November 11

  • 14:18 hashar: Flow was no longer functional due to a backtrace in the Parsoid daemon (bug 56781). Solved by upgrading Parsoid, reinstalling its dependencies and restarting it. Test page is http://en.wikipedia.beta.wmflabs.org/wiki/Talk:Flow_QA
  • 14:14 hashar: deleting and reinstalling Parsoid node modules dependencies
  • 14:13 hashar: changing Parsoid from 4 months old cdbfdbb to 986c1e7
  • 13:47 hashar: upgrading varnish on all caches.

November 7

  • 09:39 hashar: rebooting apache33 for kernel upgrade
  • 09:38 hashar: rebooting apache32 for kernel upgrade
  • 09:19 hashar: reenabling puppet on deployment-apache33
  • 09:15 hashar: deleted sudo policy 'webadmins'; it only had petrb in it, with no specific access.
  • 09:14 hashar: removed sudo group 'admin', removing root access from any volunteers
  • 09:08 hashar: Restarted bits varnish to clear out the cache.

November 6

  • 12:09 hashar: apt-get dist-upgrade on deployment-eventlogging
  • 11:38 hashar: upgrading packages on deployment-parsoid2

November 5

  • 21:39 hashar: applying role::logging::mediawiki::errors on deployment-fluoride. Should get a listener of some sort on port 8423 to receive fatal/exceptions
  • 16:16 hashar: fixed up mediawiki/extensions.git which still had the deleted extension WikibaseDatabase. That had been blocking code updates since Oct 30th.

October 28

  • 13:16 manybubbles: restarted elasticsearch nodes to pick up new config

October 19

  • 20:29 wm-bot: petrb: did mwscript changePassword.php --wiki enwiki --user PiRSquared --password mooh

October 15

  • 21:04 hashar: -bastion rebooted, restarted udp2log : /etc/init.d/udp2log stop; /etc/init.d/udp2log-mw start
  • 21:03 hashar: rebooting deployment-bastion for NFS config fix.

October 14

  • 22:00 wm-bot: hashar: made /data/project/logs group writable, it belongs to nemobis :/
  • 10:12 hashar: purged varnishhtcpd on deployment-upload04 to make it start again.
  • 09:53 hashar: rebooting all varnish caches ( deployment-cache-text1 deployment-cache-upload04 deployment-cache-bits03 deployment-cache-mobile01 )
  • 09:47 hashar: mobile varnish frontend cache is not starting anymore : /usr/lib/x86_64-linux-gnu/varnish/vmods/libvmod_netmapper.so: cannot open shared object file: No such file or directory bug 55662

October 11

  • 10:48 hashar_: beta is back up :-]
  • 10:30 hashar: resynchronizing mediawiki/extensions.git submodules.
  • 10:16 hashar: git directory of mediawiki/extensions was borked following the NFS migration. Fixing it up manually
  • 10:05 hashar: stopped udp2log, started udp2log-mw
  • 10:04 hashar: rebooting deployment-bastion
  • 10:04 hashar: Jenkins jobs failing, jenkins-deploy user apparently can't write to its home dir /home/jenkins-deploy/workspace

October 7

  • 10:48 hashar: applied iptables rules for bug 45868 on deployment-apache{32,33} and jobrunner08
  • 10:05 hashar: applied iptables NAT rules on deployment-bastion bug 45868

October 4

  • 19:32 MaxSem: Created table bug_54847_password_resets on all wikis

October 3

  • 13:22 manybubbles: finished rebuilding search indexes after cirrussearch update
  • 00:38 manybubbles: rebuilding search indices after cirrussearch update

September 30

  • 08:16 hashar: upgrading and restarting memcached on memc0 and memc1 to let them limit their memory at 15GB instead of 89G bug 52378

September 24

  • 13:38 manybubbles: indices finished rebuilding some time last night.

September 23

  • 16:26 manybubbles: rebuilding search indices after new index config deployment

September 20

  • 13:39 manybubbles: rebuilt most search indices in beta but commonswiki crashed late last night so it is half rebuilt. filing bug.

September 19

  • 19:26 manybubbles: elasticsearch filled up the system disk on its hosts so I moved its data to /mnt with a symlink.
  • 18:37 manybubbles: rebuilding search indices after a few merges in cirrussearch

September 17

  • 20:05 hashar: upgrading PHP on bastion, jobrunner and apaches from 5.3.10-1ubuntu3.7+wmf1 to 5.3.10-1ubuntu3.8+wmf1
  • 19:00 manybubbles: upgraded elasticsearch in beta to 0.90.4
  • 18:08 manybubbles: upgrading elasticsearch in beta to 0.90.4 so we can make sure it works so we can use some new features in it

September 10

  • 16:29 hashar: rebooted bastion after some nfs outage. Stopped udp2log, started udp2log-mw

September 7

  • 01:07 manybubbles: rebuilding search indices on beta after lots of updates

September 3

  • 14:59 hashar: upgrading PHP5 ( 5.3.10-1ubuntu3.7+wmf1 ) on deployment-apache32, deployment-apache33 and deployment-jobrunner08
  • 14:55 hashar: upgrading PHP5 package on deployment-bastion

August 26

  • 17:47 manybubbles: rebuilding search indices to unbreak CirrusSearch....

August 20

  • 18:47 manybubbles: rebuild search indices after some changes to indexing code.

August 19

  • 19:05 manybubbles: rebuilding the search indices to pick up some recent changes

August 12

  • 19:10 manybubbles: rebuild search indices for updates
  • 18:45 manybubbles: rebuilding all search indices using updates
  • 18:45 manybubbles: unstuck CirrusSearch so it'd update.

August 8

  • 17:41 manybubbles: simplewiki's search index has completed building. All search indices should now be up to date.
  • 15:53 manybubbles: reindexed all wikis to add accent squashing. simplewiki is still rebuilding but I reindexed what was complete and started the rebuild again so it'd pick up accent squashing.
  • 11:55 manybubbles: all search indices have finished building except simplewiki

August 7

  • 20:31 manybubbles: rebuilt all the small search indices. waiting on enwiki, enwikivoyage, simplewiki, and commonswiki.
  • 20:08 manybubbles: rebuild search indices after large-ish code change to CirrusSearch
  • 07:27 andrewbogott: rebooted deployment-memc1 and deployment-memc0 (not at the same time) while freeing up space on virt servers.

August 4

  • 22:57 wm-bot: platonides: test
  • 22:56 wm-bot: platonides: git reset --hard to restore /data/project/apache/common-local/php-master/extensions/Translate/specials/SpecialManageGroups.php (bug 52534)
  • 22:54 wm-bot: platonides: --help

August 3

  • 17:59 manybubbles: looks like simplewiki's search index finally finished. party time.

August 1

  • 16:27 manybubbles: building search index for commonswiki and the other wikis that aren't in the main section of http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix
  • 12:46 manybubbles|away: enwikivoyage's search index finished building overnight. dewikivoyage seems to have stalled out. I'm going to profile it. simplewiki is still running and will need some love to finish more quickly.
  • 09:02 hashar: rebooted both memcached instances to be able to log on them. Apt upgrading both of them
  • 08:57 hashar: Deleting deployment-cache-upload03 , replaced by the fully puppetized instance deployment-cache-upload04
  • 08:57 hashar: Deleting the old squid instance since we run varnish cache for text nowadays

July 31

  • 21:37 ^d: Fixing permissions on /mnt/upload7/wikivoyage to be like the other domains
  • 21:24 manybubbles: dewikivoyage and enwikivoyage are still building. simplewiki crashed. https://bugzilla.wikimedia.org/show_bug.cgi?id=52353
  • 21:24 manybubbles: built and populated search indices for all wikis except dewikivoyage, enwikivoyage, and simplewiki.
  • 20:45 manybubbles: building search indices for beta
  • 20:44 hashar: manybubbles never logs anything.
  • 11:00 hashar: Migrated beta code updater from shell to python. https://integration.wikimedia.org/ci/job/beta-code-update/

July 30

  • 22:05 ^d: Memcached moved off of the apache instances to their own dedicated hosts (-memc0 and -memc1). Should have a lot more memc storage now.

July 29

  • 23:44 hashar: fixed up timeline on beta, it never worked there. Thanks ^demon !
  • 13:29 hashar: rebuilding l10n cache, has been broken for a while

July 26

  • 21:21 hashar: applying misc::syslog-server on deployment-bastion to make it a syslog server bug 36748

July 24

  • 20:41 hashar: restarted memcached on both apache boxes. Might clear their caches.
  • 20:40 hashar: apt-get upgrading apache32 and apache33. Running puppet on them
  • 11:30 hashar: manually running sync-site-resources : su - apache -s /bin/bash then /usr/local/bin/sync-site-resources

July 23

  • 09:03 hashar: restarted varnish text cache

July 22

  • 07:57 hashar: deleting deployment-varnish-t3 , used as a mobile cache, now replaced by deployment-cache-mobile01
  • 07:56 hashar: deleting deployment-puptest , unused, no class applied

July 19

  • 19:55 hashar: rebooting deployment-cache-text01.pmtpa.wmflabs , can't access it

July 18

  • 12:41 hashar: Text cache was not in wgSquidNoPurge, which caused all requests to be interpreted as coming from the text cache, causing miscellaneous issues (such as throttling account creation for everyone).

July 17

July 16

July 10

  • 08:57 hashar: rebooting -sql instance to make it use NFS as /home
  • 08:06 hashar: shutting down deployment-cache-upload03
  • 08:04 hashar: migrating upload.beta.wmflabs.org from cache-upload03 (lucid/squid) to cache-upload04 (precise/varnish)

July 9

  • 19:34 hashar: Attempting to reboot a bunch of instances that prevent ssh access because /home is borked: uploadtest08 uploadtest07 -cache-upload04 -cache-text01 parsoid2 cache-mobile01 deployment-sql02 cache-upload03
  • 17:42 aude: added Yuvipanda to the project

July 8

  • 08:27 hashar: rebooting deployment-cache-text1 , maybe I can get ssh access this way
  • 08:26 hashar: Set $wgLoadScript to point to bits instead of the wiki local docroot. 70322
  • 08:08 hashar: rebooting deployment-cache-upload04

July 3

July 2

  • 15:18 hashar: deleting deployment-searchidx02 , not being used
  • 13:38 hashar: restarted mw-cgroup upstart service on apaches box. That recreated the wgCgroup directory /sys/fs/cgroup/memory/mediawiki
  • 13:10 hashar: removed iptables 'nat' rule from deployment-upload
  • 13:10 hashar: pointed deployment-upload thumb handler to the varnish cache text instead of squid. Done by editing /data/project/upload7/scripts/thumb-handler.php
  • 12:58 hashar: installing iptables on deployment-upload
  • 12:57 hashar: Updating the iptables rule that works around the NAT issue in beta. Applied on deployment-searchidx01 and deployment-upload : iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.1.133 See also bug 45868
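
The NAT workaround logged at 12:57 rewrites outgoing traffic aimed at the public IP so that it reaches the private address instead. A minimal dry-run sketch of that rule follows; the helper name dnat_rule is hypothetical, and the option spellings are copied from the log entry rather than checked against a particular iptables version.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the DNAT rule that redirects locally
# generated traffic for a public IP to a private IP.
dnat_rule() {
    local public_ip="$1" private_ip="$2"
    printf 'iptables -t nat -I OUTPUT --dest %s -j DNAT --to-dest %s\n' \
        "$public_ip" "$private_ip"
}

# Addresses taken from the log entry above.
dnat_rule 208.80.153.219 10.4.1.133
```

Printing the command first makes it easy to review before running it as root on each instance.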

July 1

  • 22:05 hashar: upgraded packages on deployment-eventlogging
  • 18:57 hashar: deleted deployment-nginx-test , not needed anymore, nginx proxies for mobile are working
  • 14:42 hashar: Migration to the new mobile instance was tracked by bug 49469
  • 14:40 hashar: shutting down deployment-varnish-t3 (replaced by deployment-cache-mobile01)
  • 14:40 hashar: binding mobile IP address 208.80.153.143 to deployment-cache-mobile01
  • 14:38 hashar: rebooting deployment-varnish-t3
  • 14:35 hashar: updated puppet repository on deployment-varnish-t3 and running puppet there
  • 14:34 hashar: applying role::protoproxy::ssl::beta on deployment-cache-mobile01 (intended to replace varnish-t3 for mobile caching)
  • 14:14 hashar: rebooting deployment-cache-mobile01
  • 13:54 hashar: attempting to enable HTTPS on the varnish text cache by applying role::protoproxy::ssl::beta
  • 12:28 hashar: restarted both apaches. Beta has been down for a couple of hours due to an NFS issue on labstore3.
  • 08:43 hashar: Shutting down deployment-squid , service migrated to deployment-cache-text01 (varnish).
  • 08:36 hashar: Switching the text cache traffic from deployment-squid to deployment-cache-text1 by reassociating the public IP 208.80.153.219

June 26

  • 09:22 hashar: Squid restarted properly. That fixed some stale resource loader entries that were causing outdated Javascript modules to be served. Fixed at least one inconsistency, such as bug 49911
  • 09:17 hashar: recreated squid swap directories with `squid -z`, restarting squid
  • 09:14 hashar: stopping squid and pruning cache

June 24

  • 22:01 hashar: clearing memcached , that might cleanup some resource loader cache causing bug 49911 "nab collapse missing in beta"
  • 15:42 hashar: restarted lucene-search-2 on searchidx01
  • 15:37 hashar: upgrading -searchidx01 and refreshing puppet manifests

June 20

  • 20:45 hashar: Jasper Deng joined in AbuseFilter manager group :)
  • 20:23 hashar: VisualEditor self updated on beta, it was stuck due to a misconfiguration in gerrit bug 49846

June 19

  • 08:36 hashar: Fixing up the abuse filter central DB to point to 'labswiki' instead of the nonexistent 'metawiki' 69461. Suggested by Steinsplitter :)

June 18

  • 22:15 hashar: Running /usr/local/bin/sync-site-resources 68309
  • 22:13 hashar: Applying MaxSem 'misc::beta::sync-site-resources' to deployment-bastion. That syncs .css articles from production to beta!

June 17

  • 16:25 hashar: Apache was down on apache32. Restarted it as well as on apache33.. Solved bug 49700
  • 16:22 hashar: varnish-t3 (mobile cache): cleaned up operations/puppet local repo and re ran puppet. Still blocked :/ bug 49700
  • 11:16 hashar: created /data/project/apache/uncommon/master , owned by mwdeploy:mwdeploy and mode 0755.

June 12

  • 08:04 hashar: Creating deployment-cache-upload04 using a Precise image. The aim is to replace deployment-cache-upload03 which runs Lucid (see also bug 49470)

May 30

  • 08:27 hashar: Added Nikerabbit to the project. Will setup solr for translate

May 28

May 24

  • 12:24 hashar: creating a dumb proxy blocker touch /data/project/apache/common-local/php-master/../wmf-config/mwblocker.log
  • 12:20 hashar: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en special wikidatawiki wikidata.beta.wmflabs.org
  • 12:20 hashar: attempting to install wikidata

May 20

  • 18:30 hashar: Added Krenair to the project
  • 08:21 hashar: removing thumbnails from the Gluster shared directory: cd /data/project/upload7 && find -maxdepth 3 -wholename '*/thumb'|xargs -n1 -P4 rm -v -fR
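
The thumbnail purge logged at 08:21 pipes find straight into rm. A minimal dry-run sketch that only lists what would be matched, assuming GNU find; the function name list_thumb_dirs is hypothetical, and the -type d test is an extra guard that is not in the logged command.

```shell
#!/bin/bash
# Hypothetical dry-run: list the thumbnail directories the logged
# find/xargs pipeline would remove, without deleting them.
list_thumb_dirs() {
    local root="$1"
    # Same depth limit and pattern as the logged command; -type d is an
    # added guard so only directories are reported.
    (cd "$root" && find . -maxdepth 3 -type d -wholename '*/thumb' | sort)
}
```

Running `list_thumb_dirs /data/project/upload7` first would show the directories that the `xargs -n1 -P4 rm -v -fR` step is about to delete.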

May 16

  • 14:53 hashar: restarted job service on jobrunner08 : /etc/init.d/mw-job-runner restart . It was missing /usr/local/apache/common 64057 and 64065 fix it by using a symlink to /data/project/apache just like on apache webservers.

May 14

  • 12:53 hashar: deleting deployment-lucene, we are using search01 and searchidx01
  • 12:49 hashar: rebooting -cache-upload03 for kernel upgrade
  • 12:49 hashar: rebooting -sql for kernel / mysql upgrade
  • 12:47 hashar: rebooting -squid for kernel upgrade
  • 12:46 hashar: upgrading bunch of boxes.
  • 12:28 hashar: refreshing l10n cache
  • 12:28 hashar: fixed l10n cache ownership: chown -R l10nupdate:l10nupdate /data/project/apache/common-local/php-master/cache/l10n/
  • 11:58 hashar: l10n cache has been broken since Apr 30th 15:21 UTC
  • 11:44 hashar: We somehow have HTTPS on beta now! https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page still have to fix up the cert names though.
  • 11:20 hashar: deployment-squid applying role::protoproxy::ssl::beta
  • 11:15 hashar: deployment-varnish-t3 applying role::protoproxy::ssl::beta
  • 11:14 hashar: deployment-varnish-t3 : updating local puppet repo
  • 10:43 hashar: deployment-cache-bits03 + role::protoproxy::ssl::beta (should give us https on bits.beta.wmflabs.org)

May 9

  • 21:32 hashar: added mattflaschen as a sysadmin

May 7

  • 08:38 hashar: Created deployment-nginx-test to try out the nginx manifests for SSL.

May 6

  • 09:36 hashar: Adding ArielGlenn as a member/sysadmin

May 3

  • 01:29 hashar: bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
  • 01:18 Coren: rebooted deployment-bastion after manual workaround for a broken puppet run

May 1

  • 12:43 hashar: migrating jobrunner08 and video05 to the NFS server
  • 12:43 hashar: updated puppet manifests on -video05 34ad3d6..32fef26

April 30

  • 21:35 hashar: Fixed the git path in mediawiki/extensions.git local copy of -bastion
  • 21:04 hashar: both apaches are now serving content from the NFS cluster.
  • 20:58 hashar: Migrating apache-33 to use the new NFS server
  • 20:58 hashar: apache-32 running with NFS went from 560ms to 260ms when serving pages \O/
  • 20:49 hashar: Recreated wikiversions.cdb on bastion for the new NFS home dir
  • 20:08 hashar: migrating apache32 to new NFS server
  • 20:06 hashar: root@deployment-bastion:~# /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
  • 20:01 hashar: applying role::labsnfs::client on -bastion
  • 19:45 hashar: applying the very recent `role::labsnfs::client` class on deployment-integration
  • 19:43 hashar: Upgraded puppet manifests on deployment-integration and running puppet.
  • 19:21 hashar: Migrating homes to the new NFS server
  • 18:27 hashar: rsyncs to the NFS server are complete. There are most probably still some tiny files that need to be copied though
  • 16:46 hashar: Mounted new NFS server on /srv/project on instances: apache32, apache33, video05 and jobrunner08
  • 16:01 hashar: Clearing out years old backup from /data/project such as copy of extensions, databases dumps and some old instances backups.
  • 15:28 hashar: Copying l10n cache to the new NFS server: rsync -av /home/wikipedia/common/php-master/cache /srv/project/apache/common/php-master
  • 15:11 hashar: syncing upload data from the Gluster share to labnfs server: rsync -avv /data/project/upload7 /srv/project
  • 13:59 hashar: bastion: created NFS mount point thanks to Coren. echo 1 >/sys/module/nfs/parameters/nfs4_disable_idmapping  ; mount -t nfs -o nfsvers=4,port=0,hard,rsize=65535,wsize=65536 labnfs.pmtpa.wmnet:/deployment-prep/project /srv/project
  • 12:41 hashar: Refreshed most extensions and running mw-update-l10n

April 29

  • 21:00 hashar: updated MobileFrontend manually to 9356d00ac5

April 26

  • 15:56 MaxSem: Enabled GeoData cronjobs

April 24

  • 12:41 hashar: on searchidx01 iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.0.17 (see bug 45868 )

April 23

  • 08:32 MaxSem: Deployed GeoData

April 22

  • 20:49 hashar: Manually updating all mw extensions to make sure everything works fine.

April 21

  • 21:17 hashar: beta is up again. Apache2 could not start because the error log file was not accessible ( bug 47479 )
  • 20:33 hashar: Apache down on both apaches instances

April 19

  • 19:50 hashar: The l10n cache was stalled since Mar 22 13:08 at least. The files were owned by `mwdeploy` seems something changed and they are now owned by `l10nupdate` So I ran: chown l10nupdate -R /home/wikipedia/common/php-master/cache/l10n/
  • 19:46 hashar: Attempted to update the l10n cache (sudo -u mwdeploy mw-update-l10n ) and got a permission denied on /home/wikipedia/common/php-master/cache/l10n
  • 19:43 hashar: Gluster is broken on beta. Extensions are no longer updating, nor can the l10n update run. bug 47425
  • 19:38 hashar: root@deployment-bastion:~# /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start
  • 19:37 hashar: Rebooting bastion. Seems GlusterFS can not allocate memory ( bug 47425 )
  • 19:18 hashar: manually updating mediawiki extensions
  • 11:52 hashar: Successfully added Mark Bergsma to deployment-prep.
  • 09:00 hashar: Updating puppet repositories on search01 and searchidx01. Running puppet on both of them.
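
The l10n cache failure logged at 19:46 and 19:50 came down to files owned by the wrong user. A minimal sketch of a check that could be run before the chown, to see which files are affected; the function name list_wrong_owner is hypothetical and not part of any tool named in the log.

```shell
#!/bin/bash
# Hypothetical check: print everything under a directory that is NOT owned
# by the expected user (name or numeric uid).
list_wrong_owner() {
    local dir="$1" owner="$2"
    find "$dir" ! -user "$owner" -print
}
```

For the case above, `list_wrong_owner /home/wikipedia/common/php-master/cache/l10n l10nupdate` would list the files still owned by mwdeploy; empty output means the ownership already matches.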

April 18

  • 13:12 hashar: update mobile cache (varnish-t3) to latest puppet manifests.

April 16

  • 16:02 hashar: Updating mobile cache to use some mark change 59401
  • 09:35 hashar: applying role::cache::mobile to deployment-cache-mobile01 (that will replace deployment-varnish-t3 eventually)
  • 09:25 hashar: Updating mobile cache (deployment-varnish-t3) to patchset 47567/9 . Some puppet changes got merged in this morning :-]

April 15

  • 14:03 hashar: Rebooting the fresh deployment-cache-upload-test6 instance
  • 12:49 hashar: -cache-upload03 refreshing local puppet repo

April 10

  • 18:24 ^demon|sick: ran mergeMessageList.php for php-master wikis
  • 13:33 hashar: Restarted the database updating job https://integration.wikimedia.org/ci/job/beta-update-databases/374/
  • 13:32 hashar: switching udp2log on bastion: /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start (see bug 38995 )
  • 13:31 hashar: rebooting deployment-bastion too : gluster issue
  • 13:26 hashar: Cluster is back up :-]
  • 13:25 hashar: rebooting both apaches.
  • 13:24 hashar: Gluster failure again /data/project/apache/conf/ has some files missing: www.wikipedia.conf en2.conf wikimedia.conf
  • 13:23 hashar: apache2: Syntax error on line 324 of /etc/apache2/apache2.conf: Syntax error on line 9 of /etc/apache2/wmf/all.conf: Could not open configuration file /etc/apache2/wmf/www.wikipedia.conf: No such file or directory
  • 13:20 hashar: apt-get upgraded apache32 and apache33 . Note that apache is down on them.
  • 13:19 hashar: no pages being served. Most probably a PHP Fatal error
  • 13:13 hashar: reran Jenkins job https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/ . Some git failures happened in /home/wikipedia/common .
  • 06:45 hashar: searchidx01 : restarted lucene-search-2 , might have been killed by OOM killer (see bug 46459)
  • 06:39 hashar: search01 : restarted lucene-search-2 , was not listening on port 8123.

April 8

  • 20:51 hashar: deployment-search01 : /usr/bin/java -Xmx2000m  :-]
  • 20:50 hashar: Changing lucene-search-2 memory usage from 20G to 2G by manually editing /etc/init.d/lucene-search-2 (see bug 46459 )
  • 20:00 hashar: deployment-search01 updating local puppet repo c345581..7d036cb
  • 19:57 hashar: deployment-searchidx01 updating local puppet repository 81f5a93..7d036cb

March 29

  • 12:00 hashar: MobileFrontend should now let users log in again (bug 46649); the issue was most probably caused by the lack of commonswiki on beta.
  • 11:53 hashar: restoring commonswiki on beta 56593.
  • 10:40 hashar: rebooting jobrunner08 and bastion. High network use too.
  • 10:38 hashar: rebooting both apaches instances. They consume ton of network, most probably related to Gluster

March 25

  • 19:46 hashar: -bastion : restarting puppet. Restarting beta autoupdater.
  • 15:01 hashar: Updated database enwiki
  • 14:26 hashar: getting lazy, dropping Central Notice tables from enwiki and rerunning updater.
  • 14:16 hashar: Attempting to fix central notice database schema for enwiki
  • 11:46 hashar: removing local hack made to ArticleFeedback data/maintenance/DataModelPurgeCache.php
  • 11:44 hashar: /home/wikipedia/common/php-master/extensions  : git remote update && git reset --hard origin/master && git submodule update --init
  • 11:40 hashar: Resetting the extensions checkout. Been broken for a few days because of extension renaming.

March 22

  • 21:11 hashar: Search is back! Turns out that the lucene-search-2 service was not running on deployment-search01 despite puppet ensure => running on the service :( See also bug 46459
  • 21:03 hashar: Starting lucene-search-2 on deployment-search
  • 15:00 hashar: manually update puppet sources on -search01 and -searchidx01
  • 14:40 hashar: manually refreshing extensions on -bastion
  • 14:39 hashar: Updated mediawiki/extensions.git which was lacking the Thanks extension 55263
  • 14:39 hashar: I have setup a Jenkins job to automatically update mediawiki-config. Dashboard is https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/

March 21

  • 22:37 hashar: -bastion : stopping udp2log that prevents udp2log-mw from running :/ See bug 38995
  • 22:35 hashar: both apaches give out Error 500 so beta is now serving blank pages.

March 19

  • 09:23 hashar: created sudo policy for jenkins-deploy user. That is the user for the Jenkins slave running deployment-bastion

March 15

  • 13:41 hashar: -squid killed nrpe and restarted it (just to be sure)
  • 13:39 hashar: -squid started puppet service
  • 13:38 hashar: -squid ran puppet manually which deployed the new redirector from https://gerrit.wikimedia.org/r/53935
  • 13:34 hashar: -squid shows a ton of stalled `redirector` processes. Killed them all.

March 14

  • 22:20 hashar: deployment-bastion is now a jenkins slave of the production Jenkins machine
  • 22:13 hashar: manually installing openjdk-7-jre on -bastion
  • 22:02 hashar: Successfully added jenkins-deploy to deployment-prep.
  • 21:59 hashar: adding jenkins-deploy to the project
  • 21:37 hashar: removing restrictions from deployment-bastion . authorized_keys is not read when in labs :] (thx Ryan)
  • 21:16 hashar: -bastion changed restricted_to to (project-deployment-prep) (jenkins)
  • 21:15 hashar: on -bastion: Added group restrictions and set variable restricted_to = (project-jenkins) (jenkins) thanks ryan
  • 21:02 hashar: creating jenkins homedir manually on -bastion
  • 20:38 hashar: applying jenkins::user to deployment-bastion
  • 19:03 hashar: rebooting -bastion to find out whether the security rule is applied
  • 19:01 hashar: updated security rule to allow TCP port 22 connection from gallium.wikimedia.org [208.80.154.135/32]
  • 04:22 hashar: killed job runners on jobrunner08 and restarted service
  • 04:22 hashar: Restarted apache on apache32,33
  • 04:20 hashar: Upgrading apache32, apache33, video05 and jobrunner08
  • 02:13 hashar: Trying out geoip module from 53714 on deployment-integration

March 13

  • 04:49 hashar: rebooting deployment-integration
  • 04:47 hashar: rebooting deployment-lucene

March 12

  • 20:39 Chad: added port 1099 to search engine security group to allow RMI messaging to go through

March 11

  • 05:44 hashar: Running MediaWiki update.php on all databases

March 8

  • 20:15 hashar: The search backend is apparently working now !!! bug 34250
  • 00:46 hashar: upgrading all instances

March 7

  • 23:43 hashar: OAI repository set up on beta !!! bug 45814
  • 23:42 hashar: for squid login=PASSTHRU replaced by login=PASS. Reloaded squid.
  • 23:41 hashar: reloading squid
  • 23:41 hashar: setup squid to pass the WWW-Authorization headers to the Apache. Done by configuring login=PASSTHRU for each cache_peer (*crosses fingers*)
  • 22:15 hashar: Set up an OAI repository user for lucene search. Password in puppet.
  • 22:04 hashar: Restored mysql admin password on deployment-sql
  • 21:57 hashar: stopping mysql server on -sql
  • 21:31 hashar: Creating OAI repositories on sql and sql02 master databases
  • 21:11 hashar: updating mediawiki-config fc22500..71e689a

March 6

  • 20:21 hashar: creating deployment-searchidx02 which has 16GB of RAM. deployment-searchidx01 does not have enough RAM :(
  • 19:59 hashar: rebooting apache33 : gluster mount is corrupted
  • 19:28 hashar: regenerating lucene prefixes
  • 19:02 hashar: refreshed wikiversions.cdb
  • 18:37 hashar: rebooting search indexer
  • 00:47 hashar: Trying to import enwiki database on the lucene search deployment-searchidx01 : sudo -u lsearch /a/search/lucene.jobs.sh import-db enwiki

March 5

  • 19:01 hashar: Log
  • 00:59 hashar: reinstall the search box packages to make sure they use /a as a mount of /dev/vdb
  • 00:09 hashar: rebooting deployment-search01, stuck somehow

March 4

  • 17:47 hashar: removing all 'aft%' tables to make sure ArticleFeedbackv5 database schema is valid bug 45318
  • 17:43 hashar: set a dummy value for wmgTranslationNotificationUserPassword

March 1

  • 20:39 hashar: Search boxes now have 51677 patchset 5 applied. Still have to figure out how Lucene works though
  • 19:17 hashar: Applying puppetmaster::self to both search boxes
  • 18:35 hashar: Created deployment-search01 and deployment-searchidx01
  • 18:12 hashar: deleting -dbdump, migrated udp2log on -bastion

February 27

  • 17:34 hashar: updating mediawiki-config 8d1aac9..10bda3a
  • 17:22 hashar: foreachwikiindblist /home/wikipedia/common/all-labs.dblist update.php --quick --quiet
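
The foreachwikiindblist run logged at 17:22 applies update.php to every wiki named in a dblist file. A minimal dry-run sketch of that loop follows, assuming the dblist holds one database name per line; the function name print_update_commands is hypothetical, and the handling of blank lines and # comments is an assumption rather than documented foreachwikiindblist behaviour.

```shell
#!/bin/bash
# Hypothetical dry-run: print the per-wiki update command for every database
# name in a dblist file. Blank lines and lines starting with # are skipped
# (an assumption made for this sketch).
print_update_commands() {
    local dblist="$1" wiki
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r wiki || [ -n "$wiki" ]; do
        case "$wiki" in
            ''|'#'*) continue ;;
        esac
        printf 'mwscript update.php --wiki=%s --quick\n' "$wiki"
    done < "$dblist"
}
```

Called as `print_update_commands /home/wikipedia/common/all-labs.dblist`, this would show one mwscript line per wiki without touching any database.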

February 26

  • 22:48 hashar: running database update for enwiki
  • 18:50 hashar: adding ram and demon to the project

February 25

  • 17:45 hasharMeeting: mwscript update.php --wiki=testwiki

February 19

  • 15:54 hashar: applied role::cache::text on deployment-cache-text01

February 18

  • 18:03 hashar: running apt-get dist-upgrade on -cache-text01 , -sql04 and -sql03
  • 18:02 hashar: running apt-get dist-upgrade on -cache-upload04
  • 17:57 hashar: applying role::cache::upload to -cache-upload04
  • 15:31 hashar: mobile redirection is more or less in place on beta. Browsing with a mobile agent will redirect to the mobile version.
  • 14:32 hashar: apaches giving errors because wikidatawiki is not configured
  • 14:22 hashar: wikidatawiki is missing oh no beta dead again
  • 14:13 hashar: fixed puppet on -squid, it was blocked by attempting to apply a non existent class: generic::package::git-core
  • 14:08 hashar: applying the new squid::redirector class to deployment-squid so we can handle mobile redirects
  • 11:35 hashar: Deleting -mc instance, memcached is now on apaches
  • 11:27 hashar: rebooting -bastion again
  • 11:26 hashar: got Gluster client upgraded on -bastion
  • 11:21 hashar: rebooting -bastion
  • 11:18 hashar: migrating memcached from -mc to the apaches boxes. 49261
  • 09:07 hashar: Running update.php on all databases.

February 15

  • 17:17 labs-logs-bottie: petrb: rebooting bastion to fix some issues with mw
  • 17:14 hashar: running "mw-update-l10n --verbose" on -bastion as mwdeploy user
  • 15:37 hashar: puppet properly start the apache2 service. Fixed bug 38996
  • 15:20 hashar: rebooting apaches box to find out whether apache2 service is brought up 47398

February 14

  • 14:32 hashar: git maintenance override. Now running: git submodule foreach 'git repack -a -d --depth=250 --window=250'
  • 14:30 hashar: doing some git maintenance: cd /home/wikipedia/common/php-master/extensions ; git submodule foreach 'git gc --aggressive && git repack -a'

February 13

  • 19:41 hashar: starting apaches manually
  • 19:37 hashar: rebooting both apaches. Gluster seems to be stalled
  • 19:32 hashar: rebooting bastion
  • 19:31 hashar: restarting squid
  • 19:20 hashar: updating mediawiki-config to latest master : Updated 70fec38..7c4810c
  • 15:48 hashar: on bastion: stopped puppet and wmf-beta-autoupdater , running git pull manually in php-master/extensions
  • 15:22 hashar: Running mw-update-l10n manually as user mwdeploy. Should regenerate the l10n cache
  • 15:21 hashar: chown -R l10nupdate:l10nupdate /data/project/apache/common-local/php-master/cache/l10n
  • 15:21 hashar: Mutante merged a sudo right change that would unblock the beta autoupdater ( see https://gerrit.wikimedia.org/r/#/c/47795/ )

February 11

  • 11:02 hashar: Granted anth1y sudo access on the project so he can play with Lucene
  • 11:02 hashar: Added anth1y to the project, he is interested in Lucene / Swift stuff :-D

February 4

  • 14:24 hashar: Started the over long l10n cache rebuild in a screen on deployment-bastion
  • 14:09 hashar: -bastion applying misc::deployment::scap_scripts
  • 14:07 hashar: -bastion removing role::deployment::deployment_servers::labs
  • 13:35 hashar: Applying role::memcached to apache32 and apache33
  • 13:24 hashar: manually updating extensions to make sure the beta autoupdater works properly
  • 13:19 hashar: the infamous beta auto updater is back in action on deployment-bastion
  • 13:07 hashar: starting apache2 on apache32 and apache33
  • 13:06 hashar: multiversion/refreshWikiversionsCDB
  • 13:05 hashar: refreshing /home/wikipedia/common from latest master (no more newdeploy branch)
  • 13:04 hashar: REVERTED GIT-DEPLOY!!!!! rm /data/project/apache/common-local (symlink) and restored backup: mv /data/project/apache/common-local.pre-git-deploy /data/project/apache/common-local
  • 13:00 hashar: rebooting apache32 (locked / can't login)
  • 12:52 hashar: rebasing /srv/deployment/mediawiki/common
  • 12:41 hashar: -dbdump : stopping udp2log, starting udp2log-mw
  • 09:24 hashar: upgrading / rebooting all instances
  • 09:18 hashar: Beta is broken in some random and creative ways AGAIN. /home on bastion is corrupted, some instances do not let us connect anymore, apache docroot disappeared.

February 1

  • 10:21 hashar: nslcd probably points to a wrong LDAP or has a faulty DNS configuration. Can't login on it anymore :/
  • 10:12 hashar: rebooting the varnish-t3 instance, nslcd can't resolve somepath

January 31

  • 15:49 hashar: Deleting out /data/project/squid1 which has been migrated to /mnt/squid_cache. The gluster volume for data-project is corrupted on beta so we don't want to use it anymore.
  • 15:46 hashar: stopping squid, migrating ufs cache from /data/project/squid1 (gluster) to /mnt/squid_cache
  • 15:42 hashar: cleaned out deployment-squid:/mnt/ (had an old enwiki dump and some squid files)
  • 15:19 hashar: restarting squid process on deployment-squid
  • 15:18 hashar: starting apache2 on -apache32
  • 14:53 petan: restarted squid and rebooted apache32

January 30

  • 15:17 hashar: removing -cache-bits-02 (been replaced a long time ago by -cache-bits-03)

January 21

  • 12:42 hashar: -varnish-t3 : removing /dev/sda* entries from /etc/fstab , applying 44709 ps 6 and rerunning puppet
  • 12:28 hashar: applying role::cache::mobile on deployment-varnish-t3
  • 11:27 hashar: created deployment-varnish-t3 , deleted deployment-varnish-t2
  • 10:48 hashar: moved 208.80.153.143 from deployment-varnish-t to deployment-varnish-t2 (IP is in DNS as *.m.beta.wmflabs.org )
  • 10:38 hashar: creating deployment-varnish-t2 to replace broken deployment-varnish-t
  • 10:25 hashar: re-rebooting deployment-varnish-t
  • 09:58 hashar: Rebooting deployment-varnish-t from labsconsole. I guess there is a mount for /dev/sda* :(
  • 09:51 hashar: rebooting deployment-varnish-t to find out how well it goes on restart :-]

January 18

  • 21:20 hashar: deployment-varnish-t : apt-get upgrade
  • 21:18 hashar: running puppet on cache-bits03 to find out whether role::cache::bits applies cleanly there.

January 16

  • 02:07 Reedy: Created geo_tags tables on all deployment-prep wikis

January 15

  • 21:49 hashar: ln -s /srv/deployment/mediawiki/common /data/project/apache/common-local
  • 21:49 hashar: renamed /data/project/apache/common-local to common-local.pre-git-deploy

January 14

  • 22:58 hashar: renamed php-1.21wmf{6,7} with a -back prefix. Created symbolic links to the git-deploy slots: ln -s /srv/deployment/mediawiki/slot1 php-1.21wmf6 and /srv/deployment/mediawiki/slot0 php-1.21wmf7
  • 22:56 hashar: updating mediawiki-config fd29e6a..329113f

January 11

  • 17:08 wm-bot: this is a creepy log with | and such shitty chars $@#% 6346 w@#%^@# 6bla

January 10

  • 08:29 Ryan_Lane: deployed all repos to destination hosts
  • 08:29 Ryan_Lane: made deployment-bastion a git-deploy deployment host
  • 08:18 hashar: removed misc::deployment::scripts from -bastion, already provided by misc::deployment::scap_scripts
  • 08:09 hashar: put back role::beta::autoupdater on -bastion

January 9

  • 21:47 hashar: running puppet on apache boxes to get the new role::applicationserver::appserver::beta class
  • 21:40 hashar: migrating apaches box to the new role::applicationserver::appserver::beta (replaces both appserver and imagescaler)
  • 20:40 hashar: removing the phased out imagescaler::labs from apaches in favor of role::applicationserver::imagescaler
  • 20:20 hashar: Migrated Apache box to use role::applicationserver::appserver instead of the old (and no longer existing) role::applicationserver
  • 16:04 jeremyb: the recent (today at least, but probably most of the earlier ones too) !logs from wm-bot are really from hashar. in case you were looking for the source.
  • 16:02 hashar: enwiktionary beta now running 1.21wmf6 http://en.wiktionary.beta.wmflabs.org/wiki/Special:Version
  • 15:59 wm-bot: cp php-master/cache/trusted-xff.cdb php-1.21wmf7/cache/
  • 15:59 wm-bot: cp php-master/cache/trusted-xff.cdb php-1.21wmf6/cache/
  • 15:56 hashar: Refreshing the TrustedXFF cache: cd /home/wikipedia/common/php-master/extensions/TrustedXFF && mwscript extensions/TrustedXFF/generate.php --wiki=aawiki ../../cache/trusted-xff.cdb
  • 15:46 hashar: Running mw-update-l10n on deployment-bastion in screen 16609.pts-0.i-00000390
  • 15:32 wm-bot: -video05 : restarted puppet and puppetmaster, killed stalled puppet processes. Rerunning puppet manually
  • 15:04 hashar: -video05: running apt-get upgrade
  • 15:04 wm-bot: -video05 : trimmed /var/log/glusterfs/data-project.log file
  • 14:55 wm-bot: reloaded udp2log-mw on -dbdump
  • 14:33 hashar: Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 42951

January 7

  • 15:04 wm-bot: apt updated and upgraded apache32, apache33 and jobrunner08
  • 14:27 hashar: apache32 / apache33 filling is logged as bug 43703. This is caused by gluster client log files not being rotated which is bug 41104
  • 14:24 wm-bot: manually emptied out /var/log/glusterfs/data-project.log on apache32 and apache33.
  • 14:20 wm-bot: apache32 and apache33 have disk full again.
  • 11:01 hashar: Fixed up extension static assets ( bug 43692 ).
  • 10:36 wm-bot: updating mediawiki config to latest master

December 21

  • 20:18 wm-bot: rebooting some instances so they get the new /home

December 20

  • 08:58 wm-bot: manually updated git puppet repo on deployment-video05

December 19

  • 20:52 hashar: Granted MaxSem and Mgrover sysadmin rights. They are WMF contractors going to work on setting up MobileFrontend on beta.
  • 11:37 wm-bot: finally had GettingStarted extension installed.
  • 10:37 hashar: /home/wikipedia/common/php-master/extensions/.git/FETCH_HEAD gave I/O error. I have deleted it and reran git pull + git submodule update --init aka : UPDATED ALL EXTENSIONS TO THEIR LATEST master VERSION.
  • 10:32 wm-bot: removing live hack on UserMerge extension (attempted to grant some user right to bureaucrat, that should be done in CommonSettings.php )
  • 10:31 wm-bot: manually running 'git submodule update --init' under extensions directory to find out what is going on there
  • 10:10 wm-bot: rebooting apache32 and apache33 to get new /home
  • 09:50 wm-bot: updating mediawiki-config
  • 09:46 hashar: rebooting -bastion to get the new /home

December 4

  • 12:27 hashar: Apache boxes seem to be running again. Had to manually restart apache on apache33.
  • 08:52 hashar: Apache32 is somehow up
  • 08:42 hashar: on apache33 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache33
  • 08:32 hashar: rebooting apache32 so all its services know about /var/log :-]
  • 08:30 hashar: on apache32 : removed /var/log symlink, recreated directory, restarted gluster, moving files from /data/project/apache32

November 6

  • 15:32 hashar: fixed up beta by re-pulling the EventLogging extension, which was in a weird state
  • 15:12 hashar: Reset all extensions to their latest master version...
  • 14:51 hashar: blank pages on beta are caused by the EventLogging extension being required although it is not pulled.
  • 14:11 hashar_: configured apaches to send their error logs to /home/wikipedia/logs (conf file is /data/project/apache/conf/wmflabs-logging.conf )
  • 10:45 wm-bot: stashing change in /h/w/c
  • 10:14 hashar: made mwdeploy gitconfig file support color + added the 'git lg' and 'git lg2' aliases, which give a nice + concise log
  • 08:45 hashar: manually running the beta auto updater from -bastion instance

November 5

  • 23:11 hashar: applied the new role::beta::autoupdater class to -bastion.
  • 21:00 hashar: changing ownership of all files in /home/wikipedia/common to mwdeploy:mwdeploy as per deployment-bastion GID/UID. Running as root in screen 27986.pts-1.i-00000390 .
  • 20:56 hashar: rebooting jobrunner06 to ensure that wmf-beta-autoupdater is gone
  • 20:37 hashar: uninstalling beta updater from jobrunner06 , will be deployed on -bastion
  • 20:16 hashar: rebooting deployment-dbdump (stalled git processes)

November 2

  • 16:14 hashar: started mw-job-runner on jobrunner08
  • 15:51 hashar: applying role::applicationserver::jobrunner to jobrunner08
  • 13:47 hashar: created a second job runner instance: deployment-jobrunner07
  • 13:18 labs-logs-bottie: petrb: rebooting -bastion to install updates
  • 13:12 anomie: Rebooted deployment-dbdump to clear up hung processes, hopefully clear up NFS weirdness

October 31

  • 09:06 wm-bot: running apt-get upgrade on -dbdump

October 30

  • 15:47 wm-bot: running apt-get upgrade on -cache-upload3 and -sql
  • 15:45 wm-bot: running apt-get upgrade on apaches boxes + squid

October 29

  • 22:11 hashar: Added anomie as a sysadmin (for sudo) and netadmin

October 28

  • 01:43 jeremyb: 23:35:40 < beta-logmsgbot> !log deployment-prep csteipp: running mwscript update.php dewikivoyage to update from 1.13 import
  • 01:43 jeremyb: 23:31:42 < beta-logmsgbot> !log deployment-prep csteipp: added dewikivoyage config to db-wmflabs.php (temp hack)
  • 01:43 jeremyb: 23:30:46 < beta-logmsgbot> !log deployment-prep csteipp: added dewikivoyage to all-wmflabs.dblist
  • 01:43 jeremyb: 23:23:35 < beta-logmsgbot> !log deployment-prep csteipp: ran mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki de wikivoyage dewikivoyage de.wikivoyage.beta.wmflabs.org

October 23

  • 10:00 labs-logs-bottie: j: update DocumentRoot of upload.beta.wmflabs.org to /mnt/upload7 in /usr/local/apache/conf/upload.conf (fixes 403 for videos)

October 22

  • 23:14 hashar: getting to bed :-]
  • 23:14 hashar: Applying database updates to all wiki (in a screen on -dbdump)
  • 23:14 hashar: Started a manual l10n rebuild in a screen on -dbdump
  • 23:13 hashar: Fixed the jobrunner spamming dberror.log (the all-wmflabs.dblist contained databases from production)
  • 23:13 hashar: log

October 21

  • 17:50 beta-logmsgbot: csteipp: reimporting enwikivoyage database

October 19

  • 20:27 Damianz: csteipp: ran git pull of master, kept local dblist conflicts

October 17

  • 20:43 hashar: applying nfs::upload::labs to apache32 and 33. It is no longer applied by the role::applicationserver class (prod applies nfs::upload directly on nodes)
  • 20:30 hashar: moving /data/project/upload6 to /data/project/upload7 to match production. See bug 41121
  • 13:44 hashar: -sql02 removed ganglia from host and reran puppet.
  • 13:41 hashar: Added CSteipp and Reedy as sudoers
  • 12:03 hashar: Fixed assets on bits. The static-master symbolic links got removed at some point. See 28337
  • 11:22 hashar: updated mediawiki-config : 6bbf8f2..7caabad
  • 11:18 hashar: emptied /var/log/glusters/data-deployment.log huge files on several instances
  • 10:34 hashar: deployment-jobrunner06: / was filled up by Gluster logs; /var/log/glusterfs/data-project.log filled it all :(
  • 10:28 labs-logs-bottie: j: create new videoscaler instance deployment-video05 this time with sql access
  • 08:31 hashar: -sql02 : manually installed mysql server using /mnt/mysql as datadir.
  • 08:17 hashar: removed role::db::core from -sql02 : class is not meant for labs :-]
  • 08:04 hashar: attempting to deploy role::db::core class on deployment-sql02

October 15

  • 14:40 labs-logs-bottie: root: 4% freed of /home weeeee
  • 14:37 labs-logs-bottie: root: moving /home/wikipedia/logs/ to /data/project/logs
  • 14:32 labs-logs-bottie: root: moving /home/johnduhart/ to /data/project/old/h/johnduhart/
  • 10:14 labs-logs-bottie: j: create new videoscaler instance deployment-video04

October 9

  • 22:01 hashar: applying misc::beta::scripts on a precise instance: -jobrunner06
  • 20:46 hashar: moving misc::beta::scripts from -integration to -dbdump
  • 20:39 hashar: Updating all DB following the merge of content handler patch : foreachwiki update.php --quick

September 24

  • 18:43 wm-bot: rebooting -jobrunner06 and
  • 18:41 wm-bot: stopped beta auto updater on -integration and running apt-get dist-upgrade
  • 18:39 wm-bot: stopped jobrunner and killed -9 PHP processes on -jobrunner06
  • 18:39 hashar: running dist-upgrade on -jobrunner06
  • 13:25 hashar: shut down deployment-cache-bits02
  • 13:24 Damianz: migrating bits from deployment-cache-bits02 to deployment-cache-bits03
  • 13:23 hashar: log me please
  • 13:22 wm-bot: foo

September 14

  • 17:25 wm-bot: updated mediawiki-config : Updating 4d12ee3..2c14daf
  • 17:22 wm-bot: removed a live hack enabling AFTv5 on all wikis

September 13

  • 23:26 hashar: migrate -dbdump misc::scripts to misc::deployment::scripts

September 7

  • 16:04 wm-bot: updating mediawiki configuration

August 31

  • 13:04 wm-bot: ran git pull in /home/wikipedia/common/php/extensions , bringing up a ton of forgotten exts. Will update autoupdater
  • 12:52 wm-bot: rebuilding l10n cache
  • 12:52 labs-logs-bottie: petrb: fixing repo for OSB
  • 09:22 labs-logs-bottie: petrb: rebuilding localization
  • 09:13 labs-logs-bottie: petrb: OSB doesn't seem to be installed properly, investigating
  • 09:08 labs-logs-bottie: petrb: deployed OSB to enwiki
  • 09:07 labs-logs-bottie: petrb: inserted OSB to update-extensions.sh and extensions

August 30

  • 21:22 hashar: Deployed the automatic code updater on beta. It is running on deployment-integration, service is wmf-beta-autoupdate managed by puppet to always run.
  • 20:55 wm-bot: applying beta::scripts to deployment-integration
  • 20:38 hashar: trying out 22116 on deployment-integration (that is the beta auto updater)
  • 16:39 wm-bot: updating all extensions and core to their latest master version
  • 09:00 labs-logs-bottie: petrb: php multiversion/MWScript.php changePassword.php --wiki enwiki --user Petrb --password needed to change
  • 08:12 labs-logs-bottie: petrb: /home/wikipedia/common/php git pull

August 29

  • 08:05 wm-bot: deployment-integration : update puppet git repo to latest master

August 28

  • 15:03 labs-logs-bottie: petrb: updated puppet
  • 14:58 labs-logs-bottie: petrb: we have a new bastion :D
  • 14:58 labs-logs-bottie: petrb: fixed mounts
  • 10:20 labs-logs-bottie: root: rebooting bastion
  • 10:14 labs-logs-bottie: root: test
  • 07:48 wm-bot: migrated Apaches boxes from applicationserver::labs to role::applicationserver
  • 07:41 wm-bot: restarted apache process on Apaches boxes

August 27

  • 15:43 wm-bot: l10n cache rebuild
  • 13:31 wm-bot: rebuilding list of extension messages and rebuilding localization cache
  • 08:14 wm-bot: reverted live hacks made to ConfirmEdit extension. Fix saved in file PlatonidesPatch and in git stash
  • 08:05 hashar: Updating all extensions to latest master
  • 08:05 hashar: updating MediaWiki core : Updating d47c1e9..1c00630

August 20

  • 23:37 j^: replace deployment-video02 with deployment-video03 using rolepuppet class from git (role::jobrunner::videoscaler)

August 17

  • 11:03 Platonides: Logging captcha results at /data/project/bug39446.log with live hack for bug 39446

August 3

  • 15:57 hashar: cache-bits02 would need 13304 to be merged in manually whenever the change is completed. Make sure to report any issue to mark :-]
  • 15:31 hashar: updating cache-bits02 puppet to gerrit 13304/21
  • 13:47 hashar: deployed 13304 PS 14 on cache-bits02
  • 13:38 hashar: cache-bits02 : cd /var/lib/git/operations/puppet sudo GIT_SSH=/var/lib/git/ssh git fetch origin refs/changes/04/13304/14 && git checkout -b 13304/14 FETCH_HEAD (aka deploying 13304 patchset 14)
  • 10:21 wm-bot: running 'git submodule foreach git gc --aggressive' in screen 2042
  • 10:11 wm-bot: rebooting swapped -dbdump
  • 10:10 wm-bot: Probably killed -dbdump by launching 4 instances of git gc --aggressive :-(((((((((((
  • 10:09 wm-bot: hashar is going to kill us all
  • 10:07 wm-bot: running massive git gc --aggressive on all extensions
  • 09:54 wm-bot: Updating extensions to latest master: git submodule foreach git pull
  • 09:54 wm-bot: hashar :P
  • 09:53 hashar: Updating mediawiki core to latest master: Updating 0c1471c..d47c1e9 (fast forwarded)
  • 09:49 hashar: On extensions: git submodule foreach git checkout master
  • 09:13 hashar: deleting deployment-bastion, it lacks a DNS entry bug 38846
  • 09:12 hashar: running git gc --aggressive in /home/wikipedia/common/php-master
  • 08:48 hashar: making sure perms are correct in /data/project/apache/common-local/php-master : chmod -R g+w . ; chown mwdeploy:svn -R * .*
  • 08:45 hashar: On -dbdump /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start bug 38995
  • 08:35 hashar: running puppet on Apache32 / 33
  • 08:12 hashar: dist-upgrading all instances to get the latest GlusterFS version (3.3.0)
  • 08:11 hashar: Dist upgrading apache32 and 33 and rebooting them

August 2

  • 18:06 andrewbogott: Migrated all instances to new hardware
  • 15:14 hashar: yeah we lost udp2log again! -dbdump : /etc/init.d/udp2log-mw restart
  • 07:58 hashar: bug 38748 deleting unused/corrupted deployment-wmsearch instance. (had stuff like: -bash: /usr/bin/groups: cannot execute binary file. Connection to deployment-wmsearch.pmtpa.wmflabs closed.)

July 30

  • 15:00 hashar: deployment-bastion does not let us log in despite being a fresh instance. Logged as bug 38846
  • 14:47 hashar: rebooting -bastion
  • 12:48 hashar: Shutting down deployment-nfs-memc for a while, will see if it is still needed around or if we can safely delete it (see bug 38084). All data should be on /data/project .
  • 12:43 hashar: Recreating deployment-bastion using a Precise image and s1.small (1CPU, 1GB RAM, 80G storage)
  • 12:40 hashar: Deleting -bastion , was corrupted.

July 28

  • 03:38 labs-logs-bottie: j: only have one commons wgForeignFileRepos: wikimediacommons at commons.wikimedia.beta.wmflabs.org (/data/project/apache/common/wmf-config/filebackend-wmflabs.php)
  • 01:09 Platonides: testwiki is now showing random captchas
  • 01:09 Platonides: moved /mnt/upload6/captcha/random to /mnt/upload6/private/captcha/random
  • 00:22 Platonides: generating random-challenge captchas at /mnt/upload6/captcha/random

July 27

  • 23:01 Platonides: Running time python captcha.py --output /mnt/upload6/private/captcha --font /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf --count 500 --dirs 3 --key "$(grep -Po '(?<=wmgCaptchaSecret = (["'"'"'])).*(?=\1)' /data/project/apache/common-local/wmf-config/PrivateSettings.php)" --wordlist <( < /usr/share/dict/american-english tr '[A-Z]' '[a-z]' | grep -E '^.{4,5}$' | grep -vE '(.)\1$' | grep -vE '^(.)\1' | LANG=C gre
  • 22:50 Platonides: Filesystem corruption signs in deployment-bastion, most debconf backends /usr/share/perl5/Debconf/FrontEnd are zeroed files. This explains some of the earlier apt-get problems, and maybe also bug 38747 (aka. magic tail -f fix)
  • 22:49 Damianz: Thumbnails are currently broken - thumbs folder seems empty, images are there as expected - beta quirk or need rebuilding?
  • 22:22 Damianz: DocumentRoot got wiped, breaking all the sites - fixed the broken symlink on bastion for /data/project/apache to /usr/local/apache. Also fixed common-local to common symlink.
  • 21:56 Platonides: apt-get removed wikimedia-task-appserver in deployment-bastion :(
  • 21:42 Platonides: removed generated captcha files
  • 21:33 Platonides: installed in deployment-bastion the packages joe, python-imaging and wamerican
  • 20:29 labs-logs-bottie: j: deployment-bastion: removing all deployment-nfs-memc entries from /etc/fstab
  • 13:14 hashar: restarted job runner on jobrunner06
  • 13:11 hashar: removing all deployment-nfs-memc entries from /etc/fstab
  • 09:51 hashar: bug 38749 jobrunner06 : removed /usr/local/apache and /mnt/upload6 empty dir. Downgraded PHP manually. Rerunning puppet.
  • 08:46 hashar: migrating jobrunner06 to use the /data/project for uploads

July 26

  • 19:27 labs-logs-bottie: j: delete broken deployment-video01 instance
  • 18:42 labs-logs-bottie: j: new instance deployment-video02 with videoscaler with access to gluster instead of nfs
  • 16:15 hashar: applying nfs::apache::labs on -dbdump to get /usr/local/apache from /data/project
  • 16:12 hashar: -dbdump umounted deployment-nfs-memc:/mnt/export/apache on /usr/local/apache
  • 13:56 hashar: archiving -nfs-memc:/mnt/export : root@deployment-nfs-memc:/mnt# mv export /data/project/deployment-nfs-memc_mnt-export_backup
  • 13:46 hashar: seems to work fine with /data/project now.
  • 13:46 hashar: rsync finished for both apache and upload6. Remounting and restarting apaches
  • 13:35 hashar: rsync from nfs-memc:/mnt/export/upload6 to /data/project/upload6 completed. YEAHHH
  • 13:32 hashar: on -dbdump, unmounted /mnt/upload (from nfs-memc). Please use the /mnt/upload6 -> /data/project/upload6
  • 13:29 hashar: root@deployment-nfs-memc:/mnt/export# rsync -a --progress --delete --inplace /mnt/export/apache /data/project
  • 13:22 hashar: applying nfs::upload::labs to -dbdump so it uses /data/project/upload6 at /mnt/upload6
  • 13:15 hashar: manually umount /mnt/upload6 on apaches
  • 13:14 hashar: stopping apache backends
  • 13:10 hashar: Manually running puppet for 15545, which should fix bug 38084 (use /data/project instead of the NFS instance)
  • 09:35 hashar: Regenerating captcha with the new shared key fixed bug 38699
  • 09:26 hashar: Deleted all captchas in /mnt/upload6/private/captcha and regenerated them using count=1000 and dirs=3
  • 09:24 hashar: forgot to set a directory level. python php-master/extensions/ConfirmEdit/captcha.py --wordlist=/usr/share/dict/words --font=/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf --key=******* --output=/mnt/upload/private/captcha --count=1000 --dirs=3
  • 09:22 hashar: Set $wmgCaptchaSecret in the local file common/wmf-config/PrivateSettings.php and used that value in captcha.py
  • 09:21 hashar: On dbdump, regenerating captcha using: python php-master/extensions/ConfirmEdit/captcha.py --wordlist=/usr/share/dict/words --font=/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf --key=********* --output=/mnt/upload/private/captcha --count=1000

July 25

  • 17:00 labs-logs-bottie: j: clear messages after updating localization(cache/l10n) to get new messages in TMH: php MWScript.php ../php/extensions/WikimediaMaintenance/clearMessageBlobs.php --wiki=aawiki
  • 14:43 hashar: rebooting bits02
  • 14:39 hashar: dist-upgrade on cache-bits02, will reboot after that (bits.beta.wmflabs.org will be disabled while it reboots)
  • 14:24 hashar: fixed puppet on cache-bits02 : ln -s /var/lib/git/operations/puppet/modules /etc/puppet/modules . That was an empty directory, which prevented puppet from finding the modules and made it break when trying to install ntp::client
  • 14:19 hashar: root@deployment-cache-bits02:/var/lib/git/operations/puppet(git:13304/10)# git fetch anonymous refs/changes/04/13304/12 && git checkout -b 13304/12 FETCH_HEAD
  • 13:48 hashar: rebooting apache33 so it can fsck /dev/vdb
  • 00:02 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=commonswiki --threads=2

July 24

  • 21:58 labs-logs-bottie: j: run mwscript rebuildLocalisationCache.php --wiki=aawiki --threads=2
  • 18:25 hashar: Instances send their syslog again! To deployment-dbdump for now 14090
  • 15:42 hashar: on deployment-integration, applied 15545 patchset 7 to test out the symlinks from /data/project/upload6 to /mnt/upload6 .
  • 15:32 hashar: rerunning rsync with --delete : root@deployment-nfs-memc:/mnt/export# rsync -a --progress --delete --inplace /mnt/export/upload6 /data/project
  • 15:26 hashar: root@deployment-nfs-memc:/mnt/export# rsync -a --progress --inplace /mnt/export/upload6 /data/project
  • 15:00 hashar: banned another /22 at squid level.
  • 14:45 hashar: banned, at squid level, a crawler hosted on OVH. Just added the IP to squid.conf blacklist :)
  • 13:42 hashar: Ran dist-upgrade on deployment-dbdump and rebooting. Will break udp2log loggers.
  • 13:40 hashar: Rebooting all apaches
  • 13:35 hashar: Running "apt-get dist-upgrade" on apache{32,33} to fix PHP5 using ubuntu packages instead of wmf packages. Upgrade kernel.

July 23

  • 16:03 hashar: hopefully half fixed the udp2log on deployment-dbdump . Need several changes in the puppet files though because the udp2log-mw init script seems to conflict with the udp2log one :/
  • 13:41 hashar: rebooting -dbdump to make sure everything works fine :D
  • 13:40 hashar: udp2log restored on beta!!! Still in /home/wikipedia/logs/ and logged by deployment-dbdump
  • 13:11 hashar: applying role::logging::mediawiki to -dbdump (will bring log2udp)
  • 09:13 hashar: updating MediaWiki extensions
  • 09:11 hashar: updated mediawiki/core: Updating ef3132f..f8de6a7
  • 09:10 hashar: updating core + extensions to their latest master versions
  • 09:09 hashar: Updated mediawiki-config Updating 96ba09e..66ca8b0

July 18

  • 14:38 hashar: New / rebooted instances are no longer accessible : bug 38473 - instances can not boot / reboot anymore
  • 13:46 hashar: deleting upload01 (screwed somehow)
  • 13:46 hashar: creating deployment-cache-upload03 to replace upload01
  • 13:44 hashar: deployment-cache-upload01 seems screwed : waiting for metadata service at http://169.254.169.254/2009-04-04/meta-data/instance-id . Failed DHCP acquisition ? => rebooting
  • 13:08 hashar: deployment-cache-upload01 : running apt-get upgrade / dist-upgrade and rebooting
  • 10:22 hashar: copying apache dir to /data/project . Run as root@deployment-nfs-memc in a screen session
  • 09:02 hashar: adding nfs::apache::labs and nfs::upload::labs to deployment-integration
  • 08:59 hashar: Applying 15545 to deployment-integration
  • 08:26 hashar: Created deployment-integration to be used as a puppetmaster::self host

July 17

  • 21:26 Platonides: installed python-imaging and wamerican on deployment-dbdump
  • 21:12 beta-logmsgbot: petrb: updating ArticleFeedbackv5 extension
  • 19:51 hashar: 369 languages rebuilt out of 369
  • 19:45 hashar: rebuilding l10n cache: mwscript rebuildLocalisationCache.php --wiki=aawiki --threads=2
  • 19:39 hashar: beta broken by PAGEID magic word introduced with 0a7cf03 / I11d42ca7 9858
  • 19:32 hashar: running git bisect of core 80fbb70..ef3132f
  • 19:26 hashar: upgrading MediaWiki core 80fbb70..ef3132f
  • 19:20 hashar: updated AFTv5: f97811f..d3bd97f
  • 19:04 hashar: updated robots.txt to specify a user-agent. Will definitely prevent Google from killing beta :)
  • 18:54 hashar: squid resumed. The swap files got corrupted somehow, needed to delete them entirely to start again. Squid storing again.
  • 18:40 hashar: -squid bah doing rm -fR /data/project/squid1/*
  • 18:39 hashar: installed `tree` on deployment-squid
  • 18:38 hashar: removing swap files in /data/project/squid1
  • 18:36 hashar: Squid is bugged as hell : 2012/07/17 18:36:13| Store rebuilding is -0.1% complete and looping
  • 18:21 beta-logmsgbot: hashar: rebooting squid glusterfs gone wild apparently
  • 14:45 hashar: Blacklisted user agents matching /.*Googlebot.*/
  • 13:45 hashar: Manually restarted apaches
  • 13:44 hashar: Imported all.conf apache conf from production
  • 13:29 hashar: err: /Stage[main]/Mediawiki::Sync/Exec[mw-sync]: Failed to call refresh: Command exceeded timeout at /etc/puppet/manifests/mediawiki.pp:24
  • 13:28 hashar: All apaches are dead :/
  • 09:26 hashar: Adding class role::applicationserver::jobrunner
  • 09:20 hashar: sync upload6 dirs again. root@deployment-nfs-memc:$ rsync -a --progress --inplace /mnt/export/upload6 /data/project/upload6

July 16

  • 19:36 hashar: rebuilding localisation cache
  • 19:33 hashar: Updated ArticleFeedback and ArticleFeedbackv5 to latest master. Dropped their tables and ran the updater ( mwscript update.php --quick --wiki=enwiki ). Solves bug 38422 - trash and redo ArticleFeedbackv5 on beta enwiki. See http://en.wikipedia.beta.wmflabs.org/wiki/Special:ArticleFeedbackv5
  • 15:55 hashar: Updated ArticleFeedbackv5 778f089..ccbc585

July 11

  • 16:04 hashar: Created upload.beta.wmflabs.org to point to deployment-cache-upload01 ( 208.80.153.242 )
  • 15:46 hashar: just logging the rsync command: root@deployment-nfs-memc:/data/project/upload6# rsync -a --progress --inplace /mnt/export/upload6 /data/project/upload6
  • 15:41 hashar: started rsync of /mnt/export/upload6 to /data/project/upload6
  • 14:55 hashar: deployment-cache-bits02 now serves bits.beta.wmflabs.org using pending gerrit changes 15445 and 13304 (using varnish configuration from production)
  • 12:51 hashar: created deployment-cache-upload01, a Lucid instance to serve http://upload.beta.wmflabs.org/
  • 12:49 hashar: deleting cache-upload2 , need a Lucid instance.
  • 12:48 hashar: cache-upload2 set squid_coss_disks to vdb
  • 12:44 hashar: applying role::cache::upload to deployment-cache-upload02
  • 12:27 hashar: Creating deployment-cache-upload02 to replace deployment-cache-upload and serve http://upload.beta.wmflabs.org/
  • 12:21 hashar: Deleting deployment-cache-bits which was corrupted and replaced it by deployment-cache-bits02
  • 12:18 hashar: moved bits.beta.wmflabs.org from deployment-cache-bits to deployment-cache-bits02
  • 12:09 hashar: Enabling puppetmaster::self on deployment-cache-bits02 to get Varnish config (13304)
  • 09:55 hashar: Applying role::cache::bits::labs to deployment-cache-bits02
  • 09:42 hashar: Manually started mw-job-runner on jobrunner06
  • 09:41 hashar: pointed common/php from 'php-1.20wmf6' to 'php-master'
  • 09:25 hashar: updated mediawiki-config on dbdump to latest version
  • 08:55 hashar: add applicationserver::homeless to jobrunner06
  • 08:16 hashar: Squid updated to use apache 32 and 33. Deleted Apaches 30 & 31
  • 08:01 hashar: Depooling apaches 30 and 31, pooling apaches 32 and 33

July 10

  • 19:22 hashar: rebooting apache32 and apache33 (puppet run finished)
  • 19:02 hashar: running puppets -tv on apache32 and apache33. Should make them able to serve Apache traffic after reboot.
  • 17:32 hashar: creating apache32 and apache33 to replace the corrupted apache30 and apache31 instances

July 5

  • 12:15 hashar: Did some documentation work on Deployment/Overview
  • 11:20 hashar: added a bunch of spammers in /home/wikipedia/common/wmf-config/mwblocker.log which would block them

July 3

July 2

  • 20:50 hashar: restarting squid to purge whole cache (yeah I know that is lame)
  • 20:47 hashar: Removed Hydriz from deployment-prep. Messed up the whole dblist files :-D Contact me! ;)
  • 20:46 hashar: set back robots.txt to disallow /
  • 20:28 hashar: deployed a hack in mw config using https://gerrit.wikimedia.org/r/#/c/13932/ . That is a simple git fetch, pending review.
  • 19:47 hashar: hacking to get files on bits.beta
  • 10:19 hashar: Easy fix for the leap second bug: /etc/init.d/ntp stop; date `date +"%m%d%H%M%C%y.%S"`; /etc/init.d/ntp start
  • 10:06 Hydriz: ukwiki was giving errors regarding flaggedrevs's flaggedpages table not existing. Fixed it by running mwscript update.php ukwiki.
  • 07:50 hashar: Gave access to the cluster to Hydriz

July 1

  • 17:43 hashar: manually rebooted most servers
  • 10:00 hashar: rebooted apache31 due to leap second bug. Stopped mysql on apache30 which was using 100%CPU
  • 06:59 hashar: rebooting all boxes

June 29

  • 15:07 hashar: Removing thumbnails that have not been accessed for the last 15 days : sudo find . -atime +15 -wholename '*/thumb/*' -exec rm {} \;
  • 15:02 hashar: deleted .nfs** files in /mnt/export/upload6/
  • 08:58 hashar: restarted jobrunner service (had a wrong path pointing to common-backup)
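
The thumbnail cleanup logged at 15:07 can be approximated outside the shell. What follows is a minimal Python sketch, not the command that was run: the function name `prune_stale_thumbs` and its parameters are hypothetical, and the age check is a plain atime comparison, so it only roughly matches the day rounding of `find -atime +15`. It defaults to a dry run so the candidate list can be reviewed before anything is deleted.

```python
import os
import time
from pathlib import Path


def prune_stale_thumbs(root, max_age_days=15, now=None, dry_run=True):
    """List files under a 'thumb' directory not accessed for max_age_days.

    Files are only deleted when dry_run is False. Hypothetical helper that
    approximates: find . -atime +15 -wholename '*/thumb/*' -exec rm {} \\;
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only consider directories that have a 'thumb' component below root.
        if "thumb" not in Path(dirpath).relative_to(root).parts:
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(stale)
```

Calling `prune_stale_thumbs("/mnt/export/upload6")` would only report candidates; passing `dry_run=False` removes them. On filesystems mounted with `noatime` the access times are not maintained, so the result would not reflect real usage.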

June 28

  • 16:07 hashar: apache conf uses site.conf :-(((( need to puppetize that one day
  • 15:47 hashar: updating mediawiki-config to latest master
  • 14:10 hashar: running puppet on apache{30,31}. The /etc/sudoers conflict has been merged in :)
  • 11:03 hashar: migrated all wikis from php-trunk to php-master by editing wikiversions.dat. Refreshed wikiversions.cdb and renamed ExtensionMessages-trunk.php to ExtensionMessages-master.php
  • 10:59 hashar: set group write on deployment-nfs-memc:/mnt/export/apache/common-local would let us rewrite the wikiversions.cdb file

June 27

  • 10:40 hashar: made deployment-mc to use 'memcached' puppet class. Now uses 2000MB apparently
  • 10:25 hashar: removed memcached from deployment-nfs-memc , it is running on deployment-mc nowadays.
  • 10:21 hashar: rebooting deployment-mc for kernel upgrade
  • 10:07 hashar: updating packages on deployment-cache-bits
  • 09:42 hashar: deleted deployment-thumbproxy instance. We are not going to replicate the production thumbnailing architecture
  • 09:16 hashar: -transcoding : dpkg --purge linux-image-2.6.32-37-virtual linux-image-2.6.32-318-ec2 linux-image-2.6.32-34-virtua
  • 09:14 hashar: upgrading deployment-transcoding

June 26

  • 20:16 hashar: deleted deployment-syslog instance. It is of no use till we have a way to setup syslog server on labs bug 36748 (syslog-ng conflict with rsyslog from base::??? puppet class)
  • 20:13 hashar: Removed misc::mediawiki-logger from deployment-feed. Was replaced by some new udp2log system I can't understand. So for now, -feed is locally hacked and does not rely on puppet anymore.
  • 19:50 hashar: deployment-feed removed wireshark then ran 'apt-get auto remove' , various X11 packages got removed. Now up to 262MB free.
  • 19:46 hashar: deployment-feed removed some old kernels apt-get remove --purge linux-image-2.6.32-318-ec2 linux-image-2.6.32-342-ec2 linux-image-2.6.32-38-virtual linux-image-2.6.32-34-virtual
  • 19:39 hashar: deployment-feed is now out of disk space :-(
  • 19:17 hashar: Removed role::cache::bits from deployment-cache-bits. Only work in production.
  • 18:32 hashar: Uninstalled the pecl PHP parsekit extension, manually installed php5-parsekit package instead bug 37076
  • 11:06 hashar: Files migrated. A copy of the old common is in /usr/local/apache/common-back
  • 10:12 hashar: migrating beta to use operations/mediawiki-config
  • 08:29 hashar: restarted the job runner on jobrunner05 several times. It eventually started working again :-(
  • 07:50 hashar: restarted udp2log
  • 07:48 hashar: killing python demux on deployment-feed

June 25

  • 15:14 hashar: Deleted InitialiseSettingsDeploy.php (no longer used). Replaced by InitialiseSettings-wmflabs.php
  • 15:00 hashar: updating Ubuntu on deployment-transcoding

June 23

  • 13:02 labs-logs-bottie: petrb: deploying 37852

June 22

  • 13:18 labs-logs-bottie: petrb: updated /usr/local/apache/common/wmf-config/InitialiseSettingsDeploy.php to match the feed I just made :)
  • 11:17 hashar: Created /etc/wikimedia-realm file containing 'labs' on -dbdump, -apache30, -apache31 and -jobrunner05. Related puppet change is https://gerrit.wikimedia.org/r/#/c/12377/
  • 10:07 hashar: bug 37116 removed deployment-nfs-memc cronjob /var/fs which did some nasty recursing file changes. Has been disabled since May 25th anyway.
  • 09:08 hashar: Deleting hostname mobile.beta.wmflabs.org and releasing 208.80.153.244

June 21

  • 15:36 hashar: Closed bug 37217 - thumbnail extraction for videos needs newer ffmpeg
  • 15:36 hashar: Closed bug 37500 - migrates Apaches boxes to precise
  • 15:02 hashar: updating MediaWiki to 80fbb70 (latest master)

June 20

  • 13:46 hashar: apache-31 : readding applicationserver::labs and imagescaler::labs
  • 13:40 hashar: upgrading packages on -squid
  • 13:38 hashar: updating package on -dbdump

June 18

  • 15:37 hashar: running apt-get upgrade on apache30 and apache31

June 16

  • 19:27 beta-logmsgbot: hashar: updating WikimediaMaintenance to get commits 1887339 913bcb8

June 14

  • 15:46 hashar: redeleting deployment-apache20

June 13

  • 16:42 hashar: squid: depooling apache 20 - 24, pooling apache 30 & 31
  • 16:37 hashar: Disabled CheckUser extension again

June 12

  • 21:11 hashar Rebooting apache30 and 31 so they apply pending package updates. Off for today.
  • 21:07 hashar Configuring apache30 and 31 to use applicationserver::labs and imagescaler::labs

June 4

  • 22:04 hashar: Made myself a steward using a database query on `labswiki` : insert into user_groups VALUES (183,'steward');
  • 21:57 hashar: Reenabled the CheckUser extension on beta labs so we can actually use the checkuser audit function ;-)

June 3

  • 13:20 labs-logs-bottie: petrb: installing git on a bastion

June 2

  • 15:36 hashar: We ran out of beer, see bug 37307
  • 13:21 labs-logs-bottie: petrb: disabling checkuser per request from Ryan
  • 10:20 labs-logs-bottie: root: setting up bastion
  • 09:16 hashar: Rebooting -dbdump it could not mount some NFS export and waiting for user input.

June 1

  • 16:37 hashar: Created deployment-deb instance to build packages :D

May 31

  • 08:18 hashar: Squid answering again :-D
  • 08:00 hashar: Rebooting -squid using nova web interface
  • 07:53 beta-logmsgbot: hashar: restart failed, puppet dead, various squid related process in zombie mode --> rebooting deployment-squid
  • 07:47 beta-logmsgbot: hashar: restarting squid, seems stalled
  • 07:43 beta-logmsgbot: hashar: restarted udp2log daemon on -feed (15 to 20 python <defunct> processes there)

May 30

  • 19:26 hashar: jobrunner05 is happy again. Hurrah
  • 19:23 hashar: updating mediawiki/core to master 58f390e to finish job loop fix
  • 19:23 hashar: updating mediawiki/core to master
  • 19:09 hashar: jobrunner05 CPU usage is due to some job infinite loop. Working on it.
  • 18:35 hashar: Sara made ganglia available on Ubuntu Precise and hence jobrunner05 show up http://ganglia.wmflabs.org/latest/?c=deployment-prep&h=deployment-jobrunner05&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
  • 14:54 hashar: Updating mediawiki/core to 9780085 (aka just https://gerrit.wikimedia.org/r/#/c/9397/ which fix a wrong class name in job system)
  • 14:39 hashar: Migrating apaches from imagescaler class to imagescaler::labs
  • 14:30 beta-logmsgbot: hashar: migrating apache boxes from applicationserver::homeless to the new applicationserver::labs
  • 09:17 hashar: Restarted update.php in a screen session
  • 09:15 beta-logmsgbot: hashar: foreachwiki update.php --quiet --quick
  • 08:54 beta-logmsgbot: hashar: updating extensions
  • 08:53 beta-logmsgbot: hashar: HEAD is now at 8c65834 Add new message 'brackets' and use it to kill some hardcoded []s.
  • 08:49 beta-logmsgbot: hashar: updating core to 8c65834
  • 08:43 hashar: bug 37199 going to upgrade core / extensions to latest master

May 29

  • 21:17 hashar: Fixed Amazon Elastic Cloud ban. Properly fixing bug 37173 hopefully
  • 20:41 hashar: rebooting jobrunner05 following some package installs made earlier by puppet
  • 20:39 hashar: manually running puppet on jobrunner-05

May 26

  • 18:14 beta-logmsgbot: hashar: killed webtranscode job on commons
  • 05:59 beta-logmsgbot: hashar: Edited squid.conf to limit memory to 1G and restarted squid
  • 05:42 beta-logmsgbot: hashar: squid was killed by linux OOM!

May 25

  • 15:18 beta-logmsgbot: hashar: deleted jobrunner06 (precise), we just need one precise instance which will be jobrunner05 for now
  • 14:45 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner06
  • 14:42 beta-logmsgbot: hashar: installing applicationserver::homeless and applicationserver::jobrunner on jobrunner05
  • 14:27 beta-logmsgbot: hashar: installed jobrunner05 and 06 using Ubuntu precise. Should let us get a 0.27 ffmpeg installation for bug 37043
  • 08:38 beta-logmsgbot: root: on dbdump, deleted /etc/logrotate.d/mw-udp2log . Most probably in conflict with the one from deployment-feed which host the udp2log process
  • 08:35 beta-logmsgbot: root: gzipped /home/wikipedia/logs/archive/*20120525 see bug 37012 :-(
  • 08:23 beta-logmsgbot: hashar: killed stuck jobs on jobrunner 02 and 03. Restarted loop.

May 24

  • 11:52 beta-logmsgbot: hashar: Rewrote log command to use dologmsg and the new beta-logmsgbot
  • 11:51 beta-logmsgbot: hashar: yeah I do log
  • 11:47 hashar: Moving /bin/log to /usr/local/bin/log
  • 11:45 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
  • 09:43 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit` on deployment-dbdump
  • 09:42 labs-logs-bottie: hashar: installing php5-dev and running `pecl install parsekit`
  • 07:05 hashar: killed some stalled jobs on jobrunner02
  • 02:56 labs-logs-bottie: jeremyb: foo

May 23

  • 19:54 hashar: rebooting apache20 following installation of imagescaler puppet class
  • 19:49 hashar: rebooting apache23 following installation of imagescaler puppet class
  • 19:46 hashar: rebooting apache22 following installation of imagescaler puppet class
  • 19:15 hashar: rebooting apache21 following installation of imagescaler puppet class
  • 18:48 hashar: Adding puppet class 'imagescaler' on all deployment-apacheXX instances in an attempt to fix thumbnails
  • 18:36 labs-logs-bottie: hashar: relocalisation cache done 367/367 languages rebuilt
  • 18:28 labs-logs-bottie: hashar: running `mwscript rebuildLocalisationCache.php --wiki=aawiki` for bug 36806
  • 16:22 labs-logs-bottie: hashar: delete all 3 webVideoTranscode jobs from enwiki database
  • 16:10 hashar: rebooted jobrunner03 to check everything works fine there
  • 15:41 hashar: deleting jobrunner01, it is crashed beyond repair. Will create a new one named jobrunner03
  • 15:27 hashar: rebooting jobrunner01 to see how it goes
  • 14:56 hashar: stopped job runner on jobrunner01, amounted /mnt/upload6 and /mnt/
  • 14:37 hashar: running puppet on job runner to check change 8584 & 8585 worked

May 22

  • 13:49 hashar: Deleting jobrunner03 and 04, not going to need them after all
  • 13:24 hashar: deleting refreshLinks2 jobs from enwiki database
  • 12:52 hashar: deleting deployment-jobrunner{3,4} installation failed I got permission denied. Will recreate them using same hostname
  • 11:55 hashar: create two more job runner instances
  • 10:09 hashar: Remove deployment-webs instance which was meant to emulate the HTTPS access. Hacky and low priority for now, we will need to setup a nginx proxy one day to properly replicate the production infrastructure.
  • 09:39 labs-logs-bottie: hashar: rebooting jobrunner02 just to be sure it is properly loaded up
  • 09:30 labs-logs-bottie: hashar: jobrunner logs are available in /home/wikipedia/logs/runJobs.log now
  • 09:25 hashar: Fixed udp2log not being able to add new log files in /home/wikipedia/log , that dir needs to be writable by udp2log user! See https://gerrit.wikimedia.org/r/8442 | https://bugzilla.wikimedia.org/37014
  • 08:54 hashar: purged all logs from /home/wikipedia/logs/archive/ just to be safe
  • 08:41 hashar: restarted udp2log on -feed (again)
  • 08:23 hashar: started job loop on deployment-job-runner02
  • 07:49 hashar: installing jobrunner2
  • 05:11 hashar: creating a second job runner instance deployment-jobrunner02 . Will apply puppet classes later on.
  • 03:47 labs-logs-bottie: hashar: (Bug 36870) deleting deployment-web{,3,4,5}

May 21

  • 21:11 hashar: deployment-nfs-memc : fix user right for upload6 : chown apache /mnt/export/upload6
  • 21:11 hashar: On deployment-nfs-memc : added apache (uid 48) entry in /etc/passwd
  • 21:11 hashar: Adding Faidon and Platonides to default CC list of "deployment-prep (beta)" component
  • 21:11 hashar: In Bugzilla, I have removed Petr Bena as a default assignee of bugs opened for "deployment-prep (beta)" component. Default is now "Nobody", Petr is on CC. That will make bug triage a bit easier.
  • 08:58 hashar: rerebooting deployment-feed
  • 08:53 hashar: Looks like -feed is dead : EXT3-fs: INFO: recovery required on readonly filesystem.
  • 08:49 hashar: rebooting deployment-feed
  • 08:22 hashar: installing iotop on deployment-nfs-memc

May 20

  • 02:02 hashar: on -nfs-memc, running 'chown -R 48 /mnt/export/upload6' so files get owned by user apache on apaches and job runner boxes
  • 01:05 hashar: might have disabled IRC notification by setting wgRC2UDPAddress in InitialiseSettingsDeploy.php
  • 00:46 labs-logs-bottie: hashar: jobrunner01 seems to start catching up with jobs
  • 00:42 labs-logs-bottie: hashar: Created a dumb `aawiki` database

May 19

  • 22:33 Platonides: Password changed for Platonides
  • 22:33 Platonides: http://ee-prototype.wikipedia.beta.wmflabs.org/ fails with No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.
  • 11:46 labs-logs-bottie: petrb: creating aa wiki

May 18

  • 17:39 labs-logs-bottie: j: add /apache symlink on deployment-transcoding
  • 16:52 hashar: added 'aawiki' to all.dblist and made a symbolic to it named wmflabs.dblist
  • 16:42 hashar: added fake 'aawiki' entry to wikiversions.data
  • 16:37 hashar: started mediawiki job runner on -jobrunner01
  • 16:27 hashar: Removed apache:: puppet class, uses the application:: ones instead
  • 15:24 hashar: Well MaxSem fixed mobile Frontend :-D
  • 15:22 hashar: rewinded MobileFrontend to before 9db8dc94b1b83999931fca3d0edf5e22ab1effb3 ( https://gerrit.wikimedia.org/r/#/c/7795/ )
  • 15:05 hashar: Running: foreachwiki update.php --quiet --quick
  • 15:00 labs-logs-bottie: hashar: rebooting jobrunner01
  • 14:55 hashar: updating all extensions
  • 14:48 hashar: /home/wikipedia/common/php-trunk now tracks mediawiki/core.git , branch master. So a simple 'git pull' will update it!
  • 14:47 hashar: updating MediaWiki
  • 14:31 hashar: adding puppet class applicationserver::jobrunner
  • 11:51 hashar: puppet running again on -syslog \o/
  • 10:55 hashar: ran apt-get clean on -syslog
  • 10:46 hashar: On -feed, ran apt-get clean
  • 10:42 labs-logs-bottie: hashar: update.php script ran on all wikis
  • 10:23 hashar: Seems like mwmultiversion is back in function again :-]
  • 10:23 hashar: Running 'foreachwiki update.php --quick'
  • 10:23 hashar: updated enwiki database using 'mwscript update.php enwiki --quick'

May 17

  • 10:29 labs-logs-bottie: hashar: after all, made /mnt/export/upload6 world writable "sudo chmod -R 777 *"
  • 10:20 labs-logs-bottie: hashar: Fixed filerepo backend by using chown -R www-data:depops /mnt/export/upload6/wikibooks
  • 10:15 labs-logs-bottie: hashar: Fixed cluster which was giving blank page. Root cause was wmfUdp2logDest which must be <IP address:port> (aka not use a hostname)
  • 09:43 labs-logs-bottie: hashar: removed Draft extension for now.
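
The 10:15 entry notes that wmfUdp2logDest has to be an `<IP address:port>` pair rather than a hostname. A small pre-deployment check for that shape could look like the Python sketch below. The function name `is_ip_port` and the sample address and port are hypothetical, and the sketch assumes an IPv4 address; the log does not say whether IPv6 destinations were relevant.

```python
import ipaddress


def is_ip_port(dest):
    """Return True if dest has the form '<IPv4 address>:<port>'.

    Hypothetical helper; a hostname such as 'deployment-feed:8420' is rejected.
    """
    host, sep, port = dest.rpartition(":")
    if not sep or not (port.isascii() and port.isdigit()):
        return False
    try:
        # Raises a ValueError subclass for hostnames or malformed addresses.
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return 0 < int(port) < 65536
```

For example, `is_ip_port("10.4.0.21:8420")` accepts the value, while `is_ip_port("deployment-feed:8420")` rejects it because the host part is a name, which is the mistake described in the entry above.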

May 16

  • 16:46 hashar: cleaned up more of CommonSettings.php today. Moved some hacks to disable features as settings in InitialiseSettingsDeploy.php . See git log.
  • 04:55 hashar: bug 36871 - deleting bz-dev instance

May 15

  • 21:13 labs-logs-bottie: hashar: Managed to get wikiversions.cdb to be rebuilt using /home/wikipedia/common/multiversion/refreshWikiversionsCDB
  • 20:59 labs-logs-bottie: hashar: Cloning 1.20wmf2 and 1.20wmf3 in independent repos just like in production
  • 20:58 labs-logs-bottie: hashar: opened several bugs, prepared for MWMultiversion
  • 20:36 labs-logs-bottie: hashar: Installed multiversion using svn checkout https://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/mwmultiversion/multiversion
  • 20:13 hashar: manually created /home/wikipedia/logs/archive from deployment-feed (pending https://gerrit.wikimedia.org/r/7746 )
  • 19:48 labs-logs-bottie: hashar: restarted udp2log on deployment-feed, lot of zombie python processes there
  • 12:22 hashar: replaced most occurrences of /mnt/upload to /mnt/upload6
  • 09:38 labs-logs-bottie: hashar: Applying apache::service to dbdump
  • 08:53 labs-logs-bottie: petrb: updating to head
  • 07:56 hashar: Deleted all of deployment-nfs-memc:/mnt/export/upload-back , it contained only thumbs
  • 07:51 hashar: cleaning out deployment-nfs-memc:/mnt/export/upload-back from thumb, lock dirs and related

May 14

  • 19:30 labs-logs-bottie: hashar: Fixed X-Forwarded-For IP not being recognized
  • 19:30 labs-logs-bottie: hashar: fo
  • 19:30 hashar: Fixed IP :-D

May 11

  • 19:13 hashar: Removed OnlineStatusBar extension. It is not in Gerrit / WMF
  • 19:02 hashar: Removing misc::mediawiki-logger from dbdump, it is on 'feed'
  • 18:35 hashar: restarting udp2-log on dbdump
  • 18:27 hashar: deleting symlinks in /home/wikipedia to /data/project : breaks logging
  • 18:21 hashar: Replaced extensions with a fresh clone of mediawiki/extensions.git
  • 17:33 hashar: restarted squid several times to fix some minor typos in conf
  • 16:39 hashar: cloning mediawiki/extensions.git which has all extensions as submodules
  • 16:37 hashar: updated MediaWiki up to 05e656a (aka master)
  • 09:09 labs-logs-bottie: petrb: fixed nrpe on boxes where it was failing, we need to insert motd to puppet
  • 04:37 jeremyb: [~2 hrs ago] 02:43:33 < hashar_> !deployment-prep setting up "apache20" instance by using only puppet. We will see what happens :-D
  • 02:28 hashar_: deleted all remaining deployment-apache instances : : You don't have enough free space in /var/cache/apt/archives/. So we really want to use m1.large , not m1.tiny pretending to save disk space :-D
  • 01:28 hashar_: Created upload2.beta.wmflabs.org to be the entry point for the "new" thumbnailing infrastructure
  • 01:24 hashar_: moving upload.beta.wmflabs.org from the non working instances back to the main entry point

May 10

  • 21:46 hashar: Creating a syslog server instance. I have a VERY nasty conflict between misc::syslog-server and misc::mediawiki-logger which tries to install conflicting packages ( syslog-ng / rsyslog )
  • 20:36 Krinkle: fixing a few php notices and general logic problems in wmf-config
  • 20:24 hashar: running 'apt-get install --reinstall apache2.2-common' to attempt to fix /var/log/apache2 rights (root:arm)
  • 20:24 hashar: deployment-imagescaler01 apache does not log anymore :-(
  • 10:09 labs-logs-bottie: petrb: fixed the missing NOT FOUND error page

May 9

  • 18:56 Ryan_Lane: added hostnames for associated IPs
  • 18:56 Ryan_Lane: allocated three more IPs for upload, bits, and mobile
  • 18:18 hashar: Made logs from wgCommandLine script to be redirected to /home/wikipedia/logs/cli.log instead of /home/wikipedia/logs/catchall.log

May 8

  • 01:49 Ryan_Lane: rebooting deployment-squid
  • 01:43 Ryan_Lane: restarting squid on deployment-squid

May 4

  • 10:48 mutante: added class nfs::home::wikipedia to puppet group list in "beta-labs"
  • 10:46 mutante: added myself to admin groups to add/change puppet groups
  • 10:36 hashar: Creating 5 new m1.large instances hosting apaches and named deployment-apacheXXX

May 3

  • 18:24 hashar: installed on dbdump misc::syslog-server
  • 15:53 labs-logs-bottie: hashar: adding misc::mediawiki-logger and misc::scripts classes to deployment-dbdump
  • 15:29 labs-logs-bottie: hashar: running puppet on apaches to have them send their syslog to deployment-dbdump (bug 36246)
  • 11:13 labs-logs-bottie: petrb: removing logrotate from all apaches it broke central log
  • 06:43 labs-logs-bottie: hashar: hashar: bug 36441, added ErrorDocument 404
  • 06:02 jeremyb: [deployment-prep, deployment-nfs-memc] ran `for u in catrope hashar jeremyb johnduhart krinkle mah petrb platonides werdna; do sudo usermod -a -G depops $u; done`; krinkle was unable to modify files in wmf-config and I thought I saw why he couldn't but couldn't see why I could. Turned out the groups on nfs-memc were the important ones and I was there. Synced the 2 boxes with each other and added krinkle to the list. Some other deployment-prep boxes have different depops groups. (one empty with a different gid than the rest. one is same gid but just has petrb)
  • 05:03 Krinkle: [deployment-dbdump] apt-get purged 'ack'; - On ubuntu ack is "ack-grep" which was already installed
  • 04:59 Krinkle: [deployment-dbdump] apt-get installed 'ack'
  • 04:48 jeremyb: [deployment-dbdump] (that was to address complaints about beta simplewiki appearing in #simple.wikipedia on irc.wikimedia.org)
  • 04:47 jeremyb: [deployment-dbdump] changed all refs to IPs of prod hosts nfs-home and ekrem to be deployment-feed instead, and committed that to the local repo. (again not pushed anywhere yet)
  • 04:44 jeremyb: [deployment-dbdump] did a checkpoint `git commit -a` on deploymentprep-conf (/usr/local/apache/common) (locally not pushed anywhere) because there were lots of changes on disk but not in the repo. but didn't add any new files to the repo. (so there's still stuff reported uncommitted by `git status`)

May 2

  • 15:14 labs-logs-bottie: petrb: making syslog on apaches be /data/project/apaches_log
  • 11:52 labs-logs-bottie: petrb: changed sudo policies on web to test if puppet override it
  • 11:52 labs-logs-bottie: j: add transcoding settings to CommonSettings.php again
  • 11:46 labs-logs-bottie: petrb: purged stuff on transcoding and freed some 119468kb
  • 11:08 labs-logs-bottie: hashar: install dsh package on deployment-dbdump
  • 09:54 labs-logs-bottie: hashar: /usr/local/apache/conf is now an independent git repository

April 30

  • 13:31 labs-logs-bottie: petrb: rebooting web5
  • 13:27 labs-logs-bottie: petrb: web4 reboot for same reason
  • 13:26 labs-logs-bottie: petrb: same for web
  • 13:24 labs-logs-bottie: petrb: rebooting web3 broken /data/project/
  • 11:57 labs-logs-bottie: petrb: fixed transcoding
  • 11:28 j^: reboot deployment-transcoding(i-00000105)

April 25

  • 17:01 labs-logs-bottie: j: Changing uid and group of apache user from 48 to 33 to match www-data on web3,web4,web5
  • 16:44 Platonides: With the uid change to deployment-web, it is now writing into /data/project/errors.log
  • 16:42 hashar: Apaches no longer log anything. This is because rsyslog sends logs to a blackhole :-D
  • 16:27 Platonides: Changing uid and group of apache user from 48 to 33 to match www-data
  • 15:20 hashar: deleting testswarmmysqlconf , it is of no use :-/
  • 14:43 hashar: Creating temporary instance to test a MySQL puppet snippet
  • 13:01 labs-logs-bottie: hashar: I am a hero
  • 13:00 labs-logs-bottie: petrb: test
  • 12:24 hashar: added a basic puppet skeleton in manifests/labs/beta/ with https://gerrit.wikimedia.org/r/5790 (test branch)
  • 11:39 hashar: manually purged debian package `ack`, installed `ack-grep`
  • 11:22 hashar: puppet finished migration of web{3,4,5} to apache::service
  • 11:19 hashar: changed squid visible name to squid001.beta.wmflabs.org
  • 11:02 hashar: migrate web{3,4,5} from webserver::php5 to apaches::service
  • 10:47 hashar: Cleaning out squid peers list
  • 10:11 hashar: made /etc/ a git repository on deployment-squid and committed existing /etc/squid/
  • 10:03 hashar: adding generic::packages::git-core on deployment-squid so we can track

April 24

  • 23:21 labs-logs-bottie: petrb: disabled $wmgEnableCaptcha
  • 18:54 hashar: wrap -dbdump motd to 80 chars
  • 15:42 hashar: deployment-web host does work!  :-]
  • 15:41 hashar: made deployment-web host a wikimedia-task-appserver, had to create some apache2 configuration placeholder. Apache2 does launch but it is not working though (timeout)
  • 15:40 petan|wk: I told hashar to log stuff, if he won't, slap him
  • 13:18 hashar: added apache::service on deployment-web host
  • 12:18 labs-logs-bottie: petrb: moved the log file storage to gluster
  • 12:18 labs-logs-bottie: petrb: updated git and committed all changes

March 21

  • 03:22 mutante: mysqld on deployment-sql is stopped - did not start it though after i heard petan is working on corrupted db's
  • 03:06 mutante: added myself as a member just to see the instance names and check for the sql server...

March 20

  • 16:16 labs-logs-bottie: petrb: it seems that corruption of db is worse than I expected, need to restore a backup that is a few months old
  • 16:12 labs-logs-bottie: petrb: mysql is back up
  • 15:41 labs-logs-bottie: petrb: getting sql server down I found a bunch of corrupted db's, rollback is necessary
  • 15:41 labs-logs-bottie: j: install php-pear on deployment-web3/4/5 required by TMH

March 19

  • 08:18 labs-logs-bottie: root: restoring sql tables from backup

March 15

  • 20:17 labs-logs-bottie: root: restored ok
  • 20:14 labs-logs-bottie: root: restoring database from backup
  • 10:16 labs-logs-bottie: petrb: failed auth on db server reboot was required
  • 09:53 labs-logs-bottie: petrb: scheduling auto replication of sql server

March 14

  • 14:37 labs-logs-bottie: petrb: switching en.wikipedia to older previous v

March 11

  • 22:14 Damianz: Increased nofile on deployment-squid and added max_filedesc option to squid config. Also installed squidclient.
  • 04:36 Ryan_Lane: also deployment-web3
  • 04:35 Ryan_Lane: also deployment-web
  • 04:34 Ryan_Lane: make that deployment-web5
  • 04:34 Ryan_Lane: rebooting deployment-web, it OOM'd

March 9

  • 14:12 labs-logs-bottie: petrb: rebooting -nfs
  • 13:52 labs-logs-bottie: root: updated apt on webs1
  • 13:34 j^: add ppa:j/timedmediahandler and install ffmpeg on web3 and web5

March 6

  • 22:21 labs-logs-bottie: petrb: some instances will need to reboot, however site seems to be ok now
  • 22:09 labs-sexy-bottie: petrb: updating svn
  • 22:07 labs-sexy-bottie: petrb: fixed squid a bit
  • 15:23 labs-sexy-bottie: petrb: test
  • 14:53 labs-sexy-bottie: root: disabling bot for a while
  • 03:11 Andrew: facepalm: apparently all reboots are failing, so this will be down until Ryan brings it all back up tomorrow
  • 02:57 Andrew: rebooting a few hosts, there is something seriously wrong with fetching resources at the moment

March 5

  • 15:48 labs-sexy-bottie: petrb: temporarily disabled ssl server
  • 15:48 labs-sexy-bottie: petrb: reconfigured squid
  • 15:36 labs-sexy-bottie: petrb: restarted servers
  • 15:24 labs-sexy-bottie: petrb: temporarily changed code of localsettings to debug site
  • 15:20 labs-sexy-bottie: petrb: fixed broken memc :o
  • 15:04 labs-sexy-bottie: petrb: inserted new wiki to sul
  • 15:01 labs-sexy-bottie: petrb: please ignore some of the previous lines in log we were just testing bot
  • 14:58 labs-sexy-bottie: petrb: updated live
  • 14:58 labs-sexy-bottie: petrb: meh
  • 14:36 labs-sexy-bottie: petrb: created a new log system, just type log message to log your change on prep
  • 14:35 labs-sexy-bottie: petrb: this is test :o

March 4

  • 12:00 Andrew: Finished deployment of het deploy, added a new ee.

March 1

  • 16:35 Platonides: Installed dpkg-dev on deployment-dbdump
  • 16:03 Platonides: Installed joe on deployment-dbdump
  • 16:03 petan|wk: platonides needs to check the project name

February 27

  • 08:20 petan: creating 2 more web servers to handle load
  • 08:19 petan: rebooting both web servers, starting with web1

February 23

  • 08:52 petan|wk: fixing the squid

February 22

  • 01:03 Ryan_Lane: reconfiguring the web server instances to remove puppet classes that no longer exist

February 17

  • 02:34 Andrew: Moving /usr/local/apache/common/live to /usr/local/apache/common/live-hom and symlinking live to live-hom
  • 01:56 Andrew: running afl_rev_id patch on all wikis
  • 01:52 Andrew: installing ack (source code search tool) on dbdump
  • 00:59 petan: if anything is broken, it was me
  • 00:51 petan: I broke it!
  • 00:27 petan: switched to HEAD

February 16

  • 10:11 j^: add video/webm to /etc/mime.types on web/webs1/web2

February 13

  • 10:08 petan|wk: removing the puppetized memcached
  • 08:54 petan|wk: removing some extensions from config which are missing in latest branch

January 30

  • 18:46 petan: configuring some boxes for cluster to handle high load

January 29

  • 01:13 hexmode: oom reboot -web

January 27

  • 13:30 j^: install upstart script /etc/init/timedmediahandler.conf on deployment-transcoding and start service
  • 13:05 j^: touch /etc/wikimedia-image-scaler on deployment-transcoding; transcoding needs more wgMaxShellMemory too
  • 12:47 Platonides: updated /usr/local/apache/common/live/extensions/TimedMediaHandler to r110117 per j^request
  • 10:11 j^: add-apt-repository ppa:j/timedmediahandler and update ffmpeg on deployment-web to support frame extraction from WebM videos
  • 06:35 j^: update ffmpeg on deployment-transcoding (new security release from ppa)

January 26

  • 00:00 petan: configured new firewall rule irc

January 25

  • 23:52 petan: linked /usr/local/apache/common-local to /usr/local/apache/common
  • 23:06 petan: updating svn
  • 22:28 petan: reverted unlogged changes made to config which broke whole site
  • 10:03 j^: installed ffmpeg on deployment-web (required by TMH to extract stills)

January 24

  • 20:52 petan: created db user oren and new database for temporary wiki
  • 13:53 petan|wk: reconfigured new instance and fixed some issues on puppet, no logs in sal regarding it
  • 00:51 hexmode: svn up * updatedata

January 23

  • 19:31 hexmode: restart memcache on nfs-memc
  • 19:06 hexmode: aptitude update deployment-web

January 22

  • 09:16 petan: configured nfs to listen for backup server
  • 01:01 petan: configured firewall for backup instance
  • 00:45 petan: creating a backup instance in -prepbackup project for online backup of mysql from deployment project + fs backup
  • 00:30 petan: updating /live to head
  • 00:21 petan: installed timedmediahandler (trunk) to commons

January 16

  • 23:41 hexmode: to solve the trusted XFF problem, I installed tinycdb and created a 0-length file in the right place
  • 20:35 Ryan_Lane: released unused IP address from project

January 15

  • 15:45 petan: ran live/extensions/TrustedXFF/generate.php
  • 11:20 petan: updated to latest head all wikis

January 14

  • 18:49 petan: enabled global blocking
  • 18:21 johnduhart: Removed myself from the project.
  • 14:32 petan: separated common to own deployment file
  • 14:30 hexmode: enabled webfonts for mywiki properly in InitialiseSettingsDeploy.php
  • 14:25 johnduhart: Updated wmf-config/InitialiseSettings.php from production
  • 14:25 johnduhart: Reverted change to wmf-config/InitialiseSettings.php
  • 14:24 hexmode: enabled webfonts for mywiki

January 13

  • 21:34 petan: assigning new dns
  • 21:33 petan: moved deployment to beta.wmf...
  • 13:07 petan|w: installed jdk on search

January 12

  • 18:49 petan: installed all requested sw on search
  • 18:46 petan: mounted conf files
  • 18:38 petan: installed updates on new instances and rebooting them
  • 06:48 Ryan_Lane: added nfs mounts to the fstab for deployment-web
  • 06:47 Ryan_Lane: remounted /mnt/upload on deployment-web as nfs rather than nfs4
  • 06:47 Ryan_Lane: modified export options on deployment-nfs-memc; removed nfs4 specific options, and removed other options not necessary for our environment.
  • 00:38 johnduhart: Unmounted /mnt/export from /tmp on -web

January 11

January 10

  • 23:06 petan: checks done
  • 23:04 petan: disabled sql for fs checks
  • 22:30 petan: created nfs:/mnt/export/backup; use it for all files which aren't versioned
  • 21:44 petan: deleting big DBs; expect DB lag ^^
  • 21:35 petan: maintenance on simple
  • 21:17 petan: created squid box
  • 19:05 petan: tweaked memcached and flushed cache
  • 16:48 johnduhart: installing wikimedia-task-appserver on -web
  • 16:29 johnduhart: live and extensions recheckedout into a new folder
  • 16:04 johnduhart: Site broken, currently recreating live folder
  • 12:27 johnduhart: Running updatedata (very slowly)
  • 12:19 johnduhart: Fixed wmf-config permissions on -nfs-memc
  • 05:32 johnduhart: thumbnails now working
  • 05:30 johnduhart: Installed imagemagick on web
  • 05:11 johnduhart: Adding apache config for upload.deployment.wmflabs.org
  • 05:11 johnduhart: Made a quick stab at upload config
  • 05:00 johnduhart: Mounted that export onto -web
  • 05:00 johnduhart: Created nfs export /mnt/upload on deployment-nfs-memc
  • 04:37 johnduhart: unmounted deployment-nfs-memc:/mnt/export on /mnt from deployment-web
  • 03:32 johnduhart: Last update solves an issue where CentralAuth would make 70+ queries per page
  • 03:31 johnduhart: Updated databases
  • 03:26 johnduhart: svn up'd live
  • 03:04 johnduhart: interwiki now works

January 9

  • 21:29 johnduhart: Updated databases
  • 21:29 johnduhart: Created test.wikimedia
  • 21:25 petan: reconfigured global permissions
  • 20:38 petan: created hi wiki + de wiki
  • 20:00 petan: creating commons, de wiki, en_wiktionary etc. etc...
  • 18:57 petan: configured ip back to -web and removed temporary ip
  • 18:53 petan: disassociated 208.80.153.219
  • 18:50 petan: turned off -test definitely and reconfiguring ip
  • 18:47 petan: moved all stuff to -web; reconfiguring IP
  • 18:18 petan: moving configuration of apache to web
  • 16:40 petan|work: reconfigured apache on test
  • 16:15 johnduhart: svn up /usr/local/apache/common/live
  • 15:59 johnduhart: Ran update on metawiki
  • 15:58 johnduhart: Recreated metawiki enwiki enwikibooks
  • 15:57 johnduhart: Imported centralauth
  • 15:55 johnduhart: Dropped all wikis except simplewiki
  • 15:53 petan|work: restarted memcached
  • 15:24 johnduhart: Created enwiki and enwikibooks
  • 15:20 johnduhart: Restarted memcached
  • 15:17 johnduhart: Created simplewiki
  • 15:04 petan|work: restarted memcached
  • 15:02 johnduhart: Ran update.php on metawiki
  • 14:59 johnduhart: Creating metawiki
  • 14:55 petan|work: disabled current site
  • 14:55 johnduhart: Creating centralauth db
  • 14:52 johnduhart: DROPing new configuration tables, will recreate
  • 14:38 johnduhart: Forget the metawiki dump
  • 14:09 petan|work: created backup of broken db of meta and replaced it with auth db
  • 13:52 petan|work: test is done, restored test SUL to previous state
  • 13:43 petan|work: created backup of central auth and replaced the testing SUL with current data, merged with current SUL so that we can use the same logins on all sites
  • 13:19 petan|work: updated svn
  • 12:44 johnduhart: Ran update.php on metawiki
  • 12:32 johnduhart: Created enwikibooks http://en.wikibooks.deployment.wmflabs.org/wiki/Main_Page
  • 08:12 johnduhart: Starting import of metawiki
  • 07:37 johnduhart: This only affects my new configuration though
  • 07:36 johnduhart: WARNING: somehow new users are showing up on some feed outside of labs, and were picked up by a monitoring robot. wtf.
  • 07:15 johnduhart: Importing simplewikibooks
  • 07:10 johnduhart: SiteMatrix is now working http://meta.wikimedia.deployment.wmflabs.org/wiki/Special:SiteMatrix
  • 07:09 johnduhart: Created simple wikibooks http://simple.wikibooks.deployment.wmflabs.org/
  • 07:09 johnduhart: Adding wikibooks configuration
  • 06:31 johnduhart: Installing git on deployment-test
  • 06:23 johnduhart: Created simplewiki http://simple.wikipedia.deployment.wmflabs.org/wiki/Main_Page
  • 05:50 johnduhart: Downloaded metawiki dump onto dbdump, extracting now. poor labs.
  • 05:36 johnduhart: Recreated hiwiki, had a bad JohnTest account. Central auth now works fully
  • 05:15 johnduhart: Running update.php on hiwiki
  • 05:14 johnduhart: Importing prefstats table to hiwiki
  • 04:41 johnduhart: Centralauth is working :)
  • 04:28 johnduhart: Imported db schema to centralauth
  • 04:25 johnduhart: Created centralauth database centralauth
  • 04:18 johnduhart: Created metawiki http://meta.wikimedia.deployment.wmflabs.org/
  • 04:18 johnduhart: http://meta.wikimedia.deployment.wmflabs.org/
  • 04:09 johnduhart: Created meta docroot and added favicons to meta and wp
  • 04:08 johnduhart: Adding remnant.conf to httpd.conf
  • 04:07 johnduhart: Adding metawiki apache config to /usr/local/apache/conf/remnant.conf
  • 04:03 johnduhart: Lot of configuration done, much tuning to come. Missing.php now works http://nope.wikipedia.deployment.wmflabs.org/wiki/Main_Page
  • 01:24 johnduhart: Installed mysql-client on deployment-test

January 8

  • 23:45 johnduhart: Installed php5-cli on deployment-test
  • 23:21 johnduhart: Enabled rewrite rules on deployment-test
  • 20:39 petan: created new group for /mnt/www
  • 19:33 petan: clearing cache
  • 18:47 hexmode: set up relative paths for config files
  • 16:45 petan: moved memcached to deployment-nfs-memc and update global
  • 14:04 petan: created temporary memcached instance for data
  • 12:17 petan: restarted apache to fix some problem with config
  • 00:11 hexmode: set config for timeline
  • 00:11 hexmode: "apt-get install ploticus ttf-freefont" for timeline
  • 00:04 jeremyb: booting memcached fixed it
  • 00:02 jeremyb: booting memcache in case that fixes interwiki
  • 00:02 jeremyb: [auth] added bugzilla to interwiki.sql and ran it, doesn't seem to be working

January 7

  • 23:55 hexmode: Fix path for PoolCounter: PoolCounter.php -> PoolCounterClient.php
  • 23:52 hexmode: Fix path for OAI: OAI.php -> OAIRepo.php
  • 23:31 hexmode: set up git for /var/www/global
  • 23:08 petan: disabled LQT on other wikis
  • 23:07 petan: changed configuration of ajax
  • 23:07 petan: fixed InitialiseSettings
  • 22:42 petan: updated all wikis to latest head
  • 22:36 petan: disabled LQT because it's broken
  • 22:34 petan: ran update on auth db
  • 22:10 petan: removed puppetized memcached, because its configuration sucks

January 5

  • 20:26 petan: importing MW ns full history to en_wikipedia
  • 16:48 petan|w: created instance for dumps, current instance is overloaded
  • 15:53 petan|w: import now running using mwimport

January 4

  • 20:27 petan: opened port 80 for wide net
  • 20:26 petan: registered deployment.wmflabs.org
  • 20:24 petan: allocated IP 208.80.153.215
  • 20:22 mutante: raised floating IP quota to 1
  • 16:10 petan|work: create instances for apache and mysql
  • 16:08 petan|work: configured firewall for webserver
  • 16:06 mutante: added members MarkAHershberger & Petrb - added them to sysadmin and netadmin roles
  • 16:02 mutante: added new project deployment-prep for hexmode and petan