Obsolete:Thumbnail repository
Setting up a new thumbnail repo
First, get the current thumbnail data over to the new server. Options include multiple rsyncs (run the commons subdirectories in parallel, a few at a time, because they are too large to do one at a time), zfs replication, or something else. If your machine is falling over and you really don't have time to do this, you can create the bare-bones directory structure and speed through the rest of this document. In that case, be ready to deploy more image scalers in case the existing ones cannot handle the load of regenerating so many requested thumbnails in a very short period of time.
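A minimal sketch of creating the bare-bones directory structure, assuming the usual two-level hashed layout (thumb/0/00 through thumb/f/ff) seen in the paths elsewhere on this page. The function name and the default project path are illustrative, not an existing script.

```shell
# Create an empty hashed thumb tree (0/00 ... f/ff) under ROOT.
# The default project path comes from the examples on this page;
# other projects would need the same treatment.
make_thumb_skeleton() {
    local root="$1" project="${2:-wikipedia/commons}" a b
    for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
            mkdir -p "$root/$project/thumb/$a/$a$b" || return 1
        done
    done
}

# Usage (path as on ms4): make_thumb_skeleton /export/thumbs
```

Ownership and permissions are left at the defaults here; adjust them to match whatever the old server uses.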
If you have the option to take part of the load off an unhappy server, you can put an empty directory structure on the new server, nfs mount it on apaches, image scalers, and the old server. Then on the old host move a few commons directories out of the way and replace them with symlinks to the new file structure. This may buy you enough time to take additional measures (e.g. removing the data you moved out of the way on the unhappy host, in case of zfs issues).
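The move-and-symlink step could look roughly like this. It is a sketch only: the function name and the ".moved" suffix are made up for illustration, and there is a brief window between the mv and the ln where the directory does not exist.

```shell
# Move one subdirectory of the old thumb tree out of the way and leave a
# symlink pointing at the same subdirectory on the new (NFS-mounted) tree.
relink_dir() {
    local old_root="$1" new_root="$2" sub="$3"
    [ -d "$new_root/$sub" ] || { echo "missing $new_root/$sub" >&2; return 1; }
    if [ -L "$old_root/$sub" ]; then
        return 0                      # already relinked, nothing to do
    fi
    mv "$old_root/$sub" "$old_root/$sub.moved" || return 1
    ln -s "$new_root/$sub" "$old_root/$sub"
}

# Usage (hypothetical mount point for the new server):
#   relink_dir /export/thumbs/wikipedia/commons/thumb /mnt/newthumbs/wikipedia/commons/thumb 0/00
```

The moved-aside data stays in place under the ".moved" name until you decide to remove it.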
Writing thumbs
Required files
You need to make sure the following files are also available on the server, besides the thumbs themselves:
favicon.ico, index.html, jars, mime.php, pybaltestfile.txt, sync-from-home, and scripts (symlink to sync-from-home). Grab these from a preexisting thumb server, or .. from somewhere we are going to put them. Seriously. Local repository. Yesterday.
I can verify we need favicon, index.html, pybaltestfile, and scripts. I don't know about mime.php, jars.
Change the thumbnail configuration
You need to edit CommonSettings.php in /home/wikipedia/common/wmf-deployment/wmf-config. Make a copy of this file, edit the copy, run php -l on it, and only then put it into place (at least that way test.wp isn't dead because of typos). Look at the stanzas that define $wgLocalFileRepo. The attribute thumbDir determines where files will be retrieved from. Remember that this is the directory of the thumbs as mounted everywhere else, not as you would find it on the local thumb server filesystem. Example:
'thumbDir' => str_replace( '/mnt/upload5', '/mnt/thumbs', "$wgUploadDirectory/thumb" ),
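The copy, edit, lint, install routine can be wrapped in a small helper. This is a sketch, not an existing tool: the function name, the ".bak" backup, and the LINT override are assumptions (LINT exists only so the helper can be tried on a box without PHP; it defaults to php -l).

```shell
# Lint an edited copy and only then move it over the live file,
# keeping a backup of the previous version next to it.
install_if_valid() {
    local edited="$1" live="$2"
    ${LINT:-php -l} "$edited" >/dev/null || {
        echo "lint failed, not installing: $edited" >&2
        return 1
    }
    cp -p "$live" "$live.bak" && mv "$edited" "$live"
}

# Usage:
#   cd /home/wikipedia/common/wmf-deployment/wmf-config
#   cp CommonSettings.php CommonSettings.php.new    # then edit the .new copy
#   install_if_valid CommonSettings.php.new CommonSettings.php
```

A syntax check only catches typos; a wrong thumbDir path will still lint cleanly, so the test step is not optional.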
Test!
You can test on test.wikipedia.org since changes to CommonSettings.php are live immediately to that db.
Make sure you can actually create thumbs now: find a thumb directory on your new server that is empty and request a thumbnail from it. An example with the current setup: ms4:/export/thumbs/wikipedia/commons/thumb/0/00/opampnoninverting.png was empty, so you could choose any resolution. We'll choose 80px:
http://upload.wikimedia.org/wikipedia/commons/thumb/0/00/opampnoninverting.png/80px-opampnoninverting.png
Now check in your directory on the server to be sure it was really created over there (and not on the old host).
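For reference, the test URL is just the thumb directory path plus a "<width>px-" copy of the file name. A tiny helper, with a made-up name and the commons path hard-coded as an assumption, can build it so you don't mistype it:

```shell
# Build the upload.wikimedia.org URL for a commons thumbnail.
# hashdir is the two-level hash path (e.g. 0/00), taken from the server's
# directory listing rather than computed here.
thumb_url() {
    local hashdir="$1" name="$2" width="$3"
    printf 'http://upload.wikimedia.org/wikipedia/commons/thumb/%s/%s/%spx-%s\n' \
        "$hashdir" "$name" "$width" "$name"
}

# Usage:
#   wget -O /dev/null "$(thumb_url 0/00 opampnoninverting.png 80)"
#   then, on the new thumb server:
#   ls -l /export/thumbs/wikipedia/commons/thumb/0/00/opampnoninverting.png/
```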
Update
Once that's good, you can push the file out to the world:
sync-common-file wmf-deployment/wmf-config/CommonSettings.php 'your fine message here'
Then check it in to svn! That's right, there's a local repo right there in /home/wikipedia/conf-svn/wmf-config/trunk ! (Thanks, brion!)
Reading thumbs
Web server setup
Our thumbnail server(s) run Solaris 10 so that they can do snapshots and replication. The web server on those hosts is Sun Java Webserver7. For instructions on setting up the web server, see ms4. (Instructions should be moved to separate page)
Check the web server to be sure you can retrieve some thumbnail (from zwinger, say). Example, if you were setting up ms4:
wget http://ms4/wikipedia/commons/thumb/4/42/Zemp.jpg/85px-Zemp.jpg
Remove test squid from pool
You need to remove your test squid from the front end pool; otherwise the front end squid will send some portion of its live requests, not just your test ones, to the back end squid on the same host.
Currently both image and thumbnail links in wikitext are rewritten to have "http://upload.wikimedia.org" as the server. These requests get served by one of the LVS servers; check which one. On that host, look for the PyBal conf files in /etc/pybal. In particular there is a file called upload_squids; this contains a list of all squids used for the upload.wikimedia.org name, along with priorities and whether or not they are enabled.
You can target a specific one of these squids and take it out of the list. Add this attribute:
'enabled': False
after the server name. Example:
{ 'host': 'sq11.wikimedia.org', 'enabled': False }
You need to restart pybal; /etc/init.d/pybal restart
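If the host's entry already carries an 'enabled' attribute, flipping it can be scripted. The following is a sketch under that assumption (it does not add the attribute to a line that lacks one), and the function name is invented. It prints the modified file to stdout so you can diff it before replacing anything.

```shell
# Print FILE with the 'enabled' flag of HOST's entry set to STATE
# (True or False). Only lines naming HOST are touched.
set_enabled() {
    local host="$1" state="$2" file="$3"
    sed "/'host': '$host'/ s/'enabled': [A-Za-z]*/'enabled': $state/" "$file"
}

# Usage (file name as given on this page):
#   set_enabled sq11.wikimedia.org False /etc/pybal/upload_squids > /tmp/upload_squids.new
#   diff /etc/pybal/upload_squids /tmp/upload_squids.new
```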
It may take up to 20 minutes for the incoming front end requests to go away. You can watch the progress from the lvs host with:
ipvsadm -l | grep sq11.wikimedia
(or whatever your squid is).
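Rather than eyeballing the output repeatedly, you can sum the active connections for your squid. This sketch assumes the usual ipvsadm -l column layout, where real-server lines start with "->" and ActiveConn is the fifth field; the function name is made up.

```shell
# Read `ipvsadm -l` output on stdin and print the total ActiveConn
# for real servers whose address starts with the given prefix.
active_conns() {
    awk -v h="$1" '
        $1 == "->" && index($2, h) == 1 { sum += $5 }
        END { print sum + 0 }
    '
}

# Usage: poll until the squid has drained.
#   while [ "$(ipvsadm -l | active_conns sq11.wikimedia)" -gt 0 ]; do sleep 30; done
```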
After that, take the squid out of the backend pool as well:
Look at the file upload-settings.php in the directory /home/wikipedia/conf/squid; find your squid in the pmtpa array and comment it out. Now regenerate the files (php generate.php in the same directory), and deploy them everywhere (./deploy cache in the same directory).
Update the squid config files
Next, update the squid configuration files on zwinger and ship them out to your server. The files are in /home/wikipedia/conf/squid; why not make a copy of the directory and modify the files there? This prevents someone from coming along behind you and deploying your untested changes to all the squids and breaking the site. Also a tarball of the current copies is nice, in case you overwrite something and want to put it back quickly.
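One way to take both the tarball and the working copy in a single step. This is a sketch: the function name, the destination layout, and the timestamped file names are arbitrary choices, not an existing convention.

```shell
# Tar up the current squid conf directory and make a scratch copy to edit.
# Prints the path of the scratch copy.
snapshot_conf() {
    local src="$1" dest="$2" stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" || return 1
    tar -cf "$dest/squid-conf-$stamp.tar" \
        -C "$(dirname "$src")" "$(basename "$src")" || return 1
    cp -Rp "$src" "$dest/work-$stamp" || return 1
    echo "$dest/work-$stamp"
}

# Usage (destination is a hypothetical scratch directory):
#   cd "$(snapshot_conf /home/wikipedia/conf/squid ~/squid-work)"
```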
The files you want to change are upload-settings.php and squid.conf.php.
In squid.conf.php, define an acl that's descriptive. Something with the word thumbs in it and the hostname of the server you want to add might be nice. For example:
#thumbs are on ms4 now
acl ms4_thumbs url_regex ^http://upload\.wikimedia\.org(/+)[^/][^/]*/[^/][^/]*/thumb/
At this writing there is such a stanza in place; feel free to add to it.
If you're splitting the repo based on directories, adjust your regexp accordingly.
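Before shipping a changed regexp, it is cheap to try it against a few URLs. The sketch below uses grep -E as a stand-in for squid's matcher, which should behave the same for a pattern this simple but is not guaranteed to be identical; the function and variable names are made up.

```shell
# The ms4_thumbs pattern from squid.conf.php, checked with grep -E.
ACL_RE='^http://upload\.wikimedia\.org(/+)[^/][^/]*/[^/][^/]*/thumb/'

# Succeeds if the URL would be caught by the thumbs acl.
matches_thumb_acl() {
    printf '%s\n' "$1" | grep -Eq "$ACL_RE"
}

# Usage:
#   matches_thumb_acl "http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Zemp.jpg/85px-Zemp.jpg" \
#       && echo "goes to the thumb server"
```

Check at least one thumb URL and one original-image URL, since the whole point of the acl is to tell the two apart.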
Now, in upload-settings.php, add your host to the apaches stanza. That stanza currently looks like this:
'apaches' => array(
    'pmtpa' => array(
        'ms1.wikimedia.org',
        '=ms4_thumbs' => 'ms4.wikimedia.org',
    ),
),
Add your acl with an = sign in front, and the hostname, like the ms4 example here. The = in front means that the generating script will look in the squid.conf.php file to find the acl.
To generate the actual config files, run the script generate.php in the same directory. This will overwrite the files that were in there (remember when we said to make a copy?).
The results of the script will be in the directory generated.
You want the new file to go to sq11 (or whichever your test squid is), but because sq11 was commented out of the pool for testing, its file doesn't get regenerated. Instead, go to generated/squid.conf and do something like
sed -e 's/sq3\.wi/sq11.wi/g' sq3.wikimedia.org > sq11.wikimedia.org.new
(check the diff of the old and new sq11 files to see that there's nothing too crazy), then move the new file in place of the old one. Now you can deploy it by:
./deploy sq11.wikimedia.org
which should deploy the backend file you just created, as well as the frontend file you didn't touch, and restart the frontend and backend squids, on sq11.
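The copy-and-rename trick plus the diff check can be bundled together. This is a sketch with an invented function name, meant to be run from inside generated/squid.conf; it writes only the ".new" file and leaves moving it into place to you.

```shell
# Derive a conf for squid TO from the freshly generated conf of squid FROM,
# then show how it differs from TO's existing file.
clone_squid_conf() {
    local from="$1" to="$2"
    sed -e "s/${from}\.wi/${to}.wi/g" "$from.wikimedia.org" \
        > "$to.wikimedia.org.new" || return 1
    # diff exits non-zero when the files differ; that is expected here.
    diff "$to.wikimedia.org" "$to.wikimedia.org.new" || true
}

# Usage:
#   clone_squid_conf sq3 sq11
#   mv sq11.wikimedia.org.new sq11.wikimedia.org    # once the diff looks sane
```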
Test!
In our example, we'll assume we just added ms4 and we want to see if it's working. Go to your squid (we assume sq1), make yourself a little junk directory and cd into it. Now on ms4, find yourself some thumb not likely to be in the cache (but that exists; you're just testing reading, not writes). You can go to /export/thumbs/wikipedia/commons/thumb/0/00, look at the oldest 10 directories in there, and choose one of those files. In our example, Coupe.Saint.Sepulcre.2.png looks pretty ancient and it has 120px-Coupe.Saint.Sepulcre.2.png in it. So on your squid, do
env http_proxy=208.80.152.11:3128 wget -S -o blot http://upload.wikimedia.org/wikipedia/commons/thumb/0/00/Coupe.Saint.Sepulcre.2.png/120px-Coupe.Saint.Sepulcre.2.png
The IP should be sq1's IP addr. You have to specify the port number 3128 so that you talk to the back end squid and not the front end one. Otherwise the front end squid will ask some *other* back end squid for your file and you won't be testing your conf file changes.
First off, you should get the file back. Next, look at the headers in file "blot". You should see X-Cache: MISS from your server, not from any other hosts. You still don't know which host actually served you the file, and we don't keep access logs. So, next:
Remove the file "blot" and run the same command but leave off the filename (to cause an error), for example:
env http_proxy=208.80.152.11:3128 wget -S -o blot http://upload.wikimedia.org/wikipedia/commons/thumb/0/00/Coupe.Saint.Sepulcre.2.png/
The file "blot" should now contain a 403 Forbidden error. Check the last 100-200 lines of /var/log/http/errors on your new thumbnail server. Is your request in there? Congrats, you set stuff up correctly!
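The header check on "blot" can be automated. A sketch, assuming wget -S writes the response headers into the log with an "X-Cache: MISS from <host>" line as described above; the function name is invented.

```shell
# Succeed only if the wget log contains at least one X-Cache header and
# every X-Cache header is a MISS from the given host.
only_miss_from() {
    local log="$1" host="$2" total mine
    total=$(grep -c 'X-Cache:' "$log" || true)
    mine=$(grep 'X-Cache:' "$log" | grep -Fc "MISS from $host" || true)
    [ "$total" -gt 0 ] && [ "$total" -eq "$mine" ]
}

# Usage:
#   only_miss_from blot sq11.wikimedia.org && echo "request stayed on the test squid"
```

This only confirms which squid answered. Which back end host actually served the file still has to be checked in the error log on the thumbnail server, as described above.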
Put the squid changes everywhere
Now you can copy those changed files (upload-settings.php and squid.conf.php) to the directory /home/wikipedia/conf/squid and regenerate the conf files for all the squids:
php generate.php
After this you'll have to manually add your changes for your test squid (sq1) by the sed trick above, since the old unchanged file is what will be in the directory. Now you can deploy everywhere:
./deploy cache
Let things run for a while; no breakage? Now is when you would check your changes in... but we have no repo atm. Soon!
You can check whether everything is being served by the new host by going to the old host and using dtrace to look at the access log live. You can first verify that there are no 304 codes returned for thumbnails:
dtrace -qs ./access_log.d | grep thumb | grep ' 304 '
Then check for 200s:
dtrace -qs ./access_log.d | grep thumb | grep ' 200 '
If there are none, all your squids got the change and restarted ok.
Warning
Make sure that if there are any upload squids that are temporarily unavailable, you get the new files over to them *before* they go live.
Put the squid back in the pool, and done!
Back end:
Go back to the file /home/wikipedia/conf/squid/upload-settings.php and uncomment your squid. Now regenerate the files (php generate.php in the same directory), and deploy them everywhere (./deploy cache in the same directory).
Front end:
In the file /etc/pybal/upload-squids on your lvs host, change the attribute you added/changed for your squid:
'enabled': False
to
'enabled': True
after the server name. Example:
{ 'host': 'sq11.wikimedia.org', 'enabled': True }
You need to restart pybal; /etc/init.d/pybal restart
Notes
- The script thumbnail-handler.php on image servers is not in use; the 404.php script is the error handler now.
- If you run into zfs issues and need to free up space, you are going to have to toss snapshots. Why? Because... if you remove regular files... your removed files will remain in the snapshots. And no space will be freed up. If you toss all snapshots then you can start tossing regular data, should that be necessary.
- If you toss old snapshots and create new ones, then eventually stuff you remove early will really get removed, as the older snapshots which contain those files get tossed.
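When it comes to tossing snapshots, a dry run that only prints the destroy commands is safer than typing them by hand. This is a sketch: the function name and the dataset name in the usage line are hypothetical, and it assumes snapshot names arrive oldest first, one per line.

```shell
# Read snapshot names (oldest first) on stdin and print a `zfs destroy`
# command for each one except the newest KEEP. Nothing is destroyed here.
snapshots_to_destroy() {
    local keep="$1"
    awk -v keep="$keep" '
        { name[NR] = $1 }
        END { for (i = 1; i <= NR - keep; i++) print "zfs destroy " name[i] }
    '
}

# Usage (dataset name is an assumption; review the output before running it):
#   zfs list -H -t snapshot -o name -s creation | grep '^export/thumbs@' | snapshots_to_destroy 2
```

Passing 0 as KEEP lists every snapshot, which matches the "toss all snapshots" case above.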
Emergency? Contact...
Ask ariel, tim, brion (in that order) if there are (serious) thumbnail serving issues. They may have some back story for you.