Obsolete:Wiki farm
The configuration that Wikimedia uses for MediaWiki is quite different to the one that is documented for the purposes of external use. I thought I'd give a brief explanation of how running 374 wikis on 7 top level domains is different to running one wiki, and how we overcome the technical challenges encountered.
The Matrix
There are currently four multi-subdomain "sites" operated by Wikimedia: Wikipedia, Wiktionary, Wikibooks and Wikiquote. Our setup is unusual in that instead of using a database prefix to indicate which site the wiki belongs to, we use a database suffix. This is for historical reasons. Following is a list of Wikimedia wikis. Wikis which actually exist are shown in bold.
Wikipedia (w) | Wiktionary (wikt) | Wikibooks (b) | Wikiquote (q) |
aa | aa | aa | aa |
ab | ab | ab | ab |
af | af | af | af |
ak | ak | ak | ak |
als | als | als | als |
am | am | am | am |
an | an | an | an |
ar | ar | ar | ar |
arc | arc | arc | arc |
as | as | as | as |
ast | ast | ast | ast |
av | av | av | av |
ay | ay | ay | ay |
az | az | az | az |
ba | ba | ba | ba |
be | be | be | be |
bg | bg | bg | bg |
bh | bh | bh | bh |
bi | bi | bi | bi |
bm | bm | bm | bm |
bn | bn | bn | bn |
bo | bo | bo | bo |
br | br | br | br |
bs | bs | bs | bs |
ca | ca | ca | ca |
ce | ce | ce | ce |
ch | ch | ch | ch |
cho | cho | cho | cho |
chr | chr | chr | chr |
chy | chy | chy | chy |
co | co | co | co |
cr | cr | cr | cr |
cs | cs | cs | cs |
csb | csb | csb | csb |
cv | cv | cv | cv |
cy | cy | cy | cy |
da | da | da | da |
de | de | de | de |
dv | dv | dv | dv |
dz | dz | dz | dz |
ee | ee | ee | ee |
el | el | el | el |
en | en | en | en |
eo | eo | eo | eo |
es | es | es | es |
et | et | et | et |
eu | eu | eu | eu |
fa | fa | fa | fa |
ff | ff | ff | ff |
fi | fi | fi | fi |
fj | fj | fj | fj |
fo | fo | fo | fo |
fr | fr | fr | fr |
fy | fy | fy | fy |
ga | ga | ga | ga |
gd | gd | gd | gd |
gl | gl | gl | gl |
gn | gn | gn | gn |
gu | gu | gu | gu |
gv | gv | gv | gv |
ha | ha | ha | ha |
haw | haw | haw | haw |
he | he | he | he |
hi | hi | hi | hi |
ho | ho | ho | ho |
hr | hr | hr | hr |
ht | ht | ht | ht |
hu | hu | hu | hu |
hy | hy | hy | hy |
hz | hz | hz | hz |
ia | ia | ia | ia |
id | id | id | id |
ie | ie | ie | ie |
ig | ig | ig | ig |
ii | ii | ii | ii |
ik | ik | ik | ik |
io | io | io | io |
is | is | is | is |
it | it | it | it |
iu | iu | iu | iu |
ja | ja | ja | ja |
jv | jv | jv | jv |
ka | ka | ka | ka |
kg | kg | kg | kg |
ki | ki | ki | ki |
kj | kj | kj | kj |
kk | kk | kk | kk |
kl | kl | kl | kl |
km | km | km | km |
kn | kn | kn | kn |
ko | ko | ko | ko |
kr | kr | kr | kr |
ks | ks | ks | ks |
ku | ku | ku | ku |
kv | kv | kv | kv |
kw | kw | kw | kw |
ky | ky | ky | ky |
la | la | la | la |
lb | lb | lb | lb |
lg | lg | lg | lg |
li | li | li | li |
ln | ln | ln | ln |
lo | lo | lo | lo |
lt | lt | lt | lt |
lv | lv | lv | lv |
mg | mg | mg | mg |
mh | mh | mh | mh |
mi | mi | mi | mi |
minnan | minnan | minnan | minnan |
mk | mk | mk | mk |
ml | ml | ml | ml |
mn | mn | mn | mn |
mo | mo | mo | mo |
mr | mr | mr | mr |
ms | ms | ms | ms |
mt | mt | mt | mt |
mus | mus | mus | mus |
my | my | my | my |
na | na | na | na |
nah | nah | nah | nah |
nb | nb | nb | nb |
nds | nds | nds | nds |
ne | ne | ne | ne |
ng | ng | ng | ng |
nl | nl | nl | nl |
nn | nn | nn | nn |
no | no | no | no |
nv | nv | nv | nv |
ny | ny | ny | ny |
oc | oc | oc | oc |
om | om | om | om |
or | or | or | or |
pa | pa | pa | pa |
pi | pi | pi | pi |
pl | pl | pl | pl |
ps | ps | ps | ps |
pt | pt | pt | pt |
qu | qu | qu | qu |
rm | rm | rm | rm |
rn | rn | rn | rn |
ro | ro | ro | ro |
roa-rup | roa-rup | roa-rup | roa-rup |
ru | ru | ru | ru |
rw | rw | rw | rw |
sa | sa | sa | sa |
sc | sc | sc | sc |
sd | sd | sd | sd |
se | se | se | se |
sg | sg | sg | sg |
sh | sh | sh | sh |
si | si | si | si |
simple | simple | simple | simple |
sk | sk | sk | sk |
sl | sl | sl | sl |
sm | sm | sm | sm |
sn | sn | sn | sn |
so | so | so | so |
sq | sq | sq | sq |
sr | sr | sr | sr |
ss | ss | ss | ss |
st | st | st | st |
su | su | su | su |
sv | sv | sv | sv |
sw | sw | sw | sw |
ta | ta | ta | ta |
te | te | te | te |
tg | tg | tg | tg |
th | th | th | th |
ti | ti | ti | ti |
tk | tk | tk | tk |
tl | tl | tl | tl |
tlh | tlh | tlh | tlh |
tn | tn | tn | tn |
to | to | to | to |
tokipona | tokipona | tokipona | tokipona |
tpi | tpi | tpi | tpi |
tr | tr | tr | tr |
ts | ts | ts | ts |
tt | tt | tt | tt |
tw | tw | tw | tw |
ty | ty | ty | ty |
ug | ug | ug | ug |
uk | uk | uk | uk |
ur | ur | ur | ur |
uz | uz | uz | uz |
ve | ve | ve | ve |
vi | vi | vi | vi |
vo | vo | vo | vo |
wa | wa | wa | wa |
wo | wo | wo | wo |
xh | xh | xh | xh |
yi | yi | yi | yi |
yo | yo | yo | yo |
za | za | za | za |
zh | zh | zh | zh |
zh-cfr | zh-cfr | zh-cfr | zh-cfr |
zu | zu | zu | zu |
There are also a number of "special" wikis:
- sources (Wikisource)
- Wikinews
- meta
- sep11 (September 11 Memorial)
- wikimedia (experimental)
- mediawiki
There are also a few experimental wikis that have their own script directories and so don't need to be listed in all.dblist. They aren't backed up by the normal process, and won't be included in maintenance operations:
- test
- rel12test
- code.wikimedia.org
History
In the beginning, all wikis had database names ending in "wiki". For example, frwiki for the French Wikipedia, metawiki for Meta, textbookwiki for Wikibooks. This scheme was broken when, by popular demand, Brion added French and Polish Wiktionaries with the database names "frwiktionary" and "plwiktionary". These were the first language-specific subdomains outside Wikipedia. Unfortunately this didn't fit in too well with various maintenance scripts, which assumed that the database name could be obtained by concatenating the "language" (from /home/wikipedia/common/langlist) with "wiki". This was a rather loose definition of language, since it included things such as meta.
At this time, every wiki had its own directory in htdocs, containing a "skeleton" LocalSettings.php. This skeleton file set the $lang variable appropriately and then passed on processing to CommonSettings.php. Also, every wiki had a separate <VirtualHost *> entry in the Apache configuration, and a separate MySQL GRANT to wikiuser. This was difficult to maintain. In response to demand for more Wiktionaries, I decided to make some changes.
I decided to create companion Wiktionaries for all existing Wikipedias. I did this by moving to a shared document root layout. A single VirtualHost section was created with a ServerAlias of *.wiktionary.org. All Wiktionaries had the same document root. In CommonSettings.php, the language was detected by retrieving the hostname from Apache. At the time I couldn't work out how to keep the same URLs for the upload directories, so I set them up with the /upload/en/0/0/Thing.png URL style, that is, including the language. I later realised that a rewrite rule could be used to rewrite traditional upload URLs to language-specific URLs. This involves a little trick with a RewriteCond that always matches. I also converted the MySQL permissions to use database wildcards, removing the need to add grants for every added wiki.
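The hostname-based detection and the upload URL rewrite can be illustrated with a short Python sketch. This is not the PHP in CommonSettings.php or the actual Apache rewrite rule; the function names are hypothetical, and the suffixes for Wikibooks and Wikiquote are assumptions (the text above only confirms "wiki" and "wiktionary"). Hyphenated language codes are passed through unchanged here, which may not match the real database names.

```python
# Suffix per site. Only "wiki" and "wiktionary" are confirmed by the article;
# the other two entries are assumptions made for this illustration.
SITE_SUFFIXES = {
    "wikipedia": "wiki",
    "wiktionary": "wiktionary",
    "wikibooks": "wikibooks",
    "wikiquote": "wikiquote",
}


def parse_host(hostname):
    """Split a hostname such as 'fr.wiktionary.org' into ('fr', 'wiktionary')."""
    parts = hostname.lower().split(".")
    if len(parts) != 3 or parts[1] not in SITE_SUFFIXES:
        raise ValueError(f"not a language subdomain: {hostname}")
    return parts[0], parts[1]


def database_name(hostname):
    """Build the database name as language code followed by the site suffix."""
    lang, site = parse_host(hostname)
    return lang + SITE_SUFFIXES[site]


def rewrite_upload_path(hostname, path):
    """Map a traditional upload path to the language-specific layout."""
    lang, _site = parse_host(hostname)
    prefix = "/upload/"
    if not path.startswith(prefix):
        return path  # not an upload URL, leave untouched
    rest = path[len(prefix):]
    if rest.startswith(lang + "/"):
        return path  # already language-specific
    return f"{prefix}{lang}/{rest}"
```

With these definitions, `database_name("fr.wiktionary.org")` gives `"frwiktionary"`, and a request for `/upload/0/0/Thing.png` on `en.wiktionary.org` maps to `/upload/en/0/0/Thing.png`.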
Auto-creation
This was all very well, but it became obvious that the sheer number of wikis was making maintenance difficult. Each of the 300 wikis had its own MediaWiki namespace with a copy of about 750 messages. Updating these messages took a long time. Other kinds of maintenance tasks were also tedious. There was a lot of demand from the users for a multi-subdomain layout in other projects. Adding languages was a tedious, error-prone, time-consuming process, which developers had to perform on a very regular basis. I decided that I needed to automate the process. At first I wrote a command-line script to add languages, but the script was complicated and needed developer involvement due to the unwieldy legacy layout of the Wikipedias. For a shared document-root layout, the only thing a script needs to do is to set up the database. Armed with my new upload rewriting trick, I decided to convert Wikipedia to a shared document root layout. Instead of creating 150 new wikis for Wikibooks and 150 for Wikiquote, I decided to make an on-demand system, with a script invoked by the user to create new wikis. This consists of the following components.
missing.php
- /home/wikipedia/common/php-new/missing.php
This script is invoked by CommonSettings.php if the detected hostname does not correspond to an existing wiki. "Existing wikis" are those listed in /home/wikipedia/common/all.dblist. This script displays some nice-looking HTML. If the subdomain is in $wgLanguageNames (from Names.php), it also displays a "create wiki" button. Clicking on this button adds a line to /home/wikipedia/logs/addwiki_requests. Since security restrictions do not allow the apache user to create tables, the requests are fulfilled by an hourly cron job running as tstarling, which invokes a command-line script called addwiki.php.
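The decision logic described here might look roughly like the following Python sketch. It is an illustration only, not missing.php itself: the function names, the return values, and the format of a request line (one database name per line) are all assumptions.

```python
def load_dblist(path):
    """Read a .dblist file: one database name per line, blank lines ignored."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def handle_missing_wiki(dbname, lang, existing_dbs, language_names):
    """Decide what the page for a hostname should offer.

    Returns 'exists' if the wiki is already listed, 'offer_create' if the
    subdomain is a known language code, and 'unknown_language' otherwise.
    """
    if dbname in existing_dbs:
        return "exists"
    if lang in language_names:
        return "offer_create"
    return "unknown_language"


def file_request(dbname, requests_path):
    """Append a creation request; the web server never creates tables itself."""
    # Assumed line format: just the database name.
    with open(requests_path, "a", encoding="utf-8") as f:
        f.write(dbname + "\n")
```

The point of the split is that the web-facing code only ever appends to a file, while table creation is left to a separately privileged job.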
addwiki.php
- /home/wikipedia/common/php-new/maintenance/addwiki.php
This script creates wikis based on requests filed in addwiki_requests. To prevent an attack by a script automatically requesting creation of all wikis, at most one request per hour is fulfilled. A particularly difficult part of writing this script (and indeed a difficult part of adding wikis before the script was written) is handling interwiki links. I gave up on trying to write a script to incrementally add links, and instead used rebuildInterwiki.inc.
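The one-request-per-hour limit can be sketched as a job that fulfils a single request each time it runs. The Python below is a hypothetical illustration, not addwiki.php: it assumes one database name per request line, assumes that unfulfilled requests stay queued for later runs, and takes the actual creation step as a caller-supplied `create_wiki` function.

```python
def fulfil_one_request(requests_path, existing_dbs, create_wiki):
    """Fulfil the oldest pending request and leave the rest queued.

    Intended to be called once per hourly run, so that at most one wiki is
    created per hour. Returns the database name created, or None.
    """
    with open(requests_path, encoding="utf-8") as f:
        pending = [line.strip() for line in f if line.strip()]

    # Drop duplicates and requests for wikis that already exist,
    # preserving the order in which requests were filed.
    queue = []
    for db in pending:
        if db not in existing_dbs and db not in queue:
            queue.append(db)

    chosen = queue[0] if queue else None
    if chosen is not None:
        create_wiki(chosen)

    # Rewrite the request file with whatever is still waiting.
    with open(requests_path, "w", encoding="utf-8") as f:
        f.writelines(db + "\n" for db in queue[1:])
    return chosen
```

A flood of automated requests then costs at most one new wiki per run, however many lines were appended in the meantime.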
rebuildInterwiki.inc
- /home/wikipedia/common/php-new/maintenance/rebuildInterwiki.inc
This script rebuilds all interwiki tables by looping through all.dblist. For each database, it truncates the interwiki table and then reinserts all necessary entries in a multi-row insert statement. Strictly speaking, the script does not execute anything itself; it only returns the SQL. The SQL is executed by addwiki.php using dbsource(). There's about 4.6 MB of SQL altogether, and it takes a few minutes to run.
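A generator of this kind, which returns SQL rather than running it, could be sketched in Python as follows. This is not the code in rebuildInterwiki.inc. The column names (iw_prefix, iw_url, iw_local) are assumed from the MediaWiki interwiki table, the quoting helper is deliberately minimal, and the sketch inserts the same rows into every database, whereas the real script would presumably vary them per wiki.

```python
def sql_quote(value):
    """Minimal string quoting for this sketch; real code should use the DB layer."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def rebuild_interwiki_sql(dbnames, links):
    """Return (but do not execute) SQL that rebuilds every interwiki table.

    links is a list of (prefix, url, is_local) tuples.
    """
    if not links:
        raise ValueError("need at least one interwiki link")

    # One multi-row VALUES list, reused for each database.
    rows = ",\n".join(
        f"({sql_quote(prefix)}, {sql_quote(url)}, {int(local)})"
        for prefix, url, local in links
    )

    statements = []
    for db in dbnames:
        statements.append(f"USE `{db}`;")
        statements.append("TRUNCATE TABLE interwiki;")
        statements.append(
            "INSERT INTO interwiki (iw_prefix, iw_url, iw_local) VALUES\n"
            f"{rows};"
        )
    return "\n".join(statements) + "\n"
```

Returning a string keeps the generator free of side effects, so the caller decides when and against which server the SQL is run.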
Special wikis
There are always special cases left over, and these come under the "special wiki" banner. Special wikis such as sep11 were absorbed into the *.wikipedia.org handling. Special wikis which are not subdomains of wikipedia.org were left at their original locations in the htdocs directory, each with their own document root. Skeleton LocalSettings.php files were done away with some time ago. Instead, CommonSettings.php constructs the database name by concatenating the last component of the document root with "wiki". So meta has a document root of /home/wikipedia/htdocs/meta, and hence is assigned a database name of metawiki. For the purposes of CommonSettings.php, such wikis are considered to be Wikipedias ($site is set to "wikipedia"). The hostname used by MediaWiki needs to be overridden explicitly so that self-referential URLs can be constructed.
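The document-root rule for special wikis amounts to a one-line derivation. The Python sketch below is illustrative only; the function name is hypothetical and the real logic lives in CommonSettings.php.

```python
import posixpath


def special_wiki_settings(document_root):
    """Derive (database name, site) for a wiki with its own document root."""
    # Take the final path component, e.g. 'meta' from '.../htdocs/meta'.
    name = posixpath.basename(document_root.rstrip("/"))
    if not name:
        raise ValueError(f"cannot derive a wiki name from {document_root!r}")
    # Special wikis are treated as Wikipedias for configuration purposes.
    return name + "wiki", "wikipedia"
```

For the example in the text, `special_wiki_settings("/home/wikipedia/htdocs/meta")` returns `("metawiki", "wikipedia")`.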
MediaWiki configuration
Much to our chagrin, the communities of the individual wikis like to have their own individual settings, resisting our attempts to homogenise them all with great tenacity. CommonSettings.php used to be a mess of switch($lang) structures and special cases. I decided that we needed to move the settings from code to data. With so many wikis, things were steadily getting uglier.
This problem is resolved by the SiteConfiguration object. This object stores a two-dimensional array, with the names of the settings as the first index, and the names of the wikis as the second index. The keys in the second index can be of three types, checked in this order:
- Database name
- Site name (wikipedia, wiktionary, etc.)
- "default"
The object provides a method to extract all defined settings into the global scope. That is, it sets global variables. If no database-specific setting exists for a given variable, it will check to see if there is a site default. If there is no site default, it will check for a global default. If there is no global default, it will not set the variable, and hence the value set in DefaultSettings.php will be used. At some stage I intend to add language-wide settings, checked after the database name and before the site name.
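The lookup order can be made concrete with a small Python sketch. It mirrors the behaviour described above but is not the PHP SiteConfiguration class: the method names are hypothetical, and a plain dictionary stands in for the global scope.

```python
class SiteConfiguration:
    """Two-level table: settings[setting_name][wiki_key] -> value."""

    def __init__(self, settings):
        self.settings = settings

    def lookup(self, name, dbname, site):
        """Return (found, value), trying database name, site, then 'default'."""
        per_wiki = self.settings.get(name, {})
        for key in (dbname, site, "default"):
            if key in per_wiki:
                return True, per_wiki[key]
        return False, None

    def extract_all(self, dbname, site, target):
        """Copy every setting that resolves for this wiki into target.

        Settings with no matching key are left alone, so any value already
        in target (the stand-in for DefaultSettings.php) survives.
        """
        for name in self.settings:
            found, value = self.lookup(name, dbname, site)
            if found:
                target[name] = value
```

Because unresolved settings are skipped rather than set to an empty value, the built-in defaults remain in force for any wiki that has no override at any of the three levels.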
The initial idea was to construct this SiteConfiguration object only occasionally, and to store it in serialised form in NFS. But I decided this would make changing settings difficult, so a new object is constructed every time. Caching is still possible in principle. The object provides for delayed variable expansion, so that strings such as "$lang" can be stored in the cache and then expanded on each invocation.
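Delayed expansion means the stored string keeps its placeholder and is only filled in per request. A minimal Python sketch of the idea, using the standard library's `string.Template` rather than anything from MediaWiki, with a hypothetical `expand` helper:

```python
from string import Template


def expand(value, variables):
    """Expand $name placeholders at lookup time; non-strings pass through."""
    if isinstance(value, str):
        # safe_substitute leaves unknown or invalid placeholders untouched,
        # so a literal such as "$1" in a URL pattern is preserved.
        return Template(value).safe_substitute(variables)
    return value
```

Here `expand("http://$lang.wikipedia.org", {"lang": "fr"})` yields `"http://fr.wikipedia.org"`, while the cached copy still contains `$lang` and can be reused for any other wiki.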
Backup
Backups occur on manual request, by running /home/wikipedia/bin/backup-all. It uses the site-specific database name lists, e.g. wiktionary.dblist and wikibooks.dblist, so that the HTML pages on backup.wikipedia.org are of a manageable size. The backup script dumps SQL, compresses it and makes MD5 checksums.
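The compress-and-checksum step can be sketched as follows. This Python is not the backup-all script: the dump itself is taken as given, gzip is an assumed compression format, and computing the MD5 over the compressed bytes rather than the raw SQL is also an assumption.

```python
import gzip
import hashlib


def compress_and_checksum(sql_dump):
    """Compress a SQL dump (bytes) and return (compressed bytes, MD5 hex digest)."""
    compressed = gzip.compress(sql_dump)
    # Checksum the file as it will be published, so downloads can be verified.
    digest = hashlib.md5(compressed).hexdigest()
    return compressed, digest
```

Publishing the checksum alongside each file lets anyone who downloads a dump confirm that the transfer was not corrupted.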
How-To
Some How-To related to the Wiki farm: