MediaWiki at WMF

From Wikitech

MediaWiki is the collaborative editing software that runs Wikipedia. This page documents its deployment at Wikimedia Foundation.

Infrastructure

A Wikipedia web request is processed in a series of steps outlined here (as of August 2022).

  • DNS resolves hostnames like en.wikipedia.org, which ultimately point to an address like text-lb.*.wikimedia.org. The IP addresses for these are service IPs handled by LVS, which acts as a direct-routing load balancer to our caching proxies.
    » See also DNS, Global traffic routing, and LVS.
  • The Wikimedia Foundation owns its content-delivery network. The public load balancers and caching proxies are located in all data centres, including those whose sole role is to be an edge cache (also known as a "PoP").
    » See also Data centers and PoPs.
  • The caching servers are implemented as a reverse proxy consisting of three layers: TLS termination, frontend caching, backend caching. Each cache server hosts all three of these layers.
    » See also Caching overview.
    • TLS termination and HTTP/2 handling, handled by HAProxy.
    • Frontend caching: This is an in-memory HTTP cache (uses Varnish, called "Varnish frontend", or varnish-fe). The LVS load balancers route the request to a random cache proxy server to maximise the amount of parallel traffic we can handle. Each frontend cache server likely holds the same set of responses in its cache, so the logical capacity of the frontend cache is equal to one server's RAM.
    • Backend caching: The backend HTTP caches are routed to by frontend caches in case of a cache miss. Contrary to the frontends, these are routed by a consistent hash, and they also persist their cache on disk (instead of in memory). The backend caches scale horizontally and have a logical capacity equal to the total of all servers. In case of a surge in traffic to a particular page, the frontends should each get a copy and distribute from there. Because of consistent hashing, the same backend cache is always consulted for the same URL. We use request coalescing to avoid multiple requests for the same URL hitting the same backend server. For the backend cache, we use Apache Traffic Server (ats-be).
  • After the cache proxies we arrive at the application servers (that is, if the request was not fulfilled by a cache). The application servers are load-balanced via LVS. Connections between backend caches and app servers are encrypted with TLS, which is terminated locally on the app server by a local Envoy instance. Envoy, in turn, hands the request off to the local Apache. (Prior to mid-2020, Nginx was used for TLS termination.) Apache is in charge of handling redirects, rewrite rules, and determining the document root. It then uses php-fpm to invoke the MediaWiki software on the app servers. The application servers and all other backend services (such as Memcached and MariaDB) are located in "Core services" data centers, currently Eqiad and Codfw.
    » See also Application servers for more about how Apache, PHP7 and php-fpm are configured.
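
The difference between random frontend selection and consistent-hash backend selection can be illustrated with a small sketch. This is not how LVS, Varnish, or Apache Traffic Server implement it; the class, the host names, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import random
from bisect import bisect_right


class ConsistentHashRing:
    """Maps keys to nodes so that the same key always lands on the same node."""

    def __init__(self, nodes, replicas=100):
        # Each node gets several virtual points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


def pick_frontend(frontends, rng=random):
    """Frontends are picked at random, independent of the URL."""
    return rng.choice(frontends)


# Hypothetical host names, for illustration only.
backends = ["backend-a", "backend-b", "backend-c"]
ring = ConsistentHashRing(backends)
url = "https://en.wikipedia.org/wiki/Example"
print(ring.node_for(url))  # the same backend on every call for this URL
```

Because every URL maps to exactly one backend, each backend only needs to store its own share of URLs, which is why the backend layer's logical capacity grows with the number of servers. Removing a node only remaps the keys that node was responsible for.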

App servers

See Application servers for more about how Apache and php-fpm are configured.

The application servers are divided into the following groups:

Main app servers
  • Purpose: Default catch-all for HTTP requests to wiki domains for anything not served by another group. This includes /w/index.php and /w/load.php. Notably excluded are X-Wikimedia-Debug, /w/api.php and /w/rest.php.
  • Hostnames: appservers-ro.discovery.wmnet, appservers-rw.discovery.wmnet
  • Hiera, Servergroup, Conftool: appserver, appserver, appserver

Web (Kubernetes)
  • Purpose: Default catch-all for HTTP requests to wiki domains, if sampled into MediaWiki On Kubernetes, and not served by another group.
  • Hostnames: mw-web-ro.discovery.wmnet, mw-web.discovery.wmnet
  • Hiera, Servergroup, Conftool: kube-mw-web

Debug servers
  • Purpose: Public HTTP to wiki domains, with X-Wikimedia-Debug. This is used by WikimediaDebug.
  • Hostnames: mwdebug####.{eqiad,codfw}.wmnet
  • Hiera, Servergroup, Conftool: appserver, appserver, testserver

Debug (Kubernetes)
  • Purpose: Public HTTP to wiki domains, with X-Wikimedia-Debug and select k8s-mwdebug as backend.
  • Hostnames: mwdebug.discovery.wmnet
  • Hiera, Servergroup, Conftool: kube-mw-debug

Misc (Kubernetes)
  • Purpose: Public HTTP request to noc.wikimedia.org.
  • Hostnames: mw-misc.discovery.wmnet
  • Hiera, Servergroup, Conftool: FIXME: ?

API app servers
  • Purpose: Public HTTP to wiki domains, with /w/api.php or /w/rest.php.
  • Hostnames: api-ro.discovery.wmnet, api-rw.discovery.wmnet
  • Hiera, Servergroup, Conftool: api_appserver, api_appserver, api_appserver

API external (Kubernetes)
  • Purpose: Public HTTP to wiki domains, with /w/api.php or /w/rest.php.
  • Hostnames: mw-api-ext-ro.discovery.wmnet, mw-api-ext.discovery.wmnet
  • Hiera, Servergroup, Conftool: kube-mw-api-ext

API internal (Kubernetes)
  • Purpose: Internal HTTP to wiki domains, with /w/api.php or /w/rest.php.
  • Hostnames: mw-api-int-ro.discovery.wmnet, mw-api-int.discovery.wmnet
  • Hiera, Servergroup, Conftool: kube-mw-api-int

Parsoid (Kubernetes)
  • Purpose: Internal HTTP from RESTBase to wiki domains, with /w/rest.php to call Parsoid.
  • Hostnames: mw-parsoid.discovery.wmnet
  • Hiera, Servergroup, Conftool: -, parsoid, -

Jobrunners (Kubernetes)
  • Purpose: Internal HTTP from ChangeProp-JobQueue to wiki domains, with /rpc/RunSingleJob.php.
  • Hostnames: mw-jobrunner.discovery.wmnet
  • Hiera, Servergroup, Conftool: -, jobrunner, -

Videoscalers
  • Purpose: Internal HTTP from ChangeProp-JobQueue to wiki domains, with /rpc/RunSingleJob.php.
  • Hostnames: videoscaler.discovery.wmnet
  • Hiera, Servergroup, Conftool: jobrunner, jobrunner, videoscaler

Maintenance server
  • Purpose: Internal php-cli processes that run mwscript to invoke a MediaWiki maintenance script. These are either started from cron (systemd timers), or run ad hoc by MediaWiki deployers shelling into a maintenance server.
  • Hostnames: mwmaint####.{eqiad,codfw}.wmnet
  • Hiera, Servergroup, Conftool: misc, other

Snapshot hosts
  • Purpose: Internal php-cli processes that run mwscript. These perform scheduled work to produce XML dumps.
  • Hostnames: snapshot####.{eqiad,codfw}.wmnet
  • Hiera, Servergroup, Conftool: dumps, other
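
As a rough illustration of how the groups above partition traffic, the sketch below picks a group from the request path and headers. It is a deliberate simplification and not the actual routing configuration: it ignores the Kubernetes sampling, Parsoid, and videoscaler cases, and the function name and the `internal` flag are hypothetical.

```python
def pick_server_group(path, headers=None, internal=False):
    """Return the name of the group that would serve a request (simplified)."""
    headers = headers or {}
    # Header names are case-insensitive in HTTP.
    if any(name.lower() == "x-wikimedia-debug" for name in headers):
        return "Debug servers"
    # Job execution is only reachable internally (ChangeProp-JobQueue).
    if internal and path.startswith("/rpc/RunSingleJob.php"):
        return "Jobrunners"
    if path.startswith(("/w/api.php", "/w/rest.php")):
        return "API internal" if internal else "API app servers"
    # Everything else, including /w/index.php and /w/load.php.
    return "Main app servers"


print(pick_server_group("/w/index.php"))  # Main app servers
print(pick_server_group("/w/api.php"))    # API app servers
```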

For web requests to Apache on bare-metal app servers, the $_SERVER['SERVERGROUP'] environment variable is automatically set based on the "Hiera cluster" value. For MediaWiki On Kubernetes, this is set explicitly for each helm chart in the operations/deployment-charts.git repo. See also MediaWiki On Kubernetes/How it works.

Logstash messages from MediaWiki carry a servergroup label that is set to $_SERVER['SERVERGROUP'].

Prometheus metrics (as used by, for example, Grafana dashboards and Icinga alerts) carry a cluster field that is also set to the "Hiera cluster" value.

MediaWiki configuration

For web requests not served by the cache, the request eventually arrives on an app server where Apache invokes PHP via php-fpm.

Document root

Example request: https://en.wikipedia.org/w/index.php

The document root for a wiki domain like "en.wikipedia.org" is /srv/mediawiki/docroot/wikipedia.org (source).

The /srv/mediawiki directory on app servers comes from the operations/mediawiki-config.git repository, which is cloned on the Deployment server, and then rsync'ed to the app servers by Scap.

The docroot/wikipedia.org directory is mostly empty, except for w/, which is symlinked to a wiki-agnostic directory that looks like a MediaWiki install (in that it has files like "index.php", "api.php", and "load.php"), but actually contains small stubs that invoke "Multiversion".

Multiversion

Multiversion is a WMF-specific script (maintained in the operations/mediawiki-config repo) that inspects the hostname of the web request (e.g. "en.wikipedia.org"), and finds the appropriate MediaWiki installation for that hostname. The weekly Deployment train creates a fresh branch from the latest master of MediaWiki (including any extensions we deploy), and clones it to the deployment server in a directory named like /srv/mediawiki/php-–.

For example, if the English Wikipedia is running MediaWiki version 1.30.0-wmf.5, then "en.wikipedia.org/w/index.php" will effectively be mapped to /srv/mediawiki/php-1.30.0-wmf.5/index.php. For more about the "wikiversions" selector, see Heterogeneous deployment.
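
The mapping described above can be sketched as a simple lookup. This is only an illustration under assumptions: the `WIKIVERSIONS` dictionary and the `resolve_entry_point` function are hypothetical, and the real "wikiversions" selector may be keyed and stored differently.

```python
from pathlib import PurePosixPath

MEDIAWIKI_ROOT = PurePosixPath("/srv/mediawiki")

# Hypothetical mapping for illustration; the version is the one used in
# the example in the text.
WIKIVERSIONS = {"en.wikipedia.org": "1.30.0-wmf.5"}


def resolve_entry_point(hostname, script):
    """Map a hostname and entry point to the versioned file on disk."""
    version = WIKIVERSIONS.get(hostname)
    if version is None:
        raise LookupError(f"no MediaWiki version configured for {hostname}")
    return MEDIAWIKI_ROOT / f"php-{version}" / script


print(resolve_entry_point("en.wikipedia.org", "index.php"))
# /srv/mediawiki/php-1.30.0-wmf.5/index.php
```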

The train also creates a stub LocalSettings.php file in this php-… directory. This stub LocalSettings.php file does nothing other than include wmf-config/CommonSettings.php (also in the operations/mediawiki-config repo).

The CommonSettings.php file is responsible for configuring MediaWiki. This includes database configuration (which DB server to connect to, etc.), loading MediaWiki extensions and configuring them, and general site settings (name of the wiki, its logo, etc.).

After CommonSettings.php is done, MediaWiki handles the rest of the request and responds accordingly.

MediaWiki internals

To read more about how MediaWiki works in general, see:

  • Manual:Code on mediawiki.org, about entry points and the directory structure of MediaWiki.
  • Manual:Index.php on mediawiki.org, for what a typical MediaWiki entrypoint does.

Static files

There are broadly speaking two kinds of static assets served by Apache on MediaWiki application servers:

  1. /w/**/*
  2. /static/**/*

Application resources

  • Route: /w/**/*?1234567 or /w/**/*
    • Varnish: Strip cookies, fixed hostname.
    • Apache: Rewrite to /w/static.php (source).
  • Caching: public, 1 year, hostname-agnostic (Varnish object is shared across wiki domains).
  • Stats: Grafana: MediaWiki Static.

Versioned resources are the most common way we serve static files, and are generally how new code should use assets. These URLs are produced by MediaWiki's ResourceLoader or OutputPage component, and work by mapping the URL to a file on disk, hashing it, and appending that hash as a query string.
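
A minimal sketch of this idea follows. The hash algorithm, the truncation length, the asset path, and the function name are all assumptions for illustration, not a description of what ResourceLoader or OutputPage actually do.

```python
import hashlib
import tempfile
from pathlib import Path


def versioned_url(url_path, file_path, length=5):
    """Append a short content hash of the file on disk as a query string."""
    # SHA-256 and a 5-character prefix are assumptions for this sketch.
    digest = hashlib.sha256(Path(file_path).read_bytes()).hexdigest()
    return f"{url_path}?{digest[:length]}"


with tempfile.TemporaryDirectory() as tmp:
    icon = Path(tmp) / "icon.svg"
    icon.write_text("<svg/>")
    # Hypothetical asset path under /w/.
    first = versioned_url("/w/resources/assets/icon.svg", icon)
    icon.write_text("<svg><!-- changed --></svg>")
    second = versioned_url("/w/resources/assets/icon.svg", icon)

# The URL changes whenever the file content changes, so each URL can be
# cached as immutable.
print(first != second)  # True
```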

This offers the strongest performance (client-side immutable, and server-side wiki-agnostic shared cache), whilst also operating under relatively tight requirements (must know the exact version at the point where the file is requested). This is especially difficult in the face of ParserCache and our CDN, given that you'd generally want to present users with a consistent experience from page-to-page where an icon or other visual aspect does not alternate based on when the page was last modified. The way we generally make this work is by linking to assets through one level of indirection, e.g. through a stylesheet or JavaScript manifest (see also: mw:ResourceLoader/Architecture#Caching).

On the backend, the requests for versioned resources are rewritten to /w/static.php. This implements important behaviours:

  • If given a version hash, match the request with the right version of the file by checking the two currently active MW branches in production.
  • If given a version hash, and the requested version is not found, we disable caching (reduce to 1 minute for clients and CDN). This avoids non-recovering cache poisoning around deployments, which would otherwise be possible given that we do not atomically group end-users and CDN servers and backend servers. More background about this eventual-consistency can be found in the source, and in T47877.
  • Without a version hash, serve the current version as found in the latest MW branch, regardless of hostname. In this case, it is expected that it is not important for changes to propagate immediately. They will generally propagate slowly over a 24-hour period, with any individual client always having a consistent experience between pages until a specific point where the resource is renewed and then all pages have the new version.
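
The three behaviours above can be sketched as follows. This is an illustrative model, not the code of /w/static.php: the hash function, the 24-hour lifetime for unversioned requests, and the choice to fall back to the newest copy on a hash mismatch are assumptions, and all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass

ONE_YEAR = 365 * 24 * 60 * 60
ONE_DAY = 24 * 60 * 60  # assumption: the text only says changes propagate over ~24 hours
ONE_MINUTE = 60


@dataclass
class StaticResponse:
    body: bytes
    max_age: int  # cache lifetime in seconds for clients and the CDN


def short_hash(content):
    # Assumed hash scheme, for illustration only.
    return hashlib.sha256(content).hexdigest()[:5]


def serve_static(path, requested_hash, branches):
    """branches: list of {path: bytes} dicts for the active branches, newest first."""
    latest = next((files[path] for files in branches if path in files), None)
    if latest is None:
        raise FileNotFoundError(path)
    if requested_hash is None:
        # No version given: serve the current file from the latest branch.
        return StaticResponse(latest, ONE_DAY)
    for files in branches:
        content = files.get(path)
        if content is not None and short_hash(content) == requested_hash:
            # Exact version found in one of the active branches.
            return StaticResponse(content, ONE_YEAR)
    # Unknown version (e.g. mid-deployment): keep the cache lifetime short so
    # a mismatched response cannot stay cached for long.
    return StaticResponse(latest, ONE_MINUTE)


branches = [{"a.css": b"new"}, {"a.css": b"old"}]
print(serve_static("a.css", short_hash(b"old"), branches).body)  # b'old'
```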

Below are examples of use cases where we can't reasonably specify a version hash and thus request the current version without a hash parameter:

  • Gadgets and user scripts that augment core functionality and reuse some of our assets. For example, Wikipedia's Vector.css override references an SVG icon from MediaWiki. It isn't versioned as the editor would otherwise have to keep it in sync with our deployments.
  • Debug mode from ResourceLoader, where we intentionally serve internal JS and CSS files directly without minification at their "current" version. Cache performance is not a concern in debug mode.
  • A tail of random things in core and extensions that reference static files that aren't part of any UI code, such as Special:Version linking to the COPYING license file.
  • ULS web fonts (T135806). Until 2021, files like this were sometimes served from "/static/current/**", which was deprecated in favour of simply "/w/**" in T302465.

WMF resources

These are custom assets, generally pointed to from settings in wmf-config.

The most prominent examples are our project logos and favicons. We want to serve these from a stable URL that we can expose through APIs and to external organizations, and that can be saved in databases, ParserCache, the CDN, etc. These URLs present a consistent experience to any given user, regardless of when the page they are on was last edited or purged. Changes to "static" resources should be rare, as browsers are allowed to use their copy offline, without revalidation, for up to a year. As a result, purging a resource from the CDN does not guarantee that users get the latest copy.

The /static directory is external to MediaWiki and only used if and when explicitly configured so in wmf-config.

Timeouts

Request timeouts

Generally speaking, the app servers allow up to 60 seconds for most web requests (e.g. page views, HTTP GET), and up to 200 seconds for write actions (e.g. edits, HTTP POST).

» See HTTP timeouts#App server for a detailed breakdown of the various timeouts on app servers.

Backend timeouts

MySQL/MariaDB

Setting: MySQL's event_scheduler (core events master, core events replica)
  • Running queries from the web requests user on read-only replicas: 60s
  • Idle connections from the web requests user on read-only replicas: 60s
  • Web requests user on read-only replicas, on connection overload: 10s
  • Idle connections from the web requests user on the read-write master: 300s
Type: Wall clock time
Notes: This was added as a measure to prevent pileups from a single event, and to work around the behavior (considered not ideal) of terminated connections continuing to run even though no socket remains open to report to. It is implemented in MySQL's event scheduler for legacy reasons; using max_execution_time or an equivalent would probably be preferable.

Involved codebases

  • The "MediaWiki ecosystem" – MediaWiki core itself, 8 skins and 189 extensions, and production site configuration (and a static check-out of our composer dependencies). This set of code totals (as of October 2022) approximately 2M lines of Wikimedia-maintained PHP (alongside another 1.2M of JavaScript),[1] plus third party libraries like Symfony, jQuery, Guzzle, and Vue.
  • The "puppet" Wikimedia server orchestration and configuration. This set of code totals (as of November 2022) approximately 200K lines of Puppet/ERB (plus 55K each of YAML and Python, 35K of Ruby, and 10K of Bash and Perl).[2]

Footnotes

  1. Calculated using cloc v1.94 in a fresh check-out of production 1.40.0-wmf.10 with:
    cloc --not-match-d 'vendor|node_modules|lib' --fullpath --skip-uniqueness .
    cloc --skip-uniqueness vendor/wikimedia vendor/oojs vendor/wmde vendor/diff
    cloc --skip-uniqueness resources/lib/oo* resources/lib/codex* resources/lib/CLDRPluralRuleParser resources/lib/wvui
    
  2. Calculated using cloc v1.94 in a check-out of the production branch on 2022-11-21 with:
    cloc .
    
