MediaWiki at WMF


MediaWiki is the collaborative editing software that runs Wikipedia. This page documents its deployment at the Wikimedia Foundation.

Infrastructure

Wikipedia request flow

A Wikipedia web request is processed in a series of steps outlined here (as of April 2020).

  • DNS resolves hostnames like en.wikipedia.org, which ultimately point to an address like text-lb.*.wikimedia.org. The IP addresses for that name are service IPs handled by LVS, which acts as a direct-routing load balancer in front of our caching proxies.
    » See also DNS, Global traffic routing, and LVS.
  • The Wikimedia Foundation operates its own content delivery network. The public load balancers and caching proxies are located in all data centres, including those whose sole role is to be an edge cache (also known as a "PoP").
    » See also Clusters and PoPs.
  • The caching proxies are servers consisting of three layers: TLS termination, frontend caching, backend caching. Each cache proxy server hosts all three of these layers.
    » See also Caching overview.
    • TLS termination and HTTP/2 handling: This is handled by Apache Traffic Server (ATS), internally called ats-tls. Prior to 2020, we used Nginx here.
    • Frontend caching: This is an in-memory HTTP cache (uses Varnish, called "Varnish frontend", or varnish-fe). The LVS load balancers route the request to a random cache proxy server to maximise the amount of parallel traffic we can handle. Each frontend cache server likely holds the same set of responses in its cache, so the logical capacity of the frontend cache is equal to one server's RAM.
    • Backend caching: The backend HTTP caches are routed to by frontend caches in case of a cache miss. Contrary to the frontends, these are routed by a consistent hash, and they also persist their cache on disk (instead of in memory). The backend caches scale horizontally and have a logical capacity equal to the total of all servers. In case of a surge in traffic to a particular page, the frontends should each get a copy and distribute from there. Because of consistent hashing, the same backend cache is always consulted for the same URL. We use request coalescing to avoid multiple requests for the same URL hitting the same backend server. For the backend cache, we use a second layer of ATS (ats-be). Prior to 2020, WMF used a second layer of Varnish (varnish-be) for backend caching.
  • After the cache proxies we arrive at the application servers (that is, if the request was not fulfilled by a cache). The application servers are load-balanced via LVS. Connections between backend caches and app servers are encrypted with TLS, which is terminated locally on the app server by a local Envoy instance. Envoy, in turn, hands the request off to the local Apache. Prior to mid-2020, Nginx was used for TLS termination. Apache is in charge of handling redirects, rewrite rules, and determining the document root. It then uses php-fpm to invoke the MediaWiki software on the app servers. The application servers and all other backend services (such as Memcached and MariaDB) are located in "Core services" data centers, currently Eqiad and Codfw.
    » See also Application servers for more about how Apache, PHP7 and php-fpm are configured.
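
The difference between the two cache layers can be illustrated with a minimal Python sketch. It is not the algorithm Varnish or ATS actually use; the class name, the server names, the choice of SHA-256, and the number of virtual nodes are all assumptions made for illustration. The frontend is picked at random, while the backend is picked by a consistent hash of the URL, so the same URL always lands on the same backend.

```python
import hashlib
import random
from bisect import bisect_right


class ConsistentHashRing:
    """Illustrative hash ring: maps each URL to exactly one backend cache."""

    def __init__(self, servers, replicas=100):
        # Place several virtual nodes per server on the ring so that load
        # spreads evenly (the replica count is an arbitrary assumption).
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def pick(self, url):
        # The first virtual node clockwise from the URL's position owns it.
        index = bisect_right(self._points, self._hash(url)) % len(self._ring)
        return self._ring[index][1]


def pick_frontend(servers, rng=random):
    """Frontends are chosen at random, so any of them may see any URL."""
    return rng.choice(servers)


if __name__ == "__main__":
    backends = ["backend-1", "backend-2", "backend-3"]  # hypothetical names
    ring = ConsistentHashRing(backends)
    print(pick_frontend(["frontend-1", "frontend-2"]))
    print(ring.pick("/wiki/Main_Page"))
```

A useful property of this design is that removing one backend only remaps the URLs that backend owned; every other URL keeps its existing backend and its cached copy.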

App servers

See Application servers for more about how Apache and php-fpm are configured.

The application servers are divided into the following groups:

Description       | Conftool cluster | Hiera cluster | Purpose
------------------|------------------|---------------|--------
Main app servers  | appserver        | appserver     | Public HTTP from ATS for wiki domains (except X-Wikimedia-Debug, /w/api.php, or /api/rest_v1).
Debug servers     | testserver       | appserver     | Public HTTP from ATS for wiki domains with X-Wikimedia-Debug.
API app servers   | api_appserver    | api_appserver | Public HTTP from ATS for wiki domains with /w/api.php.
Parsoid servers   | parsoid          | parsoid       | Internal HTTP to parsoid-php.discovery.wmnet. Used by RESTBase via /w/rest.php.
Jobrunners        | jobrunner        | jobrunner     | Internal HTTP to jobrunner.discovery.wmnet. Used by ChangeProp-JobQueue via /rpc or /w/rest.php.
Videoscalers      | videoscaler      | jobrunner     | Internal HTTP to videoscaler.discovery.wmnet. Used by ChangeProp-JobQueue via /rpc or /w/rest.php.
Maintenance hosts | (none)           | misc          | Internal. Used for scheduled and ad-hoc maintenance scripts run from the command-line.
Snapshot hosts    | (none)           | dumps         | Internal. Used for scheduled work from the command-line relating to XML dumps.

For web requests using Apache, the "Hiera cluster" value is also exposed as $_SERVER['SERVERGROUP'] to PHP.

In Grafana dashboards, Prometheus metrics, and Icinga alerts, the cluster field usually refers to the "Hiera cluster" value as well.
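
As a rough illustration of how the public groups differ, the Python sketch below picks a Conftool cluster from a request's path and headers. This is not the actual CDN routing configuration: the function name, the order of precedence between the debug header and /w/api.php, and the case-insensitive header lookup are assumptions, and only the three publicly routed groups are modelled.

```python
# Conftool cluster -> Hiera cluster, for the three public groups only.
HIERA_CLUSTER = {
    "appserver": "appserver",
    "testserver": "appserver",
    "api_appserver": "api_appserver",
}


def pick_server_group(path, headers):
    """Return the Conftool cluster for a public request, or None.

    Assumption: the debug header is checked first, then the API path.
    """
    header_names = {name.lower() for name in headers}
    if "x-wikimedia-debug" in header_names:
        return "testserver"
    if path.startswith("/w/api.php"):
        return "api_appserver"
    if path.startswith("/api/rest_v1"):
        # Excluded from the main app servers; where it goes instead is not
        # modelled in this sketch.
        return None
    return "appserver"


if __name__ == "__main__":
    group = pick_server_group("/w/api.php", {})
    # The Hiera cluster is the value PHP sees as $_SERVER['SERVERGROUP'].
    print(group, HIERA_CLUSTER[group])
```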

MediaWiki configuration

For web requests not served by the cache, the request eventually arrives on an app server where Apache invokes PHP via php-fpm.

Document root

Example request: https://en.wikipedia.org/w/index.php

The document root for a wiki domain like "en.wikipedia.org" is /srv/mediawiki/docroot/wikipedia.org (source).

The /srv/mediawiki directory on app servers comes from the operations/mediawiki-config.git repository, which is cloned on the Deployment server, and then rsync'ed to the app servers by Scap.

The docroot/wikipedia.org directory is mostly empty, except for w/, which is symlinked to a wiki-agnostic directory that looks like a MediaWiki install (in that it has files like "index.php", "api.php", and "load.php"), but actually contains small stubs that invoke "Multiversion".

Multiversion

Multiversion is a WMF-specific script (maintained in the operations/mediawiki-config repo) that inspects the hostname of the web request (e.g. "en.wikipedia.org"), and finds the appropriate MediaWiki installation for that hostname. The weekly Deployment train creates a fresh branch from the latest master of MediaWiki (including any extensions we deploy), and clones it to the deployment server in a directory named like /srv/mediawiki/php-….

For example, if the English Wikipedia is running MediaWiki version 1.30.0-wmf.5, then "en.wikipedia.org/w/index.php" will effectively be mapped to /srv/mediawiki/php-1.30.0-wmf.5/index.php. For more about the "wikiversions" selector, see Heterogeneous deployment.
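
The mapping in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not the Multiversion code itself (which is PHP): the WIKIVERSIONS table keyed by hostname, the function name, and the error handling are hypothetical.

```python
# Hypothetical lookup table: hostname -> deployed MediaWiki version.
WIKIVERSIONS = {
    "en.wikipedia.org": "1.30.0-wmf.5",
}


def resolve_entry_point(hostname, request_path, wikiversions=WIKIVERSIONS):
    """Map a request such as /w/index.php to the versioned file on disk."""
    prefix = "/w/"
    if not request_path.startswith(prefix):
        raise ValueError(f"not a MediaWiki entry point: {request_path}")
    version = wikiversions[hostname]  # KeyError for an unknown wiki
    script = request_path[len(prefix):]
    return f"/srv/mediawiki/php-{version}/{script}"


if __name__ == "__main__":
    print(resolve_entry_point("en.wikipedia.org", "/w/index.php"))
```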

The train also creates a stub LocalSettings.php file in this php-… directory. This stub LocalSettings.php file does nothing other than include wmf-config/CommonSettings.php (also in the operations/mediawiki-config repo).

The CommonSettings.php file is responsible for configuring MediaWiki. This includes database configuration (which DB server to connect to, etc.), loading MW extensions and configuring them, and general site settings (name of the wiki, its logo, etc.).

After CommonSettings.php is done, MediaWiki handles the rest of the request and responds accordingly.

MediaWiki internals

To read more about how MediaWiki works in general, see:

  • Manual:Code on mediawiki.org, about entry points and the directory structure of MediaWiki.
  • Manual:Index.php on mediawiki.org, for what a typical MediaWiki entrypoint does.

Static files

There are, broadly speaking, three ways for Apache to serve a static asset from the MediaWiki application:

  1. /w/**/*?1234567
  2. /w/**/*
  3. /static/**/*

Versioned resources

  • Route: /w/**/*?1234567
    • Varnish: Strip cookies, strip/replace hostname.
    • Apache: Rewrite to /w/static.php (source).
  • Caching: public, immutable (or 1 year), hostname-agnostic (Varnish object is shared across wiki domains).
  • Stats: Grafana: MediaWiki Static.

Versioned resources are the most common way we serve static files, and are generally what any new code should use. These URLs are produced by MediaWiki's ResourceLoader or OutputPage component, and work by mapping the URL to a file on disk, hashing it, and appending the hash as a query parameter.
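
A minimal Python sketch of this idea follows. It is not MediaWiki's implementation: the function name, the use of SHA-256, and the eight-character truncation are assumptions, and the file content is passed in as bytes rather than read from disk to keep the example self-contained.

```python
import hashlib


def versioned_url(url_path, content, hash_length=8):
    """Append a short content hash to the URL as its query string.

    The URL changes whenever the file content changes, which is what makes
    it safe to cache the response as immutable.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{url_path}?{digest[:hash_length]}"


if __name__ == "__main__":
    # Hypothetical asset path and content.
    print(versioned_url("/w/resources/assets/example.svg", b"<svg></svg>"))
```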

This offers the strongest performance (immutable and wiki-agnostic URLs), whilst also operating under relatively tight requirements (must know the exact version at the point where the file is requested). This is especially difficult in the face of ParserCache and our CDN, given that you'd generally want to present users with a consistent experience from page to page, where an icon or other visual aspect does not alternate based on when the page was last modified. The way we make this work is by ensuring frontend assets are served with at least one level of indirection, e.g. through a stylesheet or JavaScript module (see also: mw:ResourceLoader/Architecture#Caching).

On the backend, the requests for versioned resources are proxied through /w/static.php. This implements two important behaviours:

  • Match the request with the right version of the file (given that we have multiple MW branches in production at any given time).
  • Avoid caching (by clients or CDN) if the requested version is not yet found. This avoids non-recovering cache poisoning around deployments, which would otherwise be possible given that we do not atomically group end-users and CDN servers and backend servers. More background about this eventual-consistency can be found in the source, and in T47877.
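
The two behaviours above can be sketched in Python as follows. This is a simplified illustration and not the logic of /w/static.php itself: the data structure, the function name, the exact Cache-Control values, and the choice to serve another branch's copy uncached when the requested version is missing are all assumptions.

```python
LONG_CACHE = "public, max-age=31536000, immutable"
NO_CACHE = "private, no-cache"


def serve_versioned(path, requested_hash, branches):
    """Return (status, headers, body) for a versioned static request.

    `branches` maps a branch name to {path: (content_hash, body)}; this
    layout is hypothetical and stands in for the MW branches on disk.
    """
    fallback = None
    for files in branches.values():
        if path not in files:
            continue
        content_hash, body = files[path]
        if content_hash == requested_hash:
            # Exact version found: safe to cache for a long time.
            return 200, {"Cache-Control": LONG_CACHE}, body
        if fallback is None:
            fallback = body
    if fallback is None:
        return 404, {"Cache-Control": NO_CACHE}, b""
    # The file exists but not at the requested version (e.g. mid-deployment):
    # respond without caching so a stale copy cannot get stuck in the CDN.
    return 200, {"Cache-Control": NO_CACHE}, fallback


if __name__ == "__main__":
    branches = {
        "branch-old": {"/w/skin.css": ("aaa11", b"old")},  # hypothetical
        "branch-new": {"/w/skin.css": ("bbb22", b"new")},  # hypothetical
    }
    print(serve_versioned("/w/skin.css", "bbb22", branches))
```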

Unversioned resources

  • Route: /w/**/*
    • Varnish: Strip cookies, keep hostname.
    • Apache: Rewrite to /w/static.php (source).
  • Caching: public, 24 hours.
  • Stats: Grafana: MediaWiki Static.

This exists for less visible use cases, and offers weaker guarantees and weaker client-side performance. It must not be used in prominent places, especially not directly in MediaWiki backend code or in wmf-config.

These are currently best-effort cached with a relatively short expiry. There is some amount of caching, but not much, as we can't predict when the file will change. Because the cache lifetime is longer than a few hours, though, the response may effectively come from any MW branch: a newer branch (in case of rollback, or around deployment races), or an older branch (when simply cached).

Example usage includes:

  • Gadgets and user scripts that augment core functionality and re-purpose some of our assets. For example, Wikipedia's Vector.css override references an svg icon from MediaWiki. It isn't versioned as the editor would otherwise have to keep it in sync with our deployments.
  • Debug mode from ResourceLoader, where we intentionally serve internal JS and CSS files directly without minification at their "current" version. Cache performance is not a concern in debug mode.
  • A tail of random things in core and extensions that reference static files that aren't part of any UI code, such as Special:Version linking to the COPYING file.

Static resources

  • Route: /static/**/*
    • Varnish: Strip cookies, strip/replace hostname.
    • Apache: Served directly without involving PHP code.
  • Caching: public, immutable (or 1 year), hostname-agnostic (Varnish object is shared across wikis).
  • Stats: (no stats).

The "static" resources perform as well as versioned resources. The main difference is that this one works without any version parameter. It exists specifically for cases where we have to refer to a file from a URL that we can't reliably or immediately propagate changes to.

The most prominent examples are our project logos and favicons. We want to serve these from a stable URL that we can expose through APIs and to external organizations, and that can be saved in databases, ParserCache, the CDN, etc. These URLs present a consistent experience to any given user, regardless of when the page they are on was last edited or purged. Changes to "static" resources should be rare, as browsers are allowed to use their copy offline, without revalidation, for up to a year. This means that purging from the CDN does not guarantee that users get the latest copy.

The /static directory is external to MediaWiki and only used if and when explicitly configured in wmf-config. Remember that it does not consider the wiki's multiversion assignment, so it may serve a version that is a week ahead of or a week behind the wiki's MW branch and PHP code.

We use /static/current/ when an extension or something in wmf-config can't version its URL or can't use ResourceLoader, but needs to serve assets with strong client-side caching (and thus isn't concerned about propagating updates immediately). Examples of current usage:

  • Footer "Powered by" icons. These should not have to be redownloaded every day by browsers. The files exist in core, but we configure them in wmf-config to be served from /static/current for better caching performance (change 295184).

Past examples of /static/current usage:

Timeouts

Request timeouts

Generally speaking, the app servers allow up to 60 seconds for most web requests (e.g. page views, HTTP GET), and up to 200 seconds for write actions (e.g. edits, HTTP POST).

» See HTTP timeouts#App server for a detailed breakdown of the various timeouts on app servers.

Backend timeouts

MySQL/MariaDB
Setting: MySQL's event_scheduler (core events master, core events replica)
  • Web requests user, running queries on read-only replicas: 60s
  • Web requests user, idle connections on read-only replicas: 60s
  • Web requests user, read-only replicas under connection overload: 10s
  • Web requests user, idle connections on the read-write master: 300s
Type: Wall clock time
Notes: This was added as a measure to prevent pileups from a single event, as well as to work around the non-ideal behavior where queries from terminated connections keep running even though no socket remains open to report to. It is implemented with MySQL's event scheduler for legacy reasons; using max_execution_time or an equivalent would probably be preferable.

See also