MediaWiki at WMF


MediaWiki is the collaborative editing software that runs Wikipedia.

Infrastructure

Wikipedia request flow

A Wikipedia web request is processed in a series of steps outlined here (as of December 2019).

  • DNS resolves hostnames like en.wikipedia.org to an address like text-lb.*.wikimedia.org. These IP addresses are service IPs handled by LVS, which acts as a direct-routing load balancer in front of our caching proxies.
    » See also DNS, Global traffic routing, and LVS.
  • The Wikimedia Foundation owns and operates its own content-delivery network. The public load balancers and caching proxies are located in all data centres, including those whose sole role is to act as an edge cache (also known as a "PoP").
    » See also Clusters and PoPs.
  • The caching proxies consist of three layers: TLS termination, frontend caching, and backend caching. Each cache proxy server hosts all three layers.
    » See also Caching overview.
    • TLS termination: TLS and HTTP/2 handling, done by ATS (internally called ats-tls). Prior to 2020, we used Nginx- here.
    • Frontend caching: an in-memory HTTP cache (using Varnish, called "Varnish frontend", or varnish-fe). The LVS load balancers route each request to a random cache proxy server to maximise the amount of parallel traffic we can handle. Each frontend cache server therefore tends to hold the same set of responses in its cache, so the logical capacity of the frontend cache is equal to one server's RAM.
    • Backend caching: the backend HTTP caches are consulted by the frontend caches in case of a cache miss. Contrary to the frontends, requests are routed to them by a consistent hash of the URL (see the sketch after this list), and they persist their cache on disk instead of in memory. The backend caches scale horizontally and have a logical capacity equal to the combined capacity of all servers. In case of a surge in traffic to a particular page, the frontends should each obtain a copy and distribute it from there. Because of consistent hashing, the same backend cache is always consulted for the same URL, and we use request coalescing to avoid many concurrent requests for the same URL all hitting that backend server. For the backend cache, we use a second layer of ATS (ats-be). Prior to 2020, WMF used a second layer of Varnish (varnish-be) for backend caching.
  • If the request was not fulfilled by a cache, it arrives at the application servers, which are load-balanced via LVS. Connections between backend caches and app servers are encrypted with TLS, which is terminated locally on the app server by a simple Nginx- install. Nginx then hands the request off to the local Apache, which handles redirects and rewrite rules and determines the document root. Apache then uses php-fpm to invoke the MediaWiki software on the app server. The application servers and all other backend services (such as Memcached and MariaDB) are located in "Core services" data centres, currently Eqiad and Codfw.
    » See also Application servers for more about how Apache, PHP7 and php-fpm are configured.
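
To make the consistent hashing used by the backend caches concrete, here is a small PHP sketch. It is illustrative only: the host names, the number of points per host, and the hash function are simplifications, not the actual ATS configuration.

<?php
// Illustrative consistent-hash ring: each backend cache host is placed on the
// ring at several points, and a URL is routed to the first host found at or
// after the URL's own position. Host names below are hypothetical.

function ringPosition( string $key ): int {
    // First 8 hex digits of md5, interpreted as an unsigned 32-bit position.
    return (int)hexdec( substr( md5( $key ), 0, 8 ) );
}

function buildRing( array $hosts, int $pointsPerHost = 64 ): array {
    $ring = [];
    foreach ( $hosts as $host ) {
        for ( $i = 0; $i < $pointsPerHost; $i++ ) {
            $ring[ ringPosition( "$host#$i" ) ] = $host;
        }
    }
    ksort( $ring );
    return $ring;
}

function pickBackend( array $ring, string $url ): string {
    $target = ringPosition( $url );
    foreach ( $ring as $position => $host ) {
        if ( $position >= $target ) {
            return $host;
        }
    }
    return reset( $ring ); // wrap around to the first point on the ring
}

$ring = buildRing( [ 'cp-backend-a', 'cp-backend-b', 'cp-backend-c' ] );
// The same URL always maps to the same backend cache:
echo pickBackend( $ring, 'https://en.wikipedia.org/wiki/MediaWiki' ), "\n";
// A different URL may map to a different backend:
echo pickBackend( $ring, 'https://en.wikipedia.org/wiki/Varnish' ), "\n";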

MediaWiki configuration

See Application servers for more about how Apache and php-fpm are configured.

For web requests not served by the cache, the request eventually arrives on an app server where Apache invokes PHP via php-fpm.

Document root

Example request: https://en.wikipedia.org/w/index.php

The document root for a wiki domain like "en.wikipedia.org" is /srv/mediawiki/docroot/wikipedia.org (source).

The /srv/mediawiki directory on app servers comes from the operations/mediawiki-config.git repository, which is cloned on the Deployment server and then rsync'ed to the app servers by Scap.

The docroot/wikipedia.org directory is mostly empty, except for w/, which is symlinked to a wiki-agnostic directory that looks like a MediaWiki install (in that it has files like "index.php", "api.php", and "load.php"), but actually contains small stubs that invoke "Multiversion".
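
For illustration, such a stub is only a couple of lines; it looks roughly like the following (a simplified sketch, not the verbatim file, and the require path is illustrative).

<?php
// Simplified sketch of docroot/wikipedia.org/w/index.php in
// operations/mediawiki-config: load the Multiversion machinery, then hand off
// to the index.php of whichever MediaWiki version serves the requested wiki.
require_once '/srv/mediawiki/multiversion/MWMultiVersion.php';
require MWMultiVersion::getMediaWiki( 'index.php' );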

Multiversion

Multiversion is a WMF-specific script (maintained in the operations/mediawiki-config repo) that inspects the hostname of the web request (e.g. "en.wikipedia.org") and finds the appropriate MediaWiki installation for that hostname. The weekly Deployment train creates a fresh branch from the latest master of MediaWiki (including any extensions we deploy), and clones it to the deployment server in a directory named like /srv/mediawiki/php-….

For example, if the English Wikipedia is running MediaWiki version 1.30.0-wmf.5, then "en.wikipedia.org/w/index.php" will effectively be mapped to /srv/mediawiki/php-1.30.0-wmf.5/index.php. For more about the "wikiversions" selector, see Heterogeneous deployment.
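
Conceptually, the selection works roughly like the sketch below. This is not the actual Multiversion code, and the mappings shown are illustrative rather than real production data.

<?php
// Conceptual sketch of the hostname -> MediaWiki version lookup done by
// Multiversion. Both maps below are illustrative subsets, not production data.

function hostToDbName( string $host ): string {
    $map = [
        'en.wikipedia.org' => 'enwiki',
        'de.wikipedia.org' => 'dewiki',
        'www.wikidata.org' => 'wikidatawiki',
    ];
    return $map[ $host ];
}

function dbNameToVersionDir( string $dbName ): string {
    // In production this mapping lives in the wikiversions data maintained by
    // the deployment train (see Heterogeneous deployment).
    $wikiversions = [
        'enwiki'       => 'php-1.30.0-wmf.5',
        'dewiki'       => 'php-1.30.0-wmf.5',
        'wikidatawiki' => 'php-1.30.0-wmf.4',
    ];
    return '/srv/mediawiki/' . $wikiversions[ $dbName ];
}

$entryPoint = dbNameToVersionDir( hostToDbName( 'en.wikipedia.org' ) ) . '/index.php';
echo $entryPoint, "\n"; // /srv/mediawiki/php-1.30.0-wmf.5/index.php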

The train also creates a stub LocalSettings.php file in this php-… directory. This stub LocalSettings.php file does nothing other than include wmf-config/CommonSettings.php (also in the operations/mediawiki-config repo).
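
In other words, the generated stub amounts to little more than the following (a sketch; the exact path and contents differ).

<?php
// Sketch of the stub LocalSettings.php inside a /srv/mediawiki/php-… branch
// directory: it simply defers to the shared wmf-config/CommonSettings.php.
require_once '/srv/mediawiki/wmf-config/CommonSettings.php';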

The CommonSettings.php file is responsible for configuring MediaWiki. This includes database configuration (which DB server to connect to, etc.), loading MediaWiki extensions and configuring them, and general site settings (the name of the wiki, its logo, etc.).
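
To give a flavour of the kind of settings involved, the snippet below shows standard MediaWiki configuration of the same sort. The values are invented for illustration and are not the production configuration; like CommonSettings.php itself, it is a fragment meant to be included by MediaWiki, not run on its own.

<?php
// Illustrative examples of the kinds of settings CommonSettings.php handles.
// Values are invented; this is not the production configuration.

// Database configuration: which server MediaWiki should connect to.
$wgDBtype = 'mysql';
$wgDBserver = 'db-primary.example.internal';
$wgDBname = 'enwiki';

// Loading and configuring extensions.
wfLoadExtension( 'Cite' );
wfLoadExtension( 'ParserFunctions' );

// General site settings: the wiki's name and its logo.
$wgSitename = 'Wikipedia';
$wgLogo = '/static/images/project-logos/enwiki.png';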

After CommonSettings.php is done, MediaWiki handles the rest of the request and responds accordingly.

MediaWiki internals

To read more about how MediaWiki works in general, see:

  • Manual:Code on mediawiki.org, about entry points and the directory structure of MediaWiki.
  • Manual:Index.php on mediawiki.org, for what a typical MediaWiki entrypoint does.

Timeouts

In a nutshell:

  • Web requests generally get 1 minute (e.g. page views, HTTP GET),
    • ... but write actions get up to 3 minutes (e.g. edits, HTTP POST).
  • Jobs generally get 20 minutes,
    • ... but video transcoding jobs get up to 24 hours.

Caveats:

  • Higher layers are configured at the maximum of all lower layers. This means that while Nginx's 180 s timeout is aligned with MediaWiki's timeout for POST requests, it is quite far above MediaWiki's timeout for GET requests.
    • ... except for CPU limits on job runners, which are far below the overall timeout for video scaling (20 min vs 24 h). This is a compromise to prevent regular jobs from spending 24 hours on the CPU in their main PHP code, which would be very unexpected. Videoscaling jobs are expected to spend most of their time transcoding videos, which happens in separate functions and subprocesses.

Execution timeouts on MediaWiki app servers (as of 23 March 2020):

Nginx
  Setting: proxy_read_timeout
    • 1200 seconds (20 min) for jobrunners
    • 86400 seconds (24 h) for videoscalers
    • 180 seconds otherwise
  Type: Wall-clock time
  Notes: This only refers to the time to first byte in the response.

Apache
  Setting: Timeout
    • 1202 seconds (20 min + 2 s) on jobrunners
    • 86402 seconds (24 h + 2 s) on videoscalers
    • 202 seconds otherwise
  Type: Wall-clock time
  Notes: This is a timeout for the entirety of the request, including connection time.

php-fpm
  Setting: request_terminate_timeout
    • jobrunner (including videoscaler): 86400 s (24 h)
    • default (appserver, api_appserver, parsoid): 201 s
  Type: Wall-clock time
  Notes: This is the maximum time php-fpm will spend processing a request before terminating the worker process. It is set in /etc/php/7.2/fpm/pool.d/www.conf and is controlled by the Puppet variable profile::mediawiki::php::request_timeout, which can be set in Hiera.

  Setting: max_execution_time
    • jobrunner (including videoscaler): 1200 s (20 min)
    • default (appserver, api_appserver, parsoid): 180 s
  Type: CPU time (not including syscalls and C functions from extensions)
  Notes: This is controlled by the max_execution_time setting in php.ini. Managed in Puppet/Hiera as part of profile::mediawiki::php::fpm_config (mediawiki/jobrunner.yaml, php/init.pp).

MediaWiki
  Setting: ExcimerTimer
    • videoscaler: 86400 s (24 h)
    • jobrunner: 1200 s (20 min)
    • default (POST): 200 s
    • default (GET/others): 60 s
  Type: Wall-clock time
  Notes: This is controlled by the ExcimerTimer interval value, in wmf-config/set-time-limit.php.

Upon reaching the timeout, php-excimer throws a WMFTimeoutException once the current syscall returns.
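
The mechanism looks roughly like the sketch below, based on the Excimer PHP extension. It is a simplified illustration in the spirit of wmf-config/set-time-limit.php, not the verbatim production code; the exception class definition and the limit values here are illustrative.

<?php
// Simplified sketch of an Excimer-based request time limit (requires the
// php-excimer extension). Not the verbatim wmf-config/set-time-limit.php.

class WMFTimeoutException extends RuntimeException {
}

function setRequestTimeLimit( float $seconds ): ExcimerTimer {
    $timer = new ExcimerTimer();
    $timer->setInterval( $seconds ); // one-shot: fire after $seconds of wall-clock time
    $timer->setCallback( static function () use ( $seconds ) {
        // Thrown once the current syscall returns, as described above.
        throw new WMFTimeoutException(
            "The maximum execution time of {$seconds} seconds was exceeded"
        );
    } );
    $timer->start();
    return $timer; // keep a reference so the timer is not destroyed early
}

// Illustrative limits matching the table above: 200 s for POST, 60 s otherwise.
$timer = setRequestTimeLimit( ( $_SERVER['REQUEST_METHOD'] ?? 'GET' ) === 'POST' ? 200 : 60 );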
