MediaWiki On Kubernetes/How it works

From Wikitech

What is in a MediaWiki kubernetes pod

A MediaWiki kubernetes pod contains the following containers:

mediawiki-httpd
The Apache web server, with all the Apache configuration, the static files, and a basic web configuration
mediawiki-app
The PHP-FPM application server
mediawiki-mcrouter
mcrouter, the memcached proxy from Facebook
tls-proxy
The service mesh sidecar, based on Envoy
rsyslog
An rsyslog instance that processes and ships several of the pod's log streams
prometheus exporters
Exporters for metrics from Apache, php-fpm, and mcrouter

mediawiki-httpd (apache)

Resource utilization
Resource   Requests   Limits
CPU        200m       500m
Memory     200Mi      400Mi

The docker image layering

The image for the apache web server is called restricted/mediawiki-webserver and is only downloadable in production with the correct credentials, as it contains potentially sensitive data.

This image is built on the deployment hosts when scap backport runs, on top of the docker-registry.wikimedia.org/mediawiki-httpd image, which is in turn based on our docker-registry.wikimedia.org/httpd-fcgi image.

Each layer of this build provides part of the apache configuration.

Specifically, from the base image up:

  • httpd installs Apache and sets up the basic environment variables, the user, and the basic directories.
  • httpd-fcgi configures a standard httpd server that funnels all requests to a backend FastCGI (fcgi) application server. It defines a few environment variables that we also use in MediaWiki on Kubernetes (see the code for details). It also adds the configuration such an application server needs: logging, header forwarding, opening the admin port, and two virtual hosts (a default one that handles fcgi requests, and a monitoring site that serves metrics coming from PHP internals such as APCu and opcache).
  • mediawiki-httpd adds the basic Apache configuration for MediaWiki: essentially everything you'd find on a normal appserver, minus the virtual hosts. Specifically:
    • The main apache2.conf file, identical to the one on our main appservers
    • Configuration for a few modules, identical to that on our main appservers
    • Virtual hosts that override the default one from the base image, plus one that handles nonexistent domains
  • restricted/mediawiki-webserver adds the MediaWiki tree under /srv/mediawiki, including all the endpoints and static assets we need to serve from the httpd container.

In the chart

We control the following environment variables:

  • SERVER_NAME = <pod-name> - the value of the Server: header
  • LOG_FORMAT = ecs_rsyslog - sends access logs to the local rsyslog via UDP
  • APACHE_RUN_PORT = .php.httpd.port - the main port Apache listens on
  • SERVERGROUP = .php.servergroup - the SERVERGROUP variable we reference in mediawiki-config. It's used to choose various parameters and to tag metrics with the originating cluster.
  • FCGI_MODE = .php.fcgi_mode - whether to talk to php-fpm via a unix socket (see below) or a TCP port on localhost
  • LOG_SKIP_SYSTEM = 1 - suppresses access logging for monitoring calls from Prometheus and Kubernetes
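As a rough illustration of how chart values map onto these variables, here is a minimal Python sketch. It is not the chart's Helm template: the helper name httpd_env is hypothetical, and the values and pod name in the usage example are invented.

```python
from __future__ import annotations


def httpd_env(values: dict, pod_name: str) -> dict[str, str]:
    """Hypothetical helper: build the httpd container environment.

    Mirrors the variable list above; it is a sketch, not the Helm template.
    """
    php = values["php"]
    return {
        "SERVER_NAME": pod_name,             # value of the Server: header
        "LOG_FORMAT": "ecs_rsyslog",         # access logs to in-pod rsyslog via UDP
        "APACHE_RUN_PORT": str(php["httpd"]["port"]),
        "SERVERGROUP": php["servergroup"],   # also tags metrics with the cluster
        "FCGI_MODE": php["fcgi_mode"],       # unix socket vs TCP on localhost
        "LOG_SKIP_SYSTEM": "1",              # no access logs for probes/scrapes
    }


if __name__ == "__main__":
    # Invented values, for illustration only.
    example = {"php": {"httpd": {"port": 8080},
                       "servergroup": "example-group",
                       "fcgi_mode": "FCGI_UNIX"}}
    for name, value in httpd_env(example, "mw-example-pod").items():
        print(f"{name}={value}")
```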

The liveness probe just checks that the TCP port (set in php.httpd.port) is open and accepting connections, while the readiness probe fetches /healthz from the metrics port. This way a pod is only ready when Apache is up and php-fpm has free worker slots.
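The two probe semantics can be approximated in Python as follows. This is a sketch of what a Kubernetes tcpSocket check and an httpGet check test for, not the chart's probe definitions; the function names are hypothetical and host/port are placeholders. Kubernetes counts any HTTP status from 200 to 399 as a successful httpGet probe, which the readiness function mirrors.

```python
import http.client
import socket
import urllib.request


def liveness(host: str, port: int, timeout: float = 1.0) -> bool:
    """tcpSocket-style check: pass if something accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def readiness(host: str, metrics_port: int, timeout: float = 1.0) -> bool:
    """httpGet-style check: pass if GET /healthz returns a 2xx/3xx status.

    urlopen raises HTTPError (a URLError, itself an OSError) for 4xx/5xx,
    so those responses, refused connections and timeouts all return False.
    """
    url = f"http://{host}:{metrics_port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        return False
```

Separating the two checks is what lets a pod whose Apache is up, but whose php-fpm has no free workers, stay alive without receiving traffic.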

We mount the following volumes in the container:

  • /run/shared is an emptyDir shared by this container and the php-fpm container, used to communicate via a unix socket when FCGI_MODE is set to FCGI_UNIX.
  • We bind-mount the website definitions from the value mw.sites under /etc/apache2/sites-enabled. These virtual hosts are generated from templates and a declarative structure describing the wikis. For standard production installations, this value is included from the /etc/helmfile-defaults/mediawiki/httpd.yaml file, which puppet generates directly via the profile::kubernetes::deployment_server::mediawiki::config class.

In addition, two optional volumes provide a quick-and-dirty way to modify the behaviour of the pod:

  • /srv/mediawiki/w/debug is mounted if the value debug.php.enabled is true, and contains debug endpoints the user wants to inject into the deployment.
  • /etc/apache2/conf-enabled/00-aaa.conf is mounted if the value mw.httpd.additional_config is present, with that value as its content, providing Apache configuration that is evaluated before everything else.
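The "evaluated before everything else" part relies on ordering: when Apache expands a wildcard Include or IncludeOptional, it processes the matched files in alphabetical order. Assuming conf-enabled is pulled in with a *.conf wildcard, as in the stock Debian apache2.conf, this Python sketch shows why 00-aaa.conf sorts first. The helper name and the other file names are hypothetical examples.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def include_order(filenames: list[str], pattern: str = "*.conf") -> list[str]:
    """Hypothetical helper: order in which a wildcard Include reads files.

    Apache sorts wildcard matches alphabetically; plain string sorting
    reproduces that for ASCII file names like these.
    """
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


if __name__ == "__main__":
    # Example names only; the real conf-enabled contents may differ.
    files = ["security.conf", "charset.conf", "00-aaa.conf", "README"]
    print(include_order(files))  # ['00-aaa.conf', 'charset.conf', 'security.conf']
```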
Logging

The access logs are sent from Apache httpd to the rsyslog running in the same pod via UDP on port 10200, using logger(1). Rsyslog then processes these logs and ships them onward. The Apache error log is sent to standard error, and is thus picked up by rsyslog.

This is because we set the environment variable LOG_FORMAT to ecs_rsyslog; otherwise the logs would go to stdout. If you also set the DEBUG environment variable to 1, the log level is set to "debug" and mod_log_debug is loaded as well. Please note: this setting is only available in the container images; you are not allowed to set it in the chart, as the volume of logs produced would overwhelm our logging infrastructure.
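The transport itself is just UDP datagrams to a local port. The Python sketch below sends one JSON-encoded access-log record to 127.0.0.1:10200. Unlike logger(1) it adds no syslog header, and the function name and record fields are hypothetical, not the real ecs_rsyslog format.

```python
from __future__ import annotations

import json
import socket

# The in-pod rsyslog listens for access logs on this UDP port.
RSYSLOG_ACCESSLOG_ADDR = ("127.0.0.1", 10200)


def send_access_log(record: dict,
                    addr: tuple[str, int] = RSYSLOG_ACCESSLOG_ADDR) -> int:
    """Hypothetical helper: send one access-log record as one JSON datagram.

    Simplification: logger(1) would wrap the message in a syslog header;
    here only the bare JSON payload is sent. UDP gives no delivery
    guarantee. Returns the number of bytes sent.
    """
    payload = json.dumps(record, separators=(",", ":")).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, addr)


if __name__ == "__main__":
    # Hypothetical field names, not the real log format.
    send_access_log({"url.path": "/wiki/Main_Page",
                     "http.response.status_code": 200})
```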

Metrics

Apache metrics are exported by the apache-httpd-exporter Prometheus exporter running in the pod. For now we don't analyze the Apache logs to export latency metrics as we do on the bare-metal servers; instead we rely on Envoy's telemetry data, as for all other services.

mediawiki-app (php-fpm)

Resource utilization
Resource   Requests   Limits
CPU        4          5
Memory     1000Mi     2800Mi

The docker image layering

The image for running the php-fpm daemon and all of the MediaWiki code is called docker-registry.discovery.wmnet/restricted/mediawiki-multiversion and is only downloadable in production with the correct credentials, as it contains private data.

The image is built on the deployment hosts when scap backport runs, on top of the docker-registry.wikimedia.org/php7.4-fpm-multiversion-base image, which is in turn based on our docker-registry.wikimedia.org/php7.4-fpm image.

Each layer of this build provides part of the container functionality.

Specifically, from the base image up:

  • php7.4-cli is the base Docker image for any PHP application. It sets up the source repository, installs PHP and the most common extensions (including our own excimer), and defines a large number of environment variables that are injected directly into the php.ini file.
  • php7.4-fpm installs php7.4-fpm and defines even more environment variables, injected both into the php.ini file (including opcache and APCu settings) and into the php-fpm configuration. Most notable is the FCGI_MODE variable, which decides whether php-fpm listens on a unix socket or a TCP port by selecting which of the files under pool.d are included.
  • php7.4-fpm-multiversion-base is a thin layer on top of php-fpm: it installs all of our own custom extensions (excimer, luasandbox, wikidiff2, wmerrors) and any additional extensions one might need (such as yaml). It also installs a null mailer agent (msmtp) so that we can send email (more on that below).
  • restricted/mediawiki-multiversion is the final image, where we install the MediaWiki code. Like the webserver image, it's built on the deployment server using the make-container-image process. To reduce the number of bytes transferred, we typically build each new version of the image as an additional layer on top of the previous one, so each layer contains roughly just the patch we want to apply. Eventually one of our conditions (size of the image, number of layers on top of the base image, size of the new layer) is met, and that triggers a full rebuild. This usually happens when we deploy the train, which brings many new files in a new version branch under /srv/mediawiki/php-XXX; in that case both building and pushing the image can be very slow.
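The incremental-versus-full decision can be summarised as a predicate over the three conditions listed above. In this Python sketch the threshold values, the dataclass and the function name are invented placeholders, not what scap or make-container-image actually use.

```python
from __future__ import annotations

from dataclasses import dataclass

GiB = 1024**3

# Hypothetical thresholds: placeholders, NOT the production values.
MAX_IMAGE_BYTES = 20 * GiB
MAX_LAYERS_ABOVE_BASE = 30
MAX_NEW_LAYER_BYTES = 2 * GiB


@dataclass
class ImageState:
    image_bytes: int        # current total size of the image
    layers_above_base: int  # incremental layers stacked on the base image
    new_layer_bytes: int    # size of the layer we are about to add


def needs_full_rebuild(state: ImageState) -> bool:
    """True if any condition trips; otherwise add one more incremental layer."""
    return (
        state.image_bytes > MAX_IMAGE_BYTES
        or state.layers_above_base >= MAX_LAYERS_ABOVE_BASE
        or state.new_layer_bytes > MAX_NEW_LAYER_BYTES
    )


if __name__ == "__main__":
    # Invented sizes: a small backport vs. a train deploy with a big new branch.
    backport = ImageState(image_bytes=9 * GiB, layers_above_base=4,
                          new_layer_bytes=3 * 1024**2)
    train = ImageState(image_bytes=11 * GiB, layers_above_base=5,
                       new_layer_bytes=3 * GiB)
    print(needs_full_rebuild(backport), needs_full_rebuild(train))  # False True
```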
In the chart

We control quite a few environment variables:

  • SERVERGROUP, FCGI_MODE - same as for the httpd container
  • FCGI_URL - derived from FCGI_MODE and injected into the php-fpm configuration
  • FCGI_ALLOW - lists the clients authorized to call php-fpm; it is limited to 127.0.0.1, i.e. the other containers in the pod
  • PHP__* - these environment variables modify the php.ini file, and thus PHP's behaviour
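To make the FCGI_MODE / FCGI_URL / FCGI_ALLOW relationship concrete, here is a Python sketch that renders the php-fpm pool directives each mode would need. listen and listen.allowed_clients are real php-fpm pool options, but the helper name, the mode name FCGI_TCP and port 9000 (php-fpm's conventional default) are assumptions; only FCGI_UNIX and the socket path appear on this page.

```python
from __future__ import annotations

UNIX_SOCKET = "/run/shared/fpm-www.sock"  # path checked by the liveness probe


def fpm_pool_listen(fcgi_mode: str, port: int = 9000,
                    allowed: str = "127.0.0.1") -> list[str]:
    """Hypothetical helper: php-fpm 'listen' directives for a FCGI_MODE.

    FCGI_UNIX comes from this page; "FCGI_TCP" and port 9000 are assumed.
    """
    if fcgi_mode == "FCGI_UNIX":
        # The socket lives on the shared emptyDir, so only containers in
        # this pod that mount /run/shared can reach it.
        return [f"listen = {UNIX_SOCKET}"]
    if fcgi_mode == "FCGI_TCP":
        return [
            f"listen = 127.0.0.1:{port}",
            # php-fpm only applies allowed_clients to TCP sockets.
            f"listen.allowed_clients = {allowed}",
        ]
    raise ValueError(f"unknown FCGI_MODE: {fcgi_mode!r}")


if __name__ == "__main__":
    print("\n".join(fpm_pool_listen("FCGI_TCP")))
```

Binding to 127.0.0.1 is enough here because all containers in a pod share the same network namespace.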

Note that if you set the php.devel_mode value to true, opcache revalidation is turned on and checked on every request, so any change to PHP files is picked up by php-fpm immediately.
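In php.ini terms, per-request revalidation corresponds to opcache.validate_timestamps=1 combined with opcache.revalidate_freq=0, which makes opcache check file timestamps on every request. The Python sketch below shows that mapping; whether the chart sets exactly these directives, and that production runs with validate_timestamps=0, are assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations


def opcache_settings(devel_mode: bool) -> dict[str, str]:
    """Hypothetical helper: opcache php.ini overrides for devel_mode.

    Assumption: outside devel mode, timestamp validation is disabled, so
    code changes are only picked up when the opcache is reset.
    """
    if devel_mode:
        return {
            "opcache.validate_timestamps": "1",  # stat files to detect changes
            "opcache.revalidate_freq": "0",      # ...on every request
        }
    return {"opcache.validate_timestamps": "0"}


def render_ini(settings: dict[str, str]) -> str:
    """Render settings as sorted key=value php.ini lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_ini(opcache_settings(devel_mode=True)))
```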

The liveness probe just checks that the TCP port is reachable or, in unix socket mode, that the file /run/shared/fpm-www.sock is a unix socket.
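A Python approximation of this probe follows. Note that in unix socket mode it only inspects the file type, without connecting. The helper name, the FCGI_TCP mode name and port 9000 are assumptions; only FCGI_UNIX and the socket path come from this page.

```python
from __future__ import annotations

import os
import socket
import stat

FPM_SOCKET = "/run/shared/fpm-www.sock"


def fpm_liveness(fcgi_mode: str, sock_path: str = FPM_SOCKET,
                 host: str = "127.0.0.1", port: int = 9000) -> bool:
    """Hypothetical helper: pass if php-fpm's listening endpoint exists.

    FCGI_UNIX: the path must exist and be a socket (no connection is made).
    FCGI_TCP (assumed name): the port must accept a TCP connection.
    """
    if fcgi_mode == "FCGI_UNIX":
        try:
            return stat.S_ISSOCK(os.stat(sock_path).st_mode)
        except OSError:
            return False  # missing file or unreadable path
    if fcgi_mode == "FCGI_TCP":
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            return False
    raise ValueError(f"unknown FCGI_MODE: {fcgi_mode!r}")
```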

We mount the following volumes in the container:

  • /etc/wikimedia-cluster - a bind-mounted file containing the name of the datacenter we're in, which MediaWiki reads.
  • /var/www - if mw.mail_host is set, we configure our null mailer agent to send email via some configuration in the home directory of the www-data user.
  • /run/shared - same as for the httpd container
  • /etc/wmerrors contains files defined via the mw.wmerrors value as filename:content YAML pairs. In production this value is read from /etc/helmfile-defaults/mediawiki/httpd.yaml, which puppet generates, injecting the fatal-error.php file defined in puppet.
  • /var/log/php-fpm - an emptyDir shared with the rsyslog container, mounted if the value mw.logging.rsyslog is true

In addition, /srv/mediawiki/w/debug can optionally be mounted, providing a quick-and-dirty way to inject code for debugging: if debug.php.enabled is true, the configmap contains one file per key of the value debug.php.contents. This lets us deploy new debugging endpoints to a pod. For example, given this configuration for the mw-debug deployment, when using the WikimediaDebug extension you can reach the injected code at a URL like https://en.wikipedia.org/w/debug/geoip.php (on any wiki).
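The configmap mechanism boils down to "one file per key". The Python sketch below writes the debug.php.contents mapping into a directory the way the mounted configmap would present it; the helper name is hypothetical and the geoip.php body in the usage example is a stub. The mw.wmerrors value follows the same filename-to-content pattern.

```python
from __future__ import annotations

from pathlib import Path


def materialize_debug_files(values: dict, target: Path) -> list[Path]:
    """Hypothetical helper: one file per key of debug.php.contents.

    Mirrors what a configmap volume exposes; it is not the chart itself.
    """
    php = values.get("debug", {}).get("php", {})
    if not php.get("enabled", False):
        return []  # nothing is mounted at /srv/mediawiki/w/debug
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in sorted(php.get("contents", {}).items()):
        if "/" in name:
            # configmap keys cannot contain "/", so files are always flat
            raise ValueError(f"invalid file name: {name!r}")
        path = target / name
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

For example, values of {"debug": {"php": {"enabled": True, "contents": {"geoip.php": "<?php echo 'stub';"}}}} produce a single geoip.php, which would then be served as /w/debug/geoip.php.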

Logging

When rsyslog is enabled (mw.logging.rsyslog is set to true), php-fpm writes both its error log and its slow log to an emptyDir shared with the rsyslog container. Otherwise, both are logged to stdout and picked up, unstructured, by our standard k8s logging pipeline.

These logs are processed by the local rsyslog (see below) and sent to logstash. Of particular importance are the php-fpm slowlogs (NDA restricted), which show where MediaWiki spends time executing code for requests lasting more than 5 seconds.

Metrics

Metrics from php-fpm are collected by the php-fpm exporter sidecar container. Since the interface it uses (the php-fpm status page) doesn't provide all the metrics we want, such as opcache/APCu status, we've created a "monitoring vhost" on a separate port, 9181, which is used by all monitoring.

This vhost sends requests to a PHP backend that extracts the relevant metrics.

mediawiki-mcrouter

Resource utilization
Resource   Requests   Limits
CPU        200m       700m
Memory     100Mi      200Mi

The docker image

The README for the image explains well how the image can be configured, so there is little point in repeating that information here.

In the chart

We mostly rely on the configuration built into the cache module for our Helm charts, which currently allows mcrouter pools to be specified declaratively. In production, we set up the same pools we use on-premises. See memcached for MediaWiki for further details about how those routes are organized.

An important aside: because mcrouter watches its configuration with inotify, changing the mcrouter configuration doesn't require a rolling restart of the pods.

Logging

Standard kubernetes logging applies to the mcrouter containers.

Metrics

Metrics are collected by the mcrouter prometheus exporter, the same way as they're collected on-premises.

tls-proxy (envoy)

Resource utilization
Resource   Requests   Limits
CPU        200m       750m
Memory     100Mi      350Mi

Please refer to the page about the service proxy mesh for details about how it works and how to add new services to it.

Syncing with puppet

Several parts of the mesh configuration are kept in sync with puppet:

  • The list of potential listeners is under the services_proxy key, and populated in /etc/helmfile-defaults/general-<cluster>.yaml, as defined in the puppet class profile::kubernetes::deployment_server::global_config
  • The list of active listeners is under the discovery.listeners key, and is populated in /etc/helmfile-defaults/mediawiki/tlsproxy.yaml, as defined in the puppet class profile::kubernetes::deployment_server::mediawiki::config
  • The error page that Envoy serves in case of connection failure is under the mesh.error_page key. It is defined in /etc/helmfile-defaults/mediawiki/tlsproxy.yaml, and its content is currently generated by the mediawiki::errorpage_content puppet define, included by the puppet class profile::kubernetes::deployment_server::mediawiki::config

rsyslog

Resource utilization
Resource   Requests   Limits
CPU        100m       1
Memory     200Mi      300Mi

The docker image

The Docker image is a very simple Debian bullseye-based rsyslog image, with nothing special or fancy about it.

In the chart

We install rsyslog if the value mw.logging.rsyslog is set to true. We pass a few pieces of Kubernetes metadata to it as environment variables, so that they can be used in log messages. We run rsyslog as www-data because it needs to share the /var/log/php-fpm directory with the php container in order to parse the slowlog and error log. The rsyslog configuration files are installed under /etc/rsyslog.d via a configmap.

This rsyslog instance handles several log sources that we didn't think we could manage with the usual node-local rsyslog. Specifically:

  • The Apache httpd access logs, sent as JSON over UDP to port 10200, then transformed and shipped to logstash via the mediawiki.httpd.accesslog Kafka topic.
  • The php-fpm error log, read from the shared directory /var/log/php-fpm and parsed according to a custom ruleset. It is also shipped to Kafka and then to logstash.
  • The php-fpm slowlog, which is very important for understanding what's slowing down requests in production. It is parsed with a relatively obscure ruleset, transformed into proper ECS format, shipped to Kafka on the mediawiki.php-fpm.slowlog topic, and then collected in a logstash dashboard. Note that php-fpm slowlogs are a terrible fit for rsyslog, or really any logging system: among other things, their record separator (an empty line) is prepended to each entry rather than appended. This causes some interesting issues, which are already outlined in the chart.
  • The MediaWiki logs, which MediaWiki sends via UDP to port 10514; we ship whatever arrives there directly to logstash, as we do on-prem.
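To see why the prepended separator hurts, consider grouping slowlog lines into entries. This minimal Python sketch is not the rsyslog ruleset used in the chart, and the helper name is hypothetical. It shows that each entry can only be emitted once the next one starts, or when the input ends, so a streaming reader always holds back the most recent entry.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def split_slowlog(lines: Iterable[str]) -> Iterator[list[str]]:
    """Hypothetical helper: group php-fpm slowlog lines into entries.

    The separator (an empty line) comes *before* each entry, so an entry is
    only known to be complete when the next separator arrives, or at EOF.
    """
    current: list[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            if current:
                yield current  # the previous entry just ended
            current = []
        else:
            current.append(line)
    if current:
        yield current  # flushed only because the input ended
```

With a separator appended after each entry instead, every entry could be emitted as soon as its last line was read.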