MediaWiki On Kubernetes

From Wikitech

MediaWiki-on-Kubernetes (or mw-on-k8s for short) is an initiative to move the MediaWiki at WMF deployment from dedicated application servers to Kubernetes. This page describes what is changing as part of this transition.

Server groups

MediaWiki on Kubernetes is deployed to both main datacenters, Eqiad and Codfw, in the wikikube Kubernetes clusters. The setup is still evolving, but we currently have the following server groups:

  • mw-debug - For external requests with X-Wikimedia-Debug, like the old mwdebug VMs. Accessible using the k8s-experimental option of the WikimediaDebug browser extension.
  • mw-web - For external requests from web browsers (via the CDN).
  • mw-api-ext - For external requests to the API (via the CDN).
  • mw-api-int - For internal requests to the API (from other services).
  • mw-jobrunner - For internal requests from the JobQueue runners (except videoscaling jobs).
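
The split between server groups can be summarised as a small decision function. This is only an illustrative sketch of the list above, not how requests are actually routed in production: the function name and its boolean parameters are hypothetical, and the exception for videoscaling jobs is not modelled.

```python
def pick_server_group(external: bool, is_api: bool = False,
                      debug_header: bool = False, is_job: bool = False) -> str:
    """Return the server group a request would land on, per the list above.

    All parameters are hypothetical simplifications:
      external     - the request arrives from outside (via the CDN)
      is_api       - the request targets the API
      debug_header - the request carries X-Wikimedia-Debug
      is_job       - the request comes from a JobQueue runner
    """
    if is_job:
        # Videoscaling jobs are excluded in the source; not modelled here.
        return "mw-jobrunner"
    if external and debug_header:
        return "mw-debug"
    if is_api:
        return "mw-api-ext" if external else "mw-api-int"
    if external:
        return "mw-web"
    raise ValueError("no server group is listed for internal non-API requests")


if __name__ == "__main__":
    print(pick_server_group(external=True))
    print(pick_server_group(external=False, is_api=True))
```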

What is in a MediaWiki pod

Each MediaWiki pod in Kubernetes contains 8 containers:

  • mediawiki-main-tls-proxy - running Envoy, as service mesh and TLS terminator.
  • mediawiki-main-httpd - running the Apache httpd daemon.
  • mediawiki-main-app - running the PHP daemon.
  • mediawiki-main-mcrouter - running mcrouter.
  • mediawiki-main-rsyslog - running rsyslog, to collect MediaWiki logs (for Logstash) and Apache access logs.
  • mediawiki-main-{php-fpm,mcrouter,httpd}-exporter - the Prometheus exporters for PHP, mcrouter and Apache httpd.
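
The last bullet uses brace shorthand for three separate exporter containers, which is how the list adds up to 8. As a quick sketch (the helper name and the default prefix argument are assumptions made for illustration), the full set of container names can be expanded like this:

```python
from typing import List

# Component names taken from the list above.
MAIN_CONTAINERS = ["tls-proxy", "httpd", "app", "mcrouter", "rsyslog"]
EXPORTERS = ["php-fpm", "mcrouter", "httpd"]


def container_names(prefix: str = "mediawiki-main") -> List[str]:
    """Expand the shorthand into one name per container (hypothetical helper)."""
    names = [f"{prefix}-{component}" for component in MAIN_CONTAINERS]
    names += [f"{prefix}-{exporter}-exporter" for exporter in EXPORTERS]
    return names


if __name__ == "__main__":
    for name in container_names():
        print(name)
```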

How to manage changes to the infrastructure

Because we are in a transition phase between the old Puppet-managed systems and MediaWiki running on Kubernetes, we have tried to keep as much as possible in common between the two. As a result, Puppet changes affecting the application servers typically also result in changes to mw-on-k8s. And because applying an infrastructure-level change to mw-on-k8s follows exactly the same procedure as a code deployment, extra care is needed when merging changes that would affect it.

Things that we source from puppet in MediaWiki on Kubernetes include:

  • The list of logging brokers and the udp2log host.
  • The list of service proxy endpoints to offer, as well as the list of all available endpoints (from service::catalog).
  • The list of MediaWiki sites and Apache configuration parameters (e.g. which domain names for Apache vhosts), but not the Apache config template itself!
  • The list of memcached servers.
  • The GeoIP and GeoIPInfo data.

So, whenever you want to change any of the above things, you will need to:

  • Check what your change would modify on a role::deployment_server::kubernetes host. If it changes a file under /etc/helmfile-defaults/mediawiki, the following steps apply to your change.
  • Only merge the change during one of the MediaWiki Infrastructural change windows routinely scheduled on the Deployments calendar, or otherwise well outside of any MediaWiki code deployment window. This allows both SREs and deployers to manage the deployment of their changes independently and avoids unexpected consequences.
  • Once the change is merged, ensure a Puppet run happens on the deployment server, then proceed to deploy all of the MediaWiki service groups on Kubernetes.
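
The check in the first step comes down to a path test: does any file your change would modify live under /etc/helmfile-defaults/mediawiki? Below is a minimal sketch of that test, assuming you already have the list of file paths the change would touch on the deployment server (how you obtain that list is out of scope here). The function name and the example file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Directory named in the procedure above.
HELMFILE_DEFAULTS = PurePosixPath("/etc/helmfile-defaults/mediawiki")


def needs_mw_on_k8s_procedure(changed_files: Iterable[str]) -> bool:
    """Return True if any changed path is the directory itself or lies under it."""
    for changed in changed_files:
        path = PurePosixPath(changed)
        if path == HELMFILE_DEFAULTS or HELMFILE_DEFAULTS in path.parents:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = ["/etc/helmfile-defaults/mediawiki/example.yaml", "/etc/motd"]
    print(needs_mw_on_k8s_procedure(example))
```

Comparing path components rather than string prefixes avoids a false match on a sibling directory whose name merely starts with "mediawiki".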

How to force a full rebuild of the image

scap sync-world --stop-before-sync