Maintenance scripts

From Wikitech

This page documents the new setup for maintenance scripts on Kubernetes. The old system on the maintenance servers is still available as a fallback for now, but those servers will be going away.

If this setup doesn't work for you, please report issues on task T341553 promptly, so that we can ensure the new system meets your needs before that happens.

As of September 2024, maintenance scripts should no longer be run on the maintenance servers (mwmaint*). Instead, they're launched as Kubernetes jobs on Wikikube, the same Kubernetes cluster (using the same MediaWiki Docker image) as the MediaWiki deployments serving production traffic.

Any time you would previously SSH to a mwmaint host and run mwscript to run a maintenance script, follow these steps instead.

Starting a maintenance script

This requires production access, particularly membership in the deployment group.

SSH to a deployment server. Either one will work: your job automatically starts in whichever data center is active, so you don't need to change deployment hosts when there's a data center switchover. You may run the command inside screen or tmux, but it's not required.

rzl@deploy2002:~$ mwscript-k8s --comment="T341553" -- Version.php --wiki=enwiki

Any options for the mwscript-k8s tool, as described below, go before the --. After the --, the first argument is the script name; everything else is passed to the script.

The --comment flag sets an optional (but encouraged) descriptive label, such as a task number.

Kubernetes saves the maintenance script's output for seven days after completion.
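The split around -- can be illustrated with a small wrapper. This is only a sketch: run_maint and the MWSCRIPT override are hypothetical names, not part of mwscript-k8s, and requiring a task ID is a stricter policy than the tool itself, where --comment is optional.

```shell
# Hypothetical wrapper (not part of mwscript-k8s): require a task ID, then
# launch the script with that ID as the --comment label.
# MWSCRIPT can be overridden (e.g. MWSCRIPT=echo) to preview the command line.
run_maint() (
    task="${1-}"
    case "$task" in
        T[0-9]*) ;;
        *) echo "usage: run_maint T<task-number> <script> [script args...]" >&2
           exit 2 ;;
    esac
    shift
    if [ "$#" -eq 0 ]; then
        echo "run_maint: no script name given" >&2
        exit 2
    fi
    # Options for mwscript-k8s go before "--"; the script name and its
    # arguments go after it and are passed through unchanged.
    "${MWSCRIPT:-mwscript-k8s}" --comment="$task" -- "$@"
)
```

With MWSCRIPT=echo set, run_maint T341553 Version.php --wiki=enwiki prints the arguments that would be passed instead of starting a job.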

Tailing stdout

By default, mwscript-k8s prints a kubectl command that you (or anyone else) can paste and run to monitor the output or save it to a file.

As a convenience, you can pass -f (--follow) to mwscript-k8s to immediately begin tailing the script output. If you like, you can do this inside a screen or tmux. Either way, you can safely disconnect and your script will continue running on Kubernetes.

rzl@deploy2002:~$ mwscript-k8s -f -- Version.php --wiki=testwiki
[...]
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)

Input on stdin

For scripts that take input on stdin, you can pass --attach to mwscript-k8s, either interactively or in a pipeline.

rzl@deploy2002:~$ mwscript-k8s --attach -- shell.php --wiki=testwiki
[...]
Psy Shell v0.12.3 (PHP 7.4.33 — cli) by Justin Hileman
> $wmgRealm
= "production"
>

(Note: for shell.php in particular, you can also use mw-debug-repl instead.)

rzl@deploy2002:~$ cat example_url.txt | mwscript-k8s --attach -- purgeList.php
[...]
Purging 1 urls
Done!

Attaching to the process connects to both its stdin and stdout, so you don't need to combine --attach with --follow.

Interacting with jobs

Use standard kubectl commands to check the status, and view the output, of running jobs. Some selected examples are below, but refer to the kubectl documentation for detailed usage.

Job names are automatically generated, of the form mw-script.codfw.1234wxyz, with a random alphanumeric component at the end. mwscript-k8s prints the job name when a job is started.

Scripts are always launched in the active data center (codfw in these examples), so that cluster's name appears in the job name and is the one to pass to kube_env. Like mwscript-k8s, kubectl can be used from either deployment host.
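Because the data center is embedded in the job name, it can be recovered with plain string handling. The helper below is a sketch; job_datacenter is a hypothetical name, and it assumes job names keep the mw-script.<datacenter>.<suffix> form described above.

```shell
# Extract the data center from a job name of the form
# mw-script.<datacenter>.<random suffix>, e.g. mw-script.codfw.0aajirtz.
# Hypothetical helper; assumes the naming scheme described on this page.
job_datacenter() (
    job="${1-}"
    job="${job#job/}"            # tolerate the "job/" prefix used by kubectl logs
    case "$job" in
        mw-script.*.*) ;;
        *) echo "job_datacenter: unexpected job name: $job" >&2
           exit 1 ;;
    esac
    rest="${job#mw-script.}"     # mw-script.codfw.0aajirtz -> codfw.0aajirtz
    echo "${rest%%.*}"           # codfw.0aajirtz -> codfw
)

# Possible use on a deployment host:
#   kube_env mw-script "$(job_datacenter mw-script.codfw.0aajirtz)"
```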

Listing jobs

Use kubectl get job. Optionally, use -l username=$USER to filter the list to only jobs started by a particular user; this can make it easier to find your own.

rzl@deploy1003:~$ kube_env mw-script codfw
rzl@deploy1003:~$ kubectl get job -l username=rzl -L script
NAME                       COMPLETIONS   DURATION   AGE   SCRIPT
mw-script.codfw.0aajirtz   1/1           5s         15m   Version.php

Showing script output

Pass both the job name and container name to kubectl logs. (Several containers run in each MediaWiki pod, but only one is the application container we're interested in.) The appropriate command is provided by mwscript-k8s, but you can reconstruct it; if you don't remember the name of the right container, omit it, and the error message will offer you several to choose from. The application container has a name ending in -app.

rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz
error: a container name must be specified for pod mw-script.codfw.0aajirtz-r69bf, choose one of: [mediawiki-0aajirtz-app mediawiki-0aajirtz-tls-proxy mediawiki-0aajirtz-rsyslog]

rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz mediawiki-0aajirtz-app
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)

In this example, the job is already completed. If it were still running, we could use kubectl logs -f (analogous to tail -f) to stream the output.

Finished jobs are saved for up to a week, including their logs, then cleaned up.
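If the kubectl command printed by mwscript-k8s is no longer at hand, it can usually be rebuilt from the job name. The sketch below assumes the application container is named mediawiki-<suffix>-app, where <suffix> is the random part of the job name. That pattern is inferred from the example output above and is not a documented guarantee, so fall back to the error message from kubectl logs if it doesn't match. The function names are hypothetical.

```shell
# ASSUMPTION: the application container is named mediawiki-<suffix>-app,
# where <suffix> is the random part of the job name. This is inferred from
# the example output on this page, not from a documented guarantee.
app_container() (
    job="${1-}"
    job="${job#job/}"
    suffix="${job##*.}"          # mw-script.codfw.0aajirtz -> 0aajirtz
    if [ -z "$suffix" ] || [ "$suffix" = "$job" ]; then
        echo "app_container: unexpected job name: $job" >&2
        exit 1
    fi
    echo "mediawiki-${suffix}-app"
)

# Print (but do not run) the kubectl command that shows a job's output,
# so it can be reviewed before pasting.
logs_command() (
    job="${1-}"
    job="${job#job/}"
    container="$(app_container "$job")" || exit 1
    echo "kubectl logs job/${job} ${container}"
)
```

Printing the command rather than running it keeps the helper safe to use anywhere; append -f to the printed command to stream output from a job that is still running.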

Terminating a job

Deleting a Kubernetes job sends a SIGTERM to the running script. You'll need to act as the deploy user to exercise delete privileges; use caution.

This terminates the job, but also deletes it from the Kubernetes cluster, including deleting its saved logs. Capture those first, if you need to keep them.

rzl@deploy1003:~$ kube_env mw-script-deploy codfw  # Act as the deploy user to get delete privileges; use caution
rzl@deploy1003:~$ kubectl delete job mw-script.codfw.0aajirtz
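Since deleting a job also deletes its logs, it can help to save the output first and delete only once that has succeeded. The following is a minimal sketch: save_job_logs and the KUBECTL override are hypothetical, and the container name is passed in explicitly (use the one printed by mwscript-k8s).

```shell
# Save a job's application-container output to a file before the job is
# deleted. Hypothetical helper; KUBECTL can be overridden for testing.
save_job_logs() (
    job="${1-}"; container="${2-}"; outfile="${3-}"
    if [ -z "$job" ] || [ -z "$container" ] || [ -z "$outfile" ]; then
        echo "usage: save_job_logs <job-name> <container> <output-file>" >&2
        exit 2
    fi
    # Write to a temporary file first, so that a failed kubectl call never
    # leaves behind a truncated file that looks like a complete log.
    tmp="${outfile}.partial"
    if "${KUBECTL:-kubectl}" logs "job/${job}" "$container" > "$tmp"; then
        mv "$tmp" "$outfile"
        echo "saved logs for ${job} to ${outfile}"
    else
        rm -f "$tmp"
        echo "save_job_logs: could not fetch logs for ${job}; do not delete it yet" >&2
        exit 1
    fi
)
```

Run it while kube_env points at mw-script, check the saved file, and only then switch to mw-script-deploy and delete the job as shown above.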

Not yet supported

For now, fall back to running mwscript directly on the bare-metal maintenance servers if you need any of the following:

  • Helpers that run a maintenance script on multiple wikis: mwscriptwikiset, foreachwiki, foreachwikiindblist. (Of course, it's fine to manually use mwscript-k8s multiple times to run a script on several wikis. Remember that by default, mwscript-k8s exits immediately without waiting for job completion; if you wrap it in a shell for-loop, the jobs will run in parallel.)
  • Jobs that need to save persistent files to disk. On Kubernetes, your maintenance script runs in a Docker container which will not outlive it. Scripts should log their important output to stdout, or persist it in a database or other remote storage.
  • The sql command (i.e., the mysql.php maintenance script). The mysql client is not installed in our production MediaWiki images. The replacement probably won't be a maintenance script, but a wrapper for mysql using dbconfig data. (task T375910)
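The for-loop mentioned in the first item could look like the sketch below. run_on_wikis and the MWSCRIPT override are hypothetical names, not an existing helper; the loop simply calls mwscript-k8s once per wiki, and because each call returns as soon as its job is created, the jobs run in parallel.

```shell
# Launch one mwscript-k8s job per wiki. Each call returns as soon as its job
# is created, so the jobs run in parallel on the cluster.
# Hypothetical helper; MWSCRIPT can be overridden (e.g. MWSCRIPT=echo) to
# preview the commands instead of starting jobs.
run_on_wikis() (
    task="${1-}"; script="${2-}"
    if [ -z "$task" ] || [ -z "$script" ] || [ "$#" -lt 3 ]; then
        echo "usage: run_on_wikis <task> <script> <wiki> [wiki...]" >&2
        exit 2
    fi
    shift 2
    for wiki in "$@"; do
        # Report a failed launch but keep going with the remaining wikis.
        "${MWSCRIPT:-mwscript-k8s}" --comment="$task" -- "$script" --wiki="$wiki" \
            || echo "run_on_wikis: failed to launch on ${wiki}" >&2
    done
)
```

Previewing with MWSCRIPT=echo first is a cheap way to confirm the wiki list before any jobs are started.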

Kubernetes is able to restart an interrupted job (e.g. after hardware problems) on another machine, babysitting it until it completes. However, because not all maintenance scripts were originally written to be safely restarted, mwscript-k8s jobs are not restarted automatically: if your job is interrupted, it will stay stopped unless you manually intervene.