Maintenance scripts
You can run MediaWiki maintenance scripts ad-hoc via the mwscript-k8s command on any deployment server. This command will run the script inside a new one-off Kubernetes job in the same WikiKube cluster (and using the same MediaWiki Docker image) as web traffic, mw-cron, and other MediaWiki On Kubernetes deployments.
Starting a maintenance script
As of September 2024, maintenance scripts should no longer be run on the maintenance servers (mwmaint*). Any time you would previously SSH to a mwmaint host and run mwscript to run a maintenance script, follow these steps instead.
This requires production access, particularly membership in the deployment or restricted group.
SSH to any deployment server. Either deployment server will work; your job will automatically start in whichever data center is active, so you don't need to change deployment hosts when there's a datacenter switchover. You may use a screen or tmux, but it's not required.
rzl@deploy2002:~$ mwscript-k8s --comment="T341553" -- Version.php --wiki=enwiki
Any options for the mwscript-k8s tool, as described below, go before the --. After the --, the first argument is the script name; everything else is passed to the script.
The --comment flag sets an optional (but encouraged) descriptive label, such as a task number.
Kubernetes saves the maintenance script's output for seven days after completion.
Tailing stdout
By default, mwscript-k8s prints a kubectl command that you (or anyone else) can paste and run to monitor the output or save it to a file.
As a convenience, you can pass -f (--follow) to mwscript-k8s to immediately begin tailing the script output. If you like, you can do this inside a screen or tmux. Either way, you can safely disconnect and your script will continue running on Kubernetes.
rzl@deploy2002:~$ mwscript-k8s -f -- Version.php --wiki=testwiki
[...]
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)
Input on stdin
For scripts that take input on stdin, you can pass --attach to mwscript-k8s, either interactively or in a pipeline.
rzl@deploy2002:~$ mwscript-k8s --attach -- shell.php --wiki=testwiki
[...]
Psy Shell v0.12.3 (PHP 7.4.33 — cli) by Justin Hileman
> $wmgRealm
= "production"
>
(Note: for shell.php in particular, you can also use mw-debug-repl instead.)
rzl@deploy2002:~$ cat example_url.txt | mwscript-k8s --attach -- purgeList.php
[...]
Purging 1 urls
Done!
Attaching to the process will attach to both its stdin and stdout, so --attach alone is enough; you don't need to pass --attach --follow.
Input from a file
Because the script runs in a Docker container on a Kubernetes worker machine, it can't read files on the deployment host. When the script needs to read from a file, such as a list of URLs, you can pass --file to mwscript-k8s to copy the file into the container.
Only text files are supported, and the maximum total size is 1 MiB. Files are always placed in /data inside the container; that's the maintenance script's working directory, so no path needs to be specified.
rzl@deploy2002:~$ ls
input.txt
rzl@deploy2002:~$ mwscript-k8s --file=input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt
You can pass --file repeatedly to copy multiple files.
rzl@deploy2002:~$ mwscript-k8s --file=/srv/example/input1.txt --file=/srv/example/input2.txt -- ReadFromTwoFiles.php --wiki=testwiki --urls=input1.txt --more-urls=input2.txt
Optionally, you can specify a different filename to use inside the container, using a colon as below. (But don't specify a directory after the colon; /data is the only supported destination.)
rzl@deploy2002:~$ ls
input_with_a_long_filename.txt
rzl@deploy2002:~$ mwscript-k8s --file=input_with_a_long_filename.txt:input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt
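Since the 1 MiB limit applies to the total size of all files, it can be handy to check the inputs before launching. The following is a minimal sketch of such a pre-flight check; check_file_inputs is a hypothetical helper, not part of mwscript-k8s, and it only checks size (it does not verify that the files are text).

```shell
# check_file_inputs FILE...
# Hypothetical helper (not part of mwscript-k8s): verify that the files you
# plan to pass with --file exist and together stay within the 1 MiB limit.
# Prints the total size in bytes on success.
check_file_inputs() {
    local limit=$((1024 * 1024))   # 1 MiB, the documented maximum total size
    local total=0 size f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "not a regular file: $f" >&2
            return 1
        fi
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    if [ "$total" -gt "$limit" ]; then
        echo "inputs total $total bytes, over the $limit byte limit" >&2
        return 1
    fi
    echo "$total"
}
```

For example, check_file_inputs input1.txt input2.txt && mwscript-k8s --file=input1.txt --file=input2.txt -- ... would only launch the job if both files are present and small enough.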
Output to a file
Because the script runs in a Docker container on a Kubernetes worker machine, it can't write files on the deployment host. Moreover, the Docker container is torn down as soon as the script completes (or, rarely, even sooner -- such as if the worker needs to be shut down for maintenance). The Docker container is not a good place to keep data you care about.
New maintenance scripts should be designed, and old maintenance scripts should be updated, so that all output is either logged to stdout (where it can be collected and saved) or stored safely in a database or other remote storage. Only temporary working files should be written inside the container's file system.
As a workaround for scripts that write output files, instead of launching your maintenance script directly, launch shell.php (with --attach) and invoke your maintenance script within the shell. Then, the container will persist until you close the shell, so in another window you can use kubectl cp to retrieve the output files. Finally, close the shell when you're done, and the container will be cleaned up as usual.
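The retrieval step might look something like the sketch below, run in the second window while the shell is still open. It assumes you have already run kube_env for the mw-script namespace in the right cluster, and that the pod carries the job-name label that Kubernetes adds to pods created by a Job. fetch_job_file is a hypothetical helper name. Note also that kubectl cp relies on a tar binary inside the container; whether the MediaWiki image provides one is not confirmed here.

```shell
# fetch_job_file JOB CONTAINER REMOTE_PATH LOCAL_PATH
# Hypothetical helper: copy one file out of a still-running job's container.
fetch_job_file() {
    local job=$1 container=$2 remote=$3 dest=$4 pod
    # Find the pod belonging to the job via the job-name label.
    pod=$(kubectl get pods -l "job-name=$job" \
        -o jsonpath='{.items[0].metadata.name}') || return 1
    if [ -z "$pod" ]; then
        echo "no pod found for job $job" >&2
        return 1
    fi
    # Copy from the application container (the one whose name ends in -app).
    kubectl cp -c "$container" "$pod:$remote" "$dest"
}
```

For example: fetch_job_file mw-script.codfw.0aajirtz mediawiki-0aajirtz-app /data/output.txt ./output.txt (the job, container, and file names here are placeholders).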
Running on multiple wikis (the safe way)
foreachwikiindblist can be invoked within the container, by passing --dblist to mwscript-k8s:
rzl@deploy1003:~$ mwscript-k8s --comment="T378479" --dblist="s6" --follow -- Version.php
⏳ Starting Version.php on Kubernetes as job mw-script.eqiad.l26iadau ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
Version.php: Start run
Version.php: Running on s6
frwiki MediaWiki version: 1.45.0-wmf.1 (built: 13 mai 2025 à 01:07)
jawiki MediaWiki version: 1.45.0-wmf.1 (built: 2025年5月12日 (月) 23:07)
labswiki MediaWiki version: 1.45.0-wmf.1 (built: 23:07, 12 May 2025)
ruwiki MediaWiki version: 1.45.0-wmf.1 (built: 23:07, 12 мая 2025)
Version.php: Finished run
Running on multiple wikis (the scary way)
If the above technique doesn't suffice, multiple scripts can be invoked carefully from a shell loop.
Note the difference: running mwscript-k8s once with --dblist invokes a single Kubernetes job which operates on n wikis in sequence. But running mwscript-k8s in a loop invokes n Kubernetes jobs. What's more, by default mwscript-k8s immediately exits after launching the job, without waiting for the job to complete. When invoking mwscript-k8s in a loop, you can launch those n jobs in parallel, multiplying the impact on shared resources like the databases.
To avoid this problem, use --attach or --follow whenever invoking mwscript-k8s in a loop, so that the launcher doesn't terminate until the job does, in order to launch those n Kubernetes jobs one at a time.
rzl@deploy1003:~$ for wiki in $(grep -v '^#' /srv/mediawiki/dblists/s6.dblist); do echo === $wiki; mwscript-k8s --follow -- Version.php $wiki; done
=== frwiki
⏳ Starting Version.php on Kubernetes as job mw-script.eqiad.mdpn3mfw ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
MediaWiki version: 1.44.0-wmf.28 (built: 6 mai 2025 à 00:40)
=== jawiki
⏳ Starting Version.php on Kubernetes as job mw-script.eqiad.jjpb18ca ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
MediaWiki version: 1.44.0-wmf.28 (built: 2025年5月5日 (月) 22:40)
=== labswiki
⏳ Starting Version.php on Kubernetes as job mw-script.eqiad.4ywwqidc ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
MediaWiki version: 1.44.0-wmf.28 (built: 22:40, 5 May 2025)
=== ruwiki
⏳ Starting Version.php on Kubernetes as job mw-script.eqiad.0ywq8yq4 ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
MediaWiki version: 1.44.0-wmf.28 (built: 22:40, 5 мая 2025)
Even if you don't want the job output, pass --follow anyway and redirect the output to /dev/null.
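A minimal sketch of such a loop with the output discarded, wrapped in a function for reuse. run_on_each_wiki is a hypothetical helper name, and the sketch assumes the dblist file has one wiki per line with # comments, as in the example above. It makes no attempt to detect failed jobs.

```shell
# run_on_each_wiki DBLIST_FILE SCRIPT [SCRIPT_ARGS...]
# Hypothetical helper: launch one Kubernetes job per wiki, one at a time.
run_on_each_wiki() {
    local dblist=$1 script=$2 wiki
    shift 2
    # Skip comment lines and blank lines in the dblist.
    for wiki in $(grep -v -e '^#' -e '^$' "$dblist"); do
        echo "=== $wiki" >&2
        # --follow keeps the launcher in the foreground until the job ends,
        # so the next job is not launched until this one is done. The job
        # output itself is not needed here, so it is discarded.
        mwscript-k8s --follow -- "$script" --wiki="$wiki" "$@" >/dev/null
    done
}
```

For example: run_on_each_wiki /srv/mediawiki/dblists/s6.dblist Version.php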
Shelling out to mwscript-k8s
If invoking mwscript-k8s from software, rather than in an interactive session, use -o json (--output=json) for machine-readable information about the job. Human-readable output still appears on stderr, and can be suppressed.
rzl@deploy2002:~$ mwscript-k8s --comment="T341553" --output=json -- Version.php --wiki=enwiki 2>/dev/null
{
"error": null,
"mwscript": {
"cluster": "codfw",
"config": "/etc/kubernetes/mw-script-codfw.config",
"deploy_config": "/etc/kubernetes/mw-script-deploy-codfw.config",
"job": "mw-script.codfw.c60nd9x7",
"mediawiki_container": "mediawiki-c60nd9x7-app",
"namespace": "mw-script"
}
}
The error and mwscript keys will always be present, and exactly one of them will be non-null.
If there was a problem launching the job, mwscript-k8s will exit with nonzero status. error will be a string containing a human-readable error message, and mwscript will be null.
If the job launched successfully, mwscript-k8s will exit with status 0. error will be null and mwscript will contain everything you need to check on your job using the Kubernetes API (either programmatically or by shelling out to kubectl), formatted like the above example.
(This doesn't indicate the exit status of the maintenance script, which may still crash later on—or might even immediately fail to start, e.g. if its command-line flags are wrong. Successful termination of mwscript-k8s indicates only that the job was successfully submitted to the Kubernetes cluster.)
Note that mwscript.config and mwscript.deploy_config are paths to Kubernetes config files on the deployment host with different levels of privilege; use mwscript.config whenever possible for read-only operations like checking job status, and mwscript.deploy_config when necessary for mutating operations like terminating your job early.
Some fields in the output look similar; for example, it looks as though you could deduce the value of mwscript.cluster by parsing mwscript.job. Don't do this. Instead, treat each entry as an opaque string whose structure is an implementation detail. This will ensure your automation keeps working when the naming conventions change with future updates to the maintenance scripts' Helm chart and helmfile.
Because the extra output would interfere with JSON parsing, the flags --attach, --follow, and --verbose are incompatible with --output=json.
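As a rough illustration of consuming this output, the sketch below takes the JSON from mwscript-k8s and runs a read-only status check with the less-privileged config. check_job is a hypothetical helper name, and the sketch assumes jq is available and that the config files can be passed to kubectl with --kubeconfig. Each field is used as-is, without parsing one value out of another.

```shell
# check_job JSON
# Hypothetical helper: given the JSON printed by mwscript-k8s --output=json,
# report a launch error or look up the job with read-only credentials.
check_job() {
    local json=$1 err config namespace job
    # Exactly one of .error and .mwscript is non-null.
    err=$(printf '%s' "$json" | jq -r '.error // empty')
    if [ -n "$err" ]; then
        echo "launch failed: $err" >&2
        return 1
    fi
    # Treat every field as an opaque string.
    config=$(printf '%s' "$json" | jq -r '.mwscript.config')
    namespace=$(printf '%s' "$json" | jq -r '.mwscript.namespace')
    job=$(printf '%s' "$json" | jq -r '.mwscript.job')
    # mwscript.config is the lower-privilege file, suited to read-only calls.
    kubectl --kubeconfig="$config" --namespace="$namespace" get job "$job"
}
```

For example: json=$(mwscript-k8s --comment="T341553" --output=json -- Version.php --wiki=enwiki 2>/dev/null); check_job "$json". A successful check_job says only that the job exists, not that the maintenance script succeeded.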
Because --output=json cannot be combined with --attach or --follow, mwscript-k8s terminates (returning your JSON) immediately after launching the job, without waiting for the job to complete. If you invoke mwscript-k8s in a loop, you can launch many jobs in parallel, multiplying the impact on shared resources like the databases.
Interacting with jobs
Use standard kubectl commands to check the status, and view the output, of running jobs. Some selected examples are below, but refer to the kubectl documentation for detailed usage.
Job names are automatically generated, of the form mw-script.codfw.1234wxyz, with a random alphanumeric component at the end. mwscript-k8s prints the job name in its first line of output.
Scripts are always launched in the active data center (in these examples, codfw) so that cluster appears in the job name and should be passed to kube_env. Like mwscript-k8s, kubectl can be used from either deployment host.
Listing jobs
Use kubectl get job. Optionally, use -l username=$USER to filter the list to only jobs started by a particular user; this can make it easier to find your own.
rzl@deploy1003:~$ kube_env mw-script codfw
rzl@deploy1003:~$ kubectl get job -l username=rzl -L script
NAME COMPLETIONS DURATION AGE SCRIPT
mw-script.codfw.0aajirtz 1/1 5s 15m Version.php
To get more information, you can use -o custom-columns or -o json piped into a tool like jq.
rzl@deploy1003:~$ kubectl get job -l username=$USER -o json |
jq -r '.items |
sort_by(.metadata.creationTimestamp)[] |
[
.metadata.name,
.metadata.labels.username,
.metadata.creationTimestamp,
.status.completionTime // "(no completion time)",
(.spec.template.spec.containers[0].args[1:] | join(" "))
] |
@tsv'
Showing script output
Pass both the job name and container name to kubectl logs. (Several containers run in each MediaWiki pod, but only one is the application container we're interested in.) The appropriate command is provided by mwscript-k8s, but you can reconstruct it; if you don't remember the name of the right container, omit it, and the error message will offer you several to choose from. The application container has a name ending in -app.
rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz
error: a container name must be specified for pod mw-script.codfw.0aajirtz-r69bf, choose one of: [mediawiki-0aajirtz-app mediawiki-0aajirtz-tls-proxy mediawiki-0aajirtz-rsyslog]
rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz mediawiki-0aajirtz-app
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)
In this example, the job is already completed. If it were still running, we could use kubectl logs -f (analogous to tail -f) to stream the output.
Finished jobs are saved for up to a week, including their logs, then cleaned up.
Terminating a job
Deleting a Kubernetes job sends a SIGTERM to the running script. You'll need to act as the deploy user to delete the job; use caution as this gives you elevated privileges over all maintenance scripts, not just your own.
This terminates the job, but also deletes it from the Kubernetes cluster, including deleting its saved logs. Capture those first, if you need to keep them.
rzl@deploy1003:~$ kube_env mw-script-deploy codfw # Act as the deploy user to get delete privileges; use caution
rzl@deploy1003:~$ kubectl delete job mw-script.codfw.0aajirtz
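Since deleting the job also deletes its logs, one way to avoid losing them is to make the deletion conditional on a successful save. The following is a sketch only: stop_job_keep_logs is a hypothetical helper, and it assumes you have already run kube_env mw-script-deploy for the right cluster and that those credentials can also read logs.

```shell
# stop_job_keep_logs JOB CONTAINER LOGFILE
# Hypothetical helper: save a job's logs to a local file, then delete the
# job. If the logs cannot be saved, the job is left alone.
stop_job_keep_logs() {
    local job=$1 container=$2 logfile=$3
    # Save the logs first; deleting the job also deletes its saved logs.
    kubectl logs "job/$job" "$container" > "$logfile" || {
        echo "could not save logs; not deleting $job" >&2
        return 1
    }
    # Deleting the job sends SIGTERM to the running script.
    kubectl delete job "$job"
}
```

For example: stop_job_keep_logs mw-script.codfw.0aajirtz mediawiki-0aajirtz-app ~/0aajirtz.log. Anything the script prints after the logs are saved, such as output during shutdown, is not captured.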
Not yet supported
As of June 2025, there exist maintenance script use cases that are not yet supported on Kubernetes.
On a temporary basis, please use the active deployment server to run your script with mwscript (i.e., running directly on the host, rather than on Kubernetes with mwscript-k8s).
This also applies to the following use cases:
- The sql command (and the mysql.php MediaWiki maintenance script). This shells out to the mysql CLI client, which is not currently installed in any MediaWiki Docker images in production. The replacement probably won't be a maintenance script, but a wrapper for mysql using dbconfig data. (T375910)
- The foreachwiki command. Though try "Running on multiple wikis (the safe way)" first, using --dblist=all, and report any issues you find.
- The mwscriptwikiset command. Though try "Running on multiple wikis (the safe way)" first, using --dblist (pass the value without the .dblist suffix), and report any issues you find.
- The importImages.php maintenance script. There's presently no way to provide non-text input files to a maintenance script. (T377497)
If the job is interrupted (e.g. by hardware problems), Kubernetes can automatically move it to another machine and restart it, babysitting it until it completes. Because not all maintenance scripts were originally written to be safely restarted, mwscript-k8s jobs are not restarted automatically; if your job is interrupted, it will stay stopped unless you manually intervene.
Disabled
- The foreachwikiindblist command. Try "Running on multiple wikis (the safe way)" instead, and report any issues you find.