Help:Toolforge/Kubernetes

This page describes the Toolforge Kubernetes cluster and which programming languages it currently supports (#Available container types).

Overview

Kubernetes (often abbreviated k8s) is a platform for running containers. It is used in Toolforge to isolate Tools from each other and allow distributing Tools across a pool of servers.

You can think of a container as a "micro virtual machine" whose only task is to execute a single application. It has its own (minimal) file system and limited CPU and memory resources. In Kubernetes, each container runs inside a pod, which is what connects the container to the tool's directories, the db replicas, the internet, and other pods.

One characteristic of containers is that, because of their small size, they cannot include all of the packages you will find on other Toolforge virtual machines such as the tools-login and grid engine nodes. To run your application, you either need to choose a pre-built container image that has the packages you need (the available images are listed in the Container images section below) or use the Toolforge Build Service to build a custom container image that contains the packages your application needs.

Kubernetes webservices

The Toolforge webservice command has a --backend=kubernetes mode that will start, stop, and restart containers designed to run web services for various languages. See our Webservice help for more details.

The Kubernetes backend has the following options:

  -m MEMORY, --mem MEMORY
                        Set higher Kubernetes memory limit
  -c CPU, --cpu CPU     Set a higher Kubernetes cpu limit
  -r REPLICAS, --replicas REPLICAS
                        Set the number of pod replicas to use
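
For example, to start a PHP webservice with a higher memory limit, one full CPU, and two replicas (the values here are illustrative; check webservice --help on the bastion for the exact syntax):

$ webservice --backend=kubernetes --mem 1Gi --cpu 1 --replicas 2 php8.2 start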


Kubernetes jobs

Every non-trivial task performed in Toolforge (like executing a script or running a bot) should be dispatched to a job scheduling backend (in this case, Kubernetes), which ensures that the job is run in a suitable place with sufficient resources.

The basic principle of running jobs is fairly straightforward:

  • You create a job from a submission server (usually login.toolforge.org)
  • Kubernetes finds a suitable execution node to run the job on, and starts it there once resources are available
  • As it runs, your job will send output and errors to files until the job completes or is aborted.

Jobs can be executed once (waiting for the result synchronously or letting them run asynchronously), run continuously, or scheduled to run at regular intervals.
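
For example, a minimal one-off job that runs a script from the tool's home directory might look like this (the job name, script, and image are illustrative; see Help:Toolforge/Jobs framework for the full syntax):

$ toolforge jobs run myjob --image python3.11 --command "./myscript.py"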

There are two ways of running jobs on Kubernetes: using the Toolforge jobs framework (the recommended approach) or creating Kubernetes objects directly with kubectl.

Before jobs were supported on Kubernetes, Toolforge offered Grid Engine as its job scheduling backend.

Namespaces

Each tool has been granted control of a Kubernetes "namespace". Your tool can only create and control objects in its own namespace. A tool's namespace is the tool's name prefixed with "tool-" (e.g. tool-admin, tool-stashbot, tool-hay, etc.).
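
As a quick check, you can list the objects in your namespace from a bastion; kubectl in a Toolforge tool account is already configured to use the tool's own namespace (the tool name below is illustrative):

$ become mytool
$ kubectl get pods          # lists pods in the tool-mytool namespace only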

You can see monitoring data for your namespace in Grafana: open this page and select your namespace in the select box at the top of the page.

Quotas and Resources

On the Kubernetes cluster, all containers run with CPU and RAM limits set. Defaults are set at 0.5 CPU and 512Mi of memory per container. Users can adjust these up to the highest level allowed without any help from an administrator (the top limit is set at 3 CPU and 6Gi of memory per pod) with command line arguments to the webservice and toolforge jobs run commands (--cpu and --mem). By default, the entire tool is limited to 8 CPU and 8Gi of memory total. Both the per-pod and total limits can be increased upon request (see below), however, larger per-pod limits will almost certainly result in delays when starting the pod since they take up a large chunk of an individual worker.
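
For example, to run a job with limits raised above the defaults but still below the per-pod maximum (the job name, script, and values are illustrative):

$ toolforge jobs run bigjob --image python3.11 --command "./heavy_task.py" --mem 2Gi --cpu 2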

The Toolforge admin team encourages you to try running your webservice with the defaults before deciding that you need more resources. We believe that most PHP and Python3 webservices will work as expected with the lower values. Java webservices will almost certainly need higher limits due to the nature of running a JVM.

The storage size limit of a container, including the image size, is 10GB. That gives you approximately 9GB of free space to use inside the /tmp directory while the container is running; when the container ends, all of that data is deleted. This can be useful with some kind of file-based database (SQLite, dbm, CSV, etc.) when working with data that is larger than the available memory. If you need larger temporary space, you can try using an emptyDir volume. For persistent storage, use your tool directory (NFS, mounted at /data/project) or ToolsDB (an SQL server).
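
As an illustration, the following one-off job (the name is made up for this example) reports how much scratch space is available under /tmp inside the container:

$ toolforge jobs run tmp-demo --image bookworm --command "df -h /tmp"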

Namespace-wide quotas

Your entire tool account can only consume so many cluster resources. The cluster places quota limits on the whole namespace that determine how many pods can run, how many service ports can be exposed, how much total memory and CPU can be used, and more. To view the live quotas that apply to your tool, run toolforge jobs quota (or kubectl describe resourcequotas).
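
For example, from your tool account:

$ toolforge jobs quota              # summary of the tool's quotas and current usage
$ kubectl describe resourcequotas   # the same information, straight from the Kubernetes API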

Quota increases

It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at Toolforge (Quota requests) in Phabricator. Please read all the instructions there before submitting your request.

Take into account that memory / RAM is a peculiar resource in Toolforge. As of this writing, Toolforge Kubernetes worker nodes have either 8GB or 16GB of RAM. If the RAM allocated to a single job gets close to (or goes beyond) these numbers, Toolforge Kubernetes may be unable to schedule the tool's workloads. This applies to both webservices and jobs.

Container images

The Toolforge Kubernetes cluster is restricted to loading Docker images published at docker-registry.tools.wmflabs.org (see Portal:Toolforge/Admin/Kubernetes#Docker Images for more information). These images are built using the Dockerfiles in the operations/docker-images/toollabs-images git repository.

Available container types

The webservice command has an optional type argument that allows you to choose which Docker container to run your Tool in.

The newest supported images for each language are the following:

  • bookworm (Debian base image)
  • jdk17
  • node18
  • perl5.36
  • php8.2
  • python3.11
  • ruby3.1
  • tcl8.6

For example, to start a webservice using a php8.2 container, run:

webservice --backend=kubernetes php8.2 start

A complete list of images is available from the docker-registry tool which provides a pretty frontend for browsing the Docker registry catalog.

See Help:Toolforge/Build Service for information on an alternative system which allows creating custom container images for a Toolforge tool. That system includes support for installing additional system packages using apt which can be used to support projects which require multiple language runtimes, additional libraries, and other assets which are supported by Debian's packaging system.

PHP

PHP uses lighttpd as a webserver, and looks for files in ~/public_html/.
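
A minimal sketch for a new tool, assuming there is no existing public_html yet:

$ mkdir -p ~/public_html
$ echo '<?php phpinfo();' > ~/public_html/index.php
$ webservice --backend=kubernetes php8.2 start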

PHP versions & packages

The latest version of PHP is php8.2. You can view the installed PHP extensions on the phpinfo tool.

PHP Upgrade

To upgrade from PHP 5.6 to php8.2, run the following two commands:

$ webservice stop
$ webservice --backend=kubernetes php8.2 start

To switch back:

$ webservice stop
$ webservice --backend=kubernetes php5.6 start

Running Locally

You may run the container on your local computer (not on Toolforge servers) by executing a command like this:

$ docker run --name toolforge -p 8888:80 -v "${PWD}:/var/www/html:cached" -d docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web sh -c "lighty-enable-mod fastcgi-php && lighttpd -D -f /etc/lighttpd/lighttpd.conf"

Then the tool will be available at http://localhost:8888

Node.js

The Node.js container images contain an LTS version of Node.js, npm, and Yarn, packaged either by Debian or by NodeSource.
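
For example, to start a webservice with the newest Node.js image listed above (this assumes your tool's application is already where the image expects it, conventionally ~/www/js with a package.json start script):

$ webservice --backend=kubernetes node18 start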

Troubleshooting

"failed to create new OS thread" from kubectl

If kubectl get pods or a similar command fails with the error message "runtime: failed to create new OS thread (have 12 already; errno=11)", use GOMAXPROCS=1 kubectl ... to reduce the number of resources that kubectl requests from the operating system.
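
For example:

$ GOMAXPROCS=1 kubectl get pods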

The active thread quota is per-user, not per-session or per-tool, so if you have multiple shell sessions open to the same bastion server, this will affect the available quota for each of your shells. To check the active running threads for your user, use $ ps -Lf --user $YOUR_SHELL_USERNAME.

Get a shell inside a running Pod

Kubectl can be used to open a shell inside a running Pod: $ kubectl exec -it $NAME_OF_POD -- bash
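
For example:

$ kubectl get pods                          # find the name of the running pod
$ kubectl exec -it $NAME_OF_POD -- bash     # open an interactive shell inside it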

See Get a Shell to a Running Container at kubernetes.io/docs for more information.

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

  • Discuss and receive general support
  • Stay aware of critical changes and plans
  • Track work tasks and report bugs
      • Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
  • Read stories and WMCS blog posts
      • Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

See also