Wikimedia Cloud Services team/EnhancementProposals/Toolforge container image configuration

From Wikitech

This document proposes to establish a single place to configure Toolforge container images.

Overview

Currently, two separate components (webservice and jobs-framework-cli) each maintain their own copy of the list of container images available to tools, and that list may be needed in more places in the future (for example phab:T311917). To simplify operations, I propose creating a single location for this configuration.

Proposed solution

Format

The following is a quick sketch of what the data format might look like:

{
    "images": {
        "python3.9": {
            "state": "stable",
            "aliases": ["tf-python39"],
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base"
                },
                "webservice": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web"
                }
            }
        },
        "python3.9-pwb": {
            "state": "beta",
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-pwb"
                }
            }
        },
        "python3.7": {
            "state": "deprecated",
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base"
                },
                "webservice": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-web"
                }
            }
        },
        "php7.4": {
            "state": "stable",
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-php74-sssd-base"
                },
                "webservice": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-php74-sssd-web"
                }
            }
        }
    }
}

Note how in the example, python3.9-pwb does not have a webservice variant. A configuration like this means that the specific image is available via the jobs framework but not to webservices.

The aliases field mostly exists to allow non-breaking migration from the current names used by webservice and jobs-framework.
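As a rough illustration of how a consumer might use this format, the following sketch looks up an image by its canonical name or one of its aliases and returns nothing when the requested variant is missing. The function name `resolve_image` and the trimmed `CONFIG` dictionary are assumptions made for this example, not part of the proposal.

```python
from typing import Optional

# Trimmed copy of the example configuration, kept inline so the sketch is self-contained.
CONFIG = {
    "images": {
        "python3.9": {
            "state": "stable",
            "aliases": ["tf-python39"],
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base"
                },
                "webservice": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web"
                },
            },
        },
        "python3.9-pwb": {
            "state": "beta",
            "variants": {
                "jobs-framework": {
                    "image": "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-pwb"
                },
            },
        },
    }
}


def resolve_image(config: dict, name: str, variant: str) -> Optional[str]:
    """Return the image URL for `name` (canonical name or alias) in `variant`.

    Returns None when the name is unknown or the image has no such variant.
    This helper is hypothetical; neither tool is known to expose it.
    """
    for canonical, data in config["images"].items():
        if name == canonical or name in data.get("aliases", []):
            entry = data.get("variants", {}).get(variant)
            return entry["image"] if entry else None
    return None
```

With this sketch, `resolve_image(CONFIG, "tf-python39", "webservice")` finds the python3.9 web image through its alias, while `resolve_image(CONFIG, "python3.9-pwb", "webservice")` returns `None` because that image only has a jobs-framework variant.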

For the final product, we should also create a versioned schema to ensure implementations in different tools are compatible with each other.
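To give an idea of what such a check could cover, here is a minimal hand-written validator. It assumes a top-level `version` key, which the sketch format does not define yet, and treats the three states seen in the example (stable, beta, deprecated) as the full set. Both are assumptions, and a real implementation would more likely rely on a proper schema language.

```python
SUPPORTED_VERSION = 1  # hypothetical: the proposal does not define a version field yet
KNOWN_STATES = {"stable", "beta", "deprecated"}  # the states that appear in the example


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems = []
    if config.get("version") != SUPPORTED_VERSION:
        problems.append(f"unsupported version: {config.get('version')!r}")

    images = config.get("images")
    if not isinstance(images, dict):
        return problems + ["'images' must be an object"]

    for name, data in images.items():
        if not isinstance(data, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if data.get("state") not in KNOWN_STATES:
            problems.append(f"{name}: unknown state {data.get('state')!r}")
        variants = data.get("variants")
        if not isinstance(variants, dict) or not variants:
            problems.append(f"{name}: needs at least one variant")
            continue
        for variant, entry in variants.items():
            # Every variant must point at a concrete image URL.
            if not isinstance(entry, dict) or not isinstance(entry.get("image"), str):
                problems.append(f"{name}/{variant}: missing 'image' string")
    return problems
```

Each tool could run a check like this when loading the file and refuse to start, or fall back to a cached copy, if the list of problems is not empty.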

Location of the file

I'm not sure about this one. We have multiple options:

  1. Standalone Git repository, fetched at runtime with a gerrit/gitlab raw url
  2. Standalone Git repository included as a submodule at build time for the tools that need it
  3. Protected Wikitech page fetched at runtime
  4. File on disk provisioned by Puppet
  5. Kubernetes ConfigMap (managed with our standard Helm tooling?)
  6. Some sort of a Toolforge metadata API webservice (?)

I personally prefer (5), (1) or (3) in that order, but am open to other options as well.

Other

One more thing I'm considering adding to the spec would be default resources allocated to the pod. The main use case is allocating more memory to Java apps since the JDK tends to be a memory hog.
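One possible shape for this, purely as a sketch: each image entry carries an optional `resources` object that overrides a cluster-wide fallback. The `resources` key, its `cpu` and `memory` fields, and all the values below are placeholders invented for illustration, not actual Toolforge defaults.

```python
# Hypothetical fallback applied when an image does not override anything.
# The numbers are placeholders, not real Toolforge defaults.
FALLBACK_RESOURCES = {"cpu": "500m", "memory": "512Mi"}


def default_resources(image_entry: dict) -> dict:
    """Merge an image's optional `resources` overrides on top of the fallback."""
    return {**FALLBACK_RESOURCES, **image_entry.get("resources", {})}


# Hypothetical entry for a Java image that asks for more memory by default.
java_entry = {
    "state": "stable",
    "resources": {"memory": "1Gi"},
    "variants": {},
}

# An entry without a `resources` key simply gets the fallback.
plain_entry = {"state": "stable", "variants": {}}
```

Keeping the override optional means existing entries would not need to change, and only images with unusual needs (such as the Java case) would have to spell anything out.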

A future version of this spec could be used when implementing phab:T213641 (Design mechanism and process for upgrading Kubernetes container runtimes).