Wikimedia Cloud Services team/EnhancementProposals/Toolforge push to deploy/Blog post

The Cloud Services team is planning to introduce new "push-to-deploy" functionality to Toolforge, a hosting platform (PaaS) for the Wikimedia movement. If you're not already familiar with it, Toolforge is used to perform analytics, administer bots, run webservices, and create tools. These tools help project editors, technical contributors, and other volunteers who work on Wikimedia projects.

The push-to-deploy project aims to provide users with a more flexible environment to deploy tools in, while also reducing the amount of technical knowledge needed to deploy a tool.

History

Tools originally ran in what's known as the GridEngine environment, which has access to nearly all software installed in Toolforge. You could easily switch between PHP, Python, or nodejs because all of them were accessible. And if your tool needed to call an external program like imagemagick, that was available too. While this gave users a lot of flexibility in what they could access, it had some major downsides. There was no way to provide an updated version of a piece of software without upgrading it for everyone, potentially breaking other people's tools. Upgrades became tied to new versions of the base operating system (formerly Ubuntu, now Debian), which only happened every two years. And users were forced to update their software when the upgrade came around, rather than on a timeline that was convenient for them.

In 2016, Toolforge introduced an option to run tools on Kubernetes, which made different runtimes available. Each runtime was based on a programming language version, plus related utilities. For example, the PHP 5.6 runtime included composer. Because these runtimes were just Docker images, it was possible to add more language versions without waiting for operating system upgrades. But these runtimes only contained packages related to that language, and nothing else. If you had a web tool written in Python that needed nodejs/npm to install client-side assets, you were out of luck. In one case, users wanted the SVG rendering library librsvg, which led to a discussion about whether it was fine to add it to the PHP runtime, and where the line should be drawn if someone wanted something like LaTeX installed. Toolforge administrators were also reluctant to add one-off custom images for users because there was no process in place to make that scale: each change to an image needed to be reviewed and deployed by an admin.

Problems

So at this point we have some common problems that keep coming up and need addressing:

  • A fixed one-size-fits-all environment does not actually address all the needs of our users
  • Users want to be able to compose different language runtimes together, e.g. Python and nodejs
  • Users should be able to move at their own pace, not be blocked by Toolforge administrators
  • Users want a stable platform but also want access to newer versions of software when they need it
  • Users who want to deviate from the norm or have special requirements should not add extra burden to Toolforge administrators

And there are some areas in which Toolforge is behind what commercial PaaS providers offer:

  • Integration with CI/CD tools, only deploying a new version of the tool if it passes all tests
  • Ability to reproduce the environment locally for debugging
  • Option to deploy new versions of tools without logging in via SSH and using command-line tools

Vision

A simple git push should start a pipeline that builds your tool as necessary and then deploys it on Toolforge. You shouldn't need to SSH to any host or run any command-line tools. Instead, you would observe and control the deployment from a web dashboard. Overall, users should have the flexibility to install the software they want without requiring intervention from Toolforge administrators.

Enter buildpacks

Cloud Native Buildpacks are a relatively new project of the Cloud Native Computing Foundation (of which the Wikimedia Foundation is a member), based on the work done by Heroku and others. In contrast to the current Kubernetes model, where the same images are used by all users, buildpacks generate Docker images custom-tailored to your project. Each buildpack examines the code in your Git repository and then decides what it should install or add to the image.

Here's what the build workflow of a tool written for Python 3.7 would hypothetically look like, using three buildpacks:

  • python37
    • Looks for type: python3.7 in the service.template in your repository. If this doesn't match, the build tries a different language runtime, or errors out if none of them match.
    • Installs Python 3.7 (currently from Debian packages)
    • Installs pip and virtualenv
    • Provides python (version 3.7), pip, virtualenv
  • pip
    • Looks for a requirements.txt file
    • Creates a virtualenv, installs dependencies
    • Requires python (any version) and virtualenv
  • uwsgi (webserver to run Python applications)
    • Unconditionally used.
    • Installs uwsgi into a virtualenv
    • Sets launch process to uwsgi ... with roughly the same configuration as a current Kubernetes tool
    • Requires python (any version) and virtualenv
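
The detect/provide/require flow described above can be modeled in a few lines of Python. This is only an illustrative sketch, not how Cloud Native Buildpacks are actually implemented: the Buildpack class, the plan_build function, and the idea of representing a repository as a dict of file names to contents are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Buildpack:
    """Hypothetical model of a buildpack: a detect step plus what it provides/requires."""
    name: str
    detect: Callable[[Dict[str, str]], bool]  # repo files -> does this buildpack apply?
    provides: Set[str] = field(default_factory=set)
    requires: Set[str] = field(default_factory=set)


def plan_build(repo: Dict[str, str], buildpacks: List[Buildpack]) -> List[str]:
    """Return the names of the buildpacks that apply, in order.

    Raises ValueError if a selected buildpack requires something that no
    earlier selected buildpack provided.
    """
    available: Set[str] = set()
    selected: List[str] = []
    for bp in buildpacks:
        if not bp.detect(repo):
            continue  # this buildpack does not apply to the repository
        missing = bp.requires - available
        if missing:
            raise ValueError(f"{bp.name} is missing: {sorted(missing)}")
        available |= bp.provides
        selected.append(bp.name)
    return selected


PYTHON37 = Buildpack(
    name="python37",
    detect=lambda repo: "type: python3.7" in repo.get("service.template", ""),
    provides={"python", "pip", "virtualenv"},
)
PIP = Buildpack(
    name="pip",
    detect=lambda repo: "requirements.txt" in repo,
    requires={"python", "virtualenv"},
)
UWSGI = Buildpack(
    name="uwsgi",
    detect=lambda repo: True,  # unconditionally used
    requires={"python", "virtualenv"},
)

# A repository is modeled as a mapping of file name -> file contents.
repo = {"service.template": "type: python3.7\n", "requirements.txt": "flask\n"}
print(plan_build(repo, [PYTHON37, PIP, UWSGI]))  # ['python37', 'pip', 'uwsgi']
```

In this model, a repository without a requirements.txt simply skips the pip buildpack, while a repository whose service.template matches no language runtime fails because uwsgi's requirements are never provided.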

It would be trivial to swap out the pip buildpack for one that used another Python dependency manager like pipenv or poetry. There could also be an optional buildpack that looked for a package.json file and used npm to install and build client-side assets without adding nodejs/npm to the final runtime image.

Once the image is built, it will be published to the Toolforge image registry and deployed to your tool. You could easily pull the same image locally to run and debug your application in an environment similar to Toolforge.

In summary, users would get to control which buildpacks apply to their tool, allowing them to compose a Docker image that includes all the dependencies they need without anything extra. In most cases, if you need something new, you would have the power to add it yourself rather than having to wait for Toolforge administrators.

Status

Currently the project is at a proof-of-concept stage. We've created base images and buildpacks for the Python 3.7 workflow, which can be used to create a fully functioning Docker image today. Changes have been made to the Toolforge Kubernetes cluster to allow these kinds of images to run, and they were successfully tested in the "toolsbeta" testing environment. It is already possible to deploy a tool using a buildpack-built image in Toolforge, except that it requires a Toolforge administrator to manually build and publish the image (somewhat defeating the goals of this project).

There will be some limitations during the initial rollout. First, these images will only have access to files that are committed to the Git repo, not anything on the shared NFS filesystem. This means tools will need to rely on the MariaDB or Redis databases for persistent storage. And as these images will be publicly available, it will not be possible to access any secrets (TODO: phab task). Logs will be accessible only through Kubernetes rather than written to disk.
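
Since logs would no longer be written to disk, a tool would need to send them somewhere Kubernetes can collect them. Kubernetes captures a container's standard output and standard error, so one plausible approach (an assumption here, not a confirmed Toolforge requirement) is to log to stdout. A minimal sketch using Python's standard logging module, where make_logger and the "my-tool" name are placeholders:

```python
import logging
import sys


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes to stdout instead of a file on disk."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = make_logger("my-tool")  # "my-tool" is a placeholder name
log.info("tool started")
```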

Next steps

The next major work is to design and build the deployment pipeline that receives a Git trigger/webhook, builds a new Docker image using buildpacks, and deploys it to the current tool. We'll also need help writing buildpacks for other languages and tools that users want.
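
To make the three stages concrete, here is a rough sketch of the control flow such a pipeline might follow. The pipeline has not been designed yet, so everything here is hypothetical: the handle_push function, the shape of the event dict, and the build, publish, and deploy callables are stand-ins rather than real Toolforge or buildpack APIs.

```python
from typing import Callable, Dict


def handle_push(
    event: Dict[str, str],
    build: Callable[[str, str], str],
    publish: Callable[[str], None],
    deploy: Callable[[str, str], None],
) -> str:
    """Run build -> publish -> deploy for one push event; return the image name."""
    tool = event["tool"]
    image = build(event["repo_url"], event["commit"])  # build image with buildpacks
    publish(image)                                     # push image to the registry
    deploy(tool, image)                                # roll the tool onto the new image
    return image


# Stand-in steps that only record what they were asked to do.
calls = []
image = handle_push(
    {
        "tool": "example-tool",
        "repo_url": "https://example.org/example-tool.git",
        "commit": "abc123",
    },
    build=lambda repo_url, commit: f"example-tool:{commit}",
    publish=lambda img: calls.append(("publish", img)),
    deploy=lambda tool, img: calls.append(("deploy", tool, img)),
)
print(image, calls)
```

Passing the steps in as callables keeps the sketch independent of whichever build and deployment tooling is eventually chosen.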