Wikimedia Cloud Services team/EnhancementProposals/Toolforge API gateway

From Wikitech

This document is a proposal to set up a single API ingress/gateway solution for Toolforge APIs, such as the Jobs API and the upcoming Build service API.

Background

The Toolforge platform is moving from a Grid Engine-based platform to a Kubernetes-based one. As part of this work, we have been building software to make it easier to interact with the Kubernetes platform and to add the missing features that will let all tools migrate.

These new components are generally designed API-first: the user-facing interfaces (such as CLI tools) interact with a custom-built API service, which in turn interacts with Kubernetes and the other software doing the actual work.[1] Currently, one of these components (the Jobs API) is deployed in the live Toolforge cluster and a second one (the Build service API) is in development. It is reasonable to expect more in the future.

We currently expose the Jobs API directly as a NodePort service fronted by an HAProxy load balancer in TCP mode. This works fine for one service, but the same scaffolding has to be repeated for every additional service, so the overhead grows with the number of services.

Proposal

We will build a new service, called the Toolforge API gateway, which will take all incoming requests, authenticate them, and route them to the backend service based on the URL path. The initial implementation of this service will be very simple: it will be Nginx running in Kubernetes with configuration generated from data in Helm values.[2]
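As a rough illustration of generating Nginx configuration from values, the sketch below renders a server block from a mapping of path prefixes to backends. The `VALUES` schema, the port, and the backend addresses are hypothetical placeholders rather than the real chart's layout, and in practice the rendering would be done by Helm templates rather than Python.

```python
from textwrap import indent

# Hypothetical stand-in for the Helm values; the keys and backend
# addresses below are illustrative, not the real chart's schema.
VALUES = {
    "listen_port": 8443,
    "routes": {
        "/jobs": "https://jobs-api.example.svc:8443",
        "/builds": "https://builds-api.example.svc:8443",
    },
}


def render_location(path: str, upstream: str) -> str:
    """Render one Nginx location block that proxies a path prefix."""
    return (
        f"location {path}/ {{\n"
        f"    proxy_pass {upstream};\n"
        f"}}"
    )


def render_server(values: dict) -> str:
    """Render a minimal Nginx server block from the values mapping."""
    locations = "\n".join(
        render_location(path, upstream)
        for path, upstream in sorted(values["routes"].items())
    )
    return (
        "server {\n"
        f"    listen {values['listen_port']} ssl;\n"
        f"{indent(locations, '    ')}\n"
        "}"
    )


if __name__ == "__main__":
    print(render_server(VALUES))
```

With this shape, adding a new API would only require a new entry under `routes`. The TLS certificate directives are deliberately left out of the sketch.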

Access

The gateway will be exposed as a NodePort fronted by HAProxy, very similar to the current Jobs API design. There will be an internal service DNS name for the clients to use. Future work may include making the service available to the public, but that is currently not in scope for the initial implementation.

Traffic will be routed to the backends based on the URL path. For example, /jobs will be routed to the Jobs API and /builds to the Build service API.
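The routing rule can be sketched as a longest-prefix match on whole path segments. This is only a model of the behaviour we expect from the Nginx configuration. The backend names in `ROUTES` and the `pick_backend` helper are hypothetical.

```python
from typing import Optional

# Hypothetical route table: path prefix -> backend name.
ROUTES = {
    "/jobs": "jobs-api",
    "/builds": "builds-api",
}


def pick_backend(path: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the backend for the longest matching path prefix, or None."""
    best = None
    for prefix, backend in routes.items():
        # Match on whole path segments so "/jobsfoo" does not hit "/jobs".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, backend)
    return best[1] if best else None
```

Requests whose path matches no prefix return `None` here, which would correspond to the gateway answering with an error instead of forwarding the request.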

All API requests must be TLS encrypted. For now, the API gateway will use a certificate signed by the Kubernetes internal CA like the Jobs API does, but in the future we could switch it to live Let's Encrypt certificates or something else.

Authentication

As this new service will terminate TLS for incoming API requests, it will also need to perform the TLS client certificate authentication that we currently use to authenticate requests.
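To make the requirement concrete, here is a minimal sketch of a TLS server context that refuses clients without a valid certificate, plus a helper that reads the identity from the verified certificate. It uses Python's standard `ssl` module purely for illustration, since the gateway itself is planned to be Nginx. The function names and file parameters are hypothetical, and treating the subject commonName as the caller's identity is an assumption.

```python
import ssl
from typing import Optional


def build_gateway_context(ca_file: Optional[str] = None,
                          cert_file: Optional[str] = None,
                          key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject the handshake unless the client presents a certificate
    # that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA used to verify client certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The gateway's own serving certificate and key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def common_name(peer_cert: dict) -> Optional[str]:
    """Extract the subject commonName from SSLSocket.getpeercert() output."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None
```

After a successful handshake, the gateway could pass the extracted name to the backend, for example in a request header, so that the backend does not need to repeat the certificate check.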

Requests from the gateway to the backends will also be TLS encrypted. For this purpose we can use self-signed certificates from cert-manager.[3] As a bonus, we can use client certificates from the same CA to authenticate traffic between the gateway and the backend service.
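The gateway side of that connection might look like the following sketch: a client context that trusts only the internal CA and presents its own client certificate to the backend. Again this uses Python's `ssl` module as a stand-in for the equivalent Nginx proxy settings. The function name and all file parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_backend_client_context(ca_file: Optional[str] = None,
                                 cert_file: Optional[str] = None,
                                 key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for gateway-to-backend requests."""
    # With ca_file set, only that CA is trusted; with None, the system
    # default trust store is loaded instead.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Client certificate the backend can use to authenticate the gateway.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The default context verifies both the backend's certificate chain and its hostname, so the backend certificates would need to carry the service DNS names the gateway connects to.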

Possible further work

Once this has been implemented, several possibilities open up. For example, we could update the authentication mechanism so that it does not rely on internal Kubernetes certificates. After that is done, we could open the APIs to external consumers, making it possible to use Toolforge in new ways, such as with Terraform, with a locally installed CLI tool, or with a web interface built into Striker.

Notes

  1. phab:T326136
  2. We want the service to be separated from the main Toolforge ingress for security and reliability reasons. Given the small number of services, which will not change often, we do not need a separate ingress controller.
  3. cert-manager is not currently deployed on Toolforge, but phab:T292238 would also need it.