GitLab/Gitlab Runner/Cloud Runners


A set of instance-wide Shared Runners offers CI capabilities for unreviewed (that is, untrusted) code in private projects and forks. These Runners are available to every project by default and give the community and volunteers CI for every code change.

Public Cloud Evaluation

Shared Cloud Runners execute unreviewed code for private projects and forks. This means a Runner could be compromised or made to run malicious code. The Runners should therefore be ephemeral, so that a compromise cannot persist and affect other jobs, especially production jobs. To prevent resource abuse, the Runners should be subject to quotas (maximum CPU and memory, execution timeout).

These requirements could be met by a public cloud provider such as AWS, GCP or DigitalOcean. These providers offer high elasticity, ephemeral machines, quotas and proper separation from production.

To support ephemeral, isolated and resource-limited Runners, either the Docker or the Kubernetes executor has to be used. The Kubernetes executor offers more flexibility for resource quotas and for adding and removing ephemeral Runner hosts. The goal should therefore be to use a managed Kubernetes offering from a cloud provider together with the Kubernetes executor. A further benefit of Kubernetes is that migration to a different provider or to WMCS should be easier: the integration with GitLab stays the same, and only the provisioning of the underlying infrastructure changes.
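
The resource quotas mentioned above could be expressed in Kubernetes roughly as follows. This is a minimal sketch, not the configuration actually deployed: the namespace, object names and all numeric values are hypothetical. A ResourceQuota caps the total resources of all job pods in a namespace, and a LimitRange supplies per-container defaults so that pods without explicit requests are still admitted under the quota.

```yaml
# Hypothetical namespace, names and values; adjust to the actual cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ci-jobs-quota
  namespace: gitlab-runner
spec:
  hard:
    pods: "20"              # upper bound on concurrently existing job pods
    requests.cpu: "16"      # sum of CPU requests across all pods
    requests.memory: 32Gi   # sum of memory requests across all pods
    limits.cpu: "32"        # sum of CPU limits across all pods
    limits.memory: 64Gi     # sum of memory limits across all pods
---
apiVersion: v1
kind: LimitRange
metadata:
  name: ci-jobs-defaults
  namespace: gitlab-runner
spec:
  limits:
    - type: Container
      default:              # limits applied when a container sets none
        cpu: "1"
        memory: 2Gi
      defaultRequest:       # requests applied when a container sets none
        cpu: 500m
        memory: 1Gi
```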

Criterion                     | AWS                           | GCP                   | DigitalOcean
Access/existing subscription  | account exists, security-team | ? unknown             | account exists, dedicated project for RelEng
Managed Kubernetes            |                               |                       |
Auto scaling nodepools        |                               |                       |
API/Infrastructure as code    | ✔+ terraform provider         | ✔+ terraform provider | ✔+ terraform provider

API and automation

The cloud provider should also offer an interface for setting up all components through code or an API. The environment for Cloud Runners should be provisioned using APIs and/or infrastructure as code. Although the environment is expected to be very small, with only one managed Kubernetes cluster, this automation enables anyone to set it up. It also serves as additional documentation, for example when migrating to a different environment or re-creating the existing one.

The industry-standard tool for provisioning cloud resources is Terraform. Terraform offers integrations for a wide range of cloud providers, including those mentioned above. GitLab offers a Terraform integration and management of a shared state. Several cloud providers also have their own IaC tool, such as Deployment Manager or CloudFormation. Using provider-specific tooling makes it harder to switch between environments, so Terraform should be favored.

Provider Dependency

The Runners do not perform critical CI/CD jobs, so a dependency on a public cloud provider does not affect the ability to deploy production code. Termination of such a public cloud offering would disrupt private CI jobs from the community and volunteers. Private CI is not available in Gerrit, so the worst case would be falling back to the functionality Gerrit offers.

Because the setup relies on open source and well-known tools like Terraform and Kubernetes, Cloud Runners can be migrated to a different environment if necessary.

Provisioning of Runner environment

A dedicated GitLab project, /repos/releng/cloud-runners, holds all of the Terraform code for provisioning the environment. This project has a pipeline that executes the Terraform commands needed to create the cloud environment.

The README.md contains further instructions for provisioning the Cloud Runners.
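
A pipeline that runs Terraform typically validates the code, creates a plan and then applies that plan. The following .gitlab-ci.yml is a minimal sketch of that flow and not the pipeline actually used in the project: the job names, the stages, the hashicorp/terraform image and the manual apply rule are assumptions, and provider credentials and the state backend are left out.

```yaml
# Hypothetical pipeline sketch; the real pipeline lives in the project repository.
stages:
  - validate
  - plan
  - apply

default:
  image:
    name: hashicorp/terraform:latest  # assumed image; pin a specific version in practice
    entrypoint: [""]                  # the image's entrypoint is terraform itself, so reset it
  before_script:
    - terraform init -input=false     # each job starts from a fresh container

validate:
  stage: validate
  script:
    - terraform validate

plan:
  stage: plan
  script:
    - terraform plan -input=false -out=plan.tfplan
  artifacts:
    paths:
      - plan.tfplan                   # hand the saved plan to the apply job

apply:
  stage: apply
  script:
    - terraform apply -input=false plan.tfplan
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual                    # require a human to trigger the apply
```

Applying a saved plan file rather than re-planning in the apply job means that what was reviewed is exactly what gets executed.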

Integration of GitLab Runner into Kubernetes Cluster

GitLab offers a dedicated Kubernetes executor. This executor interfaces with the Kubernetes API to create pods for CI jobs. The executor can be installed using the official gitlab/gitlab-runner Helm chart.
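
The chart is configured through a values file. The sketch below shows what such a file might look like, assuming a recent chart version; it is not the values file used for the Cloud Runners. The GitLab URL, the Secret name and all resource figures are placeholders, and the available keys can differ between chart versions. The Runner's own TOML configuration is embedded under runners.config.

```yaml
# Hypothetical values.yaml for the gitlab/gitlab-runner chart.
gitlabUrl: https://gitlab.example.org/   # placeholder URL
concurrent: 10                           # maximum number of jobs run at once
rbac:
  create: true                           # let the chart create the required roles
runners:
  secret: gitlab-runner-secret           # placeholder: existing Secret holding the runner token
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "docker-registry.wikimedia.org/bullseye:20221127"
        # Per-job resource limits (illustrative values)
        cpu_request = "500m"
        cpu_limit = "1"
        memory_request = "1Gi"
        memory_limit = "2Gi"
```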

The Helm chart is applied by a dedicated CI job in a GitLab CI pipeline (in /repos/releng/cloud-runners).
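
Such a CI job could look roughly like the sketch below. It is an illustration only: the job name, the alpine/helm image, the release name, the namespace and the values.yaml file name are assumptions, and access to the cluster (for example through a kubeconfig provided as a CI variable) is assumed to be configured already.

```yaml
# Hypothetical job; names, image and namespace are assumptions.
deploy-runner:
  stage: deploy
  image:
    name: alpine/helm:latest   # assumed image; pin a specific version in practice
    entrypoint: [""]           # the image's entrypoint is helm itself, so reset it
  script:
    - helm repo add gitlab https://charts.gitlab.io
    - helm repo update
    # Install the release on first run, upgrade it on later runs
    - >-
      helm upgrade --install gitlab-runner gitlab/gitlab-runner
      --namespace gitlab-runner --create-namespace
      --values values.yaml
```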

Usage of Cloud Runners

Cloud Runners are CI workers that are available to every project by default, so CI jobs should be picked up by them without any additional configuration. To force a job to run on Cloud Runners, use the tag cloud (and kubernetes). Every job that must run only on Cloud Runners should specify this tag:

cloud-build-job:
  stage: cloud-build
  image: docker-registry.wikimedia.org/bullseye:20221127
  script:
    - echo "Compiling the code..."
    - echo "Compile complete."
  tags:
    - cloud # run job on Cloud Runners only