Wikimedia Cloud Services team/EnhancementProposals/Decision record T304060 How to manage quotas in Toolforge Build service


Origin task: phab:T304060

Date of the decision: 2022-05-04

People in the decision meeting (alphabetical order):

Decision taken

Option 1 was chosen.

Direct responsible individual

Rationale

Given that it is the simplest option, we will go with it, and reconsider after the beta once we have user input.

Problem

The build service will enable every user to trigger a build. Without per-user quotas, a single user could end up causing a denial of service for everyone else.

Currently the build service will enforce quotas at the namespace level, but not at the user level.
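The effect of enforcing a quota only at the namespace level can be illustrated with a toy model. This is a minimal sketch, not the build service's actual code: the `NamespaceQuota` class, the limit of 3 running builds, and the tool names are all hypothetical and chosen only for illustration.

```python
from collections import Counter


class NamespaceQuota:
    """Toy model of a quota shared by everyone in one namespace (hypothetical)."""

    def __init__(self, max_running_builds: int) -> None:
        self.max_running_builds = max_running_builds
        self.running = Counter()  # user -> number of running builds

    def try_start_build(self, user: str) -> bool:
        # Only the namespace-wide total is checked; who is asking is ignored.
        if sum(self.running.values()) >= self.max_running_builds:
            return False
        self.running[user] += 1
        return True


# Assumed limit of 3 concurrent builds for the whole shared namespace.
quota = NamespaceQuota(max_running_builds=3)

# One user fills the shared quota...
results_a = [quota.try_start_build("tool-a") for _ in range(3)]
# ...and a second user is rejected even though they have no builds running.
result_b = quota.try_start_build("tool-b")

print(results_a, result_b)
```

In this model the second user is blocked purely by the first user's builds, which is the denial-of-service scenario described above.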

Original design with one namespace and rationale: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Buildpack_Implementation#Namespacing

Note that the scope of this decision is the first iteration of the project. For the long term, other setups are being considered and will be decided on later (after getting feedback from the beta).

Constraints and risks

  • The risk of not doing anything is that one user can block builds for everyone else.

Options considered

Option 1

Keep only the shared namespace quota.

Pros:

  • No extra work needed.

Cons:

  • Does not prevent one user from affecting the others.

Option 2

Create one namespace per user/tool.

Pros:

  • Each user then has their own namespace, with their own quotas.

Cons:

  • Not sure if it's possible with the current solutions (Tekton), or how many modifications it would require
  • Will add extra maintenance costs, as those namespaces will have to be updated every time we update the system (pipelines for now)
    • This will require changing maintain_kubeusers to create those namespaces, their quotas, and related resources
    • This also complicates development/testing

Option 3

Re-use each user's existing namespace.

Pros:

  • Each user then has their own namespace, with their own quotas.

Cons:

  • Not sure if it's possible with the current solutions (Tekton), or how many modifications it would require
  • Will add extra maintenance costs, as those namespaces will have to be updated every time we update the system (pipelines for now)
  • This allows users to control everything in the pipeline, making it impossible to store any secrets that are not user-owned (e.g. robot accounts to push to the repos).
    • That means changing the whole security model, and introducing quotas/tenants on the Harbor side.

Option 4

Wrap it in a web service. Instead of relying on Kubernetes limiting tools, which are bound to a workspace and a resource, build our own admission controller (or similar) that limits (also, and first) on higher-level abstractions such as 'build request'. Tekton does not implement this yet, but there are talks about implementing it there (in a more generic and nuanced way than we need).

Pros:

  • Fine-grained control of quotas/limits
  • Flexibility to change it in any direction we need in the future
  • Might be part of the next step for the Toolforge build service (TBD, but mainly moving from a smart client to an API-based service, like the jobs framework)

Cons:

  • Some extra code to write, maintain and deploy.
  • Will take longer to implement (maybe better as a second step)
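
The kind of limiting on 'build request' that Option 4 describes could be sketched roughly as follows. This is only an illustration under stated assumptions: `BuildRequestLimiter`, its `admit`/`finish` methods, and the limits used are hypothetical names and values, and no real admission controller or Tekton API is involved.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class BuildRequestLimiter:
    """Hypothetical limiter that counts build requests per user and overall."""

    max_per_user: int
    max_total: int
    running: Counter = field(default_factory=Counter)  # user -> running builds

    def admit(self, user: str) -> Tuple[bool, str]:
        # Check the per-user limit first, so one user cannot fill the global one.
        if self.running[user] >= self.max_per_user:
            return False, "per-user limit reached"
        if sum(self.running.values()) >= self.max_total:
            return False, "global limit reached"
        self.running[user] += 1
        return True, "admitted"

    def finish(self, user: str) -> None:
        # Called when a build ends, freeing one slot for that user.
        if self.running[user] > 0:
            self.running[user] -= 1


# Assumed limits: 2 builds per user, 3 builds in total.
limiter = BuildRequestLimiter(max_per_user=2, max_total=3)

decisions_a = [limiter.admit("tool-a") for _ in range(3)]
decision_b = limiter.admit("tool-b")

print(decisions_a)
print(decision_b)
```

With these assumed limits, the third request from the same user is rejected while a request from a different user is still admitted, which is the isolation that Option 1 does not provide.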