Wikimedia Cloud Services team/EnhancementProposals/Decision record T303931 k8s standard deployment code pattern

Origin task: phab:T303931

Date of the decision: 2022-03-23

People in the decision meeting:

User:David_Caro
Nicholas Skaggs
Vivian Rook
Komla Sapaty
Andrew Bogott
Slavina Stefanova
Bryan Davis
Arturo Borrero González

Decision taken

Option 3 was chosen: have a deploy.sh script by default, with the following action points:

Test helmfile (task T304532).
Standardize the deploy.sh file to use helmfile if no blockers found.
Use kustomize if any blockers are found.

Rationale

We think this deploy.sh approach is the most flexible, future-proof, easy to understand and valuable option.

However, we also recognize the value of helm and helmfile as potentially the right solution to live under this standard deploy.sh script.
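
As an illustration only, a deploy.sh that delegates to helmfile could look roughly like the sketch below. It assumes things this record does not specify: that the target environment can be read from a file on the host (the /etc/wmcs-project path and the PROJECT_FILE variable are assumptions), and that the repository's helmfile.yaml defines environments named tools, toolsbeta and paws.

```shell
#!/bin/sh
# Hypothetical ./deploy.sh that delegates to helmfile. Takes no arguments.

# Assumption: this file names the project of the cluster we are running on.
PROJECT_FILE="${PROJECT_FILE:-/etc/wmcs-project}"

# Print the environment name, or return non-zero if it cannot be determined.
detect_environment() {
    if [ ! -r "$PROJECT_FILE" ]; then
        echo "cannot read $PROJECT_FILE" >&2
        return 1
    fi
    project="$(cat "$PROJECT_FILE")"
    case "$project" in
        tools|toolsbeta|paws) echo "$project" ;;
        *) echo "unknown environment: $project" >&2; return 1 ;;
    esac
}

# Apply the releases for one environment.
# Assumption: helmfile.yaml defines an environment with the same name.
deploy() {
    helmfile -e "$1" apply
}

main() {
    environment="$(detect_environment)" || return 1
    deploy "$environment"
}

# Run only when invoked as deploy.sh, so the functions can also be sourced.
case "${0##*/}" in
    deploy.sh) set -eu; cd "$(dirname "$0")"; main ;;
esac
```

The name check at the bottom keeps the script argument-free when run as ./deploy.sh, while still letting the functions be sourced elsewhere, for example from tests. If helmfile turns out to have blockers, only the body of deploy would need to change.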

Problem

We have a number of things that deploy into kubernetes, for example:

toolforge custom admission controllers (currently 3 and growing)
toolforge & paws maintain_kubeusers
toolforge jobs framework components (currently 2, api and emailer)
toolforge components deployed from operations/puppet.git such as the ingress setup and other pieces
paws stuff

(and potentially more that I'm overlooking at the moment).

Each of the items listed above has a different deployment code pattern. For example:

a `deploy.sh` script with some logic inside it
a `kustomize`-based setup
a `helm`-based setup
a raw `kubectl apply` call
some combination of all of the above

For a number of reasons, there is no written agreement on which deployment code pattern to use for a given repository.

NOTE: we have a number of kubernetes clusters maintained by WMCS: tools, toolsbeta, paws, and potentially more in the future. This request covers all software components for k8s clusters maintained by WMCS.

Constraints and risks

Some additional notes.

certificates

Some components need x509 certificate generation and/or other credential management. Ideally, the option we choose can handle the required certificate and credential management.
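
To make the requirement concrete, here is a rough sketch of the kind of logic involved. It is not how any of our components currently obtain certificates: it generates a self-signed certificate purely for illustration, and the function names, file names and 30-day validity are all assumptions.

```shell
#!/bin/sh
# Hypothetical helpers that a deployment script could call before applying
# manifests. Self-signed certificates are used here only for illustration.

# Generate a private key and a self-signed certificate.
# $1: output directory, $2: certificate common name
generate_cert() {
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$1/tls.key" -out "$1/tls.crt" \
        -days 30 -subj "/CN=$2"
}

# Store the key pair as a TLS secret, creating or updating it in place.
# $1: directory with tls.crt and tls.key, $2: secret name, $3: namespace
publish_cert() {
    kubectl create secret tls "$2" --namespace "$3" \
        --cert "$1/tls.crt" --key "$1/tls.key" \
        --dry-run=client -o yaml | kubectl apply -f -
}
```

A caller would run generate_cert on a temporary directory and then publish_cert with the secret name and namespace the component expects. Whichever option is chosen needs room for a step like this.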

deployment mechanism

We should perhaps consider this 'deployment code pattern' different from the 'deployment mechanism'.

Let 'deployment mechanism' be the way in which we trigger this deployment. At the moment the options are:

100% manual. A human runs a command on a server.
somewhat automated: by means of a spicerack cookbook, puppet agent run, some other script, or whatever.
CI/CD pipeline, for example for PAWS, which is currently based on GitHub Actions, I believe.

Please note that the 'deployment code pattern' concept is independent of the 'deployment mechanism'. We could automate helm, kustomize or whatever, once we decide which one to use.

Deciding on 'deployment mechanism' (or automation level/mode) is out of scope of this request.

Note, however, that deciding on this request will greatly benefit us later on when we start automating things.

new standard, who makes the changes?

If we introduce a new standard, we will need updates to several code repositories. That could be a lot of work.

The author of this request volunteers to do the work once the standard has been decided.

Decision record

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T303931_k8s_standard_deployment_code_pattern

Options

Option 1

Use helm https://helm.sh/

The proposed standard files are as follows:

topdir/
topdir/helmchart/
topdir/helmchart/values.yaml            <--- base file
topdir/helmchart/values-toolsbeta.yaml  <--- toolsbeta-specific overrides
topdir/helmchart/values-tools.yaml      <--- toolforge-specific overrides
topdir/helmchart/values-paws.yaml       <--- paws-specific overrides
topdir/helmchart/values-devel.yaml      <--- additional, arbitrary overrides are allowed

(yes, some components deploy into the 3 environments, maintain-kubeusers is a good example)

Example of patch introducing this layout to one of our custom components:

https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-emailer/+/747107

Example of manual operations using helm:

user@machine:~$ helm install --debug --dry-run app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
user@machine:~$ helm install --debug --dry-run app-name ./helmchart -f helmchart/values-tools.yaml
[..]
user@machine:~$ helm install app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
# to upgrade:
user@machine:~$ helm diff upgrade app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
user@machine:~$ helm upgrade app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]

(but again, the deployment mechanism is not covered in this request)

Pros:

Industry standard to deploy stuff in k8s.
Standard within other SRE teams @ WMF.
Has a concrete specification on how to lay out a given directory.

Cons:

Is a package manager, mostly aimed at "apps". Many of our components are not "apps", but simple pieces of code that do something.
not integrated by default in kubernetes (kubectl etc)
somewhat noisier code than with kustomize.
the concrete specification on how to lay out a given directory could be a handicap in some cases.
some unknowns for x509 certificate generation & management.

Option 2

Use kustomize https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/

The proposed directory tree layout is as follows:

topdir/
topdir/deployment/
topdir/deployment/base/           <--- the base yaml
topdir/deployment/tools/          <--- the toolforge-specific overrides
topdir/deployment/toolsbeta/      <--- the toolsbeta-specific overrides
topdir/deployment/paws/           <--- the paws-specific overrides
topdir/deployment/devel-whatever/ <--- additional overrides are allowed

Example of patch introducing this layout to one of our custom components:

https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-emailer/+/769694

Example of manual operations using kustomize:

user@machine:~$ kubectl get -k deployment/toolsbeta
[..]
user@machine:~$ kubectl apply -k deployment/toolsbeta
[..]
user@machine:~$ kubectl diff -k deployment/toolsbeta
[..]

(but again, the deployment mechanism is not covered in this request)

Pros:

Industry standard to deploy stuff in k8s.
Integrated by default in kubernetes (via kubectl).
simple and to the point.

Cons:

kustomize is not a full-fledged standardized ecosystem (e.g. it defines no repository format of its own) and would rely on us introducing an explicit layout.
apparently less sugar & magic than helm (but do we need that?)
you need to manually delete removed kubernetes resources
some unknowns for x509 certificate generation & management.

Option 3

Use whatever, but have a common entry point ./deploy.sh.

This option assumes that each component has its particularities, and that we want to retain flexibility above all.

To achieve this, each k8s component will have an executable ./deploy.sh file in its top-level directory which will do all the magic. The magic can be helm, kustomize, or whatever; we don't care as long as it works. This executable script receives no input arguments (or at least should work out of the box with none).
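
Since the only contract here is "an executable ./deploy.sh at the top level", compliance is easy to check mechanically. The sketch below is one hypothetical way to do that over a set of local checkouts; the function names and the output format are made up for illustration.

```shell
#!/bin/sh
# Hypothetical check that local checkouts follow the ./deploy.sh convention.

# Succeed if directory $1 contains an executable regular file named deploy.sh.
has_standard_entry_point() {
    [ -f "$1/deploy.sh" ] && [ -x "$1/deploy.sh" ]
}

# Print one line per directory given as an argument.
# Return non-zero if any of them lacks the entry point.
check_repositories() {
    status=0
    for repo in "$@"; do
        if has_standard_entry_point "$repo"; then
            echo "ok       $repo"
        else
            echo "missing  $repo"
            status=1
        fi
    done
    return "$status"
}
```

Running check_repositories over the component checkouts would list which ones still need a deploy.sh. Note that it only checks that the file exists and is executable, not what it does, which is exactly the flexibility this option is after.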

Pros:

Simple, efficient, flexible.
Perhaps the most sensible way to handle x509 certificate generation (since we can have arbitrary logic here).

Cons:

Less elegant maybe?
Perhaps assuming we need the additional flexibility is overly defensive and this will bite us in the future.