Wikimedia Cloud Services team/EnhancementProposals/Decision record T362233 Toolforge policy agent
Origin task: phab:T362233
Date of the decision: 2024-04-30
People in the decision meeting (alphabetical order):
- Andrew Bogott
- Arturo Borrero
- David Caro
- Francesco Negri
- Taavi Väänänen
Decision taken
Option 3 was chosen.
Rationale
Option 3 received approval, with some additional caveats:
We want to drop Kyverno in favor of ValidatingAdmissionPolicies after we upgrade K8s to 1.26 and before we upgrade it to 1.29. If we reach the 1.29 upgrade and are still using Kyverno, we will hold a new decision request to agree on a new plan.
The follow-up work is tracked in task T364293.
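The caveat above amounts to a small decision rule. A minimal sketch follows, assuming the migration window covers k8s 1.26 through 1.28; the `Action` enum and `followup_action` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing to do"
    KEEP_KYVERNO = "keep Kyverno for now"
    MIGRATE_TO_VAP = "migrate policies to ValidatingAdmissionPolicies"
    NEW_DECISION_REQUEST = "hold a new decision request"


def followup_action(k8s_minor: int, kyverno_in_use: bool) -> Action:
    """Map a cluster state to the follow-up described in the rationale.

    k8s_minor is the minor version of a 1.x cluster (26 means k8s 1.26).
    """
    if not kyverno_in_use:
        return Action.NOTHING
    if k8s_minor < 26:
        # VAPs are not available yet, so Kyverno stays.
        return Action.KEEP_KYVERNO
    if k8s_minor < 29:
        # Assumed migration window: after the 1.26 upgrade, before 1.29.
        return Action.MIGRATE_TO_VAP
    # Reached 1.29 while still on Kyverno: agree on a new plan.
    return Action.NEW_DECISION_REQUEST
```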
Problem
We need to decide on the implementation for a policy agent for Toolforge Kubernetes. This policy agent should replace the Pod Security Policy mechanism, a core Toolforge security function, that is being removed from Kubernetes in version 1.25. See also: {T279110}
Constraints and risks
- TBD.
Decision record
In progress.
Options
Option 1
Adopt Kyverno (https://kyverno.io/) with no target date for migrating away from it.
This is a CNCF incubating project, with about [[ https://landscape.cncf.io/?item=provisioning--security-compliance--kyverno | 300 different contributors since its inception in 2019 ]].
This software was originally created by Nirmata, a company that offers a number of additional enterprise services based on it, in particular:
{F45640334}
Kyverno is simple to work with, was designed specifically for Kubernetes, and supports [[ https://kyverno.io/docs/writing-policies/validate/#common-expression-language-cel | writing policies in CEL ]]. CEL has since been adopted by upstream Kubernetes for [[ https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ | Validating Admission Policies ]] (VAPs, starting in k8s 1.26).
This means that if we adopt Kyverno today, then once we reach k8s 1.26 we can consider migrating our policies, unchanged, to the native VAPs, thus removing the need for Kyverno itself.
In either Kyverno's original policy language or CEL, policies are simple and straightforward to work with.
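To make the "unchanged policies" idea concrete, here is a rough sketch in which the same CEL expression is embedded in a Kyverno `ClusterPolicy` and in a native `ValidatingAdmissionPolicy`, with the manifests written as Python dictionaries. The rule itself (rejecting hostPath volumes) is a made-up example, not an actual Toolforge policy. The field layout follows the upstream documentation as we understand it, and the `apiVersion` of the native resource is an assumption that depends on the Kubernetes release.

```python
# Illustrative CEL rule (not a real Toolforge policy): reject hostPath volumes.
CEL_EXPRESSION = (
    "!has(object.spec.volumes) || "
    "object.spec.volumes.all(v, !has(v.hostPath))"
)
MESSAGE = "hostPath volumes are not allowed"

# Kyverno ClusterPolicy using a CEL validate rule (needs Kyverno 1.11+).
kyverno_policy = {
    "apiVersion": "kyverno.io/v1",
    "kind": "ClusterPolicy",
    "metadata": {"name": "disallow-host-path"},
    "spec": {
        "validationFailureAction": "Enforce",
        "rules": [
            {
                "name": "host-path",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "cel": {
                        "expressions": [
                            {"expression": CEL_EXPRESSION, "message": MESSAGE}
                        ]
                    }
                },
            }
        ],
    },
}

# Native ValidatingAdmissionPolicy. The apiVersion is an assumption:
# v1alpha1 in k8s 1.26, later promoted to v1beta1 and then v1.
native_policy = {
    "apiVersion": "admissionregistration.k8s.io/v1alpha1",
    "kind": "ValidatingAdmissionPolicy",
    "metadata": {"name": "disallow-host-path"},
    "spec": {
        "failurePolicy": "Fail",
        "matchConstraints": {
            "resourceRules": [
                {
                    "apiGroups": [""],
                    "apiVersions": ["v1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["pods"],
                }
            ]
        },
        "validations": [{"expression": CEL_EXPRESSION, "message": MESSAGE}],
    },
}


def kyverno_cel_expressions(policy):
    """Collect the CEL expressions from every rule of a Kyverno policy."""
    return [
        entry["expression"]
        for rule in policy["spec"]["rules"]
        for entry in rule["validate"]["cel"]["expressions"]
    ]


def vap_cel_expressions(policy):
    """Collect the CEL expressions of a ValidatingAdmissionPolicy."""
    return [v["expression"] for v in policy["spec"]["validations"]]
```

Only the expression carries over verbatim; the surrounding match rules differ between the two resources, and a native policy additionally needs a `ValidatingAdmissionPolicyBinding` before it takes effect.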
If we adopted Kyverno, the sequence of events could be:
1) In k8s 1.24, we adopt Kyverno 1.10, with policies in the native language.
2) We migrate from k8s 1.24 to 1.25.
3) We upgrade Kyverno to 1.11.
4) We translate policies from the native language to CEL (or fetch them from the policy registry, where they will likely be).
5) We migrate from k8s 1.25 to 1.26.
6) We evaluate dropping Kyverno in favor of VAPs.
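The ordering of these steps follows from the version constraints mentioned in this proposal. The sketch below encodes only those constraints (CEL in Kyverno needs 1.11, Kyverno 1.11 needs k8s 1.25, VAPs need k8s 1.26) and checks each intermediate state of the plan. The `State` class and `problems` function are hypothetical helpers, and the final state assumes that step 6 ends with Kyverno removed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int]

# Version constraints as stated in this proposal.
KYVERNO_CEL_MIN: Version = (1, 11)      # first Kyverno release with CEL
KYVERNO_CEL_MIN_K8S: Version = (1, 25)  # k8s required by Kyverno 1.11
VAP_MIN_K8S: Version = (1, 26)          # first k8s release with VAPs


@dataclass(frozen=True)
class State:
    k8s: Version
    kyverno: Optional[Version]  # None means policies run as native VAPs
    language: str               # "kyverno-native" or "cel"


def problems(state: State) -> List[str]:
    """Return the constraints from this proposal that a state violates."""
    found = []
    if state.kyverno is None:
        if state.k8s < VAP_MIN_K8S:
            found.append("VAPs need k8s 1.26")
        if state.language != "cel":
            found.append("VAPs are written in CEL")
        return found
    if state.kyverno >= KYVERNO_CEL_MIN and state.k8s < KYVERNO_CEL_MIN_K8S:
        found.append("Kyverno 1.11 requires k8s 1.25")
    if state.language == "cel" and state.kyverno < KYVERNO_CEL_MIN:
        found.append("CEL policies need Kyverno 1.11")
    return found


# The six steps above, as the state reached after each one.
PLAN = [
    State((1, 24), (1, 10), "kyverno-native"),
    State((1, 25), (1, 10), "kyverno-native"),
    State((1, 25), (1, 11), "kyverno-native"),
    State((1, 25), (1, 11), "cel"),
    State((1, 26), (1, 11), "cel"),
    State((1, 26), None, "cel"),
]

for step, state in enumerate(PLAN, start=1):
    print(step, state, problems(state) or "ok")
```

Reordering the steps, for example upgrading Kyverno to 1.11 while still on k8s 1.24, makes `problems` report the violated constraint.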
Pros:
- Simplified workflow for writing policies, compared to OPA gatekeeper (no template indirection).
- Apparently stable native CEL language support.
- Has a more or less sensible migration path towards VAPs.
Cons:
- Native CEL support is only available starting with Kyverno 1.11, which requires k8s 1.25, meaning we cannot adopt CEL directly in k8s 1.24.
- Kyverno is pushed mainly by a single company that has an enterprise version on top of it.
- CNCF incubating project (some risk of the project changing direction)
Option 2
Adopt Open Policy Agent Gatekeeper. https://open-policy-agent.github.io/gatekeeper/website/
This is a CNCF graduated project, with about [[ https://landscape.cncf.io/?item=provisioning--security-compliance--open-policy-agent-opa | 450 different contributors since its inception in 2015 ]].
This software was originally created by Styra, a company that offers many additional enterprise services based on it.
OPA Gatekeeper is more complex and apparently a bit "uglier" compared to Kyverno. Policies have a template indirection, which means you need to create a policy template, then a policy instance.
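As a rough illustration of the template indirection, the sketch below models the two objects Gatekeeper needs for a single rule: a `ConstraintTemplate` holding the Rego code, and a constraint whose `kind` is defined by that template. It is adapted from the well-known required-labels example in the upstream Gatekeeper documentation, not from any Toolforge policy; the manifests are written as Python dictionaries, and the `instantiates` helper is hypothetical.

```python
# Rego code evaluated by Gatekeeper for every matching admission request.
REGO = """
package k8srequiredlabels

violation[{"msg": msg}] {
  provided := {label | input.review.object.metadata.labels[label]}
  required := {label | label := input.parameters.labels[_]}
  missing := required - provided
  count(missing) > 0
  msg := sprintf("missing required labels: %v", [missing])
}
"""

# Object 1: the template. It defines a new kind and carries the logic.
constraint_template = {
    "apiVersion": "templates.gatekeeper.sh/v1",
    "kind": "ConstraintTemplate",
    "metadata": {"name": "k8srequiredlabels"},
    "spec": {
        "crd": {
            "spec": {
                "names": {"kind": "K8sRequiredLabels"},
                "validation": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "properties": {
                            "labels": {
                                "type": "array",
                                "items": {"type": "string"},
                            }
                        },
                    }
                },
            }
        },
        "targets": [{"target": "admission.k8s.gatekeeper.sh", "rego": REGO}],
    },
}

# Object 2: the instance. It selects resources and supplies parameters.
constraint = {
    "apiVersion": "constraints.gatekeeper.sh/v1beta1",
    "kind": "K8sRequiredLabels",
    "metadata": {"name": "pods-must-have-owner"},
    "spec": {
        "match": {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]},
        "parameters": {"labels": ["owner"]},
    },
}


def instantiates(template, instance):
    """True if the constraint's kind is the one the template defines."""
    declared = template["spec"]["crd"]["spec"]["names"]["kind"]
    return instance["kind"] == declared
```

A Kyverno policy, by contrast, is a single object that contains both the match rules and the validation logic.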
Policies are written in Rego, a domain-specific language. Apparently, Rego is still receiving stabilization changes, as reported by the maintainers at KubeCon EU in Paris.
Pros:
- Apparently a more mature CNCF project compared to Kyverno (graduated vs incubating).
Cons:
- Template indirection makes policies more cumbersome to work with, compared to Kyverno.
- CEL-written policies are [[ https://open-policy-agent.github.io/gatekeeper/website/docs/validating-admission-policy/ | only in pre-alpha support phase ]], and not available for k8s 1.24 anyway.
- No clear migration path to VAPs.
Option 3
Goal: k8s native. This is option 1, plus replacing Kyverno by migrating to k8s native VAPs at the k8s 1.26 upgrade.
Pros:
- All the pros of option 1
- No risks involving the project changing direction
- One less component to maintain in the mid-long term
Cons:
- Same as option 1, without the risk of long-term maintenance