
Wikimedia Cloud Services team/EnhancementProposals/Decision record T362233 Toolforge policy agent

From Wikitech

Origin task: phab:T362233

Date of the decision: 2024-04-30

People in the decision meeting (alphabetical order):

  • Andrew Bogott
  • Arturo Borrero
  • David Caro
  • Francesco Negri
  • Taavi Väänänen

Decision taken

Option 3 was chosen.

Rationale

Option 3 received approval, with some additional caveats:

We want to drop Kyverno in favor of ValidatingAdmissionPolicies (VAPs) after we upgrade k8s to 1.26 and before we upgrade it to 1.29. If we reach the 1.29 upgrade while still using Kyverno, we will hold a new decision request to agree on a new plan.
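The agreed window can be restated as a small decision function. This is a minimal sketch for illustration only: the helper name `kyverno_action`, its parameters, and the returned strings are hypothetical, and it assumes Kyverno has already been adopted as in option 3.

```python
def kyverno_action(k8s_minor: int, uses_kyverno: bool) -> str:
    """Hypothetical helper restating the agreed Kyverno exit window.

    Assumes Kyverno was adopted as in option 3. k8s_minor is the Kubernetes
    1.x minor version the cluster runs, or is about to be upgraded to.
    """
    if not uses_kyverno:
        return "no action: Kyverno already replaced by VAPs"
    if k8s_minor < 26:
        # VAPs are not available yet, so Kyverno stays.
        return "keep Kyverno"
    if k8s_minor < 29:
        # Inside the agreed window: migrate policies to native VAPs.
        return "migrate to VAPs"
    # Reaching the 1.29 upgrade while still on Kyverno triggers a new decision.
    return "hold a new decision request"


if __name__ == "__main__":
    for minor in (24, 25, 26, 28, 29):
        print(f"k8s 1.{minor}: {kyverno_action(minor, uses_kyverno=True)}")
```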

The follow-up is tracked in task T364293.

Problem

We need to decide on the implementation of a policy agent for Toolforge Kubernetes. This policy agent should replace the PodSecurityPolicy (PSP) mechanism, a core Toolforge security function that is removed from Kubernetes in version 1.25. See also: T279110


Constraints and risks

  • TBD.

Decision record

In progress.

Options

Option 1

Adopt Kyverno (https://kyverno.io/) without a date to migrate away from it.

This is a CNCF incubating project, with about 300 different contributors since its inception in 2019 (https://landscape.cncf.io/?item=provisioning--security-compliance--kyverno).

This software was originally created by Nirmata, a company that offers a number of additional enterprise services based on it, in particular:

[Phabricator file F45640334]

Kyverno is simple to work with and was designed specifically for Kubernetes. It supports writing policies in CEL (https://kyverno.io/docs/writing-policies/validate/#common-expression-language-cel), the same language upstream Kubernetes uses for Validating Admission Policies (https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/), available starting in k8s 1.26.

This means that if we adopt Kyverno today, once we reach k8s 1.26 we can consider migrating our policies, with their CEL expressions unchanged, to native VAPs, thus removing the need for Kyverno itself.
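To make the migration claim concrete, the following Python sketch builds the same rule twice: once as a Kyverno `ClusterPolicy` with a CEL rule, and once as a native `ValidatingAdmissionPolicy`. The CEL expression string carries over as-is while the surrounding manifest changes. The rule itself (rejecting privileged containers), the helper names, and the policy name are hypothetical illustrations, not actual Toolforge policies; the manifests are plain dicts mirroring the usual YAML.

```python
import json

# Illustrative PSP-style rule (hypothetical): reject privileged containers.
NO_PRIVILEGED = (
    "object.spec.containers.all(c, !has(c.securityContext) || "
    "!has(c.securityContext.privileged) || !c.securityContext.privileged)"
)
MESSAGE = "Privileged containers are not allowed."


def kyverno_cel_policy(name: str, expression: str, message: str) -> dict:
    """Kyverno (1.11+) ClusterPolicy carrying a CEL validation rule for Pods."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": name},
        "spec": {
            "validationFailureAction": "Enforce",
            "rules": [{
                "name": name,
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {"cel": {"expressions": [
                    {"expression": expression, "message": message},
                ]}},
            }],
        },
    }


def validating_admission_policy(
    name: str,
    expression: str,
    message: str,
    # Alpha API version in k8s 1.26 (assumption: adjust for the target release).
    api_version: str = "admissionregistration.k8s.io/v1alpha1",
) -> dict:
    """Native ValidatingAdmissionPolicy reusing the same CEL expression."""
    return {
        "apiVersion": api_version,
        "kind": "ValidatingAdmissionPolicy",
        "metadata": {"name": name},
        "spec": {
            "failurePolicy": "Fail",
            "matchConstraints": {"resourceRules": [{
                "apiGroups": [""],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["pods"],
            }]},
            "validations": [{"expression": expression, "message": message}],
        },
    }


if __name__ == "__main__":
    for manifest in (
        kyverno_cel_policy("disallow-privileged", NO_PRIVILEGED, MESSAGE),
        validating_admission_policy("disallow-privileged", NO_PRIVILEGED, MESSAGE),
    ):
        print(json.dumps(manifest, indent=2))
```

A VAP also needs a separate `ValidatingAdmissionPolicyBinding` to take effect, so the matching and binding parts would still need rewriting even when the expressions do not.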

Whether written in Kyverno's native policy language or in CEL, policies are simple and straightforward to work with.

If we adopted Kyverno, the sequence of events could be:

  1. On k8s 1.24, adopt Kyverno 1.10, with policies in its native language.
  2. Upgrade k8s from 1.24 to 1.25.
  3. Upgrade Kyverno to 1.11.
  4. Translate the policies from the native language to CEL (or fetch them from the policy registry, where they will likely be available).
  5. Upgrade k8s from 1.25 to 1.26.
  6. Evaluate dropping Kyverno in favor of VAPs.
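The ordering of these steps matters because of the version constraints stated in this record (CEL needs Kyverno 1.11, Kyverno 1.11 needs k8s 1.25, VAPs need k8s 1.26). A minimal Python sketch that replays the steps and reports the first one that breaks a constraint follows; the class and function names are hypothetical, and step 6 is modeled as actually dropping Kyverno, as option 3 proposes.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 24)


@dataclass(frozen=True)
class ClusterState:
    k8s: Version
    kyverno: Optional[Version] = None  # None: Kyverno not installed
    policies: Optional[str] = None     # "native", "cel" or "vap"


def problems(s: ClusterState) -> List[str]:
    """Check one state against the version constraints stated in this record."""
    found = []
    if s.kyverno is not None and s.kyverno >= (1, 11) and s.k8s < (1, 25):
        found.append("Kyverno 1.11 requires k8s 1.25")
    if s.policies in ("native", "cel") and s.kyverno is None:
        found.append("Kyverno policies need Kyverno installed")
    if s.policies == "cel" and s.kyverno is not None and s.kyverno < (1, 11):
        found.append("CEL policies need Kyverno 1.11")
    if s.policies == "vap" and s.k8s < (1, 26):
        found.append("VAPs need k8s 1.26")
    return found


# The six steps of the migration path, expressed as state changes.
PLAN: List[Tuple[str, Dict]] = [
    ("adopt Kyverno 1.10 with native policies", {"kyverno": (1, 10), "policies": "native"}),
    ("upgrade k8s to 1.25", {"k8s": (1, 25)}),
    ("upgrade Kyverno to 1.11", {"kyverno": (1, 11)}),
    ("translate policies to CEL", {"policies": "cel"}),
    ("upgrade k8s to 1.26", {"k8s": (1, 26)}),
    ("drop Kyverno in favor of VAPs", {"kyverno": None, "policies": "vap"}),
]


def first_violation(start: ClusterState, plan) -> Optional[Tuple[str, List[str]]]:
    """Apply the steps in order; return the first step that breaks a constraint."""
    state = start
    for step, changes in plan:
        state = replace(state, **changes)
        found = problems(state)
        if found:
            return step, found
    return None


if __name__ == "__main__":
    start = ClusterState(k8s=(1, 24))
    print(first_violation(start, PLAN))  # None: the plan respects every constraint
    swapped = [PLAN[0], PLAN[2], PLAN[1]] + PLAN[3:]
    print(first_violation(start, swapped))
```

Swapping steps 2 and 3 fails at the Kyverno upgrade, which is the same reason CEL cannot be adopted directly on k8s 1.24.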

Pros:

  • Simpler workflow for writing policies than OPA Gatekeeper (no template indirection).
  • CEL support that appears to be stable.
  • A reasonably clear migration path towards VAPs.

Cons:

  • CEL support is only available starting with Kyverno 1.11, which requires k8s 1.25, so we cannot adopt CEL directly on k8s 1.24.
  • Kyverno is driven mainly by a single company, which offers an enterprise version on top of it.
  • CNCF incubating project (some risk of the project changing direction).

Option 2

Adopt Open Policy Agent (OPA) Gatekeeper (https://open-policy-agent.github.io/gatekeeper/website/).

This is a CNCF graduated project, with about 450 different contributors since its inception in 2015 (https://landscape.cncf.io/?item=provisioning--security-compliance--open-policy-agent-opa).

This software was originally created by Styra, a company that offers many additional enterprise services based on it.

OPA Gatekeeper is more complex than Kyverno and arguably less pleasant to work with. Policies use template indirection: you first create a policy template (a ConstraintTemplate), then a policy instance (a Constraint) of that template.
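A rough Python sketch of this indirection follows: a `ConstraintTemplate` that defines a new constraint kind together with its Rego code, and a `Constraint` that instantiates that kind for Pods. The kind name, constraint name, and Rego rule (rejecting privileged containers) are hypothetical examples; the manifests are plain dicts mirroring the usual YAML. Compare with the single Kyverno `ClusterPolicy` that option 1 needs for the same rule.

```python
import json

# Hypothetical Rego rule: flag any privileged container in the reviewed Pod.
REGO = """package k8sdisallowprivileged

violation[{"msg": msg}] {
  c := input.review.object.spec.containers[_]
  c.securityContext.privileged
  msg := sprintf("privileged container %v is not allowed", [c.name])
}
"""


def constraint_template(kind: str, rego: str) -> dict:
    """Step 1: the template defines a new constraint kind (a CRD) plus its Rego."""
    return {
        "apiVersion": "templates.gatekeeper.sh/v1",
        "kind": "ConstraintTemplate",
        # Gatekeeper expects the template name to be the lowercased kind.
        "metadata": {"name": kind.lower()},
        "spec": {
            "crd": {"spec": {"names": {"kind": kind}}},
            "targets": [{"target": "admission.k8s.gatekeeper.sh", "rego": rego}],
        },
    }


def constraint(kind: str, name: str) -> dict:
    """Step 2: an instance of the kind defined by the template, scoped to Pods."""
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": kind,
        "metadata": {"name": name},
        "spec": {"match": {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]}},
    }


if __name__ == "__main__":
    kind = "K8sDisallowPrivileged"  # hypothetical constraint kind
    for manifest in (constraint_template(kind, REGO), constraint(kind, "disallow-privileged")):
        print(json.dumps(manifest, indent=2))
```

Gatekeeper has to register the CRD generated from the template before the `Constraint` can be created; this extra step is the indirection Kyverno avoids.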

Policies are written in Rego, a domain-specific language. According to the maintainers at KubeCon EU in Paris, Rego is still receiving stabilization changes.

Pros:

  • More mature CNCF status than Kyverno (graduated vs. incubating).

Cons:


Option 3

Goal: k8s native. Option 1, plus replacing Kyverno by migrating to k8s-native VAPs at the k8s 1.26 upgrade.

Pros:

  • All the pros of option 1
  • No long-term exposure to the Kyverno project changing direction
  • One less component to maintain in the medium to long term

Cons:

  • Same as option 1, but without the long-term maintenance risk