Portal:Toolforge/Admin/Kubernetes/RBAC and Pod security/PSP migration
This page contains information on the PSP migration we conducted in 2024. See also: phab:T279110
PSP vs PSA feature comparison
A comparison of the old PSP tool account profile against what we can do with the PSA restricted profile; see https://v1-24.docs.kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
PSP tool account restricted | PSA profile | Comment |
---|---|---|
seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default' | included in restricted | |
seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default' | included in restricted | |
requiredDropCapabilities: [ALL] | included in restricted | |
allowPrivilegeEscalation: false | included in restricted | |
fsGroup.rule: 'MustRunAs' user.id | not included in any profile, we need an alternative | |
hostIPC: false | included in baseline | |
hostNetwork: false | included in baseline | |
hostPID: false | included in baseline | |
privileged: false | not included in any profile? | maybe it was replaced by spec.containers[*].securityContext.privileged |
readOnlyRootFilesystem: false | not included in any profile, we need an alternative | |
runAsUser.rule: 'MustRunAs' user.id | not included in any profile, we need an alternative | |
seLinux.rule: 'RunAsAny' | not included in any profile, we need an alternative | |
runAsGroup.rule: 'MustRunAs' user.id | not included in any profile, we need an alternative | |
supplementalGroups.rule: 'MustRunAs' (disallow root group) | not included in any profile, we need an alternative | |
volumes: [list of allowed volume types] | no profile allows hostPath volume mounts | WARNING: this is a major blocker. We need hostPath mounts for NFS, or we may rework how we do NFS entirely. Is it worth it? |
allowedHostPaths | not included in any profile, we need an alternative | |
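For reference, PSA profiles are applied per namespace via labels rather than via a cluster-scoped object. A minimal sketch (the namespace name is a hypothetical example):

```yaml
# Sketch: enabling the PSA "restricted" profile on a tool namespace.
# The namespace name is hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: tool-tf-test
  labels:
    # reject non-compliant pods at admission time
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.24
    # additionally record violations in the audit log and warn clients
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Pinning `enforce-version` keeps the profile semantics stable across cluster upgrades.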
Custom admission controllers
We have several custom admission controllers:
- volume-admission -- mounts volumes into pods.
- registry-admission -- verifies and restricts container registry URLs.
- ingress-admission -- verifies and restricts Ingress resources.
- envvars-admission -- mutates pod manifests to add secrets.
Plans
- experiment with a policy agent to see if it is capable of fully replacing the PSP functionality.
- make sure the policy agent we choose can also absorb the functionality of our several custom admission controllers.
Questions and answers
- Q: Why use PSA at all if it can't cover everything we need?
- A: Using a standard mechanism gives us a baseline of expected behavior aligned with upstream practices, which may be desirable. We can additionally layer our own policies on top of the upstream standard by means of OPA Gatekeeper or Kyverno.
- Q: Why migrate from the custom admission controllers to a policy agent?
- A: Some of our custom admission controllers are non-trivial pieces of code that were crafted to enforce just two or three fields of a JSON definition. Given that we are going to introduce a policy agent anyway, the migration should be trivial, with the additional gain of no longer having to maintain the custom admission controller codebases.
Kyverno POC
Kyverno can work in different ways: checking resources and auditing them, enforcing compliance, or even mutating resources to achieve compliance. Mutating resources seems like a very cool thing to do, but the lowest-friction migration from the current PSP setup is probably an audit/enforce setup.
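For illustration, a mutation rule (not used in this POC) would look roughly like the sketch below; the policy name and the patched field are hypothetical:

```yaml
# Hypothetical sketch of Kyverno's mutation mode: patch a default seccomp
# profile into incoming pods instead of rejecting non-compliant ones.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-seccomp
spec:
  rules:
    - name: add-default-seccomp
      match:
        any:
          - resources:
              kinds:
                - "Pod"
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              seccompProfile:
                type: "RuntimeDefault"
```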
Installing Kyverno:
user@lima-lima-kilo:~$ helm repo add kyverno https://kyverno.github.io/kyverno/
user@lima-lima-kilo:~$ helm repo update
user@lima-lima-kilo:~$ helm search repo kyverno -l
NAME             CHART VERSION  APP VERSION  DESCRIPTION
kyverno/kyverno  3.1.4          v1.11.4      Kubernetes Native Policy Management
kyverno/kyverno  3.1.3          v1.11.3      Kubernetes Native Policy Management
kyverno/kyverno  3.1.2          v1.11.2      Kubernetes Native Policy Management
kyverno/kyverno  3.1.1          v1.11.1      Kubernetes Native Policy Management
kyverno/kyverno  3.1.0          v1.11.0      Kubernetes Native Policy Management
kyverno/kyverno  3.0.9          v1.10.7      Kubernetes Native Policy Management
[..]
user@lima-lima-kilo:~$ helm install kyverno kyverno/kyverno -n kyverno --create-namespace --version 3.0.9
Kyverno version 1.10.x supports Kubernetes 1.24 (minimum) through 1.26 (maximum); see https://kyverno.io/docs/installation/#compatibility-matrix
Kyverno has two main config resources:
- Policy: namespace scope
- ClusterPolicy: cluster scope
Example ClusterPolicy to validate all pod definitions against a common set of desirable security configs:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: "toolforge-tool-account-cluster-policy"
  annotations:
    policies.kyverno.io/title: "pod security"
    policies.kyverno.io/category: "toolforge tool account"
    kyverno.io/kyverno-version: "1.10.7"
    kyverno.io/kubernetes-version: "1.24"
    policies.kyverno.io/subject: "Pod"
    policies.kyverno.io/description: "potential tool account pod security check, clusterwide"
spec:
  validationFailureAction: "Audit"
  background: false
  rules:
    - name: "pod-level validations"
      match:
        all:
          - resources:
              kinds:
                - "Pod"
              namespaces:
                - "tool-*"
      validate:
        message: "pod-level configuration must be correct"
        pattern:
          spec:
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: false
              runAsNonRoot: true
              privileged: false
              hostNetwork: false
              hostIPC: false
              hostPID: false
              capabilities:
                drop:
                  - ALL
              seccompProfile:
                type: "runtime/default"
Example Policy to validate per-tool-account security configs (this one could be generated by maintain-kubeusers):
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: "tf-test-policy"
  namespace: "tool-tf-test"
  annotations:
    policies.kyverno.io/title: "pod security"
    policies.kyverno.io/category: "toolforge tool account"
    kyverno.io/kyverno-version: "1.10.7"
    kyverno.io/kubernetes-version: "1.24"
    policies.kyverno.io/subject: "Pod"
    policies.kyverno.io/description: "potential tool account pod security check"
spec:
  validationFailureAction: "Audit"
  background: false
  rules:
    - name: "pod-level validations"
      match:
        any:
          - resources:
              kinds:
                - "Pod"
      validate:
        message: "pod-level configuration must be correct"
        pattern:
          spec:
            workingDir: "/data/project/tf-test"
            securityContext:
              runAsUser: 1001
              runAsGroup: 1001
              fsGroup: 1001
              supplementalGroups: 1001
Loading them:
user@lima-lima-kilo:~$ kubectl apply -f clusterpolicy.yaml
clusterpolicy.kyverno.io/toolforge-tool-account-cluster-policy configured
user@lima-lima-kilo:~$ kubectl apply -f policy.yaml
policy.kyverno.io/tf-test-policy unchanged
Exploring the policy effects on the pods. Note that in this POC the policies are in audit mode on purpose:
user@lima-lima-kilo:~$ kubectl get policyreport -n tool-tf-test
NAME PASS FAIL WARN ERROR SKIP AGE
cpol-toolforge-tool-account-cluster-policy 0 2 0 0 0 2d18h
pol-tf-test-policy 0 2 0 0 0 2d20h
user@lima-lima-kilo:~$ kubectl get policyreport -n tool-tf-test cpol-toolforge-tool-account-cluster-policy -o yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  creationTimestamp: "2024-04-05T15:14:07Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: kyverno
    cpol.kyverno.io/toolforge-tool-account-cluster-policy: "95124"
  name: cpol-toolforge-tool-account-cluster-policy
  namespace: tool-tf-test
  resourceVersion: "95262"
  uid: 2c548864-4327-4dc6-a097-cdb7ccf0aaf6
results:
- category: toolforge tool account
  message: 'validation error: pod-level configuration must be correct. rule autogen-pod-level
    validations failed at path /spec/template/spec/securityContext/allowPrivilegeEscalation/'
  policy: toolforge-tool-account-cluster-policy
  resources:
  - apiVersion: apps/v1
    kind: Deployment
    name: test
    namespace: tool-tf-test
    uid: d8075147-0721-4c80-bbff-e1e95431ecef
  result: fail
  rule: autogen-pod-level validations
  scored: true
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1712330036
- category: toolforge tool account
  message: 'validation error: pod-level configuration must be correct. rule pod-level
    validations failed at path /spec/securityContext/allowPrivilegeEscalation/'
  policy: toolforge-tool-account-cluster-policy
  resources:
  - apiVersion: v1
    kind: Pod
    name: test-6d779f4c7b-pccrr
    namespace: tool-tf-test
    uid: 51a38062-8c27-4266-bd76-5407cb13ebe7
  result: fail
  rule: pod-level validations
  scored: true
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1712330036
summary:
  error: 0
  fail: 2
  pass: 0
  skip: 0
  warn: 0
user@lima-lima-kilo:~$ kubectl describe clusterpolicy toolforge-tool-account-cluster-policy
Name: toolforge-tool-account-cluster-policy
[..]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning PolicyViolation 30s kyverno-admission Pod tool-tf-test/test-6d779f4c7b-hlwsm: [pod-level validations] fail; validation error: pod-level configuration must be correct. rule pod-level validations failed at path /spec/securityContext/allowPrivilegeEscalation/
Warning PolicyViolation 21s kyverno-admission Deployment tool-tf-test/test: [autogen-pod-level validations] fail; validation error: pod-level configuration must be correct. rule autogen-pod-level validations failed at path /spec/template/spec/securityContext/allowPrivilegeEscalation/
Warning PolicyViolation 21s kyverno-admission Pod tool-tf-test/test-6d779f4c7b-bphqj: [pod-level validations] fail; validation error: pod-level configuration must be correct. rule pod-level validations failed at path /spec/securityContext/allowPrivilegeEscalation/
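If we later decide the policies are trustworthy, switching from auditing to blocking is a one-field change in the policy spec:

```yaml
spec:
  # "Audit" only records violations in PolicyReports;
  # "Enforce" rejects non-compliant resources at admission time
  validationFailureAction: "Enforce"
```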
OpenPolicyAgent gatekeeper POC
Installing from helm:
user@lima-lima-kilo:~$ helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
user@lima-lima-kilo:~$ helm install gatekeeper/gatekeeper --name-template=gatekeeper --namespace gatekeeper-system --create-namespace
NOTE: we may want to cache the container image, or even build our own. Both of these options are supported upstream, in the sense that they provide docs on how to do it properly. See https://open-policy-agent.github.io/gatekeeper/website/docs/install
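Unlike Kyverno, Gatekeeper policies are written in Rego and split into a ConstraintTemplate (the check logic) and a Constraint (binding the check to resources). A minimal hypothetical sketch, roughly equivalent to the privileged-container check in the Kyverno POC above (names are our own, not upstream-provided):

```yaml
# Hypothetical sketch: disallow privileged containers via Gatekeeper.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowprivileged

        violation[{"msg": msg}] {
          c := input.review.object.spec.containers[_]
          c.securityContext.privileged
          msg := sprintf("privileged container %v is not allowed", [c.name])
        }
---
# The Constraint instantiates the template and selects which resources it covers.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowPrivileged
metadata:
  name: disallow-privileged-tool-pods
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```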