Kubernetes/Upstream Helm charts policy
This is a draft policy, not yet enforced. More information and discussion about changes to this draft can be found on the talk page.
Problem Statement
The Kubernetes Special Interest Group has identified the need to try to use upstream helm charts in order to avoid excessive divergence from upstream, avoid version locks, and avoid wasting effort redoing what upstream has already achieved. Through discussion, the following policy has been formulated.
Policy
On 2023-05-25, the Kubernetes Special Interest Group, after discussion regarding the re-use of upstream helm charts, came up with the following policy. In the text below, the terms MUST, MUST NOT, SHOULD, SHOULD NOT, MAY and OPTIONAL have the meaning defined in RFC 2119.
If and only if the following requirements are met, and on a case-by-case basis, we allow usage of an upstream helm chart:
An evaluation of the upstream helm chart MUST be done by an (impromptu or planned) working group of at least 2 people. The first person SHOULD be the person asking for/proposing the usage of the upstream helm chart. The other person SHOULD be anyone generally agreed to have enough experience with helm chart writing. The 2-person rule is there to avoid self-+2ing and merging. The 2 main people SHOULD NOT be members of the same team, so that they don't share incentives. Any other person MAY contribute to the evaluation, either at the request of one of the people mentioned above or on their own initiative.
The evaluation and following decision MUST be documented on Wikitech, as a page under Helm/Upstream Charts, using Helm/Upstream Charts/Template as a template.
The list of things that need to be evaluated by the working group SHOULD contain, in no particular order:
- Overall quality of the chart.
- If the above bullet point is not met, an estimate of the amount of work required to bring the upstream chart up to the level of quality we strive for.
- An estimate of the amount of work it would take to rewrite the chart from scratch internally, re-using all of our tooling.
- An evaluation of the upstream chart authors' receptiveness to changes to the chart, especially given the above bullet point. Authors SHOULD be receptive to pull/merge requests, offering some review at the very least.
- Overall re-usability of the chart
The Kubernetes Special Interest Group understands and acknowledges that:
- This evaluation is effectively a judgement call by the 2+ people involved, and mistakes will be made in either direction.
- An upstream chart can change over time. That means that charts that failed the evaluation in the past can pass in the future and charts that did pass the evaluation at some point may deteriorate and/or be abandoned over time.
- The actions necessary to address situations resulting from the above (e.g. re-evaluation of an upstream chart) SHOULD be taken.
- Helm chart writing is a relatively new space; best practices are still being formulated, and in some cases there might be push-back and/or friction.
- While this is labeled a "policy", it is in effect, as of this writing (2023-06-28), a rough description of a playbook. Depending on this document's future, the original naming might turn out to be wrong.
Potential reviewers
If you are looking for reviewers for the evaluation and are unsure whom to ask, you may pick one from the following list (in no particular order). They might be unavailable, or ask you to try someone else, depending on their schedule.
Best practices for adoption of upstream helm charts
If we adopt an upstream helm chart and still find ourselves needing to edit it, we should aim to upstream our changes as much as possible.
Non-upstream patches
For patches that are lingering upstream or that are not suitable for upstreaming, here's a non-exhaustive list of best practices to follow:
- Commit the upstream source in one CR, then make changes in a second (to make reviews and rebasing easier)
- Add the wmf/upstreamVersion annotation to Chart.yaml pointing to the upstream version that was imported (makes rebasing easier as well):
  annotations:
    # This chart is based on upstream version v1.10.1
    wmf/upstreamVersion: v1.10.1
- Wherever possible, just add new template files rather than editing existing ones
- If the above can't be followed and existing templates need to be edited, try to clearly designate local changes, e.g. with comments (see the sketch after this list)
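As an illustration of the last point, here is a minimal, hypothetical sketch of how a local change inside an imported upstream template could be marked. The comment convention shown is only an example, not a mandated standard, and the template excerpt itself is made up:

# templates/deployment.yaml (excerpt of an imported upstream template)
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          # WMF: local change, not present upstream. Pin the security context
          # to what our clusters require.
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
          # WMF: end of local change
          ports:
            - containerPort: {{ .Values.service.port }}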
Special considerations regarding privileges
If the chart adds new functionality that requires high privileges, a more in-depth review needs to be done. Examples of use cases from the past:
- The Container Network Interface (CNI) plugin we use, Calico.
- Adding support for GPUs via the Kubernetes AMD device plugin.
- Adding support for PersistentVolumeClaims via Ceph's Container Storage Interface (CSI).
In both the GPU and the Ceph use cases, the upstream chart was designed to deploy highly privileged containers via a DaemonSet, to mimic a daemon running on the Kubernetes bare metal node. In the GPU case, we were able to package the device plugin (a Go binary) into a Debian package and deploy it via Puppet on the GPU-equipped Kubernetes nodes. In the Ceph case, some long-term maintainability questions were raised to the SIG, since only one container (in a group of many) required high privileges, and moving it out of the chart would have added a lot more manual/toil work during upgrades and general maintenance. After a discussion in May 2024, the SIG decided that both approaches are acceptable, for the following reasons:
- In the long term we may see fewer configurations and less code in Puppet, in favor of deployment-charts. If this is the trend, we may just follow it completely to reduce confusion among engineers.
- Having a daemon running as root and a container running with privileged: true is not too different security-wise, and it doesn't justify the extra hassle of creating Debian packages.
Such privileged containers can only be run in the kube-system namespace.
It is worth reminding and emphasizing that the Kubernetes/Images guidance and rules continue to apply in these cases.
To re-iterate: adding containers with privileged: true (or extra capabilities like CAP_SYS_ADMIN, etc.) is not forbidden, but usage should be carefully reviewed by more than one person. In case of any questions, asking the Kubernetes SIG for extra advice is prudent. In upstream charts, the need to support a lot of use cases is sometimes balanced against looser security restrictions, but it may be possible to add more granularity for our use cases by working with the upstream developers.
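For illustration, here is a minimal, hypothetical sketch of the kind of manifest such upstream charts typically render, i.e. the pattern that should trigger the extra review described above. The resource name and image are made up:

apiVersion: apps/v1
kind: DaemonSet                  # mimics a daemon running on every (or a subset of) bare metal nodes
metadata:
  name: example-node-agent       # hypothetical name
  namespace: kube-system         # per the restriction above, such privileged pods run here
spec:
  selector:
    matchLabels:
      app: example-node-agent
  template:
    metadata:
      labels:
        app: example-node-agent
    spec:
      containers:
        - name: agent
          image: docker-registry.wikimedia.org/example-node-agent:1.0.0  # hypothetical image
          securityContext:
            privileged: true     # this is the part that needs careful review
            # A narrower alternative, when upstream allows it, is to request
            # specific capabilities instead of full privileges, e.g.:
            # capabilities:
            #   add: ["SYS_ADMIN"]   # i.e. CAP_SYS_ADMIN
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /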
The case of ClusterRoles
Kubernetes operators oftentimes get assigned broad permissions through ClusterRole and ClusterRoleBinding resources, allowing them to exercise these permissions in all namespaces. This can be problematic, as they might for example get authorized to read Secrets in all namespaces of the Kubernetes cluster, which we view as an important security risk.
What we've done for cloudnative-pg-operator, ceph-csi-rbd and ceph-csi-cephfs is to:
- Move permissions associated with namespaced resources, such as Secret, Event, Pod, etc., to a Role resource
- Leave permissions associated with non-namespaced resources, such as Node and PersistentVolume, in the ClusterRole resource
- Define an explicit list of namespaces in which the operator needs these permissions
- Bind both these Role and ClusterRole resources with a RoleBinding. While this sounds counter-intuitive, it is perfectly possible and even documented. This allows us to grant these permissions to the operator in only a specific set of namespaces, instead of the whole cluster.
Should you need to do the same, here is an example of a patch in which we perform such a refactoring.
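For illustration, here is also a minimal, hypothetical sketch of the shape this refactoring produces. Resource names, namespaces and rules are made up and heavily simplified compared to the real charts:

# Namespaced permissions move to a Role in each namespace the operator manages.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator          # hypothetical name
  namespace: example-app          # one of the explicitly listed namespaces
rules:
  - apiGroups: [""]
    resources: ["secrets", "events", "pods"]
    verbs: ["get", "list", "watch"]
---
# Permissions on cluster-scoped resources stay in the ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-operator
rules:
  - apiGroups: [""]
    resources: ["nodes", "persistentvolumes"]
    verbs: ["get", "list", "watch"]
---
# A RoleBinding may reference either a Role or a ClusterRole; in both cases
# the grant is scoped to this RoleBinding's namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator
  namespace: example-app
subjects:
  - kind: ServiceAccount
    name: example-operator
    namespace: example-operator-ns   # hypothetical namespace where the operator runs
roleRef:
  kind: Role
  name: example-operator
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator-cluster-resources
  namespace: example-app
subjects:
  - kind: ServiceAccount
    name: example-operator
    namespace: example-operator-ns
roleRef:
  kind: ClusterRole                  # counter-intuitive but documented: a RoleBinding referencing a ClusterRole
  name: example-operator
  apiGroup: rbac.authorization.k8s.io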