Kubernetes/Clusters/Upgrade/1.31
This page provides a summary of how we upgraded the Wikikube Kubernetes clusters to version 1.31[1], including the prerequisites, required patches, and necessary steps.
Prerequisites
- All nodes and apiservers need to run bookworm
- All nodes and apiservers need to run containerd as container runtime
- The cluster has been migrated off of PodSecurityPolicies[2]
- The service deployments in deployment-charts use the correct helm version (depending on the cluster version)[3]
- Inform ops@ at least 3 days before the planned upgrade (if you are upgrading a production cluster)
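The first two prerequisites can be sanity-checked from the API server's point of view. This is only a sketch, assuming working kubectl access to the cluster; the awk filter is a hypothetical helper, not an official check:

```shell
# Sketch: print any node that does not report both bookworm and containerd.
# OS and runtime come from the standard .status.nodeInfo fields.
kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage,RUNTIME:.status.nodeInfo.containerRuntimeVersion' \
  | awk 'NR > 1 && !(/bookworm/ && /containerd/) { print "NOT READY: " $0 }'
```

An empty output means every node satisfies both requirements.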
Required patches
Prepare, but do not merge, the following patches before you start the actual upgrade procedure:
- Update the Kubernetes and Calico version to use in hieradata/common/kubernetes.yaml; Example change
- "Unpin" updated charts so the latest versions are deployed after the upgrade; Example change
- Ensure the latest coredns image is deployed after the upgrade; Example change
- Update the cert-manager config to reflect changes to the chart. This will also ensure the leader election leases are created in the cert-manager namespace rather than kube-system[4]; Example change
Running the upgrade
As usual, the upgrade will completely wipe the etcd data and initialize a new, empty cluster with the target Kubernetes version. We have created a cookbook to walk you through the process.
- Depool services running on the cluster:
cookbook sre.k8s.pool-depool-cluster depool --k8s-cluster <cluster name>
- If you are upgrading a wikikube production cluster, also depool the wdqs services, since rdf-streaming-updater is unable to resume its work without human intervention, which would leave wdqs serving stale data[5][6]:
confctl --object-type discovery select 'dnsdisc=wdqs,name=${DC}' set/pooled=false
- You may check that everything is as you expect with:
cookbook sre.k8s.pool-depool-cluster status --k8s-cluster <cluster name>
- Take a note of all releases which are deployed to the cluster:
helm list -A
- Ensure that all services can be deployed properly to the cluster (i.e. there are no pending changes/updates in the deployment-charts repo). The safest way to do so is to deploy all of them.
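One way to keep that note in a diffable form is to snapshot the release list before the wipe. A sketch, assuming helm and jq are available on the host you deploy from; the /tmp paths are just examples:

```shell
# Sketch: record namespace, release name and chart of every deployed release
# so the state can be compared after re-deployment (file paths are illustrative).
helm list -A -o json \
  | jq -r '.[] | [.namespace, .name, .chart] | @tsv' \
  | sort > /tmp/releases-before.tsv
# After re-deploying everything, repeat into /tmp/releases-after.tsv and diff the two files.
```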
- Run the actual wipe/upgrade:
cookbook sre.k8s.wipe-cluster --k8s-cluster <cluster name> -H 2 --reason "Kubernetes upgrade"
The cookbook will run various consistency checks and will ultimately wait for your confirmation before wiping the etcd data. Proceed until it reports that the cluster state has been wiped and asks if you want to run puppet. This is the time to merge the patches you have prepared. After merging the patches, continue with the cookbook process until it asks you to re-deploy admin_ng, and leave it waiting there.
- Now continue with the steps described in Kubernetes/Clusters/New#Networking,_cluster_configuration_and_basic_services to deploy the Istio CRDs (if you require them), the admin_ng components, and Istio itself. Use istioctl-1.24.2.
- You may now let the cookbook remove the downtimes for the nodes and Kubernetes components
- If everything looks fine and no alerts fire, continue with deploying all services back to the cluster, and confirm removing the service downtimes in the cookbook session.
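A quick way to spot pods that are not fully up after redeploying the services. This is a sketch, assuming kubectl access and the default column layout of `kubectl get pods`:

```shell
# Sketch: list pods that are not fully ready or not in a healthy state.
# Relies on the default column order of `kubectl get pods -A --no-headers`:
# NAMESPACE NAME READY STATUS RESTARTS AGE
kubectl get pods -A --no-headers \
  | awk '{ split($3, r, "/"); if (r[1] != r[2] || ($4 != "Running" && $4 != "Completed")) print }'
```

An empty output means all pods report ready.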
- When the services look fine as well, repool them:
cookbook sre.k8s.pool-depool-cluster pool --k8s-cluster <cluster name>