Portal:Toolforge/Admin/Kubernetes/Upgrading Kubernetes/1.21 to 1.22 notes
This page contains historical information. It may be outdated or unreliable.
Toolforge Kubernetes 1.22 upgrade
prep
- [x] disable puppet, `A:k8s-node-all` on tools-cumin-1 selects what's needed
- [x] downtime stuff on https://prometheus-alerts.wmcloud.org
- [--] disable jobs-emailer due to potential noise?
- [x] update `profile::wmcs::kubeadm::component` project-wide hiera key
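For reference, the puppet-disable and hiera steps boil down to one cumin run and one key edit. A minimal sketch, assuming the disable-puppet wrapper is present on the nodes and that the 1.22 packages live in a thirdparty/kubeadm-k8s-1-22 apt component (both are assumptions, double-check against the puppet repo):
# on tools-cumin-1: disable puppet on every k8s node matched by the alias
sudo cumin 'A:k8s-node-all' 'disable-puppet "k8s 1.22 upgrade"'
# project-wide hiera key (Horizon), pointing kubeadm at the new component
profile::wmcs::kubeadm::component: thirdparty/kubeadm-k8s-1-22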
control nodes
tools-k8s-control-4
- [x] sudo -i kubectl drain tools-k8s-control-4 --ignore-daemonsets
- [x] sudo run-puppet-agent --force && sudo apt-get install -y kubeadm
- [x] sudo -i kubeadm upgrade plan 1.22.17
- [x] sudo -i kubeadm upgrade apply 1.22.17
- [x] sudo apt-get install -y kubelet kubectl docker-ce containerd.io helm
- [x] sudo -i kubectl uncordon tools-k8s-control-4
- [x] (wait until all pods start, and apiserver/scheduler/controller-manager aren't logging any errors and look good overall; see the quick check sketch below)
- [x] drain ; reboot ; uncordon
aborrero@tools-k8s-control-4:~$ sudo -i kubectl -n kube-system logs -f kube-controller-manager-tools-k8s-control-4
E0410 09:18:10.387734 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: leases.coordination.k8s.io "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
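One way to do the "wait until all pods start / components look good" check from the list above, as a quick sketch run on the control node (the static pod names follow the usual kubeadm <component>-<nodename> pattern):
sudo -i kubectl get nodes
sudo -i kubectl get pods -n kube-system
sudo -i kubectl -n kube-system logs --tail=50 kube-apiserver-tools-k8s-control-4
sudo -i kubectl -n kube-system logs --tail=50 kube-scheduler-tools-k8s-control-4
sudo -i kubectl -n kube-system logs --tail=50 kube-controller-manager-tools-k8s-control-4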
tools-k8s-control-5
- [x] sudo -i kubectl drain tools-k8s-control-5 --ignore-daemonsets
- [x] sudo run-puppet-agent --force && sudo apt-get install -y kubeadm
- [x] sudo -i kubeadm upgrade node
- [x] sudo apt-get install -y kubelet kubectl docker-ce containerd.io helm
- [x] reboot VM?
- [x] sudo -i kubectl uncordon tools-k8s-control-5
- [x] (wait until all pods start, and apiserver/scheduler/controller-manager aren't logging any errors and look good overall)
tools-k8s-control-6
- [x] sudo -i kubectl drain tools-k8s-control-6 --ignore-daemonsets
- [x] sudo run-puppet-agent --force && sudo apt-get install -y kubeadm
- [x] sudo -i kubeadm upgrade node
- [x] sudo apt-get install -y kubelet kubectl docker-ce containerd.io helm
- [x] reboot VM?
- [x] sudo -i kubectl uncordon tools-k8s-control-6
- [x] (wait until all pods start, and apiserver/scheduler/controller-manager aren't logging any errors and look good overall)
workers
- use the wmcs-k8s-node-upgrade script for these; start with a couple of nodes first
- after the first nodes have been upgraded successfully, split the remaining nodes into two or three groups and start multiple wmcs-k8s-node-upgrade instances to process them all (see the splitting sketch after the node list below)
tools-k8s-worker-30 tools-k8s-worker-31 tools-k8s-worker-32 tools-k8s-worker-33 tools-k8s-worker-34 tools-k8s-worker-35 tools-k8s-worker-36 tools-k8s-worker-37 tools-k8s-worker-38 tools-k8s-worker-39 tools-k8s-worker-40 tools-k8s-worker-41 tools-k8s-worker-42 tools-k8s-worker-43 tools-k8s-worker-44 tools-k8s-worker-45 tools-k8s-worker-46 tools-k8s-worker-47 tools-k8s-worker-48 tools-k8s-worker-49 tools-k8s-worker-50 tools-k8s-worker-51 tools-k8s-worker-52 tools-k8s-worker-53 tools-k8s-worker-54 tools-k8s-worker-55 tools-k8s-worker-56 tools-k8s-worker-57 tools-k8s-worker-58 tools-k8s-worker-59 tools-k8s-worker-60 tools-k8s-worker-61 tools-k8s-worker-62 tools-k8s-worker-64 tools-k8s-worker-65 tools-k8s-worker-66 tools-k8s-worker-67 tools-k8s-worker-68 tools-k8s-worker-69 tools-k8s-worker-70 tools-k8s-worker-71 tools-k8s-worker-72 tools-k8s-worker-73 tools-k8s-worker-74 tools-k8s-worker-75 tools-k8s-worker-76 tools-k8s-worker-77 tools-k8s-worker-78 tools-k8s-worker-79 tools-k8s-worker-80 tools-k8s-worker-81 tools-k8s-worker-82 tools-k8s-worker-83 tools-k8s-worker-84 tools-k8s-worker-85 tools-k8s-worker-86 tools-k8s-worker-87 tools-k8s-worker-88 (58 total)
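The nodelist files used in the executions below are just the worker names above split into groups. A minimal sketch, assuming the space-separated list is saved in workers.txt and that --file expects one hostname per line (an assumption, check the script):
tr ' ' '\n' < workers.txt > workers-lines.txt
# produces nodelist1.txt, nodelist2.txt, nodelist3.txt with roughly equal line counts
split -n l/3 -a 1 --numeric-suffixes=1 --additional-suffix=.txt workers-lines.txt nodelist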
Executions:
user@laptop:~/git/wmf/operations/puppet $ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-4 -n tools-k8s-worker-30 -n tools-k8s-worker-31 -n tools-k8s-worker-32 -r --project tools
[..]
user@laptop:~/git/wmf/operations/puppet $ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-4 --project tools --no-pause -r --file nodelist1.txt
[..]
user@laptop:~/git/wmf/operations/puppet $ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-4 --project tools --no-pause -r --file nodelist2.txt
[..]
user@laptop:~/git/wmf/operations/puppet $ modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-4 --project tools --no-pause -r --file nodelist3.txt
[..]
ingress nodes
- [x] tools-k8s-ingress-4
- [x] tools-k8s-ingress-5
- [x] tools-k8s-ingress-6
- these are otherwise similar to the regular workers, but you'll want to set the replica count to 2 beforehand:
- kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2
- and revert afterwards with: kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3
- also, evicting the nginx pod takes ages; that is normal and nothing to worry about
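Put together, the ingress sequence looks roughly like this (a sketch reusing the commands above; the scale commands run wherever you have kubectl access, the upgrade script from the operations/puppet checkout as with the workers):
kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2
modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control tools-k8s-control-4 --project tools -r -n tools-k8s-ingress-4 -n tools-k8s-ingress-5 -n tools-k8s-ingress-6
kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3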
afterwards
- [x] update kubectl on bastions
- [x] restart jobs-emailer if disabled earlier? (not needed)
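The bastion kubectl update is just a puppet run plus a package upgrade once the new component is in place; a minimal sketch (the bastion hostname is only an example):
user@tools-sgebastion-10:~$ sudo run-puppet-agent
user@tools-sgebastion-10:~$ sudo apt-get install -y kubectl
user@tools-sgebastion-10:~$ kubectl version --client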