Kubernetes/Clusters/Add or remove nodes
Note: This guide assumes you have a basic understanding of the various Kubernetes components. If you don't, please refer to https://kubernetes.io/docs/concepts/overview/components/
Note: This guide has been written to instruct a WMF SRE; it is NOT meant to be followed by non-SRE people.
Intro
This is a guide for adding or removing nodes from existing Kubernetes clusters.
Adding a node
Adding a node is a four-step process: first we add the node to BGP via our network configuration manager, Homer, and then we create three Puppet patches, which we merge one by one.
Step 0: DNS
Make sure that the node's DNS records are properly configured for both IPv4 and IPv6.
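As a quick sanity check (foo-node1001 in eqiad is the placeholder node name used throughout this guide), you can query both record types from any production host:
dig +short A foo-node1001.eqiad.wmnet
dig +short AAAA foo-node1001.eqiad.wmnet
The matching PTR records should resolve as well.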
Step 1: Add node to BGP
Nodes (in the calico setup) need to be able to establish BGP sessions with the routers. To do so, they need to be added as neighbors in either config/sites.yaml (for rows A-D) or config/devices.yaml (for rows E/F) of the public Homer repository. Please go through the Homer#Usage_🚀 documentation beforehand.
# config/sites.yaml
eqiad:
  [...]
  foo_neighbors:
    foo-node1001: {4: <Node IPv4>, 6: <Node IPv6>}
# example: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/895175/
Once that change is merged, you will have to run Homer from a cumin host.
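As a rough sketch (the exact invocation and device selector depend on which routers the node peers with; see the Homer documentation linked above for the authoritative usage):
# run from a cumin host once the change is merged (hypothetical selector and commit message)
sudo homer "cr*eqiad*" commit "Add foo-node1001 as BGP neighbor"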
Step 2: Node installation
When the Kubernetes cluster was created, a Puppet role for its workers was created as well (see: Kubernetes/Clusters/New#General_Puppet/hiera_setup).
- Apply the partman recipe partman/custom/kubernetes-node-overlay.cfg for the node in modules/install_server/files/autoinstall/netboot.cfg.
- Apply the proper Kubernetes worker Puppet role for your cluster to the node in manifests/site.pp, eg: 894697.
- Reimage the node if needed, or simply run Puppet on the host.
Note: Reimaging will make the node join the cluster automatically. But for it to be fully functional, we need a Puppet run on the docker-registry nodes, see: task T273521. Try to avoid joining nodes during deployment windows.
- Make sure to run Puppet on the docker-registry nodes soon after the reimage, as well as on the other Kubernetes nodes. For example:
sudo cumin A:docker-registry 'run-puppet-agent -q'
sudo cumin 'A:wikikube-master and A:eqiad' 'run-puppet-agent -q'
sudo cumin -b 2 -s 5 'A:wikikube-worker and A:eqiad' 'run-puppet-agent -q'
Add node specific hiera data
If the node has some Kubernetes-related special features, you can add them via Hiera. This can be done by creating the file hieradata/hosts/foo-node1001.yaml:
profile::kubernetes::node::kubelet_node_labels:
- label-bar/foo=value1
- label-foo/bar=value2
Note: In the past, we used this to populate region (datacentre) and zone (rack row). This is no longer needed; it is done automatically.
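Once Puppet has applied the change on the node, you can verify the labels from a deploy server (a sketch using the placeholder node name from above):
kubectl get node foo-node1001.eqiad.wmnet --show-labels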
Step 3: Add to calico
All nodes are BGP peers for each other, so we need to extend the cluster_nodes list for this Kubernetes cluster in hieradata/common/kubernetes.yaml with the new node's FQDN:
kubernetes::clusters:
  <your cluster group>:
    <your cluster>:
      cluster_nodes:
        - foo-control-plane.eqiad.wmnet
        [...]
        - foo-node1001.eqiad.wmnet
Merge the change and run Puppet on all Kubernetes masters and workers to apply the appropriate Ferm rules.
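After merging, the Puppet runs can be driven with Cumin, mirroring the example from Step 2 (the aliases below are for wikikube in eqiad; adjust them to your cluster):
sudo cumin 'A:wikikube-master and A:eqiad' 'run-puppet-agent -q'
sudo cumin -b 2 -s 5 'A:wikikube-worker and A:eqiad' 'run-puppet-agent -q'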
Step 4: Add to conftool/LVS
If the Kubernetes cluster is exposing services via LVS (production clusters usually do, staging ones don't), you need to add the node's FQDN to the cluster in conftool-data as well. For eqiad this is conftool-data/node/eqiad.yaml:
eqiad:
  foo:
    [...]
    foo-node1001.eqiad.wmnet: [kubesvc]
# example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894701
Merge the change, and run puppet on the datacentre's LVS hosts.
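As a sketch, assuming eqiad and the Cumin aliases used in production (verify the alias names for your site):
sudo cumin 'A:lvs and A:eqiad' 'run-puppet-agent -q'
# the new node typically starts out depooled; pooling it is part of Server_Lifecycle#Staged_->_Active
# and is usually done with confctl, for example:
sudo confctl select name=foo-node1001.eqiad.wmnet set/pooled=yes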
Done! You made it!
Please ensure you've followed all necessary steps from Server_Lifecycle#Staged_->_Active
Your node should now join the cluster and have workload scheduled automatically (like calico daemonsets). You can log in to a deploy server and check the status:
kubectl get nodes
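To also confirm that daemonset workloads (such as calico) were scheduled on the new node, you can list the pods placed there (a sketch using the placeholder node name from this guide):
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=foo-node1001.eqiad.wmnet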
Removing a node
Drain workload
The first step to remove a node is to drain the workload from it. This also verifies that the workload still fits into the remaining cluster capacity:
kubectl drain --ignore-daemonsets foo-node1001.datacenter.wmnet
However, some workloads might be using emptyDir volumes backed by local storage, and to drain those you need to add a second option:
kubectl drain --ignore-daemonsets --delete-emptydir-data foo-node1001.datacenter.wmnet
You can verify success by looking at what is still scheduled on the node:
kubectl describe node foo-node1001.datacenter.wmnet
Decommission
You can now follow the steps outlined in Server_Lifecycle#Active_->_Decommissioned
Ensure to also remove:
- The node specific hiera data (from Kubernetes/Clusters/Add_or_remove_nodes#Add node specific hiera data)
- The BGP config for Homer (from Kubernetes/Clusters/Add or remove nodes#Step 1: Add node to BGP)
- The node from the cluster_nodes list (from Kubernetes/Clusters/Add or remove nodes#Step 3: Add to calico)
Delete the node from Kubernetes API
The last step is to delete the node from the Kubernetes API:
kubectl delete node foo-node1001.datacenter.wmnet