Kubernetes/Clusters/Add or remove nodes

This guide assumes you have a basic understanding of the various Kubernetes components. If you don't, please refer to https://kubernetes.io/docs/concepts/overview/components/
This guide is written for WMF SREs; it is NOT meant to be followed by non-SRE people.

Intro

This is a guide for adding or removing nodes from existing Kubernetes clusters.

Adding a node

Adding a node is a four-step process: first we add the node to BGP via our network configuration manager, Homer, and then we create three Puppet patches, which we merge one by one.

Step 0: DNS

Make sure that the node's DNS is properly configured both for IPv4 and IPv6.
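A quick way to check both records is dig (foo-node1001 is a placeholder hostname; also verify the matching PTR records with dig -x):

dig +short foo-node1001.eqiad.wmnet A
dig +short foo-node1001.eqiad.wmnet AAAA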

Step 1: Add node to BGP

We have a calico setup, so nodes need to be able to establish BGP sessions with their top of rack switch or the core routers.

To do so:

  1. In Netbox, set the server's BGP custom field to True
  2. On a Cumin host, run Homer (see the example command below). The exact target depends on the node's location:
    • If it's a VM or if it's connected to eqiad/codfw rows A-D, target the core routers (cr*eqiad* or cr*codfw*)
    • If it's a physical server in eqiad row E/F, target its top of rack switch (e.g. lsw1-e1-eqiad)
Doing so before the reimage will cause BGP Status alerts. They are not a big deal, but be aware of them and either proceed with the reimage as soon as possible, or do your Homer commit after the reimage is done.
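For reference, a Homer run against the core routers looks roughly like this (a sketch; the device pattern and commit message are placeholders, adjust them to the node's location and check the Homer documentation for the exact invocation):

homer 'cr*eqiad*' commit "Add foo-node1001 to BGP"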

Step 2: Node installation

Reimaging will make the node join the cluster automatically. But for it to be fully functional, we need a puppet run on the docker-registry nodes, see task T273521. Try to avoid joining nodes during deployment windows.
  • Command help:
sudo cumin 'A:docker-registry' 'run-puppet-agent -q'
sudo cumin 'A:wikikube-master and A:eqiad'  'run-puppet-agent -q'
sudo cumin -b 2 -s 5 'A:wikikube-worker and A:eqiad'  'run-puppet-agent -q'
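For reference, the reimage itself is typically done via the standard reimage cookbook from a cumin host (a sketch; the OS, task number and hostname are placeholders):

sudo cookbook sre.hosts.reimage --os bullseye -t TXXXXXX foo-node1001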

Add node specific hiera data

If the node has some Kubernetes-related special features, you can add them via Hiera.

This can be done by creating the file hieradata/hosts/foo-node1001.yaml:

profile::kubernetes::node::kubelet_node_labels:
  - label-bar/foo=value1
  - label-foo/bar=value2

Note: In the past, we used this to populate region (datacentre) and zone (rack row). This is no longer needed, as it is now done automatically.
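Once the change is merged and puppet has run on the node, you can verify the labels from a deploy server (a hedged example; foo-node1001 is a placeholder):

kubectl get node foo-node1001.eqiad.wmnet --show-labels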

Step 3: Add to calico

All nodes are BGP peers of each other, so we need to extend the cluster_nodes list for this Kubernetes cluster in hieradata/common/kubernetes.yaml with the new node's FQDN:

kubernetes::clusters:
  <your cluster group>:
    <your cluster>:
      cluster_nodes:
        - foo-control-plane.eqiad.wmnet
        [...]
        - foo-node1001.eqiad.wmnet

Merge the change, and run puppet on all Kubernetes masters and workers to apply the appropriate ferm rules.
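For the eqiad wikikube cluster, for example, this can be done with cumin, reusing the aliases from the command help above (adjust the aliases and datacentre to your cluster):

sudo cumin 'A:wikikube-master and A:eqiad' 'run-puppet-agent -q'
sudo cumin -b 2 -s 5 'A:wikikube-worker and A:eqiad' 'run-puppet-agent -q'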

Then, check the nodes' BGP status using calicoctl:

calicoctl node status

Step 4: Add to conftool/LVS

If the Kubernetes cluster is exposing services via LVS (production clusters usually do, staging ones don't), you need to add the node's FQDN to the cluster in conftool-data as well. For eqiad, that is conftool-data/node/eqiad.yaml:

eqiad:
  foo:
    [...]
    foo-node1001.eqiad.wmnet: [kubesvc]

# example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894701

Merge the change, and run puppet on the datacentre's LVS hosts.
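A cumin sketch for this, assuming an A:lvs alias that matches the load balancers (adjust the alias and datacentre as needed):

sudo cumin 'A:lvs and A:eqiad' 'run-puppet-agent -q'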

Then, pool your nodes using conftool (check the weight of your cluster's nodes first):

sudo confctl select 'name=foo-node1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc' set/weight=10
sudo confctl select 'name=foo-node1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc' set/pooled=yes
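You can verify the resulting state and weight with confctl's get action (a hedged example):

sudo confctl select 'name=foo-node1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc' get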

Done! You made it!

Please ensure you've followed all necessary steps from Server_Lifecycle#Staged_->_Active

Your node should now join the cluster and have workload scheduled automatically (like the calico daemonsets). You can log in to a deploy server and check the status:

kubectl get nodes

Removing a node

Drain workload

The first step in removing a node is to drain the workload from it. This also ensures that the workload actually still fits on the rest of the cluster:

kubectl drain --ignore-daemonsets foo-node1001.datacenter.wmnet

However, some workloads might be using emptyDir volumes backed by local storage; to drain those you need to add a second option:

kubectl drain --ignore-daemonsets --delete-emptydir-data foo-node1001.datacenter.wmnet

You can verify success by looking at what is still scheduled on the node:

kubectl describe node foo-node1001.datacenter.wmnet
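If the node was only drained temporarily (for maintenance rather than removal), it can be returned to service with:

kubectl uncordon foo-node1001.datacenter.wmnet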

Decommission

You can now follow the steps outlined in Server_Lifecycle#Active_->_Decommissioned

Ensure to also remove the configuration added when the node joined the cluster: its entry in cluster_nodes in hieradata/common/kubernetes.yaml, any node-specific hiera data in hieradata/hosts/, and its conftool-data entry (depool the node first).

Delete the node from Kubernetes API

The last remaining step is to delete the node from the Kubernetes API:

kubectl delete node foo-node1001.datacenter.wmnet