Kubernetes/Clusters/Add or remove control-planes

Add stacked control-plane

The etcd part of this needs to be done one by one for each new control-plane, as it is not possible to add more than one "unstarted" etcd node to the cluster, and new nodes joining expect all of the nodes in the server SRV record to be reachable.
  • It might be wise to initially image the new control-plane with the insetup role to speed up the process later (only a puppet run is needed then)
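A possible sketch of the reimage step, assuming the sre.hosts.reimage cookbook; OS, task ID and hostname below are placeholders:
# placeholders: adjust the OS, task ID and hostname
sudo cookbook sre.hosts.reimage --os bullseye -t T363307 kubestagemaster2005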
  • Ensure the new control-plane has a DNS record (adding to the etcd SRV record fails CI otherwise)
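To verify the forward record resolves (example hostname):
dig +short kubestagemaster2005.codfw.wmnet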
  • Ensure the existing etcd cluster is not in bootstrap mode (check that profile::etcd::v3::cluster_bootstrap is false)
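One way to check, assuming a local checkout of the operations/puppet repo:
# find where the bootstrap flag is set for this cluster's etcd hosts
git grep -n 'profile::etcd::v3::cluster_bootstrap' hieradata/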
  • Add the new node's FQDN to the server SRV record of the etcd cluster: gerrit, DNS - Wikitech
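The server SRV record follows etcd's DNS discovery convention (peer port 2380); the record name, TTL and zone layout below are illustrative only, the real ones live in the operations/dns repo:
; illustrative sketch of a peer (server) SRV entry for the new node
_etcd-server-ssl._tcp.<cluster domain> 300 IN SRV 0 1 2380 kubestagemaster2005.codfw.wmnet.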
  • Add the new node as a member to the etcd cluster
export ETCDCTL_API=3
NEW_FQDN=kubestagemaster2005.codfw.wmnet
etcdctl --endpoints https://$(hostname -f):2379 member add "${NEW_FQDN%%.*}" --peer-urls="https://${NEW_FQDN}:2380"
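# run puppet on the existing control-planes and workers of the cluster in the affected DC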
DC=codfw
sudo cumin -b 1 -s 60 "A:wikikube-staging-master and A:${DC}" 'run-puppet-agent -q'
sudo cumin -b 15 -s 5 "A:wikikube-staging-worker and A:${DC}" 'run-puppet-agent -q'
  • Run puppet on the new control-planes
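For example (hostname is a placeholder):
NEW_FQDN=kubestagemaster2005.codfw.wmnet
sudo cumin "${NEW_FQDN}" 'run-puppet-agent -q'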
  • Label the control-plane as master
kubectl label nodes kubestagemaster2005.codfw.wmnet node-role.kubernetes.io/master=""
  • Set the node to BGP: True in Netbox
  • Run Homer, then check with calicoctl node status on the new control-planes that BGP sessions are established
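A sketch of those two steps; the Homer device filter and commit message are assumptions, adjust as needed:
# typically run from a cumin host; device filter is an assumption
sudo homer 'cr*codfw*' commit "Add BGP session for kubestagemaster2005 - T363307"
# then, on the new control-plane:
sudo calicoctl node status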
  • Pool the new control-planes
NEW_FQDN=kubestagemaster2005.codfw.wmnet
sudo confctl select "name=${NEW_FQDN}" set/pooled=yes:weight=10
  • Uncordon the new control-plane: kubectl uncordon kubestagemaster2005.codfw.wmnet

Remove control-plane(s)

  • Downtime: sudo cookbook sre.hosts.downtime -t T363307 -D 2 -r decom 'kubestagemaster200[1-2].codfw.wmnet'
  • Depool
sudo confctl select "name=kubestagemaster200[12].codfw.wmnet" set/pooled=inactive
  • Set the node to BGP: False in Netbox
  • Run Homer
  • Drain and cordon control-planes
kubectl drain --ignore-daemonsets --delete-emptydir-data kubestagemaster2001.codfw.wmnet kubestagemaster2002.codfw.wmnet
  • Disable puppet and stop the Kubernetes control-plane components
sudo cumin 'kubestagemaster200[1-2].codfw.wmnet' "disable-puppet decom && systemctl stop kube-apiserver.service kube-controller-manager.service kube-scheduler.service"
  • Run decom cookbook: sudo cookbook sre.hosts.decommission -t T363307 'kubestagemaster200[1-2].codfw.wmnet'
  • Remove the nodes from site.pp, conftool and hieradata/common/kubernetes.yaml (example change: Decom kubestagemaster200[12])
  • kubectl delete node kubestagemaster2001.codfw.wmnet kubestagemaster2002.codfw.wmnet
  • Run puppet on remaining control-planes and nodes
DC=codfw
sudo cumin -b 1 -s 60 "A:wikikube-staging-master and A:${DC}" 'run-puppet-agent -q'
sudo cumin -b 15 -s 5 "A:wikikube-staging-worker and A:${DC}" 'run-puppet-agent -q'

Remove etcd node(s)

  • Downtime nodes: sudo cookbook sre.hosts.downtime -t T363307 -D 2 -r decom 'kubestagetcd200[1-3].codfw.wmnet'
  • Move leadership to a node that is not going to be removed
sudo cumin -o txt A:wikikube-staging-etcd-codfw 'ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 -w table endpoint status'
# if a node that is to be removed is the leader:
sudo cumin -o txt <FQDN of leader> 'ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 move-leader <ID of a different node>'
  • Remove the nodes from the cluster:
# run on one of the remaining etcd nodes
ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 member list
# for all nodes to remove
ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 member remove <ID>
  • Remove the nodes' FQDNs from the server and client SRV records of the etcd cluster: gerrit, DNS - Wikitech
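Once the DNS change is deployed, a quick check that the removed nodes are gone from both records (record names are illustrative, per etcd's DNS discovery convention):
# server (peer) and client discovery records
dig +short _etcd-server-ssl._tcp.<cluster domain> SRV
dig +short _etcd-client-ssl._tcp.<cluster domain> SRV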
  • Run decom
sudo cookbook sre.hosts.decommission -t T363307 'kubestagetcd200[1-3].codfw.wmnet'
  • Run puppet on remaining control-planes
DC=codfw
sudo cumin -b 1 -s 60 "A:wikikube-staging-master and A:${DC}" 'run-puppet-agent -q'