PAWS/Tools/Admin/Chico's notes

Notes on setting up a PAWS staging env in toolsbeta VPS project (T188428)

Using https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ as docs

Doesn't seem like PAWS is properly puppetized
- I see the apt-pinning definitions, but not where tools-paws-master-01 installs the needed packages (k8s, docker, etc.)
Docker and k8s repos are defined for Xenial, though tools-paws-master-01 is stretch
- Docker does have current stretch versions, k8s does not (left docker as stretch, k8s as xenial)
Unsure about the cgroup driver used by docker. The official docs say to place { "exec-opts": ["native.cgroupdriver=systemd"] } in /etc/docker/daemon.json. Since the prod version does not have that, I'm omitting it for now
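To find out which cgroup driver docker is actually using before touching daemon.json, the "Cgroup Driver" line of `docker info` can be read. A minimal sketch, assuming the plain-text `docker info` output contains a line of the form "Cgroup Driver: <name>"; the function name cgroup_driver is made up for illustration:

```shell
# Hypothetical helper: read `docker info` output on stdin and print the
# value of its "Cgroup Driver" line (e.g. cgroupfs or systemd).
cgroup_driver() {
    awk -F': ' '/Cgroup Driver/ { print $2 }'
}
```

Usage would be something like `sudo docker info 2>/dev/null | cgroup_driver`. As far as I know the kubelet has to be configured with the same driver docker reports.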
Swap needs to be turned off (as far as I can tell this is a kubelet requirement enforced by kubeadm, rather than a docker one), done manually
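A common way to do this by hand is `sudo swapoff -a`, which only lasts until the next reboot if a swap entry remains in /etc/fstab. To check the result, /proc/swaps can be inspected. A minimal sketch, assuming the usual /proc/swaps layout (one header line, then one line per active device); the function name swap_is_off is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) when no swap device is active.
# $1 is a /proc/swaps-style file (default /proc/swaps): one header line,
# then one line per active swap device.
swap_is_off() {
    awk 'NR > 1 { found = 1 } END { exit found + 0 }' "${1:-/proc/swaps}"
}
```

For example: `swap_is_off && echo "swap is off"`.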
started the k8s cluster with flannel
- kubeadm init --pod-network-cidr=10.244.0.0/16
Allow user chicocvenancio to use k8s
- chicocvenancio@toolsbeta-paws-master-01:~$ mkdir -p $HOME/.kube
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
Docs say to set /proc/sys/net/bridge/bridge-nf-call-iptables to 1, it already was 1
get flannel pods (docs mention v0.9.1, I used the latest version, v0.10.0)
- kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
exported Yuvi's git-crypt key
- yuvipanda@tools-paws-master-01:~/paws$ sudo -E git-crypt export-key /tmp/paws-key
- chicocvenancio@tools-paws-master-01:~$ sudo chown chicocvenancio /tmp/paws-key
- chicocvenancio@tools-paws-master-01:~$ scp /tmp/paws-key toolsbeta-paws-master-01.toolsbeta:~/paws-key
Installed git-crypt
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo apt-get install git-crypt
ran into Error: could not find tiller
- fixed with `helm init`
Tiller won't start due to a lack of nodes
Create new node
- Is there a way to adhere to naming convention when using instance count in horizon?
- toolsbeta-paws-worker-1001
  - Since it's not puppetized, going manual again
- Node joining brings up tiller
Once tiller is up we can run "sudo ./build.py deploy prod --install" to install PAWS
- We need to fix two things Yuvi did via the CLI and did not push to the repo (yuvi's .bash_history was invaluable to get these right)
  - Tiller RBAC
    - Done for now by setting a non-ideal, permissive clusterrolebinding
      - chicocvenancio@toolsbeta-paws-master-01:~$ kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
  - the hub_db pvc is not defined at all in the repo, found the definition in yuvi's .bash_history
    - chicocvenancio@toolsbeta-paws-master-01:~$ kubectl -n prod apply -f /mnt/nfs/labstore-secondary-home/yuvipanda/paw-c/hub-pv.yaml

~~TODO: setup a new OAuth consumer for PAWS-beta~~
- Waiting for approval https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/bb705f2027fbdb8fc434d4fdf9a7c482&name=PAWS-beta&publisher=&stage=0
~~TODO: stop these annoying pre-puller daemonsets~~
- These are actually not annoying, and are useful, once I pointed them to working docker repositories
Right now paws-beta uses a completely different way (WMCS-wise) of handling traffic ingress. I thought this simpler than copying the paws-proxy instance; in fact we can probably drop those instances (VPS cloud project) and improve production after some testing
- I did not get the ideal k8s LoadBalancer service to work with external IPs; instead I used a NodePort service and pointed a webproxy to one of the nodes (any will do)
  - This does mean that if that node fails the proxy, and with it the site, will fail, which is NOT ok for production
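The webproxy has to point at the port k8s allocated on the nodes, which `kubectl get svc` shows in its PORT(S) column as port:nodePort/protocol. A minimal sketch for pulling the node port out of such a value, assuming that column format; the function name node_ports and the sample value are made up for illustration:

```shell
# Hypothetical helper: print the node port(s) from a PORT(S) value such
# as "80:32080/TCP" or "80:32080/TCP,443:32443/TCP", one per line.
node_ports() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's|^[0-9]*:\([0-9]*\)/.*$|\1|p'
}
```

For example, `node_ports '80:32080/TCP'` prints 32080, which is the port the webproxy backend would use on whichever node it points to.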
Differences between PAWS-beta and prod:
- Already without the query-killer image
- Deploy-hook image uses artful and not zesty
- Per above, traffic ingress is different

Notes on K8S upgrade in PAWS-beta

Using docs https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-1-9/

cluster control plane

Upgrade kubeadm
- curl -sSL https://dl.k8s.io/release/v1.9.4/bin/linux/amd64/kubeadm > /usr/bin/kubeadm
- chmod a+rx /usr/bin/kubeadm
Verify kubeadm version
- chicocvenancio@toolsbeta-paws-master-01:~$ kubeadm version
- kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Plan and apply the upgrade
- root@toolsbeta-paws-master-01:~# kubeadm upgrade plan
- root@toolsbeta-paws-master-01:~# kubeadm upgrade apply v1.9.4

Upgrade nodes

drain each node
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- toolsbeta-paws-master-01 Ready master 13d v1.9.3
- toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.3
- toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.3
- toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.3
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl drain toolsbeta-paws-master-01 --ignore-daemonsets
upgrade k8s:
- sudo apt-get update
- sudo apt-get install kubeadm=1.9.4-00 kubectl=1.9.4-00 kubelet=1.9.4-00
verify kubelet is running
- systemctl status kubelet
Uncordon node and verify it is ready
- kubectl uncordon toolsbeta-paws-master-01
- kubectl get nodes
Rinse, repeat for each node until all nodes are at version v1.9.4:
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- toolsbeta-paws-master-01 Ready master 13d v1.9.4
- toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.4
- toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.4
- toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.4
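The per-node loop above can be sketched as a small script. This is a dry-run sketch under assumptions, not the procedure actually used: it only prints the commands, the function names are hypothetical, and running the package upgrade over ssh from the master is my assumption (the notes do not say how each node was reached).

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the names of nodes whose last column (VERSION) differs from $1.
nodes_needing_upgrade() {
    awk -v target="$1" 'NR > 1 && $NF != target { print $1 }'
}

# Hypothetical helper: print (not run) the drain => upgrade => uncordon
# steps for one node. $1 = node name, $2 = apt version, e.g. 1.9.4-00.
# The ssh hop is an assumption, not something stated in these notes.
print_upgrade_steps() {
    echo "kubectl drain $1 --ignore-daemonsets"
    echo "ssh $1 sudo apt-get install kubeadm=$2 kubectl=$2 kubelet=$2"
    echo "kubectl uncordon $1"
}
```

Usage would be along the lines of `kubectl get nodes | nodes_needing_upgrade v1.9.4 | while read -r node; do print_upgrade_steps "$node" 1.9.4-00; done`, reviewing the printed commands before running any of them.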

k8s upgrade in PAWS

Went through the same steps as beta, but there are more nodes, and I ran into a few new issues
apt-pinning by puppet
- This is done by raising a version's priority. To keep the workflow and guarantee the "drain => upgrade => uncordon => next node" order, I used "=1.9.4-00" to request the specific version from apt-get install
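To see what the pinning would make apt install on a node before upgrading it, the "Candidate:" line of `apt-cache policy <package>` can be checked. A minimal sketch, assuming the standard `apt-cache policy` output layout; the function name apt_candidate is hypothetical:

```shell
# Hypothetical helper: read `apt-cache policy <package>` output on stdin
# and print the candidate version apt would install.
apt_candidate() {
    awk '$1 == "Candidate:" { print $2 }'
}
```

For example, `apt-cache policy kubelet | apt_candidate` prints the version apt would pick without an explicit "=version" on the command line.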
nodes taking a very long time to drain
- Google led me to https://medium.com/@felipedutratine/when-you-try-to-drain-a-kubernetes-node-but-it-blocks-5aba9592d7c9
  - get the pods on the node and delete the ones still there
    - kubectl get pods -o wide --all-namespaces|grep worker-1001
      - kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
      - kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
      - prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
      - support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001
    - Mind that kube-flannel, support-nginx-ingress-controller, and kube-proxy are DaemonSet controlled, so they wouldn't interfere
      - chicocvenancio@tools-paws-master-01:~$ kubectl delete pod proxy-5cd7d56555-tm4p6 -n prod
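Instead of grepping for part of the node name, the pod listing can be filtered on the exact node. A minimal sketch, assuming NODE is the last column of the `-o wide` output as in the listing above; the function name pods_on_node is hypothetical:

```shell
# Hypothetical helper: read `kubectl get pods -o wide --all-namespaces`
# output on stdin and print "namespace pod" for every pod on node $1.
# Assumes NODE is the last column, as in the listing above.
pods_on_node() {
    awk -v node="$1" '$NF == node { print $1, $2 }'
}
```

For example, `kubectl get pods -o wide --all-namespaces | pods_on_node tools-paws-worker-1001`. The output still includes the DaemonSet-controlled pods, so those have to be skipped by eye before deleting anything.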
Created a bash script to go through each node (/home/chicocvenancio/update_nodes.sh)
Sent Change 419599 to pin k8s to version 1.9.4 in PAWS.