PAWS/Tools/Admin/Chico's notes
Notes on setting up a PAWS staging env in toolsbeta VPS project (T188428)
Using https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ as docs
- Doesn't seem like PAWS is properly puppetized
- I see the apt-pinning definitions, but not where tools-paws-master-01 installs the needed packages (k8s, docker, etc)
- Docker and k8s repos are defined for Xenial, though tools-paws-master-01 is stretch
- Docker does have current stretch versions, k8s does not (left Docker as stretch, k8s as xenial)
- Unsure about the cgroup driver used by Docker. The official docs say to place { "exec-opts": ["native.cgroupdriver=systemd"] } in /etc/docker/daemon.json; since the prod version does not have that, I'm omitting it for now
- swap needs to be turned off for docker, done manually
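- A minimal sketch of those manual steps (the fstab edit here is an assumption, not copied from the host):
sudo swapoff -a
# keep swap off across reboots by commenting out any swap line in /etc/fstab
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab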
- started the k8s cluster with flannel
- kubeadm init --pod-network-cidr=10.244.0.0/16
- Allow user chicocvenancio to use k8s
- chicocvenancio@toolsbeta-paws-master-01:~$ mkdir -p $HOME/.kube
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Docs say to set /proc/sys/net/bridge/bridge-nf-call-iptables to 1, it already was 1
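- Checking and (if needed) setting it looks like:
cat /proc/sys/net/bridge/bridge-nf-call-iptables
sudo sysctl net.bridge.bridge-nf-call-iptables=1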
- bring up the flannel pods (docs mention v0.9.1; I used the latest version)
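- The flannel deploy itself is a single apply of the upstream manifest, something along these lines (URL from the coreos/flannel repo as it was at the time):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# then wait for the kube-flannel-ds pods to go Running
kubectl -n kube-system get pods -l app=flannel -o wide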
- exported Yuvi's git-crypt key
- yuvipanda@tools-paws-master-01:~/paws$ sudo -E git-crypt export-key /tmp/paws-key
- chicocvenancio@tools-paws-master-01:~$ sudo chown chicocvenancio /tmp/paws-key
- chicocvenancio@tools-paws-master-01:~$ scp /tmp/paws-key toolsbeta-paws-master-01.toolsbeta:~/paws-key
- Installed git-crypt
- chicocvenancio@toolsbeta-paws-master-01:~$ sudo apt-get install git-crypt
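- With the key on the new host, the repo can be cloned and unlocked, roughly (repo URL left out here):
git clone <paws repo url> paws
cd paws && git-crypt unlock ~/paws-key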
- ran into Error: could not find tiller
- fixed with `helm init`
- Tiller won't start due to a lack of nodes
- Create new node
- Is there a way to adhere to naming convention when using instance count in horizon?
- toolsbeta-paws-worker-1001
- Since it's not puppetized, going for manual setup again
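- The join itself is the kubeadm join command printed at the end of kubeadm init on the master; its general form (token and hash are placeholders):
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>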
- Node joining brings up tiller
- Once tiller is up we can run "sudo ./build.py deploy prod --install" to install PAWS
- We need to fix two things Yuvi did on the CLI and did not push to the repo (Yuvi's .bash_history was invaluable to get these right)
- Tiller RBAC
- Done for now by setting a non-ideal, overly permissive clusterrolebinding
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
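- A less permissive alternative (sketch only, not what is deployed here) would give tiller a dedicated service account instead:
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade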
- The hub_db PVC is not defined at all in the repo; found the definition in Yuvi's .bash_history
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl -n prod apply -f /mnt/nfs/labstore-secondary-home/yuvipanda/paw-c/hub-pv.yaml
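- The actual hub-pv.yaml only lives in Yuvi's home directory; a hypothetical sketch of such a PV/PVC pair (names, size, and hostPath are guesses, NOT the real definition):
kubectl -n prod apply -f - <<'EOF'
kind: PersistentVolume
apiVersion: v1
metadata:
  name: hub-db-dir
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /srv/paws/hub-db
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hub-db-dir
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF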
TODO: set up a new OAuth consumer for PAWS-beta
TODO: stop these annoying pre-puller daemonsets
- This is actually not annoying and is useful once I pointed them to working Docker repositories
- Right now paws-beta uses a completely different approach (WMCS-wise) for traffic ingress. I thought this simpler than copying the paws-proxy instance; in fact, we can probably drop those instances (VPS cloud project) and improve production after some testing
- I did not get the ideal k8s LoadBalancer service to work with external IPs; instead I used a NodePort service and pointed a webproxy to one of the nodes (any will do, see the sketch after this list)
- This does mean that if that node fails the proxy will fail, which is NOT ok for production
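- A minimal sketch of the NodePort idea (service name, selector and ports are illustrative, not the actual values):
kubectl -n prod apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: proxy-public
spec:
  type: NodePort
  selector:
    name: proxy          # match whatever labels the real proxy pods carry
  ports:
    - port: 80
      targetPort: 8000
      nodePort: 30080    # the webproxy then points at <any-node-ip>:30080
EOF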
- Differences between PAWS-beta and prod:
- Already without the query-killer image
- Deploy-hook image uses artful and not zesty
- Per above, traffic ingress is different
Notes on K8S upgrade in PAWS-beta
cluster control plane
- Upgrade kubeadm
- curl -sSL https://dl.k8s.io/release/v1.9.4/bin/linux/amd64/kubeadm > /usr/bin/kubeadm
- chmod a+rx /usr/bin/kubeadm
- Verify kubeadm version
- chicocvenancio@toolsbeta-paws-master-01:~$ kubeadm version
- kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- root@toolsbeta-paws-master-01:~# kubeadm upgrade plan
- kubeadm upgrade apply v1.9.4
Upgrade nodes
- drain each node
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- toolsbeta-paws-master-01 Ready master 13d v1.9.3
- toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.3
- toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.3
- toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.3
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl drain toolsbeta-paws-master-01 --ignore-daemonsets
- upgrade k8s:
- sudo apt-get update
- sudo apt-get install kubeadm=1.9.4-00 kubectl=1.9.4-00 kubelet=1.9.4-00
- verify kubelet is running
- systemctl status kubelet
- Uncordon node and verify it is ready
- kubectl uncordon toolsbeta-paws-master-01
- kubectl get nodes
- Rinse and repeat for each node until all nodes are on version v1.9.4:
- chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- toolsbeta-paws-master-01 Ready master 13d v1.9.4
- toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.4
- toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.4
- toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.4
k8s upgrade in PAWS
- Went through the same steps as beta, but there are more nodes, and ran into a few new issues
- apt-pinning by puppet
- This is done by raising a specific version's priority; to keep the workflow and guarantee the "drain => upgrade => uncordon => next node" order, I used "=1.9.4-00" to request the specific version with apt-get install
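- For reference, the pin is an apt preferences stanza along these lines (illustrative; the real file is puppet-managed and the pinned version may differ):
Package: kubelet kubectl kubeadm kubernetes-cni
Pin: version <pinned version>
Pin-Priority: 1001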
- nodes taking a very long time to drain
- Google led me to https://medium.com/@felipedutratine/when-you-try-to-drain-a-kubernetes-node-but-it-blocks-5aba9592d7c9
- get the pods on the node and delete the ones still there
- kubectl get pods -o wide --all-namespaces|grep worker-1001
- kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
- kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
- prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
- support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001
- Mind that kube-flannel, support-nginx-ingress-controller, and kube-proxy are DaemonSet-controlled, so they wouldn't interfere with the drain
- chicocvenancio@tools-paws-master-01:~$ kubectl delete pod proxy-5cd7d56555-tm4p6 -n prod
- Created a bash script to go through each node (/home/chicocvenancio/update_nodes.sh); rough sketch below
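- The script only lives in that home directory; the gist is a loop like this (hypothetical reconstruction, not the actual file):
#!/bin/bash
# drain => upgrade => uncordon, one worker node at a time
set -e
VERSION="1.9.4-00"
for node in $(kubectl get nodes -o name | sed 's#^nodes\?/##' | grep worker); do
    kubectl drain "$node" --ignore-daemonsets --delete-local-data
    ssh "$node" "sudo apt-get update && sudo apt-get install -y kubeadm=${VERSION} kubectl=${VERSION} kubelet=${VERSION}"
    ssh "$node" "systemctl is-active kubelet"
    kubectl uncordon "$node"
done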
- Sent Change 419599 to pin k8s to version 1.9.4 in PAWS.