PAWS/Tools/Admin
This document is deprecated and describes an old cluster.
Introduction
PAWS is a Jupyterhub deployment that runs in its own Kubernetes cluster (separate from the Toolforge Kubernetes cluster) in the tools project on Wikimedia Cloud VPS. It is accessible at https://hub-paws.wmcloud.org, and is a public service that can be authenticated to via Wikimedia OAuth. More end-user info is at PAWS/Tools.
Kubernetes cluster
Deployment
The PAWS Kubernetes cluster is not puppetized at this point (see T188912). It was originally deployed using custom scripts hosted at https://github.com/data-8/kubeadm-bootstrap that use kubeadm, a tool that helps bootstrap a Kubernetes cluster on bare-metal machines.
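At a high level, kubeadm bootstraps a cluster in two steps: kubeadm init on the master, which sets up the control plane and prints a join token, and kubeadm join on each worker. A minimal sketch of that flow (generic kubeadm usage, not our exact invocation):
# On the master: initialize the control plane. This prints a join token.
kubeadm init

# On each worker: join the cluster using the token printed by the master.
# TOKEN and MASTER_IP are placeholders for the values from kubeadm init.
kubeadm join --token "${TOKEN}" "${MASTER_IP}":6443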
We have the kubeadm-bootstrap repo cloned at /home/yuvipanda on tools-paws-master-01.tools (and since this is on NFS, it is accessible everywhere else on tools). The cluster-specific config/secrets are at ~/kubeadm-bootstrap/data.
The README at https://github.com/data-8/kubeadm-bootstrap/blob/master/README.md has good info on how to set up the cluster. Our cluster has small variations to accommodate installing on Debian (not Ubuntu Xenial, which the current kubeadm-bootstrap code is set up for).
# install-kubeadm.bash looks slightly different
yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff install-kubeadm.bash
diff --git a/install-kubeadm.bash b/install-kubeadm.bash
index c6ce3bb..3aae948 100755
--- a/install-kubeadm.bash
+++ b/install-kubeadm.bash
@@ -1,24 +1,29 @@
#!/bin/bash
-apt-get update
+apt-get update
apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
+deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable
EOF
+
apt-get update
# Install docker if you don't have it already.
-apt-get install -y docker-engine
+systemctl stop docker
+apt-get purge -y docker-engine
+rm -rf /var/lib/docker/*
+apt-get install -y docker-ce
-# Make sure you're using the overlay driver!
-# Note that this gives us docker 1.11, which does *not* support overlay2
systemctl stop docker
modprobe overlay
-echo '{"storage-driver": "overlay"}' > /etc/docker/daemon.json
rm -rf /var/lib/docker/*
+echo '{"storage-driver": "overlay2"}' > /etc/docker/daemon.json
systemctl start docker
# Install kubernetes components!
apt-get install -y kubelet kubeadm kubernetes-cni
# init-worker.bash looks slightly different too:
yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff init-worker.bash
diff --git a/init-worker.bash b/init-worker.bash
index 579e9fa..36557ee 100755
--- a/init-worker.bash
+++ b/init-worker.bash
@@ -5,4 +5,4 @@ set -e
source data/config.bash
source data/secrets.bash
-kubeadm join --token "${KUBEADM_TOKEN}" "${KUBE_MASTER_IP}":6443
+kubeadm join --skip-preflight-checks --token "${KUBEADM_TOKEN}" "${KUBE_MASTER_IP}":6443
We also have a change-docker.bash script:
yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ cat change-docker.bash
#!/bin/bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/debian \
$(lsb_release -cs) \
stable"
sudo apt-get update
All the above changes should be tracked persistently in git somewhere, but are logged here until that's the case.
Architecture
[Coming Soon]
Current Setup
- Currently, the k8s keys are in yuvipanda's and chicocvenancio's $HOME/.kube/config files, so admins may impersonate either of these users to run kubectl on this k8s cluster, or copy the file into their own home folder (see the sketch after this list).
- The k8s master is at tools-paws-master-01.tools.eqiad1.wikimedia.cloud
- For a summary of the k8s nodes, see kubectl get node:
yuvipanda@tools-paws-master-01:~$ kubectl get node
NAME                     STATUS                     AGE    VERSION
tools-paws-master-01     Ready                      113d   v1.7.3
tools-paws-worker-1001   Ready                      35d    v1.8.0
tools-paws-worker-1002   Ready                      98d    v1.7.3
tools-paws-worker-1003   Ready                      98d    v1.7.3
tools-paws-worker-1005   Ready                      98d    v1.7.3
tools-paws-worker-1006   Ready                      98d    v1.7.3
tools-paws-worker-1007   Ready                      98d    v1.7.3
tools-paws-worker-1010   Ready                      98d    v1.7.3
tools-paws-worker-1013   Ready                      98d    v1.7.3
tools-paws-worker-1016   Ready                      98d    v1.7.3
tools-paws-worker-1017   Ready,SchedulingDisabled   98d    v1.7.3
tools-paws-worker-1019   Ready                      35d    v1.8.0
- Helm is used to deploy kubernetes applications on the cluster. It is installed during the cluster bootstrap process, and is in turn used to install the add-ons nginx-ingress and kube-lego. Helm has two parts: a client (helm) and a server (tiller).
- To see the status of the k8s control plane pods (running kubedns, kube-proxy, flannel, etcd, kube-apiserver, kube-controller-manager, tiller), see kubectl --namespace=kube-system get pod -o wide.
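Putting the above together, a typical admin session might look like the following sketch (it assumes you are on tools-paws-master-01 and copies chicocvenancio's kubeconfig as described above):
# Copy an existing admin kubeconfig into your own home folder.
mkdir -p ~/.kube
sudo cp /home/chicocvenancio/.kube/config ~/.kube/config

# Summary of the cluster's nodes.
kubectl get node

# Status of the k8s control plane pods.
kubectl --namespace=kube-system get pod -o wide

# Helm releases installed on the cluster (e.g. nginx-ingress, kube-lego).
helm list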
Jupyterhub deployment
Jupyterhub & PAWS Components
Jupyterhub is a set of systems deployed together that provide per-user Jupyter notebook servers. The three main subsystems of Jupyterhub are the Hub, the Proxy, and the Single-User Notebook Server. A really good overview of these systems is available at http://jupyterhub.readthedocs.io/en/latest/reference/technical-overview.html.
PAWS is a Jupyterhub deployment (Hub, Proxy, Single-User Notebook Server) with some added bells and whistles. The additional PAWS-specific parts of our deployment are:
- db-proxy: Mysql-proxy plugin script to perform simple authentication to the Wiki Replicas. See https://github.com/yuvipanda/paws/blob/master/images/db-proxy/auth.lua
- nbserve and render: nbserve is the nginx proxy run from Toolforge in the paws-public tool that handles URL rewriting for paws-public URLs, and render handles the actual rendering of the ipynb notebook as a static page. Together they make paws-public possible (see the hedged example after this list).
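As an illustration of how the pieces fit together: a request for a paws-public URL hits nbserve, which rewrites the URL and hands the notebook to render, which returns static HTML. A hedged example (the exact hostname and path scheme here are assumptions, not authoritative):
# Fetch a notebook rendered as a static page; the user id and filename
# below are made-up placeholders.
curl https://paws-public.wmflabs.org/paws-public/12345/example.ipynb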
Deployment
- The PAWS repository is at https://github.com/yuvipanda/paws, and is cloned at /home/yuvipanda on tools.
- PAWS is deployed with Travis CI; the dashboard is at https://travis-ci.org/yuvipanda/paws. The configuration for the Travis builds is at https://github.com/yuvipanda/paws/blob/master/.travis.yml, and builds and deploys launch the travis-script.bash script with appropriate parameters (a hypothetical manual invocation follows this list).
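The same script can in principle be run by hand. A hypothetical invocation (the argument names below are assumptions; check .travis.yml for the parameters Travis actually passes):
# Hypothetical: mirror what Travis does on a push, first building images,
# then deploying. The "build" and "deploy" arguments are assumptions.
bash travis-script.bash build
bash travis-script.bash deploy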
Database
JupyterHub uses a database to keep user state; currently it uses ToolsDB.
Moving to sqlite
During ToolsDB outages we can change the db to in-memory sqlite without significant impact.
From tools-paws-master-01 (grab k8s credentials from /home/chicocvenancio/.kube if necessary):
- kubectl --namespace prod edit configmap hub-config
- Set hub.db_url to "sqlite://"
- Restart the hub with kubectl --namespace prod delete pod $(kubectl get pods --namespace prod | grep hub | cut -f 1 -d ' ')
To move it back, set hub.db_url to the previous value (see /home/chicocvenancio/paws/paws/secrets.yaml at jupyterhub.hub.db.url) and restart the hub with the same delete pod command.
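The whole switch, as one consolidated sketch (assumes the prod namespace layout described above):
# 1. Point the hub at an in-memory sqlite database.
kubectl --namespace prod edit configmap hub-config   # set hub.db_url to "sqlite://"

# 2. Restart the hub pod so it picks up the new config.
kubectl --namespace prod delete pod $(kubectl get pods --namespace prod | grep hub | cut -f 1 -d ' ')

# 3. To move back, restore hub.db_url from secrets.yaml (jupyterhub.hub.db.url)
#    and restart the hub pod again with the same delete command.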
If in doubt, you can also reset the cluster with helm by going into the /home/chicocvenancio/paws/ directory and running the deploy script with ./build.py deploy prod.