PAWS/Tools/Admin

This page contains historical information. It may be outdated or unreliable.
This document is deprecated and describes an old cluster.

Introduction

PAWS is a Jupyterhub deployment that runs in its own Kubernetes cluster (separate from the Toolforge Kubernetes cluster) in the tools project on Wikimedia Cloud VPS. It is accessible at https://hub-paws.wmcloud.org and is a public service that can be authenticated to via Wikimedia OAuth. More end-user information is at PAWS/Tools.

Kubernetes cluster

Deployment

The PAWS Kubernetes cluster is not puppetized at this point (see T188912). It was originally deployed using custom scripts hosted at https://github.com/data-8/kubeadm-bootstrap that use kubeadm, a tool that helps bootstrap a Kubernetes cluster on bare-metal machines.

We have the kubeadm-bootstrap repo cloned at /home/yuvipanda on tools-paws-master-01.tools (and, since this is on NFS, it is accessible everywhere else on tools). The cluster-specific config/secrets are at ~/kubeadm-bootstrap/data.

The README at https://github.com/data-8/kubeadm-bootstrap/blob/master/README.md has good info on how to set up the cluster. Our cluster has small variations to accommodate installing on Debian (not Ubuntu Xenial, which the current kubeadm-bootstrap code is set up for).
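The Debian accommodation mostly comes down to pointing apt at Docker's repository for the running distro instead of the hardcoded Ubuntu Xenial line. A minimal sketch of deriving that apt source line from /etc/os-release (this follows Docker's standard repository layout; it is not the literal code from the repo):

```shell
# Derive the Docker apt source line for the running distro instead of
# hardcoding "ubuntu ... xenial" the way the upstream script does.
distro=$(. /etc/os-release && echo "$ID")                  # e.g. "debian"
codename=$(. /etc/os-release && echo "$VERSION_CODENAME")  # e.g. "stretch"
apt_line="deb [arch=amd64] https://download.docker.com/linux/${distro} ${codename} stable"
echo "$apt_line"
```

The diffs below show the actual hand-edits that were made to the bootstrap scripts instead.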

# install-kubeadm.bash looks slightly different

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff install-kubeadm.bash
diff --git a/install-kubeadm.bash b/install-kubeadm.bash
index c6ce3bb..3aae948 100755
--- a/install-kubeadm.bash
+++ b/install-kubeadm.bash
@@ -1,24 +1,29 @@
 #!/bin/bash
-apt-get update
+apt-get update
 apt-get install -y apt-transport-https
 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
 cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
 deb http://apt.kubernetes.io/ kubernetes-xenial main
+deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable
 EOF
+
 apt-get update

 # Install docker if you don't have it already.

-apt-get install -y docker-engine
+systemctl stop docker
+apt-get purge -y docker-engine
+rm -rf /var/lib/docker/*
+apt-get install -y docker-ce

-# Make sure you're using the overlay driver!
-# Note that this gives us docker 1.11, which does *not* support overlay2

 systemctl stop docker
 modprobe overlay
-echo '{"storage-driver": "overlay"}' > /etc/docker/daemon.json
 rm -rf /var/lib/docker/*
+echo '{"storage-driver": "overlay2"}' > /etc/docker/daemon.json
 systemctl start docker

 # Install kubernetes components!
 apt-get install -y kubelet kubeadm kubernetes-cni
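The substantive change in the diff above is the storage driver: the stock script pinned an old docker-engine whose overlay driver did not support overlay2, while our Debian install uses docker-ce with overlay2. The daemon.json override it writes, shown standalone (written to a temp file here so it can be run without touching a real Docker install):

```shell
# The storage-driver override the patched script writes to
# /etc/docker/daemon.json; a temp file stands in for the real path here.
conf=$(mktemp)
echo '{"storage-driver": "overlay2"}' > "$conf"
cat "$conf"
```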
# init-worker.bash looks slightly different too:

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff init-worker.bash
diff --git a/init-worker.bash b/init-worker.bash
index 579e9fa..36557ee 100755
--- a/init-worker.bash
+++ b/init-worker.bash
@@ -5,4 +5,4 @@ set -e
 source data/config.bash
 source data/secrets.bash

-kubeadm join --token "${KUBEADM_TOKEN}"  "${KUBE_MASTER_IP}":6443
+kubeadm join --skip-preflight-checks --token "${KUBEADM_TOKEN}"  "${KUBE_MASTER_IP}":6443

We also have a change-docker.bash script:

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ cat change-docker.bash
#!/bin/bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
       "deb [arch=amd64] https://download.docker.com/linux/debian \
          $(lsb_release -cs) \
             stable"
sudo apt-get update

All of the above changes should be tracked persistently in git somewhere; they are logged here while that's not the case.

Architecture

[Coming Soon]

Current Setup

  • Currently, the k8s keys are in yuvipanda's and chicocvenancio's $HOME/.kube/config files, so admins may impersonate either of these users to run kubectl against this k8s cluster, or copy the file into their own home directory.
  • The k8s master is at tools-paws-master-01.tools.eqiad1.wikimedia.cloud
  • For k8s nodes summary, see kubectl get node
    yuvipanda@tools-paws-master-01:~$ kubectl get node
    NAME                     STATUS                     AGE       VERSION
    tools-paws-master-01     Ready                      113d      v1.7.3
    tools-paws-worker-1001   Ready                      35d       v1.8.0
    tools-paws-worker-1002   Ready                      98d       v1.7.3
    tools-paws-worker-1003   Ready                      98d       v1.7.3
    tools-paws-worker-1005   Ready                      98d       v1.7.3
    tools-paws-worker-1006   Ready                      98d       v1.7.3
    tools-paws-worker-1007   Ready                      98d       v1.7.3
    tools-paws-worker-1010   Ready                      98d       v1.7.3
    tools-paws-worker-1013   Ready                      98d       v1.7.3
    tools-paws-worker-1016   Ready                      98d       v1.7.3
    tools-paws-worker-1017   Ready,SchedulingDisabled   98d       v1.7.3
    tools-paws-worker-1019   Ready                      35d       v1.8.0
    
  • Helm is used to deploy kubernetes applications on the cluster. It is installed during the cluster bootstrap process and is in turn used to install the nginx-ingress and kube-lego add-ons. Helm has two parts: a client (helm) and a server (tiller).
  • To see the status of the k8s control plane pods (kube-dns, kube-proxy, flannel, etcd, kube-apiserver, kube-controller-manager, tiller), run kubectl --namespace=kube-system get pod -o wide.
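As an example of reading the node listing above, a small filter that flags nodes whose status is anything other than plain Ready (run here against sample output from this page rather than a live cluster):

```shell
# Flag nodes that are not simply "Ready" (e.g. cordoned nodes showing
# Ready,SchedulingDisabled); sample output stands in for `kubectl get node`.
sample='tools-paws-master-01     Ready                      113d      v1.7.3
tools-paws-worker-1017   Ready,SchedulingDisabled   98d       v1.7.3
tools-paws-worker-1019   Ready                      35d       v1.8.0'
echo "$sample" | awk '$2 != "Ready" {print $1, $2}'
```

Against the real cluster you would pipe kubectl get node --no-headers into the same awk filter.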

Jupyterhub deployment

Jupyterhub & PAWS Components

Jupyterhub is a set of systems deployed together to provide per-user Jupyter notebook servers. The three main subsystems are the Hub, the Proxy, and the Single-User Notebook Server. A good overview of these systems is available at http://jupyterhub.readthedocs.io/en/latest/reference/technical-overview.html.

PAWS is a Jupyterhub deployment (Hub, Proxy, Single-User Notebook Server) with some added bells and whistles. The additional PAWS-specific parts of our deployment are:

  • db-proxy: a mysql-proxy plugin script that performs simple authentication to the Wiki Replicas. See https://github.com/yuvipanda/paws/blob/master/images/db-proxy/auth.lua
  • nbserve and render: nbserve is the nginx proxy run from Toolforge in the paws-public tool that handles URL rewriting for paws-public URLs; render handles the actual rendering of an ipynb notebook as a static page. Together they make paws-public possible.

Deployment

Database

JupyterHub uses a database to keep user state; currently it uses ToolsDB.

Moving to sqlite

During ToolsDB outages, we can switch the database to in-memory SQLite without significant impact.

From tools-paws-master-01 (grab k8s credentials from /home/chicocvenancio/.kube if necessary):

kubectl --namespace prod edit configmap hub-config

Set hub.db_url to "sqlite://".

Restart the hub with kubectl --namespace prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ')

To move it back, set hub.db_url to the previous value (see /home/chicocvenancio/paws/paws/secrets.yaml at jupyterhub.hub.db.url) and restart the hub with kubectl --namespace prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ').
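The pod name in that restart one-liner comes from a grep/cut pipeline over kubectl get pods. How the extraction behaves, demonstrated on sample output (the pod names are invented for illustration):

```shell
# Extract the hub pod name the way the restart one-liner does, using
# sample `kubectl get pods` output instead of a live cluster.
sample='hub-6f9c7d8b4-abcde   1/1   Running   0   2d
proxy-59f6cc7b8-xyz12   1/1   Running   0   2d'
hub_pod=$(echo "$sample" | grep hub | cut -f 1 -d ' ')
echo "$hub_pod"   # hub-6f9c7d8b4-abcde
```

Deleting that pod is safe because the hub is managed by a deployment, which recreates it immediately.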

If in doubt, you can also reset the cluster with helm by going into the /home/chicocvenancio/paws/ directory and running the deploy script with ./build.py deploy prod.