PAWS/Tools/Admin

This page contains historical information. It may be outdated or unreliable.
This document is deprecated and describes an old cluster.

Introduction

PAWS is a Jupyterhub deployment that runs in its own Kubernetes cluster (separate from the Toolforge Kubernetes cluster) in the tools project on Wikimedia Cloud VPS. It is accessible at https://hub-paws.wmcloud.org and is a public service that can be authenticated to via Wikimedia OAuth. More end-user information is at PAWS/Tools.

Kubernetes cluster

Deployment

The PAWS Kubernetes cluster is not puppetized at this point (see T188912). It was originally deployed using custom scripts hosted at https://github.com/data-8/kubeadm-bootstrap that use kubeadm, a tool that helps bootstrap a Kubernetes cluster on bare-metal machines.

We have the kubeadm-bootstrap repo cloned at /home/yuvipanda on tools-paws-master-01.tools (and, since this is on NFS, it is accessible everywhere else on tools). The cluster-specific config/secrets are at ~/kubeadm-bootstrap/data.

The README at https://github.com/data-8/kubeadm-bootstrap/blob/master/README.md has good info on how to set up the cluster. Our cluster has small variations to accommodate installing on Debian (not Ubuntu Xenial, which the current kubeadm-bootstrap code is set up for).
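The Debian accommodation mostly comes down to pointing apt at Docker's repository for the running distro instead of the hardcoded Ubuntu Xenial line. A minimal sketch of deriving that apt source line from /etc/os-release (this follows Docker's standard repository layout; it is not the literal code from the repo):

```shell
# Derive the Docker apt source line for the running distro instead of
# hardcoding "ubuntu ... xenial" the way the upstream script does.
distro=$(. /etc/os-release && echo "$ID")                  # e.g. "debian"
codename=$(. /etc/os-release && echo "$VERSION_CODENAME")  # e.g. "stretch"
apt_line="deb [arch=amd64] https://download.docker.com/linux/${distro} ${codename} stable"
echo "$apt_line"
```

The diffs below show the actual hand-edits that were made to the bootstrap scripts instead.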

# install-kubeadm.bash looks slightly different

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff install-kubeadm.bash
diff --git a/install-kubeadm.bash b/install-kubeadm.bash
index c6ce3bb..3aae948 100755
--- a/install-kubeadm.bash
+++ b/install-kubeadm.bash
@@ -1,24 +1,29 @@
 #!/bin/bash
-apt-get update
+apt-get update
 apt-get install -y apt-transport-https
 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
 cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
 deb http://apt.kubernetes.io/ kubernetes-xenial main
+deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable
 EOF
+
 apt-get update

 # Install docker if you don't have it already.

-apt-get install -y docker-engine
+systemctl stop docker
+apt-get purge -y docker-engine
+rm -rf /var/lib/docker/*
+apt-get install -y docker-ce

-# Make sure you're using the overlay driver!
-# Note that this gives us docker 1.11, which does *not* support overlay2

 systemctl stop docker
 modprobe overlay
-echo '{"storage-driver": "overlay"}' > /etc/docker/daemon.json
 rm -rf /var/lib/docker/*
+echo '{"storage-driver": "overlay2"}' > /etc/docker/daemon.json
 systemctl start docker

 # Install kubernetes components!
 apt-get install -y kubelet kubeadm kubernetes-cni
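The substantive change in the diff above is the storage driver: the stock script pinned an old docker-engine whose overlay driver did not support overlay2, while our Debian install uses docker-ce with overlay2. The daemon.json override it writes, shown standalone (written to a temp file here so it can be run without touching a real Docker install):

```shell
# The storage-driver override the patched script writes to
# /etc/docker/daemon.json; a temp file stands in for the real path here.
conf=$(mktemp)
echo '{"storage-driver": "overlay2"}' > "$conf"
cat "$conf"
```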
# init-worker.bash looks slightly different too:

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ git diff init-worker.bash
diff --git a/init-worker.bash b/init-worker.bash
index 579e9fa..36557ee 100755
--- a/init-worker.bash
+++ b/init-worker.bash
@@ -5,4 +5,4 @@ set -e
 source data/config.bash
 source data/secrets.bash

-kubeadm join --token "${KUBEADM_TOKEN}"  "${KUBE_MASTER_IP}":6443
+kubeadm join --skip-preflight-checks --token "${KUBEADM_TOKEN}"  "${KUBE_MASTER_IP}":6443

We also have a change-docker.bash script:

yuvipanda@tools-paws-master-01:~/kubeadm-bootstrap$ cat change-docker.bash
#!/bin/bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
       "deb [arch=amd64] https://download.docker.com/linux/debian \
          $(lsb_release -cs) \
             stable"
sudo apt-get update

All of the above changes should be tracked persistently in git somewhere; they are logged here while that's not the case.

Architecture

[Coming Soon]

Current Setup

  • Currently, the k8s keys are in yuvipanda's and chicocvenancio's $HOME/.kube/config files, so admins may impersonate either of these users to run kubectl against this k8s cluster, or copy the file into their own home directory.
  • The k8s master is at tools-paws-master-01.tools.eqiad1.wikimedia.cloud
  • For k8s nodes summary, see kubectl get node
    yuvipanda@tools-paws-master-01:~$ kubectl get node
    NAME                     STATUS                     AGE       VERSION
    tools-paws-master-01     Ready                      113d      v1.7.3
    tools-paws-worker-1001   Ready                      35d       v1.8.0
    tools-paws-worker-1002   Ready                      98d       v1.7.3
    tools-paws-worker-1003   Ready                      98d       v1.7.3
    tools-paws-worker-1005   Ready                      98d       v1.7.3
    tools-paws-worker-1006   Ready                      98d       v1.7.3
    tools-paws-worker-1007   Ready                      98d       v1.7.3
    tools-paws-worker-1010   Ready                      98d       v1.7.3
    tools-paws-worker-1013   Ready                      98d       v1.7.3
    tools-paws-worker-1016   Ready                      98d       v1.7.3
    tools-paws-worker-1017   Ready,SchedulingDisabled   98d       v1.7.3
    tools-paws-worker-1019   Ready                      35d       v1.8.0
    
  • Helm is used to deploy kubernetes applications on the cluster. It is installed during the cluster bootstrap process and is in turn used to install the nginx-ingress and kube-lego add-ons. Helm has two parts: a client (helm) and a server (tiller).
  • To see the status of the k8s control plane pods (kube-dns, kube-proxy, flannel, etcd, kube-apiserver, kube-controller-manager, tiller), run kubectl --namespace=kube-system get pod -o wide.
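As an example of reading the node listing above, a small filter that flags nodes whose status is anything other than plain Ready (run here against sample output from this page rather than a live cluster):

```shell
# Flag nodes that are not simply "Ready" (e.g. cordoned nodes showing
# Ready,SchedulingDisabled); sample output stands in for `kubectl get node`.
sample='tools-paws-master-01     Ready                      113d      v1.7.3
tools-paws-worker-1017   Ready,SchedulingDisabled   98d       v1.7.3
tools-paws-worker-1019   Ready                      35d       v1.8.0'
echo "$sample" | awk '$2 != "Ready" {print $1, $2}'
```

Against the real cluster you would pipe kubectl get node --no-headers into the same awk filter.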

Jupyterhub deployment

Jupyterhub & PAWS Components

Jupyterhub is a set of systems deployed together to provide per-user Jupyter notebook servers. The three main subsystems are the Hub, the Proxy, and the Single-User Notebook Server. A good overview of these systems is available at http://jupyterhub.readthedocs.io/en/latest/reference/technical-overview.html.

PAWS is a Jupyterhub deployment (Hub, Proxy, Single-User Notebook Server) with some added bells and whistles. The additional PAWS-specific parts of our deployment are:

  • db-proxy: a mysql-proxy plugin script that performs simple authentication to the Wiki Replicas. See https://github.com/yuvipanda/paws/blob/master/images/db-proxy/auth.lua
  • nbserve and render: nbserve is the nginx proxy run from Toolforge in the paws-public tool that handles URL rewriting for paws-public URLs; render handles the actual rendering of an ipynb notebook as a static page. Together they make paws-public possible.

Deployment

Database

JupyterHub uses a database to keep user state; currently it uses ToolsDB.

Moving to sqlite

During ToolsDB outages, we can switch the database to in-memory SQLite without significant impact.

From tools-paws-master-01 (grab k8s credentials from /home/chicocvenancio/.kube if necessary):

kubectl --namespace prod edit configmap hub-config

Set hub.db_url to "sqlite://".

Restart the hub with kubectl --namespace prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ')

To move it back, set hub.db_url to the previous value (see /home/chicocvenancio/paws/paws/secrets.yaml at jupyterhub.hub.db.url) and restart the hub with kubectl --namespace prod delete pod $(kubectl get pods --namespace prod|grep hub|cut -f 1 -d ' ').
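The pod name in that restart one-liner comes from a grep/cut pipeline over kubectl get pods. How the extraction behaves, demonstrated on sample output (the pod names are invented for illustration):

```shell
# Extract the hub pod name the way the restart one-liner does, using
# sample `kubectl get pods` output instead of a live cluster.
sample='hub-6f9c7d8b4-abcde   1/1   Running   0   2d
proxy-59f6cc7b8-xyz12   1/1   Running   0   2d'
hub_pod=$(echo "$sample" | grep hub | cut -f 1 -d ' ')
echo "$hub_pod"   # hub-6f9c7d8b4-abcde
```

Deleting that pod is safe because the hub is managed by a deployment, which recreates it immediately.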

If in doubt, you can also reset the cluster with helm by going into the /home/chicocvenancio/paws/ directory and running the deploy script with ./build.py deploy prod.