From Wikitech

System Overview

Our Jupyter installation works as follows.

anaconda-wmf is a custom Debian package of Anaconda that includes additional packages useful for analytics at WMF. It is installed on all analytics clients (also known as stat boxes) and worker nodes, under /usr/lib/anaconda-wmf. See the Analytics/Systems/Anaconda documentation for how this works.

anaconda-wmf includes Python and the other packages needed to run JupyterHub. JupyterHub itself is configured and set up by Puppet. Users open an SSH tunnel to an analytics client node and access JupyterHub over HTTP. JupyterHub authenticates users via LDAP and restricts access to members of a few POSIX groups. It is also configured to work with anaconda-wmf and 'stacked' Conda environments via a custom CondaEnvProfilesSpawner, which can create and activate new user Conda environments. After authentication, the user is presented with a list of Conda environments to choose from. Their Jupyter Notebook Server process is then launched by a SystemdSpawner, which runs the jupyterhub-singleuser command out of the user's Conda environment, e.g. /home/otto/.conda/envs/2020-12-13T19.40.09_otto/bin/jupyterhub-singleuser.
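
To illustrate the final step, the sketch below builds the path to jupyterhub-singleuser inside a user's Conda environment. The environment naming scheme (creation timestamp plus username) is an assumption inferred only from the example path above, and the function names are hypothetical. This is not the actual CondaEnvProfilesSpawner code.

```python
from datetime import datetime
from pathlib import PurePosixPath


def conda_env_name(username: str, created: datetime) -> str:
    """Return an environment name such as 2020-12-13T19.40.09_otto.

    Assumption: the format is inferred from the example path on this page.
    """
    return f"{created.strftime('%Y-%m-%dT%H.%M.%S')}_{username}"


def singleuser_command(home: str, env_name: str) -> str:
    """Return the jupyterhub-singleuser executable inside the user's environment."""
    env_dir = PurePosixPath(home) / ".conda" / "envs" / env_name
    return str(env_dir / "bin" / "jupyterhub-singleuser")


env = conda_env_name("otto", datetime(2020, 12, 13, 19, 40, 9))
print(singleuser_command("/home/otto", env))
```

The point of the sketch is that the notebook server binary is resolved from the user's own environment, not from /usr/lib/anaconda-wmf.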

Note that JupyterHub itself runs out of /usr/lib/anaconda-wmf, while the user's Jupyter Notebook Server (to which JupyterHub proxies) runs out of the user's selected Conda environment. This separation means that a user's Conda environment is ephemeral: it can be discarded at will by the user or, if really needed, by an administrator. It also means we can upgrade anaconda-wmf, while users remain free to install whatever packages they need into their own Conda environments.
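
Because a user environment is disposable, discarding one amounts to removing it with conda. Here is a minimal sketch, assuming user environments live directly under ~/.conda/envs as in the example path above. The helper names are hypothetical, and the guard against anything outside that directory (such as /usr/lib/anaconda-wmf) is only illustrative.

```python
from pathlib import PurePosixPath
from typing import List


def is_user_env(env_path: str, home: str) -> bool:
    """Return True if env_path is a direct child of <home>/.conda/envs."""
    return PurePosixPath(env_path).parent == PurePosixPath(home) / ".conda" / "envs"


def removal_command(env_path: str, home: str) -> List[str]:
    """Build a `conda env remove` command for one user environment.

    Refuses any path outside the user's own envs directory, so the shared
    base installation is never a candidate for removal.
    """
    if not is_user_env(env_path, home):
        raise ValueError(f"refusing to remove non-user environment: {env_path}")
    return ["conda", "env", "remove", "--prefix", env_path]


print(removal_command("/home/otto/.conda/envs/2020-12-13T19.40.09_otto", "/home/otto"))
```

The function only builds the command. Whether it is run by the user or by an administrator is a matter of policy, not of this sketch.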