Data Platform/Systems/Jupyter/Administration
System Overview
conda-analytics
conda-analytics is a custom Debian package of Miniconda that includes additional packages useful for analytics at WMF. It is installed on all analytics clients (AKA stat boxes) and worker nodes, under /opt/conda-analytics.
conda-analytics includes Python and the other packages we need to run JupyterHub. The configuration and setup of JupyterHub is done by Puppet. Users open an SSH tunnel to an analytics client node and access JupyterHub over HTTP through it. JupyterHub authenticates users via LDAP and restricts access to members of a few POSIX groups. It also works with Conda environments cloned from conda-analytics, via a custom CondaEnvProfilesSpawner that can create and activate new user Conda environments. After authentication, the user is presented with a list of Conda environments to choose from. Their Jupyter Notebook Server process is then launched by a SystemdSpawner, which runs the jupyterhub-singleuser command out of the selected Conda environment, e.g. /home/otto/.conda/envs/2020-12-13T19.40.09_otto/bin/jupyterhub-singleuser.
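The path layout in the example above can be sketched in Python. This is only an illustration, not the CondaEnvProfilesSpawner code: the function names new_env_name and singleuser_command are hypothetical, and the timestamp format and the ~/.conda/envs location are inferred from the single example path given here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def new_env_name(username: str, now: datetime) -> str:
    # Assumption: the name is "<timestamp>_<username>", with the timestamp
    # layout copied from the example 2020-12-13T19.40.09_otto.
    return f"{now.strftime('%Y-%m-%dT%H.%M.%S')}_{username}"


def singleuser_command(home: PurePosixPath, env_name: str) -> PurePosixPath:
    # Assumption: user environments live under ~/.conda/envs, as in the example.
    return home / ".conda" / "envs" / env_name / "bin" / "jupyterhub-singleuser"


name = new_env_name("otto", datetime(2020, 12, 13, 19, 40, 9))
print(singleuser_command(PurePosixPath("/home/otto"), name))
# prints /home/otto/.conda/envs/2020-12-13T19.40.09_otto/bin/jupyterhub-singleuser
```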
Note that JupyterHub itself runs out of /opt/conda-analytics, while the user's Jupyter Notebook Server (to which JupyterHub proxies) runs out of the user's selected Conda environment. This means that a user's Conda environment is ephemeral: it can be discarded at will by the user or, if really needed, by an administrator. We can upgrade conda-analytics, and users can install whatever packages they need into their own Conda environments.
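When deciding which user environments are still usable or safe to discard, it can help to see which ones could actually host a Notebook Server. The sketch below is a hypothetical helper, not part of JupyterHub or conda-analytics; it assumes environments live under ~/.conda/envs and treats an environment as launchable if it contains bin/jupyterhub-singleuser.

```python
from pathlib import Path


def launchable_envs(home: Path) -> list[str]:
    # Assumption: per-user Conda environments are directories under ~/.conda/envs.
    envs_dir = home / ".conda" / "envs"
    if not envs_dir.is_dir():
        return []
    # Keep only environments that ship the jupyterhub-singleuser entry point.
    return sorted(
        env.name
        for env in envs_dir.iterdir()
        if (env / "bin" / "jupyterhub-singleuser").is_file()
    )


# List the environments of the user running this script.
print(launchable_envs(Path.home()))
```

The helper only reads the filesystem; it does not remove or modify any environment.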