PAWS/Introduction

From Wikitech

This is a short primer on Jupyter notebooks, the motivation for using them, and the need for a notebooks-as-a-service infrastructure. See also PAWS/Tools and PAWS/Internal.

What are Jupyter notebooks?

Jupyter Notebook is a web application that enables interactive computing by allowing users to create and share documents that contain live code, visualizations such as graphs, and rich text. Notebooks are a powerful tool for data analysis and scientific research. They also change the way programmers write code, by providing an exploratory environment with a quick feedback loop and a low barrier to entry through an easy-to-use graphical interface.
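As a rough illustration of what "a document that contains live code and rich text" means in practice, a notebook is stored as a JSON file made up of a list of cells. The sketch below assumes version 4 of the notebook format and uses only the Python standard library. The cell contents and the helper name `cell_types` are made up for this example.

```python
import json

# A minimal notebook, assuming version 4 of the notebook format.
# The cell contents are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Rich text is stored as a Markdown cell.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Squares\n", "A short note explaining the code below."],
        },
        {
            # Live code is stored as a code cell; outputs are filled in
            # by the notebook server when the cell is run.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print([n * n for n in range(5)])"],
        },
    ],
}


def cell_types(nb):
    """Return the type of each cell, in document order (hypothetical helper)."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Serialize to the JSON text that would be written to an .ipynb file.
serialized = json.dumps(notebook, indent=1)
```

Writing `serialized` to a file with an `.ipynb` extension should give a document that a notebook server can open, with the text and the code shown as separate cells.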

Motivation for notebooks

Usability

(Some of these notes are drawn from User:Sumanah's talk on Inessential Weirdness in Free Software)

No command line tax

Programmers are normally expected to learn the command line alongside any language or tool they set out to learn. This is merely incidental complexity. The following articles discuss why the command line is hard to use because of the cognitive load it places on new users, and how it is an unfriendly but necessary tax to be paid before doing any interesting work or research. [1][2][3]

No environment installation

For most open source projects, it takes a long time to go from a set of instructions to a working development environment, and error messages during setup are rarely user friendly.

Reproducibility

Reproducibility is the ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently. In the context of software development and software-enabled research, reproducibility involves replicating the code, the data and the environment, which is incredibly hard. Notebooks pave the way for this by letting authors publish their work along with the data and environment, so that others can start exactly where the author left off.

Mental Aid and Pedagogy

Notebooks become mental aids for learning - they minimize the user’s memory load by making actions and options visible, they help visualize and understand every step of the way as a problem is being solved, and they engage the learner by being an interactive tool.

It follows that they are also effective teaching tools. See http://norvig.com/ipython/README.html for a list of notebooks by Peter Norvig that explain complicated CS algorithms and more.

Efficiency

Why notebooks as a service?

Jupyter notebooks are already available for everyone to use: anyone can set them up on a local machine and publish notebooks to the web. However, if you find an awesome notebook on the internet that you'd like to interact with (which is the point of notebooks), isn't it a bit ironic that you first have to download it, install Jupyter, and run the notebook server before you can start using it? Notebooks attempt to abstract away the pain of setting up a development environment before writing code in any language, but you still have to set up the notebook server! Notebooks as a service goes one step further by providing a graphical web interface to launch notebooks, save your work, publish it, and let others fork it.

Why should we work on this?

  • Plenty of open data sources that can be exposed through the service
  • Lack of free and open source services for running notebooks on the web, and our unique position to build an open source service that will be stably maintained
  • An opportunity to contribute to the Jupyter project upstream and help build features for running notebooks as a service
  • A chance to help create a tool that makes data analysis and research on Wikimedia data easy and encourages collaborative work