PAWS/About Jupyter notebooks hosted on PAWS

From Wikitech


Overview

This page explains what Jupyter notebooks are and what role they play in the PAWS service. It will help you decide whether Jupyter notebooks hosted on PAWS are the right tool for your project.

It also provides links to pages with example notebooks and offers information about alternatives to PAWS which may be more appropriate for certain projects.

Jupyter notebooks

Jupyter Notebook is an open source web application that allows you to create and share documents, called notebooks, that contain live code, equations, visualizations, and text. Notebooks are flexible and have many uses. They can function as a lightweight browser-based development environment in which you execute code and see its output (text, equations, images, and more) on the same page.

Uses include:

  • Writing and running live code
  • Creating documentation and tutorials
  • Data cleaning, transformation, and analysis
  • Writing and running SQL queries
  • Writing and running scripts and bots to perform tasks on wikis
  • Much more...

Little or no programming skill is required for some typical uses of Jupyter notebooks. They are used by a robust community of users in technology and the sciences. There are many examples and resources for new and advanced users to draw on, which makes notebooks a welcoming, easy-to-use, and powerful tool for users along the technical spectrum.

PAWS (A Web Shell)

PAWS: A Web Shell (PAWS) is a service that hosts Jupyter notebooks for use by Wikimedia contributors. PAWS users can launch, publish, and fork notebooks without having to install Jupyter on a local computer. Users only need a Wikimedia account (SUL) and an internet-connected web browser to use the service.

PAWS makes it easier for volunteers along the technical spectrum to work in technical spaces and contribute to Wikimedia's technical projects.

Some ways you can use PAWS for Wikimedia technical projects include:

  • Creating documentation and tutorials
  • Running queries against the wiki replicas
  • Writing and running scripts and bots to help support Wikimedia projects (note that heavy-duty or scheduled jobs should run on Toolforge instead)
  • Keeping notes on your work
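
As an illustration of the scripting use case in the list above, here is a minimal Python sketch of preparing a read-only search request to the MediaWiki Action API and reading its JSON reply. The helper names (build_query_url, page_titles), the example endpoint, and the sample reply are assumptions made for illustration and are not part of PAWS. No network request is made.

```python
import json
from urllib.parse import urlencode

# Example endpoint (an assumption); Wikimedia wikis expose the Action API at /w/api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(search_term, limit=5, endpoint=API_ENDPOINT):
    """Return the URL for a full-text search request (list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search_term,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def page_titles(response_text):
    """Extract page titles from the JSON text of a list=search reply."""
    data = json.loads(response_text)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


# Hand-written sample reply with the same shape as a real one (illustrative data).
sample_reply = json.dumps(
    {"query": {"search": [{"title": "Project Jupyter"}, {"title": "IPython"}]}}
)

url = build_query_url("Jupyter notebook")
titles = page_titles(sample_reply)
print(url)
print(titles)
```

In a notebook, the URL could then be fetched with an HTTP library of your choice, or the whole task could be handed to a bot framework such as Pywikibot.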

Resources

PAWS is intended for smaller bots that do not need a great deal of resources. With PAWS, you can expect to get access to:

  • 1 CPU
  • 2G RAM
  • 5G Storage

Usage beyond these limits is likely to be throttled or, in the case of storage, cleaned up periodically.
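
To stay within the storage figure above, it can help to check how much space a directory uses. The following is a minimal sketch, assuming that 5G means 5 * 1024**3 bytes; the helper names (directory_size, usage_report) are hypothetical, and the demonstration runs on a temporary directory rather than on a real PAWS home directory.

```python
import os
import tempfile

# 5G is the storage figure quoted above; reading it as 5 * 1024**3 bytes is an assumption.
STORAGE_LIMIT_BYTES = 5 * 1024**3


def directory_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def usage_report(path, limit=STORAGE_LIMIT_BYTES):
    """Return (bytes used, fraction of the limit) for path."""
    used = directory_size(path)
    return used, used / limit


# Demonstration on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "data.bin"), "wb") as handle:
        handle.write(b"\0" * 2048)
    demo_used, demo_fraction = usage_report(demo_dir)

print(f"{demo_used} bytes used, {demo_fraction:.6%} of the limit")
```

In a notebook, usage_report(os.path.expanduser("~")) would report on your home directory, assuming that is where your notebooks and data are stored.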

Example Uses

Why use PAWS?

  • You are working on a wiki and want to perform automated tasks and maintenance.
  • You are a researcher or data scientist, and you are working with a smaller dataset related to wikis.
  • You have a task or process you want to document for others to use.

What set-up and skills do you need?

Required software

No special software or development environment is needed. You will need access to an internet-connected web browser and a Wikimedia account.

Programming languages

Some knowledge of programming, especially of how to use a terminal, Python, and Markdown, can be very useful but is not necessary. For many tasks, you'll be able to use recipes and examples to accomplish your goals.

The majority of examples in this documentation will use Python 3 notebooks or the PAWS terminal. You can also create Bash and R notebooks with PAWS. Note: if you are a JavaScript developer and wish to automate some tasks on wikis, consider exploring Gadgets.

Should you use Jupyter notebooks for your project?

Yes!

Jupyter notebooks may be a good fit if:

  • You are using them for exploration, not production.
  • You want to run scripts/bots to perform automated tasks on wikis.
  • You are working with smaller wikis or smaller datasets.
  • You want to create tutorials or documentation.

No!

Jupyter notebooks may not be a good fit if:

  • You want to use code versioning, run tests, or otherwise follow a regular development cycle.
  • You want to run a long, asynchronous task.
  • You are working with a large dataset.
  • You care about the quality of the code. Notebooks do not integrate with IDEs and have no linters or code style correction.
  • You are working on a project unrelated to the wikis.

Other options

While notebooks are a great option for many projects, you may find another tool or service will fit your project's needs better:

  • Data Services: a set of services that provide direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores. The services currently offered are Wiki Replicas, ToolsDB, Wikimedia Dumps, Shared Storage, Quarry, and PAWS.
  • Toolforge: a hosting environment that makes it easy to perform analytics, administer bots, run web services, and create tools. These tools help project editors, technical contributors, and other volunteers who work on Wikimedia projects.

Use responsibly

  • All notebooks hosted on PAWS are available to the public. Don't share private information (passwords, private SSH keys, personal information).
  • PAWS is a service of Wikimedia. Notebooks should relate to and support Wikimedia technical projects.
  • Content hosted on PAWS should follow the Wikimedia Code of Conduct for technical spaces.
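
Because notebooks on PAWS are public, it can be worth checking a notebook for accidental secrets before relying on it. The following is a minimal sketch that scans the cell sources of a notebook file (JSON in the .ipynb format) for lines that look like hard-coded credentials. The function name and the patterns are assumptions for illustration; the patterns are far from exhaustive, and cell outputs are not checked.

```python
import json
import re

# Illustrative patterns only; they will miss many secrets and may flag harmless lines.
SUSPICIOUS_PATTERNS = [
    re.compile(r"(password|passwd|secret|token|api_key)\s*=\s*['\"]", re.IGNORECASE),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_suspicious_lines(notebook_text):
    """Return (cell index, line) pairs in a notebook's JSON that look like secrets."""
    notebook = json.loads(notebook_text)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        for line in source.splitlines():
            if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
                findings.append((index, line.strip()))
    return findings


# Hand-written sample notebook (illustrative data).
sample_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo\n"]},
        {"cell_type": "code", "source": ["user = 'ExampleBot'\n", "password = 'hunter2'\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

findings = find_suspicious_lines(sample_notebook)
print(findings)
```

A check like this is only a safety net. The more reliable habit is to never type a secret into a notebook cell in the first place.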