User:SRodlund/myfirstpywikibot (staging)

From Wikitech

Overview

This page captures ongoing improvements to Help:Toolforge/My first Pywikibot tool.

A great deal of collaboration has taken place on this document over the last few years. You can read the history on the Phabricator task: https://phabricator.wikimedia.org/T134495#6265014

This document is closely linked to Help:Toolforge/Pywikibot, and both pages will be updated in this draft.

Initial notes for drafting

  • My First Pywikibot Tool is intended for users who are not familiar with Toolforge.
  • The tutorial should include some direction for users about how to get started with Toolforge and some pointers to useful pages on Toolforge.
  • While this document will not cover PAWS, it should point to PAWS and local installation as alternatives to Toolforge.
  • This page should also link to mw:Manual:Pywikibot, but make it very clear that these are two separate processes.
  • Be aware of this doc: https://doc.wikimedia.org/pywikibot/master/index.html
  • Include a link to the wayfinding doc (scheduled to be written) to help folks decide which service is appropriate for their needs.
  • The ticket mentions drafting the tutorial in a style similar to Heroku's docs. Do an initial review of this.
  • RE: screenshots. There are some requests on the Phabricator task for screenshots. I'm reluctant to add screenshots, as they are likely to go out of date and may not be updated later. Code samples will be included instead.

Page staging

Overview

Pywikibot is a Python library and collection of scripts that automate work on MediaWiki sites. Originally designed for Wikipedia, it is now used throughout the Wikimedia Foundation's projects and on many other wikis.

Short explainer of what Pywikibot is/does, with real examples

Installing and running Pywikibot

Pywikibot can be installed and run on your own computer, on PAWS, or on Toolforge. These instructions cover installing and running Pywikibot on Toolforge. See mw:Manual:Pywikibot/Installation for information on other methods.

Pointer to service explainer doc

Getting started with Toolforge

Follow these steps to get started with Toolforge. Once these steps are complete, you can create your Pywikibot tool account.

Create your Pywikibot Tool account

In addition to your own Toolforge account, you also need to create a tool account for your Pywikibot tool.

  1. Navigate to the Create Tool Account page in the Toolforge admin console.
  2. Enter a Unique Tool Name for your tool account.
    Do not prefix your tool name with "tools.", as this prefix will cause errors.

Note: If you only recently received access to the tools project, you may get an error about appropriate credentials. Log out and back in to fix the issue.

Within a minute or two, Toolforge creates the tool account and grants you access. If you were logged in through SSH when you created the tool account, you must log out and log in again.

Access your Pywikibot tool

After you create your Pywikibot tool account, you can log in to Toolforge with SSH.

$ ssh <unix shell username>@login.tools.wmflabs.org

From the command line, switch to your tool account:

become <toolname>

You should see the command prompt change to:

tools.<toolname>@tools-bastion:~$

Install Pywikibot

Using your tool account, do the following:

  1. Use git to download Pywikibot:
    git clone --recursive --branch stable https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot-core
    cd pywikibot-core
    
  2. Set up your bot for selected wikis by running:
    python generate_user_files.py
    

Choose a license

Pywikibot comes with an MIT LICENSE file. Make sure to choose a license for your own tool early on. See also Help:Tool_Labs/Developing#Licensing_your_source_code.

Documentation

Set up a webpage for your tool

You can add a webpage for your tool under http://tools.wmflabs.org/TOOLNAME. For example, http://tools.wmflabs.org/my-first-pywikibot-tool. Once you set up a webpage for your tool, the Tool list will link to it.

To set up a webpage for your tool:

  1. Log in with your Tool account.
  2. Create a ~/public_html directory.
  3. Create ~/public_html/index.html.
  4. Start the web service:
    $ webservice start
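
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not an official Toolforge helper: the function name `create_tool_homepage` and the placeholder HTML are assumptions made for illustration. It creates public_html/index.html under a given home directory and leaves starting the web service to the `webservice start` command.

```python
from pathlib import Path

# Placeholder page content; replace it with your tool's real landing page.
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head><title>{name}</title></head>\n"
    "<body><h1>{name}</h1></body></html>\n"
)


def create_tool_homepage(home: Path, tool_name: str) -> Path:
    """Create public_html/index.html under `home` and return its path.

    Hypothetical helper for illustration only.
    """
    public_html = home / "public_html"
    # exist_ok=True makes the call safe to repeat.
    public_html.mkdir(parents=True, exist_ok=True)
    index = public_html / "index.html"
    index.write_text(PAGE.format(name=tool_name), encoding="utf-8")
    return index
```

Run as the tool account, a call such as `create_tool_homepage(Path.home(), "my-first-pywikibot-tool")` would place the file in the tool's home directory. Note that it overwrites an existing index.html.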

Communication and support

We communicate and provide support through several primary channels. Please reach out with questions and to join the conversation.

Communicate with us

Connect | Best for
Phabricator Workboard: #Cloud-Services | Task tracking and bug reporting
IRC Channel: #wikimedia-cloud (connect) | General discussion and support
Mailing List: cloud@ | Information about ongoing initiatives, general discussion and support
Announcement emails: cloud-announce@ | Information about critical changes (all messages mirrored to cloud@)
News wiki page: News | Information about major near-term plans
Cloud Services Blog: Clouds & Unicorns | Learning more details about some of our work
Wikimedia Technology Blog: techblog.wikimedia.org | News and stories from the Wikimedia technical movement

See the Pywikibot manual for additional documentation.

The Pywikibot Framework is a collection of Python tools that automate work on MediaWiki sites. Please review mw:Manual:Pywikibot/Installation first.

The stable version of the Pywikibot 'core' branch (formerly 'rewrite') is accessible at /shared/pywikibot/stable. If you are a developer and/or would like to control when the code is updated, you may also choose to install 'core' locally in your tool directory.

Note that the shared 'core' code consists only of the source files; each bot operator will need to create their own configuration files (such as 'user-config.py') and set up a PYTHONPATH and other environment variables. Please see Using the shared Pywikibot files for more information.

Using the shared Pywikibot files (recommended setup)

For most purposes, using the centralized 'core' files is recommended. The shared files are available at /data/project/shared/pywikibot/stable, and steps for configuring your tool account are provided below. The configuration files themselves are stored in your tool account in the $HOME/.pywikibot directory, or another directory, where they can be used via the -dir option (all of this is described in more detail in the instructions).

If you are a developer and/or would like to control when the code is updated, or if you would like to use the 'compat' branch instead of 'core' (not all the Pywikibot scripts have been ported to 'core'), please see Installing Pywikibot locally for instructions.

To set up your Tools account to use the shared 'core' framework:

1. Become your tool account:

maintainer@tools-login:~$ become toolname

2. In your home directory, create (or edit, if it exists already) a '.bash_profile' file to include the following line. The path should be on one line, though it may appear to be on multiple lines depending on your screen width. When you save the .bash_profile file, your settings will be updated for all future shell sessions:

export PYTHONPATH=/data/project/shared/pywikibot/stable:/data/project/shared/pywikibot/stable/scripts

3. Import the path settings into your current session:

tools.tool@tools-login$ source .bash_profile
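
To confirm that the setting took effect, you can inspect PYTHONPATH from Python. This is a minimal sketch that assumes the two shared directories from step 2; the function name `missing_pythonpath_entries` is hypothetical and not part of Pywikibot.

```python
import os

# The two directories exported in .bash_profile in step 2.
REQUIRED = [
    "/data/project/shared/pywikibot/stable",
    "/data/project/shared/pywikibot/stable/scripts",
]


def missing_pythonpath_entries(pythonpath: str, required=REQUIRED) -> list:
    """Return the required directories that are absent from `pythonpath`."""
    # PYTHONPATH is a list of directories joined by os.pathsep (":" on Linux).
    entries = [e.rstrip("/") for e in pythonpath.split(os.pathsep) if e]
    return [d for d in required if d not in entries]


if __name__ == "__main__":
    missing = missing_pythonpath_entries(os.environ.get("PYTHONPATH", ""))
    if missing:
        print("Missing from PYTHONPATH: {}".format(missing))
    else:
        print("PYTHONPATH is set up")
```

If the script reports missing entries right after you edited .bash_profile, the most likely cause is that the file has not been sourced in the current session yet.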

4. In your home directory, create a subdirectory named '.pywikibot' (the '.' is important!) for bot-related files:

tools.tool@tools-login$ mkdir $HOME/.pywikibot

[Image: example of configuration for commons.wikimedia.org]

5. Configure Pywikibot.

To create configuration files, use the following command and then follow the instructions. You may also use an existing configuration file (e.g., 'user-config.py') that works on another system by copying it into your .pywikibot directory:

tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/generate_user_files.py
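
For orientation, the essential settings in a user-config.py are three lines: the wiki family, the language code, and the bot's username. The sketch below writes such a file by hand for commons.wikimedia.org instead of running generate_user_files.py. The helper name `write_user_config` and the account name `ExampleBot` are placeholders, and restricting the file to owner-only access is a cautious choice for this example, not a documented requirement.

```python
from pathlib import Path

# `usernames` is not defined in user-config.py itself; Pywikibot provides it
# when it loads the file.
TEMPLATE = (
    "family = '{family}'\n"
    "mylang = '{code}'\n"
    "usernames['{family}']['{code}'] = '{username}'\n"
)


def write_user_config(config_dir: Path, family: str, code: str, username: str) -> Path:
    """Write a minimal user-config.py into `config_dir` and return its path.

    Hypothetical helper for illustration only.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "user-config.py"
    path.write_text(
        TEMPLATE.format(family=family, code=code, username=username),
        encoding="utf-8",
    )
    # Owner read/write only (assumption: a conservative default).
    path.chmod(0o600)
    return path
```

For the setup on this page, the call would look like `write_user_config(Path.home() / ".pywikibot", "commons", "commons", "ExampleBot")`. The file written by generate_user_files.py contains many more commented options, so prefer the generator unless you know which settings you need.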

6. Test out your setup. In general, all jobs should be run on the grid, but it's fine to test your setup on the command line. You should see the following terminal output (or something similar):

tools.tool@tools-login$ python3 /data/project/shared/pywikibot/stable/scripts/version.py
Pywikibot: [https] r-pywikibot-core.git (df69134, g1, 2020/03/30, 11:17:54, OUTDATED)
Release version: 3.1.dev0
requests version: 2.12.4
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]

Note that you do not run scripts using pwb.py, but run scripts directly, e.g., python3 /data/project/shared/pywikibot/stable/scripts/version.py. Setting PYTHONPATH means that you no longer need the pwb.py helper script to make, say, import pywikibot work.

If you need to use multiple user-config.py files, you can do so by adding -dir:<path where you want your user-config.py> to every python command. To use the local directory, use -dir:. (colon dot).

For more information about Pywikibot, please see the Pywikibot documentation. The pywikibot mailing list (pywikibot at lists.wikimedia.org) and IRC (irc://irc.freenode.net/pywikibot) channel are good places to go for additional help. Other useful information about using the centralized 'core' files is available here: User:Russell Blau/Using pywikibot on Labs

Set up Pywikibot on Toolforge (locally)

Installing Pywikibot locally in your tool directory lets you upgrade whenever it suits you, instead of always running the latest shared version.

Clone pywikibot git repo

Clone the 'core' git repository:

$ git clone --recursive --branch stable "https://gerrit.wikimedia.org/r/pywikibot/core" $HOME/pywikibot-core

Set up a Python virtual environment for library dependencies

When using a local pywikibot install, we recommend that you use a Python virtual environment (venv) to manage Python library dependencies. The Toolforge environment does provide system packages for many Python libraries, but these are installed using Debian packages which means that they are often older versions and not likely to be upgraded often.

Create a venv. You can give this venv any name you would like. We will use 'pwb' in this example.

$ python3 -m venv $HOME/pwb

Once you have created the venv, you can "activate" it to set up your shell's $PATH so that the python3 and pip3 binaries in the virtual environment are used by default.

$ source $HOME/pwb/bin/activate
(pwb) $
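
If you are unsure whether the venv is active, Python itself can tell you: inside a venv created with `python3 -m venv`, `sys.prefix` points at the venv directory while `sys.base_prefix` still points at the system installation. A minimal sketch (the function name `in_virtualenv` is an illustrative choice):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Outside a venv both attributes name the same installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

With the 'pwb' venv from this example active, the interpreter path printed should sit under $HOME/pwb/bin.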

Now that the venv is created and active for your current shell session, you can install the Pywikibot code from the git clone made earlier into it. The install runs in development mode, which links the venv to the clone instead of copying files, so changes in the pywikibot-core directory take effect without reinstalling.

(pwb) $ pip3 install --upgrade pip
...
Successfully installed pip-20.0.2
(pwb) $ pip3 install --upgrade setuptools
...
Successfully installed setuptools-46.1.3
(pwb) $ cd $HOME/pywikibot-core
(pwb) $ python3 setup.py develop
...
Finished processing dependencies for pywikibot==3.0.20200326.dev0

Using the virtual environment without activating it

To use the code from outside the virtual environment (for example, to submit jobs to the grid engine), use the full path to the python3 binary inside your venv directory and the full path to the script you want to run:

$ $HOME/pwb/bin/python3 $HOME/path/to/script.py

Set up job submission

After installing, you can run your bot directly from the shell, but this is strongly discouraged. Use the grid to run jobs instead.

Before you set up job submission, read Help:Toolforge/Grid.

To run a bot on the grid from inside the Pywikibot directory (optional, but sometimes convenient), write a small wrapper script. The following example script (versiontest.sh) runs version.py:

$ cat versiontest.sh
#!/bin/bash
cd /data/project/shared/pywikibot/stable
python3 scripts/version.py

To submit a job, set the permissions for the script and then use the 'jsub' command to send the job to the grid:

$ chmod 0755 versiontest.sh
$ jsub versiontest.sh
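
The wrapper can also be generated from Python, which is convenient when you maintain several jobs. This is a sketch under assumptions: `write_wrapper` is a hypothetical helper that mirrors the shape of the versiontest.sh example above, not official Toolforge tooling.

```python
import shlex
from pathlib import Path


def write_wrapper(path: Path, workdir: str, script: str) -> Path:
    """Write a bash wrapper that changes to `workdir` and runs `script`."""
    path.write_text(
        "#!/bin/bash\n"
        # shlex.quote protects paths that contain spaces or shell characters.
        "cd {}\n".format(shlex.quote(workdir))
        + "python3 {}\n".format(shlex.quote(script)),
        encoding="utf-8",
    )
    # Same effect as `chmod 0755`: rwx for the owner, r-x for everyone else.
    path.chmod(0o755)
    return path
```

For example, `write_wrapper(Path.home() / "versiontest.sh", "/data/project/shared/pywikibot/stable", "scripts/version.py")` writes an executable script that can then be submitted with `jsub versiontest.sh`.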

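Job output will be written to output and error files in your home directory called YOURJOBNAME.out and YOURJOBNAME.err, respectively (versiontest.out and versiontest.err in this example). The sample below comes from an older 'compat' installation running on Python 2, so the version details in your output will differ: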

$ cat ~/versiontest.out
pywikibot [https] r/pywikibot/compat (r10211, 8fe6bdc, 2013/08/18, 14:00:57, ok)
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok

Example

An infinitely running job such as an irc-bot can be started like this:

$ jsub -once -continuous -l h_vmem=256M -N script_wui python3 $HOME/pywikibot-core/pwb.py script_wui.py -log

or shorter

$ jstart -l h_vmem=256M -N script_wui python3 $HOME/pywikibot-core/pwb.py script_wui.py -log

If you experience problems with your jobs, such as:

Fatal Python error: Couldn't create autoTLSkey mapping

you can try increasing the memory value:

$ jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot-core/pwb.py script_wui.py -log

To run the job on a schedule, follow "scheduling jobs at regular intervals with cron" and edit your crontab:

$ crontab -e

and enter:

PATH=/usr/local/bin:/usr/bin:/bin

# Run script_wui.py at 00:17 UTC each day
17 0 * * * jstart -l h_vmem=512M -N script_wui python3 $HOME/pywikibot-core/pwb.py script_wui.py -log
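
The five leading fields of a crontab entry are minute, hour, day of month, month, and day of week, so `17 0 * * *` means 00:17 every day. The following small sketch composes such a line; `daily_cron_line` is a hypothetical helper, not part of Toolforge or Pywikibot.

```python
def daily_cron_line(hour: int, minute: int, command: str) -> str:
    """Return a crontab entry that runs `command` once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minute hour day-of-month month day-of-week command
    return "{} {} * * * {}".format(minute, hour, command)


if __name__ == "__main__":
    job = (
        "jstart -l h_vmem=512M -N script_wui "
        "python3 $HOME/pywikibot-core/pwb.py script_wui.py -log"
    )
    print(daily_cron_line(0, 17, job))
```

Note that the minute comes first in the crontab line even though the helper takes the hour first; swapping the two is a common mistake when writing entries by hand.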

Using pip

The pip package manager is not installed for global use on the Toolforge servers, but it can be used through the use of virtual environments. The first step is to create a virtual environment, and get the latest version of pip installed in it:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install --upgrade pip

Installing specific packages from pip3 is as simple as loading the environment and then running the pip3 install command, for example:

$ source venv/bin/activate
$ pip3 install PACKAGENAME

Lastly, running a pywikibot script that depends on a pip package will also require loading the environment first, for instance:

$ source venv/bin/activate
$ python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"

The venv is not activated automatically in grid job submissions. Two common workarounds are to wrap the job in a shell script that activates the venv, or to call the binaries inside the venv directly by path:

$ jstart -N jobname venv/bin/python3 foo/bar/pwb.py SCRIPTNAME -page:"SOMEPAGE"