- 1 Tips for developing a successful tool
- 2 Licensing your source code
- 3 Heavy processing
- 4 Where to put shared Tool code and files
- 5 Pywikibot
- 6 How to use <programming-language-X> to write tools on labs
Tips for developing a successful tool
- License your source code and document that with a LICENSE or COPYING file in the tool's home directory and header comments in the source code. See Help:Tool_Labs/Developing § Licensing your source code for more help on why and how to select a license.
- Use public version control (gerrit, diffusion, GitHub, Bitbucket, ...) for your tool's source code and deploy changes to the Tool Labs servers by updating a checkout of that public version control. See Help:Tool_Labs § Setting up code review and version control for additional information.
- Keep passwords and other credentials (OAuth secrets, etc.) separate from the main application code so that they are not exposed publicly in your version control system of choice.
- Create a page in the Tool: namespace documenting the basics of what your tool does and how to start and stop it.
- Find co-maintainers for your tools who can help out at least with starting/stopping jobs when needed.
- Make many small tools that each do one specific task rather than a catch-all tool that does many different tasks.
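One way to keep credentials out of version control, as the tips above suggest, is to read them at runtime from a file that lives outside the repository. The sketch below assumes a hypothetical INI file named `secrets.ini` in the tool's home directory with an `[oauth]` section. The file name, section and key names are illustrative only and are not a Tool Labs convention.

```python
"""Load OAuth credentials from a file kept outside version control (sketch)."""
import configparser
import os


def load_oauth_secrets(path=None):
    """Return (consumer_key, consumer_secret) read from an INI file.

    The default location ($HOME/secrets.ini) and the [oauth] section layout
    are hypothetical examples; adapt them to your tool.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), "secrets.ini")
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so check its result explicitly.
    if not parser.read(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    section = parser["oauth"]
    return section["consumer_key"], section["consumer_secret"]
```

Keep the file outside your checkout (or list it in `.gitignore`) and restrict its permissions, for example with `chmod 600`.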
Licensing your source code
All code in the ‘tools’ project must be published under an OSI approved open source license. Please add a license at the beginning!
The absence of a license means that default copyright laws apply. Without a clear license you are implicitly claiming copyright without explaining which rights you are willing to grant to others who wish to use or modify your software. This means that you retain all rights to your source code and that nobody else may reproduce, distribute, or create derivative works from your work until the copyright term expires. In the United States today, that means until 70 years after your death. This is counter to the general principles of the Wikimedia movement.
The two easiest choices are GPL-2.0+ if you want to ensure that all derivative works are made available under the same license terms (this is the license used for MediaWiki itself) or MIT if you only want to ensure that your original work is mentioned in a derivative project. This is a gross simplification of course. See choosealicense.com as a light primer on choosing a license.
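A license header at the top of each source file can be as short as a copyright line plus an SPDX license identifier. The sketch below generates such a header. Using SPDX identifiers is an assumption here, since this page only asks for a LICENSE/COPYING file and header comments. `GPL-2.0-or-later` is the current SPDX spelling of what is written above as GPL-2.0+, and the holder name in the example is a placeholder.

```python
"""Generate a short license header comment for a source file (sketch)."""

# SPDX identifiers for the two licenses suggested above.
SPDX_IDS = {
    "GPL-2.0+": "GPL-2.0-or-later",
    "MIT": "MIT",
}


def license_header(license_name, year, holder, comment="#"):
    """Return header lines naming the copyright holder and the license."""
    spdx = SPDX_IDS[license_name]
    lines = [
        f"{comment} Copyright (C) {year} {holder}",
        f"{comment} SPDX-License-Identifier: {spdx}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(license_header("MIT", 2024, "Example Maintainer"), end="")
```

Pass a different `comment` prefix (for example `"//"`) for languages that do not use `#` comments.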
Heavy processing
If you will be doing heavy processing (e.g., compiles or tool test runs), please use the development environment (tools-dev.wmflabs.org) instead of the primary login host (tools-login.wmflabs.org) so as to help maintain the interactive performance of the primary login host.
The tools-dev host is functionally identical to tools-login.
Where to put shared Tool code and files
Sharing files via packages or version control
- Package shared libraries using the package manager for your implementation language: PHP Composer packages, Python PyPI packages, Ruby gems, etc. This is the recommended and most portable and future-proof method of sharing code with multiple projects.
- Shared code can be stored in git submodules, which allow users to keep a git repository within another git repository. Sharing code in this way retains the maintainability and other source control advantages of git. For more information about git submodules, please see the git documentation.
Sharing files via NFS
Programs running on the job grid and in Kubernetes webservice containers have access to shared NFS directories. This shared access can be used to share files between multiple tool accounts. This is the least portable method of sharing and may not be supported in all future Tool Labs services.
- Access to a tool's code can be delegated to other tools by adding them as service users. The list of service users for a tool can be accessed from the "Manage members" link on Special:NovaServiceGroup. (It may be appropriate to create a new 'tool' to house the shared code.)
- Shared config or other files may be placed in the /data/project/shared directory, which is readable (and potentially writeable) by all Tool Labs tools and users. This directory contains, for instance:
- a full MediaWiki checkout (core and all extensions in gerrit),
Pywikibot
A snapshot of the Pywikibot ‘core’ branch (formerly ‘rewrite’) is maintained at ‘/shared/pywikipedia/core’. The ‘compat’ (formerly ‘trunk’) branch is maintained at ‘/shared/pywikipedia/trunk’, but because of the possibility of session cookie leaks, as well as the difficulty of using compat in a centralized way, we recommend that you install ‘compat’ locally if you need to use it.
In general, we recommend using the shared ‘core’ files because the code is updated frequently. If you are a developer and/or would like to control when the code is updated, you may also choose to install 'core' locally in your tool directory.
Note that the shared 'core' code consists only of the source files; each bot operator will need to create his or her own configuration files (such as ‘user-config.py’) and set up a PYTHONPATH and other environment variables. Please see Using the shared Pywikibot files for more information.
Using the shared Pywikibot files
For most purposes, using the centralized ‘core’ files is recommended as the code is updated frequently. The shared files are available at /data/project/shared/pywikipedia/core, and steps for configuring your tool account are provided below. The configuration files themselves are stored in your tool account, either in the $HOME/.pywikibot directory or in another directory selected with the -dir option (all of this is described in more detail in the instructions).
If you are a developer and/or would like to control when the code is updated, or if you would like to use the ‘compat’ branch instead of 'core' (not all the Pywikibot scripts have been ported to ‘core’), please see Installing Pywikibot locally for instructions.
To set up your Tools account to use the shared ‘core’ framework:
1. Become your tool account:
maintainer@tools-login:~$ become toolname
2. In your home directory, create (or edit, if it exists already) a ‘.bash_profile’ file to include the following line. The path should be on one line, though it may appear to be on multiple lines depending on your screen width. When you save the .bash_profile file, your settings will be updated for all future shell sessions:
3. Import the path settings into your current session:
tools.tool@tools-login$ source .bash_profile
4. In your home directory, create a subdirectory named ‘.pywikibot’ (the ‘.’ is important!) for bot-related files:
tools.tool@tools-login$ mkdir .pywikibot
5. Configure Pywikibot. To create configuration files, use the following command and then follow the instructions. You may also use an existing configuration file (e.g., ‘user-config.py’) that works on another system by copying it into your .pywikibot directory:
tools.tool@tools-login$ python /data/project/shared/pywikipedia/core/generate_user_files.py
6. Test out your setup. In general, all jobs should be run on the grid, but it’s fine to test your setup on the command line:
tools.tool@tools-login$ python /data/project/shared/pywikipedia/core/scripts/version.py
You should see the following terminal output (or something similar):
Pywikibot [http] branches/rewrite (r11526, 2013/05/12, 18:51:23, OUTDATED)
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
unicode test: ok
Note that you do not run scripts using pwb.py, but run them directly, e.g., python /data/project/shared/pywikipedia/core/scripts/version.py. Setting PYTHONPATH means that you no longer need pwb.py to make, say, import pywikibot work.
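The configuration steps can be double-checked from Python before running version.py. The exact PYTHONPATH line is not reproduced on this page, so the sketch assumes it adds the shared core directory, and that the configuration lives in $HOME/.pywikibot as described in steps 4 and 5. The function name is hypothetical, and the check does not replace the version.py test.

```python
"""Sanity-check a tool account's shared Pywikibot setup (sketch)."""
import os

SHARED_CORE = "/data/project/shared/pywikipedia/core"


def check_shared_setup(env=None, home=None):
    """Return a list of human-readable problems (empty if none found)."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    problems = []
    # Step 2 (assumed): the shared core checkout should be on PYTHONPATH.
    entries = [p.rstrip("/") for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if SHARED_CORE not in entries:
        problems.append(f"{SHARED_CORE} is not on PYTHONPATH")
    # Steps 4 and 5: $HOME/.pywikibot should exist and hold user-config.py.
    config_dir = os.path.join(home, ".pywikibot")
    if not os.path.isdir(config_dir):
        problems.append(f"missing directory {config_dir}")
    elif not os.path.isfile(os.path.join(config_dir, "user-config.py")):
        problems.append(f"no user-config.py in {config_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_shared_setup():
        print("problem:", problem)
```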
If you need to use multiple user-config.py files, you can do so by adding -dir:<path where you want your user-config.py> to every python command. To use the local directory, use -dir:. (colon dot).
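When a tool runs the same script with several configurations, it can help to build the command line in one place. This hypothetical helper only assembles the arguments described above (script path plus `-dir:<path>`); it does not call any Pywikibot API.

```python
"""Build a Pywikibot command line that selects a config directory (sketch)."""
import subprocess


def pywikibot_command(script, config_dir=None, extra_args=(), python="python"):
    """Return argv for running `script`, optionally with -dir:<config_dir>."""
    argv = [python, script]
    if config_dir is not None:
        # "-dir:." selects the current directory, as described above.
        argv.append(f"-dir:{config_dir}")
    argv.extend(extra_args)
    return argv


def run(script, config_dir=None, extra_args=()):
    """Run the script and return its exit status."""
    return subprocess.run(pywikibot_command(script, config_dir, extra_args)).returncode
```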
For more information about Pywikibot, please see the Pywikibot documentation. The Pywikipedia mailing list and IRC channel (irc://irc.freenode.net/pywikipediabot) are good places to go for additional help. Other useful information about using the centralized 'core' files is available here: User:Russell Blau/Using pywikibot on Labs
Set up Pywikibot on Labs (locally)
If you want to use the compat branch, we highly recommend installing it locally: it's almost impossible to use the shared files correctly, and if you try, you might leak session cookies to a location where anyone can read them, you might need additional libraries, etc. For core, you can also install the files locally, which lets you upgrade whenever it suits you instead of always running the latest version.
Following the instructions given in this mail, do the following:
Clone the 'core' git repository:
$ git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot-core
$ cd pywikibot-core
Then you can compress the git repository and its submodules by running:
$ git gc --aggressive --prune
$ cd scripts/i18n/
$ git gc --aggressive --prune
$ cd ../../externals/httplib2/
$ git gc --aggressive --prune
which results in a repo of size ~9MB.
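The compression commands above can also be scripted. This sketch runs the same `git gc --aggressive --prune` in the repository root and in the two submodule directories named above. It assumes the clone is in `pywikibot-core` and that those submodule paths exist in your checkout; the function names are hypothetical.

```python
"""Run `git gc --aggressive --prune` in pywikibot-core and two submodules (sketch)."""
import os
import subprocess

# Directories relative to the clone, taken from the commands above.
GC_DIRS = [".", "scripts/i18n", "externals/httplib2"]
GC_ARGV = ["git", "gc", "--aggressive", "--prune"]


def gc_plan(repo="pywikibot-core"):
    """Return (working directory, argv) pairs without running anything."""
    return [(os.path.normpath(os.path.join(repo, d)), list(GC_ARGV)) for d in GC_DIRS]


def run_gc(repo="pywikibot-core"):
    """Run git gc in each directory; check=True stops at the first failure."""
    for cwd, argv in gc_plan(repo):
        subprocess.run(argv, cwd=cwd, check=True)
```

With git alone, `git submodule foreach git gc --aggressive --prune` runs the same command in every submodule of the checkout, not only the two listed here.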
You now have two choices for setting up core: use an additional tool called virtualenv to install it as a module into a virtual environment, or run it from source (similar to compat) using the integrated pwb.py wrapper. The second method requires no installation.
- install as module - virtualenv
If you would like to install a local version of the 'core' branch, we recommend that you use virtualenv, which is particularly useful if your code uses a lot of externals (e.g. IRC bots, image handling bots, etc.).
To set up the Pywikibot core branch from the cloned repository:
Create a virtualenv. You can call it whatever you'd like (e.g., 'pwb', in this example); shorter names are easier:
$ virtualenv pwb
This creates an environment that uses Python 2.7. To use Python 3 instead:
$ virtualenv -p /usr/bin/python3 pwb
Then activate the environment:
$ source ~/pwb/bin/activate
Then run the following, which installs pywikibot-core in development mode as a link to your checkout, so changes to the directory take effect without reinstalling. This will also call python generate_user_files.py:
$ cd pywikibot-core
$ python setup.py develop
To use the code from outside the virtual environment (e.g. to submit jobs to the grid engine), use:
$ /data/project/tooluser/pwb/bin/python /data/project/tooluser/path/to/script.py
or, equivalently, using $HOME:
$ $HOME/pwb/bin/python $HOME/path/to/script.py
Note: If you want to run a script in interactive mode to debug it, you'll need to run source ~/pwb/bin/activate first.
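The commands above work because calling a virtualenv's own interpreter by its full path uses the packages installed in that environment without activating it. A small hypothetical helper can build that command for the grid. The environment name `pwb` follows the example above, and the script path is a placeholder.

```python
"""Build a command that runs a script with a virtualenv's interpreter (sketch)."""
import os


def venv_command(script, venv_dir="~/pwb", args=()):
    """Return argv using <venv_dir>/bin/python, so no activation is needed."""
    python = os.path.join(os.path.expanduser(venv_dir), "bin", "python")
    return [python, os.path.expanduser(script), *args]
```

The resulting list can be passed to `subprocess.run`, or joined with spaces and appended to a jsub command line.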
- run from sources - pwb.py wrapper
After changing into the pywikibot-core directory, run:
$ python pwb.py login.py
This will ask a series of questions about how you want to configure your local copy and generate the required config files for you. Alternatively, if you already have config files from a previous version, you can copy them into the pywikibot-core directory.
Some bot scripts require extra packages to be installed -- see the file externals/README for more details.
Clone the 'compat' git repository:
$ git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/compat.git pywikibot-compat
Now set up Pywikibot by running login.py (in fact, running any bot script, for example your favourite one, works), similar to what is described in the core section above:
$ cd pywikibot-compat
$ python login.py -all
You may set up all externals manually if you want, but this is not needed for compat; see mw:Manual:Pywikibot/Installation#Dependencies for further information. If you do not install them, you may be asked to install some extra packages depending on which scripts you run.
You will also have to enter the password for your bot eventually.
Now you have finished the configuration of compat and can continue setting up the webspace and jobs to execute.
If you want to provide data for download, you need to start a webservice; see the section "Web services" for how to do that.
If you run a bot with the -log option, you will find the log files in the logs/ directory. If you want to allow users to access them from the web, run:
$ cd ~/public_html
$ mkdir logs
$ cd logs
$ ln -s ~/pywikibot-core/logs core
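The same web-visible link to the log directory can be created from Python. This sketch mirrors the shell commands above (public_html/logs/core pointing at pywikibot-core/logs). The paths are the ones from the example, and the function name is hypothetical.

```python
"""Expose Pywikibot's logs/ directory under public_html (sketch)."""
from pathlib import Path


def link_logs(home=None, name="core", target="pywikibot-core/logs"):
    """Create <home>/public_html/logs/<name> -> <home>/<target>; return the link."""
    home = Path.home() if home is None else Path(home)
    logs_dir = home / "public_html" / "logs"
    logs_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir logs`, but safe to repeat
    link = logs_dir / name
    if not link.is_symlink():
        link.symlink_to(home / target)  # like `ln -s ~/pywikibot-core/logs core`
    return link
```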
If you want a specific file type to be handled differently by your browser (e.g. .log files displayed like text files), see the example under "Header, mimetype, error handler" for how to configure that, and don't forget to clear your browser's cache afterwards.
Next you might want to consider your cgi-bin directory:
$ cd ~/cgi-bin
Follow the hints given at Nova Resource:Tools/Help#Logs exactly. For example, even though the two commands
$ /usr/bin/python      # valid
$ /usr/bin/env python  # invalid
both work and do the same thing in a shell, only the first one is valid and works here; the second is invalid! Note also that PHP scripts go into public_html, not cgi-bin, whereas Python scripts can be placed in either public_html or cgi-bin. I would recommend using public_html for documents and keeping it listable, whereas cgi-bin should be used for CGI scripts and be protected (not listable).
Setup job submission
After installing, you can run your bot directly via a shell command, though this is highly discouraged. You should use the grid to run jobs instead.
To set up job submission and use the grid engine, you should first read Nova Resource:Tools/Help#Submitting, managing and scheduling jobs on the grid and, if you are familiar with the Toolserver and its architecture, also consult Migrating from toolserver.
In general, Labs uses SGE and its commands, such as qsub. These are explained in this document, which you should use to work out which commands and parameters you need. Please don't use the -daemonize parameter, as it is unneeded on the grid.
To run a bot using the grid, you might want it to run from inside the pywikipedia directory (this is not required), which means you have to write a small wrapper script. The following example script (versiontest.sh) runs version.py:
$ cat versiontest.sh
#!/bin/bash
cd /path/to/pywikipedia
python version.py
To submit a job, set the permissions for the script and then use the 'jsub' command to send the job to the grid:
$ chmod 755 versiontest.sh
$ jsub -N versiontest versiontest.sh
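If you submit many scripts this way, the wrapper can be generated rather than written by hand. The sketch below writes a script equivalent to versiontest.sh and sets mode 755, like the chmod above; it also adds `|| exit 1` so the job stops if the directory change fails. The directory and script names are the placeholders from the example, and the function name is hypothetical.

```python
"""Write a grid wrapper script like versiontest.sh (sketch)."""
import os
import shlex


def write_wrapper(path, workdir, command):
    """Create an executable bash script that changes to workdir and runs command."""
    body = "\n".join([
        "#!/bin/bash",
        # Exit instead of running the command in the wrong directory.
        f"cd {shlex.quote(workdir)} || exit 1",
        " ".join(shlex.quote(part) for part in command),
        "",
    ])
    with open(path, "w") as f:
        f.write(body)
    os.chmod(path, 0o755)  # same as `chmod 755 versiontest.sh`
    return body
```

The generated file is then submitted with jsub exactly as shown above.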
Job output will be written to output and error files in your home directory called YOURJOBNAME.out and YOURJOBNAME.err, respectively (e.g., versiontest.out and versiontest.err in this example):
$ cat ~/versiontest.out
Pywikipedia [https] r/pywikibot/compat (r10211, 8fe6bdc, 2013/08/18, 14:00:57, ok)
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
An infinitely running job (e.g. an IRC bot), such as this cronie entry from the Toolserver submit host:
06 0 * * * qcronsub -l h_rt=INFINITY -l virtual_free=200M -l arch=lx -N script_wui $HOME/rewrite/pwb.py script_wui.py -log
can be run on Tool Labs with either of these commands:
$ jsub -once -continuous -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
$ jstart -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
The first form is useful for debugging. Memory values smaller than 256 MB seem not to work here, since that is the minimum. If you experience problems with your jobs, such as
Fatal Python error: Couldn't create autoTLSkey mapping
you can try increasing the memory value. This is also needed here, because this script uses a second thread for timing and that thread needs memory too. Therefore, finally use:
$ jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
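The memory advice above can be captured in a small helper that refuses values below the 256 MB minimum mentioned here and defaults to the 512 MB this script needed. The function is hypothetical and only assembles the jstart arguments shown above. Pass already-expanded paths in `argv`, since no shell expands `$HOME` in a Python list.

```python
"""Assemble a jstart command line for a continuous grid job (sketch)."""
MIN_VMEM_MB = 256  # smaller values reportedly do not work (see above)


def jstart_command(job_name, argv, vmem_mb=512):
    """Return argv for `jstart -l h_vmem=<n>M -N <job_name> <argv...>`."""
    if vmem_mb < MIN_VMEM_MB:
        raise ValueError(f"h_vmem must be at least {MIN_VMEM_MB}M, got {vmem_mb}M")
    return ["jstart", "-l", f"h_vmem={vmem_mb}M", "-N", job_name, *argv]
```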
Now, to create a crontab entry, follow Scheduling jobs at regular intervals with cron and set up your crontab file like this:
$ crontab -e
06 0 * * * jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log
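The five leading fields of a crontab line are minute, hour, day of month, month and day of week, so `06 0 * * *` runs the job every day at 00:06. A hypothetical helper that formats such a daily entry:

```python
"""Format a daily crontab entry (sketch)."""


def daily_cron_line(hour, minute, command):
    """Return a crontab line that runs `command` once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Field order: minute, hour, day of month, month, day of week.
    return f"{minute:02d} {hour} * * * {command}"
```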
How to use <programming-language-X> to write tools on labs
Do you have experience that might help another user? Please share it (or point to it) here!