Help:Tool Labs


Tool Labs is a hosting environment for community developers working on tools and bots that help users maintain and use wikis. Tool Labs provides access to replicas of Wikimedia databases, allowing developers to easily re-use this information, for analytics, bot work, or by creating tools that help editors and other volunteers in their work. The infrastructure is supported by a dedicated group of Wikimedia Foundation staff and volunteers.

Quick start

  1. On wikitech, visit Create an account and create your Labs wiki account.
    • make careful note of the wiki username and "Instance shell account name" you choose
  2. On wikitech, fill out an access request for the Tools project.
  3. In a command-line terminal, generate an SSH-2 RSA key. See /Access § Generating and uploading an SSH key if you don't know how.
  4. In a command-line terminal, enter: $ cat ~/.ssh/id_rsa.pub (or similar) to display the public SSH key that you created above, then copy it.
  5. On wikitech, log in with your Labs wiki account, visit Preferences > OpenStack tab and paste in your public SSH key.
  6. Wait for your requests to be completed (you should receive messages on your wikitech talk page).

Once this is all done you should be able to:

  • Use SSH to log in to Tool Labs. In a command-line terminal, enter: ssh -i ~/.ssh/id_rsa username@login.tools.wmflabs.org (username is the "Instance shell account name" you specified when you created your account; check the host key against the published fingerprints)
  • Use SSH-based utilities such as scp and sftp to transfer files between Tool Labs and your computer.
  • Access MySQL from SSH (e.g. sql enwiki is a shortcut command to connect to the copy of English Wikipedia)
  • Create tools (see § Creating a new Tool account).

Gotchas

  • Your wikitech wiki username and your shell login username may be different. Visit Preferences > User profile and check "Instance shell account name".
  • The passwords you chose for your wikitech login and SSH key may be different.
  • You will notice that you have no public_html in your home folder, unlike on the old Toolserver. This is because web content is served from a Tool account, which you need to create and use.
  • When you log in with SSH you are in your personal folder. To quickly switch to your tool account, enter: become tool_name
  • If you log in with WinSCP you transfer files as yourself (not as your tool). The group is the same, so you can just do:
become <tool_name>
chmod -R g+rw ./
  • You will also notice that the web service for your tool is not started by default. To start it, enter: webservice start
  • When doing file-system-intensive tasks (like git clone), it may be useful to perform the task on a host-local filesystem (like /tmp) to avoid the speed penalty of NFS. For example, cloning a Git repository in /tmp and then moving it to the location where you actually need it can be much faster than cloning it directly into that location.
  • If you need to run a long task and want it to keep running when you are not connected over SSH, you can use screen. Note that screen needs to be started before become <tool_name>.
  • You might need to use ssh-add after creating a new key.
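
The /tmp tip above can be sketched as a small shell helper. This is only an illustration: the function name run_in_tmp and the example repository URL are hypothetical and not part of Tool Labs. It runs any command inside a fresh scratch directory under /tmp and moves the result to its final location only if the command succeeded.

```shell
# run_in_tmp DEST COMMAND [ARGS...]
# Hypothetical helper: run COMMAND inside a fresh scratch directory under
# /tmp (host-local, so not slowed down by NFS), then move the scratch
# directory to DEST. DEST should not exist yet.
run_in_tmp() {
    rit_dest=$1
    shift
    rit_scratch=$(mktemp -d /tmp/scratch.XXXXXX) || return 1
    # Run the command in a subshell so the caller's working directory is kept.
    if (cd "$rit_scratch" && "$@"); then
        mv "$rit_scratch" "$rit_dest"
    else
        # Clean up and report failure if the command did not succeed.
        rm -rf "$rit_scratch"
        return 1
    fi
}

# Example (placeholder URL): clone into the scratch directory, then move it.
# run_in_tmp "$HOME/mytool-src" git clone https://example.org/some/repo.git .
```

Note that mktemp creates the scratch directory readable only by its owner, so you may want to adjust the permissions of the destination afterwards (for example with chmod g+rwx) if other maintainers need access.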

What is Tool Labs

Rationale

Tool Labs was developed in response to the need to support external tools and their developers and maintainers. The system is designed to make it easy for maintainers to share responsibility for their tools and bots, which helps ensure that no useful tool gets ‘orphaned’ when one person needs a break. The system is designed to be reliable, scalable and simple to use, so that developers can hit the ground running and start coding.

Features

In addition to a well-supported hosting environment, Tool Labs provides:

  • support for Web services, continuous bots, and scheduled tasks
  • access to replicated production databases
  • easily shared management of tool accounts, where tools and bots are stored
  • a grid engine for dispatching jobs
  • support for mosh, SSH, SFTP without complicated proxy setup
  • version control via Gerrit and Git
  • support for Redis
  • support for Elasticsearch

Shared storage

You will have access to some of the shared storage, see Help:Shared storage (for instance the /shared/mediawiki/ checkout).

Architecture and terminology

Tool Labs has four components: the bastion hosts, the grid, the web cluster, and the databases.

Bastion hosts

You log in to Tool Labs through a bastion host. As of May 2015, Tool Labs has two bastion hosts:

tools-login.wmflabs.org
User login for accessing tools interactively; also reachable as login.tools.wmflabs.org.
dev.tools.wmflabs.org
Functionally identical; please use this host for heavy processing such as compiles.

The grid

The Tool Labs grid, implemented with Open Grid Engine (the open-source fork of Sun Grid Engine), permits users to submit jobs from either a login account on the bastion host or from a web service. Submitted jobs are added to a work queue, and the system finds a host to execute them. Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once. If a continuous job fails, the grid will automatically restart the job so that it keeps going. For more information about the grid, please see § Submitting, managing and scheduling jobs on the grid.

The web cluster

The Tool Labs web cluster is fronted by a web proxy, which supports SSL and is open to the Internet. The proxy distributes web requests among the web servers in this cluster; any server in this web cluster can serve any of the hosted web tools because Tool Labs uses a shared storage system. For more information, please see § Web server.

Each tool has its own lighttpd Web server, with full configuration options. FCGI scripts are supported with configuration options, and WSGI is supported using flup.server.fcgi. See § Web server for more information.

The databases

Tool Labs supports two sets of databases: the production replicas and user-created databases, which are used by individual tools. The production replicas follow the same setup as production, and the information that can be accessed from them is the same as that which normal registered users (i.e., users without +sysop or other advanced permissions) can access on-wiki or via the API. Note that some data has been removed from the replicas for privacy reasons. User-created databases can be created by either a user or a tool, on the replica servers or on a local ‘tools’ project database.

No "instances"

Developers working in Tool Labs do not have to create or set up virtual machines (i.e., Labs "instances"), because the Tool Labs project admins create and manage them. The term may appear in documentation on Wikitech, but otherwise you do not need to worry about it.

Rules of use

Tool Labs policies

All tools and bots developed and maintained on Tool Labs must adhere to the terms of use that will be available here when they are finalized:

Specifically, tools must be

Private information must be handled carefully, if at all. Note that private user information has been redacted from the replicated databases provided by the system.

As the Tool Labs environment is shared, we ask that you strive not to break things for others, and to be considerate when using system resources.

Individual wiki policies (these differ!)

When developing on Tool Labs, please adhere to the bot policies of the wikis your bot interacts with. Each wiki has its own guidelines and procedures for obtaining approval. The English Wikipedia, for example, requires that a bot be approved by the Bot Approvals Group before it is deployed, and that the bot account be marked with a 'bot' flag. See Wikipedia Bot policy for more information on the English Wikipedia.

For general information and guidelines, please see Bot policy.

Contact

We’d love to hear from you! Our main point of contact is our Phabricator project. If you need support, please file a task there.

Other contact options are:

IRC
#wikimedia-labs on Freenode is a great place to ask questions, get help, and meet other Tool Labs developers. See Help:IRC for more information.
Mailing lists
Labs-l@lists.wikimedia.org is a list for announcements and discussion related to the Wikimedia Labs project (archives).
Labs-announce@lists.wikimedia.org is the announce-only version (archives). If you run any Tool Labs projects you should subscribe to this list at a minimum, as changes that may impact your project are communicated here.

Tool Labs is a joint WMF-Volunteer run project, and we welcome contributions to the infrastructure. The current maintainers are:

Getting access to Tool Labs

This is a brief summary of the /Access documentation page.


To access Tool Labs you need:

  • to create a Labs account, which will allow you shell access; and
  • to request access to the 'tools' project

Sign up for a Labs account here: Request account (you will be asked to enter the new account's information)

The "Instance shell account name" you specify in the Create Account form will be your Unix username on all Labs projects. If you forget your username, you can always find it under Preferences > Instance shell account name.

In order to access Labs servers using SSH, you must provide a public SSH key. Once you have created a Labs account, you can specify a public key on the 'OpenStack' tab of your Wikitech preferences.

Once you have created a Labs account, you must request access to the ‘tools’ project by submitting a Tools Access Request. Requests for access are generally dealt with within a day (often faster), though the response time may be longer depending on admin availability. If you need immediate assistance, please contact us on IRC.


Using Tool Labs and managing your files

Tool Labs can be accessed in a variety of ways – from its public IP to a GUI client. Please see Help:Access for general information about accessing Labs.

The tools list

The Tool Labs tools list page is publicly available and contains a list of all currently hosted Tool accounts along with their maintainers. Tool accounts that have an associated web page appear as links. Users with access to the 'tools' project can create new tool accounts here, and add maintainers to or remove them from existing tool accounts.

SSH

Once set up, you ssh to Tool Labs via its bastion host login.tools.wmflabs.org, provided that a public SSH key has been uploaded to the Labs account.

ssh yourshellaccountname@login.tools.wmflabs.org

Note that if you plan to do heavy processing (compiling, etc.), you should SSH to dev.tools.wmflabs.org. Also, if you get disconnected frequently during SSH sessions, consider setting the ServerAliveInterval option to a small value (roughly 5-20 seconds) when connecting:

ssh -o ServerAliveInterval=5 yourshellaccountname@login.tools.wmflabs.org
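
To avoid typing the option on every connection, the same setting can be kept in your SSH client configuration. The sketch below prints a configuration stanza that you could append to ~/.ssh/config on your own computer; the host alias "toollabs" and the function name are hypothetical, while Host, HostName, User and ServerAliveInterval are standard OpenSSH client keywords.

```shell
# toollabs_ssh_config SHELL_ACCOUNT [INTERVAL_SECONDS]
# Hypothetical helper: print an ssh_config stanza for the Tool Labs bastion.
# The alias "toollabs" is an arbitrary name chosen for this example.
toollabs_ssh_config() {
    tsc_user=$1
    tsc_interval=${2:-15}
    cat <<EOF
Host toollabs
    HostName login.tools.wmflabs.org
    User $tsc_user
    ServerAliveInterval $tsc_interval
EOF
}

# Example: review the output, then append it to your client configuration.
# toollabs_ssh_config yourshellaccountname >> ~/.ssh/config
# ssh toollabs
```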

Using multiple ssh agents

If you use multiple ssh-agents (to connect to your personal or company system, for example), see SSH access for more information about setting up a primary and a Labs agent.

Updating files

After you can ssh successfully, you can transfer files via sftp and scp. Note that the transferred files will be owned by you. You will likely wish to transfer ownership to your tool account. To do this:

1. Become your tool account:

yourshellaccountname@tools-login:~$ become toolaccount
tools.toolaccount@tools-login:~$

2. As your tool account, take ownership of the files:

tools.toolaccount@tools-login:~$ take FILE

The take command will change the ownership of the file(s) and directories recursively to the calling user (in this case, the tool account).

Handling permissions

If you're getting permission errors, note that you can also transfer files the other way around: as your tool account, copy the files into the tool's directory, /data/project/<toolname>.

Another, probably easier, way is to make the tool's directory group-writable. For example, if your shell account's name is alice and your tool name is alicetools, you could do something like this after logging in with your shell account:

become alicetools
chmod -R g+w /data/project/alicetools
logout
cp -rv /home/alice/* /data/project/alicetools/

Using git

The best option is to create a Git repository to which project participants commit files. To access the files, become the tool account, check that repository out in your tool's directory, and thereafter run a regular git pull whenever you want to deploy new files.

Putty and WinSCP

Note that instructions for accessing Tool Labs with Putty and WinSCP differ from the instructions for using them with other Labs projects. Please see Help:Access to ToolLabs instances with PuTTY and WinSCP for information specific to Tool Labs.

Other graphical file managers (e.g., Gnome/KDE)

For information about using a graphical file manager (e.g., Gnome/KDE), please see Accessing instances with a graphical file manager.

Installing MediaWiki core

Warning: MediaWiki installations attract spammers faster than anything else, and the load they cause makes Tools administrators grumpy. Lock down your installation immediately after setup so that uninvited users cannot publish information. You should also re-read the Terms of use regarding the rules on wikis.

This section describes how to install MediaWiki core and make your installation visible on the web.

One-time steps per tool

First, you have to do some preparatory steps which you need only once per tool.

become <YOURTOOL>

If you have not installed composer yet:

mkdir ~/bin
curl -sS https://getcomposer.org/installer | php -- --install-dir=$HOME/bin --filename=composer

If your local bin directory is not in your $PATH (use echo $PATH to find out), create or edit the file ~/.profile and add the lines:

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
   PATH="$HOME/bin:$PATH"
fi

Finish your session as <YOURTOOL> and start a new one, or:

. ~/.profile

Now you are done with the one-time preparations.

For each instance of core

The following steps are needed for each new installation of MediaWiki. We assume that you want to access MediaWiki via the web in a directory named MW — you are free to use another name. If not already done:

become <YOURTOOL>

Then:

cd ~/public_html

If you plan to submit changes:

git clone ssh://<YOURUSERNAME>@gerrit.wikimedia.org:29418/mediawiki/core.git MW

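or else, if you only want to use MediaWiki without submitting changes, an anonymous HTTPS clone is sufficient and uses fewer resources:

git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git MW

Next, recent versions of MediaWiki have external dependencies, so you need to install those. The final git review -s step sets up the checkout for Gerrit and is only needed if you plan to submit changes: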

cd MW
composer install
git review -s

Now you should be able to access the initial pre-install screen of MediaWiki from your web browser as:

https://tools.wmflabs.org/<YOURTOOL>/MW/

and proceed as usual. See how to create new databases for your MediaWiki installations.

Joining and creating a Tool account

What is a Tool account?

A Tool account is the "user" associated with a Tool on Tool Labs. Although each tool account has a user ID, tool accounts are not personal accounts (like a Labs account); they are service accounts, each consisting of a user and group ID (i.e., a Unix uid-gid pair), intended to run the actual tool or bot. Anyone who has access to Tool Labs can create a Tool account.

  • Unix user: tools.toolname
  • Unix group: tools.toolname

Members of the Tool account's Unix group include:

  • the tool account creator
  • the tool account itself
  • (optionally, but encouraged!) additional tool maintainers

Maintainers may have more than one tool account, and tool accounts may have more than one maintainer. Every member of the group is authorized to sudo to the tool account. By default, only members of the group have access to the tool account's code and data.

A simple way for maintainers to switch to the tool account is with become:

maintainer@tools-login:~$ become toolname
tools.toolname@tools-login:~$

In addition to the user/group pair, each tool account includes:

  • A home directory on shared storage: /data/project/toolname
  • A ~/public_html/ directory, which is visible at http://tools.wmflabs.org/toolname/
  • Database access credentials: ~/replica.my.cnf, which provide access to the production database replicas as well as to project-local databases.
  • Access to the continuous and task queues of the compute grid

Joining an existing Tool account

All tool accounts hosted in Tool Labs are listed on the Tools list. If you would like to be added to an existing account, you must contact the maintainer(s) directly.

If you would like to add (or remove) maintainers to a tool account that you manage, you may do so with the 'add' link found beneath the tool name on the Tools home page.

Creating a new Tool account

Members of the ‘tools’ project can create tool accounts from the Tools home page:

  1. Navigate to the Tools home page.
  2. Select the "create new tool" link (found in the "Develop your own tool" section).
  3. Enter a "Service group name". The service group name will be used as the name of your tool account.

Do not prefix your service group name with "tools."; the management interface will add the prefix automatically where appropriate, and a known issue causes the account to be created improperly if you add it yourself.

Note: If you have only recently been added to the 'tools' project, you may get an error about not having appropriate credentials. Simply log out of Wikitech and log back in to fix this.

The tool account will be created and you will be granted access to it within a minute or two. If you were already logged in to your Labs account through SSH, you will have to log off then back in before you can access the tool account.

Deleting a Tool account

You can't delete a tool account yourself, though you can delete the content of your directories. If you really want a tool account to be deleted, please contact an admin.

Customizing a Tool account

Once you have created a tool account, there are a few things that you can customize to make the tool more easily understood and used by other users. These include:

  • adding a tool account description (the description will appear on the Tools home page beside the tool name)
  • creating a home page for your tool (if you create a home page for the tool, it will be linked from the Tools home page automatically)

Tool Labs will soon support mail to both Labs users and tool accounts (mail to a tool account will go to all maintainers by default). You can customize mail settings as well.

Creating a tool web page

To create a web page for your tool account, simply place an index.html file in the tool account's ~/public_html/ directory. The page can be a simple description of the tool or bot with basic information on how to set it up or shut it down, or it can contain an interface for the web service. To see examples of existing tool web pages, click any of the linked tool names on the Tools list.

Note that some files, such as PHP files, will give a 500 error unless the file is owned by the tool account.

You will also need to start a webservice for your tool.

1. Log into your Labs account and become your tool account:

maintainer@tools-login:~$ become toolname

2. Start the web service:

tools.toolname@tools-login:~$ webservice start

Make the tool translatable

If your tool is used from the web, and assuming you think it's worth something at all, you want to make it translatable. You can and should use the Intuition framework (PHP only), which allows you to use translatewiki.net and delivers you the localisation.

Don't waste your time, learn from our experience with MediaWiki: read the message documentation tips and other internationalization hints.

Creating a tool description

To create a tool description:

1. Log into your Labs account and become your tool account:

maintainer@tools-login:~$ become toolname

2. Create a .description file in the tool account’s home directory. Note that this file must be HTML:

tools.toolname@tools-login:~$ vim .description

3. Add a brief description (no more than 25 words or so) and save the file. You can use basic HTML markup in the file.

4. Navigate to the Tools list. Your tool account description should now appear beside your tool account name.
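
If you prefer not to use an editor, the description file can also be written directly from the shell. The following is a minimal sketch; the function name write_description is hypothetical, and the word count is only a rough guide because HTML tags are counted as words.

```shell
# write_description TOOL_HOME TEXT
# Hypothetical helper: write TEXT (plain text or basic HTML) to the
# .description file in TOOL_HOME, warning if it looks longer than
# the suggested 25 words or so.
write_description() {
    wd_home=$1
    wd_text=$2
    # Arithmetic expansion strips the padding some wc implementations add.
    wd_words=$(( $(printf '%s\n' "$wd_text" | wc -w) ))
    if [ "$wd_words" -gt 25 ]; then
        echo "warning: description has $wd_words words (aim for about 25)" >&2
    fi
    printf '%s\n' "$wd_text" > "$wd_home/.description"
}

# Example, run as the tool account:
# write_description "$HOME" 'Counts <b>edits</b> per user on a given wiki.'
```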

Configuring bots and tools

Tool and bot code should be stored in your tool account, where it can be managed by multiple users and accessed by all execution hosts. Specific information about configuring web services and bots, along with information about licensing, package installation, and shared code storage, is available in the § Developing on Tool Labs section.

Note that bots and tools should be run via the grid, which finds a suitable host with sufficient resources to run each. Simple, one-off jobs can be submitted to the grid easily with the jsub command. Continuous jobs, such as bots, can be submitted with jstart.

Setting up code review and version control

Although it's possible to just stick your code in the directory and mess with it manually every time you want to change something, your future self and your future collaborators will thank you if you instead use source control, a.k.a. version control and a code review tool. Wikimedia Labs makes it pretty easy to use Git for source control and Gerrit for code review, but you also have other options.

Setting up a local Git repository

It is fairly simple to set up a local Git repository to keep versioned backups of your code. However, if your tool directory is deleted for some reason, your local repository will be deleted as well. You may wish to request a Gerrit/Git repository to safely store your backups and/or to share your code more easily. Other backup/versioning solutions are also available. See User:Magnus Manske/Migrating from toolserver § GIT for some ideas.

To create a local Git repository:

1. As your tool account, create an empty Git repository in the tool's home directory:

tools.toolname@tools-login:~$ git init

2. Add the files you would like to back up. For example:

tools.toolname@tools-login:~$ git add public_html

3. Commit the added files:

tools.toolname@tools-login:~$ git commit -m 'Initial check-in'

For more information about using Git, please see the git documentation.

Enabling simple public HTTP access to local Git repository

If you've set up a local Git repository like the above in your tool directory, you can easily set up public read access to the repository through HTTP. This will allow you to, for instance, clone the Git repository to your own home computer without using an intermediary service such as GitHub.

First create the www/static/ subdirectory in your tool's home directory, if it does not already exist:

mkdir -p ~/www/static/

Now go to the www/static/ directory, and make a symbolic link to your repository's Git directory (the hidden .git subdirectory in the root of your repository):

cd ~/www/static/
ln -s ~/.git yourtool.git

Now change directory into the symbolic link you just created, and run the git update-server-info command to generate some auxiliary info files needed for the HTTP connectivity:

cd yourtool.git
git update-server-info

Enable a few Git hooks for updating said auxiliary info files every time someone commits, rewrites or pushes to the repository. A symbolic link's target is resolved relative to the directory containing the link (here hooks/), so the target is given without the hooks/ prefix:

ln -s post-update.sample hooks/post-commit
ln -s post-update.sample hooks/post-rewrite
ln -s post-update.sample hooks/post-update
chmod a+x hooks/post-update.sample

You're done. You should now be able to clone the repository from any remote machine by running the command:

git clone http://tools-static.wmflabs.org/yourtool/yourtool.git

Requesting a Gerrit/Git repository for your tool

Tool Labs users may request a Gerrit/Git repository for their tools. Access to Git is managed via Wikimedia Labs and integrated with Gerrit, a code review system.

In order to use the Wikimedia Labs code review and version control, you must upload your ssh key to Gerrit and then request a repository for your tool.

  1. Log in to https://gerrit.wikimedia.org/ with your Labs account.
  2. Add your SSH public key (select “Settings” from the drop-down menu beside your user name in the upper right corner of the screen, and then “SSH Public Keys” from the Settings menu).
  3. Request a Gerrit project for your tool: Gerrit/New repositories

For more information about using Git and Gerrit in general, please see Git/Gerrit.

Using GitHub or another external service

Before you start, you might want to set up your Git user details.

# Log in to your tool account
become mytool
# Your name (--global stores it in the tool account's ~/.gitconfig,
# so this works before any repository exists)
git config --global user.name "Your Name"
# Your e-mail (use the one you set up on GitHub)
git config --global user.email "your-mail@example.com"

Then you can clone the remote repository as usual:

git clone https://github.com/yourGithubName/yourGithubRepoName.git

You can do updates any way you want, but you might want to use this simple update script to update the code safely:

#!/bin/bash
# Stop the web service, pull the latest code, then optionally restart it.

read -r -p "Stop the service and pull fresh code? (Y/n) " response
# Anything other than "n" or "no" (in any letter case) counts as yes.
if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
then
	webservice stop
	# Abort rather than run git pull in the wrong directory.
	cd ./public_html || exit 1
	echo -e "\nUpdating the code..."
	git pull
	echo
	read -r -p "OK to start the service? (Y/n) " response
	if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
	then
		webservice start
	fi
fi

Save the script above in your tool account's home folder as, e.g., "update.sh". Don't forget to grant execute permission to yourself and your tool group (i.e. `chmod 770 update.sh`).

Database access

This is a brief summary of the /Database documentation page.


Tool and Labs accounts are granted access to replicas of the production databases. Private user data has been redacted from these replicas (some rows are elided and/or some columns are made NULL depending on the table), but otherwise the schema is, for all practical purposes, identical to the production databases and the databases are sharded into clusters in much the same way.

Database credentials (user name and password) are stored in the 'replica.my.cnf' file found in the tool account’s home directory. To use these credentials with command-line tools by default, copy 'replica.my.cnf' to '.my.cnf'.
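
The copy step can be sketched as follows. The function name install_my_cnf is hypothetical, and restricting the copy to mode 600 is an added precaution (the file contains a password), not something this page requires.

```shell
# install_my_cnf [HOME_DIR]
# Hypothetical helper: copy replica.my.cnf to .my.cnf in HOME_DIR
# (default: $HOME) so that the mysql client picks the credentials up
# without needing --defaults-file.
install_my_cnf() {
    imc_home=${1:-$HOME}
    if [ ! -f "$imc_home/replica.my.cnf" ]; then
        echo "no replica.my.cnf found in $imc_home" >&2
        return 1
    fi
    cp "$imc_home/replica.my.cnf" "$imc_home/.my.cnf" || return 1
    # Precaution (an assumption of this sketch): keep the copy private.
    chmod 600 "$imc_home/.my.cnf"
}

# Example, run as the tool account:
# install_my_cnf
# mysql -h enwiki.labsdb enwiki_p
```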

If you do not have a 'replica.my.cnf' in your home directory, please create a ticket in Phabricator.


To connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.labsdb) and the alias of the database replica (enwiki_p):

mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb enwiki_p

To connect to the Wikidata cluster:

mysql --defaults-file="${HOME}"/replica.my.cnf -h wikidatawiki.labsdb

To connect to the Commons cluster:

mysql --defaults-file="${HOME}"/replica.my.cnf -h commonswiki.labsdb

There is also a shortcut for connecting to the replicas: sql <dbname>[_p] The _p is optional, but implicit (i.e. the sql tool will add it if absent).

To connect to the English Wikipedia database replica using the shortcut, simply type:

sql enwiki

To connect to tools-db using the shortcut, type:

sql local

This connects to the "tools-db" server without selecting a database. It is equivalent to typing:

mysql --defaults-file="${HOME}"/replica.my.cnf -h tools-db

To connect to a given Labs database server, say, 'labsdb1004.eqiad.wmnet':

mysql --defaults-file="${HOME}"/replica.my.cnf --host labsdb1004.eqiad.wmnet
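
The naming pattern visible in the replica examples above (host alias <dbname>.labsdb, database <dbname>_p, with _p added when absent) can be sketched as a few shell helpers. The function names are hypothetical, and the sketch assumes that every replicated wiki follows the same pattern as the enwiki, wikidatawiki and commonswiki examples.

```shell
# replica_db NAME: print the replica database name, adding _p if absent.
replica_db() {
    case $1 in
        *_p) printf '%s\n' "$1" ;;
        *)   printf '%s_p\n' "$1" ;;
    esac
}

# replica_host NAME: print the host alias (assumed pattern: <dbname>.labsdb).
replica_host() {
    printf '%s.labsdb\n' "${1%_p}"
}

# replica_mysql_cmd NAME: print the full mysql command line for a replica.
replica_mysql_cmd() {
    printf 'mysql --defaults-file=%s/replica.my.cnf -h %s %s\n' \
        "$HOME" "$(replica_host "$1")" "$(replica_db "$1")"
}

# Example:
# replica_mysql_cmd enwiki
```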


Connecting from a Servlet in Tomcat

  1. Create a directory "lib" in the directory "public_tomcat".
  2. Copy "mysql-connector-java-bin.jar" to "public_tomcat/lib".
  3. In your servlet, open a pooled connection. DBUSER, DBPASSWORD and PROJECT are not defined in this snippet; supply your own values:
    import java.sql.Connection;
    import java.sql.Statement;
    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;
    
    String DBURL    = "jdbc:mysql://tools-db.tools.eqiad.wmflabs:3306/";
    String DBDRIVER = "com.mysql.jdbc.Driver";
    // The database name is the user name and the project joined by "__".
    String DATABASE = DBUSER + "__" + PROJECT;
    
    // Configure the connection pool.
    PoolProperties p = new PoolProperties();
    p.setUrl            (DBURL + DATABASE);
    p.setDriverClassName(DBDRIVER        );
    p.setUsername       (DBUSER          );
    p.setPassword       (DBPASSWORD      );
    p.setJdbcInterceptors(
    	"org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;" +
    	"org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer");
    DataSource datasource = new DataSource();
    datasource.setPoolProperties(p);
    // Borrow a connection from the pool and create a statement on it.
    Connection connection = datasource.getConnection  ();
    Statement  statement  = connection.createStatement();
    
  4. Compile the servlet:
    javac -classpath javax.servlet.jar:tomcat-jdbc.jar myhttpservlet.java

Submitting, managing and scheduling jobs on the grid

This is a brief summary of the /Grid documentation page.


Every non-trivial task performed in Tool Labs should be dispatched by the grid engine, which ensures that the job is run in a suitable place with sufficient resources. The basic principle of running jobs is fairly straightforward:

  • You submit a job to a work queue from a submission server (e.g., tools-login) or web server
  • The grid engine master finds a suitable execution host to run the job on, and starts it there once resources are available
  • As it runs, your job will send output and errors to files until the job completes or is aborted.

Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once. If a continuous job fails, the grid will automatically restart the job so that it keeps going.

To schedule jobs to run on specific days or at specific times of day, you can use cron to submit the jobs to the grid.

Scheduling a command more often than every five minutes (e.g. * * * * * command) is highly discouraged, even if the command is "only" jsub. In these cases, you very probably want to use 'jstart' instead. The grid engine ensures that jobs submitted with 'jstart' are automatically restarted if they exit.
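
As a rough illustration of combining cron with the grid, the helper below prints a crontab line that submits a script with jsub at a fixed minute of the given hour. The function name cron_entry and the example script path are hypothetical. The helper insists on a single numeric minute, so an entry can never run more often than once an hour, which stays well clear of the five-minute limit discouraged above.

```shell
# cron_entry MINUTE HOUR SCRIPT
# Hypothetical helper: print a crontab line that submits SCRIPT to the
# grid with jsub. MINUTE must be one number from 0 to 59; HOUR may be a
# number or "*".
cron_entry() {
    ce_minute=$1
    ce_hour=$2
    ce_script=$3
    case $ce_minute in
        ''|*[!0-9]*)
            echo "cron_entry: minute must be a single number (0-59)" >&2
            return 1 ;;
    esac
    if [ "$ce_minute" -gt 59 ]; then
        echo "cron_entry: minute must be a single number (0-59)" >&2
        return 1
    fi
    printf '%s %s * * * jsub %s\n' "$ce_minute" "$ce_hour" "$ce_script"
}

# Example: print an entry for 03:00 every day, to be pasted into crontab -e.
# cron_entry 0 3 /data/project/toolname/nightly.sh
```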


Email

Mail to users

Mail sent to user@tools.wmflabs.org (where user is a shell account) will be forwarded to the email address that user has set in their Wikitech preferences, if it has been verified (the same as the 'Email this user' function on wikitech).

Any existing .forward in the user's home will be ignored.

Mail to tools

Mail can also be sent "to a tool" with:

toolname.anything@tools.wmflabs.org

Where "anything" is an arbitrary alphanumeric string. Mail will be forwarded to the first of:

  • The email(s) listed in the tool's ~/.forward.anything, if present;
  • The email(s) listed in the tool's ~/.forward, if present; or
  • The wikitech email of the tool's individual maintainers.

Additionally, tools.toolname@tools.wmflabs.org is an alias pointing to toolname.maintainers@tools.wmflabs.org, which is mostly useful for automated email generated from within Labs.

~/.forward and ~/.forward.anything need to be readable by the user Debian-exim; to achieve that, you probably need to chmod o+r ~/.forward*.
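
The lookup order for mail addressed to toolname.anything can be pictured with the sketch below. It only mirrors the order described above so that you can check which file would apply; it is not how the mail system itself is implemented, and the function name forward_target is hypothetical.

```shell
# forward_target TOOL_HOME SUFFIX
# Hypothetical helper: print which forward file would apply to mail sent
# to toolname.SUFFIX, or "maintainers" when no forward file exists.
forward_target() {
    ft_home=$1
    ft_suffix=$2
    if [ -f "$ft_home/.forward.$ft_suffix" ]; then
        printf '%s\n' "$ft_home/.forward.$ft_suffix"
    elif [ -f "$ft_home/.forward" ]; then
        printf '%s\n' "$ft_home/.forward"
    else
        printf 'maintainers\n'
    fi
}

# Example, run as the tool account:
# forward_target "$HOME" bugs
```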

Processing email programmatically

In addition to mail forwarding, tools can have incoming mail sent to an arbitrary program by setting one of its .forwards (as above) to:

|jmail program

In that case, program will be invoked as a job on the grid and will have the email presented to it as its standard input. If program fails to run, or exits with a non-zero status, then the email will bounce with the standard error included in the bounce message.

Please be aware that mail processing on the grid is limited in memory and in runtime (30s CPU time, 60s wall clock) so you should not do heavy processing in your script. If you need more than this, then have the initial script simply queue the email for later processing from another component.
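
A minimal sketch of the "queue now, process later" approach is shown below. The function name queue_mail, the queue directory and the script path are all hypothetical; the function simply copies the message from standard input into a uniquely named file, which stays well within the time limits.

```shell
# queue_mail QUEUE_DIR
# Hypothetical helper: store the email arriving on standard input as a
# new *.eml file in QUEUE_DIR for later processing by another job.
queue_mail() {
    qm_dir=$1
    mkdir -p "$qm_dir" || return 1
    qm_tmp=$(mktemp "$qm_dir/incoming.XXXXXX") || return 1
    # Write the complete message first, then rename it, so that a reader
    # looking for *.eml files never sees a half-written message.
    cat > "$qm_tmp" || return 1
    mv "$qm_tmp" "$qm_tmp.eml"
}

# Example script body (hypothetical path /data/project/toolname/bin/queue-mail.sh),
# referenced from a .forward file as: |jmail /data/project/toolname/bin/queue-mail.sh
# queue_mail /data/project/toolname/mailqueue
```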

Mail from tools

When sending mail from a job, the usual command line method of piping the message body to /usr/bin/mail may not work correctly because /usr/bin/mail attempts to deliver the message to the local MSA in a background process which will be killed if it is still running when the job exits.

If piping to a subprocess to send mail is needed, the message including headers may be piped to /usr/sbin/exim -odf -i.

# This does not work when submitted as a job
echo "Test message" | /usr/bin/mail -s "Test message subject" user@example.com

# This does
echo -e "Subject: Test message subject\n\nTest message" | /usr/sbin/exim -odf -i user@example.com
  • Note: /usr/bin/echo supports -e in case your shell's internal echo command doesn't.
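The same approach can be used from a script. The following is a minimal sketch, assuming Python 3 is available to the job; the helper names and addresses are made up for the example, and only the exim path and its -odf and -i options come from the text above.

```python
import subprocess
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a message with headers, ready to be piped to exim."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def exim_command(recipient, exim="/usr/sbin/exim"):
    """Return the argument list for delivering to one recipient."""
    # -odf delivers in the foreground; -i stops a line containing only "."
    # from being treated as the end of the message.
    return [exim, "-odf", "-i", recipient]


def send_with_exim(message, recipient):
    """Pipe the message to exim and raise if exim exits non-zero."""
    subprocess.run(exim_command(recipient), input=message.as_bytes(), check=True)
```

For example, send_with_exim(build_message("tools.toolname@tools.wmflabs.org", "user@example.com", "Test message subject", "Test message"), "user@example.com") would send the same test message as the shell example, with toolname standing in for your tool's name.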

Web server

This is a brief summary of the /Web documentation page.


Every tool can have a dedicated web server running on the job grid. The default configuration will run a lighttpd web server which serves static files and PHP scripts from the tool's $HOME/public_html directory.

Options are available for easily running tomcat, nodejs, and wsgi web services. It is also possible to run a custom webserver process (e.g. to run a Scala-based tool). You can start a tool's web server with the webservice command:

$ become my_cool_tool
$ webservice start

You can also use the webservice command to stop, restart, or check the status of the web server.


Developing on Tool Labs

This is a brief summary of the /Developing documentation page.
  • License your source code and document that with a LICENSE or COPYING file in the tool's home directory and header comments in the source code. See Help:Tool_Labs/Developing § Licensing your source code for more help on why and how to select a license.
  • Use public version control (gerrit, diffusion, GitHub, Bitbucket, ...) for your tool's source code and deploy changes to the Tool Labs servers by updating a checkout of that public version control. See Help:Tool_Labs § Setting up code review and version control for additional information.
  • Keep passwords and other credentials (OAuth secrets, etc) separated from the main application code so that they are not exposed publicly in your version control system of choice.
  • Create a page in the Tool: namespace documenting the basics of what your tool does and how to start and stop it.
  • Find co-maintainers for your tools who can help out at least with starting/stopping jobs when needed.
  • Make many small tools that each do one specific task rather than a catch-all tool that does many different tasks.


The full documentation page provides tips and instructions for developing code on Tool Labs, including specific language support.

Redis

Redis is a key-value store similar to memcache, but with more features. It can be easily used to do publish/subscribe between processes, and also maintain persistent queues. Stored values can be different data structures, such as hash tables, lists, queues, etc. Stored data persists across service restarts. For more information, please see the Wikipedia article on Redis.

A Redis instance that can be used by all tools is available on tools-redis, on the standard port 6379. It has been allocated a maximum of 12G of memory, which should be enough for most usage. You can set limits for how long your data stays in Redis; otherwise it will be evicted when memory limits are exceeded. See the Redis documentation for a list of available commands.

Libraries for interacting with Redis from PHP (phpredis) and Python (redis-py) have been installed on all the web servers and exec nodes. For an example of a bot using Redis, see SuchABot.

For quick & dirty debugging, you can connect directly to the Redis server with nc -C tools-redis 6379 and execute commands (for example "INFO").

Security

Redis has no access control mechanism, so other users can accidentally or intentionally read and overwrite the keys you set. Even if you are not worried about security, it is highly probable that multiple tools will try to use the same key (such as lastupdated). To prevent this, it is highly recommended that you prefix all your keys with an application-specific, lengthy, randomly generated secret key.

You can very simply generate a good enough prefix by running the following command:

openssl rand -base64 32

PLEASE PREFIX YOUR KEYS! We have also disabled the redis commands that let users 'list' keys.
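A minimal sketch of how a tool might apply such a prefix from Python is shown below. The helper names are hypothetical, and client is assumed to be a redis-py client such as redis.Redis(host="tools-redis", port=6379). The prefix should be generated once and kept with the tool's other private credentials, not regenerated on every run.

```python
import base64
import os


def generate_prefix():
    """Python equivalent of `openssl rand -base64 32`; run once and store it."""
    return base64.b64encode(os.urandom(32)).decode("ascii")


def make_key(prefix, name):
    """Build the key actually sent to Redis from the secret prefix and a name."""
    return "%s:%s" % (prefix, name)


def store_with_expiry(client, prefix, name, value, ttl_seconds):
    """Set a prefixed key that Redis expires after ttl_seconds.

    Assumption: client behaves like a redis-py client, whose set()
    accepts an `ex` expiry given in seconds.
    """
    return client.set(make_key(prefix, name), value, ex=ttl_seconds)
```

Setting an expiry on every key also makes it predictable when your data disappears, instead of leaving it to eviction when the memory limit is reached.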

Can I use memcache?

There is no memcached on Tool Labs. Please use Redis instead.

Elasticsearch

This is a brief summary of the /Elasticsearch documentation page.


Elasticsearch is a full text search system built on Apache Lucene. It can be used to index and search data stored as JSON documents. It is the technology used to power Wikimedia's CirrusSearch system.

An Elasticsearch cluster that can be used by all tools is available on tools-elastic-0[123], on the non-standard port 80. This Elasticsearch cluster is a shared resource and all documents indexed in it can be read by anonymous users from within Tool Labs. Write access, which is needed to create new indexes and to store or update documents, requires a username and password.

See full documentation at /Elasticsearch for more information.

Dumps

The 'tools' project has access to a directory storing the public Wikimedia datasets (i.e. the dumps generated by Wikimedia). The most recent two dumps can be found in:

/public/dumps/public

This directory is read-only, but you can copy files to your tool's home directory and manipulate them in whatever way you like.

If you need access to older dumps, you must manually download them from the Wikimedia downloads server.

/public/dumps/pagecounts-raw contains some years of the pagecount/projectcount data derived by Erik Zachte from Domas Mituzas' archives.

CatGraph (aka Graphserv/Graphcore)

CatGraph is a custom graph database that provides tool developers fast access to the Wikipedia category structure. For more information, please see the documentation.

Troubleshooting

If you run into problems, please see the § Contact section. Specifically, please feel free to come into #wikimedia-labs and look for Coren (Marc-Andre Pelletier) or petan (Petr Bena). The labs-l mailing list is another good place to ask for help, especially if the people in chat are not responding.

You can also search help pages, or look more widely with the Google custom search.

Backups

What gets backed up?

The basic rule is: There is a lot of redundancy, but no user-accessible backups. Tool Labs users should make certain that they use source control to preserve their code, and make regular backups of irreplaceable data. With luck, some files may be recoverable by Labs administrators in a manual process. But this requires human intervention and will likely not rescue the file that was created five minutes ago and deleted two minutes ago. If necessary, ask on IRC or file a Phabricator task.

References