User:SRodlund/Help (staging)

This is a staging area for some changes that will be made to the Toolforge:Help page.

Stories

  • I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what is possible to do with Toolforge
  • I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what Toolforge is and how to use it to create a tool
  • I am an experienced developer, and I want to onboard experienced developers who are working on or with Toolforge
  • I am an experienced developer, and I want to share information about how to perform a task or complete a process with a less experienced developer
  • I am an experienced developer, and I want to find information about how Toolforge works

Sections which may become standalone pages

  • Each needs an appropriate title

Managing files in Toolforge

Using Toolforge and managing your files

Toolforge can be accessed in a variety of ways – from its public IP to a GUI client. Please see Help:Access for general information about accessing Cloud VPS projects.

Updating files

Once you can connect with ssh, you can transfer files via sftp and scp. Note that the transferred files will be owned by you. You will likely wish to transfer ownership to your tool account. To do this:

1. Become your tool account:

yourshellaccountname@tools-login:~$ become toolaccount
tools.toolaccount@tools-login:~$

2. As your tool account, take ownership of the files:

tools.toolaccount@tools-login:~$ take FILE

The take command will change the ownership of the file(s) and directories recursively to the calling user (in this case, the tool account).

FIX THIS Installation

Handling permissions

If you're getting permission errors, note that you can also transfer files the other way around: copy the files as your tool account to /data/project/<projectname>.

Another, probably easier, way is to make the tool's directory group-writable. For example, if your shell account's name is alice and your tool name is alicetools, you could do something like this after logging in with your shell account:

become alicetools
chmod -R g+w /data/project/alicetools
logout
cp -rv /home/alice/* /data/project/alicetools/

What gets backed up?

The basic rule is: there is a lot of redundancy, but no user-accessible backups. Toolforge users should make certain that they use source control to preserve their code, and make regular backups of irreplaceable data. With luck, some files may be recoverable by Cloud Services administrators in a manual process. But this requires human intervention and will likely not rescue the file that was created five minutes ago and deleted two minutes ago. If necessary, ask on IRC or file a Phabricator task.

Repositories / Version control

Setting up code review and version control

Although it's possible to just stick your code in the directory and mess with it manually every time you want to change something, your future self and your future collaborators will thank you if you instead use source control, a.k.a. version control and a code review tool. Wikimedia Cloud VPS makes it pretty easy to use Git for source control and Gerrit for code review, but you also have other options.

Using git

The best option is to create a Git repository to which project participants commit files. To access the files, become the tool account, check that repository out in your tool's directory, and thereafter run a regular git pull whenever you want to deploy new files.
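
The deploy step described above can be sketched as a small shell function run as the tool account. This is a minimal sketch, not an official Toolforge command: the function name deploy_tool, the repository URL, and the target directory are hypothetical placeholders.

```shell
#!/bin/bash
# deploy_tool REPO_URL TARGET_DIR  (hypothetical helper, not a Toolforge command)
# First run: clone the repository. Later runs: fast-forward to the latest commit.
deploy_tool() {
    local repo_url="$1" target_dir="$2"
    if [ -d "$target_dir/.git" ]; then
        # --ff-only refuses to create a merge commit if the checkout has diverged
        git -C "$target_dir" pull --ff-only
    else
        git clone "$repo_url" "$target_dir"
    fi
}
```

After become toolaccount, a call such as deploy_tool https://example.org/mytool.git ~/mytool would clone on first use and update on later runs.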

PuTTY and WinSCP

Note that instructions for accessing Toolforge with PuTTY and WinSCP differ from the instructions for using them with other Cloud VPS projects. Please see Help:Access to Toolforge instances with PuTTY and WinSCP for information specific to Toolforge.

Other graphical file managers (e.g., Gnome/KDE)

For information about using a graphical file manager (e.g., Gnome/KDE), please see Accessing instances with a graphical file manager.

Using Diffusion

  • Go to toolsadmin
  • Find your tool
  • Click the create new repository button

Requesting a Gerrit/Git repository for your tool

Toolforge users may request a Gerrit/Git repository for their tools. Access to Git is managed via Wikimedia Cloud VPS and integrated with Gerrit, a code review system.

In order to use the Wikimedia Cloud VPS code review and version control, you must upload your ssh key to Gerrit and then request a repository for your tool.

  1. Log in to https://gerrit.wikimedia.org/ with your Wikimedia developer account username and password.
  2. Add your SSH public key (select “Settings” from the drop-down menu beside your user name in the upper right corner of the screen, and then “SSH Public Keys” from the Settings menu).
  3. Request a Gerrit project for your tool: Gerrit/New repositories

For more information about using Git and Gerrit in general, please see Git/Gerrit.

Setting up a local Git repository

It is fairly simple to set up a local Git repository to keep versioned backups of your code. However, if your tool directory is deleted for some reason, your local repository will be deleted as well. You may wish to request a Gerrit/Git repository to safely store your backups and/or to share your code more easily. Other backup/versioning solutions are also available. See User:Magnus Manske/Migrating from toolserver § GIT for some ideas.

To create a local Git repository:

1. Create an empty Git repository

maintainer@tools-login:~$ git init

2. Add the files you would like to back up. For example:

maintainer@tools-login:~$ git add public_html

3. Commit the added files

git commit -m 'Initial check-in'

For more information about using Git, please see the git documentation.

Enabling simple public HTTP access to local Git repository

If you've set up a local Git repository like the above in your tool directory, you can easily set up public read access to the repository through HTTP. This will allow you to, for instance, clone the Git repository to your own home computer without using an intermediary service such as GitHub.

First create the www/static/ subdirectory in your tool's home directory, if it does not already exist:

mkdir -p ~/www/static/

Now go to the www/static/ directory, and make a symbolic link to your repository's Git directory (the hidden .git subdirectory in the root of your repository):

cd ~/www/static/
ln -s ~/.git yourtool.git

Now change directory into the symbolic link you just created, and run the git update-server-info command to generate some auxiliary info files needed for the HTTP connectivity:

cd yourtool.git
git update-server-info

Enable a few Git hooks for updating said auxiliary info files every time someone commits, rewrites or pushes to the repository. The link target is given relative to the hooks/ directory, because a relative symbolic link is resolved from the directory that contains the link:

ln -s post-update.sample hooks/post-commit
ln -s post-update.sample hooks/post-rewrite
ln -s post-update.sample hooks/post-update
chmod a+x hooks/post-update.sample

You're done. You should now be able to clone the repository from any remote machine by running the command:

git clone http://tools-static.wmflabs.org/yourtool/yourtool.git

Using GitHub or other external service

Before you start, you might want to set up your Git user account.

# Login to your tool account
become mytool
# Your name
git config user.name "Your Name"
# Your e-mail (use the one you set up in GitHub)
git config user.email "your-mail@example.com"

Then you can clone the remote repository as usual:

git clone https://github.com/yourGithubName/yourGithubRepoName.git

You can do updates any way you want, but you might want to use this simple update script, which stops the web service while the code is being updated:

#!/bin/bash

read -r -p "Stop the service and pull fresh code? (Y/n) " response
if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
then
	webservice stop
	# Abort rather than run git pull in the wrong directory
	cd ./public_html || { echo "public_html not found; service left stopped" >&2; exit 1; }
	echo -e "\nUpdating the code..."
	git pull
	echo
	read -r -p "OK to start the service? (Y/n) " response
	if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
	then
		webservice start
	fi
fi

Save the script above in your tool account's home folder as, e.g., "update.sh", and run it from that folder. Don't forget to give yourself and your tool group execute permission (i.e. `chmod 770 update.sh`).

MediaWiki Core integrations

Installing MediaWiki core

MediaWiki installations attract spammers faster than anything else, and the load caused makes Tools administrators grumpy. Do lock down your installation immediately after setup so that uninvited users cannot publish information. You should also re-read the Terms of use regarding the rules on wikis.

You want to install MediaWiki core and make your installation visible on the web.

One-time steps per tool

First, you have to do some preparatory steps which you need only once per tool.

become <YOURTOOL>

If you have not installed composer yet:

mkdir ~/bin
curl -sS https://getcomposer.org/installer | php -- --install-dir=$HOME/bin --filename=composer

If your local bin directory is not in your $PATH (use echo $PATH to find out), then create or alter the file ~/.profile and add the lines:

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
   PATH="$HOME/bin:$PATH"
fi

Finish your session as <YOURTOOL> and start a new one, or:

. ~/.profile

Now you are done with the one-time preparations.

For each instance of core

The following steps are needed for each new installation of MediaWiki. We assume that you want to access MediaWiki via the web in a directory named MW — you are free to use another name. If not already done:

become <YOURTOOL>

Then:

cd ~/public_html

If you plan to submit changes:

git clone ssh://<YOURUSERNAME>@gerrit.wikimedia.org:29418/mediawiki/core.git MW

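Otherwise, if you only want to use MediaWiki without submitting changes:

git clone https://gerrit.wikimedia.org/r/mediawiki/core.git MW

The HTTPS clone is sufficient in that case and spares resources. Next, recent versions of MediaWiki have external dependencies, so you need to install those (the final command, git review -s, sets up Gerrit access and is only needed if you plan to submit changes):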

cd MW
composer install
git review -s

Run webservice start and then you should be able to access the initial pre-install screen of MediaWiki from your web browser as:

https://tools.wmflabs.org/<YOURTOOL>/MW/

and proceed as usual. See how to create new databases for your MediaWiki installations.

Email

Mail to users

Mail sent to user@tools.wmflabs.org (where user is a shell account) will be forwarded to the email address that user has set in their Wikitech preferences, if it has been verified (the same as the 'Email this user' function on wikitech).

Any existing .forward in the user's home will be ignored.

Mail to a Tool

Mail can also be sent "to a tool" with:

toolname.anything@tools.wmflabs.org

Where "anything" is an arbitrary alphanumeric string. Mail will be forwarded to the first of:

  • The email(s) listed in the tool's ~/.forward.anything, if present;
  • The email(s) listed in the tool's ~/.forward, if present; or
  • The wikitech email of the tool's individual maintainers.

Additionally, tools.toolname@tools.wmflabs.org is an alias pointing to toolname.maintainers@tools.wmflabs.org, which is mostly useful for automated email generated from within Cloud VPS.

~/.forward and ~/.forward.anything need to be readable by the user Debian-exim; to achieve that, you probably need to chmod o+r ~/.forward*.
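
As an illustration of the lookup order above, the following sketch writes a forward file and then reports which file would be consulted for a given address suffix. It only mirrors the order described on this page; it is not how the mail server itself is implemented, and the function names set_forward and forward_file_for are hypothetical.

```shell
#!/bin/bash
# set_forward TOOL_HOME SUFFIX ADDRESS...
# Write one ADDRESS per line to TOOL_HOME/.forward.SUFFIX (or .forward if
# SUFFIX is empty) and make the file readable by other users, as noted above.
set_forward() {
    local tool_home="$1" suffix="$2"
    shift 2
    local file="$tool_home/.forward${suffix:+.$suffix}"
    printf '%s\n' "$@" > "$file"
    chmod o+r "$file"
}

# forward_file_for TOOL_HOME SUFFIX
# Print the forward file that takes precedence, or nothing if mail would
# fall through to the maintainers' wikitech addresses.
forward_file_for() {
    local tool_home="$1" suffix="$2"
    if [ -f "$tool_home/.forward.$suffix" ]; then
        printf '%s\n' "$tool_home/.forward.$suffix"
    elif [ -f "$tool_home/.forward" ]; then
        printf '%s\n' "$tool_home/.forward"
    fi
}
```

For example, after set_forward "$HOME" bugs maintainer@example.com, mail to toolname.bugs@tools.wmflabs.org would be expected to use ~/.forward.bugs, while other suffixes would still use ~/.forward if it exists.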

Mail from Tools

From the Grid

When sending mail from a job, the usual command line method of piping the message body to /usr/bin/mail may not work correctly because /usr/bin/mail attempts to deliver the message to the local MSA in a background process which will be killed if it is still running when the job exits.

If piping to a subprocess to send mail is needed, the message including headers may be piped to /usr/sbin/exim -odf -i.

# This does not work when submitted as a job
echo "Test message" | /usr/bin/mail -s "Test message subject" user@example.com

# This does
echo -e "Subject: Test message subject\n\nTest message" | /usr/sbin/exim -odf -i user@example.com
  • Note: /usr/bin/echo supports -e in case your shell's internal echo command doesn't.
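
Because echo -e behaves differently between shells, printf is a more portable way to build the message. The following is a minimal sketch that uses only the exim invocation shown above; the helper names build_message and send_message are hypothetical.

```shell
#!/bin/bash
# build_message TO SUBJECT BODY
# Print a complete message: header lines, one blank line, then the body.
build_message() {
    local to="$1" subject="$2" body="$3"
    printf 'To: %s\nSubject: %s\n\n%s\n' "$to" "$subject" "$body"
}

# send_message TO SUBJECT BODY
# Pipe the message to exim with the same flags as the example above.
send_message() {
    build_message "$@" | /usr/sbin/exim -odf -i "$1"
}
```

A job could then call send_message user@example.com "Test message subject" "Test message".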

From within a container

To send mail from within a Kubernetes container, use the mail.tools.wmflabs.org SMTP server.

Containers running on the Toolforge Kubernetes cluster do not install and configure a local mailer service like the exim service that is installed on grid engine nodes. Tools running in Kubernetes should instead send email using an external SMTP server. The mail.tools.wmflabs.org service name should be usable for this. This service name is used as the public MX (mail exchange) host for inbound SMTP messages to the tools.wmflabs.org domain and points to a server that can process both inbound and outbound email for Toolforge.
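
One way to hand a message to that server from a shell script is curl's SMTP support. This is a sketch under assumptions that this page does not confirm: that curl is available in the container image, and that the server accepts unauthenticated submissions from Toolforge on curl's default SMTP port. The function name smtp_send and the SMTP_HOST variable are hypothetical.

```shell
#!/bin/bash
# Host name taken from the text above; can be overridden from the environment.
SMTP_HOST="${SMTP_HOST:-mail.tools.wmflabs.org}"

# smtp_send FROM TO MESSAGE_FILE
# MESSAGE_FILE must contain the headers, a blank line, and the body.
smtp_send() {
    local from="$1" to="$2" message_file="$3"
    curl --silent --show-error \
        --url "smtp://$SMTP_HOST" \
        --mail-from "$from" \
        --mail-rcpt "$to" \
        --upload-file "$message_file"
}
```

A tool might call smtp_send tools.toolname@tools.wmflabs.org user@example.com message.txt; most languages' own SMTP libraries can be pointed at the same host instead.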


Web Services

  • Web pages for tools

Can I have a subdomain for my web service?

Sorry, not yet. This is still in discussion at phab:T125589. Currently, your web services are available under tools.wmflabs.org/<YOURTOOL>.

Databases

This is a brief summary of the /Database documentation page.

User:SRodlund/Help (staging)/Database

Is there a GUI tool for database work?

Not in Toolforge, but you can run one locally on your computer (for example, MySQL Workbench: http://dev.mysql.com/downloads/tools/workbench/). Here is how you connect to the database:

  • For the login: username@login.tools.wmflabs.org
  • For the database, it depends on which one you want to use; for example: enwiki.labsdb

Quarry is a public querying interface for the Cloud database replicas. See the documentation on meta for more information.

How do I access the database replicas?

  • You will find a tool account's credentials for MariaDB in the file $HOME/replica.my.cnf. You need to specify this file and the server you want to connect to. Some examples:
mysql --defaults-file=~/replica.my.cnf -h enwiki.labsdb enwiki_p # <- for English Wikipedia
mysql --defaults-file=~/replica.my.cnf -h dewiki.labsdb dewiki_p # <- for German Wikipedia
mysql --defaults-file=~/replica.my.cnf -h wikidatawiki.labsdb wikidatawiki_p # <- for Wikidata
mysql --defaults-file=~/replica.my.cnf -h commonswiki.labsdb commonswiki_p # <- for Commons
  • You can create a symlink from replica.my.cnf to .my.cnf by running ln -s replica.my.cnf .my.cnf and leave off the --defaults-file flag:
mysql -h commonswiki.labsdb commonswiki_p # <- for Commons
  • Alternatively, use the sql utility that provides convenient shortcuts:
sql enwiki # <- for English Wikipedia
sql commonswiki # <- for Commons
sql commons # <- for Commons (shortcut)
sql wikidata # <- for Wikidata (shortcut)
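
The host and database names in the examples above follow a pattern: the wiki's database name plus .labsdb for the host, and plus _p for the database. A small wrapper can build both from one argument. This is a sketch that assumes the pattern holds for the wiki you need; replica_host, replica_db, and replica_query are hypothetical names, not Toolforge commands.

```shell
#!/bin/bash
# Derive the replica host and database name from a wiki name such as "enwiki".
replica_host() { printf '%s.labsdb\n' "$1"; }
replica_db()   { printf '%s_p\n' "$1"; }

# replica_query WIKI SQL
# Run one SQL statement against the replica using the tool's credentials file.
replica_query() {
    local wiki="$1" query="$2"
    mysql --defaults-file="$HOME/replica.my.cnf" \
        -h "$(replica_host "$wiki")" "$(replica_db "$wiki")" \
        -e "$query"
}
```

For example, replica_query dewiki 'SELECT COUNT(*) FROM page' would connect to dewiki.labsdb and query dewiki_p.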

Developing

  • Best practices for Toolforge development
This is a brief summary of the /Developing documentation page.

User:SRodlund/Help (staging)/Developing

Web Services

This is a brief summary of the /Web documentation page.

User:SRodlund/Help (staging)/Web

  • Web pages for tools

Job Grid

This is a brief summary of the /Grid documentation page.

User:SRodlund/Help (staging)/Grid

  • Using Open Grid Engine to run jobs

Elasticsearch

This is a brief summary of the /Elasticsearch documentation page.

User:SRodlund/Help (staging)/Elasticsearch

Redis

Redis is a key-value store similar to memcache, but with more features. It can be easily used to do publish/subscribe between processes, and also maintain persistent queues. Stored values can be different data structures, such as hash tables, lists, queues, etc. Stored data persists across service restarts. For more information, please see the Wikipedia article on Redis.

A Redis instance that can be used by all tools is available on tools-redis, on the standard port 6379. It has been allocated a maximum of 12G of memory, which should be enough for most usage. You can set limits for how long your data stays in Redis; otherwise it will be evicted when memory limits are exceeded. See the Redis documentation for a list of available commands.

Libraries for interacting with Redis from PHP (phpredis) and Python (redis-py) have been installed on all the web servers and exec nodes. For an example of a bot using Redis, see gerrit-to-redis.

For quick & dirty debugging, you can connect directly to the Redis server with nc -C tools-redis 6379 and execute commands (for example "INFO").

Security

Redis has no access control mechanism, so other users can accidentally/intentionally overwrite and access the keys you set. Even if you are not worried about security, it is highly probable that multiple tools will try to use the same key (such as lastupdated, etc). To prevent this, it is highly recommended that you prefix all your keys with an application-specific, lengthy, randomly generated secret key.

You can very simply generate a good enough prefix by running the following command:

openssl rand -base64 32

PLEASE PREFIX YOUR KEYS! We have also disabled the redis commands that let users 'list' keys. This protection however should not be trusted to protect any secret data. Do not store plain text secrets or decryption keys in Redis for your own protection.
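
The prefix only helps if every part of your tool uses the same one, so it is convenient to generate it once and keep it in a private file. A minimal sketch, assuming a file name of ~/.redis_prefix (a hypothetical choice) and hypothetical helper names redis_prefix and redis_key:

```shell
#!/bin/bash
# Hypothetical location for the per-tool prefix; any private file works.
REDIS_PREFIX_FILE="${REDIS_PREFIX_FILE:-$HOME/.redis_prefix}"

# Print the prefix, generating it on first use.
# umask 077 keeps the new file unreadable by other users.
redis_prefix() {
    if [ ! -s "$REDIS_PREFIX_FILE" ]; then
        (umask 077; openssl rand -base64 32 > "$REDIS_PREFIX_FILE")
    fi
    cat "$REDIS_PREFIX_FILE"
}

# redis_key NAME
# Print the namespaced form of a key: "<prefix>:NAME".
redis_key() {
    printf '%s:%s\n' "$(redis_prefix)" "$1"
}
```

A script would then use "$(redis_key lastupdated)" wherever it would otherwise have used the bare key lastupdated.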

Can I use memcache?

There is no memcached on Toolforge. Please use Redis instead.

Dumps

The 'tools' project has access to a directory storing the public Wikimedia datasets (i.e. the dumps generated by Wikimedia). The most recent two dumps can be found in:

/public/dumps/public

This directory is read-only, but you can copy files to your tool's home directory and manipulate them in whatever way you like.

If you need access to older dumps, you must manually download them from the Wikimedia downloads server.

/public/dumps/pagecounts-raw contains some years of the pagecount/projectcount data derived by Erik Zachte from Domas Mituzas' archives.

CatGraph (aka Graphserv/Graphcore)

CatGraph is a custom graph database that provides tool developers fast access to the Wikipedia category structure. For more information, please see the documentation.

Celery

It is possible to run a Celery worker in a Kubernetes container as a continuous job (for instance, to execute long-running tasks triggered by a web frontend). The Redis service can be used as a broker between the worker and the web frontend. Make sure you use your own queue name so that your tasks get sent to the right workers.

Phabricator and task tracking

Make sure there is a place for people to find out how to use this.