User:SRodlund/Help (staging)
This is a staging area for some changes that will be made to the Toolforge:Help page.
Stories
- I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what is possible to do with Toolforge
- I am a new developer and/or new to the Wikimedia ecosystem, and I want to understand what Toolforge is and how to use it to create a tool
- I am an experienced developer, and I want to onboard experienced developers who are working on or with Toolforge
- I am an experienced developer, and I want to share information about how to perform a task or complete a process with a less experienced developer
- I am an experienced developer, and I want to find information about how Toolforge works
Sections which may become stand alone pages
- Each needs an appropriate title
Managing files in Toolforge
Using Toolforge and managing your files
Toolforge can be accessed in a variety of ways – from its public IP to a GUI client. Please see Help:Access for general information about accessing Cloud VPS projects.
Updating files
After you can ssh successfully, you can transfer files via sftp and scp. Note that the transferred files will be owned by you. You will likely wish to transfer ownership to your tool account. To do this:
1. Become your tool account:
yourshellaccountname@tools-login:~$ become toolaccount
tools.toolaccount@tools-login:~$
2. As your tool account, take ownership of the files:
tools.toolaccount@tools-login:~$ take FILE
The take command changes the ownership of the file(s) and directories recursively to the calling user (in this case, the tool account).
Handling permissions
If you're getting permission errors, note that you can also transfer files the other way around: copy the files as your tool account to /data/project/<projectname>.
Another, probably easier, way is to make the tool's directory group-writable. For example, if your shell account's name is alice and your tool name is alicetools, you could do something like this after logging in as a shell user:
become alicetools
chmod -R g+w /data/project/alicetools
logout
cp -rv /home/alice/* /data/project/alicetools/
What gets backed up?
The basic rule is: there is a lot of redundancy, but no user-accessible backups. Toolforge users should make certain that they use source control to preserve their code, and make regular backups of irreplaceable data. With luck, some files may be recoverable by Cloud Services administrators in a manual process. But this requires human intervention and will likely not rescue the file that was created five minutes ago and deleted two minutes ago. If necessary, ask on IRC or file a Phabricator task.
Repositories / Version control
Setting up code review and version control
Although it's possible to just stick your code in the directory and mess with it manually every time you want to change something, your future self and your future collaborators will thank you if you instead use source control, a.k.a. version control and a code review tool. Wikimedia Cloud VPS makes it pretty easy to use Git for source control and Gerrit for code review, but you also have other options.
Using git
The best option is to create a Git repository to which project participants commit files. To access the files, become the tool account, check that repository out in your tool's directory, and thereafter run a regular git pull whenever you want to deploy new files.
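The clone-once, pull-afterwards routine described above could be wrapped in a small helper. This is only a sketch: the function name deploy_tool, the repository URL, and the target directory are placeholders, not part of Toolforge.

```shell
#!/bin/bash
# deploy_tool REPO_URL TARGET_DIR   (hypothetical helper name)
# Clone the repository on first use; afterwards fast-forward the
# existing checkout with "git pull".
deploy_tool() {
    local repo_url="$1" target_dir="$2"
    if [ -d "$target_dir/.git" ]; then
        # Checkout already exists: fetch and fast-forward only.
        git -C "$target_dir" pull --ff-only --quiet
    else
        # First deployment: create the checkout.
        git clone --quiet "$repo_url" "$target_dir"
    fi
}

# Example, run after "become toolaccount" (URL and directory are placeholders):
#   deploy_tool https://example.org/yourtool.git "$HOME/yourtool"
```

Using --ff-only means the pull stops with an error instead of creating a merge commit if someone has edited files directly in the tool's checkout.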
Putty and WinSCP
Note that instructions for accessing Toolforge with Putty and WinSCP differ from the instructions for using them with other Cloud VPS projects. Please see Help:Access to Toolforge instances with PuTTY and WinSCP for information specific to Toolforge.
Other graphical file managers (e.g., Gnome/KDE)
For information about using a graphical file manager (e.g., Gnome/KDE), please see Accessing instances with a graphical file manager.
Using Diffusion
- Go to toolsadmin
- Find your tool
- Click the create new repository button
Requesting a Gerrit/Git repository for your tool
Toolforge users may request a Gerrit/Git repository for their tools. Access to Git is managed via Wikimedia Cloud VPS and integrated with Gerrit, a code review system.
In order to use the Wikimedia Cloud VPS code review and version control, you must upload your ssh key to Gerrit and then request a repository for your tool.
- Log in to https://gerrit.wikimedia.org/ with your Wikimedia developer account username and password.
- Add your SSH public key (select “Settings” from the drop-down menu beside your user name in the upper right corner of the screen, and then “SSH Public Keys” from the Settings menu).
- Request a Gerrit project for your tool: Gerrit/New repositories
For more information, please see:
- Gerrit/New repositories -- request a repository
- Git/New repositories/Requests -- a list of existing requests, as well as a place to make new ones. You can see the status of your request as well.
For more information about using Git and Gerrit in general, please see Git/Gerrit.
Setting up a local Git repository
It is fairly simple to set up a local Git repository to keep versioned backups of your code. However, if your tool directory is deleted for some reason, your local repository will be deleted as well. You may wish to request a Gerrit/Git repository to safely store your backups and/or to share your code more easily. Other backup/versioning solutions are also available. See User:Magnus Manske/Migrating from toolserver § GIT for some ideas.
To create a local Git repository:
1. Create an empty Git repository
maintainer@tools-login:~$ git init
2. Add the files you would like to back up. For example:
maintainer@tools-login:~$ git add public_html
3. Commit the added files
git commit -m 'Initial check-in'
For more information about using Git, please see the git documentation.
Enabling simple public HTTP access to local Git repository
If you've set up a local Git repository like the above in your tool directory, you can easily set up public read access to the repository through HTTP. This will allow you to, for instance, clone the Git repository to your own home computer without using an intermediary service such as GitHub.
First create the www/static/ subdirectory in your tool's home directory, if it does not already exist:
mkdir -p ~/www/static/
Now go to the www/static/ directory, and make a symbolic link to your bare Git repository (the hidden .git subdirectory in the root of your repository):
cd ~/www/static/
ln -s ~/.git yourtool.git
Now change directory into the symbolic link you just created, and run the git update-server-info command to generate some auxiliary info files needed for the HTTP connectivity:
cd yourtool.git
git update-server-info
Enable a few Git hooks for updating said auxiliary info files every time someone commits, rewrites or pushes to the repository:
ln -s post-update.sample hooks/post-commit
ln -s post-update.sample hooks/post-rewrite
ln -s post-update.sample hooks/post-update
chmod a+x hooks/post-update.sample
You're done. You should now be able to clone the repository from any remote machine by running the command:
git clone http://tools-static.wmflabs.org/yourtool/yourtool.git
Using Github or other external service
Before you start, you might want to set up your Git user account.
# Login to your tool account
become mytool
# Your name
git config user.name "Your Name"
# Your e-mail (use the one you set up in Github)
git config user.email "your-mail@example.com"
Then you can clone the remote repository as usual:
git clone https://github.com/yourGithubName/yourGithubRepoName.git
You can do updates any way you want, but you might want to use this simple update script, which stops the web service before pulling and asks before starting it again:
#!/bin/bash
read -r -p "Stop the service and pull fresh code? (Y/n)" response
if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
then
    webservice stop
    cd ./public_html
    echo -e "\nUpdating the code..."
    git pull
    echo
    read -r -p "OK to start the service? (Y/n)" response
    if ! [[ $response =~ ^([nN][oO]|[nN])$ ]]
    then
        webservice start
    fi
fi
Save the script above in your tool account's home folder as e.g. "update.sh". Don't forget to give execute permission to yourself and your tool group (i.e. `chmod 770 update.sh`).
MediaWiki Core integrations
Installing MediaWiki core
MediaWiki installations attract spammers faster than anything else, and the load caused makes Tools administrators grumpy. Do lock down your installation immediately after setup so that uninvited users cannot publish information. You should also re-read the Terms of use regarding the rules on wikis.
You want to install MediaWiki core and make your installation visible on the web.
One-time steps per tool
First, you have to do some preparatory steps which you need only once per tool.
become <YOURTOOL>
If you have not installed composer yet:
mkdir ~/bin
curl -sS https://getcomposer.org/installer | php -- --install-dir=$HOME/bin --filename=composer
If your local bin directory is not in your $PATH (use echo $PATH to find out), then create or alter the file ~/.profile and add the lines:
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
Finish your session as <YOURTOOL> and start a new one, or:
. ~/.profile
Now you are done with the one-time preparations.
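If ~/.profile is sourced more than once in a session, a plain PATH="$HOME/bin:$PATH" line adds the directory again each time. A slightly more defensive variant could look like the sketch below; the function name prepend_path is made up for illustration.

```shell
# prepend_path DIR   (hypothetical helper name)
# Put DIR at the front of PATH, but only if DIR exists and is not
# already listed in PATH.
prepend_path() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        return 0
    fi
    case ":$PATH:" in
        *":$dir:"*) ;;              # already present, nothing to do
        *) PATH="$dir:$PATH" ;;     # not present yet, prepend it
    esac
}

prepend_path "$HOME/bin"
export PATH
```

Wrapping PATH in colons before matching avoids false positives on directories whose names merely contain the one being added.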
For each instance of core
The following steps are needed for each new installation of MediaWiki. We assume that you want to access MediaWiki via the web in a directory named MW; you are free to use another name. If not already done:
become <YOURTOOL>
Then:
cd ~/public_html
If you plan to submit changes:
git clone ssh://<YOURUSERNAME>@gerrit.wikimedia.org:29418/mediawiki/core.git MW
Otherwise, if you only want to use MediaWiki without submitting changes, the following is enough and spares resources:
git clone https://gerrit.wikimedia.org/r/mediawiki/core.git MW
Next, recent versions of MediaWiki have external dependencies, so you need to install those:
cd MW
composer install
git review -s
Run webservice start and then you should be able to access the initial pre-install screen of MediaWiki from your web browser as:
https://tools.wmflabs.org/<YOURTOOL>/MW/
and proceed as usual. See how to create new databases for your MediaWiki installations.
Mail to users
Mail sent to user@tools.wmflabs.org (where user is a shell account) will be forwarded to the email address that user has set in their Wikitech preferences, if it has been verified (the same as the 'Email this user' function on wikitech).
Any existing .forward in the user's home will be ignored.
Mail to a Tool
Mail can also be sent "to a tool" with:
toolname.anything@tools.wmflabs.org
Where "anything" is an arbitrary alphanumeric string. Mail will be forwarded to the first of:
- The email(s) listed in the tool's ~/.forward.anything, if present;
- The email(s) listed in the tool's ~/.forward, if present; or
- The wikitech email of the tool's individual maintainers.
Additionally, tools.toolname@tools.wmflabs.org is an alias pointing to toolname.maintainers@tools.wmflabs.org, mostly useful for automated email generated from within Cloud VPS.
~/.forward and ~/.forward.anything need to be readable by the user Debian-exim; to achieve that, you probably need to chmod o+r ~/.forward*.
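To check which file would be consulted for a given address, the lookup order described above can be mimicked in a few lines of shell. This is only an illustration of that order, not the mail server's actual configuration; the function name forward_source is hypothetical.

```shell
# forward_source TOOL_HOME SUFFIX   (hypothetical helper name)
# Print the source that would supply the forwarding addresses for mail
# sent to toolname.SUFFIX: the suffix-specific file first, then the
# generic .forward file, and otherwise the tool's maintainers.
forward_source() {
    local tool_home="$1" suffix="$2"
    if [ -f "$tool_home/.forward.$suffix" ]; then
        echo "$tool_home/.forward.$suffix"
    elif [ -f "$tool_home/.forward" ]; then
        echo "$tool_home/.forward"
    else
        echo "maintainers"
    fi
}

# Example: forward_source "$HOME" bugs
```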
Mail from Tools
From the Grid
When sending mail from a job, the usual command line method of piping the message body to /usr/bin/mail may not work correctly because /usr/bin/mail attempts to deliver the message to the local MSA in a background process which will be killed if it is still running when the job exits.
If piping to a subprocess to send mail is needed, the message including headers may be piped to /usr/sbin/exim -odf -i.
# This does not work when submitted as a job
echo "Test message" | /usr/bin/mail -s "Test message subject" user@example.com
# This does
echo -e "Subject: Test message subject\n\nTest message" | /usr/sbin/exim -odf -i user@example.com
- Note: /usr/bin/echo supports -e in case your shell's internal echo command doesn't.
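One way to sidestep the echo -e portability question is to build the message with printf, which interprets \n in its format string in every POSIX shell. A minimal sketch follows; the function name build_message is made up for illustration.

```shell
# build_message SUBJECT BODY   (hypothetical helper name)
# Print a minimal message: a Subject header, an empty line that ends
# the headers, then the body.
build_message() {
    printf 'Subject: %s\n\n%s\n' "$1" "$2"
}

# In a job script, the result could be piped to exim as described above:
#   build_message "Test message subject" "Test message" | /usr/sbin/exim -odf -i user@example.com
```

Passing the subject and body as printf arguments rather than embedding them in the format string keeps any backslashes or percent signs in them from being interpreted.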
From within a container
To send mail from within a Kubernetes container, use the mail.tools.wmflabs.org SMTP server.
Containers running on the Toolforge Kubernetes cluster do not install and configure a local mailer service like the exim service that is installed on grid engine nodes. Tools running in Kubernetes should instead send email using an external SMTP server. The mail.tools.wmflabs.org service name should be usable for this. This service name is used as the public MX (mail exchange) host for inbound SMTP messages to the tools.wmflabs.org domain and points to a server that can process both inbound and outbound email for Toolforge.
Web Services
- Web pages for tools
Can I have a subdomain for my web service?
Sorry, not yet. This is still in discussion at phab:T125589. Currently, your web services are available under tools.wmflabs.org/<YOURTOOL>.
Databases
- This is a brief summary of the /Database documentation page.
User:SRodlund/Help (staging)/Database
Is there a GUI tool for database work?
Not in Toolforge, but you can run one locally on your computer (for example the MySQL Workbench http://dev.mysql.com/downloads/tools/workbench/). Here is how you connect to the database:
- For the login: username@login.tools.wmflabs.org
- For the database, it depends on the exact one you want to use, of course; for example: enwiki.labs
Quarry is a public querying interface for the Cloud database replicas. See the documentation on meta for more information.
How do I access the database replicas?
- You will find a tool account's credentials for MariaDB in the file $HOME/replica.my.cnf. You need to specify this file and the server you want to connect to. Some examples:
mysql --defaults-file=~/replica.my.cnf -h enwiki.labsdb enwiki_p # <- for English Wikipedia
mysql --defaults-file=~/replica.my.cnf -h dewiki.labsdb dewiki_p # <- for German Wikipedia
mysql --defaults-file=~/replica.my.cnf -h wikidatawiki.labsdb wikidatawiki_p # <- for Wikidata
mysql --defaults-file=~/replica.my.cnf -h commonswiki.labsdb commonswiki_p # <- for Commons
- You can create a symlink from replica.my.cnf to .my.cnf by running
ln -s replica.my.cnf .my.cnf
and then leave off the --defaults-file flag:
mysql -h commonswiki.labsdb commonswiki_p # <- for Commons
- Alternatively, use the sql utility, which provides convenient shortcuts:
sql enwiki # <- for English Wikipedia
sql commonswiki # <- for Commons
sql commons # <- for Commons (shortcut)
sql wikidata # <- for Wikidata (shortcut)
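The mysql examples above all follow one pattern: host <wiki>.labsdb and database <wiki>_p. Assuming that pattern holds for the wiki you need, a script could derive both from the wiki name. The function names below are hypothetical, and the last one only prints the command line rather than running it.

```shell
# Hypothetical helpers based on the naming pattern in the examples above.
# replica_host WIKI -> host name, e.g. enwiki.labsdb
# replica_db WIKI   -> database name, e.g. enwiki_p
replica_host() { printf '%s.labsdb\n' "$1"; }
replica_db()   { printf '%s_p\n' "$1"; }

# replica_command WIKI
# Print the mysql command line for WIKI without executing it.
replica_command() {
    printf 'mysql --defaults-file=%s -h %s %s\n' \
        "$HOME/replica.my.cnf" "$(replica_host "$1")" "$(replica_db "$1")"
}

# Example: replica_command dewiki
```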
Developing
- Best practices for Toolforge development
- This is a brief summary of the /Developing documentation page.
User:SRodlund/Help (staging)/Developing
Web Services
- This is a brief summary of the /Web documentation page.
User:SRodlund/Help (staging)/Web
- Web pages for tools
Job Grid
- This is a brief summary of the /Grid documentation page.
User:SRodlund/Help (staging)/Grid
- Using Open Grid Engine to run jobs
Elasticsearch
- This is a brief summary of the /Elasticsearch documentation page.
User:SRodlund/Help (staging)/Elasticsearch
Redis
Redis is a key-value store similar to memcache, but with more features. It can be easily used to do publish/subscribe between processes, and also maintain persistent queues. Stored values can be different data structures, such as hash tables, lists, queues, etc. Stored data persists across service restarts. For more information, please see the Wikipedia article on Redis.
A Redis instance that can be used by all tools is available on tools-redis, on the standard port 6379. It has been allocated a maximum of 12G of memory, which should be enough for most usage. You can set limits for how long your data stays in Redis; otherwise it will be evicted when memory limits are exceeded. See the Redis documentation for a list of available commands.
Libraries for interacting with Redis from PHP (phpredis) and Python (redis-py) have been installed on all the web servers and exec nodes. For an example of a bot using Redis, see gerrit-to-redis.
For quick & dirty debugging, you can connect directly to the Redis server with nc -C tools-redis 6379 and execute commands (for example "INFO").
Security
Redis has no access control mechanism, so other users can accidentally/intentionally overwrite and access the keys you set. Even if you are not worried about security, it is highly probable that multiple tools will try to use the same key (such as lastupdated, etc). To prevent this, it is highly recommended that you prefix all your keys with an application-specific, lengthy, randomly generated secret key.
You can very simply generate a good enough prefix by running the following command:
openssl rand -base64 32
PLEASE PREFIX YOUR KEYS! We have also disabled the redis commands that let users 'list' keys. This protection however should not be trusted to protect any secret data. Do not store plain text secrets or decryption keys in Redis for your own protection.
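The prefix only helps if every process of your tool uses the same one, so it has to be generated once and stored somewhere only the tool can read. The sketch below shows one possible arrangement; the function names and the file name in the example are assumptions, not Toolforge conventions.

```shell
# load_or_create_prefix FILE   (hypothetical helper name)
# Print the prefix stored in FILE. If FILE is missing or empty, generate
# a prefix with openssl first and save it; a newly created file is
# readable by its owner only because of the umask.
load_or_create_prefix() {
    local file="$1"
    if [ ! -s "$file" ]; then
        ( umask 077 && openssl rand -base64 32 > "$file" )
    fi
    cat "$file"
}

# prefixed_key PREFIX NAME -> PREFIX:NAME   (hypothetical helper name)
prefixed_key() {
    printf '%s:%s\n' "$1" "$2"
}

# Example (the file name is a placeholder):
#   prefix="$(load_or_create_prefix "$HOME/.redis-prefix")"
#   key="$(prefixed_key "$prefix" lastupdated)"
```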
Can I use memcache?
There is no memcached on Toolforge. Please use Redis instead.
Dumps
The 'tools' project has access to a directory storing the public Wikimedia datasets (i.e. the dumps generated by Wikimedia). The most recent two dumps can be found in:
/public/dumps/public
This directory is read-only, but you can copy files to your tool's home directory and manipulate them in whatever way you like.
If you need access to older dumps, you must manually download them from the Wikimedia downloads server.
/public/dumps/pagecounts-raw contains some years of the pagecount/projectcount data derived by Erik Zachte from Domas Mituzas' archives.
CatGraph (aka Graphserv/Graphcore)
CatGraph is a custom graph database that provides tool developers fast access to the Wikipedia category structure. For more information, please see the documentation.
Celery
It is possible to run a Celery worker in a Kubernetes container as a continuous job (for instance to execute long-running tasks triggered by a web frontend). The Redis service can be used as a broker between the worker and the web frontend. Make sure you use your own queue name so that your tasks get sent to the right workers.
Phabricator and task tracking
Make sure there is a place for people to find out how to use this.