Help talk:Tool Labs

From Wikitech

What I'm missing are best-practice hints for two things:

  • How to publish code? Should we all use GitHub, or is Gerrit the better way? A link to a small tutorial would be nice.
  • How to document a tool and communicate with its users. We often use a page on Wikipedia, because that was the easiest way for normal users to get in contact with us; we had an English version and usually a version in our native language (German). Is it better to use the Wikitech wiki or the MediaWiki wiki? Which wiki uses "Single User Login"?

Everybody should be free to choose other ways, but it would help if the majority used one way. --kolossos (talk) 10:04, 31 May 2013 (UTC)

How to add and modify a description for a project

I want to add a description to the tool list for one of my tools. How can I do this? --kolossos (talk) 10:04, 31 May 2013 (UTC)

See Nova Resource:Tools/Help#Tool Labs's landing page. Anomie (talk) 13:18, 31 May 2013 (UTC)
Thanks.--kolossos (talk) 15:27, 31 May 2013 (UTC)

GUI tool for database work

At the hackathon in Amsterdam it was mentioned that it is possible to use an external GUI tool, running on your own machine, to interact with the database rather than using phpMyAdmin. What settings should be used for that?

Henna (talk) 11:21, 23 June 2013 (UTC)

Basically, it's the same settings as at Toolserver, except that you replace with enwiki.labs (and with --Tim Landscheidt (talk) 12:21, 23 June 2013 (UTC)

300-350M? Really?

I was tempted to add this to Nova Resource:Tools/Help#Why am I getting errors about must be installed for pthread_cancel to work?, but then I thought it might be too much detail so I decided to put it here for others to decide about.

Experiments in July 2013 with "do nothing for 60 seconds" scripts in various languages gave the following results:

Language   h_vmem needed   Code
C          5M              #include <unistd.h>
                           int main(void){ return sleep(60); }
Lua        14M             local clock = os.clock local t0 = clock() + 60 while clock() < t0 do end
Perl       20M             sleep 60;
Python     30M             from time import sleep; sleep(60);
PHP        350M            sleep(60);
NodeJS     750M            var e = new Date().getTime() + 60000; while (new Date().getTime() <= e) {}

Of course, real scripts in most of these languages will load various modules/libraries and consequently have higher memory requirements. PHP may be somewhat of an exception here, as its 350M already includes a large number of such extensions, although any additional extensions or libraries (e.g. from PEAR) would still increase the number. Anomie (talk) 02:21, 21 August 2013 (UTC)

I also tried a simple Java program. But I must have been doing something wrong, because it complained that it didn't have enough memory until I gave it 3.2G. Anomie (talk) 02:21, 21 August 2013 (UTC)
Java (un)helpfully allocates most of its heap on startup; you can control that behavior with the -Xms and -Xmx command-line options. — Coren/Marc (talk) 22:55, 22 August 2013 (UTC)
Specifying 32M for both of those options makes it slightly better. Now it "only" needs 1.2G. Anomie (talk) 13:11, 23 August 2013 (UTC)
With JamVM, I can get a maximum heap size of 300M with "just" 750M of h_vmem. Shrinking the heap further doesn't help much. Increasing the available memory to 2750M allowed me to run with a maximum heap size of 2300M, so that at least seems to scale. JamVM is installed on all nodes and is selected with the -jamvm option. Martijn Hoekstra (talk) 21:04, 15 October 2013 (UTC)

Public tool database?

Can I create a database as a tool user that is visible to other tools? On toolserver, we did that by adding "_p" to the database name. Should be documented, even if it's "you can't do that here". --Magnus Manske (talk) 15:47, 22 August 2013 (UTC)

It should indeed. You have two choices:
  1. We do keep to the "_p" convention, by default every database user has "SELECT on %_p.*" and so can read databases named that way; or
  2. Since users have all privileges, with grant option, on databases they are allowed to create (username__%), you can grant the appropriate privileges to any or all users with a GRANT statement.
— Coren/Marc (talk) 22:52, 22 August 2013 (UTC)
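A small Python sketch of the two options, assuming only the conventions stated above: a database whose name ends in "_p" is readable by everyone, and a user may share databases named "<username>__..." with a GRANT. The account names "s51234" and "u1234" and the '%' host part are hypothetical. Because "_" and "%" act as wildcards in database-level GRANT statements, the sketch escapes them so the grant cannot match other databases.

```python
# Sketch assuming the conventions described above; account names are hypothetical.


def is_public(db_name: str) -> bool:
    """Databases named "..._p" are readable by every database user by default."""
    return db_name.endswith("_p")


def grant_select(owner: str, db_name: str, grantee: str, host: str = "%") -> str:
    """Build a GRANT statement sharing one of owner's databases with grantee."""
    if not db_name.startswith(owner + "__"):
        raise ValueError(f"{db_name!r} is not one of {owner!r}'s databases")
    # "_" and "%" are wildcards in database-level GRANTs; escape them so the
    # statement matches only this exact database name.
    escaped = db_name.replace("_", "\\_").replace("%", "\\%")
    return f"GRANT SELECT ON `{escaped}`.* TO '{grantee}'@'{host}';"


print(grant_select("s51234", "s51234__mydb", "u1234"))
```

Naming the database with a "_p" suffix needs no GRANT at all; the explicit GRANT is for sharing with selected users only.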

Does 20% of this page really need to be about pywikipedia?

I know that pywikipedia is a popular tool, but it seems excessive for a fifth of this general help page to be about using it. At the moment and not counting the "TOCright" template, 16417 of 85463 bytes of wikitext (19.2%) and 230 of 1172 lines of wikitext (19.6%) are in the "Pywikipedia" section. Anomie (talk) 13:08, 3 September 2013 (UTC)
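The percentages follow directly from the quoted byte and line counts; a quick Python check:

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(16417, 85463))  # bytes of wikitext -> 19.2
print(share(230, 1172))     # lines of wikitext -> 19.6
```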

Hosted jQuery etc.

Does Tool Labs offer hosted resources like jQuery? Tools like Pathoschild's important [1] and friends started using e.g. [2] on which means agreeing to [3] -> [4] -> [5] i.e. to send all data to Google for any use. I block them so it's no big privacy issue for me, but it would be nice to fix. --Nemo 09:12, 10 September 2013 (UTC)

Can we just load the appropriate files from or Anomie (talk) 17:16, 10 September 2013 (UTC)
That would be ideal, but are they "free access"? Maybe they are and Pathoschild just didn't think about it. --Nemo 17:25, 10 September 2013 (UTC)
As jQuery is available under the MIT licence, why pull it from other hosts at all and not just host it locally so that the tool developer has full control over it? After the first use of a tool by a user there shouldn't be a difference in caching. --Tim Landscheidt (talk) 01:24, 27 October 2013 (UTC)
There are a number of very good reasons to prefer to avoid having umpteen different copies of jQuery (and others). We can't use the one in bits (it'd introduce a great deal of complications related to the resource loader) but it'd make a great deal of sense to have one available for all tools on our own infrastructure. I'll try to see if we can rely on our own caching servers for that but if not, I'll make a repository for shared browser objects like this for Tools. — Coren/Marc (talk) 16:12, 29 December 2013 (UTC)

Naming of user-created databases

Do the user-created databases really have to start with some "random" string like "p50252g21636"? Why? Couldn't the database names be made more user-friendly, like using the normal user name? svick (talk) 12:34, 26 September 2013 (UTC)

Because usernames are not unique across projects. — Coren/Marc (talk) 20:02, 15 October 2013 (UTC)
So what about something like projectname_username? I think that would still be a big improvement over p50252g21636. Svick (talk) 20:21, 15 October 2013 (UTC)
Potential for disallowed characters in the project or user names, probably. "p50252g21636" is more or less projectid_userid. Anomie (talk) 13:04, 16 October 2013 (UTC)
The biggest issue is MySQL's hard-coded username length limit. — Coren/Marc (talk) 16:13, 29 December 2013 (UTC)
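To illustrate the length argument, a Python sketch under two assumptions: account names are limited to 16 characters (the limit on MySQL 5.5-era servers; newer versions allow longer names), and a generated name is "p<number>g<number>", read per the "more or less projectid_userid" remark above. The combined name "missing-from-wikipedia_brainwane" is a hypothetical projectname_username example built from names on this page.

```python
import re

# Assumption: 16-character limit on MySQL account names (MySQL 5.5 era).
MYSQL_USER_MAX = 16


def parse_generated_name(name):
    """Split a name like "p50252g21636" into the numbers after "p" and "g".

    Hypothetical reading: roughly a project id and a user id.
    """
    match = re.fullmatch(r"p(\d+)g(\d+)", name)
    if match is None:
        raise ValueError(f"not a generated name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def fits_mysql_username(name):
    return len(name) <= MYSQL_USER_MAX


print(parse_generated_name("p50252g21636"))                     # (50252, 21636)
print(fits_mysql_username("p50252g21636"))                      # True, 12 characters
print(fits_mysql_username("missing-from-wikipedia_brainwane"))  # False, 32 characters
```

A readable projectname_username scheme quickly exceeds such a limit, and names like the one above also contain a hyphen, which needs quoting in MySQL identifiers.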

Getting help

Hi. I am not sure where to post this, so please let me know if there is a more appropriate place. As described in more detail here, I have gotten a Wikipedia template filler to work "internally" from the command line on the Tool Labs server, but I am having trouble accessing it from an external web browser. I would be very grateful if someone with knowledge of Perl and Tool Labs could help me. Thanks. Boghog (talk) 20:11, 27 October 2013 (UTC)

The above-mentioned problem is that the Perl CGI script cannot access a local library, even though the library in question has world read and execute permissions. There is no mention of this on the help page. Any suggestions on how to get this to work would be greatly appreciated. Thanks. Boghog (talk) 12:02, 28 October 2013 (UTC)

OK, I think I have solved most of the problems. The key was to add the following line to the CGI script, which made debugging the library much easier:
use CGI::Carp 'fatalsToBrowser';
Adding this single line gave some very useful feedback, which showed that despite using cpanm for the installs, there were still a few unfulfilled dependencies. After installing these missing modules, the script finally worked. I still need to get the url and isbn options to work, but at least the most commonly used options now work. The link for running the tool is here: citation-template-filling. Boghog (talk) 15:57, 1 November 2013 (UTC)
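For Python CGI scripts, a rough analogue of fatalsToBrowser is to catch exceptions and send the traceback to the browser as plain text. The helper below is a hypothetical sketch, not part of any Tool Labs setup, and the module name it imports is made up to simulate a missing dependency. Like fatalsToBrowser, it exposes internals, so it is best used only while debugging.

```python
import sys
import traceback


def run_cgi(handler, out=None):
    """Call handler() and write a CGI response; on error, show the traceback.

    Rough Python analogue of Perl's CGI::Carp 'fatalsToBrowser'.
    """
    if out is None:
        out = sys.stdout
    try:
        body = handler()  # build the whole page before printing any headers
    except Exception:
        out.write("Content-Type: text/plain\n\n")
        out.write(traceback.format_exc())
        return
    out.write("Content-Type: text/html\n\n")
    out.write(body)


def page():
    # A missing dependency now shows up in the browser instead of a bare error.
    import some_module_that_is_not_installed  # hypothetical module name
    return "<p>never reached</p>"


if __name__ == "__main__":
    run_cgi(page)
```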

MySQL Workbench

MySQL Workbench doesn't work for me the way described. I always get the following error message: "Bad authentication type, the server is not accepting this type of authentication.". I'm using the same private key file that works for normal SSH access. Maybe this is something like . I wish I could just use phpMyAdmin... I'm really not a fan of managing databases on the command line. --APPER (talk) 19:07, 31 October 2013 (UTC)

I ended up doing the SSH tunneling with my SSH client and using MySQL Workbench only for the SQL work. --APPER (talk) 13:47, 2 November 2013 (UTC)

submitting git update on a directory


I would like to run a git update every day, but I can't find out how to submit it. My Pywikipedia repository is "/data/project/herculebot/pywikibot-compat/". I would like to run "git pull" in this directory with qsub. How can I do so?

--Hercule (talk) 21:27, 26 November 2013 (UTC)

Schedule a cronjob to run a command along the lines of cd /data/project/herculebot/pywikibot-compat/ && /usr/local/bin/jsub -cwd -N git-pull /usr/bin/git pull. I haven't tested it, but that should be about right. If you really want to use qsub instead of jsub, see /usr/local/bin/jsub and work out what qsub command it winds up using. Anomie (talk) 14:53, 27 November 2013 (UTC)
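The same crontab entry can be generated in Python, e.g. to quote paths safely. This is a sketch under assumptions: the daily 03:00 schedule is an arbitrary example, and only the jsub options from the suggestion above (-cwd, -N) are used. Since cron treats an unescaped "%" in a command as a newline, the sketch refuses such commands rather than guessing at escaping.

```python
import shlex


def cron_line(schedule, workdir, job_name, command, jsub="/usr/local/bin/jsub"):
    """Build a crontab entry: cd into workdir, then submit command via jsub -cwd -N."""
    parts = [jsub, "-cwd", "-N", job_name, *command]
    line = f"{schedule} cd {shlex.quote(workdir)} && " + " ".join(
        shlex.quote(p) for p in parts
    )
    # cron turns an unescaped "%" in the command into a newline.
    if "%" in line:
        raise ValueError("'%' is special in crontab commands")
    return line


print(cron_line("0 3 * * *", "/data/project/herculebot/pywikibot-compat/",
                "git-pull", ["/usr/bin/git", "pull"]))
```

The printed line can then be added with crontab -e.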

Impossible to submit Java program


I'm unable to run my Java program using jsub.

In a terminal, logged in on tools-login, I can run the following command line without any problem:

java -Xmx256M -Xms256M -jar /data/project/herculebot/Wikipedia.jar

When I edit my crontab, I add the following line :

54 * * * * cd /data/project/herculebot/ && /usr/local/bin/jsub -once -mem 256m -quiet -N Cacographies -j y java -Xmx256M -Xms256M -jar /data/project/herculebot/Wikipedia.jar

I always get the same error in my Cacographies.out file:

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I tried giving it more memory with -mem 768m, but that didn't help.

Can someone help me solve this issue?


--Hercule (talk) 21:01, 1 December 2013 (UTC)

To close this here as well: This was discussed on the mailing list, and the suggestion there was to increase the memory requested massively with 2 GByte apparently enough to satisfy Java :-). --Tim Landscheidt (talk) 02:49, 11 December 2013 (UTC)

add permissions around take to docs?

from IRC just now:

<brainwane> I'm having trouble "take"ing a file 
<brainwane> $ take /home/brainwane/biographies.txt
<brainwane> /home/brainwane/biographies.txt: you must own the containing directory
<brainwane> where do I put this file so my tools acct can "take" it?
<brainwane> petan: any ideas?
<andrewbogott> tools accounts usually have their own home directory
<Coren> brainwane: In any directory your tool account owns.
* chippy has quit (Remote host closed the connection)
<andrewbogott> in, I think… /data/project/<toolname>
<brainwane> ok, so I should have scp
<Coren> I.e. mv it before you take it.  :-)
<brainwane> I tried....
<Coren> Ah, your own home doesn't give permission to your tool.  :-)
<brainwane> $ mv biographies.txt /data/project/missing-from-wikipedia/public_html/missing-from-wikipedia/
<brainwane> mv: cannot create regular file `/data/project/missing-from-wikipedia/public_html/missing-from-wikipedia/biographies.txt': Permission denied
* dr0ptp4kt has quit (Quit: dr0ptp4kt)
<Coren> brainwane: Ah!  Your permissions are too strict then.
<brainwane> oh?
<Coren> brainwane: You probably want your tool's home to be g+w anyways; otherwise you can't properly have more than one maintainer without a lot of trouble.
<brainwane> g+w? groupwrite?
<Coren> brainwane: from your tool account, "chmod -R g+w ~"
<brainwane> got it. done
<Coren> That'll give your tool maintainers write access to the tool, and that includes yourself.  :-)
<brainwane> cool. was that in the docs and I missed it?
<Coren> That's normal unixy stuff which we mostly don't cover in the labs-specific docs.
<brainwane> :/
<Coren> It would be good if we could find a nice tutorial we could link to though.
<brainwane> I've been using Unix since 1999 and I missed this.
<brainwane> I presumed that creating a tool account would automatically give me as maintainer the necessary permissions.
<Coren> brainwane: It does; but if you did things like use some scp clients or unpacked tarballs with restrictive permissions, it'll override.

Sumana Harihareswara, a volunteer (talk) 23:30, 3 December 2013 (UTC)
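For tool scripts written in Python, the effect of "chmod -R g+w ~" can be sketched with the standard library. This is a minimal illustration, not a Tool Labs utility; like GNU chmod -R, it does not follow symbolic links.

```python
import os
import stat


def add_group_write(root):
    """Add group write permission to root and everything below it (chmod -R g+w).

    Symbolic links are skipped, since os.chmod would follow them to their target.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        if os.path.islink(path):
            continue
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode | stat.S_IWGRP)


# Run as the tool account, e.g.:
# add_group_write(os.path.expanduser("~"))
```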

Python files in public_html

How are they handled? mod_python? wsgi? I am unable to create a working hello-world page. --Chricho (talk) 18:03, 1 January 2014 (UTC)

Ah, it is run using cgi, too… --Chricho (talk) 18:22, 1 January 2014 (UTC)
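A minimal Python CGI "hello world" for public_html, as a sketch: it assumes, per the reply above, that .py files there are run as CGI and that the file is marked executable (chmod +x). The interpreter path in the first line is an assumption.

```python
#!/usr/bin/env python
import sys


def main(out=None):
    if out is None:
        out = sys.stdout
    # A CGI response is the headers, a blank line, then the body.
    out.write("Content-Type: text/plain; charset=utf-8\n\n")
    out.write("Hello, world!\n")


if __name__ == "__main__":
    main()
```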

Docs inaccuracies?

The Tools/Help page says that the public_html directory is at ~/public_html. That is not correct: it is in fact at /data/project/mytool/public_html, so you need to "become" the tool for it to be in your home directory. It would be nice if this were clarified at the first place in the docs where public_html is mentioned. Thanks! Oleg Alexandrov (talk) 05:53, 2 January 2014 (UTC)

~ is the abbreviation for an (any) account's home directory (as ~account is an abbreviation for a specific account's home directory). Tool accounts have their home directories at /data/project/$TOOL, so ~/public_html is the subdirectory of the tool's home directory.
The first (:-)) mention of public_html even says:
Note that individual tool accounts have both a ~/public_html/ and a ~/cgi-bin/ directory in the home directory for storing Web files.
If you feel that the documentation could be improved by a clarification, be bold! :-) It's probably easier for you to find a wording that is easier understandable than for someone who is already used to the terminology. --Tim Landscheidt (talk) 16:15, 3 January 2014 (UTC)
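The tilde rule can be illustrated in Python on a POSIX system, where "~" is expanded from $HOME, i.e. the home of whichever account is running. The sketch simulates being the tool account by setting HOME, assuming that after "become mytool" HOME points at /data/project/mytool as described above.

```python
import os


def tool_home(tool):
    """Home directory of a tool account, following the /data/project/$TOOL layout."""
    return os.path.join("/data/project", tool)


# Simulate running as the tool account "mytool" (hypothetical name).
os.environ["HOME"] = tool_home("mytool")
print(os.path.expanduser("~/public_html"))  # /data/project/mytool/public_html
```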

Access databases from PHP

How do I access databases from PHP? Also, not sure if it should, but /data/svwiktionary/ doesn't contain a password to use (I might have deleted it a long time ago if it did).

I'm interested in both the replicas and using my own db. I haven't yet created my own db, but intend to at some point. skalman (talk) 10:33, 25 January 2014 (UTC)

I had apparently reset the password in /data/svwiktionary/ to something of my choice, so I incorrectly assumed that the "correct" password was lost. However, it works if I use the password that's in there. skalman (talk) 16:32, 25 January 2014 (UTC)
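Whatever the language, the script has to read the user name and password from the credentials file rather than hard-coding them. A Python sketch, assuming the file is a my.cnf-style option file with a [client] section; the file name is not given here, so it is passed in explicitly. configparser does not understand every option-file feature (e.g. !include lines), so this only covers simple files.

```python
import configparser


def read_db_credentials(path):
    """Return (user, password) from the [client] section of a my.cnf-style file.

    Values may be quoted, as MySQL option files allow.
    """
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    client = parser["client"]
    return client["user"].strip("'\""), client["password"].strip("'\"")
```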

PHP upgrade

Currently PHP 5.3.10 is installed. Would it be possible to upgrade to 5.5? I don't have any hard dependencies, but I'm lazy and it would be nice to use the shorter array syntax, especially for nested arrays. Furthermore, version 5.3 will be unsupported by July this year. skalman (talk) 22:24, 25 January 2014 (UTC)

A newbie issue: WinSCP and Putty

When using the Toolserver, I was used to opening two windows: a PuTTY console and a WinSCP connection. I did the same on Labs and was very frustrated by an obvious mismatch of logged-in users: in WinSCP I am "alebot", while in PuTTY I "become itsource". Files were owned by one user or the other. That is simple and obvious to users with Unix skills, but an enigma for persistent newbies like me, and I kept getting frustrating error messages when trying to edit anything as the wrong user.

The solution to this enigma was so obvious that none of the skilled users I asked for help mentioned it... I had to run ls -l to understand the problem. --Alex brollo (talk) 09:11, 3 March 2014 (UTC)
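The same check that ls -l gives can be done from a Python script, e.g. to warn before editing a file owned by a different account (your own user versus the tool account). A minimal sketch using the standard pwd module, so POSIX only:

```python
import os
import pwd


def owner_of(path):
    """Account name that owns path (the third column of ls -l)."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name


def current_user():
    """Account name this process runs as (your user, or the tool after "become")."""
    return pwd.getpwuid(os.geteuid()).pw_name
```

Comparing owner_of(path) with current_user() shows immediately whether a file belongs to the account you are working as.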

Shell access vs Tools access - doc is confusing

In section Getting access to Tool Labs it says I automatically requested shell access and after some time I was added to group "shell". So far OK.

But then it mentions I should request Tools access, but says later (#Notification) that "You will also receive email explaining that your user rights have been changed and that you are now a member of the group 'shell'". But I already have it. Confusing. That's a typo, right? --Eccenux (talk) 20:55, 2 May 2014 (UTC)

Well, from the perspective of the help page, everything is in the future :-). The whole section is very confusing and TL;DR, especially with the technicalities of generating an ssh key thrown in between. IMHO it should be written much simpler and actionable:
  1. Sign up at wikitech for a Labs account.
  2. Fill out an access request for the Tools project.
  3. Wait for the request to be completed.
  4. Generate/upload an ssh key, etc.
However, to me this is all very natural, so I don't want to axe stuff that is essential for others. But: It's a wiki! :-) --Tim Landscheidt (talk) 00:50, 3 May 2014 (UTC)

.htaccess is ignored

Is there a way to mess up the setup of a project? Because what's described on this page about .htaccess just doesn't work. The file is ignored, no matter what I put in it (it should at least bring up a 500). --TMg (talk) 15:58, 10 May 2014 (UTC)

No, you didn't do anything wrong. The information about .htaccess refers to the time when we used Apache and is now obsolete. I'll delete some of it later. Our lighttpd setup uses ~/.lighttpd.conf (caution: you need to run webservice restart after changing it). --Tim Landscheidt (talk) 16:25, 10 May 2014 (UTC)


The page currently says "Any script or job invoked with jlocal should not be running more than a few seconds and use minimal resources". I have a script that submits multiple jobs, waits for them to complete, then concatenates the results for emailing to me. So it uses minimal resources (most of its wall time is sleeping) but could be running for more than a few seconds if it takes more than a few seconds for the grid to actually schedule and run the submitted jobs. Is this still allowed? Anomie (talk) 14:24, 12 May 2014 (UTC)

Yeah, that's okay. — Coren/Marc (talk) 18:47, 16 July 2014 (UTC)
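One way to structure the waiting part of such a script in Python, as a hypothetical sketch: it assumes each submitted job writes its result to its own file and creates that file only when finished (for example by writing to a temporary name and renaming it at the end). The job submission itself is left out. Most of the wall time is spent in time.sleep(), which keeps resource use minimal.

```python
import os
import time


def wait_and_concatenate(paths, dest, timeout=3600.0, poll=5.0):
    """Wait until every file in paths exists, then concatenate them into dest.

    Hypothetical: a result file appearing is taken to mean its job finished.
    """
    deadline = time.monotonic() + timeout
    while not all(os.path.exists(p) for p in paths):
        if time.monotonic() >= deadline:
            raise TimeoutError("submitted jobs did not finish in time")
        time.sleep(poll)  # sleeping keeps CPU and memory use minimal
    with open(dest, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                out.write(f.read())
```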

Bad MySQL performance?

Hi, I have a MySQL table with about 50,000 items, but queries are already slow. All the indexes are there, and the same software is fast on two other machines with an even larger amount of data. "Slow" means that a click on the web page, which causes three or four queries, sometimes takes several seconds. An example page is this one. The queries aren't totally trivial (group by), but they should be fast anyway, like they are on other machines. Are there known performance problems? Any idea what to do? --Dnaber (talk) 17:05, 21 June 2014 (UTC)

Found it out myself: the table was using InnoDB. Using MyISAM, it's as fast as expected. --Dnaber (talk) 20:48, 21 June 2014 (UTC)

Log Rotation?

Hi, what's the proper way to rotate the logs of the jobs started with "jstart"? I tried "logrotate", but when it removes the old files, the system doesn't catch up and new logs are not created (and not used when I run "touch appname.out"). --Dnaber (talk) 18:00, 23 June 2014 (UTC)

Tomcat crashing?

Our Tomcat application has crashed several times, sometimes without any notice in the error logs (~/error.log and public_tomcat/logs/*). By "crash" I mean that it simply stopped running (application not available and no Tomcat job shown by the "qstat" command). What could be the cause of this? Could the "webservice" script be modified to automatically restart Tomcat if needed, like jobs started with "jstart"? --Dnaber (talk) 08:42, 4 July 2014 (UTC)

Some of the crashes are due to "There is insufficient memory for the Java Runtime Environment to continue.", so it seems the JVM doesn't get enough memory from the system. The JAVA_OPTS I set in public_tomcat/bin/ say that the JVM gets 300M. --Dnaber (talk) 16:12, 7 July 2014 (UTC)
It seems it was a memory problem - I have just added a solution that works for me to --Dnaber (talk) 12:45, 29 July 2014 (UTC)

Hay's tool for metadata?

Could someone add some information about Hay's tool for adding metadata? Now that it is operational, some text about it would be beneficial. Thanks. Billinghurst (talk) 11:14, 30 August 2014 (UTC)


This page has grown organically, and I think a few issues have crept up on us:

To address this, I'd like to rename this page to "Help:Tool Labs", break it apart into more specific subpages, pull in useful separate info, and expand missing documentation. This page would become an index that looks something like this:

Tool Labs is a reliable, scalable hosting environment for community developers working on tools and bots that help users maintain and use wikis. The cloud-based infrastructure was developed by the Wikimedia Foundation and is supported by a dedicated group of Wikimedia Foundation staff and volunteers. Tool Labs is a part of the Labs project.


  • What is Tool Labs? — learn the core concepts (rationale, features, architecture & terminology).
  • Rules — learn about the rules that apply to all tools we host (licensing, privacy, resource usage, etc).

Getting started

  • Getting started with Tool Labs — get started as a Tool Labs developer (request access, SSH login, create or join a tool instance).
  • Getting started with...
    • Java — create a basic Java application using Tomcat.
    • node.js — create a basic node.js application using the webgrid.
    • PHP — create a basic PHP application with translations using webservice.
    • Python — create a basic Python tool using Flask and MVC.
  • Accessing the databases — querying Wikimedia SQL databases, creating user databases, and using Redis.
  • Publishing your tools — add your tools to the public directory so others can find them.

How to

I think this format would be easier to maintain and use, would make it easier to add missing information, and would let us go into more depth on specific pages.

Any thoughts or objections? —Pathoschild 08:33, 09 November 2014 (UTC)

Let me point out that "assumes you're using PHP" is incorrect: it's just assumed that Intuition can and should be made to work with whatever one is using.
Splitting and reorganising is nice, but I hope you don't mean to create 15 separate pages! For a gradual approach, I suggest that you move the page and reshuffle its sections across different parent sections (but keeping current section titles so that links don't break), and after that split out as much content as possible to existing pages like Help:Git and others. Then it will be easier to see how much content is left and how to reorganise it. --Nemo 19:58, 9 November 2014 (UTC) P.s.: A half-page index can and should be on Help:Contents; one outdated index is enough.
I do think we should split the page; the scope is too extensive to cover with a single page. We can use supersections instead if that's preferred, but I think that would be less effective. —Pathoschild 20:29, 09 November 2014 (UTC)
I moved the page since that seems uncontroversial. —Pathoschild 03:09, 10 November 2014 (UTC)
I don't particularly like the split-up. Yesterday I was looking for the syntax of .lighttpd.conf, and I had to go to a sub-page (that I needed to add to my watchlist) to see the file mentioned. But even there a search for the bit that I remembered came up empty because the advice I was looking for was hidden in a collapsed "Extended instructions" section.
I think the overarching problem is that the page(s) address a very wide range of audiences. Some pieces seem to have a total novice in mind, while others present concepts that you need very advanced skills to understand. --Tim Landscheidt (talk) 17:41, 13 November 2014 (UTC)

Jobs submitted to the grid via 'cron'


I don't want to receive an email for each job started via cron. What should I do? Thank you, Automatik (talk) 19:21, 25 December 2014 (UTC)

If you use jsub, you can use the -quiet option to suppress output unless the job submission failed. --Tim Landscheidt (talk) 15:52, 28 December 2014 (UTC)
Thanks for your help! --Automatik (talk) 20:04, 30 December 2014 (UTC)

Most recent dumps


The most recent dump for frwiktionary has been available for 15 hours [6], but its timestamp is not yet available in /public/dumps/public. Couldn't this be faster? Automatik (talk) 13:49, 5 January 2015 (UTC)
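A script that should pick up new dumps as soon as they appear can look for the newest dated directory itself. A Python sketch, assuming (not confirmed here) a layout of <root>/<wiki>/<YYYYMMDD>/ under /public/dumps/public; with eight-digit dates, the lexicographic maximum is also the most recent date.

```python
import os
import re


def latest_dump(root, wiki):
    """Return the newest YYYYMMDD dump directory name for wiki under root, or None.

    Assumes the layout <root>/<wiki>/<YYYYMMDD>/.
    """
    wiki_dir = os.path.join(root, wiki)
    try:
        names = os.listdir(wiki_dir)
    except FileNotFoundError:
        return None
    dates = [n for n in names
             if re.fullmatch(r"\d{8}", n) and os.path.isdir(os.path.join(wiki_dir, n))]
    return max(dates, default=None)


# e.g. latest_dump("/public/dumps/public", "frwiktionary")
```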


I'm not necessarily convinced that promoting & Co. is a good idea as apart from DNS the user encounters tools-login & Co. everywhere else, notably at the prompt. --Tim Landscheidt (talk) 21:08, 8 January 2015 (UTC)

Main contributor(s)

Hi, is there a tool to list all articles in a language version of wikipedia, in which a certain user is the main contributor? Cheers, --Ghilt (talk) 20:14, 26 January 2015 (UTC)

Missing table in DB

Hi Tool Labs, even as a newbie I could easily connect to the replica by following the described steps. However, I now wonder where I can find the wikitext of pages. The DB schema describes a text table, but I cannot find it on the replica. Any help appreciated... Thank you. --Arnd (talk) 12:30, 1 February 2015 (UTC)

In the WMF cluster, the article contents are not stored in the database, but on separate servers. These are neither replicated to Labs nor can they be directly accessed otherwise. You need to use …/w/index.php?title=…&action=raw (for simple cases) or the API (…/w/api.php) for that. You can find an example here. Remember to change the …?format= parameter to the format your script should use. --Tim Landscheidt (talk) 22:03, 1 February 2015 (UTC)
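Building both kinds of request URL in Python, as a sketch: the base URL "https://example.org" is a hypothetical placeholder for the wiki's address, and the API parameters (action=query, prop=revisions, rvprop=content) request the content of the page's latest revision. Fetching the URL (e.g. with urllib.request.urlopen) is left out.

```python
from urllib.parse import urlencode


def raw_url(base, title):
    """index.php?action=raw returns the current wikitext of a page as plain text."""
    return base + "/w/index.php?" + urlencode({"title": title, "action": "raw"})


def api_url(base, title, fmt="json"):
    """API query for the content of a page's latest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": fmt,  # change to the format your script should use
    }
    return base + "/w/api.php?" + urlencode(params)


print(raw_url("https://example.org", "Main Page"))
print(api_url("https://example.org", "Main Page"))
```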
Thank you for the answer. This REST API approach seems rather slow when looking for specific patterns in articles. Maybe it's better to use a data dump instead. --Arnd (talk) 18:18, 4 February 2015 (UTC)

Need to update page

I was following a link to a tool that apparently no longer exists and eventually ended up at the Tools page. But when I clicked on most of the links, I received error messages or messages that the tool or page no longer exists. Is there any way this page can be kept up-to-date and tools that no longer exist or function can be removed from the list? It would really increase the usefulness of this list for editors. Liz (talk) 18:14, 21 February 2015 (UTC)

Which "most" links do not work for you? The list is created dynamically inter alia by looking at whether a tool has a running webservice. So if a tool's link yields an error message, you should contact its maintainer directly. --Tim Landscheidt (talk) 04:25, 22 February 2015 (UTC)