Help:Tool Labs/Web

From Wikitech

Every tool can have a dedicated web server running on the job grid. The default configuration will run a lighttpd web server which serves static files and PHP scripts from the tool's $HOME/public_html directory.

Options are available for easily running tomcat, nodejs, and wsgi web services. It is also possible to run a custom webserver process (e.g. to run a Scala-based tool). See the section "Other / generic web servers" below for more information.

  • Error logs from the webserver process are stored in $HOME/error.log
  • PHP scripts are automatically run using a FastCGI helper process.
  • The lighttpd web server is configurable (including adding other FastCGI handlers). A $HOME/.lighttpd.conf file can be used to change the default configuration.
  • Everything runs as the tool user, regardless of file ownership.
  • Similar to other Wikimedia servers, HTTP requests to Tool Labs require a User-Agent header (see also User-Agent policy on Meta).
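
As an illustration of the User-Agent requirement, here is a minimal Python sketch of a client that identifies itself when calling a tool. The tool name, URL, and contact address are hypothetical placeholders, and the exact wording of the User-Agent string is an assumption; consult the User-Agent policy for what it should contain.

```python
import urllib.request

# Hypothetical tool name and contact address; replace with your own.
USER_AGENT = (
    "my_cool_tool/1.0 "
    "(https://tools.wmflabs.org/my_cool_tool/; tools.my_cool_tool@example.org)"
)


def build_request(url):
    """Return a Request object that carries a descriptive User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


request = build_request("https://tools.wmflabs.org/my_cool_tool/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
# To actually send the request: urllib.request.urlopen(request).read()
```

The request is only constructed here, not sent, so the sketch can be run anywhere without network access.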

Using cookies

Since all tools in the 'tools' project reside under the same domain, you should prefix the name of any cookie you set with your tool's name. In addition, you should be aware that cookies you set may be read by every other web tool your user visits.

Accordingly, you should avoid storing privacy-related or security information in cookies. A simple workaround is to store session information in a database, and use the cookie as an opaque key to that information. Additionally, you can explicitly set a path in a cookie to limit its applicability to your tool; most clients should obey the Path directive properly.
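
The advice above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tool name my_cool_tool is hypothetical, the in-memory SESSIONS dictionary stands in for a real database table, and the HttpOnly flag is an extra precaution not mentioned above.

```python
import secrets
from http.cookies import SimpleCookie

TOOL_NAME = "my_cool_tool"  # hypothetical tool name

# Server-side session store; a real tool would use a database table instead.
SESSIONS = {}


def create_session_cookie(session_data):
    """Store session_data server-side and return a Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)  # opaque, unguessable key
    SESSIONS[session_id] = session_data

    cookie = SimpleCookie()
    name = TOOL_NAME + "_session"  # prefix the cookie name with the tool name
    cookie[name] = session_id
    cookie[name]["path"] = "/" + TOOL_NAME + "/"  # limit the cookie to this tool
    cookie[name]["httponly"] = True  # assumption: keep the cookie away from scripts
    return cookie[name].OutputString()


header_value = create_session_cookie({"user": "Example"})
print("Set-Cookie: " + header_value)
```

The cookie itself carries nothing but the random key, so another tool that manages to read it learns no session contents.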

Default web server

You can start a tool's web server with the webservice command:

$ become my_cool_tool
$ webservice start

You can also use the webservice command to stop, restart, or check the status of the web server.


Configuring the web server

As it starts, the web server reads any configuration in $HOME/.lighttpd.conf, and merges it with the default configuration (which is likely to be adequate for most tools).

Note: The merge sometimes fails if an option is already set in the default configuration. In that case, instead of   option = value   try   option += value.

Default configuration

The default configuration applies if you don't specify any other or additional settings in your tool's .lighttpd.conf.

See lighttpdwebservice.py in operations/software/tools/webservice for the canonical configuration.

Example configurations

FCGI Flask config
fastcgi.server += ( "/gerrit-patch-uploader" =>
    ((
        "socket" => "/tmp/patchuploader-fcgi.sock",
        "bin-path" => "/data/project/gerrit-patch-uploader/src/gerrit-patch-uploader/app.fcgi",
        "check-local" => "disable",
        "max-procs" => 1,
    ))
)

For Flask, the fcgi handler looks like this: https://github.com/valhallasw/gerrit-patch-uploader/blob/master/app.fcgi

URL rewrite
Documentation: ModRewrite

Note that rewrite rules always execute before redirect rules (regardless of their order in the config file).

url.rewrite-once += ( "/toolname/id/([0-9]+)" => "/toolname/index.php?id=$1",
                      "/toolname/link/([a-zA-Z]+)" => "/toolname/index.php?link=$1" )

If you are rewriting the tool's entire path (as is common where an application will handle URL routing), don't forget that you may also need to rewrite files such as stylesheets to ensure they can still be accessed.

url.rewrite-once = (
    ".*\.(js|css)" => "$0",
    "^/toolname(/.*)" => "/toolname/index.php$1"
)

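The "$0" on the right-hand side stands for the entire string matched by the pattern on the left-hand side, so the first rule leaves matching script and stylesheet URLs unchanged.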
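
To check how such rules behave before deploying them, the matching can be approximated in Python. This is a rough sketch, not lighttpd's implementation: it assumes that the first matching rule wins, that patterns are matched anywhere in the path unless anchored, and that Python's re module behaves like lighttpd's regular expressions for these simple patterns.

```python
import re

# The same rules as in the lighttpd example above, in order.
RULES = [
    (r".*\.(js|css)", "$0"),
    (r"^/toolname(/.*)", "/toolname/index.php$1"),
]


def rewrite_once(path, rules=RULES):
    """Return the path rewritten by the first matching rule, else unchanged."""
    for pattern, target in rules:
        match = re.search(pattern, path)
        if match:
            # Replace $0..$9 in the target with the corresponding captured group.
            return re.sub(
                r"\$(\d)",
                lambda ref: match.group(int(ref.group(1))) or "",
                target,
            )
    return path


for example in ("/toolname/static/site.css", "/toolname/page/42", "/other"):
    print(example, "->", rewrite_once(example))
```

Because the first pattern is not anchored at the end, it also matches URLs that merely contain ".js" or ".css"; adding "$" to the pattern would make it stricter.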

Header, mimetype, character encoding, error handler
# Allow Cross-Origin Resource Sharing (CORS) 
setenv.add-response-header  += ( "Access-Control-Allow-Origin" => "https://en.wikipedia.org",
                                 "Access-Control-Allow-Methods" => "POST, GET, OPTIONS" )

# Set cache-control directive for static files and resources
$HTTP["url"] =~ "\.(jpg|gif|png|css|js|txt|ico)$" {
	setenv.add-response-header += ( "Cache-Control" => "max-age=86400, public" )
}

mimetype.assign  += (
    # Add custom mimetype
    ".bulk"  => "text/plain",
    # Avoid [[Mojibake]] in JavaScript files
    ".js"   => "application/javascript; charset=utf-8",
    # Default MIME type with UTF-8 character encoding
    ""      => "text/plain; charset=utf-8"
)

# Add custom error-404 handler
server.error-handler-404  += "/error-404.php" 

Details: ModSetEnv  Mimetype-Assign   Error-Handler-404   HTTP access control (CORS)

Directory or file index
# Enable basic directory index
$HTTP["url"] =~ "^/?" {
	dir-listing.activate = "enable"
}

Deny access to hidden files
# Deny access to hidden files
$HTTP["url"] =~ "/\." {
	url.access-deny = ("")
}

Details: ModAccess

Custom index
# Enable index for specific directory 
$HTTP["url"] =~ "^/download($|/)" {
	dir-listing.activate = "enable" 
}

# Custom index file or custom directory generator
index-file.names += ("index.py")

Details: ModDirlisting

Request logging
Documentation: DebugVariables

Add the line:

# Enable request logging
debug.log-request-handling = "enable"

The debug output will be written to the error.log file.

Apache-like cgi-bin directory

Add the following stanza:

$HTTP["url"] =~ "^/your_tool/cgi-bin" {
	cgi.assign = ( "" => "" )
}

This does require that cgi-bin be under your public_html rather than alongside it.

To run CGI scripts from any directory under your public_html, you only need this one line (without the $HTTP["url"] { ... } block):

cgi.assign += ( ".cgi" => "" )

The part to the left is the file name or extension ("" = any). The part to the right is the program that will run it ("" = execute the file directly, so it must be executable and name its own interpreter). Another example:

cgi.assign += ( "script.sh" => "/bin/bash" )
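
For illustration, a script served through the ".cgi" rule above might look like the following Python sketch. The file name hello.cgi is hypothetical, and the sketch assumes that the file is marked executable and that a python interpreter is on the PATH of the web server host.

```python
#!/usr/bin/env python
# Hypothetical file: $HOME/public_html/hello.cgi (must be executable).
import os


def render_response(environ):
    """Build a complete CGI response: headers, a blank line, then the body."""
    query = environ.get("QUERY_STRING", "")
    body = "Hello from CGI\nQuery string: " + query + "\n"
    headers = "Content-Type: text/plain; charset=utf-8\r\n\r\n"
    return headers + body


if __name__ == "__main__":
    # The web server passes request details through environment variables.
    print(render_response(os.environ))
```

Anything the script writes to standard error ends up in error.log, which helps when the response is empty or malformed.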

Enable status and statistics
# modify <toolname> for your tool
# this will enable counters  http://tools.wmflabs.org/<toolname>/server-status (resp: .../server-statistics)
server.modules += ("mod_status")
status.status-url = "/<toolname>/server-status"
status.statistics-url = "/<toolname>/server-statistics"

Details: ModStatus

Web logs

Your tool's web logs are placed in the tool account's $HOME/access.log in common log format. Please note that the web logs are anonymized in accordance with the Foundation’s privacy policy. Each user IP address will appear to be that of the local host, for example. In general, the privacy policy precludes the logging of personally identifiable information; special permission from Foundation legal counsel is required if such information is needed.

Error logs can be found in the tool account's $HOME/error.log; this includes the standard error of invoked scripts.
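
As a hedged example of working with the access log, the following Python sketch counts responses per HTTP status code. It assumes the standard Common Log Format field order; the sample lines are made up, and the real file may contain additional fields.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def count_statuses(lines):
    """Count how often each HTTP status code occurs; skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines; the client address is the local host because the
# logs are anonymized.
SAMPLE = [
    '127.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /my_cool_tool/ HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2016:13:55:40 +0000] "GET /my_cool_tool/missing HTTP/1.1" 404 345',
    '127.0.0.1 - - [10/Oct/2016:13:56:02 +0000] "GET /my_cool_tool/ HTTP/1.1" 200 2326',
]

print(count_statuses(SAMPLE))
```

To run it against the real log, pass an open file object for $HOME/access.log to count_statuses instead of SAMPLE.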

Error pages

The proxy provides its own error pages when your application returns HTTP/500, HTTP/502 or HTTP/503. This behavior is currently under review, and might change in the near future.

You can bypass the proxy error pages by passing an X-Wikimedia-Debug header.

Changing the document root

With symlinks

The easiest way to change the document root is to make $HOME/public_html a symlink to the directory you want to serve. Before doing this, the existing public_html directory needs to be deleted or moved, because ln -s $HOME/foo $HOME/public_html would create the link inside it, as $HOME/public_html/foo, if the $HOME/public_html directory exists. If you do not need anything in there, use rm -rf $HOME/public_html to delete the directory and all of its contents; otherwise use mv $HOME/public_html $HOME/oldpublic_html to move the directory to $HOME/oldpublic_html.

To make the symlink, run ln -s $HOME/foo $HOME/public_html, which makes the contents of $HOME/foo available at $HOME/public_html. Replace $HOME/foo in the example with the directory you want lighttpd to serve.
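
The same steps can be scripted. The Python sketch below is one possible way to do it, shown against a throwaway temporary directory that stands in for $HOME; the function name and the directory foo are illustrative only.

```python
import os
import tempfile


def replace_document_root(home, new_root):
    """Move home/public_html aside (if present) and symlink it to new_root."""
    public_html = os.path.join(home, "public_html")
    backup = os.path.join(home, "oldpublic_html")

    # Move a real directory out of the way first; os.symlink refuses to
    # create a link where a directory already exists.
    if os.path.isdir(public_html) and not os.path.islink(public_html):
        os.rename(public_html, backup)

    os.symlink(new_root, public_html)
    return public_html


# Demonstration in a temporary directory standing in for $HOME.
with tempfile.TemporaryDirectory() as fake_home:
    os.mkdir(os.path.join(fake_home, "public_html"))
    os.mkdir(os.path.join(fake_home, "foo"))
    link = replace_document_root(fake_home, os.path.join(fake_home, "foo"))
    print(os.path.islink(link), os.readlink(link))
```

Moving the old directory rather than deleting it keeps a copy of the previous contents in oldpublic_html.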

With aliases

Note that you cannot add an alias URL for /toolname because this has already been defined and can't be overridden in the local conf file. You can add an alias for subdirectories with:

alias.url += ("/toolname/subdir" => "/data/project/toolname/physical/path/of/subdir/")

Static file server

Static files in a tool's www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part — putting the files in the appropriate folder (and making the directory readable) should 'just work'. You can use this to quickly serve static assets (CSS, HTML, JS, etc).

Kubernetes

Kubernetes (k8s) is a platform for running containers that is slowly replacing the grid engine in Tool Labs. Kubernetes webservices have access to newer versions of most software than the grid engine provides. K8s also provides a more robust system for restarting tools manually or automatically following an application crash.

Use of Kubernetes is currently available to beta testers. See the help page for Kubernetes webservices for more detailed information.

PHP

The default lighttpd configuration includes support for running PHP scripts found in $HOME/public_html using a FastCGI helper process.

PHP ini settings such as setting the default timezone with date.timezone can be set for your tool by creating a $HOME/public_html/.user.ini configuration file. See documentation at php.net for more details.

node.js web services

Node.js web services (with websocket support) now run fairly well on Tool Labs. They all run on trusty nodes, using node version v0.10.25.

  1. Use trusty.tools.wmflabs.org as bastion for everything.
  2. Put your node application in $HOME/www/js in your tool's home directory.
  3. Make sure your server starts up properly when npm start is executed. The default way to do this is to name your main script server.js.
  4. Your server should bind to the port that is passed in as an environment variable (PORT). You can access this via process.env.PORT. Without this, your tool will not work.
  5. Run webservice nodejs start to start your webserver (or webservice nodejs restart to restart it after a code change).
  6. PROFIT! :)

This is example code for a node.js web server running as a tool:

var http = require('http');
var port = parseInt(process.env.PORT, 10) ; // IMPORTANT!! You HAVE to use this environment variable as port!

http.createServer(function (req, res) {
	res.writeHead(200, {'Content-Type': 'text/plain'});
	res.end('Hello World\n');
}).listen(port);

Saving this as $HOME/www/js/server.js and running webservice nodejs start should work.

Troubleshooting

If you run into errors doing npm install, try LINK=g++ npm install

Java (Tomcat)

Similar to the lighttpd webservice, there is also a Tomcat webservice for Java applications.

Before using Tomcat, you have to set it up:
$ setup-tomcat
This will create a local Tomcat installation at $HOME/public_tomcat/. You can manage the Tomcat webservice similar to lighttpd using webservice tomcat (start|stop|restart). (Note: If there is a running lighttpd webservice, Tomcat won’t work.)

To deploy a Web Application Archive (WAR), move it to $HOME/public_tomcat/webapps/$TOOL.war where $TOOL is the name of your tool. Archive extraction, deployment, and configuration is done automatically by Tomcat. A Tomcat restart may be required. The application will be available at tools.wmflabs.org/$TOOL/.

To test the Tomcat webservice, you can use the Tomcat sample application (available on tomcat.apache.org).

When reading Tomcat tutorials, it is helpful to know that $CATALINA_HOME under our configuration is the $HOME/public_tomcat directory created by setup-tomcat. The default Tomcat classloader will read jar files such as a MySQL JDBC driver jar that are placed in $HOME/public_tomcat/lib (i.e. $CATALINA_HOME/lib).

If your Java application is more complex, the standard memory settings might not be sufficient. You might get errors such as "There is insufficient memory for the Java Runtime Environment to continue", and Tomcat will simply stop working. In that case, you can try copying /usr/bin/webservice to your home directory, changing the memlimit setting (e.g. to 6g), and using your copy to restart the service. The way to adjust the memory allocation on the grid engine using the webservice script is to ask, on Phabricator or in #wikimedia-labs, for a /data/project/.system/config/$TOOLNAME.web-memlimit configuration file to be created that grants your tool a higher limit.

The settings for the JVM itself can be modified in public_tomcat/bin/setenv.sh. If the memory setting in JAVA_OPTS is too low, you'll get the well-known OutOfMemoryError from Java. In some cases, Tomcat may then no longer stop. You can use ssh tools-webgrid-tomcat and kill your Tomcat process manually (only consider this if you know what you're doing).

Python 2 (uWSGI)

There is specialized support for Python uWSGI applications (such as Flask) provided via PythonWebService.

Place your application in $HOME/www/python/src/app.py, with the application object in a variable named "app".

Create a virtualenv in $HOME/www/python/venv (virtualenv $HOME/www/python/venv). This should be created on a trusty bastion as the code will be run on a trusty instance.

Run webservice uwsgi-python start and watch your application run!

You can put custom additional config for uWSGI in ini file form in $HOME/www/python/uwsgi.ini

The logs will be in $HOME/uwsgi.log
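
As a minimal sketch of what app.py could contain, the following uses a plain WSGI callable instead of Flask so that it needs no extra packages; a Flask application object assigned to the same name would take its place. The greeting text is illustrative, and the assumption is that uWSGI only needs a WSGI-compatible object named app.

```python
# Hypothetical contents of $HOME/www/python/src/app.py


def app(environ, start_response):
    """Minimal WSGI application, exposed under the expected name 'app'."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}\n".format(path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Local smoke test only; on Tool Labs, uWSGI imports 'app' itself.
    from wsgiref.util import setup_testing_defaults

    test_environ = {}
    setup_testing_defaults(test_environ)
    print(app(test_environ, lambda status, headers: None)[0].decode("utf-8"))
```

Running the file directly prints the response body for a request to "/", which is a quick way to catch syntax errors before starting the webservice.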

Python 3 is not yet supported, unfortunately, but see the section on uwsgi-plain below. See also phab:T104374 for status and a workaround.

Python3

See Help:Tool Labs/Web/Kubernetes#python (uwsgi + python3.4)

Play and similar JVM-based frameworks

Play Framework projects (and other JVM-based projects that have one executable to start the application) can be run on Tool Labs. Play Framework uses JDK 8, so we need to use Kubernetes.

In order to work on Tool Labs, the following Play configuration changes need to be made:

# Secret key
# ~~~~~
# The secret key is used to secure cryptographic functions.
# If you deploy your application to several instances be sure to use the same key!
# On Tool Labs, we will make a startup script that specifies play.crypto.secret
# using a command line option reading from a private file.
play.crypto.secret="changeme"

# Port
# ~~~~~
# On WMF Tool Labs, the port used by kubernetes webservice is 8000/TCP
http.port=8000

# HTTP context
# ~~~~~
# Your tool will be available at https://tools.wmflabs.org/$TOOLNAME/.
# Play usually expects to be operating at the root of a domain, so this setting is
# required for routes to work properly.
play.http.context="/$TOOLNAME/"

The application secret can be stored in a private file with 440 permissions.

After building the project, start webservice using webservice --backend=kubernetes jdk8 start '$EXECUTABLE -Dplay.crypto.secret="$(cat /data/project/$TOOLNAME/app_secret)"'.

For details, see User:Sn1per/Play on Tool Labs.

Other / generic web servers

You can easily run other web servers that are not directly supported. This can be accomplished using the generic webservice type.

To start a webserver that is launched by a script at /data/project/toolname/code/server.bash, you would launch it with:

webservice generic start /data/project/toolname/code/server.bash

The launched script will get an environment variable called PORT that it must use to find which port to bind to. That is all!

Note: Your tool will receive URLs that include your tool prefix, e.g. /MyToolName/index.html instead of /index.html. You may need to adapt your tool configuration to handle this.
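
A server launched this way could look like the following Python sketch, which reads the PORT environment variable and strips the tool prefix from incoming paths. The prefix /MyToolName and the file layout are hypothetical, and the sketch assumes a Python 3 interpreter is available where the script runs.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOL_PREFIX = "/MyToolName"  # hypothetical; URLs arrive with the tool prefix


def strip_prefix(path, prefix=TOOL_PREFIX):
    """Turn /MyToolName/index.html into /index.html; leave other paths alone."""
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = ("You requested " + strip_prefix(self.path) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(port):
    """Bind to all interfaces on the given port."""
    return HTTPServer(("", port), Handler)


if __name__ == "__main__":
    if "PORT" in os.environ:
        # The webservice command provides the port to bind to.
        build_server(int(os.environ["PORT"])).serve_forever()
    else:
        print("PORT is not set; run this through the webservice command")
```

The launch script passed to webservice generic start would then simply start this file with the Python interpreter.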

Web proxy servers

The proxies are currently: Tools-proxy-01.tools.eqiad.wmflabs (10.68.21.49), Tools-proxy-02.tools.eqiad.wmflabs (10.68.21.81)