Help:Toolforge/Web


Web Service Introduction

Every tool can have a dedicated web server running on either the job grid or Kubernetes. The default 'lighttpd' webservice type runs a lighttpd web server configured to serve static files and PHP scripts from the tool's $HOME/public_html directory.

You can start a tool's web server with the webservice command:

$ become my_cool_tool
$ webservice start

You can also use the webservice command to stop, restart, and check the status of the webserver.


In addition to the default 'lighttpd' type (lighttpd + PHP), webservice provides types for running python2 and python3 wsgi, nodejs, and tomcat web services. It is also possible to run a custom webserver process with the 'generic' type (e.g. to run a Scala-based tool).


Common issues

Static file server

Static files in a tool's www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part: putting the files in the appropriate folder (and making the directory readable) should 'just work'. You can use this to quickly serve static assets (CSS, HTML, JS, etc).

External assets

To preserve the privacy of our users, tools should avoid embedding assets (images, CSS, JavaScript) from servers outside of Wikimedia Foundation control.

Libraries
Toolforge provides a mirror of cdnjs. Browse libraries
Fonts
Toolforge provides a reverse proxy to Google fonts. Search fonts
Maps
Wikimedia provides maps servers with data from OpenStreetMap. Documentation

Using HTTP cookies

Since all tools in the 'tools' project reside under the same domain, you should prefix the name of any cookie you set with your tool's name and if possible also add a Path attribute to limit the URLs that the browser will send the cookie back to.

You should be aware that cookies you set may be read by every other tool your user visits. Accordingly, you should avoid storing privacy-related or security information in cookies. A simple workaround is to store session information in a database, and use the cookie as an opaque key to that information.
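
The cookie advice above can be sketched in Python using only the standard library. This is a minimal illustration, not a Toolforge API: the tool name my_cool_tool, the in-memory SESSION_STORE dictionary (standing in for a real database table), and the function name new_session_cookie are all assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie

TOOL_NAME = "my_cool_tool"  # hypothetical tool name

# Stand-in for a real database table keyed by session ID (assumption).
SESSION_STORE = {}


def new_session_cookie(user_data):
    """Store user_data server-side and return a Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)  # opaque, unguessable key
    SESSION_STORE[session_id] = user_data

    cookie = SimpleCookie()
    name = TOOL_NAME + "_session"  # prefix the cookie name with the tool name
    cookie[name] = session_id
    # Limit the URLs that the browser will send the cookie back to.
    cookie[name]["path"] = "/" + TOOL_NAME + "/"
    cookie[name]["httponly"] = True
    cookie[name]["secure"] = True
    return cookie[name].OutputString()


header_value = new_session_cookie({"username": "Example"})
print("Set-Cookie:", header_value)
```

Only the random key travels in the cookie; the session data itself stays on the server side, so another tool that reads the cookie learns nothing beyond the opaque key.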

Memory limit

The default memory limit for grid engine webservice jobs is 4G. For Kubernetes the default limit is 2G for most runtimes (Java's limit is 4G).

If your tool needs more memory, you can request additional quota by opening a Phabricator task asking for an increase and optionally notifying people on the #wikimedia-cloud Freenode IRC channel that you have done so. An administrator can create a /data/project/.system/config/$TOOLNAME.web-memlimit configuration file that will adjust the limit.

Response buffering

An Nginx proxy sits between your webservice and the user. By default this proxy buffers the response sent from your server. For some use cases, including streaming large quantities of data to the browser, this can be undesirable. Buffering can be disabled on a per-request basis by sending an X-Accel-Buffering: no header in your response.[1]
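
As a rough sketch of how a Python tool might send this header, the plain WSGI application below streams a few chunks and sets X-Accel-Buffering: no on the response. The chunk contents and the short sleep are placeholders for real work; nothing here is specific to Toolforge beyond the header itself.

```python
import time


def app(environ, start_response):
    """Stream a response in chunks, asking the proxy not to buffer it."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("X-Accel-Buffering", "no"),  # disable Nginx buffering for this response
    ]
    start_response("200 OK", headers)

    def generate():
        for i in range(3):
            yield ("chunk %d\n" % i).encode("utf-8")
            time.sleep(0.01)  # stand-in for slow work between chunks

    return generate()
```

Frameworks such as Flask expose their own way to set response headers; the header name and value are the same.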

Backends

Toolforge provides two different execution environments for web services: Grid Engine and Kubernetes. The Toolforge administrators recommend that new tools try Kubernetes first and use the Grid Engine backend only if they hit a technical limitation that prevents them from using Kubernetes. The Kubernetes backend generally provides more modern software versions and will eventually be the default environment. The main drawback is that Kubernetes webservices cannot spawn additional jobs on the job grid.

Things that remain the same

  1. Your workflow. You still ssh in, use become, and hack on code as usual
  2. Logging access. Same locations as before, same behavior as before
  3. Replica DB / Dumps access

Grid Engine

The Grid Engine backend runs your web service as a grid job on an Ubuntu Trusty grid exec node. This is similar to the way that jsub runs any grid job you submit, but there is a separate exec queue on the grid for running jobs started by webservice.

Kubernetes

Kubernetes (k8s) is a platform for running containers that is slowly replacing the grid engine in Toolforge. Kubernetes webservices have access to newer versions of most software than the grid engine provides. K8s also provides a more robust system for restarting tools manually or automatically following an application crash.

User visible differences from GridEngine based webservices

  1. Each process runs inside a Docker container, orchestrated by Kubernetes.
    • Provides better resource isolation (one tool cannot take down other tools by consuming all RAM or CPU)
    • Better health checking (monitoring built into Kubernetes, not a hack we wrote)
    • Less complex proxy setup, leading to fewer proxy related outages / issues
  2. Containers based on Debian Jessie
    • Newer software versions than those available with Ubuntu Trusty or Precise
    • Better support from Wikimedia TechOps team
  3. Less NFS surface exposed
    • /home is not mounted for web services
    • No /shared - use /data/project/shared instead. The latter works on both gridengine and kubernetes, the former only on gridengine.
    • /public/dumps and /data/scratch are mounted the same way
  4. It is not possible to interact with the gridengine from Kubernetes (no jsub...)


Switching between GridEngine and Kubernetes

You can switch between the backends to make sure your code works fine between them.

From GridEngine to Kubernetes

 webservice --backend=gridengine stop
 webservice --backend=kubernetes start

From Kubernetes to GridEngine

webservice --backend=kubernetes stop
webservice --backend=gridengine start


Default web server (lighttpd + PHP)

This is a brief summary of the /Lighttpd documentation page.


  • webservice --backend=kubernetes php5.6 start|stop|restart|shell
  • webservice --backend=gridengine lighttpd start|stop|restart
  • webservice --backend=gridengine lighttpd-plain start|stop|restart

Lighttpd is the HTTP server used by both the lighttpd and lighttpd-plain types supported by webservice. These types are supported by both the Grid Engine and Kubernetes backends.

  • Error logs from the lighttpd process are stored in $HOME/error.log
  • PHP scripts are automatically run using a FastCGI helper process.
  • The lighttpd web server is configurable (including adding other FastCGI handlers). A $HOME/.lighttpd.conf file can be used to change the default configuration.
  • Everything runs as the tool user, regardless of file ownership.

The web server reads any configuration in $HOME/.lighttpd.conf, and merges it with the default configuration. Most tools will not need custom configuration.

See our lighttpd help page for more detailed information.

PHP

The lighttpd webservice type includes support for running PHP scripts from files with a .php extension in $HOME/public_html using a FastCGI helper process.

Use webservice --backend=kubernetes php5.6 start|stop|restart|shell to run a PHP based webservice on Kubernetes. See Kubernetes PHP documentation for more details.

Python (uWSGI)

uWSGI is a Web Server Gateway Interface server for Python2 and Python3 web applications. It is commonly used to run applications built with Flask, Django, or other Python web application frameworks.

webservice --backend=kubernetes python
Python3 with a default uwsgi configuration
webservice --backend=kubernetes python2
Python2 with a default uwsgi configuration
webservice --backend=gridengine uwsgi-python
Python2 on Grid Engine with a default uwsgi configuration
webservice --backend=gridengine uwsgi-plain
Python2 or Python3 on Grid Engine with a user supplied uwsgi configuration

Default uwsgi configuration

The uwsgi-python, python, and python2 types share a common uWSGI configuration designed to make it easy to deploy a typical Python webservice. This uses a convention over configuration design with the following expectations:

  • Your application will have a wsgi entry point in $HOME/www/python/src/app.py in a variable named app (example).
  • Python libraries will be loaded from a virtualenv located in $HOME/www/python/venv.
  • Custom configuration for uWSGI in ini file form will be loaded from $HOME/www/python/uwsgi.ini
    • Examples of configuration parameters can be found in the uWSGI manual.
    • Headers can be added using route = .* addheader:Access-Control-Allow-Origin: *
  • Logs will be written to $HOME/uwsgi.log
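
To make these expectations concrete, here is a minimal sketch of what $HOME/www/python/src/app.py could contain. It uses a plain WSGI callable with no third-party libraries so that it works without anything installed in the virtualenv; a Flask or Django application object exposed under the same name app would fill the same role. The response text is a placeholder.

```python
# Sketch of $HOME/www/python/src/app.py
# The default uWSGI configuration looks for a variable named "app" in this module.


def app(environ, start_response):
    """Smallest possible WSGI entry point: always answer with plain text."""
    body = b"Hello from my tool\n"  # placeholder content
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```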

python (Python3 + Kubernetes)

  • webservice --backend=kubernetes python start|stop|restart|shell

See Default uwsgi configuration for general information. More information is also available at Help:Toolforge/Kubernetes#python (uwsgi + python3.4).

This is running python3.4 with virtualenv support - you must use a virtualenv for installing your libraries.

Using virtualenv with webservice shell

You need to set up and use a new virtualenv. You can do so with the following:

For new projects

First, set up your Python code so that your app.py file lives under ~/www/python/src. Then...

  1. webservice --backend=kubernetes python shell
  2. mkdir -p ~/www/python
  3. python3 -m venv ~/www/python/venv
  4. source ~/www/python/venv/bin/activate
  5. pip install --upgrade pip (This brings in newest pip, which is required for wheel support)
  6. Install the libraries you need (e.g. pip install -r ~/www/python/src/requirements.txt)
  7. exit out of webservice shell
  8. webservice --backend=kubernetes python start

For python2 projects, use python2 -m virtualenv in step 3.

Moving an existing project

If you are already running a python3 webservice using uwsgi-plain on the job grid:

  1. Make a backup of your current venv: mv ~/www/python/venv ~/www/python/venv.gridengine
  2. Move your uwsgi.ini file away as well: mv ~/www/python/uwsgi.ini ~/www/python/uwsgi.ini.gridengine
  3. Follow the instructions in "For new projects" above
  4. Before doing webservice --backend=kubernetes python start, you have to do a webservice --backend=gridengine stop
  5. To switch back to gridengine, you can do:
    1. mv ~/www/python/venv ~/www/python/venv.k8s
    2. mv ~/www/python/venv.gridengine ~/www/python/venv
    3. mv ~/www/python/uwsgi.ini.gridengine ~/www/python/uwsgi.ini
    4. webservice --backend=kubernetes stop
    5. webservice --backend=gridengine uwsgi-plain start

The fundamental thing to remember is that virtualenvs created straight on the bastion work only with gridengine, and virtualenvs created inside webservice shell work only with kubernetes.

Once you are done migrating and are happy with it, you can delete your venv & uwsgi.ini backups.

Installing numpy / scipy / things with binary dependencies

If your package with binary dependencies has a manylinux1 wheel, you can directly install it with pip quickly and with minimum hassle. You can check if your package has a manylinux1 wheel by:

  1. Go to https://pypi.python.org/pypi
  2. Search for your package name in top right
  3. Find it in the list and click on it
  4. Look for packages that end in the string: cp34-cp34m-manylinux1_x86_64.whl
  5. If it exists, then this package is installable with a binary wheel!
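
If you have a list of the file names shown on a package's download page, the check in step 4 can be sketched as a small helper. The function name has_manylinux1_wheel and the sample file names below are made up for illustration; the helper only compares file name suffixes and does not contact PyPI.

```python
def has_manylinux1_wheel(filenames, python_tag="cp34", abi_tag="cp34m"):
    """Return True if any file name is a manylinux1 x86_64 wheel for the given tags."""
    suffix = "-%s-%s-manylinux1_x86_64.whl" % (python_tag, abi_tag)
    return any(name.endswith(suffix) for name in filenames)


# Hypothetical file names, as they might appear on a package's download page.
example_files = [
    "examplepkg-1.0.0.tar.gz",
    "examplepkg-1.0.0-cp27-cp27mu-manylinux1_x86_64.whl",
    "examplepkg-1.0.0-cp34-cp34m-manylinux1_x86_64.whl",
]

print(has_manylinux1_wheel(example_files))
```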

You can install it by:

  1. webservice --backend=kubernetes python shell
  2. source ~/www/python/venv/bin/activate
  3. pip install --upgrade pip (This brings in newest pip, which is required for wheel support)
  4. pip install $packagename

Tada! You only need to do the pip install --upgrade pip once, after that you can install manylinux1 packages easily.

Note that this only applies if you are using a package with binary dependencies. Most python packages do not have binary dependencies (are pure python) and do not need this!


python2 (Python2 + Kubernetes)

  • webservice --backend=kubernetes python2 start|stop|restart|shell

See Default uwsgi configuration for general information.

uwsgi-python (Python2 + Grid Engine)

  • webservice --backend=gridengine uwsgi-python start|stop|restart

See Default uwsgi configuration for general information. Python 3 is not supported by this type, but see the section on uwsgi-plain below for an alternative.

uwsgi-plain (Python3 + Grid Engine)

  • webservice --backend=gridengine uwsgi-plain start|stop|restart

The uwsgi-plain type leaves configuration of the uWSGI service up to the tool's $HOME/uwsgi.ini configuration file. This allows users with unique requirements to tune the uWSGI service to work with their application. One reason to use this is if you must run a Python3 webservice on Grid Engine. A working config for a Python3 Flask app is documented in Phabricator task T104374.

Using a uwsgi app with a default entry point that is not app.py

The default uwsgi configuration for the uwsgi webservice backend expects to find the uwsgi entry point as the variable app loaded from the $HOME/www/python/src/app.py module. If your application has another entry point, the easiest thing to do is create a $HOME/www/python/src/app.py module, import your entry point, and expose it as app. See Making a Django app work for an example of this pattern.

Making a Django app work

By default your app.py should be in ~/www/python/src/ and contain:

import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "<YOUR-TOOL-NAME>.settings")

app = get_wsgi_application()

To make the static files resolve correctly, create a uwsgi.ini file at ~/www/python/uwsgi.ini and add this setting:

[uwsgi]
check-static = /data/project/<YOUR-TOOL-NAME>/www/python

and in settings.py use:

STATIC_URL = '/<YOUR-TOOL-NAME>/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'static')

Then deploy your static files into ~/www/python/static

Logs

You can find the logs in ~/uwsgi.log on both platforms

node.js web services

  • webservice --backend=gridengine nodejs start|stop|restart
  • webservice --backend=kubernetes nodejs start|stop|restart|shell

Node.js can run fairly well on Toolforge including with websocket support. Using --backend kubernetes is recommended so your code is executed with node version v6.9.1. The Grid Engine backend provides a very old node version v0.10.25.

  1. Put your node application in $HOME/www/js in your tool's home directory.
  2. Make sure your server starts up properly when npm start is executed. The default way to do this is to name your main script server.js
  3. Your server should bind to a port that is passed in as an environment variable (PORT). You can access this via process.env.PORT. Without this your tool will not be found by the Nginx proxy.
  4. Run webservice --backend=kubernetes nodejs start to start your webserver (or webservice --backend=kubernetes nodejs restart to restart it after a code change)
  5. PROFIT! :)

This is example code for a node.js web server running as a tool:

var http = require('http');
var port = parseInt(process.env.PORT, 10) ; // IMPORTANT!! You HAVE to use this environment variable as port!

http.createServer(function (req, res) {
	res.writeHead(200, {'Content-Type': 'text/plain'});
	res.end('Hello World\n');
}).listen(port);

Keeping this in $HOME/www/js/server.js and running webservice --backend=kubernetes nodejs start should work.

Troubleshooting

If you run into errors doing npm install, try LINK=g++ npm install

Java

Tomcat

  • webservice --backend=gridengine tomcat start|stop|restart

Before using Tomcat, you have to set it up by running setup-tomcat. This will create a local Tomcat installation at $HOME/public_tomcat/.

To deploy a Web Application Archive (WAR), move it to $HOME/public_tomcat/webapps/$TOOL.war where $TOOL is the name of your tool. Archive extraction, deployment, and configuration is done automatically by Tomcat. A Tomcat restart may be required. The application will be available at tools.wmflabs.org/$TOOL/.

To test the Tomcat webservice, you can use the Tomcat sample application (available on tomcat.apache.org).

When reading Tomcat tutorials, it is helpful to know that $CATALINA_HOME under our configuration is the $HOME/public_tomcat directory created by setup-tomcat. The default Tomcat classloader will read jar files such as a MySQL JDBC driver jar that are placed in $HOME/public_tomcat/lib (i.e. $CATALINA_HOME/lib).

Troubleshooting

If your Java application is more complex, the standard memory settings might not work. You might get errors like There is insufficient memory for the Java Runtime Environment to continue and Tomcat will simply stop working. See Help:Toolforge/Web § Memory limit for instructions on getting the runtime memory limit increased.

The settings for the JVM can be modified in public_tomcat/bin/setenv.sh. If the memory setting from JAVA_OPTS is too low, you'll get the well-known OutOfMemoryError from Java. In some cases, Tomcat may no longer stop cleanly following an OOM error. Killing the grid engine job using qdel may be your only solution.

Play and similar JVM-based frameworks

  • webservice --backend=kubernetes jdk8 start|stop|restart|shell BINARY

Play Framework projects (and other JVM-based projects that have one executable to start the application) can be run on Toolforge. Play Framework uses JDK 8, so we need to use Kubernetes.

In order to work on Toolforge, the following Play configuration changes need to be made:

# Secret key
# ~~~~~
# The secret key is used to secure cryptographics functions.
# If you deploy your application to several instances be sure to use the same key!
# On Toolforge, we will make a startup script that specifies play.crypto.secret
# using a command line option reading from a private file.
play.crypto.secret="changeme"

# Port
# ~~~~~
# On WMF Toolforge, the port used by kubernetes webservice is 8000/TCP
http.port=8000

# HTTP context
# ~~~~~
# Your tool will be available at https://tools.wmflabs.org/$TOOLNAME/.
# Play usually expects to be operating at the root of a domain, so this setting is
# required for routes to work properly.
play.http.context="/$TOOLNAME/"

The application secret can be stored in a private file with 440 permissions.

After building the project, start your webservice using webservice --backend=kubernetes jdk8 start '$EXECUTABLE -Dplay.crypto.secret="$(cat /data/project/$TOOLNAME/app_secret)"'. For more details, see User:Sn1per/Play on Tool Labs.

Other / generic web servers

You can easily run other web servers that are not directly supported. This can be accomplished using the generic webservice type on the Grid Engine backend or a runtime specific type on the Kubernetes backend.

  • webservice --backend=gridengine generic start|stop|restart SCRIPT
  • webservice --backend=kubernetes golang start|stop|restart|shell SCRIPT
  • webservice --backend=kubernetes jdk8 start|stop|restart|shell SCRIPT
  • webservice --backend=kubernetes ruby2 start|stop|restart|shell SCRIPT
To start a webserver that is launched by a script at /data/project/toolname/code/server.bash, you would launch it with:
$ webservice --backend=gridengine generic start /data/project/toolname/code/server.bash
Your script will be passed an HTTP port to bind to in an environment variable named PORT. This is the port that the Nginx proxy will forward requests for https://tools.wmflabs.org/YOUR_TOOL to.

Note that your tool will receive URLs that include your tool prefix - e.g. /YOUR_TOOL/index.html instead of /index.html. You may need to adapt your tool configuration to handle this.
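
The two requirements above (bind to the port given in PORT, and expect the tool prefix in request paths) can be sketched with Python's built-in http.server module. This is an illustrative sketch only: it assumes python3 is available on the backend you choose, TOOL_PREFIX uses the YOUR_TOOL placeholder from the text, and strip_tool_prefix is a helper invented for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOL_PREFIX = "/YOUR_TOOL"  # placeholder; replace with "/" + your tool name


def strip_tool_prefix(path, prefix=TOOL_PREFIX):
    """Map /YOUR_TOOL/index.html to /index.html; leave other paths unchanged."""
    if path == prefix:
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        local_path = strip_tool_prefix(self.path)
        body = ("You requested %s\n" % local_path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # The port to bind to is passed in the PORT environment variable.
    port = int(os.environ["PORT"])
    HTTPServer(("", port), Handler).serve_forever()


# Only start serving when PORT is actually provided.
if __name__ == "__main__" and "PORT" in os.environ:
    main()
```

A small wrapper script that executes this file could then be passed as SCRIPT to the webservice command.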

HHVM (experimental)

It is possible to run HHVM in proxygen mode as a generic webservice. Bryan Davis provided the following script (with some tweaks):

Copy the contents and save it as $HOME/hhvm-webservice.sh. Then start the HHVM process using webservice --backend=gridengine generic start $HOME/hhvm-webservice.sh.

This has been tested and works. However, it is only an experimental implementation and is not recommended for production tools, especially since the proxygen mode has some drawbacks:

  • Documentation for configuring HHVM's proxygen webserver is lacking upstream. Information can be found, but it requires a lot of digging.
  • No obvious support for alias configuration to easily map https://tools.wmflabs.org/my-tool-name/ to the tool's $HOME/public_html. This can be worked around using hhvm.virtual_host[default][rewrite_rules] settings.
  • Multiple index files (index.php and index.html for example) are not supported yet (hhvm.server.default_document can be set only once; if set multiple times, only the last instance is used).

Running Hack-coded files

The hhvm.hack.lang.look_for_typechecker option in the above script has been set to false so that Hack files run without the "Typechecker not running" error. Please don't run hh_client on the bastion or grid servers; use your own HHVM installation instead.

References

  1. https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/