User:BryanDavis/Sandbox/Help:Toolforge/Web
Help improve documentation for this page: https://phabricator.wikimedia.org/T232407
WMCS is in the process of transitioning from Grid Engine to Kubernetes for all Web services. You are encouraged to run Web services on the Kubernetes platform when possible.
Overview
This page describes how to run a web server on Toolforge. In our local jargon, these are often called web services or webservices, a name possibly inspired by the webservice command line tool that is used to start, stop, and change other features for a web server.
Web Service Introduction
Every tool can have a dedicated web server running on either Kubernetes or, if necessary, the Grid Engine.
You are encouraged to run web services on Kubernetes.
The differences between Grid Engine and Kubernetes are described later in this documentation.
Using webservice command
You can use the webservice command to start, stop, restart, and check the status of your tool's web server.
$ become my_cool_tool
$ webservice start
Use webservice --help to get a full list of arguments.
With no other arguments or configuration, webservice start will start a web server using the php7.3 runtime on Kubernetes. The document root for this lighttpd web server is $HOME/public_html. You will have to create this directory yourself.
Common issues
Static file server
Static files in a tool's $HOME/www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part: putting the files in the appropriate folder (and making the directory readable) should 'just work'.
You can use this to serve static assets (CSS, HTML, JS, etc) or to host simple websites that don't require a server-side component.
External assets
To preserve the privacy of our users, avoid embedding assets (images, CSS, JavaScript) from servers outside of Wikimedia Foundation control.
- Libraries: Toolforge provides an anonymizing reverse proxy to cdnjs.
- Fonts: Toolforge provides an anonymizing reverse proxy to Google Fonts.
- Maps: Wikimedia provides maps servers with data from OpenStreetMap.
Using HTTP cookies
Since all tools in the 'tools' project reside under the same domain, prefix the name of any cookie you set with your tool's name, and if possible, add a Path attribute to limit the URLs that the browser will send the cookie back to.
Do not store privacy-related or security information in cookies. Cookies may be read by every other tool your user visits. A simple workaround is to store session information in a database, and use the cookie as an opaque key to that information.
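The cookie advice above can be sketched in Python using only the standard library. This is a minimal illustration, not Toolforge's own code: the SESSIONS dictionary stands in for a real database table, and the function name, the _session suffix, and the HttpOnly and Secure flags are assumptions added for the example.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory store standing in for a real database table.
SESSIONS = {}


def new_session_cookie(tool_name, session_data):
    """Store session_data server-side and return a Set-Cookie header value."""
    session_key = secrets.token_urlsafe(32)  # opaque key, carries no user data
    SESSIONS[session_key] = session_data

    cookie_name = tool_name + "_session"  # prefix the cookie with the tool's name
    cookie = SimpleCookie()
    cookie[cookie_name] = session_key
    cookie[cookie_name]["path"] = "/" + tool_name + "/"  # limit to this tool's URLs
    cookie[cookie_name]["httponly"] = True  # assumption: hide from page scripts
    cookie[cookie_name]["secure"] = True  # assumption: only send over HTTPS
    return cookie[cookie_name].OutputString()
```

The returned string is the value for a Set-Cookie response header. On later requests the tool would look the key up in its own storage instead of trusting anything held in the cookie itself.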
Default tool memory limits for Web service jobs
- Grid Engine: 4GiB
- Kubernetes: 2GiB for most runtimes (Java's limit is 4GiB).
Requesting additional tool memory
Currently tool memory limits can only be adjusted for Grid Engine Web services (T183436).
- Request more tool memory by opening a Phabricator task
- Notify the #wikimedia-cloud IRC channel on Freenode that you have filed a request.
A Cloud Services administrator will review your request and can create a /data/project/.system/config/$TOOLNAME.web-memlimit configuration file that will adjust the limit.
Response buffering
An Nginx proxy sits between your webservice and the user. By default this proxy buffers the response sent from your server. For some use cases, including streaming large quantities of data to the browser, this can be undesirable. Buffering can be disabled on a per-request basis by sending an X-Accel-Buffering: no header in your response.[1]
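As a rough sketch of what this looks like in a Python tool, the plain WSGI application below streams a few chunks and sets the header on its response. The chunk contents and the sleep are placeholders for real work; only the X-Accel-Buffering header comes from the text above.

```python
import time


def app(environ, start_response):
    """WSGI app that streams chunks and asks the proxy not to buffer them."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("X-Accel-Buffering", "no"),  # disable Nginx buffering for this response
    ]
    start_response("200 OK", headers)

    def stream():
        for i in range(3):
            yield ("chunk %d\n" % i).encode("utf-8")
            time.sleep(0.1)  # stand-in for slow work between chunks

    return stream()
```

Because the header is sent per response, other routes in the same tool can keep the default buffering behavior.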
Grid Engine and Kubernetes backends
Toolforge provides two different execution environments for web services: Grid Engine and Kubernetes.
Toolforge administrators recommend that you try using Kubernetes first for new tools and only use the Grid Engine backend if there is a technical limitation that prevents the tool from using Kubernetes.
The Kubernetes backend provides more modern software versions and will eventually be the default environment. The main drawback is that Kubernetes Web services can not spawn additional jobs on the job grid.
The following stay the same on both backends:
- Your workflow: you still ssh in, use become, and hack on code as usual
- Logging access: locations are the same
- Replica DB / Dumps access
Grid Engine
The Grid Engine backend runs your Web service as a grid job on a Debian Stretch grid exec node. This is similar to the way that jsub runs any grid job you submit, but there is a separate exec queue on the grid for running jobs started by webservice.
Kubernetes
Kubernetes (k8s) is a platform for running containers that is slowly replacing the Grid Engine in Toolforge. Kubernetes Web services have access to newer versions of most software than the grid engine provides. K8s also provides a more robust system for restarting tools manually or automatically following an application crash.
User visible differences from GridEngine based Web services
- Each process runs inside a Docker container, orchestrated by Kubernetes.
- Provides better resource isolation (one tool can not take down other tools by consuming all RAM or CPU)
- Better health checking (monitoring built into Kubernetes, not a hack we wrote)
- Less complex proxy setup, leading to fewer proxy related outages / issues
- Containers based on Debian Jessie
- Newer software versions than those available with Ubuntu Trusty or Precise
- Better support from Wikimedia TechOps team
- Less NFS surface exposed
  - /home is not mounted for web services
  - No /shared - use /data/project/shared instead. The latter works on both Grid Engine and Kubernetes. The former works only on Grid Engine.
  - /public/dumps and /data/scratch are mounted the same way
- It is not possible to interact with the Grid Engine from Kubernetes (no jsub ...)
- Kubernetes backend has specific webservice options:
-m MEMORY, --mem MEMORY
    Set higher Kubernetes memory limit
-c CPU, --cpu CPU
    Set a higher Kubernetes cpu limit
-r REPLICAS, --replicas REPLICAS
    Set the number of pod replicas to use
Switching between GridEngine and Kubernetes
You can switch between the backends to make sure your code works on both of them.
From GridEngine to Kubernetes
webservice --backend=gridengine stop
webservice --backend=kubernetes <type> start
From Kubernetes to GridEngine
webservice --backend=kubernetes stop
webservice --backend=gridengine start
Configuring a default backend
You can choose a default backend for your tool by creating a $HOME/.webservicerc configuration file.
To set your tool's default to Kubernetes, use this syntax:
[Default]
--backend=kubernetes
To set your tool's default to Grid Engine, use this syntax:
[Default]
--backend=gridengine
Default web server (lighttpd + PHP)
See: Help:Toolforge/Web/Lighttpd
PHP
The lighttpd webservice type includes support for running PHP scripts from files with a .php extension in $HOME/public_html using a FastCGI helper process.
Use webservice --backend=kubernetes php7.2 start|stop|restart|shell to run a PHP based webservice on Kubernetes. If you need to, you can also use the legacy php5.6 version. See Kubernetes PHP documentation for more details.
Python (uWSGI)
uWSGI is a Web Server Gateway Interface server for Python2 and Python3 web applications. It is commonly used to run applications built with Flask, Django, or other Python web application frameworks.
webservice --backend=kubernetes python3.7
- Python3.7 with a default uwsgi configuration
webservice --backend=kubernetes python3.5
- Python3.5 with a default uwsgi configuration (deprecated)
webservice --backend=kubernetes python
- Python3.4 with a default uwsgi configuration (deprecated)
webservice --backend=kubernetes python2
- Python2 with a default uwsgi configuration (deprecated)
webservice --backend=gridengine uwsgi-python
- Python2 on Grid Engine with a default uwsgi configuration
webservice --backend=gridengine uwsgi-plain
- Python2 or Python3 on Grid Engine with a user supplied uwsgi configuration
Default uwsgi configuration
The uwsgi-python, python3.7, python3.5, python, and python2 types share a common uWSGI configuration designed to make it easy to deploy a typical Python webservice. This uses a convention over configuration design with the following expectations:
- Your application will have a wsgi entry point in $HOME/www/python/src/app.py in a variable named app.
- Python libraries will be loaded from a virtualenv located in $HOME/www/python/venv.
- Custom configuration for uWSGI in ini file form will be loaded from $HOME/www/python/uwsgi.ini
  - Examples of configuration parameters can be found in the uWSGI manual.
  - Headers can be added using route = .* addheader:Access-Control-Allow-Origin: *
- Logs will be written to $HOME/uwsgi.log
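For illustration, a minimal $HOME/www/python/src/app.py that satisfies the first expectation could look like the sketch below. It uses a plain WSGI callable so that it has no dependencies beyond the standard library; the response text is a placeholder.

```python
# Sketch of $HOME/www/python/src/app.py
def app(environ, start_response):
    """WSGI entry point; the default configuration expects the name `app`."""
    body = b"Hello from my tool\n"  # placeholder response body
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Frameworks such as Flask produce an application object that is itself a WSGI callable, so assigning that object to a module-level variable named app should serve the same purpose.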
python3.7 (Python3 + Kubernetes)
webservice --backend=kubernetes python3.7 start|stop|restart|shell
See Default uwsgi configuration for general information.
This is running Python3.7 with virtualenv support - you must use a virtualenv for installing your libraries.
Using virtualenv with webservice shell
You need to set up and use a new virtualenv. You can do so with the following:
For new projects
First, set up your Python code so that your app.py file lives under ~/www/python/src. Then:
1. webservice --backend=kubernetes python3.7 shell
2. mkdir -p ~/www/python
3. python3 -m venv ~/www/python/venv (on a Toolforge bastion, use virtualenv -p python3 venv)
4. source ~/www/python/venv/bin/activate
5. pip install --upgrade pip (This brings in newest pip, which is required for wheel support)
6. Install the libraries you need (e.g. pip install -r ~/www/python/src/requirements.txt)
7. exit out of webservice shell
8. webservice --backend=kubernetes python3.7 start
For Python2 projects, use python2 -m virtualenv in step 3.
Moving an existing project
If you are already running a Python3 Web service using uwsgi-plain on the job grid:
- Make a backup of your current venv:
mv ~/www/python/venv ~/www/python/venv.gridengine
- Move your uwsgi.ini file away as well:
mv ~/www/python/uwsgi.ini ~/www/python/uwsgi.ini.gridengine
- Follow the instructions in "For new projects" above
- Before doing webservice --backend=kubernetes python start, you have to do a webservice --backend=gridengine stop
- To switch back to gridengine, you can do:
mv ~/www/python/venv ~/www/python/venv.k8s
mv ~/www/python/venv.gridengine ~/www/python/venv
mv ~/www/python/uwsgi.ini.gridengine ~/www/python/uwsgi.ini
webservice --backend=kubernetes stop
webservice --backend=gridengine uwsgi-plain start
The fundamental thing to remember is that virtualenvs created straight on the bastion work only with gridengine, and virtualenvs created inside webservice shell work only with kubernetes.
Once you are done migrating and are happy with it, you can delete your venv & uwsgi.ini backups.
Installing numpy / scipy / things with binary dependencies
If your package with binary dependencies has a manylinux1 wheel, you can directly install it with pip quickly and with minimum hassle. You can check if your package has a manylinux1 wheel by:
- Go to https://pypi.python.org/pypi
- Search for your package name in top right
- Find it in the list and click on it
- Look for packages that end in the string:
cp34-cp34m-manylinux1_x86_64.whl
- If it exists, then this package is installable with a binary wheel!
You can install it by:
webservice --backend=kubernetes python shell
source ~/www/python/venv/bin/activate
pip install --upgrade pip (This brings in newest pip, which is required for wheel support)
pip install $packagename
Tada! You only need to do the pip install --upgrade pip once; after that you can install manylinux1 packages easily.
Note that this only applies if you are using a package with binary dependencies. Most python packages do not have binary dependencies (are pure python) and do not need this!
Python/Python3.5 (Python3 + Kubernetes)
This works mostly like python3.7, but for Python 3.4 and 3.5 respectively. These are outdated versions of Python that are no longer supported upstream and should not be used for new tools.
Python2 (Python2 + Kubernetes)
webservice --backend=kubernetes python2 start|stop|restart|shell
See Default uwsgi configuration for general information.
uwsgi-python (Python2 + Grid Engine)
webservice --backend=gridengine uwsgi-python start|stop|restart
See Default uwsgi configuration for general information. Python 3 is not supported by this type, but see the section on uwsgi-plain below for an alternative.
uwsgi-plain (Python3 + Grid Engine)
webservice --backend=gridengine uwsgi-plain start|stop|restart
The uwsgi-plain type leaves configuration of the uWSGI service up to the tool's $HOME/uwsgi.ini configuration file. This allows users with unique requirements to tune the uWSGI service to work with their application. One reason to use this is if you must run a Python3 webservice on Grid Engine. A working config for a Python3 Flask app is documented in Phabricator task T104374.
Using a uwsgi app with a default entry point that is not app.py
The default uwsgi configuration for the uwsgi webservice backend expects to find the uwsgi entry point as the variable app loaded from the $HOME/www/python/src/app.py module. If your application has another entry point, the easiest thing to do is create a $HOME/www/python/src/app.py module, import your entry point, and expose it as app. See Making a Django app work for an example of this pattern.
Making a Django app work
There is an issue that may currently need a workaround for Django: using utf8mb4 character set and collation on your tables may cause issues with the length of unique indexes, for instance when using python-social-auth or in your own models that have unique indexes. Using utf8 may cause errors when inserting 4-byte UTF-8 characters. See the issue for specific workarounds.
Setting up
By default your app.py should be in ~/www/python/src/ and contain:
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "<YOUR-TOOL-NAME>.settings")
app = get_wsgi_application()
To correctly locate the static files, place a uwsgi.ini file at ~/www/python/uwsgi.ini and add this setting:
[uwsgi]
check-static = /data/project/<YOUR-TOOL-NAME>/www/python
and in settings.py use:
STATIC_URL = '/<YOUR-TOOL-NAME>/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'static')
Then deploy your static files into ~/www/python/static
Logs
You can find the logs in ~/uwsgi.log on both platforms
node.js web services
webservice --backend=kubernetes node10 start|stop|restart|shell
webservice --backend=gridengine nodejs start|stop|restart
Node.js can run fairly well on Toolforge including with websocket support. Using --backend kubernetes node10 is recommended so your code is executed with an up-to-date version of node (v10.15.2 as of November 2019). The Grid Engine backend provides an older version of node (v8.11.1 as of November 2019).
- Put your node application in $HOME/www/js in your tool's home directory. This directory path is hardcoded.[1]
- Make sure your server starts up properly when npm start is executed. The default way to do this is to name your main script server.js
- Your server should bind to a port that is passed in as an environment variable (PORT). You can access this via process.env.PORT. Without this your tool will not be found by the Nginx proxy.
- Run webservice --backend=kubernetes node10 start to start your webserver (or webservice --backend=kubernetes node10 restart to restart it after a code change)
- Find your container's name by running kubectl get pods and use that name to check your container's logs: kubectl logs -f $MY_CONTAINER_NAME
- PROFIT! :)
This is example code for a node.js web server running as a tool:
var http = require('http');
// IMPORTANT: you have to use this environment variable as the port!
var port = parseInt(process.env.PORT, 10);
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(port);
Keeping this in $HOME/www/js/server.js and doing a webservice --backend kubernetes node10 start should work; you may first need to create $HOME/www/js/package.json containing the text
{
    "scripts": {
        "start": "node server.js"
    }
}
Running npm with webservice shell
To use an up-to-date version of node, e.g. for installing dependencies, run:
webservice --backend=kubernetes node10 shell
cd $HOME/www/js
npm install
Troubleshooting
- If you run into errors doing npm install, try LINK=g++ npm install
- If you can't access the kubectl executable, could it be that you started a webservice shell and didn't exit it?
Java
Tomcat
webservice --backend=gridengine tomcat start|stop|restart
Before using Tomcat, you have to set up Tomcat by running setup-tomcat. This will create a local Tomcat installation at $HOME/public_tomcat/.
To deploy a Web Application Archive (WAR), move it to $HOME/public_tomcat/webapps/$TOOL.war where $TOOL is the name of your tool. Archive extraction, deployment, and configuration are done automatically by Tomcat. A Tomcat restart may be required. The application will be available at tools.wmflabs.org/$TOOL/.
To test the Tomcat webservice, you can use the Tomcat sample application (available on tomcat.apache.org).
When reading Tomcat tutorials, it is helpful to know that $CATALINA_HOME under our configuration is the $HOME/public_tomcat directory created by setup-tomcat. The default Tomcat classloader will read jar files such as a MySQL JDBC driver jar that are placed in $HOME/public_tomcat/lib (i.e. $CATALINA_HOME/lib).
Troubleshooting
If your Java application is more complex, the standard memory settings might not work. You might get errors like "There is insufficient memory for the Java Runtime Environment to continue" and Tomcat will simply stop working. See Help:Toolforge/Web § Memory limit for instructions on getting the runtime memory limit increased.
The settings for the JVM can be modified in public_tomcat/bin/setenv.sh. If the memory setting from JAVA_OPTS is too low, you'll get the well-known OutOfMemoryError from Java. In some cases, Tomcat may no longer stop following an OOM error. Killing the Grid Engine job using qdel may be your only solution.
Play and similar JVM-based frameworks
webservice --backend=kubernetes jdk11 start|stop|restart|shell BINARY
Play Framework projects (and other JVM-based projects that have one executable to start the application) can be run on Toolforge. Play Framework uses JDK 8, so we need to use Kubernetes.
In order to work on Toolforge, the following Play configuration changes need to be made:
# Secret key
# ~~~~~
# The secret key is used to secure cryptographics functions.
# If you deploy your application to several instances be sure to use the same key!
# On Toolforge, we will make a startup script that specifies play.crypto.secret
# using a command line option reading from a private file.
play.crypto.secret="changeme"
# Port
# ~~~~~
# On WMF Toolforge, the port used by kubernetes webservice is 8000/TCP
http.port=8000
# HTTP context
# ~~~~~
# Your tool will be available at https://tools.wmflabs.org/$TOOLNAME/.
# Play usually expects to be operating at the root of a domain, so this setting is
# required for routes to work properly.
play.http.context="/$TOOLNAME/"
The application secret can be stored in a private file with 440 permissions.
After building the project, start your webservice using:
webservice --backend=kubernetes jdk11 start '$EXECUTABLE -Dplay.crypto.secret="$(cat /data/project/$TOOLNAME/app_secret)"'
For more details, see User:Sn1per/Play on Tool Labs.
Other / generic web servers
You can easily run other web servers that are not directly supported. This can be accomplished using the generic webservice type on the Grid Engine backend or a runtime specific type on the Kubernetes backend.
webservice --backend=gridengine generic start|stop|restart SCRIPT
webservice --backend=kubernetes golang start|stop|restart|shell SCRIPT
webservice --backend=kubernetes jdk11 start|stop|restart|shell SCRIPT
webservice --backend=kubernetes ruby2 start|stop|restart|shell SCRIPT
To start a webserver that is launched by a script at /data/project/toolname/code/server.bash, you would launch it with:
$ webservice --backend=gridengine generic start /data/project/toolname/code/server.bash
Your script will be passed an HTTP port to bind to in an environment variable named PORT. This is the port that the Nginx proxy will forward requests for https://tools.wmflabs.org/YOUR_TOOL to.
Note that your tool will receive URLs that include your tool prefix - e.g. /YOUR_TOOL/index.html instead of /index.html. You may need to adapt your tool configuration to handle this.
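The two requirements above (bind to PORT, cope with the tool prefix in URLs) can be sketched with Python's standard library. This is only an illustration: TOOL_NAME, strip_tool_prefix, Handler, and run are hypothetical names, and the prefix handling ignores query strings for brevity.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOL_NAME = "toolname"  # hypothetical tool name, replace with your own


def strip_tool_prefix(path, tool_name=TOOL_NAME):
    """Turn /toolname/index.html into /index.html; leave other paths alone."""
    prefix = "/" + tool_name
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        local_path = strip_tool_prefix(self.path)
        body = ("You requested " + local_path + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run():
    """Bind to the port handed over in the PORT environment variable."""
    port = int(os.environ["PORT"])
    HTTPServer(("", port), Handler).serve_forever()
```

A launcher script passed to webservice would then start this module and call run(); the block deliberately does not call it itself so that it can be imported and tested.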
HHVM (experimental)
It is possible to run HHVM in proxygen mode as a generic webservice. Bryan Davis provided the following script (with some tweaks):
hhvm-webservice.sh
#!/usr/bin/env bash
# Run an HHVM webservice
#
# usage: webservice --backend=gridengine generic start hhvm-webservice.sh

set -e

TOOLNAME=${USER#tools.}

if [[ -z $PORT ]]; then
    echo "PORT environment variable not set." >&2
    echo "usage: webservice --backend=gridengine generic start $0" >&2
    exit 1
fi

/bin/cat << EOF > ${HOME}/hhvm-webservice.ini
; Do not edit this file directly! Edit hhvm-webservice.sh instead.
date.timezone = UTC
hhvm.enable_obj_destruct_call = true
hhvm.enable_zend_compat = true
hhvm.error_handling.call_user_handler_on_fatals = true
hhvm.hack.lang.iconv_ignore_correct = true
hhvm.jit = true
hhvm.log.always_log_unhandled_exceptions = true
hhvm.log.native_stack_trace = false
hhvm.log.runtime_error_reporting_level = "HPHP_ALL ^ E_NOTICE"
hhvm.log.use_syslog = false
hhvm.pcre_cache_type = lru
hhvm.pid_file =
hhvm.repo.central.path = /tmp/hhvm-webservice.${TOOLNAME}/hhvm.hhbc
hhvm.server.apc.expire_on_sets = true
hhvm.server.apc.expire_on_sets = true
hhvm.server.apc.purge_frequency = 4096
hhvm.server.apc.table_type = concurrent
hhvm.server.apc.ttl_limit = 172800
hhvm.server.dns_cache.enable = true
hhvm.server.dns_cache.ttl = 300
hhvm.server.exit_on_bind_fail = true
hhvm.server.port = ${PORT}
hhvm.server.source_root = ${HOME}/public_html
hhvm.server.stat_cache = true
hhvm.server.thread_count = 4
hhvm.server.type = proxygen
hhvm.virtual_host[default][rewrite_rules][0][pattern] = "^/${TOOLNAME}(.*)\$"
hhvm.virtual_host[default][rewrite_rules][0][qsa] = true
hhvm.virtual_host[default][rewrite_rules][0][to] = "\$1"
hhvm.log.file=${HOME}/hhvm-webservice.log
error_log=${HOME}/hhvm-webservice-error.log
hhvm.hack.lang.look_for_typechecker = false

; Tweakable configuration
max_execution_time = 60
memory_limit = 128M
hhvm.log.use_log_file = true
hhvm.log.level = Warning

; Repo Authoritative is disabled by default. Uncomment the following if you want to use it (don't forget to actually generate and deploy it!)
;hhvm.repo.authoritative = true

; Uncomment the following if you decided to use the .hh extension instead of .php
;hhvm.server.default_document = index.hh
EOF

exec /usr/bin/hhvm -m server -c ${HOME}/hhvm-webservice.ini
Copy the contents and save it as $HOME/hhvm-webservice.sh. Then start the HHVM process using webservice --backend=gridengine generic start $HOME/hhvm-webservice.sh.
This has been tested and works. However, this is just an experimental implementation and not recommended for production bots, especially since the proxygen mode has some drawbacks:
- Documentation for configuring HHVM's proxygen webserver is lacking upstream. Information can be found, but it requires a lot of digging.
- No obvious support for alias configuration to easily map https://tools.wmflabs.org/my-tool-name/ to the tool's $HOME/public_html. This can be worked around using hhvm.virtual_host[default][rewrite_rules] settings.
- Multiple index files (index.php and index.html for example) are not supported yet (hhvm.server.default_document can be set only once; if set multiple times, only the last instance is used).
Running Hack-coded files
The hhvm.hack.lang.look_for_typechecker option in the above script has been set to false in order to run Hack files without the "Typechecker not running" error. Please don't run hh_client on the Bastion or Grid servers; use your own HHVM installation instead.
Further information
- Feature request to support multiple default documents (the developers don't have plans to implement it for now).
- Proxygen and FastCGI modes at the HHVM documentation.
References
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud, the bridged Telegram group, or the bridged Mattermost channel
- Discuss via email after you have subscribed to the cloud@ mailing list