History of job queue runners at WMF

This page documents the operational history of job queue runners at WMF. A jobrunner is a service that continuously processes items from MediaWiki's job queue.

jobs-loop.sh

Used from 2009 until 2015.

Initial stack:

Management: Init service (Debian package wikimedia-job-runner)
Service: jobs-loop.sh (SVN mediawiki-core@wmf-deployment)
Orchestration: nextJobDB.php (mediawiki-core@1.10)
Runner: runJobs.php (mediawiki-core@1.10)
Queue store: JobQueueDB
Backend: mc hosts.

The nextJobDB.php script used ad-hoc Memcached logic to orchestrate and aggregate state across the wiki farm.

In 2011, the jobs-loop.sh script was moved from the wmf-deployment SVN branch of mediawiki-core to the WikimediaMaintenance extension, where it remained for a while.

In 2012, the "wikimedia-job-runner" Debian package and the jobs-loop.sh script were both folded into Puppet.

In 2013, Redis support was developed and the job queue was migrated to it. This involved creating JobQueueRedis, formalising the aggregation logic as JobQueueAggregator with Memcached and Redis implementations, and developing the JobQueueFederated concept. These were deployed later that year.
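
The aggregation idea can be sketched roughly as follows. This is an illustrative in-memory model, not the MediaWiki implementation: the class name ReadyQueueAggregator and its methods are hypothetical, and a plain dict stands in for Memcached or Redis. Each wiki marks a (job type, wiki) pair as ready when it enqueues a job, so a runner can ask one place which queues have work instead of polling every wiki in the farm.

```python
import random
from collections import defaultdict


class ReadyQueueAggregator:
    """Hypothetical in-memory stand-in for a farm-wide 'ready queue' index."""

    def __init__(self):
        # job type -> set of wiki IDs believed to have pending jobs
        self._ready = defaultdict(set)

    def notify_queue_non_empty(self, wiki, job_type):
        """Record that this wiki has at least one pending job of this type."""
        self._ready[job_type].add(wiki)

    def notify_queue_empty(self, wiki, job_type):
        """Record that this wiki's queue for this type has been drained."""
        self._ready[job_type].discard(wiki)

    def ready_queues(self):
        """Return a sorted list of (job_type, wiki) pairs with pending jobs."""
        return sorted((t, w) for t, wikis in self._ready.items() for w in wikis)

    def pick_next(self, rng=random):
        """Pick one ready queue at random, or None if nothing is pending."""
        queues = self.ready_queues()
        return rng.choice(queues) if queues else None


aggregator = ReadyQueueAggregator()
aggregator.notify_queue_non_empty("enwiki", "refreshLinks")
aggregator.notify_queue_non_empty("dewiki", "htmlCacheUpdate")
aggregator.notify_queue_empty("dewiki", "htmlCacheUpdate")
print(aggregator.ready_queues())
```

In this model an orchestrator such as nextJobDB.php would correspond to the pick_next call, with the chosen wiki and job type then handed to the runner.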

Eventual stack:

Management: Init service and cron restarts (puppet)
Service: jobs-loop.sh (puppet)
Orchestration: nextJobDB.php (mediawiki-core@1.23)
Runner: runJobs.php (mediawiki-core@1.23)
Queue store: JobQueueFederated + JobQueueAggregatorRedis + JobQueueRedis (wmf-config)
Backend: rdb10xx hosts.

jobrunner & jobchron

Used from 2015 to 2017.

Overview

Source code: mediawiki/services/jobrunner.git

The job queue provides a means of deferring work that is too expensive to perform in the context of a web request. In our production environment, it does this by using Redis as a shared queue store. MediaWiki instances serving web requests enqueue operations for asynchronous execution, and a special class of app servers called job runners dequeue and execute them. The master process on the job runners is a service implemented in PHP called jobrunner.

The dispatcher configuration option specifies how a batch of jobs will be run. By default, it uses the runJobs.php maintenance script. For Wikimedia specifically, the dispatcher is instead configured to make an HTTP request to an RPC endpoint on localhost (docroot:/rpc/RunJobs.php). This lets it make better use of HHVM, since a command-line invocation would have a higher startup time and no persistent compilation cache.
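
The two dispatch styles described above could look roughly like this. It is a sketch under assumptions, not the jobrunner service's code: the function names, the query parameter names, and the particular runJobs.php options shown are illustrative. Only the /rpc/RunJobs.php path comes from the text above.

```python
from urllib.parse import urlencode


def build_cli_dispatch(wiki, job_type, maxtime, script="maintenance/runJobs.php"):
    """Default style: argv for running one batch via the maintenance script.

    The option names here are assumptions made for illustration.
    """
    return [
        "php",
        script,
        f"--wiki={wiki}",
        f"--type={job_type}",
        f"--maxtime={maxtime}",
    ]


def build_rpc_dispatch(wiki, job_type, maxtime, host="127.0.0.1"):
    """Wikimedia style: URL of a local RPC endpoint that runs one batch.

    The query parameter names are hypothetical.
    """
    query = urlencode({"wiki": wiki, "type": job_type, "maxtime": maxtime})
    return f"http://{host}/rpc/RunJobs.php?{query}"


print(" ".join(build_cli_dispatch("enwiki", "refreshLinks", 30)))
print(build_rpc_dispatch("enwiki", "refreshLinks", 30))
```

The sketch only builds the command and the URL. The difference that matters is what happens next: the first spawns a new PHP process per batch, while the second reuses the already-running web server process and its warm compilation cache.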

Configuration

When the jobrunner service starts, it reads configuration values from /etc/jobrunner/jobrunner.conf. This file is generated by Puppet from puppet:/modules/mediawiki/manifests/jobrunner.pp and puppet:/modules/mediawiki/templates/jobrunner/jobrunner.conf.erb. For an overview of configuration options, see the jobrunner.sample.json file.
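
As a rough illustration of reading a JSON configuration at startup, the sketch below parses a small sample and checks for required keys. The keys and values shown (groups, runners, include, dispatcher) are assumptions made for this example and may not match the real options; jobrunner.sample.json remains the authoritative list. The sample is parsed from a string rather than from /etc/jobrunner/jobrunner.conf so that the example is self-contained.

```python
import json

# Hypothetical configuration, for illustration only.
SAMPLE_CONF = """
{
    "groups": {
        "basic": {"runners": 4, "include": ["*"]},
        "uploads": {"runners": 2, "include": ["webVideoTranscode"]}
    },
    "dispatcher": "php maintenance/runJobs.php"
}
"""


def parse_config(text):
    """Parse JSON configuration text and check that assumed keys are present."""
    config = json.loads(text)
    for key in ("groups", "dispatcher"):
        if key not in config:
            raise ValueError(f"missing required key: {key}")
    return config


def total_runners(config):
    """Sum the runner count over all configured groups."""
    return sum(group["runners"] for group in config["groups"].values())


config = parse_config(SAMPLE_CONF)
print(total_runners(config))
```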

Deployment

Jobrunner is deployed using Scap3 (as of T129148). The deployment directory on the deployment host is: /srv/deployment/jobrunner/jobrunner.

Browse to the deployment directory and get the local repo in the state you want to deploy (e.g. git pull).
Once ready, first run scap deploy-log in one terminal (or screen) to start watching the logs.
In another terminal (or screen), run scap deploy -v "log message here" to start the deployment.
Follow the instructions as deployment reaches each group of servers. Scap will automatically restart services on active jobrunner servers (e.g. those in the primary DC).

Logging and metrics

The jobrunner service logs to /var/log/mediawiki/jobrunner.log.

Metrics are reported to Graphite via statsd.
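
For readers unfamiliar with statsd, a metric is a short text line of the form name:value|type, usually sent over UDP. The sketch below shows that wire format. The metric name is hypothetical, and the host and port are assumptions (8125 is the conventional statsd port); the real metric names emitted by jobrunner are not listed here.

```python
import socket


def format_statsd(name, value, metric_type="c"):
    """Format one statsd line; 'c' is a counter, 'ms' a timer, 'g' a gauge."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Send one statsd line as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric name, for illustration only.
line = format_statsd("jobrunner.example.ok", 1)
print(line)
```

Calling send_statsd(line) would emit the datagram. Because UDP is connectionless, the sender does not wait for statsd to acknowledge it, which keeps metric reporting cheap for the service.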

SRE runbook

For the old version of the Job queue operational tasks and debugging documentation, see revision 1863966 of Job_queue.

EventBus

The current system as of 2017. See MediaWiki JobQueue.