History of job queue runners at WMF
This page documents the operational history of job queue runners at WMF. A jobrunner is a service that continuously processes items from MediaWiki's job queue.
jobs-loop.sh
Used from 2009 until 2015.
Initial stack:
- Management: Init service (Debian package wikimedia-job-runner)
- Service: jobs-loop.sh (SVN mediawiki-core@wmf-deployment)
- Orchestration: nextJobDB.php (mediawiki-core@1.10)
- Runner: runJobs.php (mediawiki-core@1.10)
- Queue store: JobQueueDB
- Backend: mc hosts.
The nextJobDB.php script used ad-hoc Memcached logic to orchestrate and aggregate state across the wiki farm.
In 2011, the jobs-loop.sh script was moved from the wmf-deployment SVN branch of mediawiki-core to the WikimediaMaintenance extension, where it remained for a while.
In 2012, the "wikimedia-job-runner" Debian package and the jobs-loop.sh script were both folded into Puppet.
In 2013, Redis support was developed and the job queue was transitioned to it. This involved creating JobQueueRedis, formalising the aggregation logic as JobQueueAggregator with Memcached and Redis implementations, and developing the JobQueueFederated concept. These were deployed later that year.
Eventual stack:
- Management: Init service and cron restarts (puppet)
- Service: jobs-loop.sh (puppet)
- Orchestration: nextJobDB.php (mediawiki-core@1.23)
- Runner: runJobs.php (mediawiki-core@1.23)
- Queue store: JobQueueFederated + JobQueueAggregatorRedis + JobQueueRedis (wmf-config)
- Backend: rdb10xx hosts.
jobrunner & jobchron
Used from 2015 to 2017.
Overview
Repository: mediawiki/services/jobrunner.git
The job queue provides a means of deferring work that is too expensive to perform in the context of a web request. In our production environment, it does this by using Redis to enable shared access to a queue. MediaWiki instances serving web requests enqueue operations for asynchronous execution, and a special class of app servers called job runners dequeue and execute them. The master process on the job runners is a service implemented in PHP called jobrunner.
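The enqueue/dequeue pattern described above can be sketched as follows. This is a minimal illustration only: an in-memory deque stands in for the shared Redis queue, and every name here (InMemoryJobQueue, push, pop, run_pending, the "exampleJob" type) is hypothetical and is not the MediaWiki or jobrunner API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# A job is a (job type, parameters) pair. Hypothetical shape for illustration.
Job = Tuple[str, dict]


class InMemoryJobQueue:
    """Stand-in for the shared queue store (Redis in production)."""

    def __init__(self) -> None:
        self._jobs: Deque[Job] = deque()

    def push(self, job_type: str, params: dict) -> None:
        # Web request side: record the deferred work and return immediately.
        self._jobs.append((job_type, params))

    def pop(self) -> Optional[Job]:
        # Job runner side: take the oldest pending job, or None if empty.
        return self._jobs.popleft() if self._jobs else None


def run_pending(
    queue: InMemoryJobQueue,
    handlers: Dict[str, Callable[[dict], str]],
) -> List[str]:
    """Dequeue and execute jobs until the queue is empty."""
    results = []
    while True:
        job = queue.pop()
        if job is None:
            break
        job_type, params = job
        results.append(handlers[job_type](params))
    return results


if __name__ == "__main__":
    queue = InMemoryJobQueue()
    # The "web request" defers two expensive operations.
    queue.push("exampleJob", {"page": "Main_Page"})
    queue.push("exampleJob", {"page": "Sandbox"})
    # The "job runner" later drains the queue in insertion order.
    print(run_pending(queue, {"exampleJob": lambda p: "done " + p["page"]}))
```

The point of the split is that push is cheap and returns at once, while the expensive handler only runs later on the job runner side.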
The dispatcher configuration option specifies how a batch of jobs will be run. By default, this uses the runJobs.php maintenance script. For Wikimedia specifically, the dispatcher is instead configured to make an HTTP request to an RPC endpoint on localhost (docroot:/rpc/RunJobs.php). This lets it make best use of HHVM, since a command-line invocation would have a higher startup time and no persistent compilation cache.
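The difference between the two dispatch modes can be sketched as follows. This is an illustration under assumptions, not jobrunner's actual dispatcher format: the function build_dispatch, its mode names, the script path, the URL host and the query parameter names are all hypothetical. Only the runJobs.php script name, its --wiki, --type and --maxtime options, and the /rpc/RunJobs.php endpoint path come from MediaWiki and the text above.

```python
from typing import List, Union
from urllib.parse import urlencode


def build_dispatch(
    mode: str, wiki: str, job_type: str, maxtime: int
) -> Union[List[str], str]:
    """Return what a dispatcher would invoke for one batch of jobs.

    Hypothetical helper: "cli" yields an argv list for the maintenance
    script, "http" yields a URL for the local RPC endpoint.
    """
    if mode == "cli":
        # Default style: a fresh PHP process per batch (slower startup).
        return [
            "php",
            "maintenance/runJobs.php",  # assumed relative path
            f"--wiki={wiki}",
            f"--type={job_type}",
            f"--maxtime={maxtime}",
        ]
    if mode == "http":
        # Wikimedia style: reuse the already-running web runtime.
        # Query parameter names are assumptions for illustration.
        query = urlencode({"wiki": wiki, "type": job_type, "maxtime": maxtime})
        return f"http://localhost/rpc/RunJobs.php?{query}"
    raise ValueError(f"unknown dispatch mode: {mode}")


if __name__ == "__main__":
    print(build_dispatch("cli", "enwiki", "exampleJob", 30))
    print(build_dispatch("http", "enwiki", "exampleJob", 30))
```

In both modes the same batch parameters are passed through; only the transport changes, which is why switching to HTTP avoids the per-batch process startup cost.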
Configuration
When the jobrunner service starts, it reads configuration values from /etc/jobrunner/jobrunner.conf. This file is generated by Puppet from puppet:/modules/mediawiki/manifests/jobrunner.pp and puppet:/modules/mediawiki/templates/jobrunner/jobrunner.conf.erb. For an overview of configuration options, see the jobrunner.sample.json file.
Deployment
Jobrunner is deployed using Scap3 (as of T129148). The deployment directory on the deployment host is: /srv/deployment/jobrunner/jobrunner.
- Browse to the deployment directory and get the local repo in the state you want to deploy (e.g. git pull).
- Once ready, first run scap deploy-log in one terminal (or screen) to start watching the logs.
- In another terminal (or screen), run scap deploy -v "log message here" to start the deployment.
- Follow the instructions as deployment reaches each group of servers. Scap will automatically restart services on active jobrunner servers (e.g. those in the primary DC).
Logging and metrics
The jobrunner service logs to /var/log/mediawiki/jobrunner.log.
Metrics are reported to Graphite via statsd.
SRE runbook
For the old version of the job queue operational tasks and debugging instructions, see revision 1863966 of Job_queue.
EventBus
The current system as of 2017. See Kafka Job Queue.
See also
- JobQueue manual on mediawiki.org, aimed at MediaWiki core devs and extension authors.