Portal:Data Services/Admin/Quarry

From Wikitech
Jump to navigation Jump to search

This page contains administrative documentation for the Quarry service.

Architecture

The service is composed of a web interface and query runners, with a shared database by both to persist state of operations.

Query runners launch SQL queries to Wiki Replicas.

Web interface

A webproxy exists quarry.wmflabs.org which redirects to the web server.

The main service is a nginx server listening on port 80/tcp, which enforces a HTTPS redirection. Then, a uwsgi quarry-specific app is use, which is controlled by the uwsgi-quarry-web.service systemd service file.
The code of this app is hosted at analytics/quarry/web in gerrit (mirrored to https://github.com/wikimedia/analytics-quarry-web), which is git cloned locally to something like /srv/quarry.
Logs produced by the uwsgi proccess can be found in /var/log/syslog with the uwsgi-quarry-web keyword.

Workers

When a query is launched in the web interface by an user, it is submitted via celery to workers/runners.

There is a celery-quarry-worker.service systemd service file which controls the workers.
Again, the code lives in something like /srv/quarry.
Logs produced by the celery worker can be found in /var/log/syslog with the celery-quarry-worker keyword.

Database

The local mariadb database stores state of users and queries for the Quarry service. Is controlled by the mariadb.service systemd service file.

The local redis database stores web sessions and a queue for worker jobs. Is controlled by the redis-instance-tcp_6379.service systemd service file.

Queries

Queries results are stored in NFS at /data/project/quarry/results/. This is something to improve in the future.

In the case of long queries, there is a cronjob in the quarry user of the web node that runs every minute and checks queries running longer than 30 minutes to kill them.
Logs can be found at /var/log/quarry/killer.log.

Deployment

The service is deployed in a Cloud VPS project (called quarry) and is composed of several virtual machines instances.

These VM are usually like this (source of truth is openstack-browser):

quarry-db-01.quarry.eqiad1.wikimedia.cloud
quarry-web-01.quarry.eqiad1.wikimedia.cloud
quarry-worker-01.quarry.eqiad1.wikimedia.cloud
quarry-worker-02.quarry.eqiad1.wikimedia.cloud

The deployment is persisted into operations/puppet.git in several places, specially in:

After the initial git clone by puppet, code deployments are done with simple git fetch as we go:

root@quarry-web-01:/srv/quarry# git fetch && git checkout FETCH_HEAD
root@quarry-worker-01:/srv/quarry# git fetch && git checkout FETCH_HEAD
root@quarry-worker-02:/srv/quarry# git fetch && git checkout FETCH_HEAD

And then restart the affected services:

root@quarry-web-01:~# systemctl restart uwsgi-quarry-web.service 
root@quarry-worker-01:~# systemctl restart celery-quarry-worker.service
root@quarry-worker-02:~# systemctl restart celery-quarry-worker.service

Admin operations

Some well known admin operations of this service.

Get runners current work

In worker nodes, go to /srv/quarry and then:

user@quarry-worker-01:/srv/quarry $ sudo /srv/quarry/venv/bin/celery -A quarry.web.worker inspect active

Stuck query

Sometimes if a query is stuck in running state but not actually running, check the celery logs in workers node to find more info.

Clear running or queued queries

It is possible that queries was running when the database was lock or the runners killed during maintenance window. This will mark all less than 30 minutes ones in running or queued as "killed" state:

UPDATE
query join query_revision on query.latest_rev_id = query_revision.id join query_run on latest_run_id = query_run.id
SET status=3 where (status = 0 or status = 2) and query_run.timestamp <= DATE_ADD(NOW(), INTERVAL -1 HOUR);

See quarry/web/models/queryrun.py for query run numeric status correspondences.

Maintenance mode warning

Planned maintenance windows should be warned to end-users using a maintenance message present in /srv/quarry/config.yaml. It's commanded to add relevant phabricator task as HTML link in the message. Restarting the web service on -web instance is needed after config edit.

Block a user

INSERT INTO user_group (user_id, group_name) VALUES (XXX, "blocked");

See also