Wikidata query service

From Wikitech

Wikidata Query Service is the Wikimedia implementation of a SPARQL server, based on the Blazegraph engine, that serves queries for Wikidata and other data sets. For a more detailed description, see the User Manual.

Hardware

We're currently running on three servers in eqiad (wdqs1003, wdqs1004 and wdqs1005) and three servers in codfw (wdqs2001, wdqs2002 and wdqs2003). The two clusters run in active/active mode (traffic is sent to both), but because of how we route traffic with GeoDNS, the eqiad cluster receives most of the traffic.

Server specs are similar to the following:

  • CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
  • Disk: 800GB raw SSD space, in RAID
  • RAM: 128GB

Monitoring

Icinga group

Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service

WDQS dashboard: http://discovery.wmflabs.org/wdqs/

Data reload procedure

  1. Go to icinga and schedule downtime: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=wdqs2002
  2. Depool: HOME=/root sudo depool
  3. Remove data loaded flag: rm /srv/wdqs/data_loaded
  4. Stop the updater: sudo service wdqs-updater stop
  5. Turn on maintenance: touch /var/lib/nginx/wdqs/maintenance
  6. Stop Blazegraph: sudo service wdqs-blazegraph stop
  7. Prepare data for loading (can be done in advance at any time)
  8. Remove old db: rm /srv/wdqs/wikidata.jnl
  9. Start Blazegraph: sudo service wdqs-blazegraph start, then check that /srv/wdqs/wikidata.jnl has been created.
  10. Check logs: sudo journalctl -u wdqs-blazegraph -f
  11. Load data: bash loadData.sh -n wdq -d /srv/wdqs/dump/data
  12. Restore data loaded flag: touch /srv/wdqs/data_loaded
  13. Start updater: sudo service wdqs-updater start
  14. Check logs: sudo journalctl -u wdqs-updater -f
  15. Reload categories: /usr/local/bin/reloadCategories.sh or, if it needs to be done manually: bash createNamespace.sh categories; bash forAllCategoryWikis.sh loadCategoryDump.sh categories
  16. Wait until the updater catches up
  17. Turn off maintenance: rm /var/lib/nginx/wdqs/maintenance
  18. Repool: HOME=/root sudo pool
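
The command steps above can be collected into a small wrapper, which may help when reviewing the sequence before running it on a host. The following is only a sketch, not an existing WDQS tool: the names run, wdqs_reload_begin, wdqs_reload_finish and DRY_RUN are hypothetical, and it assumes it is invoked from the directory that contains loadData.sh. By default it only prints the commands. The manual steps (scheduling Icinga downtime, preparing the dump, watching logs, and waiting for the updater to catch up) are deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the reload commands listed above in two functions.
# Nothing runs when this file is loaded; call the functions explicitly.
set -euo pipefail

# run (hypothetical helper): print the command when DRY_RUN is 1
# (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Steps 2-6, 8-9, 11-13 and 15. Steps 1, 7, 10 and 14 stay manual.
wdqs_reload_begin() {
    run env HOME=/root sudo depool            # same effect as: HOME=/root sudo depool
    run rm /srv/wdqs/data_loaded
    run sudo service wdqs-updater stop
    run touch /var/lib/nginx/wdqs/maintenance
    run sudo service wdqs-blazegraph stop
    run rm /srv/wdqs/wikidata.jnl
    run sudo service wdqs-blazegraph start
    # Manual check here: /srv/wdqs/wikidata.jnl should exist again.
    run bash loadData.sh -n wdq -d /srv/wdqs/dump/data
    run touch /srv/wdqs/data_loaded
    run sudo service wdqs-updater start
    run /usr/local/bin/reloadCategories.sh
}

# Steps 17-18. Call only after the updater has caught up (step 16).
wdqs_reload_finish() {
    run rm /var/lib/nginx/wdqs/maintenance
    run env HOME=/root sudo pool
}
```

Calling wdqs_reload_begin with no settings prints each command prefixed with "+ " so the sequence can be reviewed; DRY_RUN=0 wdqs_reload_begin would execute the commands for real. Splitting the procedure into two functions keeps the wait for the updater as an explicit human decision.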

Contacts

If you need more info, talk to User:Smalyshev, User:Gehel or anybody from the mw:Discovery team.

Usage