Dumps/CategoriesRDF

From Wikitech

This page is about the dumps of categories in RDF format. For information about the XML/SQL dumps, including the category-related tables, please see Dumps/XML-SQL Dumps.

Report issues with these dumps in Phabricator under both the Dumps-generation and the Wikidata-Query-Service projects. The dumps are produced for the Wikidata Query Service (WDQS), which ingests them.

These dumps are run from cron, in two kinds of runs:

  • Weekly runs: generate full lists of the categories on each public Wikimedia project, in RDF format
  • Daily runs: generate lists of SPARQL queries that move, delete, and insert the categories that have changed since the previous day
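The relationship between the two kinds of runs can be illustrated with a small sketch. This is a hypothetical illustration, not the production dump script: it assumes a snapshot of the category graph can be modelled as a set of triples of absolute IRIs, that the weekly run serializes the whole set as N-Triples, and that the daily run diffs two consecutive snapshots into SPARQL Update `DELETE DATA` / `INSERT DATA` requests. The function names, the example IRIs, and the `isInCategory` predicate used in the demo are assumptions chosen for illustration; check the real dumps for the exact vocabulary they use.

```python
from typing import List, Set, Tuple

# Hypothetical model: a triple of absolute IRIs (subject, predicate, object).
# Literal objects (e.g. labels) are not handled by this sketch.
Triple = Tuple[str, str, str]


def full_dump(snapshot: Set[Triple]) -> str:
    """Weekly-style output: the whole snapshot as N-Triples, one per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(snapshot))


def _render(triples: Set[Triple]) -> str:
    """Render triples for a SPARQL data block, sorted for stable output."""
    return "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def daily_update(old: Set[Triple], new: Set[Triple]) -> List[str]:
    """Daily-style output: SPARQL Update requests turning `old` into `new`.

    In this simplified model a moved (renamed) category shows up as its
    old triples being deleted and its new triples being inserted.
    """
    queries: List[str] = []
    removed = old - new
    added = new - old
    if removed:
        queries.append("DELETE DATA {\n" + _render(removed) + "\n}")
    if added:
        queries.append("INSERT DATA {\n" + _render(added) + "\n}")
    return queries


if __name__ == "__main__":
    # Hypothetical IRIs and predicate, for illustration only.
    in_cat = "https://www.mediawiki.org/ontology#isInCategory"
    root = "https://example.org/wiki/Category:Root"
    yesterday = {("https://example.org/wiki/Category:Old_name", in_cat, root)}
    today = {("https://example.org/wiki/Category:New_name", in_cat, root)}
    print(full_dump(today))
    for query in daily_update(yesterday, today):
        print(query)
```

Running it prints the one-line full dump for `today`, followed by a `DELETE DATA` request for the old category name and an `INSERT DATA` request for the new one. Diffing whole snapshots keeps the sketch short; a real implementation would more likely work from recorded changes than recompute the full graph every day.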

These dumps run on a snapshot host dedicated to 'misc' dump generation (everything other than the XML/SQL dumps), and read from the database servers designated 'vslow, dumps'.

The dump scripts are in our puppet git repository.

As of early 2019, the daily runs take about 15 minutes to complete and the weekly runs about 2.5 hours.