Dumps/WikidataDumps

From Wikitech

This page is about the Wikidata entity dumps. For information about the XML/SQL dumps, please see Dumps/XML-SQL Dumps.

These dumps currently run once a week via cron.

The dump scripts retry failed jobs up to a maximum number of times. Once that limit is reached, the failure is reported by email to ops-dumps@wikimedia.org.
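The retry-then-report behaviour can be pictured with the minimal Python sketch below. It is not taken from the actual dump scripts: the function name `run_with_retries`, its parameters, and the `notify` callback (a stand-in for sending the email to ops-dumps@wikimedia.org) are all hypothetical, and the real scripts may also wait between attempts.

```python
from typing import Callable


def run_with_retries(job: Callable[[], None], max_retries: int,
                     notify: Callable[[str], None]) -> bool:
    """Run a job, retrying on failure up to max_retries extra times (hypothetical helper).

    Returns True as soon as one attempt succeeds. If every attempt fails,
    calls notify() once with a description of the last error (standing in
    for the email to ops-dumps@wikimedia.org) and returns False.
    """
    last_error = None
    # One initial attempt plus up to max_retries retries.
    for _attempt in range(1 + max_retries):
        try:
            job()
            return True
        except Exception as exc:  # assumption: any exception counts as a job failure
            last_error = exc
    notify(f"dump job failed after {max_retries} retries: {last_error!r}")
    return False
```

With `max_retries=3`, a job that fails twice and then succeeds returns `True` without calling `notify`; a job that always fails is attempted four times and produces exactly one report.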

Issues with the Wikidata entity dumps should be filed in Phabricator and tagged with both the Dumps-generation and Wikidata projects.

Because Wikidata grows so quickly, these dumps can take noticeably longer to finish over a period of a few months. One week's jobs should not still be running when the next week's jobs start. Several jobs run in parallel; this number could in principle be increased, but the DBAs should approve the change first, since more parallel jobs means more concurrent requests to the database.
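As a rough illustration of why the degree of parallelism matters to the DBAs, the Python sketch below runs a list of jobs with a fixed cap on how many run at once. It assumes each job issues at most one database request at a time, so the cap also bounds concurrent database load. The names `run_in_parallel` and `max_parallel` are hypothetical, and threads are used only for illustration; the real dump scripts may launch separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_in_parallel(jobs: Sequence[Callable[[], T]], max_parallel: int) -> List[T]:
    """Run jobs with at most max_parallel executing at any moment (hypothetical helper).

    If each job holds at most one database request open at a time (an
    assumption), max_parallel is also the ceiling on concurrent database
    requests, which is why raising it needs DBA approval.
    Results are returned in the same order as the jobs were given.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # The pool never runs more than max_workers jobs simultaneously;
    # the remaining jobs wait in its queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]
```

Raising `max_parallel` shortens the total run time only as long as the database can keep up, which is the trade-off the DBAs need to weigh.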

These dumps run on a snapshot host dedicated to 'misc' dump generation (everything other than the XML/SQL dumps), and they query database servers designated 'vslow, dumps'.

The dump scripts are kept in our puppet git repository.