We want mirrors! For more information see Dumps/Mirror status.
For a list of various information sources about the dumps, see Dumps/Other information sources.
- For documentation on the "adds/changes" dumps, see Dumps/Adds-changes dumps.
- For downloading older media dumps, go to archive.org (see Dumps/Archive.org for details).
- For current dumps issues, see the Dumps-generation project in Phabricator.
- See Dumps/Known issues and wish list for a much older wishlist.
- For current redesign plans and discussion, see Dumps/Dumps 2.0: Redesign.
- For historical information about the dumps, see Dumps/History.
- For info on HTML dumps, see dumpHTML.
The following info is for folks who hack on, maintain and administer the dumps and the dump servers.
Rather than bore you with that here, see Dumps/Current Architecture.
Adding a new snapshot host
Install and add to site.pp in the snapshot stanza (see snapshot1005-7). Add the relevant hiera entries, documented in site.pp, according to whether the server will run en wiki dumps (only one server should do so), or misc cron jobs (one host should do so, not the same host running en wiki dumps).
Dumps run out of /srv/deployment/dumps/dumps/xmldumps-backup on each server. Deployment is done via scap3 from the deployment server.
Starting dump runs
- Do nothing. These jobs run out of cron.
The dumps code is all in the repo /operations/dumps.git, branch 'master'. Various supporting scripts that are not part of the dumps proper are in puppet; you can find those in the snapshot module.
Getting a copy as a committer:
git clone ssh://<user>@gerrit.wikimedia.org:29418/operations/dumps.git
cd dumps
git checkout master
ssh to the deployment host.
- cd /srv/deployment/dumps/dumps
- git pull
- scap deploy
Note: you likely need to be in the ops LDAP group to run the scap deploy. Also note that deployed changes will not take effect until the next dump run; any current run uses the existing dump code to complete.
Fixing configuration files
Configuration file setup is handled in the snapshot puppet module. You can check the config files themselves at /etc/dumps/confs on any snapshot host.
Out of space
If the hosts serving the dumps run low on disk space, you can reduce the number of backups that are kept. Change the value for 'keep' in the configuration files in puppet to a lower number.
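Before lowering 'keep', it can help to see which runs a smaller value would leave outside the kept set. The following is a minimal sketch, not the dumps cleanup code itself: it assumes each wiki has a directory containing one subdirectory per run named by date (YYYYMMDD), and the function name runs_beyond_keep is hypothetical. It only lists directories and never deletes anything.

```python
import re
from pathlib import Path

# Assumption: run directories are named by date, e.g. 20240201.
DATE_DIR = re.compile(r"^\d{8}$")


def runs_beyond_keep(wiki_dir, keep):
    """Return names of run directories older than the newest `keep` runs.

    Hypothetical helper for previewing the effect of a lower 'keep' value.
    Results are sorted oldest first. Nothing is removed from disk.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    runs = sorted(
        entry.name
        for entry in Path(wiki_dir).iterdir()
        if entry.is_dir() and DATE_DIR.match(entry.name)
    )
    # runs[:-0] would be empty, so handle keep == 0 separately.
    return runs if keep == 0 else runs[:-keep]
```

Point it at a single wiki's directory on the host that serves the dumps; the actual pruning is still done by the dumps jobs according to the value in puppet.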
The dumps can break in a few interesting ways.
- They no longer appear to be running. Is the monitor running? See below. If it is running, perhaps all the workers are stuck on a stage waiting for a previous stage that failed.
- Shoot them all and let the cron job sort it out. You can also look at the error notifications section and see if anything turns up; fix the underlying problem and wait for cron.
- A dump for a particular wiki has been aborted. This may be due to me shooting the script because it was behaving badly, or because a host was powercycled in the middle of a run.
- The next cron job should fix this up.
- A dump on a particular wiki has failed.
- Check the information on error notifications, track down the underlying issue (db outage? MW deploy of bad code? Other?), fix it, and wait for cron to rerun it.
If a dump does not complete successfully, email is ordinarily sent to email@example.com, which is an alias. If you want to follow and fix failures, add yourself to that alias.
Logs are kept of each run. From any snapshot host, you can find the log for a given run at /mnt/data/xmldatadumps/private/<wikiname>/<date>/dumplog.txt. From these you may glean more reasons for the failure.
Logs that capture the rest are available in /var/log/dumps/ and may also contain clues.
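A quick way to skim a dumplog.txt for trouble is to pull out lines that mention likely failure words. This is a rough sketch only: the keyword list is a guess rather than a description of the real log format, and the helper names dumplog_path and suspicious_lines are hypothetical.

```python
from pathlib import Path

# Assumption: these words are a reasonable first filter; adjust to taste.
KEYWORDS = ("error", "fail", "exception", "traceback")


def dumplog_path(wiki, date, base="/mnt/data/xmldatadumps/private"):
    """Build the per-run log path described above for a wiki and run date."""
    return Path(base) / wiki / date / "dumplog.txt"


def suspicious_lines(log_file, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive; undecodable bytes are replaced so a
    partially corrupted log can still be read.
    """
    hits = []
    with open(log_file, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, suspicious_lines(dumplog_path("<wikiname>", "<date>")) with a real wiki name and run date filled in returns the matching lines with their line numbers, which gives you a starting point before reading the full log.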
When one or more steps of a dump fail, the index.html file for that dump includes a notation of the failure and sometimes more information about it. Note that one step of a dump failing does not prevent other steps from running unless they depend on the data from that failed step as input.
Monitoring is broken
If the monitor does not appear to be running (the index.html file showing the dumps status is never updated), check which host should have it running (see the hiera host entries for the snapshots and look for the one with monitor: true). The monitor is a service that should be restarted with systemd or upstart, depending on the OS version. Since it normally keeps running on its own, you'll also want to find out what change broke it.
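One way to tell whether the status page has stopped updating is to compare the modification time of index.html against the current time. The sketch below is an assumption-laden illustration: the one-hour threshold is arbitrary, the location of index.html is left to the caller, and the function names are hypothetical.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(path)


def looks_stale(path, max_age_seconds=3600, now=None):
    """Return True if the file has not been modified within the threshold.

    Assumption: one hour without an update is suspicious; pick a value
    that matches how often the monitor normally rewrites the page.
    """
    return seconds_since_update(path, now) > max_age_seconds
```

A stale file only suggests the monitor is not running; confirm on the host with monitor: true before restarting anything.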
You really, really don't want to rerun dumps by hand. These jobs run out of cron. All by themselves. Trust me. Once the underlying problem (bad MW code, unhappy db server, out of space, etc.) is fixed, the rerun will get taken care of.
Okay, you don't trust me, or something's really broken. See Dumps/Rerunning a job if you absolutely have to rerun a wiki/job.