Nova Resource:Deployment-prep/Dumps
Dumps testing in deployment-prep
For information on how to set up instances, see Nova_Resource:Deployment-prep/Dumps/Setup_notes.
What you can test
Currently you can check dumps of very small wikis in beta, or a couple of larger ones if you have the patience. Even these larger ones are tiny compared to the wikis in production, but a complete run for one of them may still take a couple of hours.
How to test
- Ssh into the instance, become root, and then
su - dumpsgen
to become the dumpsgen user with its environment.
- Then
cd /srv/deployment/dumps/dumps/xmldatadumps
to get to the directory with the scripts.
- Decide if you want to test one dumps job or all of them, for a given wiki.
- To run all jobs for, e.g., enwikinews, run
python ./worker.py --configfile /etc/dumps/conf... stuff... copy pasta tomorrow
- To run one job for enwikinews, run
....stuff..
- In either case, output will appear on the console as the run progresses. The run of all jobs for enwikinews should not take more than a couple of minutes.
- If you see an exception from a job, you can run just that job and give the --verbose argument before the wikiname, thus:
...stuff...
I recommend enwikinews as a nice small wiki for testing.
For testing all the dumps capabilities, you'll want to run the stubs and metahistory jobs for a large wiki, e.g. enwiki. These take a few minutes each. You can run those by...
After you have finished testing, please clean up your run by removing the run directory: .... The next dump run will remove an old run to make sure the dumps don't fill up the disk, but I would prefer that we keep the oldest run around so it can be used for prefetch testing. If a bunch of broken runs are left around, eventually all the good runs will have been removed, and no good page content dumps will be available for prefetch for the next test.
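The cleanup rule above can be sketched as a small helper. This is only an illustration, not part of the dumps scripts. It assumes that each wiki has its own directory containing one subdirectory per run, named by date as YYYYMMDD. The function names list_runs and remove_run are hypothetical, and the real location of the run directories is not given here. The helper defaults to a dry run and refuses to remove the oldest run, since that is the one to keep for prefetch testing.

```python
import re
import shutil
from pathlib import Path

# Assumption: run directories are named by date, YYYYMMDD.
RUN_DIR_PATTERN = re.compile(r"^\d{8}$")


def list_runs(wiki_dir):
    """Return the run directories under wiki_dir, oldest first."""
    wiki_dir = Path(wiki_dir)
    return sorted(
        (p for p in wiki_dir.iterdir()
         if p.is_dir() and RUN_DIR_PATTERN.match(p.name)),
        key=lambda p: p.name,
    )


def remove_run(wiki_dir, run_date, dry_run=True):
    """Remove the run directory for run_date, but never the oldest run.

    Returns the path that was (or would be) removed.
    """
    names = [p.name for p in list_runs(wiki_dir)]
    if run_date not in names:
        raise ValueError(f"no run directory named {run_date}")
    if run_date == names[0]:
        # The oldest run is kept around for prefetch testing.
        raise ValueError("refusing to remove the oldest run")
    target = Path(wiki_dir) / run_date
    if not dry_run:
        shutil.rmtree(target)
    return target
```

Calling remove_run with the path to a wiki's run directories and the date of your test run returns the directory that would be removed. Pass dry_run=False to actually remove it.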