Nova Resource:Dumps
Project Name | dumps |
---|---|
Details, admins/members | openstack-browser |
Monitoring | |
Dumps
Description
This is a project that archives the public datasets generated by Wikimedia.
Purpose
Archive the public Wikimedia datasets.
Anticipated time span
indefinite
Project status
currently running
Contact address
https://groups.google.com/forum/#!forum/wikiteam-discuss
Willing to take contributors or not
not willing
Subject area narrow or broad
broad
Project information
Introduction
This project was created to provide a dedicated space for transferring Wikimedia dump files to the Internet Archive. These dumps were created as a possible backup in case of cluster-wide hardware failure, and they are also often used by researchers and bots. Sometimes these files are generated to allow forking of a Wikimedia project, when many people in that project have aims that differ from the original Wikimedia goal.
More information about the archiving process is available at Nova Resource:Dumps/Archive.org
Data currently being archived
Here is some information, with links, about the data that this project is archiving:
- Wikimedia main database dumps
- Wikimedia incremental dumps
- Wikidata JSON dumps
- Wikimania videos
- OpenStreetMap datasets
Servers
- dumps-N (where N is an integer): Main archiving servers
- dumps-stats: Wikimedia data manipulation, including the dumps above and other material relevant to Wikimedia research.
Storage:
- Before the eqiad migration we used to have a 900 GB quota (hardly sufficient for comfortable work).
- Currently, all heavy operations are conducted on /data/scratch/. We keep to a soft limit of 3 TB of space; this disk usage is always temporary, and the data is deleted once it has been pushed to the Archive.
- Everything is retained locally only for very short periods, just the time needed for packing on archive.org.
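The soft limit described above could be checked with a small script along these lines. This is only a minimal sketch and not the project's actual tooling: the function names (`directory_size`, `over_soft_limit`, `report`) are hypothetical, and it assumes that "3 TB" means decimal terabytes (3 × 10^12 bytes) and that usage is measured by summing file sizes under a directory such as /data/scratch/.

```python
import os

# Assumption: the 3 TB soft limit is counted in decimal terabytes.
SOFT_LIMIT_BYTES = 3 * 10**12


def directory_size(path):
    """Return the total size in bytes of regular files under path.

    Symbolic links are skipped, and files that disappear or cannot be
    read while walking are ignored.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def over_soft_limit(used_bytes, limit_bytes=SOFT_LIMIT_BYTES):
    """Return True when usage is strictly above the soft limit."""
    return used_bytes > limit_bytes


def report(used_bytes, limit_bytes=SOFT_LIMIT_BYTES):
    """Format a one-line usage summary in decimal terabytes."""
    status = "OVER" if over_soft_limit(used_bytes, limit_bytes) else "OK"
    return "{:.2f} TB used of {:.2f} TB soft limit ({})".format(
        used_bytes / 10**12, limit_bytes / 10**12, status
    )
```

Calling `report(directory_size("/data/scratch/"))` would then give a one-line summary that could be logged before starting another large transfer. Because the limit is a soft one, the sketch only reports the status and does not delete anything.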
Links
Our resources:
- Main WikiTeam repository
- Balchivist repository (main archiving infrastructure)
Wikimedia data:
OSM data:
Server admin log
2022-02-12
- 19:38 andrewbogott: rebooting most VMs to pick up new nfs server changes (T301280)
2021-03-10
- 12:43 arturo: briefly stopping VM dumps-5 and dumps-4 to migrate hypervisor
2020-06-24
- 18:34 bstorm: removing files from /data/project/dumps/temp/wikidata and /data/project/dumps/temp/cirrussearch T255628
2020-02-26
- 19:59 jeh: restart dumps-0
2019-06-20
- 14:04 andrewbogott: moving dumps-5 to a new cloudvirt
- 13:58 andrewbogott: moving dumps-4 to a new cloudvirt
2019-04-18
- 20:13 andrewbogott: r... (more)