Nova Resource:Dumps

Project Name dumps

Dumps

Description

This is a project that archives the public datasets generated by Wikimedia.

Purpose

Archive the public Wikimedia datasets.

Anticipated time span

indefinite

Project status

currently running

Contact address

https://groups.google.com/forum/#!forum/wikiteam-discuss

Willing to take contributors or not

not willing

Subject area narrow or broad

broad


Project information

Introduction

This project was created to provide a dedicated space for transferring Wikimedia dump files to the Internet Archive. The dumps were created as a possible backup in case of cluster-wide hardware failure, and they are also often used by researchers and bots. Sometimes the files are also used to fork a Wikimedia project, when many members of that project have aims that differ from the original Wikimedia goal.

More information about the archiving process is available at Nova Resource:Dumps/Archive.org

Data currently being archived

This project is archiving the following data:

  • Wikimedia main database dumps
  • Wikimedia incremental dumps
  • Wikidata JSON dumps
  • Wikimania videos
  • OpenStreetMap datasets

Servers

  • dumps-N (where N is an integer): Main archiving servers
  • dumps-stats: Wikimedia data manipulation, including the dumps above and other data relevant to Wikimedia research.

Storage:

  • Before the eqiad migration we had a 900 GB quota, which was hardly sufficient for comfortable work.
  • All heavy operations currently run on /data/scratch/. We keep to a soft limit of 3 TB of space; this disk usage is always temporary, and the data is deleted once it has been pushed to the Archive.
  • Everything is retained locally only for very short periods, just long enough to pack it and upload it to archive.org.
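
The soft limit described above could be checked with a small script along these lines. This is only a sketch, not the project's actual tooling: the helper names `directory_size` and `over_soft_limit` are hypothetical, and reading "3 TB" as decimal terabytes is an assumption.

```python
import os

# 3 TB soft limit taken from the text; decimal terabytes are an assumption.
SOFT_LIMIT_BYTES = 3 * 10**12


def directory_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Do not follow symlinks, so linked data is not counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may have been deleted while scanning; skip it.
                continue
    return total


def over_soft_limit(used_bytes, limit_bytes=SOFT_LIMIT_BYTES):
    """Return True when usage is strictly above the soft limit."""
    return used_bytes > limit_bytes
```

For example, `over_soft_limit(directory_size("/data/scratch/"))` would report whether the scratch area is above the limit at that moment. Because the limit is soft, the sketch only reports and never deletes anything.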


Server admin log

2021-03-10

  • 12:43 arturo: briefly stopping VM dumps-5 and dumps-4 to migrate hypervisor

2020-06-24

  • 18:34 bstorm: removing files from /data/project/dumps/temp/wikidata and /data/project/dumps/temp/cirrussearch T255628

2020-02-26

  • 19:59 jeh: restart dumps-0

2019-06-20

  • 14:04 andrewbogott: moving dumps-5 to a new cloudvirt
  • 13:58 andrewbogott: moving dumps-4 to a new cloudvirt

2019-04-18

  • 20:13 andrewbogott: rebooting dumps-1 to try to workaround nfs issues

2019-03-14

  • 23:18 bd808: Deleted dumps-stats and bugzilla ([[ph... (more)