Help:Shared storage

From Wikitech

In the Wikimedia Labs cluster, there are a few shared storage directories that can be made available on request. You can request them by filing a task on Phabricator under the Labs project.

Disadvantages of using NFS

Any form of shared storage (including NFS!) is a big single point of failure. If NFS fails or has problems (and sadly we have a bad history of those), your instance becomes unreachable and is often unusable, because load tends to skyrocket. Consider carefully whether you really need NFS, and whether your problem can be solved another way, before requesting that NFS be turned on.

/data/scratch

This is a 'temp' space that is shared across all instances in all projects that have opted in. Any data you put into it can be read by every other instance that has /data/scratch mounted, but by default those instances cannot delete your data. This data is not backed up.

Use this for:

  1. Sharing public large data between instances
  2. 'Temporary' storage / usage that can be purged later

Do not use this for:

  1. Information that should be kept private to your project (credentials, keys, etc)
  2. Information that should be backed up and kept safe (code, data backups, etc)
  3. Files that are actively read from or written to (e.g. databases, logfiles)
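If you are unsure whether a directory you plan to use for a database or logfiles lives on shared storage, the mount table can tell you. The following is a minimal sketch, assuming a Linux instance where /proc/mounts is readable; the function names are hypothetical, and symlinks and escaped characters in mount points are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount point that contains `path`.

    `mounts_text` is the content of /proc/mounts, where each line reads
    "device mountpoint fstype options dump pass".
    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)  # normalizes the path; does not resolve symlinks
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent directory.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The longest matching mount point is the most specific one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if `path` sits on a filesystem whose type starts with "nfs"."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

To use it, read /proc/mounts into a string and pass it in together with the directory in question, for example is_on_nfs("/data/scratch", mounts_text).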

/data/project

This is per-project private space that is shared across all instances in the project only (and not across all instances in all projects, as with /data/scratch). Any data you put in it is visible only to the other instances in your project. This data is backed up.

Use this for:

  1. Backing up important data (database dumps, etc)

Do not use this for:

  1. Storing code / config that is directly run (please store code in git and run it from local storage)
  2. Storing databases / data that is directly manipulated (do not put postgres / mysql / mongo / etc data directories on NFS, and do not do lots of sqlite operations on NFS either)
  3. Logfiles
  4. Temporary storage of large amount of data (use /data/scratch for that instead)
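Taken together, the lists above suggest a pattern: do the work on local storage and copy only the finished result (a database dump, say) to /data/project. A minimal sketch of that last step follows; the function name and the ".partial" suffix are hypothetical, and the directory you pass in is up to you.

```python
import os
import shutil


def publish_backup(local_file, shared_dir):
    """Copy a finished file from local storage into a shared directory.

    The copy is written under a temporary name and renamed at the end, so
    the final file name only appears once the copy has completed.
    """
    os.makedirs(shared_dir, exist_ok=True)
    final_path = os.path.join(shared_dir, os.path.basename(local_file))
    partial_path = final_path + ".partial"
    shutil.copyfile(local_file, partial_path)
    os.replace(partial_path, final_path)  # rename within the same directory
    return final_path
```

The database itself keeps running against local disk; only the completed dump file touches NFS, and only once per backup.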

/home

This is per-project private space shared across all instances in your project only and mounted at /home. This allows you to keep a shared home directory across instances, to keep useful scripts and the like in. Note that enabling this very strongly couples the availability of your instance to NFS: you cannot ssh in when NFS is down. This data is also backed up.

Use this for:

  1. Storing small scripts / .rc files across instances

Do not use this for:

  1. All the things we ask you to *not* use /data/project for

Note that progress is being made in building a simple system to share .rc / convenience scripts that does not involve NFS. You can track that on Task T102173.

/public/dumps

This is a global, read-only share that contains data dumps (such as those found on dumps.wikimedia.org) that can be read for research purposes. These include compressed XML dumps of Wikimedia wikis (Wikipedia, et al.), raw page counts data, Wikidata JSON dumps, and more! You can request this if you are going to be doing significant work with any form of dumps, since this will be easier to access than having to download and process them yourself.
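Because the share is read-only and the dumps are compressed, it is usually easiest to stream a dump rather than unpack it somewhere first. The sketch below assumes a bz2-compressed XML dump in which each page opens with a <page> tag on its own line; the function name is hypothetical, and the path you pass in depends on how the share is laid out.

```python
import bz2


def count_pages(dump_path):
    """Count <page> elements in a bz2-compressed XML dump, line by line."""
    pages = 0
    # "rt" decompresses on the fly and yields text lines, so the whole
    # dump never has to fit in memory or be written out uncompressed.
    with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if "<page>" in line:
                pages += 1
    return pages
```

For anything beyond counting, the same loop can feed an incremental XML parser instead of matching on the tag text.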

/data/project/shared/mediawiki

On Tool Labs, you can access a full checkout of all MediaWiki repositories hosted on gerrit.

This is especially useful to search code across all repositories with commands like ack-grep.
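If you would rather post-process the matches in a script than read ack-grep output, a similar search can be sketched in Python. This is not how ack-grep works internally, just a rough stand-in; the function name and the default ".php" filter are hypothetical.

```python
import os
import re


def search_tree(root, pattern, suffixes=(".php",)):
    """Yield (path, line_number, line) for lines under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git metadata directories so only working-tree files are searched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced rather than aborting the search.
            with open(path, encoding="utf-8", errors="replace") as source:
                for number, line in enumerate(source, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")
```

Calling search_tree("/data/project/shared/mediawiki", r"some_function_name") would walk the whole checkout; the search term here is only a placeholder.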

The checkout should also include the code review notes, from which you can e.g. extract code review statistics.