Swift/Swift Container Name Conventions

From Wikitech

Most files stored in Swift belong to a FileRepo class. Some extensions, such as Math, DPL, and EasyTimeline, may store files directly without going through FileRepo.

Mapping FileRepo virtual URLs to storage paths

FileRepo sometimes uses "virtual URLs" such as "mwrepo://zone/relpath", which map to storage paths. Virtual URLs are URL-encoded, whereas storage paths are not. Virtual URLs are specific to FileRepo and exist for convenience. "mwrepo://<zone>/<rel path>" maps to "mwstore://<name of backend for repo>/<zone>[/<relative zone root dir>]/<rel path>". Typically the root dir for a zone is the whole container, but do not assume this for TempFileRepos.

Examples:

  • "mwrepo://public/f/fe/<somefile>.png" maps to "mwstore://<name of backend for repo>/public/f/fe/<somefile>.png"
  • "mwrepo://thumb/a/ab/<somefile>.jpg/120px-<somefile>.jpg" maps to "mwstore://<name of backend for repo>/thumb/a/ab/<somefile>.jpg/120px-<somefile>.jpg"
  • For a TempFileRepo Y (which derives from a regular FileRepo X), "mwrepo://thumb/d/de/<somefile>.png/80px-<somefile>.png" maps to "mwstore://<name of backend for repo X>/thumb/temp/d/de/<somefile>.png/80px-<somefile>.png"
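The mapping above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the FileRepo implementation: the function name, the `backend_name` argument, and the `zone_roots` dictionary (zone name to relative zone root dir) are all hypothetical.

```python
from urllib.parse import unquote


def virtual_url_to_storage_path(virtual_url, backend_name, zone_roots=None):
    """Map "mwrepo://<zone>/<rel path>" to an "mwstore://" storage path.

    zone_roots is a hypothetical dict of zone name -> relative zone root dir;
    zones that are not listed use the whole container as their root.
    """
    prefix = "mwrepo://"
    if not virtual_url.startswith(prefix):
        raise ValueError("not a virtual URL: %r" % virtual_url)
    zone, _, rel = virtual_url[len(prefix):].partition("/")
    # Virtual URLs are URL-encoded; storage paths are not.
    rel = unquote(rel)
    root = (zone_roots or {}).get(zone, "")
    parts = [backend_name, zone]
    if root:
        parts.append(root)
    if rel:
        parts.append(rel)
    return "mwstore://" + "/".join(parts)
```

For a TempFileRepo-style setup, passing `zone_roots={"thumb": "temp"}` would reproduce the third example, where the thumb zone lives under "thumb/temp" of the parent repo's backend.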

Mapping FileRepo storage paths to Swift URLs

FileRepo uses FileBackend storage paths (which start with "mwstore://"). They are not URL-encoded. These should not be confused with "virtual URLs" like "mwrepo://zone/relpath" (although those do map to storage paths). The mapping to Swift depends on whether the repo is sharded, though both schemes are similar.

Our FileRepo storage paths all have sharding hash directories in them. There are two formats:

  • Public files: <base 16 char0>/<base 16 char0><base 16 char1>
  • Deleted files: <base 36 char0>/<base 36 char1>/<base 36 char2>
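As a rough illustration of the two directory formats, the following sketch builds the hash directories from an already computed hash string. How the hash itself is derived is not covered in this section, so both functions simply take it as input; the function names are hypothetical.

```python
def public_hash_dirs(hex_hash):
    """Return "<char0>/<char0><char1>" for a base-16 hash string."""
    return "%s/%s" % (hex_hash[0], hex_hash[:2])


def deleted_hash_dirs(base36_key):
    """Return "<char0>/<char1>/<char2>" for a base-36 key string."""
    return "/".join(base36_key[:3])
```

For instance, a hex hash beginning with "ab" gives "a/ab", and a base-36 key beginning with "xya" gives "x/y/a".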

Non-sharded repos

Typical storage paths look like "mwstore://<backend>/<repo zone>/[<relative zone root dir>/]<path relative to zone>". Such a path maps to the container "<proj>-<lang>-<repo>-<zone>", with a container-relative path of "[<relative zone root dir>/]<path relative to zone>". Again, the zone root dir is usually not used (meaning zones map directly to containers), but it is used in certain cases. All relative paths include the hash directories (like "d/d1"), but nothing special is done with them here. Deleted files use a different sharding scheme than the others.

Examples:

  • mwstore://local-swift/public/a/ab/<some file>.tiff on Serbian Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-sr-local-public/a/ab/<some file>.tiff"
  • mwstore://local-swift/public/archive/a/ab/<some file>.tiff on Serbian Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-sr-local-public/archive/a/ab/<some file>.tiff"
  • mwstore://local-swift/deleted/x/y/a/<some file>.tiff on Serbian Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-sr-local-deleted/x/y/a/<some file>.tiff"
  • mwstore://local-swift/thumb/c/c1/<some file>.jpg/120px-<some file>.jpg on Finnish Wikisource maps to "msfe/v1/<AUTH stuff>/wikisource-fi-local-thumb/c/c1/<some file>.jpg/120px-<some file>.jpg"
  • mwstore://local-swift/thumb/archive/e/e2/<some file>.svg/120px-<some file>.svg.png on Finnish Wikisource maps to "msfe/v1/<AUTH stuff>/wikisource-fi-local-thumb/archive/e/e2/<some file>.svg/120px-<some file>.svg.png"
  • mwstore://local-swift/thumb/temp/f/fd/<some file>.jpg/120px-<some file>.jpg on Polish Wiktionary maps to "msfe/v1/<AUTH stuff>/wiktionary-pl-local-thumb/temp/f/fd/<some file>.jpg/120px-<some file>.jpg"
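The non-sharded rule can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, `proj` and `lang` are supplied by the caller, the repo name defaults to "local" as in the examples, and the "msfe/v1/<AUTH stuff>" prefix is left out.

```python
def storage_path_to_container(storage_path, proj, lang, repo="local"):
    """Split an "mwstore://" path into (container name, relative path).

    Non-sharded case: the container is "<proj>-<lang>-<repo>-<zone>" and the
    relative path is everything after the zone name.
    """
    prefix = "mwstore://"
    if not storage_path.startswith(prefix):
        raise ValueError("not a storage path: %r" % storage_path)
    # The backend name is dropped; it does not appear in the container name.
    _backend, zone, rel = storage_path[len(prefix):].split("/", 2)
    container = "%s-%s-%s-%s" % (proj, lang, repo, zone)
    return container, rel
```

Joining the two returned values with "/" gives the part of the Swift URL that follows the AUTH component.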

Sharded repos

Typical storage paths look like "mwstore://<backend>/<repo zone>/[<relative zone root dir>/]<shard0>/<shard0><shard1>/<rest of path relative to zone>". Such a path maps to the container "<proj>-<lang>-<repo>-<zone>.<shard0><shard1>", with a container-relative path of "[<relative zone root dir>/]<shard0>/<shard0><shard1>/<rest of path relative to zone>" (this is just everything after the repo zone name). Again, the zone root dir is usually not used (meaning zones map directly to containers), but it is used in certain cases. All relative paths still include the hash directories (like "d/d1"); they are kept in the path even though they also select the container.

Note that deleted files use different sharding than the others. In all cases, we only care about the first two hash directories. The hash directories map to shards in the following manner (two cases):

  • Public files: <base 16 char0>/<base 16 char0><base 16 char1> maps to the shard "<base 16 char0><base 16 char1>".
  • Deleted files: <base 36 char0>/<base 36 char1>/<base 36 char2> maps to the shard "<base 36 char0><base 36 char1>". Note that we ignore the last hash directory/character.

Examples:

  • mwstore://local-swift/public/a/ab/<some file>.tiff on Commons Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-commons-local-public.ab/a/ab/<some file>.tiff"
  • mwstore://local-swift/public/archive/a/ab/<some file>.tiff on Commons Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-commons-local-public.ab/archive/a/ab/<some file>.tiff"
  • mwstore://local-swift/deleted/x/y/a/<some file>.tiff on English Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-en-local-deleted.xy/x/y/a/<some file>.tiff"
  • mwstore://local-swift/thumb/c/c1/<some file>.jpg/120px-<some file>.jpg on English Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-en-local-thumb.c1/c/c1/<some file>.jpg/120px-<some file>.jpg"
  • mwstore://local-swift/thumb/archive/c/c1/<some file>.jpg/120px-<some file>.jpg on English Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-en-local-thumb.c1/archive/c/c1/<some file>.jpg/120px-<some file>.jpg"
  • mwstore://local-swift/thumb/temp/f/fd/<some file>.jpg/120px-<some file>.jpg on Commons Wikipedia maps to "msfe/v1/<AUTH stuff>/wikipedia-commons-local-thumb.fd/temp/f/fd/<some file>.jpg/120px-<some file>.jpg"
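The sharded rule differs from the non-sharded one only in the ".<shard>" suffix on the container name. The sketch below is an illustration, not the production code: the function name and the regular expressions used to locate the hash directories are assumptions, and the repo name again defaults to "local".

```python
import re

# Public-style hash dirs: "<c0>/<c0><c1>/" with base-16 characters.
_HEX_DIRS = re.compile(r"(?:^|/)([0-9a-f])/(\1[0-9a-f])/")
# Deleted-style hash dirs: "<c0>/<c1>/<c2>/" with base-36 characters.
_B36_DIRS = re.compile(r"(?:^|/)([0-9a-z])/([0-9a-z])/[0-9a-z]/")


def sharded_container(storage_path, proj, lang, repo="local"):
    """Return (container name with shard suffix, relative path)."""
    prefix = "mwstore://"
    if not storage_path.startswith(prefix):
        raise ValueError("not a storage path: %r" % storage_path)
    _backend, zone, rel = storage_path[len(prefix):].split("/", 2)
    if zone == "deleted":
        match = _B36_DIRS.search(rel)
        # Only the first two characters count; the third directory is ignored.
        shard = match.group(1) + match.group(2) if match else None
    else:
        match = _HEX_DIRS.search(rel)
        shard = match.group(2) if match else None
    if shard is None:
        raise ValueError("no hash directories in %r" % rel)
    container = "%s-%s-%s-%s.%s" % (proj, lang, repo, zone, shard)
    # The relative path is unchanged: everything after the zone name.
    return container, rel
```

Searching for the hash directories, rather than assuming they come first, lets the same function handle paths with a leading "archive/" or "temp/" component.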

Mapping FileRepo storage paths to old NFS paths

For the old NFS storage, we use FSFileBackend and manually force containers to have certain paths (rather than all having the same parent directory) for backwards compatibility. Typical paths like "mwstore://<backend>/<repo zone>/[<relative zone root dir>/]<path relative to zone>" will map to "<root directory of zone container>/[<relative zone root dir>/]<path relative to zone>". The container roots are derived from the project, language, whether the wiki is private, and the zone in one case (the 'deleted' zone).

NOTICE: For the <lang> portion of NFS paths, a few wikis use exceptional language names that are not equal to the language ($lang global):

  • 'wikimania2005wiki' should use 'wikimania' as <lang> (not wikimania2005)
  • 'otrs_wikiwiki' should use 'otrs_wiki' as <lang> (not otrs-wiki)
  • 'execwiki' should use 'execwiki' as <lang> (not exec)

Zone to Container root mapping for PRIVATE wikis:

  • Private zones (the "deleted" zone):
    • "/mnt/upload6/private/archive/<proj>/<lang>" (full path to the deleted zone; does not have zone in name)
  • "Public" zones (the "public", "thumb", "temp" zones...):
    • public zone: "/mnt/upload6/private/<lang>" (no zone in name)
    • thumb zone: "/mnt/thumbs/private/<lang>/thumb"
    • other zones: "/mnt/upload6/private/<lang>/<zone>"

Zone to Container root mapping for PUBLIC wikis:

  • Private zones (the "deleted" zone):
    • "/mnt/upload6/private/archive/<proj>/<lang>" (full path to the deleted zone; does not have zone in name)
  • "Public" zones (the "public", "thumb", "temp" zones...):
    • public zone: "/mnt/upload6/<site>/<lang>" (no zone in name)
    • thumb zone: "/mnt/thumbs/<site>/<lang>/thumb"
    • other zones: "/mnt/upload6/<site>/<lang>/<zone>"

Examples:

  • mwstore://local-swift/public/a/ab/<some file>.tiff on Commons Wikipedia maps to "/mnt/upload6/wikipedia/commons/a/ab/<some file>.tiff"
  • mwstore://local-swift/public/archive/a/ab/<some file>.tiff on Commons Wikipedia maps to "/mnt/upload6/wikipedia/commons/archive/a/ab/<some file>.tiff"
  • mwstore://local-swift/deleted/x/y/a/<some file>.tiff on English Wikipedia maps to "/mnt/upload6/private/archive/wikipedia/en/x/y/a/<some file>.tiff"
  • mwstore://local-swift/thumb/c/c1/<some file>.jpg/120px-<some file>.jpg on English Wikipedia maps to "/mnt/thumbs/wikipedia/en/thumb/c/c1/<some file>.jpg/120px-<some file>.jpg"
  • mwstore://local-swift/thumb/archive/c/c1/<some file>.jpg/120px-<some file>.jpg on English Wikipedia maps to "/mnt/thumbs/wikipedia/en/thumb/archive/c/c1/<some file>.jpg/120px-<some file>.jpg"
  • mwstore://local-swift/thumb/temp/f/fd/<some file>.jpg/120px-<some file>.jpg on Commons Wikipedia maps to "/mnt/thumbs/wikipedia/commons/thumb/temp/f/fd/<some file>.jpg/120px-<some file>.jpg"
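The zone-to-root tables above can be condensed into a small lookup. This is a sketch of the rules as listed here, not the FSFileBackend configuration itself: the function names are hypothetical, and the caller is assumed to pass the already corrected <lang> value for the exceptional wikis.

```python
def nfs_container_root(zone, site, lang, private=False):
    """Return the NFS root directory for a zone, per the tables above."""
    if zone == "deleted":
        # Same location for private and public wikis; no zone in the name.
        return "/mnt/upload6/private/archive/%s/%s" % (site, lang)
    middle = "private/%s" % lang if private else "%s/%s" % (site, lang)
    if zone == "public":
        return "/mnt/upload6/%s" % middle
    if zone == "thumb":
        return "/mnt/thumbs/%s/thumb" % middle
    return "/mnt/upload6/%s/%s" % (middle, zone)


def storage_path_to_nfs(storage_path, site, lang, private=False):
    """Map an "mwstore://" storage path to an old-style NFS path."""
    prefix = "mwstore://"
    if not storage_path.startswith(prefix):
        raise ValueError("not a storage path: %r" % storage_path)
    _backend, zone, rel = storage_path[len(prefix):].split("/", 2)
    return nfs_container_root(zone, site, lang, private) + "/" + rel
```

For a private wiki such as 'otrs_wikiwiki', the call would use `lang="otrs_wiki"` and `private=True`, following the exceptions listed earlier.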

Relevant WMF config code:

'wgUploadDirectory' => array(
    # Using upload5 since Feb 2009
    # Using upload6 since Jan 2010
    'default' => '/mnt/upload6/$site/$lang',
    'private' => '/mnt/upload6/private/$lang',

    'wikimania2005wiki' => '/mnt/upload6/wikipedia/wikimania', // back compat
    'otrs_wikiwiki' => '/mnt/upload6/private/otrs_wiki',
    'execwiki' => '/mnt/upload6/private/execwiki',
),

$wgLocalFileRepo = array(
    'class' => 'LocalRepo',
    'name' => 'local',
    'directory' => $wgUploadDirectory,
    'url' => $wgUploadBaseUrl ? $wgUploadBaseUrl . $wgUploadPath : $wgUploadPath,
    'scriptDirUrl' => $wgScriptPath,
    'hashLevels' => 2,
    'thumbScriptUrl' => $wgThumbnailScriptPath,
    'transformVia404' => true,
    'initialCapital' => $wgCapitalLinks,
    'deletedDir' => "/mnt/upload6/private/archive/$site/$lang",
    'deletedHashLevels' => 3,
    'thumbDir' => str_replace( '/mnt/upload6', '/mnt/thumbs', "$wgUploadDirectory/thumb" ),
);

Mapping upload URLs to Swift URLs

To reach the correct sharded container, and to handle the fact that the 'public' zone has no 'public' component in the URL, a few translations are needed beyond taking the URI path and appending it to a base path (with AUTH).

Examples (from rewrite.py):

        # (a) http://upload.wikimedia.org/<proj>/<lang>/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-public/.*
        # (b) http://upload.wikimedia.org/<proj>/<lang>/archive/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-public/archive/.*
        # (c) http://upload.wikimedia.org/<proj>/<lang>/thumb/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-thumb/.*
        # (d) http://upload.wikimedia.org/<proj>/<lang>/thumb/archive/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-thumb/archive/.*
        # (e) http://upload.wikimedia.org/<proj>/<lang>/thumb/temp/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-thumb/temp/.*
        # (f) http://upload.wikimedia.org/<proj>/<lang>/temp/.*
        #         => http://msfe/v1/AUTH_<hash>/<proj>-<lang>-local-temp/.*
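Cases (a) through (f) can be sketched as a single function. This is not the code from rewrite.py, only an illustration of the listed translations: the function name, the `account` argument (standing in for "AUTH_<hash>"), and the `base` default are assumptions, and the shard suffix used for sharded containers is deliberately left out.

```python
from urllib.parse import urlsplit


def upload_url_to_swift_url(url, account, base="http://msfe/v1"):
    """Translate an upload URL into a Swift URL (cases (a) to (f) above)."""
    parts = urlsplit(url).path.lstrip("/").split("/", 2)
    if len(parts) < 3:
        raise ValueError("expected /<proj>/<lang>/<path>: %r" % url)
    proj, lang, rest = parts
    # The public zone has no marker in the URL, so it is the default.
    zone = "public"
    for candidate in ("thumb", "temp"):
        if rest.startswith(candidate + "/"):
            zone = candidate
            rest = rest[len(candidate) + 1:]
            break
    return "%s/%s/%s-%s-local-%s/%s" % (base, account, proj, lang, zone, rest)
```

Only the leading "thumb/" or "temp/" component selects a zone; "archive/" and a "temp/" nested under "thumb/" stay in the object path, matching cases (b), (d), and (e).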