upload.wikimedia.org

From Wikitech

upload.wikimedia.org is the hostname used for access to original and scaled-down media files for the Wikimedia projects.

The domain name is kept separate from wikipedia.org to aid browser security (especially around cross-origin requests, cookies, and JavaScript), as well as web performance (through the use of parallel connections).

MediaWiki

When uploading new files from MediaWiki, storage is handled by MediaWiki's FileBackend component (see wmf-config/filebackend). Each wiki has its own Swift bucket, and these are exposed through an Nginx proxy at upload.wikimedia.org. For example, files from English Wikipedia (en.wikipedia.org) are served from upload.wikimedia.org/wikipedia/en/*.
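The path layout above can be sketched in a few lines of Python. This is a minimal illustration, not the FileBackend implementation: the function name `original_file_url` is hypothetical, and the two hash directories (the first one and two hex characters of the MD5 of the underscore-normalised file name) are assumed from MediaWiki's default hashed upload directory layout rather than taken from this page. Title normalisation such as capitalising the first letter is not handled.

```python
import hashlib
from urllib.parse import quote

UPLOAD_HOST = "https://upload.wikimedia.org"


def original_file_url(project: str, lang: str, filename: str) -> str:
    """Build a public URL for an original file (illustrative sketch only).

    Assumption: files sit under /<project>/<lang>/<h>/<hh>/<name>, where
    <h> and <hh> are the first one and two hex digits of the MD5 of the
    file name with spaces replaced by underscores.
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Percent-encode the name so it is safe to use as a URL path segment.
    return f"{UPLOAD_HOST}/{project}/{lang}/{digest[0]}/{digest[:2]}/{quote(name)}"


if __name__ == "__main__":
    # Files from English Wikipedia land under /wikipedia/en/.
    print(original_file_url("wikipedia", "en", "Example image.jpg"))
```

Because the path depends only on the wiki and the file name, any component can compute it without consulting storage.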

Users mainly upload files through the UploadWizard feature on commons.wikimedia.org. For read access, the Special:Filepath shortcut redirects to the appropriate path on upload.wikimedia.org.

When articles are edited and saved by MediaWiki, the creation of newly referenced thumbnails or thumbnail sizes is deferred. In production, $wgGenerateThumbnailOnParse is turned off, so MediaWiki only generates a deterministic URL for each thumbnail. The equivalent of a 404 handler then has thumbnail servers generate the files on the fly. On smaller wikis and during local development, this can be done with MediaWiki's thumb_handler.php. In production, cache misses from the CDN for upload.wikimedia.org are routed to Swift through a rewrite proxy that does the same thing. If the thumbnail does not (yet) exist, the proxy forwards the request to Thumbor, which responds with the new thumbnail for the CDN cache and the currently requesting client, and also stores it in Swift for future use after it falls out of the CDN cache.
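The generate-on-miss flow described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual rewrite proxy: `handle_thumbnail_request` is a hypothetical name, a plain dictionary stands in for Swift, and the `render` callback stands in for Thumbor.

```python
from typing import Callable, Dict, Tuple


def handle_thumbnail_request(
    path: str,
    storage: Dict[str, bytes],
    render: Callable[[str], bytes],
) -> Tuple[bytes, bool]:
    """Serve a thumbnail, generating it only when storage lacks it.

    Returns (body, was_generated). `storage` is a stand-in for Swift and
    `render` a stand-in for Thumbor; both are assumptions of this sketch.
    """
    stored = storage.get(path)
    if stored is not None:
        # Thumbnail already exists: serve it without re-rendering.
        return stored, False
    # "404" case: render on the fly, then persist for future requests.
    body = render(path)
    storage[path] = body
    return body, True


if __name__ == "__main__":
    swift_stand_in: Dict[str, bytes] = {}

    def fake_render(p: str) -> bytes:
        # Placeholder renderer; a real one would scale the original image.
        return b"thumb:" + p.encode("utf-8")

    url_path = "/wikipedia/en/thumb/example/120px-example.jpg"  # made-up path
    print(handle_thumbnail_request(url_path, swift_stand_in, fake_render)[1])
    print(handle_thumbnail_request(url_path, swift_stand_in, fake_render)[1])
```

The first request triggers rendering and the second is served from storage, which mirrors why a deterministic URL is enough at parse time: the thumbnail only has to exist once someone actually requests it.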

On private wikis, media files are not served from "upload.wikimedia.org". Instead, they are served from the private wiki's own domain name through the img_auth.php MediaWiki entry point, which enforces authentication and access control.

Thumbor

Media storage

Media storage components.

History

2008

upload.wikimedia.org has its own set of Squid proxy caches separate from the "text" squids (for HTML/CSS responses). This avoids contention between the two data sets, which have different characteristics for object size, update rate, etc.

2009

IPv6 support was partially added. AAAA records are currently sent to a list of participating resolvers only. This works using the PowerDNS pipebackend, running selective_answer.py. Currently this is Europe (knams) region only, due to lack of sufficient IPv6 connectivity elsewhere! The IPv6 address is served by an IPv6-to-IPv4 proxy (ha-proxy), which simply proxies the HTTP request to the IPv4 LVS cluster, including an X-Forwarded-For header. Unfortunately it does not support doing HTTP processing along with persistent connections.

External link