
CDN

From Wikitech

The Wikimedia CDN handles traffic routing and HTTP caching for all Wikimedia projects. It is maintained by the SRE Traffic team. This page documents what our CDN exposes for downstream services to predictably consume.

The main components of the CDN servers (collectively known as cache-proxy, or cp, servers) are:

HAProxy
TLS termination, HTTP/2 termination, and rate limiting.
Varnish
Front-end caching.
Apache Traffic Server
Back-end caching.

The frontend layer spreads traffic roughly equally across its servers and is responsible for traffic capacity. Each server at this layer is effectively identical and is statistically very likely to hold a copy of the same HTTP responses in memory.

The backend layer is distributed by request hash (e.g. of the URL and other metadata) and is responsible for content capacity. Each server is assigned a subset of URLs, so together the servers can hold a diverse, long-tail set of HTTP responses.
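The following is a minimal sketch of how a frontend could map a URL to its single backend by hashing. It uses rendezvous (highest-random-weight) hashing purely for illustration. The director and hash inputs used in production may differ, and the backend names are hypothetical.

```python
import hashlib


def pick_backend(url: str, backends: list) -> str:
    """Pick the backend responsible for `url` via rendezvous hashing.

    Every (backend, url) pair gets a deterministic score and the highest
    score wins, so any frontend computes the same owner without shared state.
    """
    if not backends:
        raise ValueError("no backends available")

    def score(backend: str) -> int:
        digest = hashlib.sha256(f"{backend}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


# Hypothetical backend names, for illustration only.
BACKENDS = ["cp-be1", "cp-be2", "cp-be3", "cp-be4"]
owner = pick_backend("/wiki/Main_Page", BACKENDS)
```

With this kind of scheme, removing a backend only moves the URLs that backend owned. Every other URL keeps its owner, so most backend caches stay warm.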

As a single server cannot handle all end-user traffic for a single peak-popularity page, the frontend layer serves an important role ahead of the backend. The frontend is responsible for absorbing and coalescing concurrent requests for the same URL when it is absent from the cache, so that it only places minimal demand on the (one) backend assigned for that URL.
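To illustrate the coalescing idea, here is a minimal sketch. It assumes a hypothetical `fetch(url)` callable that stands in for the request to the backend. The first caller for a URL performs the fetch, and concurrent callers wait for that result and share it. Varnish implements this internally; this sketch only models the concept.

```python
import threading
from typing import Callable, Dict, Optional


class _Flight:
    """One in-progress upstream fetch that later callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: bytes = b""
        self.error: Optional[Exception] = None


class Coalescer:
    """Collapse concurrent fetches for the same URL into a single upstream call."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Flight] = {}

    def get(self, url: str) -> bytes:
        with self._lock:
            flight = self._in_flight.get(url)
            is_leader = flight is None
            if is_leader:
                flight = _Flight()
                self._in_flight[url] = flight
        if is_leader:
            # Only the first caller contacts the backend.
            try:
                flight.body = self._fetch(url)
            except Exception as exc:
                flight.error = exc
            finally:
                with self._lock:
                    del self._in_flight[url]
                flight.done.set()
        else:
            # Later callers wait for the leader's response instead of fetching.
            flight.done.wait()
        if flight.error is not None:
            raise flight.error
        return flight.body
```

If 1,000 clients request the same missing URL at once, the backend sees a single request instead of 1,000.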

Refer to History below for earlier iterations of our caching software.

Headers

X-Analytics
This header is used for measurement purposes and its behavior is documented in X-Analytics.
X-Analytics-TLS
A multi-value header that lists various properties of the request's TLS connection. These properties always include the following key=value properties delimited by semicolons (;):
  • vers: The name of the protocol used when the incoming connection was made over a TLS transport layer.
  • keyx: The elliptic curve used for key exchange.
  • auth: The authentication algorithm used.
  • ciph: The name of the cipher used.
  • prot: The HTTP protocol version used.
  • sess: Whether the request is part of a new TLS session or an existing one.
Example: X-Analytics-TLS: vers=TLSv1.3;keyx=X25519;auth=ECDSA;ciph=CHACHA20-POLY1305-SHA256;prot=h2;sess=new
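Downstream services that consume this header need to split it into its key=value pairs. Below is a minimal parsing sketch. The function name is hypothetical. The sketch assumes that values never contain semicolons, which holds for the example above.

```python
def parse_kv_header(value: str) -> dict:
    """Parse a 'key=value;key=value' header such as X-Analytics-TLS."""
    props = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, sep, val = item.partition("=")
        props[key] = val if sep else ""
    return props


tls = parse_kv_header(
    "vers=TLSv1.3;keyx=X25519;auth=ECDSA;"
    "ciph=CHACHA20-POLY1305-SHA256;prot=h2;sess=new"
)
```

X-Connection-Properties uses the same delimiter scheme, so the same function should work for it.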
X-Cache
A comma-separated list of cache hostnames, each with information such as the hit/miss status for that cache. This header is read right to left: the rightmost entry is the outermost cache, and each entry further to the left is one step deeper towards the application layer. The rightmost cache is the in-memory cache; all others are disk caches. For a cache hit, the number of times the object has been served is also given (e.g. hit/6). Once a "hit" is encountered while reading right to left, everything to the left of it is part of the cached object that was hit. Those entries record whether the deeper caches missed, passed, or hit when that object was first pulled into the cache that is now hitting.
Possible values are:
  • hit: a cache hit in cache storage. There was no need to query a deeper cache server (or the applayer, if already at the last cache server). A hit may still need to reach an inner layer if the content is stale and must-revalidate is set. In that case the cache server sends a conditional request to the inner layer and, if it receives 304 Not Modified, serves the response from its cache.
  • int: a response generated locally by the cache, for example a 301 redirect. The cache did not use a cache object and did not need to contact another server. Backend errors can also produce an int response: for example, if a backend responds with a 429 without a response body, the cache generates an error response internally after contacting the applayer.
  • miss: the object might be cacheable, but we don't have it.
  • pass: the object was uncacheable, so the request was passed to a deeper level.
Some subtleties on "pass": different caches (e.g. in-memory vs. on-disk) might disagree on whether an object is cacheable. A pass on the in-memory cache (for example, because the object is too big) could be a hit on an on-disk cache. Also, it is sometimes not clear that an object is uncacheable until the moment we fetch it. In that case, we cache the fact that the object is uncacheable for a short while. In Varnish terminology, this is a "hit-for-pass".
If we don't know an object is uncacheable until after we fetch it, the first request is handled like a normal miss. That means it is coalesced: other requests for the same object wait for the first response. But that first fetch returns an uncacheable object, which cannot be used to answer the other requests that queued behind it. Those requests therefore get serialized, which destroys the performance of hot (high-parallelism) objects that are uncacheable. "hit-for-pass" is the answer to that problem. When the first request (made with no prior knowledge) gets an uncacheable response, we create a special cache entry that says something like "this object cannot be cached, remember it for 10 minutes". All requests for the next 10 minutes then proceed in parallel without coalescing, because the object is already known to be uncacheable.
The content of the X-Cache header is recorded for every request in the webrequest log table.
Example: X-Cache: cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603
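Because the header is read right to left, a consumer usually wants its entries ordered from the outermost cache inward. The sketch below does that. The function and type names are hypothetical. It assumes each entry has the form "hostname status[/count]", as in the example above.

```python
from typing import List, NamedTuple, Optional


class CacheEntry(NamedTuple):
    host: str
    status: str            # "hit", "int", "miss" or "pass"
    hits: Optional[int]    # only present for "hit/N"


def parse_x_cache(value: str) -> List[CacheEntry]:
    """Parse X-Cache, returning entries outermost-first (right to left)."""
    entries = []
    for part in reversed(value.split(",")):
        host, _, result = part.strip().partition(" ")
        status, _, count = result.partition("/")
        entries.append(CacheEntry(host, status, int(count) if count else None))
    return entries


def serving_cache(entries: List[CacheEntry]) -> Optional[CacheEntry]:
    """Return the first cache that hit, reading outside-in, or None."""
    for entry in entries:
        if entry.status == "hit":
            return entry
    return None


entries = parse_x_cache("cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603")
```

Here the in-memory cache cp3040 served the response. The entries for cp3043 and cp1066 describe how that object was fetched originally, not the current request.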
X-Client-IP
Reports the client's IP address as seen at layer 3, i.e. the source address of the connection (no HTTP headers are parsed to populate the header).
Example: X-Client-IP: 185.15.58.224
Example: X-Client-IP: 2a02:ec80:600:ed1a::1
X-Client-Port
Reports the source port of the connection on the client side, which is the port the client connected from.
Example: X-Client-Port: 25312
X-Connection-Properties
A multi-value header that lists various properties of the request's connection. These properties always include the following key=value properties delimited by semicolons (;):
  • H2: Whether HTTP/2 is used. Possible values are 0 or 1.
  • SSR: Whether the TLS session was resumed, through the SSL session cache or TLS tickets, on an incoming connection over an SSL/TLS transport layer. Possible values are 0 or 1.
  • SSL: The name of the protocol used when the incoming connection was made over a TLS transport layer.
  • C: The name of the cipher used when the incoming connection was made over a TLS transport layer.
  • EC: The elliptic curve used.
Example: X-Connection-Properties: H2=1;SSR=0;SSL=TLSv1.3;C=TLS_CHACHA20_POLY1305_SHA256;EC=X25519
X-Forwarded-Proto
Identifies the protocol (HTTP or HTTPS) used by the connecting client. The value of this header is hard-coded to https.
Example: X-Forwarded-Proto: https
X-Varnish-Cluster
This header is used to signal to the back-end caching layer which Varnish cluster handled a request. The value of this header is hard-coded to misc.
Example: X-Varnish-Cluster: misc
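The hit-for-pass mechanism described under X-Cache above can be modeled with a toy cache. The sketch is hypothetical: `fetch(url)` is assumed to return the body and a cacheable flag, the clock is injectable for testing, and the returned status labels are illustrative rather than actual X-Cache values. It does not model coalescing itself; it only marks where coalescing would and would not apply.

```python
import time
from typing import Callable, Dict, Tuple

HIT_FOR_PASS_TTL = 600.0  # seconds: the "10 minutes" from the example above

# fetch(url) -> (body, cacheable)
Fetch = Callable[[str], Tuple[bytes, bool]]


class HitForPassCache:
    """Toy cache that remembers, for a while, which URLs are uncacheable."""

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._objects: Dict[str, bytes] = {}
        self._uncacheable_until: Dict[str, float] = {}

    def lookup(self, url: str) -> Tuple[bytes, str]:
        if url in self._objects:
            return self._objects[url], "hit"
        if self._clock() < self._uncacheable_until.get(url, float("-inf")):
            # Known uncacheable: fetch directly. A real cache would skip
            # request coalescing here so hot URLs are fetched in parallel.
            body, _ = self._fetch(url)
            return body, "hit-for-pass"
        # Unknown object: handled like a normal (coalesced) miss.
        body, cacheable = self._fetch(url)
        if cacheable:
            self._objects[url] = body
        else:
            self._uncacheable_until[url] = self._clock() + HIT_FOR_PASS_TTL
        return body, "miss"
```

The first request for an uncacheable URL is a miss. Requests within the next 600 seconds take the hit-for-pass path. After that, the URL's cacheability is rechecked.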

HTTPS

TLS protocols

When older standards are dropped, this is done gradually. Clients using deprecated protocols are served https://www.wikipedia.org/sec-warning, which explains why their browser will not be supported in the future.

Ciphers

TLS 1.2 ciphers, in order of preference, are:

  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-CHACHA20-POLY1305
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES256-GCM-SHA384
  • ECDHE-RSA-CHACHA20-POLY1305
  • ECDHE-RSA-AES128-GCM-SHA256

TLS 1.3 cipher suites, in order of preference, are:

  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256
  • TLS_AES_128_GCM_SHA256
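TLS termination is done in HAProxy, not Python. Even so, the same preference order can be expressed with Python's `ssl` module, which is useful for quickly checking a cipher list against a local OpenSSL build. This is only an illustration. Python's `ssl` module offers no setter for TLS 1.3 suites. In stock OpenSSL builds, the default TLS 1.3 list is the three suites above in the same order.

```python
import ssl

# TLS 1.2 ciphers in order of preference (OpenSSL names), as listed above.
TLS12_CIPHERS = [
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "ECDHE-RSA-AES128-GCM-SHA256",
]


def make_server_context() -> ssl.SSLContext:
    """Server context limited to TLS 1.2+ with the preference order above."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(":".join(TLS12_CIPHERS))
    # Use the server's order rather than the client's.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx
```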

Rate-limiting

Once a single IP exceeds 2000 concurrent requests, all traffic from that IP is dropped for 300 seconds (five minutes). Connections/sockets are freed immediately to prevent any saturation-based outage. A useful side effect is that the attack appears to succeed, since the attackers experience endless loading.

Requests that have already reached components behind this layer of the stack are not canceled.
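The policy above can be modeled as per-IP concurrency counting with a temporary ban. This is a hypothetical sketch only: production enforces the limit in HAProxy, and this model only refuses new requests rather than closing existing sockets. The class name and injectable clock are illustrative.

```python
import time
from typing import Callable, Dict


class ConcurrencyLimiter:
    """Per-IP concurrency limit; exceeding it bans the IP for a while."""

    def __init__(self, limit: int = 2000, ban_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.ban_seconds = ban_seconds
        self._clock = clock
        self._active: Dict[str, int] = {}
        self._banned_until: Dict[str, float] = {}

    def start(self, ip: str) -> bool:
        """Register a new request; return False if it must be dropped."""
        now = self._clock()
        if self._banned_until.get(ip, 0.0) > now:
            return False
        active = self._active.get(ip, 0) + 1
        if active > self.limit:
            # Over the limit: ban the IP and drop this request.
            self._banned_until[ip] = now + self.ban_seconds
            return False
        self._active[ip] = active
        return True

    def finish(self, ip: str) -> None:
        """Call when an accepted request completes, even during a ban."""
        remaining = self._active.get(ip, 0) - 1
        if remaining > 0:
            self._active[ip] = remaining
        else:
            self._active.pop(ip, None)
```

Requests accepted before the ban still call `finish()`. This mirrors the point that requests already forwarded deeper into the stack are not canceled.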

Request Normalization

Query sorting

Query parameters are sorted alphabetically to improve the cache hit rate. Without sorting, /page?a=1&b=1 and /page?b=1&a=1 would be cached as separate objects despite being the same page. Alphabetical sorting creates predictable URLs.

Example: /favicon.ico?vgutierrez=1&c=1&b=0&a=0 is sorted as /favicon.ico?a=0&b=0&c=1&vgutierrez=1

The same sorting strategy is implemented in purged, the daemon responsible for fetching purge events from the application layer and injecting them into both the front-end and back-end caching layers.
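Here is a minimal sketch of the sorting step that reproduces the example above. It assumes that parameters are compared as raw `name=value` strings (byte order, no percent-decoding) and that `&` is the only separator. The exact rules in the CDN and in purged may differ in edge cases.

```python
def sort_query(url: str) -> str:
    """Sort query parameters so equivalent URLs share one cache key."""
    path, sep, query = url.partition("?")
    if not sep:
        return url  # no query string: nothing to sort
    params = sorted(p for p in query.split("&") if p)
    return f"{path}?{'&'.join(params)}"
```

Purging only works if the purge URL and the cached URL produce the same key. That is why the same sort has to be applied in purged as well.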

Path normalization

Pages with parentheses or certain other special characters in their titles have more than one correct URL. For example, a URL with literal parentheses, one with the parentheses URL-encoded, and one with a mix of the two are all valid.

However, when a page changes, purges are sent only for the URL-encoded URL: if the page is cached under any other variant of its URL, that copy does not get purged.
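The sketch below shows one way to fold the variants into a single cache key. It is hypothetical and assumes, as stated above, that purges use the percent-encoded form. It handles only parentheses, and the page title is illustrative. The real normalization covers other characters and may work differently.

```python
def normalize_parens(path: str) -> str:
    """Map literal parentheses to their percent-encoded form (%28, %29)."""
    return path.replace("(", "%28").replace(")", "%29")
```

All three variants of a title such as /wiki/Foo_(bar) then collapse to /wiki/Foo_%28bar%29. A purge for that form therefore reaches the single cached copy.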

Caching

This diagram is for the "text" cache cluster. The "upload" cluster is similar.

Current cache clusters in all data centers:

cache_text
Primary cluster for traffic to all wiki domains (MediaWiki), and misc web services (e.g. Gerrit, Phabricator)
cache_upload
Serves upload.wikimedia.org and maps.wikimedia.org exclusively (images, thumbnails, map tiles)

Any other cache clusters one might find in the wild are likely historical and decommissioned.

Text cluster

The front-end caching layer hides non-session cookies (those that don't match ([sS]ession|Token)=) for cache lookup purposes. After the cache lookup is performed, the cookies are restored so they reach upstream as expected. This assumes that any upstream that requires some non-session cookie to work properly (like the GeoIP one) will return a non-cacheable response.
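A minimal sketch of that cookie filtering follows. It assumes the regex is tested against each `name=value` pair separately. The function name and the cookie names in the example are illustrative, except for GeoIP, which is mentioned above.

```python
import re
from typing import Tuple

# Cookies matching this pattern are treated as session cookies and kept.
SESSION_COOKIE = re.compile(r"([sS]ession|Token)=")


def split_cookies(header: str) -> Tuple[str, str]:
    """Return (cookie header used for cache lookup, original header).

    The original is kept so it can be restored before going upstream.
    """
    kept = [c.strip() for c in header.split(";") if SESSION_COOKIE.search(c)]
    return "; ".join(kept), header


lookup, original = split_cookies(
    "GeoIP=US:CA; enwikiSession=abc; prefs=dark; centralauth_Token=xyz"
)
```

A logged-out reader whose only cookies are GeoIP or preference cookies gets an empty lookup cookie. That reader therefore shares cached objects with everyone else.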

By default, Varnish doesn't cache requests with cookies. To be able to cache responses to such requests when the response lacks Vary: Cookie, Varnish replaces session cookies with the fixed string Token=1 if and only if Vary: Cookie isn't present in the response.

Logic

The backend caching layer avoids caching responses that meet any of the following requirements:

  • Response contains a Set-Cookie header
  • Response contains a Vary:Cookie header and an uncacheable cookie
  • Content-Length is bigger than 1GB
  • Response status is higher than 499
  • Request contains an Authorization header

Additionally, the backend caching layer will skip cache lookup for any request that meets any of the following requirements:

  • Request contains an Authorization header
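The rules above can be expressed as two predicates. This is a hypothetical sketch with several assumptions. Header names are lowercased. "Uncacheable cookie" is passed in as a precomputed flag because its definition is not spelled out here. "1GB" is taken as 1024³ bytes.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_OBJECT_BYTES = 1024 ** 3  # "1GB"; GiB vs. GB is an assumption


@dataclass
class Exchange:
    """Hypothetical request/response pair; header names are lowercase."""
    status: int
    request_headers: Dict[str, str] = field(default_factory=dict)
    response_headers: Dict[str, str] = field(default_factory=dict)
    has_uncacheable_cookie: bool = False


def skip_lookup(ex: Exchange) -> bool:
    """Requests with an Authorization header bypass the cache entirely."""
    return "authorization" in ex.request_headers


def is_cacheable(ex: Exchange) -> bool:
    resp = ex.response_headers
    vary = [v.strip().lower() for v in resp.get("vary", "").split(",")]
    if "set-cookie" in resp:
        return False
    if "cookie" in vary and ex.has_uncacheable_cookie:
        return False
    if int(resp.get("content-length", "0")) > MAX_OBJECT_BYTES:
        return False
    if ex.status > 499:
        return False
    return not skip_lookup(ex)
```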

Retention

Web browsers first hit the LVS load balancers.

LVS distributes traffic to the edge frontend cluster. As of June 2022, the frontend cache TTL is capped at 1 day. Objects are then kept for a further 7 days so they can still be renewed with an HTTP 304 via If-Modified-Since/If-None-Match (IMS/INM) (wikimedia-frontend.vcl).
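The following sketch shows how the 1-day TTL and 7-day keep combine. It is loosely modeled on Varnish's TTL and keep settings. Grace is ignored, and the state names are illustrative.

```python
from datetime import timedelta

FRONTEND_TTL = timedelta(days=1)
FRONTEND_KEEP = timedelta(days=7)


def frontend_state(age: timedelta) -> str:
    """Classify a cached frontend object by its age."""
    if age < FRONTEND_TTL:
        return "fresh"        # served directly from cache
    if age < FRONTEND_TTL + FRONTEND_KEEP:
        return "revalidate"   # kept only to send IMS/INM upstream; a 304 renews it
    return "expired"          # gone; a full fetch is needed
```

A 3-day-old object is no longer served as-is. It can still be renewed cheaply with a 304 instead of a full body transfer.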

Misses from the frontend are hashed to the edge backend cluster. Since April 2020, the ATS backend TTL is capped to 24 hours (T249627, trafficserver/backend.pp).

Misses and HTTP 304 renewals from the ATS backend are routed to the MediaWiki app servers. Since July 2016, the max-age for page views is 14 days (T124954, $wgCdnMaxAge). This controls how long an unmodified page may have its page view HTML renewed (possibly several times, each after another 24 hours). It shapes the long tail for configuration changes, skin changes, and anything else that isn't tracked by the page edit timestamp or stored inside ParserOutput/ParserCache. Changes where only one state is meant to be presented should generally pre-seed their state for 14 days to be fully resilient against this.

Since Dec 2023, the wikitext parser cache retains entries for 30 days (T280604, wgParserCacheExpireTime, wmf-config).

Invalidating content

For Varnish:

  • When pages are edited, their canonical URL is proactively purged by MediaWiki (via Kafka and Purged).

For ParserCache:

  • Values in ParserCache are verifiable by revision ID. Edits will naturally invalidate it.
  • The TTL is enforced through a daily maintenance script, scheduled via the Puppet class misc::maintenance::parsercachepurging.
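Here is a minimal model of the two ParserCache rules above. It is hypothetical: the entry type, the function names, and the in-memory dict stand in for the real storage and maintenance script. An entry is only usable for the revision it was rendered from, and expiry happens in a periodic purge pass rather than on read.

```python
from dataclasses import dataclass
from typing import Dict

PARSER_CACHE_TTL = 30 * 24 * 3600  # seconds; 30 days, per the retention above


@dataclass
class ParserCacheEntry:
    rev_id: int       # revision the HTML was rendered from
    cached_at: float  # Unix timestamp


def is_usable(entry: ParserCacheEntry, current_rev_id: int) -> bool:
    """An edit creates a new revision ID, so older entries stop matching."""
    return entry.rev_id == current_rev_id


def purge_expired(store: Dict[str, ParserCacheEntry], now: float) -> int:
    """Stand-in for the daily job: delete entries past the TTL, return count."""
    expired = [k for k, e in store.items() if now - e.cached_at >= PARSER_CACHE_TTL]
    for key in expired:
        del store[key]
    return len(expired)
```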

Optimizations

The backend caching layer strips all cookies (except MediaWiki/CentralAuth sessions) when performing cache lookups. It is thus assumed that all other cookies are either for client-side usage only (and safe to ignore for caching), or are used by low-traffic features that explicitly opt-out from caching. This significantly improves hitrate and reduces cache writes (change 828002).

If MediaWiki or another application makes use of a non-session cookie (e.g. GeoIP), it must produce a non-cacheable response (via Cache-Control).

History

An overview of notable events and changes to our caching infrastructure:

Old caching clusters

These former clusters no longer exist but remnants may exist in our repositories.

cache_bits
Used to exist just for static content and ResourceLoader, now decommissioned (traffic went to cache_text)
cache_maps
Served maps.wikimedia.org exclusively, which is now serviced by cache_upload
cache_misc
Miscellaneous lower-traffic / support services (e.g. phabricator, metrics, etherpad, graphite, etc). Now moved to cache_text.
cache_mobile
Was like cache_text but just for (m|zero)\. mobile hostnames, now decommissioned (traffic went to cache_text)
cache_parsoid
Legacy entrypoint for parsoid and related *oid services, now decommissioned (traffic goes via cache_text to RestBase)

Through the years

2023:

  • In Dec 2023, parser cache retention was raised back to 30 days (T280604).

2022:

  • In April 2022, we replaced ATS with HAProxy for TLS termination and HTTP/2 (T290005). This changed the stack to: HAProxy for TLS termination, Varnish frontend, and ATS backend.

2021:

  • In May 2021, parser cache retention was temporarily reduced from 30 to 21 days due to reaching capacity limits (change 685181).

2020:

  • In June 2020, the Purged service was introduced. MediaWiki no longer uses multicast HTCP purging. Instead, it produces Kafka events for purging URLs, which local Purged instances on Varnish and ATS servers consume and apply by issuing local PURGE requests.
  • In April 2020, a year after switching from Varnish to ATS as cache backend, the TTL was re-enabled and lowered from the 7 days set in 2016, down to 24 hours (T249627). With Varnish frontend also at 1 day and a grace-keep of 7 days, this means frontend objects may outlive backend ones.

2019:

  • We adopted the "ATS sandwich", featuring Apache Traffic Server (ATS) as both TLS terminator and backend cache, thus discontinuing Nginx- ("nginx minus") and the Varnish backend. This changed the stack to:
    • ATS for TLS termination (ats-tls),
    • Varnish frontend (varnish-fe), and
    • ATS backend (ats-be).
  • We explored evolving the ATS-TLS layer to eventually subsume the responsibilities of the Varnish frontend.
  • Prior to 2019, the stack for many years involved Nginx- for TLS termination and HTTP2, Varnish as frontend, and a second Varnish layer as cache backend. As such, in older documentation "Varnish" might sometimes also refer to the cache backend.

2016:

  • We decreased the max object TTL in Varnish from the long-standing 31 days down to 1 day for Varnish frontends, and 14 days for Varnish backends and MediaWiki (T124954). The parser cache remains at 31 days.
  • We deployed HTTP/2 support to the Wikimedia CDN, which at the time consisted of Nginx- and Varnish (T96848).

2013:

  • Prevent white-washing of expired page-view HTML. Various static aspects of a page are not tracked or versioned. As such, once the max-age expires, a conditional request (If-Modified-Since) must not be answered with 304 Not Modified, even if the wiki page's database entry is unchanged (T46570).

Further reading

See also

  • CDN/Hardware: An overview of the physical servers powering the CDN.
  • SRE/Traffic: A full overview of CDN software components