User:Vgutierrez/CDN


The Traffic team maintains the user-facing CDN stack used by Wikimedia projects. This page documents the behavior the CDN exposes so that downstream services can rely on it.

Overview of CDN stack

HAProxy
TLS termination, HTTP/2 termination, and rate limiting.
Varnish
Front-end caching.
Apache Traffic Server
Back-end caching.

Headers

X-Analytics
This header is used for measurement purposes and its behavior is documented in X-Analytics.
X-Analytics-TLS
A multi-value header that lists TLS properties of the connection as key=value pairs delimited by semicolons (;). The following keys are always present:
  • vers: The name of the TLS protocol version used when the incoming connection was made over a TLS transport layer.
  • keyx: The elliptic curve used for key exchange.
  • auth: The authentication algorithm used (ECDSA in the example below).
  • ciph: The name of the cipher used.
  • prot: The HTTP protocol version used.
  • sess: Whether the request is part of a new TLS session or an existing one.
Example: X-Analytics-TLS: vers=TLSv1.3;keyx=X25519;auth=ECDSA;ciph=CHACHA20-POLY1305-SHA256;prot=h2;sess=new
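A downstream consumer could split this header into its key=value pairs as follows. This is a minimal sketch, assuming the value always has the shape shown in the example; the function name parse_properties is hypothetical and not part of the CDN.

```python
def parse_properties(value: str) -> dict[str, str]:
    """Split a 'k1=v1;k2=v2' header value into a dict of strings."""
    properties = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        properties[key] = val
    return properties


# Value taken from the example above.
tls = parse_properties(
    "vers=TLSv1.3;keyx=X25519;auth=ECDSA;"
    "ciph=CHACHA20-POLY1305-SHA256;prot=h2;sess=new"
)
print(tls["vers"], tls["prot"])
```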
X-Client-IP
Reports the IP address of the user agent as seen at layer 3. No HTTP headers are parsed to populate this header.
Example: X-Client-IP: 185.15.58.224
Example: X-Client-IP: 2a02:ec80:600:ed1a::1
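Because the value can be either an IPv4 or an IPv6 address, a consumer may want to validate it before use. A minimal sketch using Python's standard ipaddress module; the helper name client_ip_version is hypothetical.

```python
import ipaddress


def client_ip_version(header_value: str) -> int:
    """Return 4 or 6 for a valid X-Client-IP value.

    Raises ValueError if the value is not a valid IP address.
    """
    return ipaddress.ip_address(header_value.strip()).version


# Values taken from the examples above.
print(client_ip_version("185.15.58.224"))
print(client_ip_version("2a02:ec80:600:ed1a::1"))
```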
X-Client-Port
Reports the source port of the connection, that is, the port the client connected from.
Example: X-Client-Port: 25312
X-Connection-Properties
A multi-value header that lists properties of the connection as key=value pairs delimited by semicolons (;). The following keys are always present:
  • H2: Whether HTTP/2 is used. Possible values are 0 or 1.
  • SSR: Whether the TLS session was resumed, through the SSL session cache or TLS tickets, on an incoming connection over an SSL/TLS transport layer. Possible values are 0 or 1.
  • SSL: The name of the protocol used when the incoming connection was made over a TLS transport layer.
  • C: The name of the cipher used when the incoming connection was made over a TLS transport layer.
  • EC: The elliptic curve used.
Example: X-Connection-Properties: H2=1;SSR=0;SSL=TLSv1.3;C=TLS_CHACHA20_POLY1305_SHA256;EC=X25519
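Since H2 and SSR are 0/1 flags, a consumer may prefer typed fields over raw strings. The sketch below is one possible mapping; the class and field names are hypothetical, and it assumes all five keys are present as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionProperties:
    http2: bool            # H2
    session_resumed: bool  # SSR
    tls_version: str       # SSL
    cipher: str            # C
    curve: str             # EC


def parse_connection_properties(value: str) -> ConnectionProperties:
    """Convert an X-Connection-Properties value into typed fields."""
    fields = dict(item.split("=", 1) for item in value.split(";") if item)
    return ConnectionProperties(
        http2=fields["H2"] == "1",
        session_resumed=fields["SSR"] == "1",
        tls_version=fields["SSL"],
        cipher=fields["C"],
        curve=fields["EC"],
    )


# Value taken from the example above.
props = parse_connection_properties(
    "H2=1;SSR=0;SSL=TLSv1.3;C=TLS_CHACHA20_POLY1305_SHA256;EC=X25519"
)
print(props)
```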
X-Forwarded-Proto
Identifies the protocol (HTTP or HTTPS) used by the connecting client. The value of this header is hard-coded to https.
Example: X-Forwarded-Proto: https
X-Varnish-Cluster
This header signals to the back-end caching layer which Varnish cluster handled a request. The value of this header is hard-coded to misc.
Example: X-Varnish-Cluster: misc

HTTPS

TLS protocols

Support for older standards is dropped gradually. Clients using deprecated protocols are served https://www.wikipedia.org/sec-warning, which explains why their browser will not be supported in the future.

Ciphers

TLS 1.2 ciphers, in order of preference, are:

  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-CHACHA20-POLY1305
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES256-GCM-SHA384
  • ECDHE-RSA-CHACHA20-POLY1305
  • ECDHE-RSA-AES128-GCM-SHA256

TLS 1.3 cipher suites, in order of preference, are:

  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256
  • TLS_AES_128_GCM_SHA256
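To illustrate what "in order of preference" means in practice, the sketch below picks the first cipher from the TLS 1.2 list that a client also offers. It assumes the server's order takes precedence over the client's, which this page does not state explicitly; pick_cipher is a hypothetical helper, not the TLS library's negotiation code.

```python
from typing import Iterable, Optional

# TLS 1.2 ciphers in the order listed above.
TLS12_PREFERENCE = [
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "ECDHE-RSA-AES128-GCM-SHA256",
]


def pick_cipher(
    client_offer: Iterable[str],
    server_preference: Iterable[str] = TLS12_PREFERENCE,
) -> Optional[str]:
    """Return the most preferred server cipher the client also offers."""
    offered = set(client_offer)
    for cipher in server_preference:
        if cipher in offered:
            return cipher
    return None  # no cipher in common: the handshake would fail


chosen = pick_cipher(
    ["ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-CHACHA20-POLY1305"]
)
print(chosen)
```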

Rate-limiting

Once an IP exceeds the limit of 2000 concurrent requests, all traffic from that IP is dropped for 300 seconds (five minutes). Connections/sockets are freed immediately to prevent any saturation-based outage. As a side effect, the attack appears to be succeeding from the attackers' point of view, since they experience endless loading.

Requests that have reached other components behind this portion of the stack will not be canceled.
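The policy above can be modeled in a few lines. This is a simplified sketch of the described behavior, not the actual configuration of the CDN: the class name, its methods, and the injectable clock are all assumptions made for illustration.

```python
import time


class ConcurrencyLimiter:
    """Drop traffic from an IP for ban_seconds once it exceeds limit
    concurrent requests (model only, not the production implementation)."""

    def __init__(self, limit=2000, ban_seconds=300, clock=time.monotonic):
        self.limit = limit
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.in_flight = {}     # ip -> number of concurrent requests
        self.banned_until = {}  # ip -> time at which the ban expires

    def start_request(self, ip: str) -> bool:
        """Return True if the request is accepted, False if dropped."""
        now = self.clock()
        expiry = self.banned_until.get(ip)
        if expiry is not None and expiry > now:
            return False
        count = self.in_flight.get(ip, 0) + 1
        if count > self.limit:
            self.banned_until[ip] = now + self.ban_seconds
            return False
        self.in_flight[ip] = count
        return True

    def finish_request(self, ip: str) -> None:
        """Record that one accepted request from ip has completed."""
        remaining = self.in_flight.get(ip, 0) - 1
        if remaining > 0:
            self.in_flight[ip] = remaining
        else:
            self.in_flight.pop(ip, None)
```

Requests accepted before the ban are left to complete in this model, matching the note that requests already past this portion of the stack are not canceled.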

Request Normalization

Query sorting

Query parameters are sorted alphabetically to improve the cache hit rate. Without sorting, /page?a=1&b=1 and /page?b=1&a=1 would be cached as separate objects despite being the same page. Alphabetical sorting creates predictable URLs.

Example: /favicon.ico?vgutierrez=1&c=1&b=0&a=0 is sorted as /favicon.ico?a=0&b=0&c=1&vgutierrez=1
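The effect of query sorting can be reproduced with a short function. This is a sketch that sorts the raw key=value parameters as plain strings; the exact ordering rules of the production implementation (for example, how duplicate keys are handled) are not described here and may differ.

```python
def sort_query(url: str) -> str:
    """Return url with its query parameters sorted alphabetically."""
    path, sep, query = url.partition("?")
    if not sep or not query:
        return url  # nothing to sort
    params = [p for p in query.split("&") if p]
    return path + "?" + "&".join(sorted(params))


# Both orderings map to the same cache key.
print(sort_query("/page?b=1&a=1"))
print(sort_query("/favicon.ico?vgutierrez=1&c=1&b=0&a=0"))
```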

This very same sorting strategy is implemented in purged, the daemon responsible for fetching purge events from the application layer and injecting them into both the front-end and back-end caching layers.

Path normalization

Pages with parentheses or certain other special characters in their titles have more than one correct URL. For example the two following URLs are both correct:

A URL with literal parentheses, one with URL-encoded parentheses, and one with a mix of the two are all valid. However, when a page changes, purges are sent for only one form of the URL: if a different form is cached, it does not get purged.
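Mapping every variant to a single form before cache lookup avoids the problem. The sketch below assumes the canonical form uses literal parentheses and only handles ( and ); both the direction of the normalization and the set of characters are assumptions, and the example path is made up.

```python
import re

# Hypothetical set of escapes treated as equivalent to their literal form.
_EQUIVALENT = {"%28": "(", "%29": ")"}
_PATTERN = re.compile("|".join(_EQUIVALENT), re.IGNORECASE)


def normalize_path(path: str) -> str:
    """Rewrite the selected percent-escapes to their literal characters."""
    return _PATTERN.sub(lambda m: _EQUIVALENT[m.group(0).upper()], path)


# Literal, encoded, and mixed variants all collapse to one path.
for variant in ("/wiki/Foo_(bar)", "/wiki/Foo_%28bar%29", "/wiki/Foo_(bar%29"):
    print(normalize_path(variant))
```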

Caching logic

Text cluster

The front-end caching layer hides non-session cookies (those that don't match ([sS]ession|Token)=) for cache lookup purposes. After the cache lookup is performed, the cookies are restored so they reach upstream as expected. This assumes that any upstream that requires a non-session cookie to work properly (like the GeoIP one) will return a non-cacheable response.
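The cookie check can be sketched with the regular expression quoted above. The sketch assumes the Cookie header is evaluated as a whole (kept if it contains a session or token cookie, hidden otherwise); the function name and the cookie values are hypothetical.

```python
import re
from typing import Optional

# Pattern quoted in the text above.
SESSION_COOKIE = re.compile(r"([sS]ession|Token)=")


def cookie_for_lookup(cookie_header: Optional[str]) -> Optional[str]:
    """Return the Cookie value visible during cache lookup, or None if hidden."""
    if cookie_header and SESSION_COOKIE.search(cookie_header):
        return cookie_header
    return None  # the caller restores the original header after lookup


print(cookie_for_lookup("GeoIP=US; foo=bar"))
print(cookie_for_lookup("enwikiSession=abc; GeoIP=US"))
```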

By default, Varnish doesn't cache requests with cookies. To be able to cache responses with cookies and without Vary:Cookie, Varnish replaces session cookies with the fixed string Token=1 if and only if Vary:Cookie isn't present in the response.

Caching logic

The backend caching layer avoids caching responses that meet any of the following requirements:

  • Response contains a Set-Cookie header
  • Response contains a Vary:Cookie header and an uncacheable cookie
  • Content-Length is bigger than 1GB
  • Response status is higher than 499
  • Request contains an Authorization header

Additionally, the backend caching layer will skip cache lookup for any request that meets any of the following requirements:

  • Request contains an Authorization header
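The two rule lists above can be expressed as simple predicates. This is a sketch only: the function names are hypothetical, 1GB is assumed to mean 1024**3 bytes, and whether a cookie counts as uncacheable is passed in as a flag because this page does not define it.

```python
from typing import Mapping

ONE_GB = 1024 ** 3  # assumption: the page does not say whether 1GB is binary or decimal


def _lower_keys(headers: Mapping[str, str]) -> dict:
    """Header names are case-insensitive, so compare them in lowercase."""
    return {name.lower(): value for name, value in headers.items()}


def should_skip_lookup(request_headers: Mapping[str, str]) -> bool:
    """Cache lookup is skipped for requests carrying Authorization."""
    return "authorization" in _lower_keys(request_headers)


def is_storable(
    status: int,
    request_headers: Mapping[str, str],
    response_headers: Mapping[str, str],
    has_uncacheable_cookie: bool = False,
) -> bool:
    """Return False if any of the listed conditions forbids caching."""
    req = _lower_keys(request_headers)
    resp = _lower_keys(response_headers)
    if "set-cookie" in resp:
        return False
    vary = [token.strip().lower() for token in resp.get("vary", "").split(",")]
    if "cookie" in vary and has_uncacheable_cookie:
        return False
    length = resp.get("content-length", "")
    if length.isdigit() and int(length) > ONE_GB:
        return False
    if status > 499:
        return False
    if "authorization" in req:
        return False
    return True
```

For instance, is_storable(503, {}, {}) is False because the status is above 499, while a plain 200 response without cookies or an Authorization request header is storable.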

Optimizations

The backend caching layer hides cacheable cookies during the cache lookup stage for text/upload (not misc) to improve the hit rate and avoid unnecessary cache writes[1].