MediaWiki HTTP cache headers
This page may be outdated or contain incorrect details. Please update it if you can.
Varnish and Apache Traffic Server cache Wikipedia content.
We use Varnish to cache bits.wikimedia.org content: CSS and JS resources and some images.
Cache headers
(See the HTTP specs for more formal wording)
Headers are sent mainly in OutputPage.php, in the function sendCacheControl(), around line 317. Which headers are sent depends mainly on the action (setSquidMaxage = $wgSquidMaxage in index.php for view and history) and on whether the browser sends a cookie.
Headers explained
Last-Modified
This is required for client-side caching, as without it browsers don't know what to base their If-Modified-Since requests on. If the page hasn't changed, the Squid responds with just a 304 (Not Modified) status code, so only the status line and headers are transferred, not the page body.
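The conditional-request exchange described above can be sketched roughly as follows. This is an illustration only, not the MediaWiki or Squid implementation; the function name `respond` and the placeholder body are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers, body) for a page last changed at last_modified.

    Hypothetical helper: last_modified must be a timezone-aware UTC datetime,
    if_modified_since is the raw request header value (or None).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat dates without a usable zone as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Unchanged: status line and headers only, no body.
                return 304, headers, b""

    return 200, headers, b"<html>page content</html>"
```

A client that echoes the `Last-Modified` value back in `If-Modified-Since` gets the 304 with an empty body; a client with an older copy, or with no copy at all, gets the full 200 response.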
Cache-Control
s-maxage
Tells intermediate caches such as Squids how long they may consider the content valid without checking back. This must be hidden from caches we can't purge, otherwise users behind them won't see changes. For that reason a header_access rule on the Squids replaces any Cache-Control header with one that only allows client caching:
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
max-age
How long clients (browsers) should deem the content to be up to date. We allow clients to keep the page (the 'private' allows this), but tell them to send a conditional If-Modified-Since request on each view. This requires the Last-Modified header, which we set to the last modification time or, if we don't have it, to the current time minus one hour. Images and stylesheets (including the generated ones that represent the user's preference selections) have max-age > 0 to avoid reloading them on each request. This is why users have to refresh their cache after changing their preferences. (Is there a way to force a client to re-request something using JavaScript?)
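The Last-Modified fallback rule above (last modification time if known, otherwise the current time minus one hour) could look roughly like this. The function name `last_modified_header` and its parameters are assumptions for illustration, not MediaWiki code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def last_modified_header(touched=None, now=None):
    """Build a Last-Modified value from an optional modification time.

    Hypothetical helper: both arguments are timezone-aware UTC datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # No known modification time: pretend the page changed an hour ago.
    effective = touched if touched is not None else now - timedelta(hours=1)
    return format_datetime(effective.replace(microsecond=0), usegmt=True)
```

Passing `now` explicitly keeps the helper deterministic, which makes the fallback branch easy to check.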
private
Allows browsers to cache the content, but tells shared caches not to store it.
Putting it together
Cache-Control: s-maxage=($wgSquidMaxage), must-revalidate, max-age=0
This allows caching on the Squids (s-maxage), which replace it with
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
for all anonymous visitors without a session, who don't send a cookie. Second-tier Squids get the original headers through a special rule in squid.conf that matches their IPs. After the first visit to an edit page or a login, the user sends a cookie, and MediaWiki then omits s-maxage so the Squids don't cache the response:
Cache-Control: private, must-revalidate, max-age=0
This again allows browsers to cache the page while forcing them to check for changes on each page view.
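The decision described in this section can be summarized in a small sketch. It only mirrors the header values quoted above; the function names, the `client_is_second_tier_squid` flag, and the example value 18000 are assumptions, not the actual MediaWiki or Squid logic.

```python
# Header that first-tier Squids hand to browsers, as quoted above.
CLIENT_ONLY = "private, s-maxage=0, max-age=0, must-revalidate"


def mediawiki_cache_control(has_cookie, squid_maxage):
    """Cache-Control value the application sends to the Squids (sketch)."""
    if has_cookie:
        # Session or logged-in user: no s-maxage, so Squids don't cache it.
        return "private, must-revalidate, max-age=0"
    # Anonymous visitor without a cookie: Squids may cache for squid_maxage seconds.
    return f"s-maxage={squid_maxage}, must-revalidate, max-age=0"


def squid_outgoing_cache_control(upstream_value, client_is_second_tier_squid):
    """Cache-Control value a first-tier Squid passes on (sketch)."""
    if client_is_second_tier_squid:
        # Second-tier Squids are matched by IP and see the original header.
        return upstream_value
    # Everyone else only gets permission for client-side caching.
    return CLIENT_ONLY


# 18000 is a made-up value standing in for $wgSquidMaxage.
anon = mediawiki_cache_control(has_cookie=False, squid_maxage=18000)
print(squid_outgoing_cache_control(anon, client_is_second_tier_squid=False))
```

In both cases the browser ends up with max-age=0 and must-revalidate, so it revalidates on every page view; only the Squid-facing header differs.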
Vary
Tells downstream proxy caches to cache the content depending on the values of the listed request headers: if those values differ, a different page is served for the same URL. For example, we use
Vary: Accept-Encoding, Cookie
to make sure logged-in users (who send a cookie) get pages with their user name and preferences (the Cookie part), and clients that don't support gzip content encoding don't get compressed pages (the Accept-Encoding part). I think there's some support for transparent decompression in Squid3, so it might not be necessary to store different copies. See also: Vary in RFC 2616 and HTTP State Management Mechanism.
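One way to picture the effect of Vary is as part of the cache key: the URL alone no longer identifies a stored response. The sketch below is a simplified illustration (real caches normalize header values more carefully); the function `cache_key`, the URL, and the cookie value are made up for the example.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {key.lower(): value for key, value in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


vary = "Accept-Encoding, Cookie"
url = "/wiki/Example"  # hypothetical URL

anon_gzip = cache_key(url, {"Accept-Encoding": "gzip"}, vary)
anon_plain = cache_key(url, {}, vary)
logged_in = cache_key(
    url, {"Accept-Encoding": "gzip", "Cookie": "session=abc"}, vary
)

# Three different keys: each kind of client gets its own cached variant.
print(len({anon_gzip, anon_plain, logged_in}))
```

Two anonymous gzip-capable clients produce the same key and share one cached copy, while a logged-in user or a client without gzip support maps to a separate entry.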