Varnish

Varnish is a fast caching HTTP proxy.

Cache Clusters

We currently host the following Varnish cache clusters at all of our datacenters:

  • cache_text - Primary cluster for MediaWiki and various app/service (e.g. RESTBase) traffic
  • cache_upload - Serves upload.wikimedia.org and maps.wikimedia.org exclusively (images, thumbnails, map tiles)
  • cache_misc - Miscellaneous lower-traffic / support services (e.g. phabricator, metrics, etherpad, graphite, etc)

Old clusters that no longer exist:

  • cache_bits - Used to exist just for static content and ResourceLoader, now decommed (traffic went to cache_text)
  • cache_mobile - Was like cache_text but just for (m|zero)\. mobile hostnames, now decommed (traffic went to cache_text)
  • cache_parsoid - Legacy entrypoint for parsoid and related *oid services, now decommissioned (traffic goes via cache_text to RESTBase)
  • cache_maps - Served maps.wikimedia.org exclusively, which is now serviced by cache_upload

Headers

X-Cache

X-Cache is a comma-separated list of cache hostnames with information such as hit/miss status for each entry. The header is read right to left: the rightmost entry is the outermost cache, and entries further to the left are progressively deeper towards the applayer. The rightmost cache is the in-memory cache; all others are disk caches.

In case of a cache hit, the number of times the object has been returned from that cache is also reported. Reading right to left, once a "hit" is encountered, everything to its left describes the cached object that got hit: those entries record whether each deeper cache missed, passed, or hit when that object was first pulled into the hitting cache. For example:

X-Cache: cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603
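
To check the header on a live response, something like this works (the URL here is just an example):

 curl -sI https://en.wikipedia.org/wiki/Main_Page | grep -i '^x-cache:'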

An explanation of the possible information contained in X-Cache follows.

Not talking to other servers

  • hit: a cache hit in cache storage. There was no need to query a deeper cache server (or the applayer, if already at the last cache server)
  • int: locally-generated response from the cache. For example, a 301 redirect. The cache did not use a cache object and it didn't need to contact another server

Talking to other servers

  • miss: the object might be cacheable, but we don't have it
  • pass: the object was uncacheable, talk to a deeper level

Some subtleties on pass: different caches (e.g. in-memory vs. on-disk) might disagree on whether the object is cacheable or not. A pass on the in-memory cache (for example, because the object is too big) could be a hit for an on-disk cache. Also, it's sometimes not clear that an object is uncacheable until the moment we fetch it. In that case, we cache the fact that the object is uncacheable for a short while. In Varnish terminology, this is a hit-for-pass.

If we don't know that an object is uncacheable until after we fetch it, the first request is initially handled like a normal miss, which means coalescing: other requests for the same object wait for the first response. But that first fetch returns an uncacheable object, which can't answer the other requests that may have queued behind it. They all end up being serialized, which destroys the performance of hot (high-parallelism) objects that are uncacheable. Hit-for-pass is the answer to that problem. When we make that first request (with no prior knowledge) and get an uncacheable response, we create a special cache entry that says something like "this object cannot be cached, remember that for 10 minutes", and then all remaining requests for the next 10 minutes proceed in parallel without coalescing, because it's already known the object isn't cacheable.
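
As an illustration only (not our production VCL), a hit-for-pass entry is typically created in vcl_backend_response by marking the response uncacheable for a short TTL; in Varnish 4 syntax, roughly:

 sub vcl_backend_response {
     if (beresp.http.Set-Cookie) {
         # Example condition only: remember for 10 minutes that this object is uncacheable (hit-for-pass)
         set beresp.uncacheable = true;
         set beresp.ttl = 600s;
         return (deliver);
     }
 }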

HOWTO

See Varnish statistics

Run

# varnishstat

Fred has also written a Ganglia plugin in Python for varnish, which is automatically installed by Puppet. All varnishstat metrics are therefore visible on Ganglia.

Set runtime parameters

Run

# varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082
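
From that prompt (or as one-shot commands), parameters can then be inspected and changed with param.show and param.set; the parameter and value below are purely illustrative:

# varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 param.show default_ttl
# varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 param.set default_ttl 3600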

Add/remove backends

Puppet can automatically generate Varnish backend and director statements by setting the $varnish_backends and $varnish_directors variables (see below).

Alternatively, backends can be defined manually in the templates/varnish/wikimedia.vcl.erb ERB template file.
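
For reference, a manually defined backend and director in Varnish 3 VCL look roughly like this (names and addresses are made up):

 backend example_app {
     .host = "10.2.1.1";
     .port = "80";
 }

 director example_pool random {
     { .backend = example_app; .weight = 10; }
 }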

See request logs

As explained below, there are no access logs. However, you can see NCSA style log entries for current requests using:

# varnishncsa

See backend health

Run

# varnishlog -i Backend_health -O

Package a new Varnish release

 git checkout debian-wmf
 gbp import-orig --pristine-tar /tmp/varnish-${version}.tar.gz
 git push gerrit pristine-tar
 git push gerrit upstream
 # edit changelog, commit and open a code review:
 git push gerrit HEAD:refs/for/debian-wmf

Upgrading to a new minor Varnish release

Run the following commands to upgrade Varnish to a new minor release:

depool ; sleep 3 ; puppet agent --disable 'Upgrading varnish' ; run-no-puppet echo; apt update; service varnish-frontend stop; service varnish stop ; apt install varnish varnish-dbg libvarnishapi1 ; puppet agent --enable ; puppet agent -tv ; pool

Note that it's important to avoid race conditions with cron-scheduled puppet agent runs. The run-no-puppet command can be used for that purpose.
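
Wrapping a command in run-no-puppet ensures it only runs once no agent run is in flight; in the one-liner above it wraps a no-op (echo) purely to wait for any in-flight run to finish before proceeding. An illustrative standalone use:

run-no-puppet service varnish-frontend restart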

Upgrading from Varnish 3 to Varnish 4

In the specific case of upgrading from Varnish 3 to Varnish 4, follow this procedure:

  • Disable puppet on the node puppet agent --disable "Upgrading to Varnish 4"
  • Set varnish_version4 to true in hieradata
  • Depool the node and wait a bit for it to be drained: depool ; sleep 15
  • Verify that no user requests are being served by the frontend varnish: varnishncsa -n frontend -m 'RxRequest:^(?!PURGE$)' | grep -v PageGetter
  • Verify that no user requests are being served by the backend varnish: varnishncsa -m 'RxRequest:^(?!PURGE$)' | grep -v 'backend check'
  • Enable our experimental repo: echo deb http://apt.wikimedia.org/wikimedia jessie-wikimedia experimental > /etc/apt/sources.list.d/wikimedia-experimental.list ; apt update
  • Stop Varnish 3: service varnish-frontend stop; service varnish stop
  • Remove libvarnishapi1: apt-get -y remove libvarnishapi1
  • Wipe on-disk storage: rm -f /srv/sd*/varnish*
  • Re-enable puppet: puppet agent --enable
  • Run puppet agent a few times and ensure it completes successfully: puppet agent -t; puppet agent -t; puppet agent -t; puppet agent -t
  • Fix permissions for ganglia: chmod 644 /var/lib/varnish/*/*.vsm ; service ganglia-monitor restart
  • Test the upgrade and if everything is fine repool the node

Downgrading from Varnish 4 to Varnish 3

  • Disable puppet on the node puppet agent --disable "Downgrading to Varnish 3"
  • Remove varnish_version4 from hieradata, or set it to false
  • Depool the node and wait a bit for it to be drained: depool ; sleep 15
  • Verify that no user requests are being served by the frontend varnish: varnishncsa -n frontend -q 'not ReqMethod eq PURGE' | grep -v PageGetter
  • Verify that no user requests are being served by the backend varnish: varnishncsa -q 'not ReqMethod eq PURGE' | grep -v 'backend check'
  • Remove our experimental repo: rm /etc/apt/sources.list.d/wikimedia-experimental.list ; apt update
  • Stop Varnish 4: service varnish-frontend stop; service varnish stop
  • Remove libvarnishapi1: apt-get -y remove libvarnishapi1
  • Wipe on-disk storage: rm -f /srv/sd*/varnish*
  • Re-enable puppet: puppet agent --enable
  • Run puppet agent a few times and ensure it completes successfully: puppet agent -t; puppet agent -t; puppet agent -t; puppet agent -t. If something goes wrong here, you might have to unmask varnish.service: systemctl unmask varnish.service
  • Test the downgrade and if everything is fine repool the node

Upgrading from Varnish 4 to Varnish 5

  • Disable puppet on the node to be upgraded: puppet agent --disable "Upgrading to Varnish 5"
  • Set profile::cache::base::varnish_version: 5 and apt::use_experimental: true in hiera
  • Depool the node and wait a bit for it to be drained: depool ; sleep 15
  • Enable our experimental repo: echo deb http://apt.wikimedia.org/wikimedia jessie-wikimedia experimental > /etc/apt/sources.list.d/wikimedia-experimental.list ; apt update
  • Stop Varnish 4: service varnish-frontend stop; service varnish stop
  • Remove libvarnishapi1: apt-get -y remove libvarnishapi1
  • Re-enable puppet: puppet agent --enable
  • Run puppet agent a few times and ensure it completes successfully: puppet agent -t; puppet agent -t
  • Repool the node if everything looks fine: pool

Downgrading from Varnish 5 to Varnish 4

  • Disable puppet on the node to be downgraded: puppet agent --disable "Downgrading to Varnish 4"
  • Set profile::cache::base::varnish_version: 4 and apt::use_experimental: false in hiera
  • Depool the node and wait a bit for it to be drained: depool ; sleep 15
  • Disable our experimental repo: rm /etc/apt/sources.list.d/wikimedia-experimental.list ; apt update
  • Stop Varnish 5: service varnish-frontend stop; service varnish stop
  • Remove libvarnishapi1: apt-get -y remove libvarnishapi1
  • Re-enable puppet: puppet agent --enable
  • Run puppet agent a few times and ensure it completes successfully: puppet agent -t; puppet agent -t
  • Repool the node if everything looks fine: pool

Some more tricks

// Query Times
varnishncsa -F '%t %{VCL_Log:Backend}x %Dμs %bB %s %{Varnish:hitmiss}x "%r"'

// Top URLs
varnishtop -i RxURL

// Top Referer, User-Agent, etc.
varnishtop -i RxHeader -I Referer
varnishtop -i RxHeader -I User-Agent

// Cache Misses
varnishtop -i TxURL
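
The tag names above are from Varnish 3; on Varnish 4 and later the rough equivalents are:

// Top URLs
varnishtop -i ReqURL

// Top Referer, User-Agent, etc.
varnishtop -I ReqHeader:Referer
varnishtop -I ReqHeader:User-Agent

// Cache misses (backend fetches)
varnishtop -i BereqURL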

Configuration

Deployment and configuration of Varnish is done using Puppet.

See the production/modules/varnish and varnishkafka repositories in operations/puppet.

We use a custom Varnish 3.0.x package with several local patches applied. Varnish is configured through VCL (Varnish Configuration Language), a DSL in which Varnish's behavior is controlled by subroutines that are compiled to C and executed during each request. The VCL files are located at

/etc/varnish/*.vcl
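
As a toy illustration of what such a subroutine looks like (VCL 3 syntax; the hostname and backend name are made up, not taken from our templates):

 sub vcl_recv {
     # Route a made-up hostname to a made-up backend
     if (req.http.Host == "example.wikimedia.org") {
         set req.backend = example_app;
     }
 }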

One-off purges (bans)

Note: if all you need is purging objects based on their URLs, see how to perform One-off purges.

Sometimes it's necessary to do one-off purges to fix operational issues, and these are accomplished with the varnishadm "ban" command. It's best to read up on this thoroughly ahead of time! What a ban effectively does in practice is mark all objects that match the ban conditions, which were in the cache prior to the ban command's execution, as invalid.

Keep in mind that bans are not routine operations! These are expected to be isolated low-rate operations we perform in emergencies or after some kind of screw-up has happened. These are low-level tools which can be very dangerous to our site performance and uptime, and Varnish doesn't deal well in general with a high rate of ban requests. These instructions are mostly for operations staff use (with great care). Depending on the cluster and situation, under normal circumstances anywhere from 85 to 98 percent of all our incoming traffic is absorbed by the cache layer, so broad invalidation can greatly multiply applayer request traffic until the caches refill, causing serious outages in the process.

How to execute a ban (on one machine)

The varnishadm ban command is generally going to take the form:

varnishadm [-n frontend] ban [ban conditions]

Note that every machine has two varnish daemons, the default (backend) instance which requires no '-n' parameter, and the frontend instance that requires '-n frontend'.
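
For example, to ban everything under an illustrative path on one machine, backend instance first and then the frontend:

varnishadm ban 'req.url ~ "^/example-path/"'
varnishadm -n frontend ban 'req.url ~ "^/example-path/"'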

Execute a ban on a cluster

The following example shows how to ban all objects with Content-Length: 0 and status code 200 in ulsfo cache_upload:

salt -b 1 -C 'G@site:ulsfo and G@cluster:cache_upload' cmd.run "varnishadm -n frontend ban 'obj.status == 200 && obj.http.content-length == 0'"

Examples of ban conditions

Ban all content on zh.wikipedia.org

req.http.host == "zh.wikipedia.org"

Ban all 301 redirect objects in hostnames ending in wikimedia.org

obj.status == 301 && req.http.host ~ ".*wikimedia.org"

Ban all urls that start with /static/ , regardless of hostname:

 req.url ~ "^/static/"

Ban condition for MediaWiki outputs by datestamp of generation

1. Determine the start/end timestamps you need in the same standard format as date -u:

 > date -u
 Thu Apr 21 12:16:52 UTC 2016

2. Convert your start/end timestamps to unix epoch time integers:

 > date -d "Thu Apr 21 12:16:52 UTC 2016" +%s
 1461241012
 > date -d "Thu Apr 21 12:36:01 UTC 2016" +%s
 1461242161

3. Note you can reverse this conversion, which will come in handy below, like this:

 > date -ud "Jan 1, 1970 00:00:00 +0000 + 1461241012 seconds"
 Thu Apr 21 12:16:52 UTC 2016

4. For ban purposes, we need a regex matching a range of epoch timestamp numbers. It's probably easiest to approximate it and round outwards to a slightly wider range to keep the regex simple. The regex below rounds the range above to 1461241000 - 1461242199, which, if converted back via step 3, shows we've rounded the range outwards to cover 12:16:40 through 12:36:39:

 146124(1[0-9]|2[01])

5. MediaWiki emits a Backend-Timing header with fields D and t, where t is a microsecond-resolution epoch number (we'll ignore those final 6 digits), like so:

 Backend-Timing: D=31193 t=1461241832338645

6. To ban on this header using the epoch seconds regex we build in step 4:

 ban obj.http.Backend-Timing ~ "t=146124(1[0-9]|2[01])"
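
As a quick sanity check, the step-4 regex should match an epoch timestamp that falls inside the window (the sample time below is arbitrary within the range):

 > date -d "Thu Apr 21 12:20:00 UTC 2016" +%s | grep -cE '146124(1[0-9]|2[01])'
 1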

How to execute a ban across a cluster

The first step is selecting the correct cluster from the list at the top of this page for the traffic you're trying to ban.

Keeping in mind the architecture of our cache tiers and layers, there are ordering rules that must be followed:

  1. Backends at Tier-1 datacenters must be banned before backends at Tier-2 datacenters.
  2. Backends at any datacenter must be banned before frontends at the same datacenter.

Currently, only eqiad is a Tier-1 datacenter at the Traffic layer, and all others are Tier-2. Therefore a reasonable procedure that obeys the rules above is:

  1. Ban eqiad backend instances
  2. Ban codfw backend instances
  3. Ban ulsfo and esams backend instances
  4. Ban all frontend instances

For distributed execution of ban commands, cache clusters and sites can be selected with salt grain conditionals on "site" and "cluster".

Putting this all together, this is a real example of banning all 404 objects with request URL "/apple-app-site-association":

salt -b 1 -v -t 30 -C 'G@cluster:cache_text and G@site:eqiad' cmd.run "varnishadm ban 'req.url == \"/apple-app-site-association\" && obj.status == 404'"
salt -b 1 -v -t 30 -C 'G@cluster:cache_text and G@site:codfw' cmd.run "varnishadm ban 'req.url == \"/apple-app-site-association\" && obj.status == 404'"
salt -b 1 -v -t 30 -C 'G@cluster:cache_text and not G@site:codfw and not G@site:eqiad' cmd.run "varnishadm ban 'req.url == \"/apple-app-site-association\" && obj.status == 404'"
salt -b 1 -v -t 30 -C 'G@cluster:cache_text' cmd.run "varnishadm -n frontend ban 'req.url == \"/apple-app-site-association\" && obj.status == 404'"

How to execute a ban across a cluster with cumin

Cluster-wide ban of objects with Content-Type text on cache_upload:

cumin -b 1 'R:class = role::cache::upload and *.eqiad.wmnet' "varnishadm ban 'obj.http.content-type ~ \"^text\"'"
cumin -b 1 'R:class = role::cache::upload and *.codfw.wmnet' "varnishadm ban 'obj.http.content-type ~ \"^text\"'"
cumin -b 1 'R:class = role::cache::upload and not *.eqiad.wmnet and not *.codfw.wmnet' "varnishadm ban 'obj.http.content-type ~ \"^text\"'"
cumin -b 1 'R:class = role::cache::upload' "varnishadm -n frontend ban 'obj.http.content-type ~ \"^text\"'"

Cluster-wide ban of objects with specific path:

cumin -b 1 'R:class = role::cache::upload and *.eqiad.wmnet' "varnishadm ban 'req.url ~ \"^/path\"'"
cumin -b 1 'R:class = role::cache::upload and *.codfw.wmnet' "varnishadm ban 'req.url ~ \"^/path\"'"
cumin -b 1 'R:class = role::cache::upload and not *.eqiad.wmnet and not *.codfw.wmnet' "varnishadm ban 'req.url ~ \"^/path\"'"
cumin -b 1 'R:class = role::cache::upload' "varnishadm -n frontend ban 'req.url ~ \"^/path\"'"

See also

  • IPsec - Used to encrypt backhaul links between caching PoPs and primary sites