Multicast HTCP purging
Typical Purge Flow
- A MediaWiki instance detects that a purge is needed and sends a multicast HTCP packet for each individual URI that needs to be purged.
- Native multicast routing propagates the packet to all of our datacenters.
- The daemon vhtcpd, which runs on every relevant cache machine and is subscribed to the appropriate multicast group(s), receives a copy of the HTCP request.
- vhtcpd forwards the request to the Varnish instances on the local host over persistent HTTP/1.1 connections, using the PURGE request method.
- PURGE requests are handled by our custom VCL and cause the URI in question to be purged.
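The last two steps of this flow can be pictured with a short Python sketch. This is not vhtcpd's code; it only illustrates sending several PURGE requests over one persistent HTTP/1.1 connection. The listener address `127.0.0.1:80` and the function names are assumptions made for the example.

```python
import http.client
from urllib.parse import urlsplit


def purge_request_parts(url):
    """Split a full URL into the Host header value and the request target."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


def send_purges(urls, varnish_host="127.0.0.1", varnish_port=80):
    """Send one PURGE per URL over a single keep-alive connection.

    varnish_host and varnish_port are placeholders for a local Varnish
    listener. Returns the list of HTTP status codes, one per URL.
    """
    conn = http.client.HTTPConnection(varnish_host, varnish_port, timeout=5)
    statuses = []
    try:
        for url in urls:
            host, target = purge_request_parts(url)
            conn.request("PURGE", target, headers={"Host": host})
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

What a given status code means depends on the VCL handling PURGE, so the sketch returns the codes without interpreting them.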
Risks for loss of purge requests
Multicast HTCP requests are sent over UDP and get no response or confirmation back to the sender, so there is *always* the potential for requests to be silently lost at various points along the path. Specifically:
- Application UDP send buffers - If the sending application (e.g. MediaWiki) does not allocate a sufficient UDP send buffer (e.g. via setsockopt(SO_SNDBUF)) to handle its own outgoing packet rate spikes, packets could be dropped from the UDP send buffer before they ever leave the application host.
- Network infrastructure - any router or switch could drop packets on the floor due to excessive congestion or other similar issues.
- vhtcpd's UDP receive buffers - inverse of the first risk, on the receiving side. vhtcpd currently configures its listening sockets with fairly large (16MB) receive buffers to help mitigate this risk. Its internal code structure also prioritizes pulling requests from the UDP buffers into the internal memory queue over sending the requests on towards the varnishes (this is the inverse of the usual priority pattern for such software, which would be to prioritize emptying the internal queue over filling it, but helps avoid these potential UDP buffer losses).
- vhtcpd queue overflow - vhtcpd's internal buffers are large (default: 256MB, current config: 1GB), so that they can absorb long rate spikes and deal with temporary varnishd downtimes, etc. However, if conditions conspire to completely fill the internal buffer, vhtcpd's recourse is to wipe the buffer and start fresh again. The count of buffer wipes is visible via the queue_overflow statistic, which is exported to ganglia.
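The overflow behaviour described in the last item can be modelled with a small Python class. This is a toy model and not vhtcpd's implementation. Measuring the queue in bytes, and keeping the request that triggered the wipe, are both assumptions made for the example.

```python
from collections import deque


class PurgeQueue:
    """Toy model of a bounded queue that wipes itself when it overflows."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.items = deque()
        self.size = 0            # bytes currently queued
        self.max_size_seen = 0   # high-water mark since start or last wipe
        self.overflows = 0       # number of wipes so far

    def enqueue(self, request):
        if self.size + len(request) > self.max_bytes:
            # No room left: discard everything queued so far and start fresh.
            self.items.clear()
            self.size = 0
            self.max_size_seen = 0
            self.overflows += 1
        # Assumption: the request that triggered the wipe is itself kept.
        self.items.append(request)
        self.size += len(request)
        self.max_size_seen = max(self.max_size_seen, self.size)

    def dequeue(self):
        request = self.items.popleft()
        self.size -= len(request)
        return request
```

The point of the model is that an overflow loses every request that was waiting, not just the newest one, which is why the overflow counter is worth monitoring.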
Within our network, the following distinct multicast addresses are reserved for HTCP purging:
- 220.127.116.11:4827 - text - used to purge all general traffic (e.g. Wikis and services)
- 18.104.22.168:4827 - upload - used to purge upload.wikimedia.org images
- 22.214.171.124:4827 - maps - used to purge maps.wikimedia.org images
- 126.96.36.199:4827 - misc - used to purge various services/URLs serviced by cache_misc
Note: the list above is ideal/near-future, and was recently briefly true, but for now upload and text are both sharing the text multicast address because the change that split them was reverted temporarily...
MediaWiki was extended with a SquidPurge::HTCPPurge method that takes an HTCP multicast group address, an HTCP port number, and a multicast TTL (see DefaultSettings.php), and sends all URLs to purge to that destination. It can't make use of persistent sockets, but the overhead of setting up a UDP socket is minimal. It also doesn't have to worry about handling responses.
All Apaches are configured through CommonSettings.php to send HTCP purge requests to the appropriate multicast group. They use a multicast Time To Live of 8 (instead of the default, 1) because the messages need to cross multiple routers.
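The socket-level side of this configuration can be sketched in Python as follows. This is not MediaWiki's PHP code. The function names and the 1 MiB send buffer size are assumptions for illustration, and the HTCP packet itself is treated as an already-encoded byte string.

```python
import socket


def open_purge_socket(ttl=8, send_buffer_bytes=1 << 20):
    """Create a UDP socket suitable for sending multicast datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The default multicast TTL is 1, which stops at the first router.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    # Ask for a larger send buffer to absorb bursts; the kernel may clamp it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return sock


def send_purge_packet(sock, packet, group, port=4827):
    """Send one already-encoded HTCP packet; there is no reply to wait for."""
    return sock.sendto(packet, (group, port))
```

Because nothing is read back from the socket, a successful `sendto` only means the datagram was handed to the local kernel, not that any cache received it.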
To purge a URL manually, run the following on terbium:
$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php
Note that static content under /static/ must always be purged via hostname 'en.wikipedia.org'. This is the shared virtual hostname under which Varnish caches content for /static/, regardless of requesting wiki hostname. Note also that mobile hostnames are cached independently of desktop hostnames. For example, to purge all copies of enwiki's article about Foo, one must purge both https://en.wikipedia.org/wiki/Foo and https://en.m.wikipedia.org/wiki/Foo
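A small helper can produce both variants from a desktop URL. This is a minimal sketch: the function name is hypothetical, and it assumes a three-label desktop hostname such as en.wikipedia.org whose mobile counterpart is formed by inserting an `m` label after the first label. Hostnames that follow a different pattern are rejected and not guessed at.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_and_mobile(url):
    """Return [desktop_url, mobile_url] for a host like 'en.wikipedia.org'."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    if len(labels) != 3:
        raise ValueError("expected a desktop host with exactly three labels")
    # en.wikipedia.org -> en.m.wikipedia.org
    mobile_host = ".".join([labels[0], "m"] + labels[1:])
    return [url, urlunsplit(parts._replace(netloc=mobile_host))]


if __name__ == "__main__":
    for variant in desktop_and_mobile("https://en.wikipedia.org/wiki/Foo"):
        print(variant)
```

The printed URLs could then be fed to the purgeList.php command shown earlier, assuming it accepts one URL per line on standard input.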
Confirming that varnish is receiving and processing purge requests:
# frontend instance:
varnishlog -c -n frontend -m RxRequest:PURGE
# backend instance:
varnishlog -c -m RxRequest:PURGE
Confirming that vhtcpd is operating correctly:
Reading vhtcpd's statistics file will show output similar to:
start:1453497429 uptime:353190 inpkts_recvd:611607720 inpkts_sane:611607720 inpkts_enqueued:611607720 inpkts_dequeued:594452309 queue_overflows:3 queue_size:0 queue_max_size:2282100
The file is written out every 15 seconds or so. The fields are:
- start - the unix timestamp the daemon started at
- uptime - seconds the daemon has been running
- inpkts_recvd - input HTCP packets received from the network
- inpkts_sane - packets from above that survived sanity-checking and parsing
- inpkts_enqueued - packets from above that made it into the internal queue
- inpkts_dequeued - packets from above that have been dequeued (sent) to all local varnish daemons
- queue_overflows - number of times the internal queue has reached the maximum size limit and wiped back to zero
- queue_size - current size of the internal request queue
- queue_max_size - the maximum size the queue has ever been since startup or the last overflow wipe
This data is also recorded in ganglia metrics for all of the cache hosts. Note that both local varnish daemons (frontend and backend) must dequeue a packet before it leaves the queue. If one daemon is stuck or stopped, that will eventually cause a queue overflow!
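For ad-hoc checks, the statistics line is easy to parse. The following Python sketch assumes the whitespace-separated `key:value` layout shown in the sample output above. The function names and the derived indicator names are made up for this example and are not part of vhtcpd.

```python
def parse_vhtcpd_stats(line):
    """Turn 'key:value key:value ...' into a dict of integers."""
    stats = {}
    for field in line.split():
        key, _, value = field.partition(":")
        stats[key] = int(value)
    return stats


def summarize(stats):
    """Derive a few health indicators from the parsed counters."""
    return {
        # Packets that failed sanity-checking or parsing.
        "rejected": stats["inpkts_recvd"] - stats["inpkts_sane"],
        # Sane packets that never made it into the internal queue.
        "not_enqueued": stats["inpkts_sane"] - stats["inpkts_enqueued"],
        # True if the queue has been wiped at least once since startup.
        "has_overflowed": stats["queue_overflows"] > 0,
        # Requests still waiting to be sent to the local varnish daemons.
        "backlog": stats["queue_size"],
    }
```

A steadily growing queue_size between successive reads is the early warning to look for, since it precedes an overflow wipe.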
To dump traffic directly off the network interfaces, use e.g.:
tcpdump -n -v udp port 4827 and host 188.8.131.52
(but note that you will only see traffic if the machine is subscribed to multicast, and generally the vhtcpd daemon must be up and listening for that to happen!)
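The subscription mentioned in this note is a socket-level operation. The Python sketch below shows roughly what a listener has to do to join a group and request a large receive buffer. It is not vhtcpd's code, and the group address `239.1.2.3` in the usage comment is a hypothetical placeholder, not one of the addresses listed above.

```python
import socket
import struct


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_listener(group, port=4827, receive_buffer_bytes=16 * 1024 * 1024):
    """Bind a UDP socket and subscribe it to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Request a large receive buffer; the kernel may clamp the actual size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, receive_buffer_bytes)
    sock.bind(("", port))
    # Joining the group is what makes the host ask the network for this traffic.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


# Example (hypothetical group address):
#   listener = open_listener("239.1.2.3")
#   packet, sender = listener.recvfrom(65535)
```

Once the last subscribed socket on a host is closed, the host leaves the group, which is why tcpdump shows nothing when no listener is running.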