Multicast HTCP purging

This page contains historical information. Since 2020, see Kafka HTTP purging instead.


Multicast HTCP purging was the dominant method of purging Varnish/ATS HTTP cache objects in our Traffic infrastructure from 2013 until July 2020, which is when we switched to Kafka-based CDN purging. See T250781 for all migration work details.

Typical Purge Flow

A MediaWiki instance detects that a purge is needed. It sends a multicast HTCP packet for each individual URI that needs to be purged.
Native multicast routing propagates the packet to all of our datacenters.
The vhtcpd daemon (replaced by purged in 2020 during the migration to Kafka), which runs on every relevant cache machine and is subscribed to the appropriate multicast group(s), receives a copy of the HTCP request.
vhtcpd forwards the request to the Varnish instances on the local host over persistent HTTP/1.1 connections, using the PURGE request method.
PURGE requests are handled by our custom VCL and cause the URI in question to be purged.
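
The forwarding step can be illustrated with a minimal Python sketch. It is not vhtcpd's actual code: the helper names purge_target and send_purge are hypothetical, and it assumes the local cache accepts a PURGE whose Host header and request target are taken from the URL being purged.

```python
import http.client
from urllib.parse import urlsplit


def purge_target(url):
    """Split a URL into the (Host header, request target) pair for a PURGE."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


def send_purge(conn, url):
    """Send one PURGE over an already-open HTTP/1.1 connection.

    conn is an http.client.HTTPConnection to the local cache instance.
    Returns the HTTP status code of the response.
    """
    host, target = purge_target(url)
    conn.request("PURGE", target, headers={"Host": host})
    resp = conn.getresponse()
    resp.read()  # drain the body so the persistent connection can be reused
    return resp.status
```

A caller would open one http.client.HTTPConnection per local Varnish instance and call send_purge repeatedly on it, which keeps the connection persistent as described above. The address and port of each instance are deployment-specific and not shown here.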

Risks for loss of purge requests

In general, multicast HTCP requests are UDP and have no response or confirmation back to the sender, therefore there is *always* the potential for requests to be silently lost at various points along the path. Specifically:

Application UDP send buffers - If the sending application (e.g. MediaWiki) does not allocate a sufficient UDP send buffer (e.g. via setsockopt(SO_SNDBUF)) to handle its own outgoing packet rate spikes, packets could be dropped at the UDP send buffer before they ever leave the application host.
Network infrastructure - any router or switch could drop packets on the floor due to excessive congestion or other similar issues.
vhtcpd's UDP receive buffers - inverse of the first risk, on the receiving side. vhtcpd currently configures its listening sockets with fairly large (16MB) receive buffers to help mitigate this risk. Its internal code structure also prioritizes pulling requests from the UDP buffers into the internal memory queue over sending the requests on towards the varnishes (this is the inverse of the usual priority pattern for such software, which would be to prioritize emptying the internal queue over filling it, but helps avoid these potential UDP buffer losses).
vhtcpd queue overflow - vhtcpd's internal buffers are large (default: 256MB, current config: 1GB), so that they can absorb long rate spikes and deal with temporary varnishd downtimes, etc. However, if conditions conspire to completely fill the internal buffer, vhtcpd's recourse is to wipe the buffer and start fresh again. The count of buffer wipes is visible via the queue_overflow statistic.
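
The wipe-on-overflow behaviour described in the last item can be sketched as follows. This is an illustrative Python model, not vhtcpd's implementation: the class name WipingQueue is hypothetical, the limit is counted in bytes of queued request data, and the sketch assumes the request that triggers the wipe is itself kept afterwards.

```python
from collections import deque


class WipingQueue:
    """Bounded FIFO that discards everything when it would exceed its limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.items = deque()
        self.size_bytes = 0
        self.queue_overflows = 0  # mirrors the statistic of the same name

    def enqueue(self, request):
        # If the new request does not fit, wipe the whole queue and start
        # fresh instead of blocking or dropping only the newest entry.
        if self.size_bytes + len(request) > self.max_bytes:
            self.items.clear()
            self.size_bytes = 0
            self.queue_overflows += 1
        self.items.append(request)
        self.size_bytes += len(request)

    def dequeue(self):
        request = self.items.popleft()
        self.size_bytes -= len(request)
        return request
```

The trade-off is that a wipe loses every queued purge at once, but the daemon never stalls its UDP receive path waiting for space.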

MediaWiki

MediaWiki was extended with a SquidPurge::HTCPPurge method that takes an HTCP multicast group address, an HTCP port number, and a multicast TTL (see DefaultSettings.php), and sends all URLs to be purged to that address. It can't make use of persistent sockets, but the overhead of setting up a UDP socket is minimal. It also doesn't have to worry about handling responses.

All Apaches are configured through CommonSettings.php to send HTCP purge requests to the appropriate multicast group. They use a multicast Time To Live of 8 (instead of the default, 1) because the messages need to cross multiple routers.
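
For illustration, the socket options involved on the sending side look like this in Python. MediaWiki itself is PHP, so this is only a sketch of the same setsockopt calls: the function name make_htcp_sender is hypothetical and the 1 MiB send buffer is an arbitrary example value, not the production setting.

```python
import socket


def make_htcp_sender(ttl=8, sndbuf_bytes=1 << 20):
    """Create a UDP socket suitable for sending multicast purge packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Ask for a larger send buffer so short bursts are less likely to be
    # dropped on the sending host. The kernel may clamp or adjust the value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    # The default multicast TTL of 1 would stop packets at the first router.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A sender would then call sock.sendto(packet, (group, port)) with an already-encoded HTCP message, for example to the group 239.128.0.112 on UDP port 4827 used in the tcpdump example on this page. Encoding the HTCP message itself is not shown.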

One-off purge

On mwmaint1002, run:

$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php

Note that static content under /static/ must always be purged via hostname 'en.wikipedia.org'. This is the shared virtual hostname under which Varnish caches content for /static/, regardless of requesting wiki hostname. Note also that mobile hostnames are cached independently of desktop hostnames. For example, to purge all copies of enwiki's article about Foo, one must purge both https://en.wikipedia.org/wiki/Foo and https://en.m.wikipedia.org/wiki/Foo
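
Because desktop and mobile hostnames are cached independently, it can help to generate both URLs before piping them to purgeList.php. The following Python sketch does that. The helper purge_variants is hypothetical, and it assumes the mobile hostname is formed by inserting an "m" label after the first label, as in the en.wikipedia.org example above. Hostnames that do not follow that pattern are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def purge_variants(url):
    """Return the URL plus its assumed mobile counterpart, if one applies."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # Skip two-label hosts (e.g. example.org) and hosts that are already mobile.
    if len(labels) < 3 or labels[1] == "m":
        return [url]
    mobile_host = ".".join([labels[0], "m"] + labels[1:])
    return [url, urlunsplit(parts._replace(netloc=mobile_host))]


if __name__ == "__main__":
    # Print one URL per line, ready to pipe into: mwscript purgeList.php
    for variant in purge_variants("https://en.wikipedia.org/wiki/Foo"):
        print(variant)
```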

Troubleshooting

Confirming that varnish is receiving and processing purge requests:

# frontend instance:
varnishlog -c -n frontend -m RxRequest:PURGE
# backend instance:
varnishlog -c -m RxRequest:PURGE

Confirming that vhtcpd is operating correctly:

cat /tmp/vhtcpd.stats

Which will show output similar to:

start:1453497429 uptime:353190 inpkts_recvd:611607720 inpkts_sane:611607720 inpkts_enqueued:611607720 inpkts_dequeued:594452309 queue_overflows:3 queue_size:0 queue_max_size:2282100

The file is written out every 15 seconds or so. The fields are:

start - the unix timestamp the daemon started at
uptime - seconds the daemon has been running
inpkts_recvd - input HTCP packets received from the network
inpkts_sane - packets from above that survived sanity-checking and parsing
inpkts_enqueued - packets from above that made it into the internal queue
inpkts_dequeued - packets from the above that have been dequeued (sent) to all local varnish daemons
queue_overflows - number of times the internal queue has reached the maximum size limit and wiped back to zero
queue_size - current size of the internal request queue
queue_max_size - the maximum size the queue has ever been since startup or the last overflow wipe

Note that both local varnish daemons (frontend and backend) must dequeue a packet before it leaves the queue. If one daemon is stuck or stopped, that will eventually cause a queue overflow!
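
The stats line is easy to check programmatically. The Python sketch below parses the key:value format shown above and derives two figures from the field definitions. The function names are hypothetical, and the derived figures are inferences from those definitions rather than statistics that vhtcpd reports itself.

```python
def parse_vhtcpd_stats(line):
    """Parse a 'key:value key:value ...' stats line into a dict of ints."""
    return {key: int(value)
            for key, value in (field.split(":", 1) for field in line.split())}


def rejected_before_queue(stats):
    """Packets received from the network that never reached the queue."""
    return stats["inpkts_recvd"] - stats["inpkts_enqueued"]


def undelivered(stats):
    """Enqueued packets not dequeued to all local daemons.

    These are either still waiting in the queue or were discarded
    by an overflow wipe.
    """
    return stats["inpkts_enqueued"] - stats["inpkts_dequeued"]
```

For the sample output above, rejected_before_queue gives 0 and undelivered gives 17155411. With queue_size at 0, that difference is consistent with packets lost in the 3 recorded overflow wipes.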

To dump traffic directly off the network interfaces, use e.g.:

tcpdump -n -v udp port 4827 and host 239.128.0.112

Note that you will only see traffic if the machine is subscribed to the multicast group, which generally requires the vhtcpd daemon to be up and listening.
