Kafka HTTP purging

From Wikitech

The current (2020) mechanism for purging objects from the CDN is based on a daemon called Purged that runs on all cache nodes. Purged can be configured to read purge messages either from the legacy multicast HTCP purging mechanism or from Kafka. Regardless of the source, Purged converts each purge message into an HTTP PURGE request and sends it locally to both the ATS cache backend and the Varnish cache frontend.
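
A minimal Python sketch of the conversion step, assuming each purge message has already been reduced to one absolute, percent-encoded URL. The function name and the header set are illustrative assumptions, not taken from Purged itself. The host part of the URL goes into the Host header, so that a single local cache can tell which site's object to drop, and the path plus query string becomes the request target.

```python
from urllib.parse import urlsplit


def build_purge_request(url: str) -> bytes:
    """Return the raw bytes of an HTTP/1.1 PURGE request for `url`.

    The request is meant for a cache on the local host, so the URL's host
    travels in the Host header and only the path and query form the
    request target.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request_line = f"PURGE {target} HTTP/1.1"
    host_header = f"Host: {parts.netloc}"
    # A blank line terminates the header block; a PURGE carries no body.
    return f"{request_line}\r\n{host_header}\r\n\r\n".encode("ascii")
```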

Typical purge flow

  • A MediaWiki instance detects that a purge is needed and produces (through EventBus and EventGate) a Kafka message on a given topic for each individual URI that needs to be purged.
  • Purged, running on every relevant cache machine and consuming from the appropriate Kafka topics, receives a copy of each purge message.
  • Purged forwards the request to the ATS and Varnish instances on the local host over persistent HTTP/1.1 connections, using the PURGE request method.
  • ATS and Varnish handle the PURGE requests by removing the object for that URI from their caches.
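
The flow above can be sketched in Python as a loop that takes each consumed message, extracts the URL and sends one PURGE to every local cache, reusing one persistent connection per cache. The Kafka consumer itself is left out (messages are any iterable of raw message values), and the `meta.uri` field name, the helper names and the timeout are assumptions for illustration, not Purged's actual code.

```python
import http.client
import json
from typing import Callable, Dict, Iterable
from urllib.parse import urlsplit

# A sender issues one PURGE for (Host header value, request target).
Sender = Callable[[str, str], None]


def http_sender(host: str, port: int) -> Sender:
    """Build a sender that reuses one HTTP/1.1 connection to a local cache.

    No reconnect or retry handling: this is only a sketch.
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)

    def send(vhost: str, target: str) -> None:
        conn.request("PURGE", target, headers={"Host": vhost})
        # Read the whole response so the connection can be reused (keep-alive).
        conn.getresponse().read()

    return send


def extract_uri(message: bytes) -> str:
    """Return the URL to purge; assumes a JSON event with the URL in meta.uri."""
    return json.loads(message)["meta"]["uri"]


def handle_messages(messages: Iterable[bytes], backends: Dict[str, Sender]) -> int:
    """Fan every purge message out to every local backend; return requests sent."""
    sent = 0
    for message in messages:
        parts = urlsplit(extract_uri(message))
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        for send in backends.values():
            send(parts.netloc, target)
            sent += 1
    return sent
```

In a real deployment `backends` might be `{"ats": http_sender("localhost", ATS_PORT), "varnish": http_sender("localhost", VARNISH_PORT)}`, where both port names are placeholders rather than the actual local ports.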

MediaWiki

All CDN purges are generated in MediaWiki by the CdnCacheUpdate::purge method. MediaWiki is currently configured to send the generated purges to the EventRelayer under the cdn-url-purges key. The EventBus extension provides an EventRelayer implementation, CdnPurgeEventRelayer, which creates purge events and sends them to Kafka through the normal EventBus flow, i.e. via the EventGate service.
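
On the producer side the key idea is one event per URL, all addressed to the stream named in the configuration. A minimal Python sketch, assuming a simplified event shape (`meta.uri`, `meta.stream`, `meta.dt`) and that a batch can be sent as a single JSON array; the function names and the exact fields are assumptions, not the actual CdnPurgeEventRelayer code, which is PHP.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def make_purge_events(urls: List[str], stream: str = "resource-purge") -> List[Dict]:
    """Build one event per URL to purge, all addressed to the same stream.

    Assumption: the event carries the URL in meta.uri and the target stream
    in meta.stream; the real schema may require more fields.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [{"meta": {"uri": url, "stream": stream, "dt": now}} for url in urls]


def to_request_body(events: List[Dict]) -> bytes:
    """Serialize a batch of events as a JSON array for a single POST."""
    return json.dumps(events).encode("utf-8")
```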

Relevant configuration:

// Route CDN purges through the EventRelayer to the resource-purge stream (written to Kafka)
'wgEventRelayerConfig' => [
	'cdn-url-purges' => [
		'class' => \MediaWiki\Extension\EventBus\Adapters\EventRelayer\CdnPurgeEventRelayer::class,
		'stream' => 'resource-purge',
	],
	// Any other channel falls back to the no-op relayer
	'default' => [
		'class' => EventRelayerNull::class,
	],
],
// Default EventBus event service (EventGate instance) that receives the events
'wgEventServiceDefault' => 'eventgate-main',
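
The relayer configuration is keyed by channel, with the 'default' entry acting as the fallback for channels that have no entry of their own, so only cdn-url-purges reaches Kafka and anything else goes to EventRelayerNull. A short Python model of that lookup, with relayers reduced to plain callables; it mirrors the shape of the configuration rather than MediaWiki's actual classes.

```python
from typing import Callable, Dict, List

# A relayer receives the list of URLs published on one channel.
Relayer = Callable[[List[str]], None]


def null_relayer(urls: List[str]) -> None:
    """Stand-in for EventRelayerNull: discards everything it is given."""


def get_relayer(config: Dict[str, Relayer], channel: str) -> Relayer:
    """Return the relayer configured for `channel`, else the 'default' one."""
    return config.get(channel, config["default"])
```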

One-off purge

On mwmaint1002, run:

$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php

Note that static content under /static/ must always be purged via the hostname en.wikipedia.org: this is the shared virtual hostname under which Varnish caches /static/ content, regardless of the requesting wiki's hostname.

Note also that mobile hostnames are cached independently of desktop hostnames. For example, to purge all copies of enwiki's article about Foo, both https://en.wikipedia.org/wiki/Foo and https://en.m.wikipedia.org/wiki/Foo must be purged.
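
The two rules above can be captured in a small Python helper that expands a wiki path into every URL to feed to purgeList.php. The helper is hypothetical, and deriving the mobile hostname by inserting `.m` after the first label is an assumption generalized from the en.wikipedia.org / en.m.wikipedia.org example; check it before relying on it for other projects.

```python
from typing import List

# Shared hostname for /static/ content, per the note above.
STATIC_HOST = "en.wikipedia.org"


def mobile_host(host: str) -> str:
    """Hypothetical rule: insert '.m' after the first label of the hostname."""
    first, _, rest = host.partition(".")
    return f"{first}.m.{rest}"


def purge_urls(host: str, path: str) -> List[str]:
    """Return every URL that must be purged for `path` on wiki `host`."""
    if path.startswith("/static/"):
        # /static/ is cached once under the shared hostname, whatever the wiki.
        return [f"https://{STATIC_HOST}{path}"]
    # Desktop and mobile copies are cached independently, so purge both.
    return [f"https://{host}{path}", f"https://{mobile_host(host)}{path}"]


if __name__ == "__main__":
    # One URL per line, as in the echo example above.
    print("\n".join(purge_urls("en.wikipedia.org", "/wiki/Foo")))
```

Saved as a script (the name purge_urls.py is hypothetical), its output could be piped the same way as the echo example: `python3 purge_urls.py | mwscript purgeList.php`.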