X-Provenance

From Wikitech

X-Provenance header

The X-Provenance HTTP header is used within the Wikimedia CDN and request classification systems to signal the origin or trust level of a request. It provides early, lightweight identification of known traffic sources, helping optimize filtering and rate-limiting decisions.

Purpose

This header is meant to:

  • Tag traffic based on its origin before deeper inspection (e.g. session token validation or UA classification)
  • Enable fast-path handling (e.g. skip filtering, assign different rate limits)
  • Allow Requestctl, HAProxy and Varnish logic to apply differentiated rules based on known provenance

Syntax

The header follows the form:

X-Provenance: label1=value1;labelN=valueN

Each label identifies one aspect of the request's provenance. Examples:

  • net: flags internal requests or requests coming from trusted network ranges
  • abuser: request coming from a known abuser
  • client: request coming from a known client ipblock
  • cloud: request coming from a known cloud
  • isp: ISP data provided by the MaxMind ISP database
  • net=unknown: default fallback value
  • datacenter=true: indicates that the request comes from a datacenter rather than from an eyeball provider. This data is currently provided by the Spur datacenter feed
  • id: request coming from a verified client, for which we have both a matching user agent and a matching provenance expression. For instance, a request with user-agent "Googlebot" coming from the IP ranges of Googlebot.
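
The label=value;label=value form described above can be split into a mapping with a few lines of code. The following is a minimal Python sketch for illustration only, not the parser used in production; the function name parse_provenance and the choice to skip segments without an "=" are assumptions.

```python
def parse_provenance(header: str) -> dict[str, str]:
    """Split an X-Provenance value into a {label: value} mapping.

    Hypothetical helper for illustration; not part of any Wikimedia codebase.
    """
    labels: dict[str, str] = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing ';'
        label, sep, value = part.partition("=")
        if not sep:
            continue  # assumption: segments without '=' are ignored
        labels[label.strip()] = value.strip()
    return labels


parsed = parse_provenance("net=unknown;datacenter=true")
print(parsed)
```

If a label appears more than once, this sketch keeps the last value; whether the real header can repeat labels is not stated here.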

Use Cases

  • Applied by CDN edge terminators (HAProxy layer)
  • Enables bypassing generic rate limits or Requestctl rules for trusted sources
  • Can be used as an input to moat-mode rules or future trust scoring systems
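
As a rough illustration of how provenance labels could feed a rate-limiting decision, the Python sketch below maps an already-parsed set of labels to a tier. The tier names, the precedence order, and the function choose_tier are all hypothetical; the actual rules live in Requestctl, HAProxy and Varnish and are not reproduced here.

```python
def choose_tier(labels: dict[str, str]) -> str:
    """Pick a rate-limit tier from parsed X-Provenance labels.

    Tier names and precedence are assumptions made for this example.
    """
    if "abuser" in labels:
        return "strict"      # known abusers get the tightest limits
    if "id" in labels:
        return "verified"    # user agent and provenance both matched
    if labels.get("net", "unknown") != "unknown":
        return "trusted"     # internal or trusted network range
    if labels.get("datacenter") == "true":
        return "cautious"    # datacenter traffic, not an eyeball provider
    return "default"


print(choose_tier({"net": "unknown", "datacenter": "true"}))
```

Checking the abuser label first means a negative signal always wins over a positive one, which is one plausible design choice rather than documented behaviour.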

Implementation

Currently implemented in:

  • HAProxy sets its value based on known IP ranges and the MaxMind database
  • Requestctl rules consume the header for filtering decisions in both HAProxy and Varnish
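
To make the IP-range lookup concrete, here is a small Python sketch of deriving a header value from a client address. It is only a model of the idea: the production logic runs in HAProxy, and the ranges (reserved documentation prefixes), the values example-internal and example-cloud, and the function build_provenance are invented for this example.

```python
import ipaddress

# Hypothetical range table: (network, label, value).
# The prefixes are reserved documentation ranges, not real Wikimedia data.
KNOWN_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "net", "example-internal"),
    (ipaddress.ip_network("198.51.100.0/24"), "cloud", "example-cloud"),
]


def build_provenance(client_ip: str) -> str:
    """Return an X-Provenance value for client_ip based on KNOWN_RANGES."""
    addr = ipaddress.ip_address(client_ip)
    pairs = [
        (label, value)
        for network, label, value in KNOWN_RANGES
        if addr in network
    ]
    if not pairs:
        # Fall back to the default value when no range matches.
        pairs = [("net", "unknown")]
    return ";".join(f"{label}={value}" for label, value in pairs)


print(build_provenance("203.0.113.5"))
```

A linear scan is fine for a handful of ranges; a real edge proxy would use an indexed structure for large range sets.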

Future Plans

  • Tighter integration with session/token-based identification
  • Use in shaping rate-limiting tiers dynamically
  • Expanded label taxonomy to support more trusted classes