Test Kitchen/Architecture

From Wikitech

This page provides a detailed overview of the Test Kitchen architecture and how its components work together to enable instrumentation and experimentation.

Test Kitchen is a distributed system that enables product teams to instrument features and run A/B tests on Wikipedia and other Wikimedia projects. The architecture supports two types of experiments based on different enrollment mechanisms:

  • Everyone experiments: Target all users (logged-in and logged-out) using edge unique cookies for enrollment at the CDN layer
  • Logged-in experiments: Target authenticated users using Central Auth IDs for enrollment at the application layer

Components

[Figure: A detailed overview of the architecture of Test Kitchen]

User-facing components

Test Kitchen UI (formerly known as MPIC or xLab)

Web application where experiment and instrument owners configure their experiments and instruments. The Test Kitchen UI serves configuration to enrollment authorities and analytics systems via a RESTful API.

The Test Kitchen UI provides the following REST API endpoints:

https://test-kitchen.wikimedia.org/api/v1/instruments (all instruments)

https://test-kitchen.wikimedia.org/api/v1/experiments (all experiments)
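
As a rough illustration, a client could read these endpoints along the following lines. This is a minimal sketch, not the SDK's own code: the helper names `endpoint_url` and `fetch_json` are hypothetical, and the response body is assumed to be JSON, which this page does not state explicitly.

```python
import json
import urllib.request

# Base URL taken from the two endpoints listed above.
BASE_URL = "https://test-kitchen.wikimedia.org/api/v1"


def endpoint_url(resource: str) -> str:
    """Return the URL for a resource ("instruments" or "experiments")."""
    if resource not in ("instruments", "experiments"):
        raise ValueError(f"unknown resource: {resource}")
    return f"{BASE_URL}/{resource}"


def fetch_json(url: str, timeout: float = 5.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would then use something like `fetch_json(endpoint_url("experiments"))` to retrieve all experiments.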

SDK (Client Libraries)

Standardized toolkits for creating and maintaining analytics instrumentation, available for:

  • JavaScript: For client-side instrumentation in web browsers
  • PHP: For server-side instrumentation in MediaWiki
  • Kotlin: For Android app instrumentation
  • Swift: For iOS app instrumentation (planned)

The SDKs provide methods for:

  • Submitting events to event streams
  • Checking experiment enrollment and assignment
  • Creating instruments and managing schemas
  • Interacting with EventGate via EventLogging/EventBus

Superset

Analysis platform where experiment results from automated analytics are visualized. Results are available at https://superset.wikimedia.org/superset/dashboard/experiment-analytics/ and update hourly during active experiments.

Data Pipeline Components

EventGate

The event intake service that receives, validates, and routes analytics events from instrumentation to the data lake. EventGate:

  • Validates events against their declared schemas
  • Enriches events with HTTP header data (when configured)
  • Routes events to appropriate Kafka topics
  • Ensures events reach Hive for storage
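
The validate, enrich, and route steps above can be pictured with a small sketch. It is not EventGate's implementation: the flat `SCHEMA` format, the header-to-field `mapping`, and the use of the stream name as the Kafka topic are all simplifying assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical, simplified schema: field name -> required Python type.
SCHEMA: Dict[str, type] = {"action": str, "stream": str}


def validate(event: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """Return True if every schema field is present with the right type."""
    return all(isinstance(event.get(name), typ) for name, typ in schema.items())


def enrich(event: Dict[str, Any], headers: Dict[str, str],
           mapping: Dict[str, str]) -> Dict[str, Any]:
    """Copy configured HTTP headers into event fields (header -> field)."""
    enriched = dict(event)
    for header, field in mapping.items():
        if header in headers:
            enriched[field] = headers[header]
    return enriched


def route(event: Dict[str, Any]) -> str:
    """Pick a topic; reusing the stream name as the topic is an assumption."""
    return event["stream"]


def intake(event: Dict[str, Any], headers: Dict[str, str],
           mapping: Dict[str, str]) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Validate, enrich, and route one event; invalid events are rejected."""
    if not validate(event, SCHEMA):
        return None
    enriched = enrich(event, headers, mapping)
    return route(enriched), enriched
```

Passing an empty `mapping` corresponds to the "when configured" caveat: with no enrichment configured, the event is routed unchanged.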

MediaWiki Extensions

Extension:TestKitchen

The core extension that integrates Test Kitchen with MediaWiki. (Formerly named Extension:MetricsPlatform.)

Responsibilities:

  • Fetches instrument and experiment configuration from the Test Kitchen API
  • Acts as the enrollment authority for logged-in experiments
  • Provides PHP SDK methods for server-side instrumentation
  • Manages caching of Test Kitchen configuration in the WAN Cache

Caching strategy:

  • Prioritizes content delivery speed over configuration freshness
  • Asynchronously fetches configuration from Test Kitchen API every minute
  • Returns an empty response if no cached configuration is available. This disables all Test Kitchen-managed instrumentation, so instrumentation issues never degrade site performance.
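
The caching strategy above can be sketched as follows. This is an illustrative in-memory stand-in, not the extension's WAN cache code: the `ConfigCache` class is hypothetical, and keeping the previous value when a refresh fails is an assumption rather than documented behavior.

```python
from typing import Any, Callable, List, Optional


class ConfigCache:
    """Serve cached config immediately; never block a request on a fetch."""

    def __init__(self) -> None:
        self._cached: Optional[List[Any]] = None

    def get(self) -> List[Any]:
        # No cached value yet: return an empty config, which disables
        # all instrumentation instead of delaying the page.
        return self._cached if self._cached is not None else []

    def refresh(self, fetch: Callable[[], List[Any]]) -> None:
        # Intended to run out of band, roughly once a minute.
        try:
            self._cached = fetch()
        except Exception:
            # Assumed behavior: keep serving the previous value on failure.
            pass
```

Request handling only ever calls `get()`, so a slow or failing Test Kitchen API affects configuration freshness but not response time.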

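The TestKitchen extension fetches configuration from the Test Kitchen API, using the /api/v1/experiments?format=config&authority=mediawiki request shown under Configuration Flow.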

Extension:EventLogging

Handles submission of analytics events to EventGate.

Function:

  • For JavaScript instrumentation: submits events directly to EventGate
  • For PHP instrumentation: defers to EventBus for submission
  • Uses EventStreamConfigs to determine event stream details

Extension:EventBus

Handles server-side event submission for PHP instrumentation.

Function:

  • Receives events from EventLogging (when instrumentation is in PHP)
  • Submits events to EventGate
  • Works in conjunction with EventStreamConfigs

Extension:EventStreamConfigs

Provides event stream configuration details to other extensions.

Function:

  • Stores and serves event stream configurations
  • Consumed by EventLogging and EventBus to determine where events should be sent
  • Configurations can be created dynamically by the TestKitchen extension for Test Kitchen instruments

Extension:WikimediaEvents

Common location for analytics instrumentation code.

Usage:

  • Many product teams write their experiment instrumentation here
  • Provides shared ResourceLoader modules for analytics
  • Alternative: teams can instrument in their own product codebases

GrowthBook JS SDK Analysis

The GrowthBook JavaScript SDK is of particular interest to us because every one of its functions is relevant to Test Kitchen. Its code is being analyzed collaboratively here: Test Kitchen/Architecture/GrowthBook_JS_SDK_Analysis

Flows

Configuration Flow

How experiment configuration travels from Test Kitchen UI to enrollment authorities:

1. Experiment Owner
   ↓ Creates/configures experiment in UI
   
2. Test Kitchen UI
   ↓ Stores configuration in database
   ↓ Exposes via REST API
   
3. Enrollment Authorities fetch config:
   
   Varnish Path:
   ├─ Polls API every ~1 minute
   ├─ Fetches: /api/v1/experiments?format=config&authority=varnish
   ├─ Caches config locally on each cache node
   └─ Requires 24 hours' advance notice for changes to propagate
   
   MediaWiki Path:
   ├─ TestKitchen polls API every ~1 minute
   ├─ Fetches: /api/v1/experiments?format=config&authority=mediawiki
   ├─ Caches in WAN cache
   └─ Config updates propagate within minutes
   
4. Ready for Enrollment
   └─ Authorities can now enroll users and assign groups
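
The per-authority requests in step 3 differ only in the `authority` query parameter. A minimal sketch of building them is shown below; the `config_url` helper is hypothetical, and the host is taken from the REST API endpoints listed earlier on this page.

```python
from urllib.parse import urlencode

# Host as given in the REST API endpoint list.
API_BASE = "https://test-kitchen.wikimedia.org"


def config_url(authority: str) -> str:
    """Build the experiment config URL for an enrollment authority."""
    query = urlencode({"format": "config", "authority": authority})
    return f"{API_BASE}/api/v1/experiments?{query}"


# The two authorities named in the flow above.
VARNISH_URL = config_url("varnish")
MEDIAWIKI_URL = config_url("mediawiki")
```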

Enrollment Flow

How users get enrolled and assigned to experiment groups:

1. User Visits Wiki
   ↓
   
2. Enrollment Authority Checks User
   
   Everyone Experiment (Varnish):
   ├─ Checks for wmf-uniq cookie (Edge Unique ID)
   ├─ If no cookie, creates one
   ├─ Hashes cookie + experiment name → subject ID
   ├─ Checks if subject ID falls into a bucket → enrolled
   ├─ If enrolled, uses bucket to determine experiment group
   └─ Adds enrollment info to internal HTTP header for downstream services
   
   Logged-in Experiment (MediaWiki):
   ├─ Checks Central Auth ID
   ├─ Hashes ID + experiment ID → subject ID
   ├─ Checks if subject ID falls below a threshold → enrolled
   └─ If enrolled, uses subject ID to determine experiment group

3. TestKitchen extension combines enrollment info and makes it available to feature code

4. Feature Code Executes
   ├─ Checks: experiment.isAssignedGroup('treatment_name')
   ├─ Executes variant logic based on assignment
   └─ User sees appropriate variant
   
5. Enrollment Persists
   
   Everyone Experiment:
   └─ Enrollment tied to cookie lifetime
      └─ User may switch groups if the cookie is cleared
   
   Logged-in Experiment:
   └─ Enrollment tied to Central Auth ID
      └─ Consistent across all wikis and sessions
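
The hash-then-threshold idea in step 2 can be sketched as below. This illustrates deterministic bucketing in general, not Test Kitchen's actual algorithm: the use of SHA-256, the `:` separator, the `sample_rate` parameter, and the even split between groups are all assumptions.

```python
import hashlib
from typing import List, Optional


def subject_id(identifier: str, experiment: str) -> str:
    """Hash identifier + experiment name into a hex subject ID (SHA-256 assumed)."""
    return hashlib.sha256(f"{identifier}:{experiment}".encode("utf-8")).hexdigest()


def assign(subject: str, sample_rate: float, groups: List[str]) -> Optional[str]:
    """Return the assigned group, or None if the subject is not enrolled."""
    # Map the first 8 hex digits to a fraction in [0, 1).
    position = int(subject[:8], 16) / 0x100000000
    if position >= sample_rate:
        return None  # at or above the threshold: not enrolled
    # Split the enrolled range [0, sample_rate) evenly among the groups.
    index = int(position / sample_rate * len(groups))
    return groups[min(index, len(groups) - 1)]
```

Because the subject ID depends only on the identifier and the experiment name, the same user always lands in the same group for a given experiment, while different experiments bucket users independently.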

Data Flow

How analytics events travel from user interactions to analysis dashboards:

1. User Interacts with Feature
   ↓ Triggers instrumented action
   
2. SDK Captures Event
   
   Client-side (JavaScript):
   ├─ experiment.send('action', {data})
   └─ Creates event with experiment metadata
   
   Server-side (PHP):
   ├─ $experiment->send('action', [data])
   └─ Creates event with experiment metadata
   
3. Event Contains:
   ├─ experiment.name (e.g., "larger-default-font-size")
   ├─ experiment.enrolled (true/false)
   ├─ experiment.assigned (e.g., "x-large" or "control")
   ├─ experiment.subject_id
   │  ├─ Everyone Experiments: "awaiting"
   │  └─ Logged-in Experiments: subject ID
   ├─ action (e.g., "page-visited")
   ├─ contextual attributes (performer info, wiki, page, etc.)
   └─ interaction data (custom fields)
   
4. EventLogging/EventBus
   ↓ Submits event to EventGate
   
5. EventGate
   ├─ Validates against schema
   ├─ Enriches with headers (if configured)
   │  └─ Everyone Experiments: Replaces "awaiting" with subject ID from enrollment authority
   └─ Routes to Kafka topic
   
6. Hive
   ├─ Events land in tables (e.g., product_metrics.web_base)
   └─ Available for querying within ~2.5 hours
   
7. Automated Analysis
   ├─ Queries Hive for experiment events
   ├─ Computes metrics (clickthrough rate, retention, etc.)
   ├─ Runs statistical analysis (Bayesian + frequentist)
   ├─ For everyone experiments: splits by logged-in status
   └─ Updates every hour during experiment
   
8. Superset
   └─ Displays results to experiment owners
      ├─ Metric values per group
      ├─ Statistical significance
      ├─ Confidence intervals
      └─ Sample sizes
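
Steps 3 and 5 of the data flow can be sketched together: the SDK emits an event whose subject ID is the placeholder "awaiting" for everyone experiments, and the intake side later fills in the real ID. The function names and the nested dictionary layout are hypothetical and chosen only to mirror the field names listed in step 3. The sketch covers enrolled users only.

```python
from typing import Any, Dict, Optional

AWAITING = "awaiting"  # placeholder value named in step 3


def build_event(name: str, assigned: str, action: str,
                subject_id: Optional[str] = None,
                interaction: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble an event carrying the experiment metadata from step 3."""
    return {
        "experiment": {
            "name": name,
            "enrolled": True,
            "assigned": assigned,
            # Everyone experiments do not know the subject ID client-side.
            "subject_id": subject_id if subject_id is not None else AWAITING,
        },
        "action": action,
        "interaction": dict(interaction or {}),
    }


def fill_subject_id(event: Dict[str, Any], header_subject_id: Optional[str]) -> Dict[str, Any]:
    """Replace the placeholder with the ID supplied by the enrollment authority."""
    experiment = dict(event["experiment"])
    if experiment["subject_id"] == AWAITING and header_subject_id:
        experiment["subject_id"] = header_subject_id
    return {**event, "experiment": experiment}
```

Events from logged-in experiments already carry a subject ID, so `fill_subject_id` leaves them unchanged.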