Test Kitchen/Architecture
This page provides a detailed overview of the Test Kitchen architecture and how its components work together to enable instrumentation and experimentation.
Test Kitchen is a distributed system that enables product teams to instrument features and run A/B tests on Wikipedia and other Wikimedia projects. The architecture supports two types of experiments based on different enrollment mechanisms:
- Everyone experiments: Target all users (logged-in and logged-out) using edge unique cookies for enrollment at the CDN layer
- Logged-in experiments: Target authenticated users using Central Auth IDs for enrollment at the application layer
Components

User-facing components
Test Kitchen UI (formerly known as MPIC or xLab)
Web application where experiment and instrument owners configure their experiments and instruments. The Test Kitchen UI serves configuration to enrollment authorities and analytics systems via a RESTful API.
The Test Kitchen UI provides the following REST API endpoints:
- https://test-kitchen.wikimedia.org/api/v1/instruments (all instruments)
- https://test-kitchen.wikimedia.org/api/v1/experiments (all experiments)
- https://test-kitchen.wikimedia.org/api/v1/experiments?format=config&authority=mediawiki (logged-in experiments)
- https://test-kitchen.wikimedia.org/api/v1/experiments?format=config&authority=varnish (everyone experiments)
- https://test-kitchen.wikimedia.org/api/v1/experiments?format=analytics (all experiments, for analytics purposes)
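The experiments endpoints listed above differ only in their query parameters. As a minimal sketch, the URLs could be assembled as follows; the helper name `experiments_url` and its argument names are hypothetical and are not part of any Test Kitchen client.

```python
from urllib.parse import urlencode

# Base URL taken from the endpoint list above.
BASE_URL = "https://test-kitchen.wikimedia.org/api/v1"


def experiments_url(fmt=None, authority=None):
    """Build an experiments endpoint URL (hypothetical helper).

    fmt:       value of the `format` query parameter, e.g. "config" or "analytics"
    authority: value of the `authority` query parameter, e.g. "mediawiki" or "varnish"
    """
    params = {}
    if fmt is not None:
        params["format"] = fmt
    if authority is not None:
        params["authority"] = authority
    query = urlencode(params)
    return f"{BASE_URL}/experiments" + (f"?{query}" if query else "")


print(experiments_url())
print(experiments_url("config", "mediawiki"))
print(experiments_url("config", "varnish"))
print(experiments_url("analytics"))
```

Each enrollment authority requests only the configuration addressed to it, so a consumer needs to know just its own `authority` value.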
SDK (Client Libraries)
Standardized toolkits for creating and maintaining analytics instrumentation, available for:
- JavaScript: For client-side instrumentation in web browsers
- PHP: For server-side instrumentation in MediaWiki
- Kotlin: For Android app instrumentation
- Swift: For iOS app instrumentation (planned)
The SDKs provide methods for:
- Submitting events to event streams
- Checking experiment enrollment and assignment
- Creating instruments and managing schemas
- Interacting with EventGate via EventLogging/EventBus
Superset
Analysis platform where experiment results from automated analytics are visualized. Results are available at https://superset.wikimedia.org/superset/dashboard/experiment-analytics/ and update hourly during active experiments.
Data Pipeline Components
EventGate
The event intake service that receives, validates, and routes analytics events from instrumentation to the data lake. EventGate:
- Validates events against their declared schemas
- Enriches events with HTTP header data (when configured)
- Routes events to appropriate Kafka topics
- Ensures events reach Hive for storage
MediaWiki Extensions
Extension:TestKitchen
The core extension that integrates Test Kitchen with MediaWiki. (Formerly named Extension:MetricsPlatform.)
Responsibilities:
- Fetches instrument and experiment configuration from Test Kitchen API
- Acts as the enrollment authority for logged-in experiments
- Provides PHP SDK methods for server-side instrumentation
- Manages caching of Test Kitchen configuration in the WAN Cache
Caching strategy:
- Prioritizes content delivery speed over configuration freshness
- Asynchronously fetches configuration from Test Kitchen API every minute
- Returns an empty response if no cached response is available. This disables all Test Kitchen-managed instrumentation, so instrumentation issues never degrade site performance
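The caching strategy above can be sketched as a cache whose read path never waits on the network. This is an illustration only, not the extension's PHP implementation: the class `ConfigCache` and its methods are hypothetical, and keeping the stale value when a refresh fails is an assumption that the text does not state.

```python
class ConfigCache:
    """Serve the last good configuration; never block a request on a fetch."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable returning the configuration; may raise
        self._cached = None    # last successfully fetched configuration

    def get(self):
        # Request path: return whatever is cached, or an empty configuration.
        # An empty configuration means no instruments or experiments are active.
        return self._cached if self._cached is not None else []

    def refresh(self):
        # Background path: intended to run on a schedule (every minute in the text).
        # Assumption: on failure the stale value is kept rather than cleared.
        try:
            self._cached = self._fetch()
        except Exception:
            pass


cache = ConfigCache(fetch=lambda: [{"name": "larger-default-font-size"}])
print(cache.get())  # [] before the first refresh
cache.refresh()
print(cache.get())
```

Separating `get` from `refresh` is what keeps page delivery independent of the Test Kitchen API: a slow or failing fetch only delays configuration freshness.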
The TestKitchen extension fetches configuration from:
- https://test-kitchen.wikimedia.org/api/v1/instruments
- https://test-kitchen.wikimedia.org/api/v1/experiments?format=config&authority=mediawiki
Extension:EventLogging
Handles submission of analytics events to EventGate.
Function:
- For JavaScript instrumentation: submits events directly to EventGate
- For PHP instrumentation: defers to EventBus for submission
- Uses EventStreamConfigs to determine event stream details
Extension:EventBus
Handles server-side event submission for PHP instrumentation.
Function:
- Receives events from EventLogging (when instrumentation is in PHP)
- Submits events to EventGate
- Works in conjunction with EventStreamConfigs
Extension:EventStreamConfigs
Provides event stream configuration details to other extensions.
Function:
- Stores and serves event stream configurations
- Consumed by EventLogging and EventBus to determine where events should be sent
- Configurations can be dynamically created by TestKitchen extension for Test Kitchen instruments
Extension:WikimediaEvents
Common location for analytics instrumentation code.
Usage:
- Many product teams write their experiment instrumentation here
- Provides shared ResourceLoader modules for analytics
- Alternative: teams can instrument in their own product codebases
GrowthBook JS SDK Analysis
The GrowthBook JavaScript SDK is of particular interest because it covers several functions that are relevant to Test Kitchen. A collaborative analysis of its code is kept at: Test Kitchen/Architecture/GrowthBook_JS_SDK_Analysis
Flows
Configuration Flow
How experiment configuration travels from Test Kitchen UI to enrollment authorities:
1. Experiment Owner
↓ Creates/configures experiment in UI
2. Test Kitchen UI
↓ Stores configuration in database
↓ Exposes via REST API
3. Enrollment Authorities fetch config:
Varnish Path:
├─ Polls API every ~1 minute
├─ Fetches: /api/v1/experiments?format=config&authority=varnish
├─ Caches config locally on each cache node
└─ Requires 24 hours' advance notice for propagation
MediaWiki Path:
├─ TestKitchen polls API every ~1 minute
├─ Fetches: /api/v1/experiments?format=config&authority=mediawiki
├─ Caches in WAN cache
└─ Config updates propagate within minutes
4. Ready for Enrollment
└─ Authorities can now enroll users and assign groups
Enrollment Flow
How users get enrolled and assigned to experiment groups:
1. User Visits Wiki
↓
2. Enrollment Authority Checks User
Everyone Experiment (Varnish):
├─ Checks for wmf-uniq cookie (Edge Unique ID)
├─ If no cookie, creates one
├─ Hashes cookie + experiment name → subject ID
├─ Checks if subject ID falls into a bucket → enrolled
├─ If enrolled, uses bucket to determine experiment group
└─ Adds enrollment info to internal HTTP header for downstream services
Logged-in Experiment (MediaWiki):
├─ Checks Central Auth ID
├─ Hashes ID + experiment ID → subject ID
├─ Checks if subject ID falls below a threshold → enrolled
└─ If enrolled, uses subject ID to determine experiment group
3. TestKitchen extension combines enrollment info and makes it available to feature code
4. Feature Code Executes
├─ Checks: experiment.isAssignedGroup('treatment_name')
├─ Executes variant logic based on assignment
└─ User sees appropriate variant
5. Enrollment Persists
Everyone Experiment:
└─ Enrollment tied to cookie lifetime
└─ User may switch groups if cookie cleared
Logged-in Experiment:
└─ Enrollment tied to Central Auth ID
└─ Consistent across all wikis and sessions
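The hash-based enrollment described in step 2 can be illustrated with a short sketch. The details here are assumptions made for illustration: the source does not say which hash function, which bucket layout, or which threshold arithmetic the enrollment authorities use, and the function names `subject_id` and `assign` are hypothetical.

```python
import hashlib


def subject_id(identifier: str, experiment: str) -> str:
    """Derive a per-experiment subject ID (SHA-256 is an assumption)."""
    return hashlib.sha256(f"{identifier}:{experiment}".encode("utf-8")).hexdigest()


def assign(identifier, experiment, groups, sample_rate):
    """Return the assigned group, or None if the subject is not enrolled."""
    sid = subject_id(identifier, experiment)
    # Map the first 32 bits of the hash to a position in [0, 1).
    position = int(sid[:8], 16) / 0x100000000
    if position >= sample_rate:
        return None  # position is at or above the threshold: not enrolled
    # Spread enrolled subjects evenly across the groups.
    index = int(position / sample_rate * len(groups))
    return groups[min(index, len(groups) - 1)]


group = assign("12345", "larger-default-font-size", ["control", "x-large"], 0.5)
print(group)
```

Because the result depends only on the identifier and the experiment, the same user lands in the same group on every request without any stored state, and different experiments bucket the same user independently.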
Data Flow
How analytics events travel from user interactions to analysis dashboards:
1. User Interacts with Feature
↓ Triggers instrumented action
2. SDK Captures Event
Client-side (JavaScript):
├─ experiment.send('action', {data})
└─ Creates event with experiment metadata
Server-side (PHP):
├─ $experiment->send('action', [data])
└─ Creates event with experiment metadata
3. Event Contains:
├─ experiment.name (e.g., "larger-default-font-size")
├─ experiment.enrolled (true/false)
├─ experiment.assigned (e.g., "x-large" or "control")
├─ experiment.subject_id
| ├─ Everyone Experiments: "awaiting"
| └─ Logged-in Experiments: subject ID
├─ action (e.g., "page-visited")
├─ contextual attributes (performer info, wiki, page, etc.)
└─ interaction data (custom fields)
4. EventLogging/EventBus
↓ Submits event to EventGate
5. EventGate
├─ Validates against schema
├─ Enriches with headers (if configured)
| └─ Everyone Experiments: Replaces "awaiting" with subject ID from enrollment authority
└─ Routes to Kafka topic
6. Hive
├─ Events land in tables (e.g., product_metrics.web_base)
└─ Available for querying within ~2.5 hours
7. Automated Analysis
├─ Queries Hive for experiment events
├─ Computes metrics (clickthrough rate, retention, etc.)
├─ Runs statistical analysis (Bayesian + frequentist)
├─ For everyone experiments: splits by logged-in status
└─ Updates every hour during experiment
8. Superset
└─ Displays results to experiment owners
├─ Metric values per group
├─ Statistical significance
├─ Confidence intervals
└─ Sample sizes
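Steps 3 and 5 of the data flow can be sketched as plain data transformations. The field names under `experiment` follow the list in step 3; everything else is an assumption for illustration, including the function names `build_event` and `enrich`, the `interaction_data` key, and the way the subject ID from the enrollment authority reaches the enrichment step.

```python
AWAITING = "awaiting"  # placeholder subject ID used by everyone experiments


def build_event(action, experiment_name, assigned, subject_id=AWAITING, **interaction_data):
    """Assemble an event as the SDK might before submission (hypothetical shape)."""
    return {
        "action": action,
        "experiment": {
            "name": experiment_name,
            "enrolled": True,
            "assigned": assigned,
            "subject_id": subject_id,
        },
        "interaction_data": interaction_data,
    }


def enrich(event, header_subject_id):
    """Replace the placeholder with the subject ID supplied via an HTTP header."""
    enriched = {**event, "experiment": dict(event["experiment"])}
    if enriched["experiment"]["subject_id"] == AWAITING and header_subject_id:
        enriched["experiment"]["subject_id"] = header_subject_id
    return enriched


event = build_event("page-visited", "larger-default-font-size", "x-large")
print(enrich(event, "abc123")["experiment"]["subject_id"])
```

Events from logged-in experiments already carry a real subject ID, so in this sketch the enrichment step leaves them unchanged.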