Test Kitchen/Decision Records/Remove 24h requirement

From Wikitech

Decision Record: Remove 24-Hour Lead Time for Experiment Activation

Date: 27 October 2025

Context

The xLab UI currently requires experiments to be "turned on" 24 hours before their scheduled start date. This has created confusion and friction:

  • Users must understand that "turning on" an experiment does not actually start it: the experiment becomes active only when the start date has been reached AND the experiment has been toggled on
  • Multiple teams have forgotten this step, blocking their experiments from running
  • The terminology is confusing because several terms overlap: "active," "turn on/off," "activate," "started"
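
The two-condition rule described in the first bullet can be sketched as follows. This is an illustration only, not the xLab implementation: the function name `is_running` and its parameters are hypothetical, and the 24-hour lead time itself is deliberately left out so that only the "start date reached AND toggled on" logic is shown.

```python
from datetime import datetime, timezone


def is_running(now: datetime, start: datetime, turned_on: bool) -> bool:
    """Return True only when both conditions hold (hypothetical helper).

    An experiment that is toggled on before its start date is not yet
    running, and one whose start date has passed but was never toggled
    on does not run either.
    """
    return turned_on and now >= start


start = datetime(2025, 10, 27, 14, 30, tzinfo=timezone.utc)
before = datetime(2025, 10, 27, 9, 0, tzinfo=timezone.utc)
after = datetime(2025, 10, 27, 15, 0, tzinfo=timezone.utc)

toggled_early = is_running(before, start, turned_on=True)      # on, but not started
forgotten_toggle = is_running(after, start, turned_on=False)   # started, but not on
both_satisfied = is_running(after, start, turned_on=True)
```

The `forgotten_toggle` case corresponds to the situation in the second bullet, where the start date passes but the experiment never runs.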

The 24-hour buffer was originally implemented as a conservative measure to ensure that Varnish nodes receive the experiment configuration well before the experiment starts.

Technical Background

  • Varnish nodes fetch new configuration every minute and store it locally on disk. Network jitter exists, but 3 minutes is a reasonable propagation estimate based on observations from the A/A tests we completed in FY24/25 SDS2.4 and the first A/B tests run by teams.
  • The 14:30 UTC deployment window already provides built-in lead time, so the 24-hour requirement is significantly more conservative than necessary.

Decision

Remove the 24-hour lead time requirement for experiment activation. Experiments can be turned on and start collecting data on the same day, provided the activation occurs before the 14:30 UTC deployment window.

Implications

User Experience

  • Simplified mental model: experiments can be started when needed (respecting the deployment window)
  • Reduced friction and forgotten activations
  • Better alignment with GrowthBook's approach (which has simple "Start Experiment" functionality)

Operational Constraints

  • Users must still respect the 14:30 UTC deployment window
  • If an experiment is turned on after 14:30 UTC (e.g., 15:05 UTC), it will need to wait until the next day's deployment window
  • The 3-minute propagation time for Varnish nodes remains acceptable
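
The deployment window constraint can be illustrated with a small sketch. The function name `next_deployment_window` is hypothetical, and the code assumes that an activation at exactly 14:30 UTC counts as having missed that day's window (the source only says activation must occur "before" the window). It does not model the Varnish propagation time.

```python
from datetime import datetime, time, timedelta, timezone

# Daily deployment window from this decision record.
DEPLOY_WINDOW_UTC = time(14, 30)


def next_deployment_window(turned_on_at: datetime) -> datetime:
    """Return the deployment window that picks up an activation (hypothetical).

    `turned_on_at` must be timezone-aware; it is converted to UTC first.
    Assumption: an activation at or after 14:30 UTC waits for the next day.
    """
    at_utc = turned_on_at.astimezone(timezone.utc)
    window = datetime.combine(at_utc.date(), DEPLOY_WINDOW_UTC, tzinfo=timezone.utc)
    if at_utc < window:
        return window
    return window + timedelta(days=1)


# Turned on at 09:00 UTC: picked up by the same day's window.
same_day = next_deployment_window(datetime(2025, 10, 27, 9, 0, tzinfo=timezone.utc))

# Turned on at 15:05 UTC (the example above): waits for the next day's window.
next_day = next_deployment_window(datetime(2025, 10, 27, 15, 5, tzinfo=timezone.utc))
```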

Future Considerations

  • Align with GrowthBook terminology when implementing SDS2.3
  • Investigate how GrowthBook models phase changes and whether event tagging adjustments are needed
  • Consider measuring Varnish propagation time directly

TODO

  1. Ticket to implement removal of the 24-hour requirement: T408233
    1. Update documentation to reflect new timing requirements
    2. Clarify that the deployment window constraint remains in effect
  2. Ticket to measure Varnish propagation time: T408236