
Test Kitchen/GrowthBook user guide

From Wikitech
For users with the CustomElevatedAccess role: when viewing an experiment's results, please DO NOT click the Update button if the experiment started more than 90 days ago, because we do not retain raw experiment data for longer than 90 days. Triggering an update after some or all of the underlying data has been deleted will overwrite the saved results with incomplete results (if some data still exists) or erase the saved results entirely.

This page provides guidance and resources for learning how to use Test Kitchen's GrowthBook installation (growthbook.wikimedia.org) for exploring and managing experiments.

What is GrowthBook?

GrowthBook is open-source feature flagging and experimentation software integrated into Test Kitchen (we use it only for experimentation). Currently, our installation only supports analyzing imported experiments that were configured in the Test Kitchen UI, but the Experiment Platform team is working to make GrowthBook the way to configure experiments as well.

Our installation of GrowthBook is entirely on-premises – GrowthBook queries our Data Lake using our Presto cluster and then analyzes the data in a statistics engine running on our servers.

How do I…?

Access GrowthBook

Refer to Test Kitchen/GrowthBook user guide/Access for a comprehensive guide on this topic.

View and understand experiment results

Refer to docs.growthbook.io/using/experimenting#understanding-results for a comprehensive guide on this topic.

Our GrowthBook installation calculates and reports Bayesian results by default and we do not recommend switching to frequentist. You can learn more about our rationale at Test Kitchen/Decision Records/Keep Bayesian engine as default in GrowthBook.
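To give an intuition for what a Bayesian "chance to win" means, here is a minimal Monte Carlo sketch for a conversion-rate metric. It assumes uniform Beta(1, 1) priors and made-up counts. It is a simplified illustration, not the model used by GrowthBook's statistics engine.

```python
import random


def chance_to_win(
    control_successes: int,
    control_trials: int,
    treatment_successes: int,
    treatment_trials: int,
    draws: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P(treatment rate > control rate) with Beta(1, 1) priors.

    Simplified illustration only; GrowthBook's engine uses its own model.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Sample each group's conversion rate from its Beta posterior.
        p_control = rng.betavariate(
            1 + control_successes, 1 + control_trials - control_successes
        )
        p_treatment = rng.betavariate(
            1 + treatment_successes, 1 + treatment_trials - treatment_successes
        )
        if p_treatment > p_control:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 10.0% vs 11.0% conversion with 5,000 subjects each.
    print(f"{chance_to_win(500, 5_000, 550, 5_000):.3f}")
```

The output is a probability between 0 and 1 that the treatment's true rate is higher than control's. This is the kind of statement a Bayesian report makes directly, instead of a p-value.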

Share results

If you use the built-in Share feature, know that only authorized users will be able to view the results even if you select "Anyone with the link" under "View access" because only authorized users can access growthbook.wikimedia.org.

The Experiment Platform team is considering solutions for easily sharing/publishing experiment results. In the meantime, you may use the Export CSV feature to download the experiment results. Please follow the data publication guidelines if you wish to share the results publicly.

Dig deeper with dimensions

GrowthBook lets you break down results by a dimension, which you can select from the Unit Dimension dropdown. We currently support the following dimensions:

  • project_family (e.g. "Wikipedia")
  • language (e.g. "English", "French")
  • wiki_id (e.g. "enwiki", "frwiki")
  • wiki_name (e.g. "English Wikipedia", "French Wikipedia")
  • user_platform ("Desktop" vs "Mobile")
  • user_auth_status_first_exposure: the experiment subject's authentication status ("Permanent user", "Temporary user", and "Logged-out user") when they were first exposed to the experiment, as logged by the experiment
  • user_auth_status_highest_observed: the highest authentication status observed for the experiment subject while they were in the experiment, where Permanent user > Temporary user > Logged-out user
    • Logged-out user: the subject did not save any edits while holding the enrolled wmf-uniq cookie.
    • Temporary user: the subject saved an edit, and a temporary account was automatically created for them.
    • Permanent user: the subject either logged in to an existing account or, as a temporary user, converted to a permanent account by registering.
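The ranking used by user_auth_status_highest_observed can be sketched as a ranked maximum over the statuses seen for one subject. The helper name and the idea of passing in a list of observed statuses are hypothetical. The sketch only illustrates the Permanent user > Temporary user > Logged-out user ordering. It does not show how the dimension is actually computed in the Data Lake.

```python
# Rank of each authentication status; higher means "more authenticated".
AUTH_STATUS_RANK = {
    "Logged-out user": 0,
    "Temporary user": 1,
    "Permanent user": 2,
}


def highest_observed_status(observed: list[str]) -> str:
    """Return the highest-ranked status among those observed for one subject.

    Hypothetical helper for illustration only.
    """
    if not observed:
        raise ValueError("at least one observed status is required")
    return max(observed, key=AUTH_STATUS_RANK.__getitem__)


if __name__ == "__main__":
    # A subject who started logged out and later saved an edit.
    print(highest_observed_status(["Logged-out user", "Temporary user"]))
```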

To request more dimensions, contact the Experiment Platform team's Product Manager.

Compare treatments

For experiments with multiple treatments (e.g. A/B/C tests), results are initially shown relative to control (no treatment). To compare one treatment against another when viewing results of a multi-treatment experiment:

This is especially useful if the tested treatments have treatment-only metrics that are not applicable to the control group, such as when a new feature (rather than a variation on an existing feature) is being introduced.
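Comparing two treatments directly is not the same as subtracting their separate lifts over control. The following is a rough illustration using made-up per-variation means. It covers only the point estimate: a full comparison also needs the uncertainty around each mean, which this sketch ignores.

```python
def relative_change(baseline: float, variant: float) -> float:
    """Relative change of `variant` over `baseline` (0.10 means +10%)."""
    if baseline == 0:
        raise ValueError("baseline mean must be non-zero")
    return variant / baseline - 1


if __name__ == "__main__":
    # Hypothetical per-variation means of a metric.
    control, treatment_b, treatment_c = 0.100, 0.110, 0.121

    lift_b = relative_change(control, treatment_b)      # about +10% vs control
    lift_c = relative_change(control, treatment_c)      # about +21% vs control
    c_vs_b = relative_change(treatment_b, treatment_c)  # about +10% vs B

    # Subtracting the two lifts (about 11 points) overstates C vs B.
    print(f"C vs B: {c_vs_b:+.1%}; lift_c - lift_b: {lift_c - lift_b:+.1%}")
```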

Configure, start, and analyze an experiment

Refer to Test Kitchen/GrowthBook user guide/Configuring experiments for a comprehensive guide on this topic.

Assess experiment's health

A report on the experiment's health will only be available once the experiment has been analyzed using at least one metric. GrowthBook automatically performs a number of data quality checks, which you can learn more about at docs.growthbook.io/using/experimenting#health-page.

Experiments with very large sample sizes (e.g. 10M users) are likely to have a Sample Ratio Mismatch (SRM) detected even when the split appears to be nearly perfect. With so many subjects, even a tiny, practically negligible imbalance drives the p-value of the SRM check so low that the check reports what is in practice a false positive.
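The effect of sample size on the SRM check can be seen with a chi-squared goodness-of-fit test, a common way to test for SRM. GrowthBook's exact implementation may differ. The sketch assumes an intended 50/50 split and applies the same observed 50.1% / 49.9% split at two sample sizes.

```python
import math


def srm_p_value(n_control: int, n_treatment: int) -> float:
    """Chi-squared (1 degree of freedom) SRM test for an intended 50/50 split."""
    expected = (n_control + n_treatment) / 2
    chi2 = sum(
        (observed - expected) ** 2 / expected
        for observed in (n_control, n_treatment)
    )
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # The same 50.1% / 49.9% split at 10 thousand and at 10 million subjects.
    for total in (10_000, 10_000_000):
        n_control = round(total * 0.501)
        n_treatment = total - n_control
        print(f"{total:>10,} subjects: p = {srm_p_value(n_control, n_treatment):.2g}")
```

With a typical SRM threshold such as 0.001, only the larger experiment would be flagged, even though the split is proportionally identical in both cases.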

Stop an experiment

Most experiments should not run longer than 30 days, and Test Kitchen automatically stops an experiment after 90 days. (Data is automatically deleted after 90 days under the data retention guidelines.)
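Both the automatic stop and the data deletion are counted from the experiment's start. A quick date calculation therefore shows where an experiment stands relative to the 30-day recommendation and the 90-day limit. It also tells users with the CustomElevatedAccess role whether clicking Update is still safe. The helper below is a hypothetical sketch that assumes a plain calendar-day count from the start date. It is not part of Test Kitchen, whose exact cut-off may differ.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=90)        # raw data kept for 90 days; auto-stop at 90 days
RECOMMENDED_MAX = timedelta(days=30)  # most experiments should not run longer


def window_status(start: date, today: date) -> str:
    """Classify an experiment's age against the 30- and 90-day limits.

    Hypothetical helper; assumes a simple calendar-day count.
    """
    age = today - start
    if age > RETENTION:
        return "past 90 days: raw data may be gone, do not click Update"
    if age > RECOMMENDED_MAX:
        return "past 30 days: consider stopping the experiment"
    return "within the recommended 30 days"


if __name__ == "__main__":
    start = date(2025, 1, 6)  # hypothetical start date
    print("Automatic stop on or around:", start + RETENTION)
    print(window_status(start, today=date(2025, 3, 1)))
```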

When you are ready to stop your experiment, there are two ways:

  • The Stop Experiment button is the recommended way, because it requires you to pick a conclusion (Did Not Finish, Inconclusive, Lost, Won) and write a short blurb. Remember: there are no losers in A/B testing, only learnings.
  • The ⋮ → Edit status link lets you change the status from Running to Stopped. The experiment will then have the status "Stopped: Awaiting decision", and you can pick a conclusion and write a short blurb later by visiting the experiment page and clicking the pencil icon in the "Experiment stopped" bar.

Know when to stop the experiment

Refer to docs.growthbook.io/using/experimenting#deciding-ab-test-results.

Our installation uses the Experiment Decision Framework feature to recommend a decision once the experiment has collected enough data for the results to be reliable:

  • ship the treatment when it is clear that the treatment won
  • roll back when it is clear that the treatment lost
  • review the results
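The three recommendations can be pictured as a rule applied across the goal metrics once the experiment is sufficiently powered. The sketch below is hypothetical. The 95% threshold, the function name, the metric names, and the use of a single "chance to win" per metric are illustrative assumptions. They are not the criteria GrowthBook's Experiment Decision Framework actually applies.

```python
def recommend(chance_to_win_by_metric: dict[str, float], threshold: float = 0.95) -> str:
    """Map per-metric chance-to-win values to a recommendation (illustrative only)."""
    if not chance_to_win_by_metric:
        raise ValueError("at least one goal metric is required")
    values = chance_to_win_by_metric.values()
    if all(p >= threshold for p in values):
        return "ship"       # every goal metric clearly favors the treatment
    if all(p <= 1 - threshold for p in values):
        return "roll back"  # every goal metric clearly favors control
    return "review"         # mixed or unclear evidence needs a human decision


if __name__ == "__main__":
    # Hypothetical goal metrics and values.
    print(recommend({"edit_rate": 0.98, "retention": 0.97}))
    print(recommend({"edit_rate": 0.98, "retention": 0.40}))
```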

If the experiment has not yet reached its target power, the system estimates how many more days it needs before the results are reliable. The estimate is based on the target minimum detectable effects (MDEs) of the goal metrics, the number of goal metrics (the fewer, the better), and how many new subjects the experiment acquires each day.
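The dependence on MDE, number of goal metrics, and daily enrollment can be illustrated with a textbook two-proportion sample-size formula. This sketch rests on stated assumptions: a binary metric, a 50/50 split, a Bonferroni correction across goal metrics, and 80% power. GrowthBook's own power calculation differs in its details, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def days_remaining(
    baseline_rate: float,
    relative_mde: float,
    n_goal_metrics: int,
    subjects_so_far: int,
    new_subjects_per_day: int,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Rough extra days needed to reach target power for a binary metric (sketch)."""
    if new_subjects_per_day <= 0:
        raise ValueError("new_subjects_per_day must be positive")
    z = NormalDist().inv_cdf
    # Bonferroni: split alpha across goal metrics, so more metrics need more data.
    z_alpha = z(1 - alpha / (2 * n_goal_metrics))
    z_power = z(power)
    delta = baseline_rate * relative_mde            # absolute effect to detect
    variance = baseline_rate * (1 - baseline_rate)  # Bernoulli variance
    per_group = 2 * variance * (z_alpha + z_power) ** 2 / delta**2
    total_needed = 2 * math.ceil(per_group)         # two equal-sized groups
    shortfall = max(0, total_needed - subjects_so_far)
    return math.ceil(shortfall / new_subjects_per_day)


if __name__ == "__main__":
    # Hypothetical: 10% baseline, 5% relative MDE, 50,000 subjects so far, 5,000 new per day.
    for metrics in (1, 3):
        days = days_remaining(0.10, 0.05, metrics, 50_000, 5_000)
        print(f"{metrics} goal metric(s): {days} more days")
```

Adding goal metrics tightens the per-metric significance level, which raises the required sample size. A larger MDE or faster enrollment shortens the wait.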

Define a new metric

Refer to Test Kitchen/GrowthBook user guide/Metrics for a comprehensive guide on this topic.

Organize information and assets

GrowthBook has two systems for organizing information: Projects and Tags. The Experiment Platform team reserves the use of Projects, and there is just one: Wikimedia. All assets (fact tables, metrics, and experiments) must be under the Wikimedia project.

We currently do not have any tagging guidelines – all users of the system are welcome to create and use Tags as they please, bearing in mind that this is a shared system.

If tags get too messy, we will review and revise our approach to tagging.

See also