Test Kitchen/Automated analysis of experiments/Prepared metrics
In some cases you may not be able to write a query template that produces the necessary columns (variation, subject_id, and outcome), but you are able to produce the quantities required for analysis: a sample_size, sample_mean, and sample_variance for each variation (group in experiment: "control" and "treatment").
The prepared_mean and prepared_proportion metric types tell the analysis system to not perform the calculations it does for the mean and proportion metric types, because those calculations are handled by the query_template.
Example: Fundraising
Scenario: Suppose we have instrumented clicks on the Donate link/button and we want to know if our treatment causes more donations to be made. Not just clicks on the link/button (which indicate intent to donate), but actual donations (end of funnel). In that case, we could instrument donations in such a way where:
- For privacy reasons, we can only pass experiment name and group membership to where the user starts the donation process.
- This is no different than passing a campaign parameter when running A/B tests of fundraising banners.
- When a donation has successfully been made through the payment processor and the donor is taken to the Thank You page, produce an event to the external analytics endpoint.
- Preferably server-side, but there's an argument to be made for client-side for consistency with click instrumentation.
The event (adhering to version 1.4.2 of base web schema) could have data like:
{
"$schema": "/analytics/product_metrics/web/base/1.4.2",
"meta": {
"stream": "product_metrics_web_base"
},
"action": "donate",
"action_subtype": "success",
"instrument_name": "Donation processor",
"experiment": {
"coordinator": "xLab",
"enrolled": "{experiment_machine_name}",
"assigned": "{group_machine_name}
}
}
Notice that this event does not have experiment.subject_id.
Example prepared_proportion query
Suppose that with our experiment we are interested in impact of treatment on the proportion of users who click on the donate link/button (demonstrate intent) and then go on to actually donate (demonstrate committment).
In this case, the samples are actualy subsets of the overall samples, because only a small proportion of the overall subjects in the experiment click on the donate link/button.
We can also assume that any given subject in the experiment is highly unlikely to donate more than once. They might click on the link/button more than once during the experiment, but they will not donate more than once.
Because {table} is specified at experiment level, we need to make sure that click events from client-side instrumentation (on the donate link/button) are flowing into the same stream as donate events from server-side instrumentation (on the donation process).
WITH
clicked_counts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
COUNT(DISTINCT subject_id) AS clicked_count
FROM {table}
WHERE action = 'click'
AND instrument_name = 'ClickThroughRateInstrument'
AND element_friendly_name = 'Donate link'
AND {where_boilerplate}
GROUP BY 1
),
donated_counts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
COUNT(1) AS donated_count
FROM {table}
WHERE action = 'donate'
AND action_subtype = 'success'
AND instrument_name = 'Donation processor'
AND {where_boilerplate}
GROUP BY 1
),
stats AS (
SELECT
cc.variation,
cc.clicked_count AS n,
1.0 * dc.donated_count / cc.clicked_count AS p
FROM clicked_counts cc
LEFT JOIN donated_counts dc ON cc.variation = dc.variation
)
SELECT
variation,
n AS sample_size,
p AS sample_mean,
p * (1-p) AS sample_variance
FROM stats
Example prepared_mean query
Suppose we had donation events like
{
"$schema": "/analytics/product_metrics/web/base/1.4.1",
"meta": {
"stream": "product_metrics_web_base"
},
"action": "donate",
"action_subtype": "amount",
"action_context": "123.45",
"instrument_name": "Donation processor",
"experiment": {
"coordinator": "xLab",
"enrolled": "{experiment_machine_name}",
"assigned": "{group_machine_name}
}
}
and included the donation amount (in USD) in action_context, and our experiment was to see if the treatment we are testing causes donors to donate more money on average. Again, for privacy reasons we would not have subject_id available in this data, so we would have to prepare the statistical quantities inside the query. Again, we also have to assume that each event is coming from a unique subject in the sample – which is fair to assume, since donors are unlikely to donate twice in a short period of time (e.g. duration of an experiment).
WITH donated_amounts AS (
SELECT
IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
CAST(action_context AS DOUBLE) AS donated_amount
FROM {table}
WHERE action = 'donate'
AND action_subtype = 'amount'
AND instrument_name = 'Donation processor'
AND {where_boilerplate}
)
SELECT
variation,
COUNT(1) AS sample_size,
AVG(donated_amount) AS sample_mean,
VAR_SAMP(donated_amount) AS sample_variance
FROM donated_amounts
GROUP BY 1