Experimentation Lab/Automated analysis of experiments/Converting queries for product health monitoring
If you conducted an experiment with Experimentation Lab, analyzed it with the automated analytics system, and decided to ship the tested treatment to all users, you may want to keep measuring the metrics you measured during the experiment.
Currently we do not have an automated analytics system for product health monitoring like we have for experiments. We are exploring options in this problem space as part of the SDS 2 objective in FY25/26.
In the meantime, this guide will help you get started with a custom solution that you will own. This guide assumes that the necessary steps have been followed to convert the instrumentation used in the experiment to baseline instrumentation that is collecting data on an ongoing basis.
Suppose we wanted to have a Superset dashboard for monitoring the metric we care about.
Measurement options
Between the data collected by an instrument and a metric visualized in Superset lies an important step: data processing. You need to decide how to turn interaction data into measurements of a metric. There are two options, each with trade-offs that must be considered.
Option 1: Presto-based dashboard on top of raw interaction data (recommended)
Positives:
- Quick to set up, assuming you know how to query the data lake using Presto SQL.
- Easy to make small adjustments to how the metric is calculated or change/add dimensions.
- No maintenance costs.
Negatives:
- Metrics are calculated on demand whenever the dashboard is accessed. Performance deteriorates as the table grows; if you are collecting a lot of data, you may run into timeouts when you load the dashboard.
- Due to the Foundation's data retention policy, you are limited to 90 days of data. Beyond that you would need to use the sanitization system.
- If queries are updated to get data from both the event and event_sanitized databases, the same performance considerations apply as with a large volume of data (see the sketch after this list).
- There are other considerations, so please refer to the Long term monitoring section below.
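For illustration, here is a minimal sketch of what a query spanning both databases might look like. It assumes the baseline table used later in this guide, a sanitized counterpart with the same schema in event_sanitized, and the standard year/month/day partition columns of Hive event tables; verify all of these for your own stream before relying on them.

-- Sketch: recent raw events plus older sanitized events in one result set.
-- The date boundaries below are illustrative; in practice you would pick
-- the cutoff based on the 90-day retention window.
SELECT meta.dt, action, action_context, performer.name
FROM event.mediawiki_product_metrics_rc_watchlist_baseline
WHERE year = 2025 AND month >= 4  -- partition pruning keeps the scan small
UNION ALL
SELECT meta.dt, action, action_context, performer.name
FROM event_sanitized.mediawiki_product_metrics_rc_watchlist_baseline
WHERE year = 2025 AND month < 4
-- Note: which columns survive in event_sanitized depends on your
-- allowlist entry (see Long term monitoring below).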
Option 2: Airflow-based data pipeline, Superset on top of aggregated data
Positives:
- The dashboard loads quickly because calculations are offloaded to a pipeline rather than done on demand; the dashboard just reports precalculated metrics.
- Easy to backfill.
- Can publish a public version of the dataset to share with volunteers, since the community does not have access to Superset.
Negatives:
- Steep learning curve for getting started with Airflow if you have not used it before.
- Somebody needs to own/maintain the data pipeline.
- Need to determine dimensions ahead of time (a sketch follows this list).
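To make that last point concrete, here is a hypothetical sketch of what a pipeline's output table could look like; the table name and columns are made up for illustration, not an existing dataset.

-- Hypothetical output table for an Airflow pipeline that precalculates
-- a daily CTR. Every dimension you may later want to slice by in Superset
-- (here: wiki) has to be a column decided on before the pipeline runs;
-- adding one afterwards means changing the pipeline and backfilling.
CREATE TABLE my_team.watchlist_ctr_daily (
    dt      DATE    COMMENT 'Day the interactions occurred',
    wiki    VARCHAR COMMENT 'Wiki the interactions occurred on',
    ctr     DOUBLE  COMMENT 'Average of per-user clickthrough rates',
    n_users BIGINT  COMMENT 'Number of users contributing to the average'
)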
Converting a metric from the metrics catalog
Let us pick one of the metrics from the metrics catalog – say, Clickthrough to a Page History page per Watchlist visit – and convert the query for use in our Superset dashboard. This guide assumes you have chosen our recommended option 1 and that the instrumentation is staying the same (not deviating from the instrumentation specification that corresponds to this metric definition).
The logic of the query is:
- For each subject (user), count how many Watchlist visits had edits shown (something to click on) and how many of those visits resulted in at least one click on a Page History link.
- Calculate the CTR for each subject (user).
- Average across all subjects' (users') CTRs in the group to get the average CTR for the group.
Thus, we have a CTR for the control group and the treatment group.
That logic is formalized in the query template for that metric:
WITH per_page_interactions AS (
    SELECT
        -- Collapse all non-control assignments into a single 'treatment' group:
        IF(experiment.assigned = 'control', 'control', 'treatment') AS variation,
        experiment.subject_id,
        performer.pageview_id,
        -- Whether this Watchlist visit showed edits (something to click on):
        SUM(IF(action = 'page-visited', 1, 0)) > 0 AS could_click,
        -- Whether this visit resulted in at least one Page History click:
        CAST(SUM(IF(action = 'click', 1, 0)) > 0 AS INT) AS clicked_through
    FROM {table}
    WHERE action_source = 'Watchlist'
        AND (
            (
                action = 'page-visited'
                AND GET_JSON_OBJECT(action_context, '$.hc') = 'y'
            )
            OR (
                action = 'click'
                AND GET_JSON_OBJECT(action_context, '$.link') = 'page-history'
            )
        )
        AND instrument_name = 'RCClickTracker'
        AND {where_boilerplate}
    GROUP BY 1, 2, 3
)
SELECT
    variation,
    subject_id,
    -- Per-subject CTR: share of clickable Watchlist visits with a clickthrough
    AVG(clicked_through) AS outcome
FROM per_page_interactions
WHERE could_click
GROUP BY 1, 2
This is a metric used in an experiment conducted on logged-in users. We would need to make sure that the baseline instrument collects the performer_name contextual attribute, because we will use it in place of a subject ID (which was derived from the MediaWiki user's central ID during the experiment).
Suppose we wanted to calculate the average CTR for each day. We can modify the query template above to the following query:
WITH per_page_interactions AS (
    SELECT
        CAST(FROM_ISO8601_TIMESTAMP(meta.dt) AS DATE) AS dt,
        -- ^ or something to this effect to convert timestamp to a day
        performer.name AS username,
        performer.pageview_id,
        SUM(IF(action = 'page-visited', 1, 0)) > 0 AS could_click,
        CAST(SUM(IF(action = 'click', 1, 0)) > 0 AS INT) AS clicked_through
    FROM event.mediawiki_product_metrics_rc_watchlist_baseline
    WHERE action_source = 'Watchlist'
        AND (
            (
                action = 'page-visited'
                AND GET_JSON_OBJECT(action_context, '$.hc') = 'y'
            )
            OR (
                action = 'click'
                AND GET_JSON_OBJECT(action_context, '$.link') = 'page-history'
            )
        )
        AND instrument_name = 'RCClickTracker'
    GROUP BY 1, 2, 3
), per_user_ctr AS (
    SELECT
        dt,
        username,
        -- Per-user CTR within each day:
        AVG(clicked_through) AS ctr
    FROM per_page_interactions
    WHERE could_click
    GROUP BY 1, 2
)
SELECT
    dt,
    -- Average across all users' CTRs to get one measurement per day:
    AVG(ctr) AS ctr
FROM per_user_ctr
GROUP BY 1
Notable differences:
- Replaced variation with a date-typed dt column for calculating the average CTR within each day.
- Replaced experiment.subject_id with performer.name.
- Added an aggregation step – the original query is intended to produce a set of per-user outcomes, but here we want to average across all users to produce a single measurement of the CTR for each day.
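Since the dashboard queries raw interaction data, adding a dimension is a small edit to the query. As a sketch, here is the same daily CTR broken down by wiki, assuming the stream populates the meta.domain attribute (an assumption to verify for your stream before relying on it):

-- Sketch: daily CTR per wiki; meta.domain is assumed to be populated.
WITH per_page_interactions AS (
    SELECT
        CAST(FROM_ISO8601_TIMESTAMP(meta.dt) AS DATE) AS dt,
        meta.domain AS wiki,
        performer.name AS username,
        performer.pageview_id,
        SUM(IF(action = 'page-visited', 1, 0)) > 0 AS could_click,
        CAST(SUM(IF(action = 'click', 1, 0)) > 0 AS INT) AS clicked_through
    FROM event.mediawiki_product_metrics_rc_watchlist_baseline
    WHERE action_source = 'Watchlist'
        AND (
            (action = 'page-visited' AND GET_JSON_OBJECT(action_context, '$.hc') = 'y')
            OR (action = 'click' AND GET_JSON_OBJECT(action_context, '$.link') = 'page-history')
        )
        AND instrument_name = 'RCClickTracker'
    GROUP BY 1, 2, 3, 4
), per_user_ctr AS (
    SELECT dt, wiki, username, AVG(clicked_through) AS ctr
    FROM per_page_interactions
    WHERE could_click
    GROUP BY 1, 2, 3
)
SELECT dt, wiki, AVG(ctr) AS ctr
FROM per_user_ctr
GROUP BY 1, 2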
Long term monitoring
As noted above, if you want to see how the metric behaves beyond the most recent 90 days, you will need to either create a data pipeline that produces a dataset of measurements, or sanitize the interaction data and adjust your Superset dashboard's Presto queries to also query the event_sanitized database.
If you think you might utilize the event sanitization system (to retain interaction data in perpetuity), you should configure a separate stream for the data you are collecting. Currently we have a default stream product_metrics.web_base (corresponding table: event.product_metrics_web_base) that is shared by different instruments and experiments. We have no plans to start sanitizing that table and retaining its data beyond 90 days.
By configuring a separate stream, you will be able to create an entry for it in the allowlist and have fine-grained control over which columns to keep, drop, or hash.
Creating a Presto-based dashboard in Superset
Steps for creating a simple chart for use in a dashboard:
- Write your query in SQL Lab. Be sure to use the presto_analytics_hive database, not presto_analytics_iceberg, and pick the event schema.
- Click RUN. Your query needs to successfully produce a set of results.
- Click CREATE CHART. You will be taken to a chart creation page where you can select the type of chart (e.g. a line chart).
- Once you are satisfied with your chart, click SAVE. You will need to name your dataset and your chart. You can at this point add the chart to an existing dashboard or create a new dashboard.
The dataset will be a virtual dataset, created from the query. Whenever the dataset is accessed – for example, when opening the dashboard – Superset will execute the query. Multiple charts can use the same virtual dataset, but Superset will not execute the query once and reuse the results for each chart; instead, it executes the query for every chart. This is where the earlier point about performance, data volume, and query complexity becomes very important.