Test Kitchen/Custom Data Monitor
This page documents the process for measuring metrics related to the adoption of Test Kitchen schemas.
Quickstart
Background
schemas/event/secondary stores the JSON Schema schemas for validating events submitted by analytics instrumentation. It also stores the Metrics Platform per-platform base schemas (hereafter "the MP base schemas") and the Legacy EventLogging base schema.
Experiment Platform wants to track a number of high-level metrics about the schemas used for analytics instrumentation over time. Those high-level metrics are:
- The number of schemas
- The number of schemas that include the MP base schemas
- The average number of so-called "custom data" properties per schema
Here, Experiment Platform defines custom data as any property defined in a schema that is in neither the MP base schemas nor the Legacy EventLogging base schema.
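The three metrics and the custom data definition above can be sketched in Python. This is a minimal illustration, not the script in the custom-data-monitor repository. It assumes each schema is already loaded as a dict, that a schema includes an MP base schema through an `allOf` entry with a `$ref`, and that properties appear either at the top level or inside `allOf` entries. The base property sets, the `$ref` marker, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical property names; the real base schemas define their own sets.
MP_BASE_PROPERTIES = {"$schema", "meta", "dt", "agent", "page"}
LEGACY_BASE_PROPERTIES = {"$schema", "meta", "dt", "event"}

# Assumed marker: a $ref containing this substring counts as an MP base schema.
MP_BASE_REF_MARKER = "/fragment/analytics/product_metrics/"


def includes_mp_base(schema: dict) -> bool:
    """Return True if any allOf entry references an MP base schema."""
    return any(
        MP_BASE_REF_MARKER in entry.get("$ref", "")
        for entry in schema.get("allOf", [])
    )


def custom_data_properties(schema: dict) -> set[str]:
    """Return the property names that are in neither set of base properties."""
    names = set(schema.get("properties", {}))
    for entry in schema.get("allOf", []):
        names |= set(entry.get("properties", {}))
    return names - MP_BASE_PROPERTIES - LEGACY_BASE_PROPERTIES


def summarize(schemas: list[dict]) -> dict:
    """Compute the three high-level metrics for a list of schemas."""
    counts = [len(custom_data_properties(s)) for s in schemas]
    return {
        "schemas": len(schemas),
        "schemas_with_mp_base": sum(includes_mp_base(s) for s in schemas),
        "avg_custom_data": mean(counts) if counts else 0.0,
    }


# Two made-up schemas for illustration only.
example_schemas = [
    {
        "allOf": [
            {"$ref": "/fragment/analytics/product_metrics/web/base/1.0.0#"},
            {"properties": {"action": {}, "dt": {}}},
        ]
    },
    {"properties": {"event": {}, "foo": {}, "bar": {}, "baz": {}}},
]
report = summarize(example_schemas)
```

Subtracting both base property sets from every schema keeps the definition symmetric: a property counts as custom data only if neither base schema defines it, regardless of which base the schema actually includes.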
Experiment Platform runs the Custom Data Monitor script and updates the Custom Data Monitor Report spreadsheet (both linked below) roughly once per quarter. The work is tracked in Phabricator. Because the work is routine, there is a template task that can be duplicated each time the work is due. Clicking the button at the top of the page creates such a duplicate task.
Links
- Custom Data Monitor Report spreadsheet
- https://gitlab.wikimedia.org/repos/data-engineering/custom-data-monitor
- T354965: [Epic] SDS 2.5 Establish baselines for Metrics Platform & Experimentation success indicators
- T356610: [SPIKE] Determine how to capture number of instruments developed total and via Metrics Platform over time
- T356610: Write a script to capture custom data properties counts in secondary schemas