Metrics Platform/Decision Records/Deprioritize Custom Data
Status: Accepted
Author: Sam Smith
Deciders: Sam Smith, Clare Ming, Marcel Forns, Morten Warncke-Wang
Consulted: Product Analytics
Informed:
Date authored: 2023-08-01
Date decided: 2023-08-01
Technical story: T343311: [EPIC] Create Core Interactions schemas/schema fragments
Keywords: monoschema, multischema, schemas, schema fragments, fragments, core interactions, interactions
Context and Problem Statement
Per T343311: [EPIC] Create Core Interactions schemas/schema fragments:
The goals of eliminating schema creation during the creation of an analytics instrument and having a data contract that runs from a data producer through to a Data Scientist are irreconcilable. The first attempt at eliminating schema creation was to create the analytics/mediawiki/metrics_platform schema (AKA the monoschema). The monoschema contains many optional "Core Fields" (properties that are used relatively frequently in analytics instruments) and a custom_data property, which can contain arbitrary amounts of data that don't fit into those Core Fields. Using the monoschema in your instrument means that you don't have to create a new schema, but it also means that there's no data contract between your instrument (the data producer) and the Data Scientist.
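To make the trade-off concrete, here is a minimal sketch in Python. It is not the Metrics Platform implementation: the field names (action, page_title, action_source) and both validator functions are hypothetical, and the real schemas are not reproduced here. The sketch only illustrates why a free-form custom_data property cannot carry a data contract, while a dedicated schema can.

```python
# Hypothetical stand-in for the monoschema: every Core Field is optional and
# custom_data accepts arbitrary keys. Field names are illustrative only.
MONOSCHEMA_OPTIONAL_FIELDS = {"action": str, "page_title": str, "custom_data": dict}

# Hypothetical dedicated schema: every field is required and there is no
# free-form property.
DEDICATED_REQUIRED_FIELDS = {"action": str, "action_source": str}


def validate_monoschema(event: dict) -> list:
    """Type-check known fields if present; the contents of custom_data are not inspected."""
    problems = []
    for name, value in event.items():
        expected = MONOSCHEMA_OPTIONAL_FIELDS.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    return problems


def validate_dedicated(event: dict) -> list:
    """Require every declared field and reject anything undeclared."""
    problems = []
    for name, expected in DEDICATED_REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(f"wrong type for {name}")
    for name in event:
        if name not in DEDICATED_REQUIRED_FIELDS:
            problems.append(f"unknown field: {name}")
    return problems


# The same typo ("acton_source") in both events.
monoschema_event = {"action": "click", "custom_data": {"acton_source": "sidebar"}}
dedicated_event = {"action": "click", "acton_source": "sidebar"}

print(validate_monoschema(monoschema_event))  # no problems reported
print(validate_dedicated(dedicated_event))    # typo reported twice: missing + unknown
```

In this sketch the misspelled key passes through the monoschema unnoticed because nothing describes what custom_data should contain, so the Data Scientist only discovers the problem downstream. With the dedicated schema the producer is told about it at validation time.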
Decision Drivers
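Per the above, the decision was driven by Data Engineering (represented by Marcel Forns) and Product Analytics (represented by Morten Warncke-Wang), both of which require a data contract between the data producer and the data consumer(s).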
Considered Options
- Continue using the monoschema and onboard more feature teams to the client libraries
- Create a schema and/or a set of schema fragments that represent most interactions, while maintaining the existing monoschema and API in the client libraries
- Create a schema and/or a set of schema fragments that represent most interactions and immediately sunset the monoschema
Decision Outcome
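We (the deciders listed above) chose to explore option 2: create a schema and/or a set of schema fragments that represent most interactions, while maintaining the existing monoschema and API in the client libraries.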
Positive Consequences
- The requirement from Data Engineering and Product Analytics that there is a contract between data producer and data consumer(s) is satisfied
- At least 50% of in-production instruments can be covered by a simple schema
- We can continue to use those parts of the monoschema that aren't the custom_data property
Negative Consequences
- The work that went into implementing the custom_data property is lost
- There is a loss of trust between the Metrics Platform team and the teams that they are already working with
Links
-
Additional Comments
-