Metrics Platform/Decision Records/Deprioritize Custom Data


Status: Accepted

Author: Sam Smith

Deciders: Sam Smith, Clare Ming, Marcel Forns, Morten Warncke-Wang

Consulted: Product Analytics

Informed:

Date authored: 2023-08-01

Date decided: 2023-08-01

Technical story: T343311: [EPIC] Create Core Interactions schemas/schema fragments

Keywords: monoschema, multischema, schemas, schema fragments, fragments, core interactions, interactions

Context and Problem Statement

Per T343311: [EPIC] Create Core Interactions schemas/schema fragments:

The goal of eliminating schema creation when building an analytics instrument and the goal of having a data contract that runs from the data producer through to the Data Scientist are irreconcilable. The first attempt at eliminating schema creation was the analytics/mediawiki/metrics_platform schema (AKA the monoschema). The monoschema contains many optional "Core Fields" – properties that are used relatively frequently in analytics instruments – and a custom_data property, which can hold an arbitrary amount of data that doesn't fit into those Core Fields. Using the monoschema in your instrument means that you don't have to create a new schema, but it also means that there is no data contract between your instrument (the data producer) and the Data Scientist.
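The trade-off can be illustrated with a minimal sketch. It is not the Metrics Platform implementation: the Core Field names (action, page_id, session_id), the element_id property, and both validator functions are hypothetical and exist only to show why a free-form custom_data property cannot act as a contract, while a schema with declared properties can.

```python
from typing import Any

# Hypothetical Core Fields; the real monoschema defines its own set.
CORE_FIELDS = {"action", "page_id", "session_id"}


def validate_monoschema_event(event: dict[str, Any]) -> bool:
    """Accept any event made of optional Core Fields plus free-form custom_data."""
    unknown = set(event) - CORE_FIELDS - {"custom_data"}
    # custom_data may hold any keys and values, so nothing inside it is checked.
    return not unknown and isinstance(event.get("custom_data", {}), dict)


def validate_contract_event(event: dict[str, Any], contract: dict[str, type]) -> bool:
    """Accept an event only if it has exactly the declared properties and types."""
    return set(event) == set(contract) and all(
        isinstance(event[name], expected) for name, expected in contract.items()
    )


# Two instruments record the same interaction in incompatible ways.
# Both are valid monoschema events, so the consumer cannot rely on either shape.
event_a = {"action": "click", "custom_data": {"button": "save"}}
event_b = {"action": "click", "custom_data": {"btn_name": 42}}

# A hypothetical interaction schema that declares every property up front.
click_contract = {"action": str, "element_id": str}
good_event = {"action": "click", "element_id": "save"}
bad_event = {"action": "click", "element_id": 42}

monoschema_results = [validate_monoschema_event(e) for e in (event_a, event_b)]
contract_results = [validate_contract_event(e, click_contract) for e in (good_event, bad_event)]
```

In this sketch both monoschema events validate even though their custom_data payloads disagree, whereas the declared schema rejects the event whose element_id has the wrong type. That rejection is the guarantee a data consumer gets from a contract.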

Decision Drivers

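Per the above, the requirement from Data Engineering (represented by Marcel Forns) and Product Analytics (represented by Morten Warncke-Wang) that there be a data contract between the data producer and the data consumer(s).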

Considered Options

  1. Continue using the monoschema and onboard more feature teams to the client libraries
  2. Create a schema and/or a set of schema fragments that represent most interactions, while maintaining the existing monoschema and API in the client libraries
  3. Create a schema and/or a set of schema fragments that represent most interactions and immediately sunset the monoschema

Decision Outcome

We (the deciders listed above) chose to explore option 2.

Positive Consequences

  1. The requirement from Data Engineering and Product Analytics that there is a contract between data producer and data consumer(s) is satisfied
  2. At least 50% of in-production instruments can be covered by a simple schema
  3. We can continue to use those parts of the monoschema that aren't the custom_data property

Negative Consequences

  1. The work that went into implementing the custom_data property is lost
  2. There is a loss of trust between the Metrics Platform team and the teams that it is already working with

Links

-

Additional Comments

-