Metrics Platform/How to/Create a Custom Schema

From Wikitech

Metrics Platform Core Interaction Schemas

Metrics Platform maintains a set of core interaction schemas for each platform. These schemas provide the basis for capturing data from common event types (e.g. click events).

The API section of the Implementations page describes the submit methods currently available for use in client code. These base API methods will grow over time as common event types are added and incorporated into the client libraries. Using these common event type submit methods does not require the creation of custom schemas.

Custom Data

If a prospective instrument requires custom data properties that cannot be captured with the current base API methods and their corresponding Metrics Platform-owned schemas, a custom schema can be created. The custom schema references the Metrics Platform base schemas as fragments and declares the custom data items as top-level properties.

This is the preferred way to create custom schemas from Metrics Platform base schemas because common metrics can be included with every event, subject to the production configuration of a given instrument.

Top Level Properties

When a custom data object is passed as a parameter to one of the available submit methods, the client library parses its entries as top-level properties to be submitted with the event. A schema id is a required parameter whenever custom data is passed in from client code.

Formatting Requirements

Custom data in this context must be passed in as key-value pairs. Each key is a string that serves as the name of the custom data property, and each value must be one of the currently allowed types:

  • String
  • Integer
  • Boolean
  • Null

Note that any custom data value that does not conform to one of the allowed types is omitted from the event, and the client library logs an error. If the omitted property is required, the event could be invalidated as a result.
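The formatting rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the code of any Metrics Platform client library: the function names `is_allowed_value` and `merge_custom_data` are hypothetical, and the behavior when a custom key collides with an existing event property is an assumption (here the custom value simply overwrites it).

```python
from typing import Any, Dict, List, Tuple


def is_allowed_value(value: Any) -> bool:
    """Return True for the allowed value types: string, integer, boolean, null."""
    # bool is checked explicitly for readability; in Python it is also an int.
    return value is None or isinstance(value, (str, bool, int))


def merge_custom_data(
    event: Dict[str, Any], custom_data: Dict[Any, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Copy valid custom data into the event as top-level properties.

    Returns the merged event and the names of the properties that were
    dropped. A real client library would log an error for each dropped one.
    """
    merged = dict(event)
    rejected: List[str] = []
    for key, value in custom_data.items():
        if not isinstance(key, str) or not is_allowed_value(value):
            rejected.append(str(key))
            continue
        merged[key] = value  # assumption: custom data wins on key collisions
    return merged, rejected


event, rejected = merge_custom_data(
    {"action": "click"},
    {"is_mobile": True, "ratio": 1.5, "label": None},
)
print(event)
print(rejected)
```

In this sketch the float value is dropped because it is not one of the four allowed types, while the boolean and null values are kept as top-level properties.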

Custom Schema Creation

Event Platform Schema Guidelines explain how to properly include fragments in concrete schemas.

Here is an example of a patch (not yet merged into production) that introduces custom concrete schemas referencing the app base schema: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/974674

Examples of custom schemas using Metrics Platform base schema fragments will be included here once some are in production.
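Until production examples are available, the general shape of such a schema can be sketched as a Python dictionary. Everything concrete here is a placeholder: the schema title, the version numbers, the `$ref` path to the base schema, and the two custom property names are hypothetical, and the helper `build_custom_schema` is not part of any Metrics Platform tooling. The sketch assumes the JSON Schema `allOf` and `$ref` keywords are used to pull in the base schema, and it omits other fields a real schema may need; the Event Platform Schema Guidelines remain the authority on the actual format.

```python
import json
from typing import Any, Dict


def build_custom_schema(
    title: str,
    version: str,
    base_ref: str,
    custom_properties: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble a concrete schema that references a base schema as a fragment."""
    return {
        "title": title,
        "$id": f"/{title}/{version}",
        "type": "object",
        "allOf": [
            {"$ref": base_ref},  # Metrics Platform base schema (placeholder path)
            {"properties": custom_properties},  # custom data as top-level properties
        ],
    }


schema = build_custom_schema(
    title="analytics/product_metrics/example_feature/example_interaction",
    version="1.0.0",
    base_ref="/analytics/product_metrics/app/base/1.0.0#",
    custom_properties={
        "is_logged_in": {"type": "boolean"},
        "dialog_name": {"type": ["string", "null"]},
    },
)
print(json.dumps(schema, indent=2))
```

The custom properties use only the value types listed under Formatting Requirements, so values accepted by the client library can also validate against the schema.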

Stream Configuration

If a custom concrete schema is created with a Metrics Platform base fragment, and custom data is passed into an API submit method along with the custom schema id and stream name, a production stream configuration is required. That configuration specifies the schema title, associates it with the stream name, and identifies the event names and/or event name prefixes that enable an instrument to submit events to its corresponding stream.

The submitInteraction() method, the base API method that all common event type submit methods call, takes the stream name and schema id as required parameters.

See Metrics Platform/Creating a Stream Configuration for more information on how to declare a Metrics Platform stream.
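The relationship between stream name, schema title, and event names described in this section can be illustrated with a small Python sketch. It is a simplified model, not the production configuration format: the stream name, the keys `schema_title` and `events`, and the helpers `schema_title_from_id` and `can_submit` are hypothetical, and treating every configured entry as a prefix is an assumption about how event names are matched.

```python
from typing import Any, Dict

# Hypothetical stream configuration keyed by stream name.
STREAM_CONFIGS: Dict[str, Dict[str, Any]] = {
    "product_metrics.example_feature_interaction": {
        "schema_title": "analytics/product_metrics/example_feature/example_interaction",
        # Full event names and event name prefixes allowed on this stream.
        "events": ["example_feature.click", "example_feature.dialog."],
    }
}


def schema_title_from_id(schema_id: str) -> str:
    """Drop the leading slash and trailing version: '/a/b/1.0.0' becomes 'a/b'."""
    return schema_id.strip("/").rsplit("/", 1)[0]


def can_submit(
    stream_name: str,
    schema_id: str,
    event_name: str,
    configs: Dict[str, Dict[str, Any]] = STREAM_CONFIGS,
) -> bool:
    """Check whether an event may be submitted to the given stream."""
    config = configs.get(stream_name)
    if config is None:
        return False  # no configuration declared for this stream
    if config["schema_title"] != schema_title_from_id(schema_id):
        return False  # the schema is not the one associated with the stream
    # Prefix matching also covers exact event names.
    return any(event_name.startswith(prefix) for prefix in config["events"])


SCHEMA_ID = "/analytics/product_metrics/example_feature/example_interaction/1.0.0"
print(can_submit("product_metrics.example_feature_interaction", SCHEMA_ID, "example_feature.dialog.open"))
print(can_submit("product_metrics.example_feature_interaction", SCHEMA_ID, "other_feature.click"))
```

The first call matches a configured prefix and is accepted; the second names an event the stream does not list, so it is rejected.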

Ownership

While Metrics Platform will own and maintain the base core interactions schemas, product and feature teams who create custom concrete schemas with references to Metrics Platform base schemas or fragments for their instruments are responsible for owning, updating, and maintaining them.

Metrics Platform base core interactions schemas currently reside in https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/product_metrics/ while the Metrics Platform fragments these base schemas reference live in https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/fragment/analytics/product_metrics/

Namespacing

As a convention, wherever a product/feature team keeps its schemas in the secondary repository, it is recommended to create a product_metrics directory to hold the custom schemas that make use of the Metrics Platform base schemas and fragments.

During onboarding to Metrics Platform, a product/feature team will typically port an existing instrument to submit events via Metrics Platform in parallel with how that instrument currently submits events, in order to perform data parity checks. Placing custom schemas in a product_metrics directory alongside the current schema helps organize Metrics Platform-based schemas until complete adoption/migration is undertaken. Note that these custom schemas should be considered owned by the product team.