Metrics Platform/Create an instrument

From Wikitech

This guide helps you create a Metrics Platform instrument from start to finish, including:

  1. Creating a measurement plan
  2. Creating an instrumentation specification
  3. Following the data collection guidelines
  4. Configuring the stream
  5. Coding the instrument
  6. Documenting the instrument
  7. Reviewing your data
  8. Decommissioning the instrument

Create a measurement plan

Instruments collect data about user interactions so that we can answer questions about product experiences. Before you can start collecting data, create a measurement plan (template) that documents what data you plan to collect, why, and how you plan to analyze the data.

You can write your measurement plan in a document or on a Phabricator task, depending on the scale of the project. For examples of measurement plans, see the folder on Google Drive.

Create an instrumentation specification

Once you have a measurement plan, the next step is to create an instrumentation specification (template). The instrumentation spec defines all the data you'll collect for your instrument. The spec is also a useful tool for engineers to ensure that all events are being produced and received correctly. For a template and examples of instrumentation specs, see the folder on Google Drive.

Specifying event data

Start by mapping each metric defined in your measurement plan to one or more events. An event is a data object that represents a user interaction happening at a definite time.

For example:

Metric: Proportion of pages that are read, measured by users scrolling at least halfway down the article

Events:

  • User loads page
  • User scrolls down 50%
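
As a rough illustration of how these two events could feed the metric, the sketch below computes the proportion of page loads that were followed by a scroll of at least 50%. The event shape, the "page_load" action name, and the proportionRead helper are hypothetical, and the sketch assumes that events from the same page view share a performer_pageview_id. A real analysis would run against the stored event data rather than an in-memory array.

```typescript
// Hypothetical, simplified event shape for illustration only.
interface SketchEvent {
  action: string;                // "page_load" (assumed name) or "scroll"
  action_context?: string;       // scroll depth as a string, e.g. "0.5"
  performer_pageview_id: string; // assumed to be shared by events from one page view
}

// Proportion of loaded pages where the user scrolled at least halfway down.
function proportionRead(events: SketchEvent[]): number {
  const loaded: Record<string, boolean> = {};
  const read: Record<string, boolean> = {};

  for (const e of events) {
    if (e.action === "page_load") {
      loaded[e.performer_pageview_id] = true;
    } else if (e.action === "scroll" && Number(e.action_context) >= 0.5) {
      read[e.performer_pageview_id] = true;
    }
  }

  const loadedIds = Object.keys(loaded);
  if (loadedIds.length === 0) {
    return 0; // no page loads recorded; report 0 rather than dividing by zero
  }

  // Count only page views that have both a load event and a qualifying scroll.
  const readCount = loadedIds.filter((id) => read[id] === true).length;
  return readCount / loadedIds.length;
}
```

Keying on the page view rather than counting raw scroll events keeps a page that fires several scroll events from being counted more than once.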

Once you have a set of events, define the data that each event should send based on the standard set of properties supported by the Metrics Platform. Read the following sections to learn about what event data is available and how to use it.

Here's an example of an event data specification for a "User scrolls down 50%" event:

Interaction data:

  • action: set to "scroll" in the instrument
  • action_subtype: set to either "up" or "down" in the instrument
  • action_context: fraction of the page at which the user stops scrolling (example: "0.5" for 50%), set in the instrument

Contextual attributes:

  • page_content_language: populated by the Metrics Platform
  • page_title: populated by the Metrics Platform
  • mediawiki_skin: populated by the Metrics Platform
  • performer_pageview_id: populated by the Metrics Platform
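
To make the specification concrete, here is a minimal sketch of the instrument-side data for this event. It builds only the three properties that the spec says are set in the instrument; the remaining properties are left to the Metrics Platform. The ScrollInteractionData type and the buildScrollEvent helper are hypothetical names, not part of the Metrics Platform API; see the API docs for the actual submission call.

```typescript
// Hypothetical type covering only the properties the instrument sets itself.
interface ScrollInteractionData {
  action: "scroll";
  action_subtype: "up" | "down";
  action_context: string; // fraction of the page scrolled, e.g. "0.5" for 50%
}

// Build the instrument-set part of the event from a scroll direction and depth.
function buildScrollEvent(
  direction: "up" | "down",
  fraction: number
): ScrollInteractionData {
  // Reject NaN and out-of-range values before they reach the event data.
  if (!(fraction >= 0 && fraction <= 1)) {
    throw new RangeError("fraction must be between 0 and 1");
  }
  return {
    action: "scroll",
    action_subtype: direction,
    action_context: String(fraction),
  };
}
```

Typing action_subtype as a union of the two allowed values lets the compiler catch a misspelled direction before the instrument ships.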

The action property

Each instrument is required to set the action property when submitting an event. For click interactions, the value of action should be set to "click". For other types of interactions, you can choose a custom value for action, such as "session_init".

Interaction data

In addition to action, Metrics Platform supports interaction data that you can customize to fit the needs of your instrument. You can set these properties to any meaningful value to provide the data required for your event.

Contextual attributes

Contextual attributes are fields in the event data that provide information about the performer who triggered the event and the wiki where the event occurred. The values of contextual attributes are populated automatically by Metrics Platform when the event is generated. While each client includes a few contextual attributes automatically, most attributes must be enabled in the instrument's event stream configuration. For a list of available attributes, see Contextual attributes.
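
The split between data set by the instrument and attributes enabled through stream configuration can be pictured with the following sketch. It is a simplified model, not the Metrics Platform implementation: the StreamConfigSketch shape, its provideValues field, and the applyContextualAttributes helper are all hypothetical, and the sketch ignores the few attributes that each client includes automatically. The real configuration keys are described in the stream configuration guide.

```typescript
// Hypothetical, simplified stand-in for an event stream configuration.
interface StreamConfigSketch {
  // Contextual attributes enabled for this stream (assumed field name).
  provideValues: string[];
}

// Copy only the enabled contextual attributes onto the event data.
function applyContextualAttributes(
  interactionData: Record<string, string>,
  available: Record<string, string>,
  config: StreamConfigSketch
): Record<string, string> {
  // Start from the data the instrument set itself.
  const event: Record<string, string> = { ...interactionData };

  for (const name of config.provideValues) {
    const value = available[name];
    if (value !== undefined) {
      event[name] = value; // enabled in the config and available in this context
    }
  }
  return event;
}
```

In this model, an attribute that is available on the page but missing from the configuration never reaches the event, which mirrors the rule that most attributes must be enabled explicitly.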

Choosing a schema

Every instrument must designate a schema that will be used to validate the event data. Most instruments should use one of the Metrics Platform base schemas.

If your event requires data that is not supported by the base schemas, you can create a custom schema for your instrument.

Follow the data collection guidelines

All data collection activities must follow the data collection guidelines. Once you've identified the applicable risk tier, you can use your measurement plan and instrumentation spec to complete the steps in the guidelines under "What should WMF teams do next?".

Configure the stream

Once you've completed the necessary steps in the data collection guidelines, you're ready to launch your instrument. The first step to launching an instrument is configuring the event stream where the events will be published. See the stream configuration guide.

Code the instrument

You can write your instrument code in the WikimediaEvents extension or in your product codebase. See the API docs to learn how to code an instrument.

You should write instrument code in the WikimediaEvents extension unless your codebase is clearly intended for use only by Wikimedia. This is because MediaWiki and other Wikimedia-maintained codebases can be installed by third-party wiki admins. Those admins shouldn't have to install instruments and supporting code (the EventLogging, EventStreamConfig, and EventBus extensions) in order to get a feature to work.

Document the instrument

Now that your instrument is active, complete these steps to document your instrument:

  • Publish a summary of your instrument on wiki: Create a wiki page that summarizes your measurement plan and the data collected by your instrument. This helps provide transparency to the wider community about the data that WMF collects. You can create this page as a subpage of your codebase or project page. For example, see mw:Extension:WikiLambda/Metrics.
  • Update the instrument list: Add your instrument to the instrument list with links to documentation.
  • Make your instrument documentation discoverable: Your measurement plan and instrumentation spec are important resources for data analysts and code maintainers. To help people find these documents, link to them from your codebase wiki page, README, project page, DataHub page, or other frequently used documents. Duplicated documentation is more likely to be outdated, so always try to maintain a single source of information.

Review your data

The Event Platform has documented the process for viewing and querying events in both the Beta and Production environments. This is currently the best way to check that your instrument is generating events as expected.

Decommission the instrument

While instruments that measure product health may be long lived, instruments that capture metrics related to an experiment should be disabled once the experiment is complete. See Decommission an instrument.