Metrics Platform/How to/Create First Metrics Platform Instrument

Before starting this guide, make sure you have completed Setup Mediawiki for Metrics Platform. That article shows how to install MediaWiki along with the extensions and configuration needed to work on anything related to this project.

The following are the steps to build an instrument:

Instrument documentation

Document the instrument

We should build in a sustainable manner and key to that is documentation!

Our expectation is that each instrument will be documented via a README.md kept with associated schema.

Multiple instruments can be associated with a single schema. A new instrument can be documented by adding the following to the README.md:

  • [Team] <Instrument Name> <Phabricator task> <link to instrument patch or MR> <link to schema patch or MR> <Notebook if applicable>

Make a plan to decommission the instrument

As important as creating and operating your instrument is ensuring that you have a complete lifecycle plan for your instrument. Instruments should not, by default, be long lived.

Set a date

When creating a new instrument the responsible team and, specifically, the Product Owner must set a start and end date for operation of the instrument.

Schemas

Unlike past iterations, the Metrics Platform (MP) has deliberately made schema creation more expensive. Before creating a new schema fragment, teams need to think through what new data they need and why that information is not already captured by existing schemas. Creating a new schema should be an exception, not an expectation.

We will publish a guide to data collection and schema design. For now, you can consult directly with Data Products through our intake process.

At present, there is no need to remove schemas from the existing system in order to decommission an instrument. If you have a concern about an existing schema please don't hesitate to contact Data Products directly.

Remove Stream Configuration

The most straightforward path, at present, is to remove the stream configuration code that you added under the Creating a Stream Configuration step. This will completely disable the instrument.

In future iterations of the Metrics Platform, we are planning to provide a Configuration and Experimentation Management application that will allow Product Owners to control their experiments without code deploys and this interface will support configuring instrument life spans.

Mapping strategy

Core interaction schemas

Metrics Platform maintains a set of core interaction schemas by platform. These schemas provide the basis for capturing data from common use case events (e.g. click events).

The API section of the Implementations page describes the submit methods currently available for use in client code. These base API methods will grow over time as common event types are added and incorporated into the client libraries. Using these common event type submit methods does not require the creation of custom schemas.

Creating a custom fragment/schema

The goal for the Core Interaction schema is to radically reduce schema creation, in general, and move our data capture toward consistency across teams. As such we want to be deliberate and conscientious when we decide to create a new schema fragment. At present, we are working on a Data Capture Guide to support teams in mapping their existing data fields to the Core Interaction schema. In the interim, please reach out to the Data Products team directly for support in deciding if a new schema fragment is needed.

If you are already certain that you will need a new schema fragment, please follow our guide for Creating a Custom Schema.

API

A Metrics Platform instrument is created by calling one of the submit API methods provided by the MP library. These methods can take the following arguments:

  • streamName, a string, is required.
  • schemaId, a string, is optional/required depending on which submit method is used.
  • action, a string, is optional/required depending on which submit method is used.
  • InteractionData, a group of key/value pairs, is optional. If it's present, each key must be in snake case, using only lowercase letters, digits, and '_', and starting with a lowercase letter.

The available keys that can be used for InteractionData:

  • action_subtype
  • action_source
  • action_context
  • element_id
  • element_friendly_name
  • funnel_name
  • funnel_entry_token
  • funnel_event_sequence_position

Each value must be a string, with the exception of funnel_event_sequence_position, which is required to be an integer. This constraint is codified in the definition of common product metrics in the product_event YAML file. If additional types are supported in future, those types will be added there.
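The rules above can be checked before an event is submitted. The following is a minimal sketch, assuming a hypothetical helper named findInteractionDataProblems; it is not part of the MP library, and the client libraries may enforce these rules differently.

```javascript
// Keys listed in this guide as available for InteractionData.
const ALLOWED_KEYS = new Set( [
	'action_subtype', 'action_source', 'action_context',
	'element_id', 'element_friendly_name',
	'funnel_name', 'funnel_entry_token', 'funnel_event_sequence_position'
] );

// Snake case: starts with a lowercase letter, followed by lowercase
// letters, digits, or '_'.
const SNAKE_CASE = /^[a-z][a-z0-9_]*$/;

// Hypothetical helper (an assumption for illustration): returns a list of
// problems found in interactionData, or an empty list if none are found.
function findInteractionDataProblems( interactionData ) {
	const problems = [];
	for ( const [ key, value ] of Object.entries( interactionData ) ) {
		if ( !SNAKE_CASE.test( key ) ) {
			problems.push( key + ': key is not snake case' );
		} else if ( !ALLOWED_KEYS.has( key ) ) {
			problems.push( key + ': key is not a listed interaction data field' );
		} else if ( key === 'funnel_event_sequence_position' ) {
			if ( !Number.isInteger( value ) ) {
				problems.push( key + ': value must be an integer' );
			}
		} else if ( typeof value !== 'string' ) {
			problems.push( key + ': value must be a string' );
		}
	}
	return problems;
}
```

Calling the helper with { action_source: 'x', funnel_event_sequence_position: 2 } returns an empty list, while a key such as ActionSource or a numeric element_id each produce one problem entry.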

Code the instrument

Javascript

Requirements

First, make sure you have a MediaWiki instance running as a Docker container.

In addition, take a look at Metrics Platform/How to/Getting Started#Javascript to be sure that your working environment is configured with everything you need to work on a JavaScript instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language has recommendations on the best tools for this work.

Create the instrument

To create an instrument in JavaScript, you need to add code that calls one of the submit methods such as mw.eventLog.submitInteraction. The following code will fire events whenever an interwiki link is hovered over.

// 'a.extiw' will match anchors that have an extiw class. extiw is used for interwiki links.
$( '#content' ).on( 'mouseover', 'a.extiw', function ( jqEvent ) {
	var link = jqEvent.target;
	var linkHoverInteractionData = {
		action_source: link.href,
		action_context: link.title,
	};

	var action = 'extiw.hover';
	var streamName = 'mediawiki.stream_name';
	var schemaId = '/analytics/product_metrics/web/base/1.1.0';
	mw.eventLog.submitInteraction( streamName, schemaId, action, linkHoverInteractionData );
} );

The same example using the submitClick method:

	...
	var streamName = 'my.stream';
	mw.eventLog.submitClick( streamName, linkHoverInteractionData );
	...

Now when you hover over a link, Metrics Client method mw.eventLog.submitInteraction or mw.eventLog.submitClick will be called. The Metrics Client will then construct an event and send it to the specified stream. The event will include the two interaction data elements you have provided in linkHoverInteractionData, as well as several contextual attributes that are specified with the stream's declaration (below).

Note that the first part of the event name, extiw, need not correspond to the anchor class. We've chosen to reuse that class name because of its brevity and its familiarity to the developers of this code.

In summary:

  • mw.eventLog.submitInteraction needs: (1) the stream name, (2) the schema ID, (3) the action (event name), and (4) the interaction data for the event.
  • mw.eventLog.submitClick needs only: (1) the stream name and (2) the interaction data for the event.
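To exercise instrument code without a running MediaWiki, the two call shapes summarized above can be tried against a stand-in object. This is a minimal sketch: fakeEventLog and logLinkHover are hypothetical names introduced for illustration, and the stand-in only records calls instead of building and sending events as mw.eventLog does.

```javascript
// Stand-in for mw.eventLog (an assumption for illustration only). It
// records each call so that the argument order can be inspected.
const recordedCalls = [];
const fakeEventLog = {
	submitInteraction( streamName, schemaId, action, interactionData ) {
		recordedCalls.push( {
			method: 'submitInteraction', streamName, schemaId, action, interactionData
		} );
	},
	submitClick( streamName, interactionData ) {
		recordedCalls.push( { method: 'submitClick', streamName, interactionData } );
	}
};

// Hypothetical instrument function. It receives the event log as a
// parameter so that the stand-in can be swapped for mw.eventLog.
function logLinkHover( eventLog, link ) {
	eventLog.submitInteraction(
		'mediawiki.stream_name',
		'/analytics/product_metrics/web/base/1.1.0',
		'extiw.hover',
		{ action_source: link.href, action_context: link.title }
	);
}

logLinkHover( fakeEventLog, { href: 'https://example.org/wiki/Foo', title: 'Foo' } );
```

Passing the event log in as a parameter is one possible design choice; it keeps the instrument function independent of the mw global so it can be unit tested.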

PHP

Requirements

First, make sure you have a MediaWiki instance running as a Docker container.

In addition, take a look at Metrics Platform/How to/Getting Started#PHP to be sure that your working environment is configured with everything you need to work on a PHP instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language has recommendations on the best tools for this work.

Create the instrument

The Metrics Platform PHP interface is essentially the same as the JavaScript one. To send a server-side event, you can use, for example, the EventLogging::getMetricsPlatformClient()->submitClick() function.

$interactionData = [
	'action_source' => 'value for action source',
	// ... Other interaction data fields
];

$streamName = 'my.stream';

EventLogging::getMetricsPlatformClient()->submitClick( $streamName, $interactionData );

Java

Requirements

First, take a look at Metrics Platform/How to/Getting Started#Java to be sure that your working environment is configured with everything you need to work on a Java instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language has recommendations on the best tools for this work.

Create the instrument

Swift

Requirements

First, take a look at Metrics Platform/How to/Getting Started#Swift to be sure that your working environment is configured with everything you need to work on a Swift instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language has recommendations on the best tools for this work.

Create the instrument

TBD

Configure the stream

Once you have built your instrument, you will next need to create and set up your stream configuration. We are currently working on a Configuration UI that will remove a lot of the manual work required; you can track our progress via our workboards.

For today, the next step is to follow the Creating a Stream Configuration guide.

Validate the events

During active development, it is important to be able to test that events can be validated by EventGate against a specified JSONSchema. Each of the client libraries contains unit and integration tests for ensuring that events are properly formatted and serialized for submission. However, testing end-to-end is a valuable exercise that builds confidence that events will be produced to Kafka and that the resulting data can be consumed.

The basic process for testing end-to-end requires having a local EventGate instance. See Metrics_Platform#Quick_Start for development environment recommendations.

Javascript

Because the JavaScript client library is provided through the MediaWiki EventLogging extension, an instance of MediaWiki running in a local development environment is required. Once a change set is submitted as a patch to the EventLogging extension (presumably after corresponding changes have been merged in the JavaScript client library), follow these steps:

  • Declare and register a test stream in the local configuration (in a local development environment, normally LocalSettings.php):

$wgEventStreams = [
  'test.metrics_platform.interactions' => [
    'schema_title' => '/analytics/product_metrics/web/base',
    'destination_event_service' => 'eventgate-analytics-external',
    'producers' => [
      'metrics_platform_client' => [
        'provide_values' => [
          'performer_is_logged_in',
          'mediawiki_skin',
        ],
      ],
    ],
  ],
];

// Register the test stream by name. Alternatively, when
// $wgEventLoggingStreamNames is false (not falsy), the EventLogging
// JavaScript client will treat all streams as if they are configured and
// registered.
$wgEventLoggingStreamNames = [ 'test.metrics_platform.interactions' ];

  • Submit a test event from JavaScript (for example, from the browser's developer console):

mw.eventLog.submitInteraction( 'test.metrics_platform.interactions', '/analytics/product_metrics/web/base/1.0.0', 'init' );

  • Observe that an event has been sent to, validated and accepted by the local EventGate instance.

Source: https://phabricator.wikimedia.org/T351293#9337205
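In the test call above, the schema ID is the stream's schema title followed by a version number. A quick sanity check of that relationship can catch a mismatched stream and schema before the event reaches EventGate. This is a minimal sketch, assuming a hypothetical helper named schemaIdMatchesTitle and a three-part version number; it is not part of the client library and does not replace validation by EventGate.

```javascript
// Hypothetical helper (an assumption for illustration): returns true when
// schemaId has the form '/<schema title>/<major>.<minor>.<patch>'.
// A leading '/' on the schema title is ignored, because both spellings
// appear in this guide.
function schemaIdMatchesTitle( schemaId, schemaTitle ) {
	const title = schemaTitle.replace( /^\/+/, '' );
	const match = /^\/(.+)\/(\d+\.\d+\.\d+)$/.exec( schemaId );
	return match !== null && match[ 1 ] === title;
}

const testSchemaId = '/analytics/product_metrics/web/base/1.0.0';
const matchesWebBase = schemaIdMatchesTitle( testSchemaId, 'analytics/product_metrics/web/base' );
```

A schema ID for a different schema, or one with no version suffix, makes the helper return false.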

PHP

The exact same process as for Javascript can be used.

Java

While the Java client library includes end-to-end integration testing from reading a local stream config data fixture to building a metrics client to submitting events and then stubbing expected responses, it does not ultimately provide a true end-to-end testing experience wherein production config is read and the metrics client submits an event to EventGate whereupon it is validated and accepted.

There are a few ways to confirm the validation of events. One method can be done with a local EventGate instance. Another can be accomplished by utilizing the EventStreams Beta API.

Validate using local EventGate

To emulate a true end-to-end testing scenario locally, we can create a test that instantiates a metrics client, which fetches production stream configs (from the Meta API), submits an event using a production stream associated with a production schema, and has the event validated by a local EventGate instance.

By using production config, we can submit an event using an example stream:

@Test
void submitEventTimerStreamConfig() throws IOException, InterruptedException {
    // Create the metrics client and wait until it is fully initialized.
    MetricsClient testMetricsClient = MetricsClient.builder(testClientData).build();
    await().atMost(10, SECONDS).until(testMetricsClient::isFullyInitialized);

    // Submit a test interaction event to a production stream.
    testMetricsClient.submitInteraction(
            "android.metrics_platform.find_in_page_interaction",
            DataFixtures.getTestClientData(getExpectedEventClick()),
            DataFixtures.getTestInteractionData("TestClick")
    );

    // Give the client time to process the queue, then wait until it is empty.
    Thread.sleep(10_000);
    await().atMost(30, SECONDS).until(() -> testMetricsClient.getEventQueue().size() == 0);
}

In the example above, the MetricsClient::submitInteraction() method takes the sample production stream name as a parameter along with some test ClientData and test InteractionData. The default schema id is set as the app base schema inside the library.

Once the metrics client has fetched the production stream configs, it batch processes the event queue to send serialized events to the DestinationEventService which in this case should be set as a local EventGate instance.

Because the sample production stream android.metrics_platform.find_in_page_interaction requests only a few values in its producer config, we can prevent ContextController::enrichEvent() (which enriches the event with configuration-specified contextual values) from filtering the ClientData, and simply have all available contextual values sent with the event, by commenting out the lines in the method that reset the ClientData.

The test above waits until the event queue is empty. The following method supports that check; it is not in the latest release, so it needs to be added to the MetricsClient class locally:

public BlockingQueue<EventProcessed> getEventQueue() {
    return this.eventQueue;
}

Make sure local eventlogging is running and then run the test. Observe that the test event is sent to and validated/accepted by the local EventGate instance.

Source: https://phabricator.wikimedia.org/T351292#9347381

Troubleshooting

If one is using Docker for the local EventGate instance and has it running on port 8192, note that WireMock also runs on this port so lines related to WireMock need to be commented out as well. Otherwise try running a local EventGate on a different port.

Validate using EventStreams Beta API

To validate events using the EventStreams API (Beta Cluster), client code needs to include the Metrics Platform Java library, which should be installed locally as well. For example, using the Wikipedia Android app repository as the client code base, one can trigger the sending of events using their dev debug build (see Android's getting started documentation for local development).

The basic steps include:

  • Installing a local version of the Metrics Platform Java library with the EventStreams API Beta url.
  • Rebuilding the Android installation using the local MP Java installed version.
  • Setting up the Android emulator or hardware device for debugging.
  • Triggering events with known streams in production.
  • Observing the triggered event in the EventStreams UI.

Install a local MP Java library version

In order to see events streaming in real time, the MP Java client library needs to send events to the EventStreams API Beta intake service which is:

https://intake-analytics-beta.wmflabs.org

Because DestinationEventService::ANALYTICS in the Java library is set with the production url, this value needs to be replaced in the library (and associated tests in order to compile/build) by the beta url. Replace https://intake-analytics.wikimedia.org in DestinationEventService::ANALYTICS with https://intake-analytics-beta.wmflabs.org and update the assertions in DestinationEventServiceTest::testDestinationEventServiceAnalytics() and DestinationEventServiceTest::testDestinationEventServiceAnalyticsDev() with the base beta url.
Note that these changes should not be committed.

Once the updates are made in your local MP repo, build a local installation of the library by running the following command from the root MP directory:

./mvnw -f java install

This will build the library locally and install it under the next version number with -SNAPSHOT appended. So if the current version of the Java library is 2.3, the locally installed version will be 2.4-SNAPSHOT.

Rebuild the client code app with the local MP version

Once you have built a local version of your library with the updated DestinationEventService::ANALYTICS enum, go to your local version of the Android app and replace the value of metricsVersion in the build.gradle (Module app) file with the locally installed version (e.g. 2.4-SNAPSHOT). Make sure that the active build variant of the Android app is devDebug.

Clean the project (from the Android Studio toolbar - Build > Clean Project), sync the Gradle files (icon: Sync Project with Gradle Files), then rebuild the project (from the Android Studio toolbar - Build > Rebuild Project).

Set up the Android emulator or hardware device for debugging

Assuming Android Studio is the IDE, one can run installations of the Wikipedia Android app using an emulator or a hardware device. You can verify that your local Gradle build is running the local snapshot version of the MP library dependency by navigating to File > Project Structure, and confirming that the Metrics Platform version matches the local installed version:

org.wikimedia.metrics:metrics-platform:2.4-SNAPSHOT

Reinstall the app on your virtual or hardware device.
For virtual devices, you can run the app on the emulator.
For hardware devices, you can use either USB or Wi-Fi to run the app.

Trigger events

If you know which instruments are currently in production, you can perform the triggering action in the dev build of the Android app. As of this writing, there are 4 article-related instruments in production that collect event data related to clicking on preview links, interacting with the toolbar, interacting with the table of contents, and utilizing the Find in page feature. Performing any of the above actions will trigger the sending of events to the EventStreams API Beta UI.

Observe events

Prior to triggering the event, navigate to https://stream-beta.wmflabs.org/v2/ui/#/ and select the streams associated with the instruments you would like to test. Once you've selected your streams, click the Stream button and trigger the events in the dev build of the app. In the example of the article instruments above, by triggering an event, you should see the corresponding event data in the stream if they pass validation against their respective schema ids.

Troubleshooting

The Android documentation has excellent instructions on debugging.

For hardware devices, it is as simple as unplugging the USB cable of your device to eject it and re-plugging it in to see the device in Android Studio. It is important to reinstall the app on the device by clicking the Run 'app' icon at the top of the Android Studio interface every time you make changes to the library dependency or to the Android app itself.

Deploy the stream

Once your instrumentation code has been tested, reviewed and merged, you are ready for deployment, which involves creating a stream and associating it with the event name(s) used by your instrument(s). To create a new stream, you will make some changes to the mediawiki-config repository to declare and configure the stream, and register it for use by the Metrics Platform.

Event streams are declared using the wgEventStreams and wgEventLoggingStreamNames config variables. Metrics Platform streams have the same form, and appear in the same configuration files, as other Event Platform streams, but there are a couple of requirements on the content to associate them with the Metrics Platform. These requirements are explained in Metrics Platform/Creating a Stream Configuration.

You will edit these variables in wmf-config/ext-EventStreamConfig.php (for production and beta clusters) or wmf-config/InitialiseSettings-labs.php (for beta only). ext-EventLogging.php will be used in both beta and production if no corresponding values are found in InitialiseSettings-labs.php. See Configuration files for additional information about the use of these files.

(Note: in a local development environment, these variables are normally declared in LocalSettings.php.)

Stream Configuration

We'll create a new Metrics Platform event stream called mediawiki.interwiki_link_hover. First, declare your stream in the wgEventStreams config variable, in one of the config files mentioned above.

<?php

// …

$wgEventStreams = [

    // …

    'mediawiki.interwiki_link_hover' => [
        'schema_title' => 'analytics/product_metrics/web/base',
        'destination_event_service' => 'eventgate-analytics-external',
        'producers' => [
            'metrics_platform_client' => [
                // The contextual attributes that should be mixed into the event
                // before it is submitted to this stream.
                'provide_values' => [
                    'page_id',
                    'page_title',
                    'page_revision_id',
                    'mediawiki_is_production',
                ],
            ],
        ],
    ],
];

In this example, showing the declaration of stream mediawiki.interwiki_link_hover, we see that schema_title specifies the Metrics Platform web base schema. We also state that the Metrics Platform client should include the 4 contextual attributes page_id, page_title, page_revision_id, and mediawiki_is_production in the events it produces. See the definition of context_attribute in metrics_platform_client.schema.json for a complete list of supported contextual attributes.
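The effect of provide_values can be pictured as a filter over the contextual attributes the client has available. The following is a minimal sketch, assuming a hypothetical helper named selectContextualAttributes and a flat object of sample attribute values; the client libraries implement this differently (for example, they may nest attributes in the resulting event).

```javascript
// Hypothetical helper (an assumption for illustration): keeps only the
// contextual attributes that the stream's provide_values list requests.
function selectContextualAttributes( available, provideValues ) {
	const selected = {};
	for ( const name of provideValues ) {
		if ( Object.prototype.hasOwnProperty.call( available, name ) ) {
			selected[ name ] = available[ name ];
		}
	}
	return selected;
}

// Sample values, invented for this example.
const availableAttributes = {
	page_id: 42,
	page_title: 'Foo',
	page_revision_id: 1001,
	mediawiki_is_production: false,
	mediawiki_skin: 'vector'
};

// The provide_values list from the stream declaration above.
const provideValues = [ 'page_id', 'page_title', 'page_revision_id', 'mediawiki_is_production' ];

const mixedIn = selectContextualAttributes( availableAttributes, provideValues );
```

Here mediawiki_skin is available but not requested by the stream, so it is left out of mixedIn.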

See Metrics Platform/Creating a Stream Configuration for additional details regarding Metrics Platform stream configuration.

Register your stream for use

Next, list your stream in wgEventLoggingStreamNames so that Metrics Platform (by way of the EventLogging extension) will get the config for your stream and be able to produce these events.

'wgEventLoggingStreamNames' => [
	'default' => [
		// ...
		'mediawiki.interwiki_link_hover',
	],
],
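The registration rule can be summarized as a small check: a stream is usable if it is listed by name, or if the setting is exactly false, in which case (as noted in the local validation steps earlier in this guide) all streams are treated as configured and registered. This is a minimal sketch with a hypothetical function name; the actual EventLogging logic may differ.

```javascript
// Hypothetical helper (an assumption for illustration): mirrors how a
// client could decide whether a stream is registered for use.
function isStreamRegistered( streamNames, streamName ) {
	// Exactly false (not merely falsy): treat every stream as registered.
	if ( streamNames === false ) {
		return true;
	}
	return Array.isArray( streamNames ) && streamNames.includes( streamName );
}

const registeredNames = [ 'mediawiki.interwiki_link_hover' ];
const hoverStreamRegistered = isStreamRegistered( registeredNames, 'mediawiki.interwiki_link_hover' );
```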

If you've made these changes in InitialiseSettings-labs.php, you can find a reviewer to just merge your change and the config will be automatically synced to the beta cluster. If your instrumentation code changes are also merged, you'll then be sending these events in the beta environment.

If you've made these changes in ext-EventLogging.php, you'll need to schedule a Backport window deployment to sync out your config change to the production cluster. See Deployments and Backport windows for instructions.

Review your data

Long term, Metrics Platform will provide a user interface that supports Product Owners in creating, launching and managing instrumentation. This interface will also provide a means to track the status of the associated stream.

In the short term, the Event Platform has documented the process for viewing and querying events in both Beta and Production environments. This is the best current method to check that your instrument is generating events as expected.