Metrics Platform/How to/Create An Instrument

The development of the Metrics Platform, organized in three phases, is ongoing. Some MP components are currently available for use, but be aware that future versions will call for updates to the content on this page, especially with respect to the use of schemas.

This page provides a "how-to guide" for using the Metrics Platform (MP) to create instrumentation in its supported programming languages – JavaScript, PHP, Java, and Swift.

The Metrics Platform is built on the Event Platform. A Metrics Platform client is a specialized type of Event Platform client – designed to require less work in creating instrumentation. From the code developer's perspective, the most important differences are:

  • The Metrics Platform owns and maintains a set of core interactions schemas which are part of an evolving array of common event shapes against which events will be validated. You can submit events to these core interaction schemas as is without having to create custom schemas for a given instrument. The core interactions schemas have attributes for the most common instrument-agnostic data (called contextual attributes or, in some settings, event metadata) that teams might need to answer their questions, e.g. session ID, pageview ID, namespace and title of the current page, etc.
  • If you need custom data that is not currently available in any of the core interactions schemas, you can reference these schemas in your own custom concrete schema and add the custom properties you need. You are, however, responsible for owning and maintaining any custom schemas you create from the Metrics Platform core interactions schemas.
  • Depending on whether you use Metrics Platform core interactions schemas as is or a custom schema derived from them, your code passes event names (with stream names, schema IDs, and custom data if applicable) to the Metrics Platform. See the Metrics Platform API for details.
  • Metrics Platform streams are declared in the same way, and appear in the same configuration files, as other Event Platform streams, but their declarations must meet a couple of requirements before they can be used with the Metrics Platform.

In the following sections, a simple lifecycle example will demonstrate creating an instrument and a new stream to log whenever a user hovers over an interwiki link. Instrument coding (including a bit more about custom data) is illustrated in the Development section, and stream creation (including a bit more about contextual attributes) is illustrated in the Deployment section.

Development

Currently, the JS and PHP Metrics Platform Clients only work within the EventLogging extension, which is part of the Event Platform environment for MediaWiki. Before getting started with Metrics Platform, you need to have an up-to-date MediaWiki development environment. If you do not have one, consider using MediaWiki-Docker.

Development Environment

In MediaWiki Docker

Follow the instructions in this Event Platform configuration recipe for MediaWiki-Docker to set up an Event Platform environment alongside your MediaWiki development environment. Even if you have previously installed this Event Platform environment, make sure you have completed the instructions of the Metrics Platform section on that page. The Legacy EventLogging and Event Platform with Local Schema Repositories sections are not needed.

In MediaWiki Vagrant

Enabling the wikimediaevents role will also include the eventlogging role for you, and set up other Event Platform backend components on MediaWiki Vagrant including EventGate.

$ vagrant roles enable wikimediaevents --provision
$ vagrant git-update

This will clone WikimediaEvents into mediawiki/extensions/WikimediaEvents and the schemas/event/secondary repository at srv/schemas/event/secondary (and also install its npm dependencies for schema materialization and tests).

Events will be written to /vagrant/logs/eventgate-events.json. EventGate logs, including validation errors, are in /vagrant/logs/eventgate-wikimedia.log.

To verify that eventgate is working properly, you can force a test event to be produced by curl-ing http://localhost:8192/v1/_test/events. You should see a test event logged into eventgate-events.json.

In your local dev environment with eventgate-devserver

If you aren't using MediaWiki-Docker or MediaWiki-Vagrant, or you'd rather have more manual control over your development environment, EventLogging comes with an 'eventgate-devserver' that will accept events and write them to a local file. Clone mediawiki/extensions/EventLogging and run:

$ cd extensions/EventLogging/devserver
$ npm install --no-optional
$ npm run eventgate-devserver

This should download EventGate and other dependencies and run the eventgate-devserver accepting events at http://localhost:8192 and writing them to ./events.json. See devserver/README.md for more info.

Writing MediaWiki instrumentation code using the EventLogging extension

A Metrics Platform instrument is created by calling one of the submit API methods provided by the MP library. These methods can take the following arguments:

  • streamName, a string, is required.
  • schemaId, a string, is optional/required depending on which submit method is used.
  • action, a string, is optional/required depending on which submit method is used.
  • InteractionData, a group of key/value pairs, is optional. If it's present, each key must be in snake case, using only lowercase letters, digits, and '_', and starting with a lowercase letter.

The available keys that can be used for InteractionData:

  • action_subtype
  • action_source
  • action_context
  • element_id
  • element_friendly_name
  • funnel_name
  • funnel_entry_token
  • funnel_event_sequence_position

Each value must be a string, with the exception of funnel_event_sequence_position, which is required to be an integer. This constraint is codified in the definition of common product metrics in the product_event YAML file. If additional types are supported in future, those types will be added there.
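
The key and value rules above can be checked before an event is submitted. The following is a minimal sketch, not part of the Metrics Platform API: validateInteractionData is a hypothetical helper, and the key list and type rules are copied from the text above rather than read from the product_event YAML file.

```javascript
// Hypothetical helper, not part of the Metrics Platform API.
// The key list and type rules mirror the ones described on this page.
const ALLOWED_KEYS = [
	'action_subtype',
	'action_source',
	'action_context',
	'element_id',
	'element_friendly_name',
	'funnel_name',
	'funnel_entry_token',
	'funnel_event_sequence_position'
];

// Snake case: a lowercase letter, then lowercase letters, digits, or '_'.
const SNAKE_CASE = /^[a-z][a-z0-9_]*$/;

// Returns a list of human-readable problems; an empty list means the data passed.
function validateInteractionData( interactionData ) {
	const errors = [];
	Object.keys( interactionData ).forEach( function ( key ) {
		const value = interactionData[ key ];
		if ( !SNAKE_CASE.test( key ) ) {
			errors.push( key + ': key is not snake case' );
		} else if ( ALLOWED_KEYS.indexOf( key ) === -1 ) {
			errors.push( key + ': key is not in the list of available keys' );
		} else if ( key === 'funnel_event_sequence_position' ) {
			if ( !Number.isInteger( value ) ) {
				errors.push( key + ': value must be an integer' );
			}
		} else if ( typeof value !== 'string' ) {
			errors.push( key + ': value must be a string' );
		}
	} );
	return errors;
}
```

A check like this is only a development aid. Events are still validated against the schema when they are submitted, so the schema remains the authority on what is accepted.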

Language-specific details of calling submit methods are presented in the following subsections. The JavaScript subsection provides additional explanatory material, so is worth reading even if you are working in a different language. See Metrics_Platform/Implementations#API for additional details.

In JavaScript

To create an instrument in JavaScript, you need to add code that calls one of the submit methods such as mw.eventLog.submitInteraction. The following code will fire events whenever an interwiki link is hovered over.

// 'a.extiw' will match anchors that have an extiw class. extiw is used for interwiki links.
$( '#content' ).on( 'mouseover', 'a.extiw', function ( jqEvent ) {
	var link = jqEvent.target;
	var linkHoverInteractionData = {
		action_source: link.href,
		action_context: link.title
	};

	var action = 'extiw.hover';
	var streamName = 'mediawiki.stream_name';
	var schemaId = '/analytics/product_metrics/web/base/1.1.0';
	mw.eventLog.submitInteraction( streamName, schemaId, action, linkHoverInteractionData );
} );

The same example using the submitClick method:

	...
	var streamName = 'my.stream';
	mw.eventLog.submitClick( streamName, linkHoverInteractionData );
	...

Now when you hover over a link, Metrics Client method mw.eventLog.submitInteraction or mw.eventLog.submitClick will be called. The Metrics Client will then construct an event and send it to the specified stream. The event will include the two interaction data elements you have provided in linkHoverInteractionData, as well as several contextual attributes that are specified with the stream's declaration (below).

Note that the first part of the event name, extiw, need not correspond to the anchor class. We've chosen to reuse that class name because of its brevity and its familiarity to the developers of this code.

In summary:

  • mw.eventLog.submitInteraction needs: (1) the stream name, (2) the schema ID, (3) the action (event name), and (4) the interaction data for the event.
  • mw.eventLog.submitClick needs only: (1) the stream name and (2) the interaction data for the event.
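
To exercise an instrument outside MediaWiki (for example in a unit test), the handler can be written against any object that exposes submitInteraction, with a recording stub standing in for mw.eventLog. This is a sketch under stated assumptions: makeHoverHandler and makeRecordingEventLog are hypothetical names introduced here, and only the argument order of submitInteraction is taken from the summary above.

```javascript
// Hypothetical factory: eventLog is anything exposing submitInteraction(),
// such as mw.eventLog inside MediaWiki or a stub in a unit test.
function makeHoverHandler( eventLog, streamName, schemaId ) {
	return function ( link ) {
		// Argument order: stream name, schema ID, action, interaction data.
		eventLog.submitInteraction( streamName, schemaId, 'extiw.hover', {
			action_source: link.href,
			action_context: link.title
		} );
	};
}

// Hypothetical stub that records calls instead of sending events.
function makeRecordingEventLog() {
	const calls = [];
	return {
		calls: calls,
		submitInteraction: function ( streamName, schemaId, action, interactionData ) {
			calls.push( {
				streamName: streamName,
				schemaId: schemaId,
				action: action,
				interactionData: interactionData
			} );
		}
	};
}

// Usage: hover over a fake link and inspect what would have been submitted.
const stubEventLog = makeRecordingEventLog();
const onHover = makeHoverHandler(
	stubEventLog,
	'mediawiki.interwiki_link_hover',
	'/analytics/product_metrics/web/base/1.1.0'
);
onHover( { href: 'https://example.org/wiki/Foo', title: 'Foo' } );
```

In MediaWiki, the same handler would be created with mw.eventLog as the first argument and called from the mouseover listener with jqEvent.target.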

In PHP

The Metrics Platform PHP interface is essentially the same as the JavaScript one. To send a server-side event, you can use, for example, the EventLogging::getMetricsPlatformClient()->submitClick() function.

$interactionData = [
	'action_source' => 'value for action source'
	// ... Other interaction data fields
];

$streamName = 'my.stream';
EventLogging::getMetricsPlatformClient()->submitClick( $streamName, $interactionData );

Writing MediaWiki instrumentation code for apps

In Java

The Metrics Platform Java client library operates similarly to the other libraries in that different submit methods can be used to submit event data to the Metrics Platform. For example, once a MetricsClient object is instantiated, the methods MetricsClient::submitClick and MetricsClient::submitInteraction can be invoked to submit an event to the Metrics Platform, with zero or more interactionData elements (key / value pairs). The syntax of keys and the types of values are constrained in the same way described for JavaScript above.

// Parameters derived from the client when the MetricsClient object is instantiated,
// with the option to pass in dynamic arguments at the time of event submission.
ClientData clientData = new ClientData(
   agentData,
   pageData,
   mediawikiData,
   performerData
);
// Java string literals use double quotes, and argument lists cannot end with a comma.
InteractionData interactionData = new InteractionData(
   "action_value",
   "action_subtype_value",
   "action_source_value",
   "action_context_value"
);

metricsClient.submitClick(clientData, interactionData);

To use MetricsClient::submitInteraction with a custom schema:

Map<String, Object> customData = new HashMap<String, Object>();
customData.put("font_size", "small");
customData.put("is_full_width", true);
customData.put("screen_size", 1080);

metricsClient.submitInteraction(
   "custom_schema_id",
   "some_prefix.some_event_name",
   clientData,
   interactionData,
   customData
);

See Metrics_Platform/Implementations#Language-specific_Notes_and_Key_Differences for additional details about the Java library.

In Swift

FIXME:

Deployment

Once your instrumentation code has been tested, reviewed and merged, you are ready for deployment, which involves creating a stream and associating it with the event name(s) used by your instrument(s). To create a new stream, you will make some changes to the mediawiki-config repository to declare and configure the stream, and register it for use by the Metrics Platform.

Event streams are declared using the wgEventStreams and wgEventLoggingStreamNames config variables. Metrics Platform streams have the same form, and appear in the same configuration files, as other Event Platform streams, but their content must meet a couple of requirements to associate them with the Metrics Platform. These requirements are explained in Metrics Platform/Creating a Stream Configuration.

You will edit these variables in wmf-config/ext-EventStreamConfig.php (for production and beta clusters) or wmf-config/InitialiseSettings-labs.php (for beta only). ext-EventLogging.php will be used in both beta and production if no corresponding values are found in InitialiseSettings-labs.php. See Configuration files for additional information about the use of these files.

(Note: in a local development environment, these variables are normally declared in LocalSettings.php.)

Stream Configuration

We'll create a new Metrics Platform event stream called mediawiki.interwiki_link_hover. First, declare your stream in the wgEventStreams config variable, in one of the config files mentioned above.

<?php

// …

$wgEventStreams = [

	// …

	'mediawiki.interwiki_link_hover' => [
		'schema_title' => 'analytics/product_metrics/web/base',
		'destination_event_service' => 'eventgate-analytics-external',
		'producers' => [
			'metrics_platform_client' => [
				// The contextual attributes that should be mixed into the event
				// before it is submitted to this stream.
				'provide_values' => [
					'page_id',
					'page_title',
					'page_revision_id',
					'mediawiki_is_production',
				],
			],
		],
	],
];

In this declaration of the mediawiki.interwiki_link_hover stream, schema_title specifies the Metrics Platform web base schema, and the provide_values list under producers.metrics_platform_client states that the Metrics Platform client should include the four contextual attributes page_id, page_title, page_revision_id, and mediawiki_is_production in the events it produces. See the definition of context_attribute in metrics_platform_client.schema.json for a complete list of supported contextual attributes.

See Metrics Platform/Creating a Stream Configuration for additional details regarding Metrics Platform stream configuration.

Register your stream for use

Next, list your stream in wgEventLoggingStreamNames so that Metrics Platform (by way of the EventLogging extension) will get the config for your stream and be able to produce these events.

'wgEventLoggingStreamNames' => [
	'default' => [
		// ...
		'mediawiki.interwiki_link_hover',
	],
],
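
Because a Metrics Platform stream has to appear in both wgEventStreams and wgEventLoggingStreamNames, a small consistency check can catch a stream that was declared but never registered. The sketch below is a hypothetical helper, not part of mediawiki-config or the Metrics Platform; it assumes the two PHP arrays have been mirrored as plain JavaScript values.

```javascript
// Hypothetical check: returns the names of streams that declare a
// metrics_platform_client producer but are missing from the registered list.
function findUnregisteredMetricsPlatformStreams( eventStreams, eventLoggingStreamNames ) {
	return Object.keys( eventStreams ).filter( function ( name ) {
		const producers = eventStreams[ name ].producers || {};
		return 'metrics_platform_client' in producers &&
			eventLoggingStreamNames.indexOf( name ) === -1;
	} );
}

// Mirror of the stream declaration used on this page.
const exampleEventStreams = {
	'mediawiki.interwiki_link_hover': {
		schema_title: 'analytics/product_metrics/web/base',
		destination_event_service: 'eventgate-analytics-external',
		producers: {
			metrics_platform_client: {
				provide_values: [
					'page_id',
					'page_title',
					'page_revision_id',
					'mediawiki_is_production'
				]
			}
		}
	}
};

// Before registration the stream is reported; after registration it is not.
const missingBefore = findUnregisteredMetricsPlatformStreams( exampleEventStreams, [] );
const missingAfter = findUnregisteredMetricsPlatformStreams(
	exampleEventStreams,
	[ 'mediawiki.interwiki_link_hover' ]
);
```

Streams without a metrics_platform_client producer are ignored, since this check is only concerned with streams intended for the Metrics Platform.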

If you've made these changes in InitialiseSettings-labs.php, you can ask a reviewer to merge your change, and the config will be automatically synced to the beta cluster. If your instrumentation code changes are also merged, you'll then be sending these events in the beta environment.

If you've made these changes in ext-EventLogging.php, you'll need to schedule a Backport window deployment to sync out your config change to the production cluster. See Deployments and Backport windows for instructions.

Viewing and querying events

As described in Metrics_Platform/FAQ#Metrics_Platform_Development, the development of Metrics Platform is planned in three phases, of which the first phase is nearing completion. The second phase will focus on the development of a new Control Plane component, which will provide a GUI for managing Event Stream Coordination, Event Stream Creation & Configuration, and Data Transformation Management. As that component nears completion, documentation about its usage will be added here. Until then, please refer to Event_Platform/Instrumentation_How_To#Viewing_and_querying_events.

Overriding event stream config settings

The Control Plane component, mentioned in the previous section, will also help with overriding config settings for event streams. Until it is available, please refer to Event_Platform/Instrumentation_How_To#Overriding_event_stream_config_settings.

Decommissioning

As noted in Event_Platform/Instrumentation_How_To#Decommissioning, stream-related code and configuration can be removed at any time to stop producing an event stream.