Metrics Platform/How to/Create First Metrics Platform Instrument
Before starting this guide, make sure you have completed Setup Mediawiki for Metrics Platform. That article shows how to install MediaWiki along with the extensions and configuration needed to work on anything related to this project.
The following are the steps to build an instrument:
Instrument documentation
Document the instrument
We should build in a sustainable manner and key to that is documentation!
Our expectation is that each instrument will be documented via a README.md kept with the associated schema.
Multiple instruments can be associated with a single schema. A new instrument can be documented by adding the following to the README.md:
- [Team] <Instrument Name> <Phabricator task> <link to instrument patch or MR> <link to schema patch or MR> <Notebook if applicable>
Make a plan to decommission the instrument
As important as creating and operating your instrument is ensuring that you have a complete lifecycle plan for your instrument. Instruments should not, by default, be long lived.
Set a date
When creating a new instrument the responsible team and, specifically, the Product Owner must set a start and end date for operation of the instrument.
Schemas
Unlike past iterations, the Metrics Platform (MP) has deliberately made schema creation more expensive. Before creating a new schema fragment, teams need to think through what new data they need and why the information is not already captured by existing schemas. Creating a new schema should be an exception, not an expectation.
We will publish a guide to data collection and schema design. For now, you can consult directly with Data Products through our intake process.
At present, there is no need to remove schemas from the existing system in order to decommission an instrument. If you have a concern about an existing schema please don't hesitate to contact Data Products directly.
Remove Stream Configuration
The most straightforward path, at present, is to remove the newly added stream configuration code that you added under the Creating a Stream Configuration step. This will completely disable the instrument.
In future iterations of the Metrics Platform, we are planning to provide a Configuration and Experimentation Management application that will allow Product Owners to control their experiments without code deploys. This interface will also support configuring instrument life spans.
Mapping strategy
Core interaction schemas
Metrics Platform maintains a set of core interaction schemas by platform. These schemas provide the basis for capturing data from common use case events (e.g. click events).
The API section of the Implementations page describes the submit methods that are currently available for use in client code. These base API methods will grow over time as common event types are added and incorporated into the client libraries. Using these common event type submit methods does not require the creation of custom schemas.
Creating a custom fragment/schema
The goal for the Core Interaction schema is to radically reduce schema creation, in general, and move our data capture toward consistency across teams. As such we want to be deliberate and conscientious when we decide to create a new schema fragment. At present, we are working on a Data Capture Guide to support teams in mapping their existing data fields to the Core Interaction schema. In the interim, please reach out to the Data Products team directly for support in deciding if a new schema fragment is needed.
If you are already certain that you will need a new schema fragment, please follow our guide for Creating a Custom Schema.
API
A Metrics Platform instrument is created by calling one of the submit API methods provided by the MP library. These methods can take the following arguments:
- streamName, a string, is required.
- schemaId, a string, is optional or required depending on which submit method is used.
- action, a string, is optional or required depending on which submit method is used.
- InteractionData, a group of key/value pairs, is optional. If it is present, each key must be in snake case, using only lowercase letters, digits, and '_', and starting with a lowercase letter.
The available keys that can be used for InteractionData:
- action_subtype
- action_source
- action_context
- element_id
- element_friendly_name
- funnel_name
- funnel_entry_token
- funnel_event_sequence_position
Each value must be a string, with the exception of funnel_event_sequence_position, which is required to be an integer. This constraint is codified in the definition of common product metrics in the product_event YAML file. If additional types are supported in the future, those types will be added there.
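The constraints above can be checked before an instrument ever submits an event. The following is a minimal sketch, not part of the Metrics Platform library: the helper name findInteractionDataProblems and its return format are assumptions made for illustration, and the key list is copied from this guide rather than read from the product_event YAML file.

```javascript
// Keys listed in this guide as valid for InteractionData.
const ALLOWED_KEYS = [
    'action_subtype',
    'action_source',
    'action_context',
    'element_id',
    'element_friendly_name',
    'funnel_name',
    'funnel_entry_token',
    'funnel_event_sequence_position'
];

// Snake case: a lowercase letter followed by lowercase letters, digits, or '_'.
const SNAKE_CASE = /^[a-z][a-z0-9_]*$/;

// Hypothetical helper (not provided by the MP library).
// Returns a list of problems; an empty list means the data matches the rules above.
function findInteractionDataProblems( interactionData ) {
    const problems = [];
    Object.keys( interactionData ).forEach( ( key ) => {
        const value = interactionData[ key ];
        if ( !SNAKE_CASE.test( key ) ) {
            problems.push( key + ': key is not snake case' );
        } else if ( !ALLOWED_KEYS.includes( key ) ) {
            problems.push( key + ': key is not a supported interaction data key' );
        } else if ( key === 'funnel_event_sequence_position' ) {
            // The only key whose value must be an integer.
            if ( !Number.isInteger( value ) ) {
                problems.push( key + ': value must be an integer' );
            }
        } else if ( typeof value !== 'string' ) {
            problems.push( key + ': value must be a string' );
        }
    } );
    return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to report every issue in a single review pass.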
Code the instrument
Javascript
Requirements
First, make sure you have a MediaWiki instance running as a Docker container.
In addition, see Metrics Platform/How to/Getting Started#Javascript to confirm that your working environment is configured with everything you need to work on a JavaScript instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language lists recommended tools for this work.
Create the instrument
To create an instrument in JavaScript, add code that calls one of the submit methods, such as mw.eventLog.submitInteraction. The following code fires an event whenever an interwiki link is hovered over.
// 'a.extiw' will match anchors that have an extiw class. extiw is used for interwiki links.
$( '#content' ).on( 'mouseover', 'a.extiw', function ( jqEvent ) {
    var link = jqEvent.target;
    var linkHoverInteractionData = {
        action_source: link.href,
        action_context: link.title,
    };
    var action = 'extiw.hover';
    var streamName = 'mediawiki.stream_name';
    var schemaId = '/analytics/product_metrics/web/base/1.1.0';
    mw.eventLog.submitInteraction( streamName, schemaId, action, linkHoverInteractionData );
} );
The same example using the submitClick method:
...
var streamName = 'my.stream';
mw.eventLog.submitClick( streamName, linkHoverInteractionData );
...
Now when you hover over a link, the Metrics Client method mw.eventLog.submitInteraction or mw.eventLog.submitClick will be called. The Metrics Client will then construct an event and send it to the specified stream. The event will include the two interaction data elements you provided in linkHoverInteractionData, as well as several contextual attributes that are specified in the stream's declaration (below).
Note that the first part of the event name, extiw, need not correspond to the anchor class. We've chosen to reuse that class name because of its brevity and its familiarity to the developers of this code.
In summary:
- mw.eventLog.submitInteraction needs: (1) the stream name, (2) the schema ID, (3) the action (event name), and (4) the interaction data for the event.
- mw.eventLog.submitClick needs only: (1) the stream name and (2) the interaction data for the event.
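The argument order summarized above can be exercised without a running wiki by passing the event log object in as a parameter. This is a sketch under stated assumptions: submitLinkHover and createRecordingEventLog are hypothetical helpers, and the recording object is only a stand-in for mw.eventLog that mirrors the two method signatures described in this guide. The stream name and URL are sample values.

```javascript
// Hypothetical helper: builds the interaction data for a hovered link and
// submits it through whichever event log object is passed in.
// Inside MediaWiki, `eventLog` would be mw.eventLog.
function submitLinkHover( eventLog, streamName, schemaId, link ) {
    const interactionData = {
        action_source: link.href,
        action_context: link.title
    };
    // Order: stream name, schema ID, action (event name), interaction data.
    eventLog.submitInteraction( streamName, schemaId, 'extiw.hover', interactionData );
    return interactionData;
}

// Stand-in for mw.eventLog (an assumption for testing) that only records calls.
function createRecordingEventLog() {
    const calls = [];
    return {
        calls: calls,
        submitInteraction: function ( streamName, schemaId, action, interactionData ) {
            calls.push( { method: 'submitInteraction', streamName, schemaId, action, interactionData } );
        },
        submitClick: function ( streamName, interactionData ) {
            calls.push( { method: 'submitClick', streamName, interactionData } );
        }
    };
}

// Usage with sample values.
const recorder = createRecordingEventLog();
submitLinkHover(
    recorder,
    'mediawiki.interwiki_link_hover',
    '/analytics/product_metrics/web/base/1.1.0',
    { href: 'https://example.org/wiki/Foo', title: 'Foo' }
);
console.log( recorder.calls[ 0 ] );
```

Injecting the event log keeps the instrument logic testable in plain Node.js, while production code would simply pass mw.eventLog.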
PHP
Requirements
First, make sure you have a MediaWiki instance running as a Docker container.
In addition, see Metrics Platform/How to/Getting Started#PHP to confirm that your working environment is configured with everything you need to work on a PHP instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language lists recommended tools for this work.
Create the instrument
The Metrics Platform PHP interface is essentially the same as the JavaScript one. To send a server-side event, you can use, for example, the EventLogging::getMetricsPlatformClient()->submitClick() function.
$interactionData = [
    'action_source' => 'value for action source',
    // ... Other interaction data fields
];
$streamName = 'my.stream';
EventLogging::getMetricsPlatformClient()->submitClick( $streamName, $interactionData );
Java
Requirements
First, see Metrics Platform/How to/Getting Started#Java to confirm that your working environment is configured with everything you need to work on a Java instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language lists recommended tools for this work.
Create the instrument
Swift
Requirements
First, see Metrics Platform/How to/Getting Started#Swift to confirm that your working environment is configured with everything you need to work on a Swift instrument. Metrics Platform/How to/Getting Started#Recommended IDEs per language lists recommended tools for this work.
Create the instrument
TBD
Configure the stream
Once you have built your instrument, you will need to create and set up your stream configuration. We are currently working on a Configuration UI that will remove much of the manual work required. You can track our progress via our workboards.
For now, the next step is to follow the Creating a Stream Configuration guide.
Validate the events
During active development, it is important to be able to test that events can be validated by EventGate against a specified JSONSchema. Each of the client libraries contains unit and integration tests for ensuring that events are properly formatted and serialized for submission. However, testing end-to-end is a valuable exercise that builds confidence that events will be produced to Kafka and that the resulting data can be consumed.
The basic process for testing end-to-end requires having a local EventGate instance. See Metrics_Platform#Quick_Start for development environment recommendations.
Javascript
Because the Javascript client library is provided through the MediaWiki EventLogging extension, having an instance of MediaWiki running in a local development environment is required. Once a change set is submitted as a patch to the EventLogging extension (presumably after corresponding changes have been merged in the Javascript client library), the following steps should be followed:
- Make sure Docker (if using) is running as well as a local MediaWiki instance and a local EventGate instance
- Apply the patch locally
- Add the necessary stream configuration in LocalSettings.php, e.g.:
$wgEventStreams = [
    'test.metrics_platform.interactions' => [
        'schema_title' => '/analytics/product_metrics/web/base',
        'destination_event_service' => 'eventgate-analytics-external',
        'producers' => [
            'metrics_platform_client' => [
                'provide_values' => [
                    'performer_is_logged_in',
                    'mediawiki_skin',
                ],
            ],
        ],
    ],
];
// When $wgEventLoggingStreamNames is false (not falsy), the EventLogging
// JavaScript client will treat all streams as if they are configured and
// registered.
$wgEventLoggingStreamNames = [ 'test.metrics_platform.interactions' ];
- Navigate to a page on localhost, e.g. http://localhost:8080/wiki/Main_Page
- Run an eventLog submit method in the console, e.g.:
mw.eventLog.submitInteraction('test.metrics_platform.interactions', '/analytics/product_metrics/web/base/1.0.0', 'init');
- Observe that an event has been sent to, validated and accepted by the local EventGate instance.
Source: https://phabricator.wikimedia.org/T351293#9337205
PHP
The exact same process as for Javascript can be used.
Java
While the Java client library includes end-to-end integration testing from reading a local stream config data fixture to building a metrics client to submitting events and then stubbing expected responses, it does not ultimately provide a true end-to-end testing experience wherein production config is read and the metrics client submits an event to EventGate whereupon it is validated and accepted.
There are a few ways to confirm the validation of events. One method can be done with a local EventGate instance. Another can be accomplished by utilizing the EventStreams Beta API.
Validate using local EventGate
To emulate a true end-to-end testing scenario locally, we can create a test that instantiates a metrics client, which fetches production stream configs (from the Meta API) and submits an event using a production stream associated with a production schema. The event is then validated by a local EventGate instance.
By using production config, we can submit an event using an example stream:
@Test
void submitEventTimerStreamConfig() throws IOException, InterruptedException {
    // Create the metrics client.
    MetricsClient testMetricsClient = MetricsClient.builder(testClientData).build();
    // Wait until the client reports that it is fully initialized.
    await().atMost(10, SECONDS).until(testMetricsClient::isFullyInitialized);
    // Submit a test interaction to the sample production stream.
    testMetricsClient.submitInteraction(
        "android.metrics_platform.find_in_page_interaction",
        DataFixtures.getTestClientData(getExpectedEventClick()),
        DataFixtures.getTestInteractionData("TestClick")
    );
    // Pause, then wait for the event queue to drain.
    Thread.sleep(10_000);
    await().atMost(30, SECONDS).until(() -> testMetricsClient.getEventQueue().size() == 0);
}
In the example above, the MetricsClient::submitInteraction() method takes the sample production stream name as a parameter along with some test ClientData and test InteractionData. The default schema ID is set to the app base schema inside the library.
Once the metrics client has fetched the production stream configs, it batch processes the event queue to send serialized events to the DestinationEventService, which in this case should be set to a local EventGate instance.
Because the sample production stream android.metrics_platform.find_in_page_interaction adds only a few requested values in its producer config, we can prevent ContextController::enrichEvent() (which enriches the event with configuration-specified contextual values) from filtering the ClientData and simply have all available contextual values sent with the event by commenting out the lines in that method that reset the ClientData.
The test above waits until the event queue is empty, so the following method needs to be added to the MetricsClient class because it is not currently in the latest release:
public BlockingQueue<EventProcessed> getEventQueue() {
    return this.eventQueue;
}
Make sure local eventlogging is running and then run the test. Observe that the test event is sent to and validated/accepted by the local EventGate instance.
Source: https://phabricator.wikimedia.org/T351292#9347381
Troubleshooting
If one is using Docker for the local EventGate instance and has it running on port 8192, note that WireMock also runs on this port so lines related to WireMock need to be commented out as well. Otherwise try running a local EventGate on a different port.
Validate using EventStreams Beta API
To validate events using the EventStreams API (Beta Cluster), client code needs to include the Metrics Platform Java library, which should be installed locally as well. For example, using the Wikipedia Android app repository as the client code base, one can trigger the sending of events using its dev debug build (see Android's getting started documentation for local development).
The basic steps include:
- Installing a local version of the Metrics Platform Java library with the EventStreams API Beta url.
- Rebuilding the Android installation using the local MP Java installed version.
- Setting up the Android emulator or hardware device for debugging.
- Triggering events with known streams in production.
- Observing the triggered event in the EventStreams UI.
Install a local MP Java library version
In order to see events streaming in real time, the MP Java client library needs to send events to the EventStreams API Beta intake service which is:
https://intake-analytics-beta.wmflabs.org
Because DestinationEventService::ANALYTICS in the Java library is set to the production URL, this value needs to be replaced in the library (and in the associated tests, so that the library compiles and builds) with the beta URL. Replace https://intake-analytics.wikimedia.org in DestinationEventService::ANALYTICS with https://intake-analytics-beta.wmflabs.org and update the assertions in DestinationEventServiceTest::testDestinationEventServiceAnalytics() and DestinationEventServiceTest::testDestinationEventServiceAnalyticsDev() to use the base beta URL.
Note that these changes should not be committed.
Once the updates are made in your local MP repo, build a local installation of the library by running the following command from the root MP directory:
./mvnw -f java install
This will build the library locally, increment the version number, and append -SNAPSHOT. So if the current version of the Java library is 2.3, the locally installed version will be 2.4-SNAPSHOT.
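The version arithmetic in the example above (2.3 becoming 2.4-SNAPSHOT) can be sketched as a small function, which is handy when scripting the later build.gradle update. This is an illustration only: nextSnapshotVersion is a hypothetical helper, and it assumes the last numeric component is the one that is incremented, as in the example. Check the actual output of the build before relying on it.

```javascript
// Hypothetical helper: given the current release version (e.g. "2.3"), returns
// the snapshot version expected from a local install, assuming the last numeric
// component is incremented as in this guide's example.
function nextSnapshotVersion( currentVersion ) {
    const parts = currentVersion.split( '.' );
    if ( parts.length < 2 || !parts.every( ( part ) => /^\d+$/.test( part ) ) ) {
        throw new Error( 'Unexpected version format: ' + currentVersion );
    }
    const numbers = parts.map( Number );
    numbers[ numbers.length - 1 ] += 1;
    return numbers.join( '.' ) + '-SNAPSHOT';
}

console.log( nextSnapshotVersion( '2.3' ) );
```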
Rebuild the client code app with the local MP version
Once you have built a local version of the library with the updated DestinationEventService::ANALYTICS enum, go to your local copy of the Android app and replace the value of metricsVersion in the build.gradle (Module app) file with the locally installed version (e.g. 2.4-SNAPSHOT). Make sure that the active build variant of the Android app is devDebug.
Clean the project (from the Android Studio toolbar - Build > Clean Project), sync the Gradle files (icon: Sync Project with Gradle Files), then rebuild the project (from the Android Studio toolbar - Build > Rebuild Project).
Set up the Android emulator or hardware device for debugging
Assuming Android Studio is the IDE, one can run installations of the Wikipedia Android app using an emulator or a hardware device. You can verify that your local Gradle build is running the local snapshot version of the MP library dependency by navigating to File > Project Structure, and confirming that the Metrics Platform version matches the local installed version:
org.wikimedia.metrics:metrics-platform:2.4-SNAPSHOT
Reinstall the app on your virtual or hardware device.
For virtual devices, you can run the app on the emulator.
For hardware devices, you can use either USB or Wi-Fi to run the app.
Trigger events
If you know which instruments are currently in production, you can perform the triggering action in the dev build of the Android app. As of this writing, there are 4 article-related instruments in production that collect event data related to clicking on preview links, interacting with the toolbar, interacting with the table of contents, and using the Find in page feature. Performing any of the above actions will trigger the sending of events to the EventStreams API Beta UI.
Observe events
Prior to triggering the event, navigate to https://stream-beta.wmflabs.org/v2/ui/#/ and select the streams associated with the instruments you would like to test. Once you've selected your streams, click the Stream button and trigger the events in the dev build of the app. For the article instruments above, each triggered event should appear in the stream with its event data, provided the event passes validation against its schema ID.
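When several streams are selected, it can help to filter the output down to the one under test. The sketch below parses one JSON payload and summarizes it if it belongs to the expected stream. summarizeIfFromStream is a hypothetical helper, and the field names are assumptions about the event shape: that the stream name is carried in meta.stream, the schema ID in $schema, and the event name in action.

```javascript
// Hypothetical helper: parses one JSON event payload and returns a short
// summary if it belongs to `streamName`, or null otherwise.
// Assumed event shape: { "$schema": ..., "meta": { "stream": ... }, "action": ... }.
function summarizeIfFromStream( rawJson, streamName ) {
    let event;
    try {
        event = JSON.parse( rawJson );
    } catch ( e ) {
        // Not valid JSON; ignore this payload.
        return null;
    }
    if ( !event || !event.meta || event.meta.stream !== streamName ) {
        return null;
    }
    return {
        stream: event.meta.stream,
        schema: event.$schema,
        action: event.action
    };
}
```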
Troubleshooting
The Android documentation has excellent instructions on debugging.
For hardware devices, it is as simple as unplugging the USB cable of your device to eject it and re-plugging it in to see the device in Android Studio. It is important to reinstall the app on the device by clicking the Run 'app' icon at the top of the Android Studio interface every time you make changes to the library dependency or to the Android app itself.
Deploy the stream
Once your instrumentation code has been tested, reviewed and merged, you are ready for deployment, which involves creating a stream and associating it with the event name(s) used by your instrument(s). To create a new stream, you will make some changes to the mediawiki-config repository to declare and configure the stream, and register it for use by the Metrics Platform.
Event streams are declared using the wgEventStreams and wgEventLoggingStreamNames config variables. Metrics Platform streams have the same form, and appear in the same configuration files, as other Event Platform streams, but there are a couple of requirements on the content to associate them with the Metrics Platform. These requirements are explained in Metrics Platform/Creating a Stream Configuration.
You will edit these variables in wmf-config/ext-EventStreamConfig.php (for production and beta clusters) or wmf-config/InitialiseSettings-labs.php (for beta only). ext-EventLogging.php will be used in both beta and production if no corresponding values are found in InitialiseSettings-labs.php. See Configuration files for additional information about the use of these files.
(Note: in a local development environment, these variables are normally declared in LocalSettings.php.)
Stream Configuration
We'll create a new Metrics Platform event stream called mediawiki.interwiki_link_hover. First, declare your stream in the wgEventStreams config variable, in one of the config files mentioned above.
<?php
// …
$wgEventStreams = [
    // …
    'mediawiki.interwiki_link_hover' => [
        'schema_title' => 'analytics/product_metrics/web/base',
        'destination_event_service' => 'eventgate-analytics-external',
        'producers' => [
            'metrics_platform_client' => [
                // The contextual attributes that should be mixed into the event
                // before it is submitted to this stream.
                'provide_values' => [
                    'page_id',
                    'page_title',
                    'page_revision_id',
                    'mediawiki_is_production',
                ],
            ],
        ],
    ],
];
In this example, showing the declaration of the stream mediawiki.interwiki_link_hover, we see that schema_title specifies the Metrics Platform web base schema. Finally, we state that the Metrics Platform client should include the 4 contextual attributes page_id, page_title, page_revision_id, and mediawiki_is_production in the events it produces. See the definition of context_attribute in metrics_platform_client.schema.json for a complete list of supported contextual attributes.
See Metrics Platform/Creating a Stream Configuration for additional details regarding Metrics Platform stream configuration.
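As a quick sanity check before review, a stream declaration can be compared against the shape used in this guide's examples. The sketch below works on a plain JavaScript object that mirrors one entry of wgEventStreams. findStreamConfigProblems is a hypothetical helper, and the checks reflect only what the examples here show. The authoritative requirements remain those in Metrics Platform/Creating a Stream Configuration.

```javascript
// Hypothetical helper: checks a stream config object (the JavaScript equivalent
// of one entry in $wgEventStreams) against the shape used in this guide.
// Returns a list of problems; an empty list means the shape matches.
function findStreamConfigProblems( config ) {
    const problems = [];
    if ( typeof config.schema_title !== 'string' || config.schema_title === '' ) {
        problems.push( 'schema_title is missing' );
    }
    if ( typeof config.destination_event_service !== 'string' ) {
        problems.push( 'destination_event_service is missing' );
    }
    const client = config.producers && config.producers.metrics_platform_client;
    if ( !client ) {
        problems.push( 'producers.metrics_platform_client is missing' );
    } else if ( client.provide_values !== undefined && !Array.isArray( client.provide_values ) ) {
        problems.push( 'provide_values must be a list of contextual attribute names' );
    }
    return problems;
}
```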
Register your stream for use
Next, list your stream in wgEventLoggingStreamNames so that Metrics Platform (by way of the EventLogging extension) will get the config for your stream and be able to produce these events.
'wgEventLoggingStreamNames' => [
    'default' => [
        // ...
        'mediawiki.interwiki_link_hover',
    ],
],
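Declaring a stream and registering it are separate steps, so it is easy to do one and forget the other. A minimal sketch of a consistency check follows, using plain JavaScript equivalents of the two config variables. findUndeclaredStreams is a hypothetical helper and is not part of any Wikimedia tooling.

```javascript
// Hypothetical helper: returns the stream names that are registered for use
// but have no declaration. `eventStreams` mirrors wgEventStreams (an object
// keyed by stream name); `eventLoggingStreamNames` mirrors the list of names.
function findUndeclaredStreams( eventStreams, eventLoggingStreamNames ) {
    return eventLoggingStreamNames.filter(
        ( name ) => !Object.prototype.hasOwnProperty.call( eventStreams, name )
    );
}
```

An empty result means every registered stream has a matching declaration.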
If you've made these changes in InitialiseSettings-labs.php, you can find a reviewer to merge your change, and the config will be automatically synced to the beta cluster. If your instrumentation code changes are also merged, you'll then be sending these events in the beta environment.
If you've made these changes in ext-EventLogging.php, you'll need to schedule a Backport window deployment to sync your config change to the production cluster. See Deployments and Backport windows for instructions.
Review your data
Long term, Metrics Platform will provide a user interface that supports Product Owners in creating, launching and managing instrumentation. This interface will also provide a means to track the status of the associated stream.
In the short term, the Event Platform has documented the process for viewing and querying events in both the Beta and Production environments. This is currently the best way to check that your instrument is generating events as expected.