Metrics Platform/Client/Implementations

This page is currently in archived status. It is not currently maintained, and some content may be out of date. After some of its relevant content has been moved to other pages, this page will be deleted.

This page provides an overview of the existing Event Platform Client (EPC) implementations and describes the outstanding differences among them. Our goal is for the implementations to match one another as closely as possible, allowing for differences imposed by programming language constructs, project-specific design patterns, and similar constraints.

Note: At present there is a 1:1 relationship between programming language and client: effectively, the JavaScript client is the MediaWiki frontend client; the PHP client is the MediaWiki backend client; the Java client is the Android app client; and the Swift client is the iOS app client. In principle, however, the libraries should be language-specific but platform-agnostic, and should be able to be shared by multiple clients implemented in the same language.

JavaScript

The EPC JavaScript implementation is provided in the EventLogging extension. Specifically, it lives in the core.js file shipped to clients as part of the ResourceLoader module ext.eventLogging, and its methods are exposed via mw.eventLog.

Public methods

FIXME: Currently, a number of other methods are attached to the core object and thereby exported via mw.eventLog. These include streamInSample() and all methods under mw.eventLog.id and mw.eventLog.storage. External callers should not invoke them, and they will be removed from the de facto public interface in a future refactor.

mw.eventLog.submit( streamName, eventData )

Evaluates the stream and submitted data for submission to the Event Platform intake service according to any sampling and filtering rules specified in the stream configuration. If the event passes all sampling and filtering rules, it is supplemented with additional metadata and submitted to the Event Platform intake service.

Params
  • streamName (string): The destination stream name. Unless stream configuration is globally disabled, streamName must correspond to a stream configured in $wgEventStreams, or the request will fail.
  • eventData (object): An object containing the event data.
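
A minimal sketch of a call to submit() follows. The stream name, schema path, and button_id field are hypothetical, and the $schema property is an assumption about what the stream's schema requires rather than something stated on this page. A stand-in for mw.eventLog is defined so that the snippet also runs outside MediaWiki.

```javascript
// Stand-in for mw.eventLog so the sketch runs outside MediaWiki. On a real
// page, the ext.eventLogging ResourceLoader module exposes mw.eventLog.
const submitted = [];
const eventLog = ( typeof mw !== 'undefined' && mw.eventLog ) || {
	submit: function ( streamName, eventData ) {
		submitted.push( { streamName: streamName, eventData: eventData } );
	}
};

// Hypothetical stream and schema names, used only for illustration.
const STREAM_NAME = 'example.button_click';
const SCHEMA = '/analytics/example/button_click/1.0.0';

// Hypothetical instrumentation helper: builds the event data object and
// hands it to submit(), which applies the stream's sampling and filtering.
function logButtonClick( buttonId ) {
	eventLog.submit( STREAM_NAME, {
		$schema: SCHEMA, // assumption: the event names its schema explicitly
		button_id: buttonId // hypothetical event field
	} );
}

logButtonClick( 'save' );
```

Unless stream configuration is globally disabled, a real call of this kind would only succeed if the stream name were configured in $wgEventStreams.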

PHP

The EPC PHP implementation is also part of the EventLogging extension. It lives in includes/EventLogging.php, with helper methods in includes/EventLoggingHelper.php.

No sampling support is provided in the PHP client at present. Sampling support can be added if and when there is a use case for it and an appropriate sampling unit has been defined.

Public methods

EventLogging::submit($streamName, $event, $logger = null)

Evaluates the stream and submitted data for submission to the Event Platform intake service according to any filtering rules specified in the stream configuration (sampling is not currently supported in the PHP client, as noted above). If the event passes all filtering rules, it is supplemented with additional metadata and submitted to the Event Platform intake service.

After filtering and supplementing the event, the implementation delegates to EventBus::send for event submission.

Params
  • $streamName (string): The destination stream name. Unless stream configuration is globally disabled, $streamName must correspond to a stream configured in $wgEventStreams, or the request will fail.
  • $event (array): An associative (string-keyed) array containing the event data.
  • $logger (?Psr\Log\LoggerInterface): An optional Logger instance. This is intended only for automated testing, and should not be used by production callers.
FIXME: For the sake of cross-platform consistency, the submit() method should be updated to remove the optional $logger parameter. This will probably require restoring the standalone EventPlatformClient class and injecting a logger in the class constructor.

Java

The Java EPC library is currently implemented in the org.wikipedia.analytics.eventplatform package in the Wikipedia for Android app repository (see app/src/main/java/org/wikipedia/analytics/eventplatform/). The main implementation is in the EventPlatformClient class, and the remaining classes are largely POJOs used for serializing to and deserializing from JSON strings using Gson and Retrofit.

Stream configurations are fetched from the MediaWiki API via Meta-Wiki on app startup and stored in SharedPreferences for use in future sessions. There is currently no attempt to retry fetching the stream configs on failure, and events that occur before the stream configs have been fetched are not retained for later submission.

Outgoing events are enqueued in an OutputBuffer and submitted in batches every 30 seconds. Additionally, if the queue exceeds 128 events in size, all queued events are sent to the Event Platform intake service immediately.
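
The batching rule described above can be sketched as follows. This is an illustration written in JavaScript, not the app's Java code: the class shape, the sendBatch callback, and the option names are assumptions. Only the 30-second interval and the 128-event threshold come from the description above.

```javascript
// Illustrative output buffer: flushes on a timer and when the queue grows
// past a size threshold. All names here are hypothetical.
class OutputBuffer {
	constructor( sendBatch, { maxQueueSize = 128, flushIntervalMs = 30000 } = {} ) {
		this.sendBatch = sendBatch; // callback that posts a batch of events
		this.maxQueueSize = maxQueueSize;
		this.flushIntervalMs = flushIntervalMs;
		this.queue = [];
		this.timer = null;
	}

	// Queue an event; flush immediately once the queue exceeds the threshold.
	enqueue( event ) {
		this.queue.push( event );
		if ( this.queue.length > this.maxQueueSize ) {
			this.flush();
		}
	}

	// Send everything currently queued as one batch.
	flush() {
		if ( this.queue.length === 0 ) {
			return;
		}
		const batch = this.queue;
		this.queue = [];
		this.sendBatch( batch );
	}

	// Start the periodic flush; stop() cancels it again.
	start() {
		if ( this.timer === null ) {
			this.timer = setInterval( () => this.flush(), this.flushIntervalMs );
		}
	}

	stop() {
		if ( this.timer !== null ) {
			clearInterval( this.timer );
			this.timer = null;
		}
	}
}

// Usage: the 129th event pushes the queue past 128 and triggers a flush.
const batches = [];
const outputBuffer = new OutputBuffer( ( batch ) => batches.push( batch ) );
for ( let i = 0; i < 129; i++ ) {
	outputBuffer.enqueue( { id: i } );
}
```

In this sketch "exceeds 128" is read as strictly more than 128 queued events. Whether the real client flushes at 128 or 129 is not something this page specifies.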

FIXME: For consistency with the Swift library, add an InputBuffer to temporarily retain events that occur before stream configs first become available.

Public methods

EventPlatformClient.submit(Event event)

Evaluates the stream and submitted data for submission to the Event Platform intake service according to any sampling and filtering rules specified in the stream configuration. If the event passes all sampling and filtering rules, it is supplemented with additional metadata and enqueued for submission to the Event Platform intake service.

Params
  • event (Event): The event data. The Event class is intended as a base class containing all fields that are required of all app analytics events. It allows modeling event data as Gson POJOs and can be subclassed for specific event types (e.g., UserContributionEvent). The stream name and schema are passed in to the constructor.
FIXME: Update the public method signature to: submit(String name, Event data)

Swift

The Swift EPC library is contained in the Event Platform group within the WMF Framework module of the Wikipedia app for iOS. The main client functionality is implemented in the EventPlatformClient class, along with the StorageManager and SamplingController support classes, which are defined in separate files. Additional files in the group contain Core Data model definitions for event storage.

Stream configurations are fetched from the MediaWiki API via Meta-Wiki on app startup. If the stream configuration request fails, it is retried up to 10 times, with an increasing delay period between retries. Stream configurations are not held in persistent storage for subsequent launches.
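
The retry behavior can be sketched as follows, in JavaScript rather than Swift and purely as an illustration. The function and parameter names are hypothetical, and the linear growth of the delay is an assumption: this page says only that the delay increases between retries. The scheduler is injectable so that the example can run without waiting.

```javascript
// Illustrative retry loop: one initial attempt plus up to maxRetries
// retries, with a delay that grows after each failure.
function fetchStreamConfigsWithRetry( attemptFetch, onDone, {
	maxRetries = 10,
	baseDelayMs = 1000, // assumption: the real base delay is not documented here
	schedule = setTimeout
} = {} ) {
	let retries = 0;

	function attempt() {
		attemptFetch( ( error, configs ) => {
			if ( !error ) {
				onDone( null, configs );
				return;
			}
			if ( retries >= maxRetries ) {
				onDone( error, null ); // give up after the final retry
				return;
			}
			retries++;
			// Assumed linear backoff: 1x, 2x, 3x ... the base delay.
			schedule( attempt, baseDelayMs * retries );
		} );
	}

	attempt();
}

// Usage with a fake fetch that fails three times, and a scheduler that
// records each delay and then runs the retry immediately.
const delays = [];
const runImmediately = ( fn, ms ) => {
	delays.push( ms );
	fn();
};
let fetchCalls = 0;
let fetchResult = null;
fetchStreamConfigsWithRetry(
	( callback ) => {
		fetchCalls++;
		if ( fetchCalls <= 3 ) {
			callback( new Error( 'network unavailable' ) );
		} else {
			callback( null, { 'example.stream': {} } );
		}
	},
	( error, configs ) => {
		fetchResult = { error: error, configs: configs };
	},
	{ schedule: runImmediately }
);
```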

Before stream configurations are fetched, any events received by the client are stored unconditionally in an InputBuffer with a maximum size of 128 events. If the input buffer reaches its maximum size, the oldest events are removed as needed to make room for new events. After stream configurations are loaded, events in the InputBuffer are evaluated and conditionally moved to persistent storage in Core Data, where they are held for eventual submission to the Event Platform intake service. Subsequently, the InputBuffer is no longer used, and all events received are evaluated and conditionally held in Core Data for submission.
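
The InputBuffer behavior described above can be sketched as follows. This is an illustration in JavaScript rather than the Swift implementation. The class, its methods, the stream names, and the shouldKeep and store callbacks are all hypothetical; only the 128-event limit and the oldest-first eviction come from the description above.

```javascript
// Illustrative bounded input buffer that evicts the oldest events first.
class InputBuffer {
	constructor( maxSize = 128 ) {
		this.maxSize = maxSize;
		this.events = [];
	}

	// Store an event unconditionally, dropping the oldest ones if full.
	add( streamName, event ) {
		this.events.push( { streamName: streamName, event: event } );
		while ( this.events.length > this.maxSize ) {
			this.events.shift();
		}
	}

	// Once stream configs are loaded: evaluate each buffered event and
	// move the ones that pass into longer-term storage.
	drain( shouldKeep, store ) {
		const pending = this.events;
		this.events = [];
		for ( const item of pending ) {
			if ( shouldKeep( item.streamName, item.event ) ) {
				store( item.streamName, item.event );
			}
		}
	}
}

// Usage: 130 events arrive before configs load, so the two oldest are
// evicted. Afterwards only events for the configured stream are kept.
const inputBuffer = new InputBuffer();
for ( let i = 0; i < 130; i++ ) {
	const stream = i % 2 === 0 ? 'example.configured' : 'example.unknown';
	inputBuffer.add( stream, { id: i } );
}
const bufferedBeforeDrain = inputBuffer.events.length;
const oldestBufferedId = inputBuffer.events[ 0 ].event.id;

const stored = [];
inputBuffer.drain(
	( streamName ) => streamName === 'example.configured',
	( streamName, event ) => stored.push( event )
);
```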

Every 30 seconds, stale entries are pruned from the event storage table, and an attempt is made to submit all remaining entries pending submission. A stale entry is defined as one that has either been successfully submitted or has existed in the storage table for more than 30 days.
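
The staleness rule can be expressed as a small predicate. The sketch below is in JavaScript and does not reflect the actual Core Data model: the entry fields (id, submitted, recordedAtMs) and the function names are hypothetical.

```javascript
// Illustrative staleness check for stored events.
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_MS = 30 * DAY_MS;

// An entry is stale if it was already submitted successfully or has been
// in storage for more than 30 days.
function isStale( entry, nowMs ) {
	return entry.submitted || nowMs - entry.recordedAtMs > MAX_AGE_MS;
}

// Drop stale entries and return the ones still pending submission.
function pruneStaleEntries( entries, nowMs ) {
	return entries.filter( ( entry ) => !isStale( entry, nowMs ) );
}

// Usage with three sample entries and a fixed "now".
const nowMs = Date.UTC( 2021, 0, 31 );
const storedEntries = [
	{ id: 'a', submitted: true, recordedAtMs: nowMs - DAY_MS },
	{ id: 'b', submitted: false, recordedAtMs: nowMs - 31 * DAY_MS },
	{ id: 'c', submitted: false, recordedAtMs: nowMs - DAY_MS }
];
const pendingEntries = pruneStaleEntries( storedEntries, nowMs );
```

Here only entry c remains pending: a has already been submitted, and b is older than 30 days.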

The database storage model is a deviation from the Event Platform Client specification and was carried over from the previous analytics client implementation at the iOS app team's request. There is no plan to update other clients to match this behavior.

Public methods

EventPlatformClient.submit<E: EventInterface>(stream: Stream, event: E, domain: String? = nil)

Evaluates the stream and submitted data for submission to the Event Platform intake service according to any sampling and filtering rules specified in the stream configuration. If the event passes all sampling and filtering rules, it is supplemented with additional metadata and enqueued for submission to the Event Platform intake service.

Params
  • stream (Stream): The destination stream name. Stream is an enum defined in the EventPlatformClient class that contains the expected destination stream names as values.
  • event (E: EventInterface): The event data. EventInterface is a protocol (interface) requiring that the event data contain a schema field and implement Codable.
  • domain (String?): An optional domain string, intended to be used where the wiki domain for the current app language does not apply to the event being submitted.
FIXME: Update clients to pass in the intended domain value as meta.domain, and remove the optional domain parameter.
