Metrics Platform/Client/Definition

From Wikitech

An Event Platform Client is a member of a family of software libraries that carry out the production of events from a client application, such as the MediaWiki browser environment, the Android Wikipedia app, or the iOS Wikipedia app. The software libraries conform to a common definition, so that the entire event production is portable across platforms. This page is that definition.

Introduction

Description of the software goes here.

Conformance

How conformance is determined. RFC 2119

Terminology

Event

An object representing a part of the program's state at a discrete point in time.

Event production

Event stream

Event stream configuration

Instrument

Intake

Client

Library

Event

An event is a JSON string that can be validated against a corresponding JSONSchema event schema. The related phrase event data refers to the data structure that, when serialized as a JSON string, becomes the event.

Event data

Event data is the collection of properties (key-value pairs) that together represent a software event at a discrete moment in time. Its properties should match a schema. Most of these properties will be provided as the second argument in .produce( streamName, eventData ). Others may be provided automatically by the event platform library.

{
    /* Reserved properties */

    /* 'meta' holds properties that have particular meaning to EventGate */
    'meta': {
        'stream':  /* (Required) Name of stream to which the event belongs */,
        'dt':      /* (Required) Locally-generated UTC timestamp with subsecond resolution in ISO 8601 format */,
        'id':      /* (Optional) UUID used to de-duplicate events */,
    },
    '$schema':     /* (Required) URI of schema to validate against */,
    'pageview_id': /* (Optional) ID */,
    'session_id':  /* (Optional) ID */,

    /* Additional properties .... */
}
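The layout above can be illustrated with a minimal JavaScript sketch of how a library might wrap the caller's event data with the reserved properties. The function name buildEvent, its schemaUri argument, and the sample stream name and schema URI are hypothetical, chosen only for illustration; they are not part of this definition. The sketch assumes a Node.js environment for the UUID helper.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical helper: wraps caller-supplied event data with the
// reserved properties described above. Not an actual library API.
function buildEvent(streamName, schemaUri, eventData) {
    return {
        // Caller-supplied (unreserved) properties are spread first so
        // that the reserved properties below always take precedence.
        ...eventData,
        '$schema': schemaUri,
        'meta': {
            'stream': streamName,
            'dt': new Date().toISOString(),
            'id': randomUUID()
        }
    };
}

// The event is the JSON string obtained by serializing the event data.
const eventData = buildEvent(
    'example.stream',          // assumed stream name
    '/example/schema/1.0.0',   // assumed schema URI
    { 'action': 'click' }
);
const event = JSON.stringify(eventData);
```

Spreading the caller's properties before the reserved ones is one possible design choice: it keeps a caller from accidentally overwriting meta or $schema.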

Reserved properties

Some properties are common to all schemas in our system, meaning they will appear on every event regardless of how its schema is otherwise defined. To make this more convenient, the event platform client automatically manages these properties, which are enumerated below. Anything not enumerated below is unreserved. These properties originate from a combination of EventGate and the common schema which most schemas inherit (LINK).

Name          Required   Type                      Format
meta.id       No         String                    UUIDv4
meta.dt       Yes        String                    ISO 8601 UTC datetime
meta.stream   Yes        String                    Should match the name of a configured stream.
$schema       Yes        String (JSON reference)   Should match a JSONSchema URI
pageview_id   No         String                    See identifier format (cite)
session_id    No         String                    See identifier format (cite)

meta.id

  • MUST be UUID format (see schema)
  • SHOULD provide sufficient uniqueness (see definition of sufficient)
  • MAY be assigned a value server-side

The purpose of this field is to allow the intake to recognize events that have been duplicated as a result of bugs in the user agent. For applications where this is not necessary, it may be left unset, according to the default rules in the schema.
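As a rough illustration of how meta.id supports de-duplication, the sketch below assigns a random version 4 UUID on the client and detects repeats with an in-memory set. The names assignId and makeDeduplicator are hypothetical, and the in-memory set is an assumption made for brevity; how an actual intake stores seen identifiers is not specified here.

```javascript
const { randomUUID } = require('node:crypto');

// Client side: return a copy of the event data with a random
// (version 4) UUID stored in meta.id.
function assignId(eventData) {
    const meta = { ...(eventData.meta || {}), id: randomUUID() };
    return { ...eventData, meta };
}

// Intake side (hypothetical): remember ids already seen and report repeats.
function makeDeduplicator() {
    const seen = new Set();
    return function isDuplicate(eventData) {
        const id = eventData.meta && eventData.meta.id;
        if (id === undefined) {
            // meta.id was left unset, so duplicates cannot be detected.
            return false;
        }
        if (seen.has(id)) {
            return true;
        }
        seen.add(id);
        return false;
    };
}
```

An event submitted twice with the same meta.id is reported as a duplicate on the second submission, while events without an id always pass through.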

meta.dt

  • MUST be recorded as UTC
  • MUST be formatted as ISO 8601 (see RFC 3339)
  • MUST support millisecond resolution
  • SHOULD be assigned a value client side
  • MAY be assigned a value server side

Records when the event was triggered. Millisecond resolution is necessary to discriminate between events that may be sent during the same second as a result of bursting transmission. If the value is assigned by the client, it MUST be generated according to the client clock (see: client clocks).
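The meta.dt requirements (UTC, ISO 8601, millisecond resolution) can be sketched in JavaScript with the standard Date.prototype.toISOString() method. The helper names currentTimestamp and isValidDt are hypothetical, and the format check is deliberately stricter than ISO 8601 in general: it accepts only the exact shape that toISOString() emits.

```javascript
// Date.prototype.toISOString() always returns UTC (a "Z" suffix) with
// three fractional digits, i.e. millisecond resolution.
function currentTimestamp(now = new Date()) {
    return now.toISOString();
}

// Hypothetical check for the shape produced above:
// YYYY-MM-DDTHH:mm:ss.sssZ, and a date that actually parses.
const ISO_UTC_MS = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;

function isValidDt(dt) {
    return typeof dt === 'string' &&
        ISO_UTC_MS.test(dt) &&
        !Number.isNaN(Date.parse(dt));
}
```

Two events triggered 500 ms apart within the same second receive distinct timestamps under this scheme, which is the reason the millisecond requirement exists.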

meta.stream

Name of the stream to which this event belongs. SHOULD match a stream registered with the stream configuration service. If such a stream exists, its stream configuration will be retrieved. If the $schema value in that configuration matches the value of the event object's $schema field, validation will proceed; otherwise the event will be rejected.
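The stream lookup and $schema comparison described above might look like the following sketch. The streamConfigs object, its contents, and the function name acceptsForValidation are hypothetical. Treating an unregistered stream as a rejection is also an assumption, since the text above only specifies the outcome when the stream exists.

```javascript
// Hypothetical stream configuration, keyed by stream name.
const streamConfigs = {
    'example.stream': { '$schema': '/example/schema/1.0.0' }
};

// Returns true if the event may proceed to schema validation,
// false if it should be rejected.
function acceptsForValidation(eventData, configs) {
    const streamName = eventData.meta && eventData.meta.stream;
    const known = Object.prototype.hasOwnProperty.call(configs, streamName);
    if (!known) {
        // Assumption: events for unregistered streams are rejected.
        return false;
    }
    // The event's $schema must equal the $schema configured for its stream.
    return configs[streamName]['$schema'] === eventData['$schema'];
}
```

Using hasOwnProperty for the lookup avoids accidentally matching inherited keys such as toString when a stream name collides with an Object.prototype property.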

$schema

  • MUST be provided a value by the client
  • SHOULD match the value of $schema in the stream configuration corresponding to meta.stream.

Reference to the JSONSchema schema against which this event object should be validated. Every event should match a schema. An event signals which schema it believes it matches by setting its $schema property to the address of that schema in the public schema repository. The intake server (e.g. EventGate) will attempt to validate the event against this schema.

Event stream configuration

Production

Configuration

Transmission

Identifiers

Sampling

Persistence

Time

Randomness

Stream configuration

References

Acknowledgements