Metrics Platform/Event Schema

The schema defines:

  1. The standard contextual attributes / values that can be recorded when a user performs an instrumented interaction with a MediaWiki instance; and
  2. The standard shape of the events that model those instrumented interactions.

These values convey information about the who, what, when, and where of the instrumented interaction. They are sent alongside custom data, which is provided by the instrument and conveys information about the how of the instrumented interaction.

The contextual values are collected by many instruments in many products and are used in many kinds of analysis. To ensure that these values are implemented faithfully and consistently, the Metrics Platform is responsible for assigning them. Note well that the Metrics Platform does not assign a value to any contextual attribute by default; it must be configured to do so on a per-attribute basis.
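
The per-attribute opt-in described above can be sketched as follows. This is only an illustration: the function name select_contextual_attributes, the flat dictionary of available values, and the sample values are hypothetical and are not the Metrics Platform's actual API.

```python
from typing import Any, Dict, Iterable


def select_contextual_attributes(
    available: Dict[str, Any], requested: Iterable[str]
) -> Dict[str, Any]:
    """Return only the contextual values that were explicitly requested.

    Hypothetical helper: nothing is included by default, mirroring the
    opt-in behaviour described in the text.
    """
    requested_set = set(requested)
    return {
        name: value for name, value in available.items() if name in requested_set
    }


# Values the platform could provide (illustrative data only).
available_values = {
    "agent.client_platform": "example_platform",
    "page.id": 42,
    "performer.is_logged_in": True,
}

# With an empty configuration, no contextual values are assigned.
print(select_contextual_attributes(available_values, []))

# Only the attributes named in the configuration are assigned.
print(select_contextual_attributes(available_values, ["page.id"]))
```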

The contextual attributes are organized in groups: agent, page, mediawiki, and performer. In the table below, each attribute starts with the group name followed by a dot; for example, agent.app_install_id. (Note, however, that the dot becomes an underscore when a contextual attribute appears in a schema config declaration; for example, agent_app_install_id.)
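
The naming rule above (dot in documentation, underscore in a schema config declaration) can be expressed as a small conversion. This is a minimal sketch; the helper name to_config_name and its validation behaviour are assumptions made for illustration.

```python
CONTEXTUAL_GROUPS = ("agent", "page", "mediawiki", "performer")


def to_config_name(attribute: str) -> str:
    """Convert a dotted attribute name to its schema config form.

    Hypothetical helper, not part of the Metrics Platform API.
    """
    group, dot, rest = attribute.partition(".")
    if dot != "." or group not in CONTEXTUAL_GROUPS or not rest:
        raise ValueError(f"not a contextual attribute: {attribute!r}")
    # Only the dot separating the group from the attribute is replaced.
    return f"{group}_{rest}"


print(to_config_name("agent.app_install_id"))  # agent_app_install_id
```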

The Metrics Platform is used to create events which can include both contextual values and custom data values, and which conform to the schema. The schema can be found here: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/client/metrics_event .
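
As a rough illustration of how contextual values and custom data sit side by side in one event, the sketch below assembles an event as a plain dictionary. The function build_event is hypothetical. Keying custom data entries by name, using the Python type name as data_type, and storing value as a string are all assumptions; the linked schema is the authoritative definition of the event shape.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_event(
    name: str, contextual: Dict[str, Any], custom_data: Dict[str, Any]
) -> Dict[str, Any]:
    """Assemble an event-like dictionary (illustrative, not the real client)."""
    event: Dict[str, Any] = {
        "name": name,
        # ISO-8601 timestamp in UTC.
        "dt": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
    }
    # Nest dotted names by group: "page.id" becomes event["page"]["id"].
    for dotted_name, value in contextual.items():
        group, _, attribute = dotted_name.partition(".")
        event.setdefault(group, {})[attribute] = value
    # Assumption: each custom data entry records a data type and a value.
    if custom_data:
        event["custom_data"] = {
            key: {"data_type": type(value).__name__, "value": str(value)}
            for key, value in custom_data.items()
        }
    return event


example = build_event(
    "example.click",  # hypothetical event name
    {"page.id": 42, "performer.is_logged_in": True},
    {"button": "save"},
)
print(example)
```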

Additional details about the properties used as sampling units (pageview_id, session_id, app_install_id) are given in Metrics_Platform/Sampling_Units.

The development of the Metrics Platform, organized in three phases, is ongoing. To learn what MP components are currently available for use, see FAQ details.

Properties

The following table was generated from v1.1.0 of the schema.

Property Description
dt ISO-8601 timestamp recording the estimated UTC time when the event arrives at the Metrics Platform client library.
name The name of the event. All events have a name used to identify them. This name is passed in by the instrumentation code, and is also used by event stream configuration subscribers to identify which events they would like to appear in the stream.
agent.app_install_id UUIDv4 identifier generated when an application is installed. Identifies a particular install on a particular device.
agent.client_platform The client platform on which the event was produced.
agent.client_platform_family The family of the client platform on which the event was produced.

page.id Unique identifier assigned to a MediaWiki page when it is created.

The identifier remains the same across edits, renames, and moves. It may change if the page is deleted and then restored, however.

See https://www.mediawiki.org/wiki/Manual:Page_table#page_id

page.title The MediaWiki page title, with the namespace prefix removed and with spaces replaced by underscores, e.g. "Talk:Foo Bar" becomes "Foo_Bar".

See

page.namespace The ID of the namespace that the page is in.

See

page.namespace_name The canonical (English) name of the namespace that the page is in.

Namespace names can be translated and translatable aliases can be created for them in the MediaWiki configuration. MediaWiki Core defines the canonical name for a namespace as the English one.

See

page.revision_id The head revision ID of the page at the time of the event.
page.wikidata_id Wikibase item ID corresponding to the page at the time of the event.

See https://www.wikidata.org/wiki/Wikidata:Identifiers


Superseded by page.wikidata_qid.

page.wikidata_qid Wikibase item ID corresponding to the page at the time of the event.

See https://www.wikidata.org/wiki/Wikidata:Identifiers

page.content_language The language of the page content, formatted as a language code.

Semantics to be documented as the "page content language algorithm".

See

page.is_redirect Whether the MediaWiki page is a redirect or not at the time of the event. A page is considered a redirect if it starts with

#REDIRECT [[pagename]]

Note well that this state is also stored on the MediaWiki page table.

See

page.user_groups_allowed_to_move User groups with permission to move the page at the time of the event.

This will be an empty array if any user is allowed to move the page.

page.user_groups_allowed_to_edit User groups with permission to edit the page at the time of the event.

This will be an empty array if any user is allowed to edit the page.

mediawiki.skin MediaWiki skin at the time of the event, e.g. "vector".
mediawiki.version MediaWiki version at the time of the event, e.g. "1.38.0-wmf.18".
mediawiki.is_production Whether the MediaWiki instance is considered to be running in production, e.g. mediawikiwiki.
mediawiki.is_debug_mode Whether the MediaWiki instance's EventLogging extension is considered to be running in debug mode.
mediawiki.database The name of the database used by the MediaWiki instance.

See https://www.mediawiki.org/wiki/Manual:$wgDBname

mediawiki.site_content_language The site content language, formatted as a language code.

See

mediawiki.site_content_language_variant The site content language variant, formatted as a language code.

See

performer.is_logged_in Whether the user is currently logged in.
performer.id The ID associated with the user account.

User must be logged in for the property to appear.

performer.name The username associated with the user account.

User must be logged in for the property to appear.

performer.session_id Eighty uniform random bits formatted as a string of twenty hexadecimal digits. Identifies a single user session.
performer.pageview_id Eighty uniform random bits formatted as a string of twenty hexadecimal digits. Identifies a single pageview within a user session.
performer.groups Groups that the user is in at the time of the event.
performer.is_bot Whether the user is considered a bot at the time of the event.

A user is considered a bot if they are in the "bot" group and they have the "bot" right.

performer.language The user's language at the time of the event, formatted as a language code.

User must be logged in for the attribute to appear.

See

performer.language_variant The user's language variant at the time of the event, formatted as a language code.

User must be logged in for the attribute to appear.

See

TODO: Document how the user's language variant is determined.

performer.can_probably_edit_page True if the current page at the time of the event is editable by the user, given the user and page's permissions.

User must be logged in for the attribute to appear.

performer.edit_count Number of edits that the user has performed at the time of the event.

User must be logged in for the attribute to appear.

performer.edit_count_bucket The number of edits that the user has performed at the time of the event, placed into one of five low-granularity buckets.

User must be logged in for the attribute to appear.

performer.registration_dt Datetime when the user account was registered.

User must be logged in for the attribute to appear.

custom_data[].data_type undefined
custom_data[].value undefined
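
The identifier formats described in the table (eighty uniform random bits as twenty hexadecimal digits for performer.session_id and performer.pageview_id, and a UUIDv4 for agent.app_install_id) can be produced as in the sketch below. The function names are hypothetical, and the choice of Python's secrets and uuid modules is an assumption made for illustration; the actual client libraries may generate these values differently.

```python
import secrets
import uuid


def generate_random_id() -> str:
    """Return 80 random bits formatted as 20 hexadecimal digits.

    Hypothetical helper: 10 bytes = 80 bits = 20 hex digits.
    """
    return secrets.token_hex(10)


def generate_app_install_id() -> str:
    """Return a UUIDv4 string (hypothetical helper)."""
    return str(uuid.uuid4())


session_id = generate_random_id()
pageview_id = generate_random_id()
app_install_id = generate_app_install_id()

print(len(session_id), len(pageview_id))  # 20 20
print(uuid.UUID(app_install_id).version)  # 4
```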