Metrics Platform/Analytics/Fragments

From Wikitech

Schema Fragments

Index of schema fragments for referencing in schemas now
Fragment           $ref
Common             /fragment/analytics/common/1.0.0#
App Identifiers    /fragment/analytics/app_identifiers/1.0.0#
Web Identifiers    /fragment/analytics/web_identifiers/1.0.0#

In the schema, reference the fragment(s) you wish to use and list which fields are required in every event.

Note: Your schema must always reference the /fragment/analytics/common fragment. This fragment provides the client_dt field, which is assigned by EventLogging on the web and by the Event Platform Client on iOS (and soon Android). We recommend including client_dt in the required list.

Index of work-in-progress schema fragments for referencing in schemas in the future
Fragment                                      $ref
Activity sequencing (WIP)                     /fragment/analytics/activity_seq/1.0.0#
MediaWiki Common (WIP)                        /fragment/analytics/mediawiki_common/1.0.0#
User (WIP)                                    /fragment/analytics/mediawiki_user/1.0.0#
Page (WIP)                                    /fragment/analytics/mediawiki_page/1.0.0#
User Interface (UI) (WIP)                     /fragment/analytics/ui/1.0.0#
A/B Testing (WIP)                             /fragment/analytics/ab_testing/1.0.0#
Campaign attribution (UTM parameters, WIP)    /fragment/analytics/utm_parameters/1.0.0#

Example 1

Suppose we're running an A/B test on a new default skin for anonymous users and we are interested in measuring session length and average number of visited articles per session.

The schema would use the following fragments: common, web identifiers, page, UI, and A/B testing:

allOf:
    - $ref: /fragment/analytics/common/1.0.0#
    - $ref: /fragment/analytics/web_identifiers/1.0.0#
    - $ref: /fragment/analytics/mediawiki_page/1.0.0#
    - $ref: /fragment/analytics/ui/1.0.0#
    - $ref: /fragment/analytics/ab_testing/1.0.0#

And the following fields would need to be included in the one (1) event logged by the instrument on every page load:

required:
    - client_dt
    - web_session_id
    - web_pageview_id
    - page_ns
    - ui_screen
    - tests
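
Once events carrying the identifier and timestamp fields are collected, the two metrics in this example can be derived by grouping on web_session_id. The following is a minimal sketch rather than the analysis pipeline actually used: the function name session_metrics and the in-memory list of events are assumptions for illustration, client_dt is assumed to be an ISO 8601 string, and session length is approximated as the time between the first and last event of a session.

```python
from collections import defaultdict
from datetime import datetime


def session_metrics(events):
    """Return (session length in seconds per session ID, mean page views per session)."""
    times = defaultdict(list)     # web_session_id -> client timestamps
    pageviews = defaultdict(set)  # web_session_id -> distinct web_pageview_id values
    for event in events:
        sid = event["web_session_id"]
        times[sid].append(datetime.fromisoformat(event["client_dt"]))
        pageviews[sid].add(event["web_pageview_id"])
    lengths = {sid: (max(ts) - min(ts)).total_seconds() for sid, ts in times.items()}
    total_views = sum(len(ids) for ids in pageviews.values())
    mean_views = total_views / len(pageviews) if pageviews else 0.0
    return lengths, mean_views


# Hypothetical events, one per page load as in the example above.
events = [
    {"client_dt": "2020-07-04T10:00:00", "web_session_id": "s1", "web_pageview_id": "p1"},
    {"client_dt": "2020-07-04T10:05:00", "web_session_id": "s1", "web_pageview_id": "p2"},
    {"client_dt": "2020-07-04T11:00:00", "web_session_id": "s2", "web_pageview_id": "p3"},
]
lengths, mean_views = session_metrics(events)
```

This sketch counts every page view. To count only visited articles, the events would additionally need to be filtered on the page namespace field. Comparing the metrics between test groups would then be a matter of splitting the events by group before calling the function.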

The remainder of this section describes these fields and others in those fragments.

Identifiers

App Identifiers

Use the following to include the app_identifiers fragment in your schema for Android and iOS mobile apps:

allOf:
    - $ref: /fragment/analytics/app_identifiers/1.0.0#
app_identifiers
app_install_id (string)
Identifies an install of the app and persists across all sessions. When the user uninstalls the app and re-installs it, a new app install ID is randomly generated.
app_session_id (string)
Identifies an app session: a cluster of actions taken by the user in the app within a limited period of time. A session ID is generated the first time it is requested by the instrumentation code, which will usually be soon after the user launches the app. A new session ID is generated anytime the app has been inactive (that is, in the background state) for at least 15 minutes or has been forcibly stopped by the OS or the user.
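
The session lifecycle described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the Android or iOS client: the class AppSession, its method names, and the injectable clock are all hypothetical, and only the 15-minute threshold and the lazy generation come from the description.

```python
import time
import uuid

SESSION_TIMEOUT_SECONDS = 15 * 60  # inactivity threshold from the description above


class AppSession:
    """Hypothetical helper; not the API of any Wikimedia client library."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._session_id = None
        self._backgrounded_at = None

    def on_background(self):
        # Record when the app became inactive.
        self._backgrounded_at = self._clock()

    def on_foreground(self):
        # Discard the session ID if the app was inactive for 15 minutes or more.
        if self._backgrounded_at is not None:
            if self._clock() - self._backgrounded_at >= SESSION_TIMEOUT_SECONDS:
                self._session_id = None
            self._backgrounded_at = None

    def session_id(self):
        # Generate the ID lazily, the first time instrumentation asks for it.
        if self._session_id is None:
            self._session_id = uuid.uuid4().hex
        return self._session_id
```

Because the ID in this sketch lives only in memory, a forced stop of the app also yields a new session ID on the next launch, which matches the behavior described.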

Web Identifiers

Use the following to include the web_identifiers fragment in your schema for MediaWiki-based desktop and mobile websites:

allOf:
    - $ref: /fragment/analytics/web_identifiers/1.0.0#
web_identifiers
web_session_id (string)
Identifies a web session: a cluster of actions taken by the user on a website within a limited period of time. A session ID is generated the first time it is requested by the instrumentation code, which is usually the first time the user visits the website. In the current implementation, this ID is shared across windows, tabs, and page views in the same browser. The ID is normally regenerated after the browser is shut down; however, if the browser's "restore previous session" feature is used when it restarts, the previous ID is retained. Interactions across multiple pages in the same web session may be linked together via this identifier.
web_pageview_id (string)
Identifies a single web page view (visit). This identifier is randomly generated the first time it is requested by the instrumentation code on any page view and persists for the lifetime of the page. When the user navigates to another page or refreshes/reloads the page, this identifier is discarded and a new one is generated (when needed). Different visits to the same page will yield different pageview IDs (also called tokens). Interactions with multiple features (instrumented separately) on the same web page may be linked together via this identifier.
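
The lifetime of a pageview ID can be illustrated with a short sketch. The PageView class is a hypothetical stand-in for one page load, and the 20-character hexadecimal token is an assumption for illustration; the actual client-side implementation and token format may differ.

```python
import secrets


class PageView:
    """Hypothetical stand-in for one page load; a reload or navigation creates a new instance."""

    def __init__(self):
        self._pageview_id = None

    @property
    def pageview_id(self):
        # Generated on first request, then fixed for the lifetime of this page view.
        if self._pageview_id is None:
            self._pageview_id = secrets.token_hex(10)
        return self._pageview_id


first_visit = PageView()
second_visit = PageView()  # the same page, visited again
```

Every instrument that reads first_visit.pageview_id receives the same value, which is what allows separately instrumented features on one page to be linked, while second_visit receives a different token.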

Sequences

Use the following to include the activity sequencing fragment in your schema:

allOf:
    - $ref: /fragment/analytics/activity_seq/1.0.0#
Activity sequencing (for reconstructing sequences of events)
activity_id (string)
Identifies a sequence of actions in the same context or funnel. In the past, teams have used terms like "session ID" and "sub-session ID" to refer to a set of connected events, such as interacting with a widget. This identifier is useful for grouping together impressions with corresponding clicks, and for grouping together steps in a process such as making an edit. The activity identifier can be randomly generated or taken from a counter.
sequence_id (integer)
Starting at 1, this is a counter for reconstructing the order of events in the same activity. For a variety of reasons we cannot trust the timestamp of receipt or the client-side timestamp of when the event was generated for putting events in order. In cases where the exact sequence of events needs to be established, this identifier can be used to record which event happened 1st, which happened 2nd, and so on.

For example, suppose the user is making an edit. We group the actions performed in this activity with activity_id; previously, a feature-specific field such as "editing_session_id" would have served this purpose. As the user interacts with various (instrumented) features/elements in the editor, previews the edit, continues editing, and finally publishes the edit, specific data about all of those interactions can be tracked in schema-specific fields, but the order in which those interactions happen is recorded in sequence_id.
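
The editing example can be sketched as follows. The Activity class, the in_order function, and the "action" field are hypothetical names introduced for illustration; only activity_id and sequence_id (starting at 1) come from the fragment.

```python
import itertools
import uuid


class Activity:
    """Hypothetical helper that stamps events with activity_id and sequence_id."""

    def __init__(self):
        self.activity_id = uuid.uuid4().hex
        self._counter = itertools.count(1)  # sequence_id starts at 1

    def stamp(self, event):
        # Return a copy of the event with the sequencing fields added.
        return {**event, "activity_id": self.activity_id, "sequence_id": next(self._counter)}


def in_order(events, activity_id):
    """Reconstruct the order of events within one activity."""
    matching = [e for e in events if e["activity_id"] == activity_id]
    return sorted(matching, key=lambda e: e["sequence_id"])


edit = Activity()
# "action" is an illustrative schema-specific field, not part of the fragment.
sent = [edit.stamp({"action": a}) for a in ("open_editor", "preview", "continue", "publish")]
received = list(reversed(sent))  # simulate events arriving out of order
ordered = [e["action"] for e in in_order(received, edit.activity_id)]
```

Sorting on sequence_id recovers the original order of the four steps regardless of the order of receipt, without relying on either timestamp.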

Data

User

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/mediawiki_user/1.0.0#

Information about the user associated with the event

Information about the user generating the event
is_anon (boolean)
Whether the user is anonymous (true) or logged in (false)
user_id (integer)
User's MW user ID; 0 if the user is anonymous. The user ID is specific to the wiki that the event came from.
user_name (string)
Cross-wiki username
user_edit_count (integer)
The total number of edits by the user at the time of the event. The Growth team retrieves this with mw.config.get( 'wgUserEditCount' ) to record it for their experiments. May be useful as a proxy for experience at the time of the event.

Page

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/mediawiki_page/1.0.0#

Information about the page associated with the event

Information about the page the event was generated on
wiki_db (string)
Database name of the wiki (e.g. "enwiki", "commonswiki")
page_id (integer)
Page's numeric ID in MediaWiki
page_ns (integer)
Page's namespace code in MediaWiki (e.g. 0 for Main/Article, -1 for Special)
page_title (string)
Title of the page
page_is_redirect (boolean)
Whether the page is a redirect or not at the time of the event

User Interface

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/ui/1.0.0#

Information about the UI associated with the event

Information about the interface the user saw when the event was generated
ui_mw_skin (string)
MediaWiki skin name (e.g. "Vector", "MinervaNeue", "Modern") at the time of the event; only applicable on MediaWiki, not on mobile apps
ui_color_mode (string, enum)
Color mode at the time of the event. Currently only applicable on mobile apps, but Web is experimenting with it for MediaWiki.[1] One of: "light", "sepia", "dark", "black", "night"
ui_text_scale (integer)
Only applicable for mobile apps where the user chooses from predefined text scales. 0 is for the middle (application default), -1 is for the smaller size while 1 is for the larger size. The actual size in points or pixels will vary by app and device, so we record a relative scale.
ui_screen (object)
Information about the screen, such as dimensions, detailed below:
ui_screen.width_px (integer)
Width of the screen in pixels
ui_screen.height_px (integer)
Height of the screen in pixels

A/B Testing

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/ab_testing/1.0.0#

Information about the A/B test (experiment) associated with the event

Information about the experiment the user was enrolled in when the event was generated
tests (array)
Any and all A/B tests the user is enrolled in at the time of the event. If the array is empty, the user was not in any A/B tests. If there is only one item, the user was in exactly one A/B test. If there are two or more items, the user was in several A/B tests.

Each item in the tests array is an object identifying enrollment in a single A/B test with the following fields:

name (string)
Name of the A/B test the user is enrolled in (e.g. "Desktop Redesign (Phase 3)" or "desktop-redesign-3")
group (string)
Name of the group (sometimes called "bucket") the user was randomly assigned to – e.g. "control", "variant-a", "variant-b", "variant-c"

Examples:

"tests": []

"tests": [ { "name": "growth-homepage", "group": "control" } ]

"tests": [ { "name": "growth-homepage", "group": "variant-1" }, { "name": "growth-help-panel", "group": "variant-2" } ]
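
When analyzing events, a common need is to find the group a user was assigned to in one particular test. A minimal sketch, assuming events have already been parsed into dictionaries (the function name group_for is hypothetical):

```python
def group_for(event, test_name):
    """Return the group the user was assigned to in test_name, or None if not enrolled."""
    for enrollment in event.get("tests", []):
        if enrollment["name"] == test_name:
            return enrollment["group"]
    return None


# Event data taken from the last example above.
event = {
    "tests": [
        {"name": "growth-homepage", "group": "variant-1"},
        {"name": "growth-help-panel", "group": "variant-2"},
    ]
}
homepage_group = group_for(event, "growth-homepage")
```

Returning None for an empty array, or for a test the user is not enrolled in, keeps the three cases described above distinguishable by the caller.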

Campaign Attribution

Use the following to include this fragment in your schema:

allOf:
    - $ref: /fragment/analytics/utm_parameters/1.0.0#

Information about the UTM parameters associated with the event

Information about where the user came from
utm_source
Identifies which site sent the traffic, and is a required parameter. For example: "Wikipedia", "Twitter", "Facebook"
utm_medium
Identifies what type of link was used such as "socialmedia" or "email"
utm_campaign
Identifies a specific product promotion or strategic campaign. For example: "app_marketing_20200704" or "india_awareness_2017"
utm_term
Identifies search terms (e.g. "mobile+app")
utm_content
Identifies what specifically was clicked to bring the user to the site, such as a banner ad, a text link, or a sidebar button. It is often used for A/B testing and content-targeted ads.
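
UTM parameters normally arrive in the query string of the landing URL. The sketch below shows one way they could be extracted into these fields using Python's standard library; the function name utm_from_url and the example URL are assumptions for illustration, not part of any instrument.

```python
from urllib.parse import parse_qs, urlsplit

UTM_FIELDS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_from_url(url):
    """Extract the UTM parameters present in a URL's query string."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {field: query[field][0] for field in UTM_FIELDS if field in query}


# Hypothetical landing URL using example values from the descriptions above.
url = (
    "https://example.org/wiki/Main_Page"
    "?utm_source=Twitter&utm_medium=socialmedia"
    "&utm_campaign=app_marketing_20200704&utm_term=mobile+app"
)
params = utm_from_url(url)
```

Note that parse_qs decodes "+" as a space, so the search term "mobile+app" in the URL is returned as "mobile app". Parameters absent from the URL, such as utm_content here, are simply omitted from the result.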

References

  1. en:User:Volker E. (WMF)/dark-mode