This page contains forward-looking content and may not accurately reflect current-state or planned feature sets or capabilities.
What core capabilities will the Metrics Platform deliver?
- Simpler set of tools for on-wiki instrumentation (Improvement)
- Off-wiki instrumentation support (NEW)
- Feature flagging (NEW)
- Experimentation capabilities (NEW)
What are the benefits of the Metrics Platform?
- Generate a complete dataset quickly. Get up and running with analytics quickly with 18 standard contextual attributes collected automatically.
- Customisation and enrichment. Create your own custom events and capture bespoke data with each event, all from within your own codebase.
- Enhanced data modelling. Data is modelled upon ingestion, which simplifies modelling. This will allow you to set up multiple views per data stream, keep modelling consistent between data streams, and use pre-optimised models for BI and data visualisation use cases.
What does the Metrics Platform enable and why is that valuable to the Movement?
It enables several key initiatives:
- Trusted Datasets - Through a standardised mechanism of ingestion.
- Metrics that matter - Easily and verifiably collect and blend data.
- Knowledge equity - Completes the set of analytics tooling for wiki projects outside WMF.
Metrics Platform Development
How is the development of the Metrics Platform being approached?
We are taking an iterative, experimental approach so that we can learn about the complexities, challenges, and opportunities in developing a Metrics Platform. Progress has been slow so far because of the complexity of the required infrastructure changes and the staffing constraints we have faced. The Metrics Platform is being developed in three distinct phases.
- Phase 1 - Infrastructure and Client Libraries. Goal: Develop the required Metrics Platform Client Libraries and make the necessary changes to Event Platform to enable simple instrumentation creation.
- Phase 2 - Control Plane for Instrumentation Management. Goal: Enable the creation, management and deployment of Event Streams and Data Streams from a centralised UI.
- Phase 3 - Experiment Platform for experiment configuration. Goal: Enable streamlined experiment creation and configuration through the use of feature flags and traffic segmentation using the Metrics Platform libraries and control plane.
How will the Metrics Platform work?
- Engineers will use the Metrics Platform Clients in their codebases to instrument events in place. Once included, the Metrics Platform can automatically collect a standard set of data points, and engineers have the option to include custom data fields in a custom data object.
- Engineers & Analysts will enable the instrument to start collecting data from their feature from within the Metrics control plane. From here they will be able to see and configure which instruments are collecting data, what data they are collecting, and with what sample rate.
- Analysts & PMs will be able to select the fields from the instrument they want to include in their tables, where they want those tables to be written, and what they want to be included in the dataset documentation in DataHub.
- These configurations will then be picked up by EventGate so that it can capture the events, instantiate the desired data model and write events to the desired storage destination.
- Analysts & Data Users can now query the data. Based on the choices made earlier, the configured data will be written to tables in the desired location and made available for querying.
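The steps above can be illustrated with a minimal sketch. It assumes a hypothetical `StreamConfig` that stands in for what the control plane might hold (a sample rate and a list of selected fields) and a hypothetical `build_event` helper; none of these names come from the actual Metrics Platform clients, and hash-based sampling is only one plausible way to apply a sample rate.

```python
import hashlib
from dataclasses import dataclass, field


# Hypothetical stream configuration, standing in for settings that might be
# managed in the control plane. These names are illustrative assumptions.
@dataclass
class StreamConfig:
    name: str
    sample_rate: float                          # fraction of sessions kept, 0.0 to 1.0
    fields: list = field(default_factory=list)  # contextual attributes to keep


def in_sample(session_id: str, rate: float) -> bool:
    """Deterministically decide whether a session falls inside the sample."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


def build_event(name, context, custom_data, config):
    """Return the event to submit, or None if the session is sampled out."""
    if not in_sample(context["session_id"], config.sample_rate):
        return None
    # Keep only the contextual attributes selected for this stream.
    selected = {k: v for k, v in context.items() if k in config.fields}
    return {**selected, "name": name, "stream": config.name, "custom_data": custom_data}
```

In this sketch the instrument only supplies an event name and its custom data; which contextual attributes are kept and how much traffic is sampled are decided by configuration rather than by the instrument code.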
Where are we currently focusing?
|Focus: Metrics Platform Client Libraries||Focus: Metrics Platform Control Plane||Focus: Experimentation Platform|
|Key Milestones: (1) Development and adoption of client libraries to generate MP Events. (2) Test the Metrics Platform client through||Key Milestones: (1) Develop a Control Plane GUI which manages:||Key Milestones: (1) Integrate feature flag functionality into all the Metrics Platform libraries. (2) Implement partial traffic segmentation through the Control Plane and feature flags. (3) Deliver a mechanism to run A/B tests using MP libraries and Control Plane.|
What will be different if I use the Metrics Platform for instrumentation?
- There’s no need to deploy a new schema. Instrumentation happens in the codebase in accordance with the Metrics Platform clients' rules.
- Enablement of data collection from an instrument is performed by the Feature Engineer in consultation with an Analyst.
- All configuration of the event stream happens in the control plane.
- Data Models can be applied to event streams directly for faster and more flexible querying.
|Process Steps||Metrics Platform||Current Process|
|Number of steps to start collecting data.||3||10-12|
|Time taken to start collecting data.||1 Day||6-10 Weeks|
|Requires a hard Schema.||No||Yes|
|Can be included in volunteer projects?||Yes||No|
|Auto Stream Expiry||Supported||No|
|Auto Privacy Enforcement||Supported||No|
Can I still use the existing process for instrumentation?
Yes. Backwards compatibility with the current Event Platform is maintained and will continue to be, so you and your team can keep using the existing process if you prefer.
Will I be forced to migrate all my existing schemas and instrumentation to the metrics platform?
No. Backwards compatibility with the current Event Platform is and will be maintained and so your existing data will continue to be collected.
If we no longer create schemas, where will instrumentation documentation go?
Recently we introduced DataHub, a data catalog that provides information about datasets, fields, etc. to make it easier for users to discover, understand, and monitor data changes over time. Configuring event streams and data streams through the Metrics Platform Control Plane will allow you to surface documentation, lineage and metadata about the tables you instantiate directly in DataHub. Documentation for the fields included by default will surface automatically, while field descriptions for custom data would need to be entered in the feature codebase, in the control plane, or from within DataHub. To learn more about DataHub see Data Engineering/Systems/DataHub - Wikitech (wikimedia.org).
When should I start creating new instrumentation with the Metrics Platform?
This depends on what platform you are instrumenting for:
|MediaWiki PHP Only||Ready - Documentation Needed|
|Java - Android||In Progress T281772|
|Swift - iOS||In Progress T281768|
What is the process for creating new instrumentations?
See: Metrics Platform#Quick Start for updates.
If I migrate existing instrumentations, how does that impact existing Analytics scripts, visualisations, etc.?
Not at all, if you leave the existing instrumentation in place while using the Metrics Platform. If you choose to deprecate existing instrumentation and re-instrument using the Metrics Platform, you will need to change your queries to match your new data model. Your data model can produce exactly the same output if desired, in which case you would only need to change the table names where appropriate.
How do we know that data quality is consistent with existing instrumentations?
Let’s break down “data” into two parts: 1) common contextual attributes, e.g. bucketized user edit count, page title and namespace, etc.; and 2) instrumentation-specific data. In our experience, a lot of instrumentation collects more of 1 than of 2. Regardless, the Metrics Platform (MP) aims to provide the same common contextual attributes that your instrumentation requires but in a way that is more convenient and consistent across instrumentations. Further, the MP does not alter the value of the instrumentation-specific data that it is passed.
You can think of MP as an opinionated Event Platform (EP) client. With that in mind, you can expect the same event rate and data quality regardless of whether you use the MP or not.
Metrics Platform & Events
How Does the Metrics Platform Relate to the Event Platform?
The Event Platform allows you to stand up instruments to capture rich data to answer equally rich questions. For example, you can answer questions like "How frequently does the user perform this action?", "How frequently does a user enter the funnel and then leave it?", and "How many unique users performed that action?"
The Metrics Platform is an opinionated Event Platform client.
Firstly, the Metrics Platform owns and maintains the schema with which your events will be validated – we call it the monoschema – so you do not have to create a new schema for each new instrument. The monoschema has properties for the most common/instrument-agnostic data that teams might need to answer their questions, e.g. session ID, pageview ID, the namespace and title of the current page, and it also has a property that can hold instrument-specific data.
Secondly, the Metrics Platform works with event names and data rather than streams and events. That is, rather than writing an instrument that submits events to a specific stream, you write an instrument that dispatches events to zero or more interested streams.
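The "event names rather than streams" model can be sketched as follows. This is an illustration under stated assumptions, not the Metrics Platform implementation: the `STREAM_INTERESTS` registry, the stream names, and the prefix-matching rule are all hypothetical.

```python
# Hypothetical registry: stream name -> event-name prefixes it is interested in.
# Both the stream names and the prefix-matching rule are assumptions.
STREAM_INTERESTS = {
    "example.search": ["search."],
    "example.all_clicks": ["search.click", "editor.click"],
}


def interested_streams(event_name, interests=STREAM_INTERESTS):
    """Return the streams whose configured prefixes match the event name."""
    return sorted(
        stream
        for stream, prefixes in interests.items()
        if any(event_name.startswith(prefix) for prefix in prefixes)
    )


def dispatch(event_name, custom_data, interests=STREAM_INTERESTS):
    """Build one event per interested stream; the list may be empty."""
    return [
        {"stream": stream, "name": event_name, "custom_data": custom_data}
        for stream in interested_streams(event_name, interests)
    ]
```

Here the instrument calls `dispatch("search.click", {...})` without naming a stream. An event that no stream is interested in, such as `"reader.scroll"` in this registry, simply produces no events.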
Will it replace the current event platform?
No, it will not. The Metrics Platform represents a new model of data collection that makes the instrumentation process faster and easier to use. It works with the Event Platform to achieve this by making the process more accessible and removing organisational complexities and dependencies. This allows it to be used by community members and teams without Data Engineers.
Experimentation & Feature Flag FAQs - WIP
What is a Feature Flag?
Feature flags enable you to change your product's behavior from a central location without requiring an entirely new deployment. For example, you can turn a change to a toolbar on or off, or change the placement of buttons in a UI. Engineers and PMs can set a global value for everyone, use traffic rules to assign values to user demographics, and run experiments between different implementations of a feature.
Why Feature Flags?
Flags are required for us to support the following use cases:
- Decouple code deploys and releases.
- Provide a kill switch for high-risk features.
- Allow for the gradual rollout of features within a single wiki.
- Enable targeting and segmentation of which users see which features.
- Validate releases with A/B tests.
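Several of these use cases (kill switch, gradual rollout, per-wiki targeting) can be illustrated with a minimal sketch. The `FlagRule` structure and `is_enabled` function are hypothetical and not part of any Metrics Platform library; hashing the flag name together with a user identifier is one common way to get a stable rollout bucket, assumed here for illustration.

```python
import hashlib
from dataclasses import dataclass


# Hypothetical flag rule; the field names are illustrative assumptions.
@dataclass
class FlagRule:
    enabled: bool           # kill switch: False turns the feature off for everyone
    rollout_percent: int    # 0 to 100, share of users who see the feature
    wikis: tuple = ()       # wikis the flag applies to; empty means all wikis


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_name: str, rule: FlagRule, user_id: str, wiki: str) -> bool:
    if not rule.enabled:                       # kill switch wins over everything
        return False
    if rule.wikis and wiki not in rule.wikis:  # targeting by wiki
        return False
    return bucket(flag_name, user_id) < rule.rollout_percent  # gradual rollout
```

Because the bucket is derived from a hash rather than a random draw, a given user keeps the same assignment across requests, and raising `rollout_percent` only adds users to the enabled group.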
What is Experimentation?
Experimentation is a way to optimize a digital experience towards particular goals, as measured by actual production usage. Basically, we have an idea that if we change something in our product, our users will like it, and they will use it more. The important word here is 'optimize'. Experimentation works well when you have lots of users and a good baseline. Use an experiment when you need to quantify the impact of product changes (e.g. click rate, conversion rates, other metrics) or when you need to validate a solution.
Experimentation allows us to learn quickly and helps us develop the right kinds of experiences for our community. As our product portfolio grows, we need to develop a strategy to systematically validate that our decisions and investments are moving us in the right direction to address the movement's goals. Although experimentation can seem straightforward, without a consistent and equitable mechanism to encode a set of best practices the risk of drawing inaccurate conclusions is high. An experimentation strategy is needed to ensure we have a path for conducting equitable, sustainable, scalable experiments that produce meaningful results in a self-service way.
How could experiments and feature flags work?
- Feature Developers use the Metrics Platform libraries to wrap features and functionality in feature flags.
- Once deployed, Tech leads & Product managers can switch wrapped features on or off, or select how much traffic is served them, by adjusting configurations and sending these to the Metrics Platform.
- Client requests a page from us.
- The application pings the Metrics Platform to ask which features to render, which the metrics platform determines based on the configuration set by the user.
- The desired front end then gets served to the client.
- A header gets set listing which feature flags are active. This can then be analyzed to determine if a user was part of an experiment.
- Experiment observations can then be sent back to the metrics platform for analysis.
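The header step in the flow above could look something like the following sketch. The `name=value;name=value` encoding and the function names are assumptions made for illustration only; the source does not specify the header name or its format.

```python
def format_flags_header(active: dict) -> str:
    """Encode active flags as 'name=value;name=value' (hypothetical format)."""
    return ";".join(f"{name}={value}" for name, value in sorted(active.items()))


def parse_flags_header(header_value: str) -> dict:
    """Decode the hypothetical header format back into a flag mapping."""
    pairs = (item.split("=", 1) for item in header_value.split(";") if "=" in item)
    return {name: value for name, value in pairs}


def was_in_experiment(header_value: str, flag_name: str) -> bool:
    """Check whether a request was served with the given flag active."""
    return flag_name in parse_flags_header(header_value)
```

With an encoding like this, an analysis job could group experiment observations by the flag values recorded for each request, without needing to re-evaluate the flag configuration afterwards.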