Performance/Real user monitoring
Real user monitoring (RUM) involves collecting performance metrics from readers browsing the live Wikipedia.org site.
RUM metrics provide valuable insights in the following ways:
- How well Wikipedia performs on key user-experience metrics, such as how fast pages load (Google Web Vitals and others). This helps us spot regressions and fix them, so readers get a fast experience.
- How people all over the world experience Wikipedia, so we can make improvements that work for readers regardless of their location.
- The conditions under which readers access Wikipedia, such as their device's CPU speed and connection quality, so we can optimize performance for every kind of device and network.
Metrics
The metrics include Page Load Time, Time to First Paint, Time to First Byte, a CPU benchmark, Google Web Vitals, and more.
The data is sliced by device (mobile/desktop skin, browser family), geography (continent, country), and several other factors.
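As a sketch of how such metrics relate to the raw browser timing data, the following derives a few of them from W3C Navigation Timing attributes (here, millisecond offsets from navigation start). The attribute names follow the Navigation Timing Level 2 specification; the exact set collected in production may differ.

```python
def derive_metrics(nav: dict) -> dict:
    """Derive page-timing metrics from raw W3C Navigation Timing
    attributes (milliseconds, relative to navigation start).

    Illustrative only; not the production collection code.
    """
    return {
        # Time to First Byte: when the first response byte arrived
        "ttfb": nav["responseStart"],
        # Page Load Time: when the load event finished
        "pageLoad": nav["loadEventEnd"],
        # Network phase breakdowns
        "dns": nav["domainLookupEnd"] - nav["domainLookupStart"],
        "tcp": nav["connectEnd"] - nav["connectStart"],
    }
```

For example, `derive_metrics({"responseStart": 120, "loadEventEnd": 1800, "domainLookupStart": 5, "domainLookupEnd": 25, "connectStart": 25, "connectEnd": 60})` reports a 120 ms TTFB, 20 ms of DNS lookup, and 35 ms of TCP connection setup.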
Dashboards
- Real user monitoring, introduction to the most important metrics, with several slices to help spot regressions.
- Navigation Timing breakdown, focus on a single metric, broken down by every available slice.
- Navigation Timing by continent, compare a metric between countries and continents.
- Navigation Timing drilldown, investigate regressions by analyzing a metric in context.
- Google Web Vitals, the same data plotted using Google's thresholds for its Google Web Vitals metrics.
- CPU Benchmark, compare the CPU performance of end-user devices.
- AS Performance Report (introduction post), web performance metrics correlated with internet providers (autonomous systems) and mobile/desktop device benchmarks.
Check Grafana for the full list of dashboards that explore the navtiming dataset.
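Google publishes fixed thresholds for its Core Web Vitals (assessed against the 75th percentile of page loads). The bucketing a dashboard applies can be sketched like this; the threshold values are Google's published ones, but the function itself is illustrative, not the dashboard's actual code:

```python
# Google's published Core Web Vitals thresholds: (good, poor) boundaries.
THRESHOLDS = {
    "LCP": (2500, 4000),   # Largest Contentful Paint, ms
    "INP": (200, 500),     # Interaction to Next Paint, ms
    "CLS": (0.1, 0.25),    # Cumulative Layout Shift, unitless
}

def classify(metric: str, value: float) -> str:
    """Bucket a metric value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"
```

For example, a 2.0 s LCP is "good", while a 0.3 CLS is "poor".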
How we collect metrics
We collect performance metrics from page views using the NavigationTiming extension for MediaWiki. The extension automatically activates on a sample of MediaWiki page views in production. The values come from the W3C Navigation Timing and Paint Timing APIs built into every web browser.
The NavigationTiming extension sends the collected metrics via a beacon back to Wikipedia.org, from where they travel through a couple of processing layers to Graphite and Prometheus. For more details about the metric flow, refer to the infrastructure runbook and diagram.
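A minimal sketch of one step in that flow: a beacon payload is parsed and turned into labelled samples that a backend such as Prometheus can ingest. The field and label names here are invented for illustration; the real event schema lives in the schema repository.

```python
import json

def beacon_to_samples(payload: bytes) -> list:
    """Turn one NavigationTiming beacon into (metric, labels, value)
    samples. Field and label names are illustrative only."""
    event = json.loads(payload)
    labels = {
        "browser_family": event.get("browser", "unknown"),
        "continent": event.get("continent", "unknown"),
    }
    samples = []
    for field in ("responseStart", "firstPaint", "loadEventEnd"):
        if field in event:
            # Convert ms to seconds, the conventional Prometheus unit.
            samples.append(
                (f"navtiming_{field}_seconds", labels, event[field] / 1000.0)
            )
    return samples
```

Keeping the slicing dimensions (browser family, continent, and so on) as labels rather than separate metric names is what lets the dashboards break a single metric down by every available slice.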
How to add a new metric
If you have a new metric to add, there are usually a few steps to follow.
- First, collect the actual metric in the https://www.mediawiki.org/wiki/Extension:NavigationTiming extension: clone the repository, add the new metric, and add tests for it. Depending on when the metric becomes available in the browser, you can send it with the Navigation Timing event or create a new event for your metric.
- Create a new schema or edit the navtiming schema in the schema repository.
- Update navtiming.py and push the metric to Prometheus.
- Create or update a graph/dashboard in Grafana to make the metric visible.
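For the navtiming.py step, the shape of the change is roughly: read the new field from the incoming event and expose it as a Prometheus sample. A hypothetical sketch of the exposition format only (names invented; consult the real navtiming.py for the actual interfaces):

```python
def expose_new_metric(event: dict, metric: str = "my_new_metric"):
    """Render one event field as a Prometheus exposition-format line,
    e.g. 'navtiming_my_new_metric_seconds{continent="Europe"} 0.25'.
    Purely illustrative; not the production code."""
    if metric not in event:
        return None
    labels = f'continent="{event.get("continent", "unknown")}"'
    return f"navtiming_{metric}_seconds{{{labels}}} {event[metric] / 1000.0}"
```

Events missing the field return `None`, mirroring the fact that older clients will not yet send the new metric.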
See also
- Performance/Metrics#CruX, Chrome User Experience dashboard.