Performance/Real user monitoring

Real user monitoring

Real user monitoring means collecting performance metrics from users who visit Wikipedia. It helps us in the following ways:

  • It shows how well Wikipedia performs on important user experience metrics, such as how fast pages load (Google Web Vitals and others). This helps us identify regressions and other problems so that we can fix them.
  • It gives us insight into how Wikipedia performs for people all over the world, which helps us make improvements for users regardless of their location.
  • It helps us understand the conditions under which our users access Wikipedia, such as the speed of their device's CPU and internet connection. This lets us optimize the experience so that Wikipedia performs well on every kind of device and connection.

How we collect metrics

We collect real user performance metrics using the Navigation Timing extension. Initially we collected metrics only from the Navigation Timing API, but today the extension collects many metrics from several different browser APIs.

The metrics are collected in the user's browser, beaconed back to Wikipedia, and then sent through a couple of layers to Graphite/Prometheus. You can see the metric flow in our infrastructure diagram.
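As a rough sketch of the collect-and-beacon step described above, the following TypeScript derives a few values from a Navigation Timing entry and hands them to a beacon sender. This is not the extension's actual code: the metric names, the `/beacon/example` endpoint, and the helper names are assumptions made for illustration.

```typescript
// Subset of the browser's PerformanceNavigationTiming fields used here.
interface NavigationEntryLike {
  startTime: number;
  responseStart: number;
  domInteractive: number;
  loadEventEnd: number;
}

// Hypothetical metric names; the real schema may differ.
interface NavMetrics {
  ttfb: number;
  domInteractive: number;
  loadEventEnd: number;
}

// Turn a navigation entry into rounded millisecond metrics.
// Returns null if the load event has not finished yet (loadEventEnd is 0).
function extractMetrics(entry: NavigationEntryLike): NavMetrics | null {
  if (entry.loadEventEnd <= 0) {
    return null;
  }
  return {
    ttfb: Math.round(entry.responseStart - entry.startTime),
    domInteractive: Math.round(entry.domInteractive - entry.startTime),
    loadEventEnd: Math.round(entry.loadEventEnd - entry.startTime),
  };
}

// The sender is injected so the logic can run outside a browser.
type BeaconSender = (url: string, body: string) => boolean;

// Serialize the metrics and pass them to the sender.
// The default URL is a placeholder, not a real endpoint.
function sendMetrics(
  metrics: NavMetrics,
  send: BeaconSender,
  url: string = '/beacon/example'
): boolean {
  return send(url, JSON.stringify(metrics));
}
```

In a browser, the entry would typically come from `performance.getEntriesByType('navigation')[0]` after the load event, and the sender could wrap `navigator.sendBeacon`, which queues the request so that it survives the page being closed.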

WMF Performance Team infrastructure

How to add a new metric

If you have a new metric that needs to be added, there are usually a couple of steps that you need to follow.

  1. First, you need to collect the actual metric in the extension. Do that by cloning the repository, adding the new metric, and adding tests for it. Depending on when the metric appears in the browser, you can send it with the Navigation Timing event or create a new event for your metric.
  2. Create a new schema or edit the navtiming schema in the schema repository.
  3. Update and push the metric to Prometheus.
  4. Create/update a graph/dashboard in Grafana to make the metric visible.
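To make steps 2 and 3 more concrete, here is a minimal sketch of validating an incoming event and recording it in a Prometheus-style histogram. Everything in it is an assumption for illustration: the `MetricEvent` shape stands in for whatever the real schema defines, and the `Histogram` class is a toy, not the code used in production. It only demonstrates how Prometheus histograms count observations in cumulative buckets labelled by upper bound (`le`).

```typescript
// Hypothetical event shape; the real navtiming schema defines its own fields.
interface MetricEvent {
  name: string;
  valueMs: number;
}

// Stand-in for schema validation: checks field types and rejects
// empty names and negative or non-finite values.
function isValidEvent(e: unknown): e is MetricEvent {
  if (typeof e !== 'object' || e === null) {
    return false;
  }
  const c = e as Record<string, unknown>;
  return (
    typeof c.name === 'string' &&
    c.name.length > 0 &&
    typeof c.valueMs === 'number' &&
    Number.isFinite(c.valueMs) &&
    c.valueMs >= 0
  );
}

// Toy histogram with cumulative buckets, as in Prometheus.
class Histogram {
  private bounds: number[];
  private counts: number[];
  private sum = 0;
  private total = 0;

  constructor(bounds: number[]) {
    // Bounds are sorted so the rendered buckets appear in ascending order.
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.counts = this.bounds.map(() => 0);
  }

  // Record one observation in every bucket whose upper bound covers it.
  observe(value: number): void {
    this.sum += value;
    this.total += 1;
    this.bounds.forEach((bound, i) => {
      if (value <= bound) {
        this.counts[i] += 1;
      }
    });
  }

  // Render the histogram in the Prometheus text exposition format.
  render(name: string): string {
    const lines = this.bounds.map(
      (bound, i) => `${name}_bucket{le="${bound}"} ${this.counts[i]}`
    );
    lines.push(`${name}_bucket{le="+Inf"} ${this.total}`);
    lines.push(`${name}_sum ${this.sum}`);
    lines.push(`${name}_count ${this.total}`);
    return lines.join('\n');
  }
}
```

The bucket boundaries are the main design choice when adding a histogram metric: they should bracket the values you expect and the thresholds you want to graph in Grafana.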


In 2023 we started to move all the metrics from Graphite to Prometheus:

Dashboard showing data from our real users, applying Google's thresholds to the Google Web Vitals metrics:

Real user monitoring dashboard that shows the most important metrics, broken down in several ways to make it easier to spot regressions:

CPU benchmark dashboard showing the CPU performance of our users:
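
The Web Vitals dashboard listed above rates each metric against Google's published thresholds. As a rough illustration of what that rating means, the sketch below classifies a set of samples by their 75th percentile, which is the percentile Google recommends for assessing Web Vitals. The function and type names are made up for this example, and the threshold values are the ones Google documents for LCP, INP, and CLS at the time of writing; the dashboard itself may compute things differently.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';
type Vital = 'LCP' | 'INP' | 'CLS';

// [upper limit for "good", upper limit for "needs improvement"]
const THRESHOLDS: Record<Vital, [number, number]> = {
  LCP: [2500, 4000], // milliseconds
  INP: [200, 500], // milliseconds
  CLS: [0.1, 0.25], // unitless layout shift score
};

// Rate a single value against the thresholds for the given metric.
function rate(vital: Vital, value: number): Rating {
  const [good, poor] = THRESHOLDS[vital];
  if (value <= good) {
    return 'good';
  }
  if (value <= poor) {
    return 'needs-improvement';
  }
  return 'poor';
}

// Nearest-rank percentile of a non-empty list of samples.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error('percentile of empty list');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Rate a metric by the 75th percentile of its samples.
function rateSamples(vital: Vital, samples: number[]): Rating {
  return rate(vital, percentile(samples, 75));
}
```

For example, `rateSamples('LCP', [1000, 2000, 3000, 5000])` uses 3000 ms as the 75th percentile and therefore rates the page as `needs-improvement`, even though half of the samples are individually good.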