Performance/Essay/Google Web Vitals (2021)

Background

Google Web Vitals is Google's guidance on the quality signals that are essential to delivering a great user experience on the web. Sometime in July 2021 these web vitals will start to affect how Google ranks pages in its search results.

At the moment the focus is on three web vitals:

Largest Contentful Paint (LCP) - marks the point in the page load timeline when the page's main content has likely loaded - a fast LCP helps reassure the user that the page is useful.
First Input Delay (FID) - measures load responsiveness and quantifies the experience users feel when trying to interact with an unresponsive page - a low FID helps ensure that the page is usable.
Cumulative Layout Shift (CLS) - measures visual stability and quantifies how often users experience unexpected layout shifts - a low CLS helps ensure that the page is delightful.
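To make "fast" and "low" concrete, Google publishes thresholds for each of the three metrics and rates a measurement as good, needs improvement, or poor. The sketch below uses the thresholds from Google's public Web Vitals documentation (they are not stated in this essay). The names THRESHOLDS and rate are illustrative only.

```python
# Thresholds from Google's public Web Vitals documentation (2021):
# (upper bound for "good", upper bound for "needs improvement").
# LCP and FID are in milliseconds; CLS is a unitless score.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, value: float) -> str:
    """Rate a single measurement as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"
```

Google assesses a page at the 75th percentile of page views, so a function like this would typically be applied to the p75 value of a metric rather than to individual page loads.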

We also measure First Contentful Paint (the first time anything is painted on the screen while a page loads) and Total Blocking Time (the total time during which the main thread was blocked for long enough to prevent input responsiveness). Google has hinted that First Contentful Paint may become a web vital. In our synthetic tests we measure Total Blocking Time instead of First Input Delay because it is easier to measure there: First Input Delay requires a real user interaction.
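As an illustration of how Total Blocking Time is derived, here is a minimal sketch. It relies on the public definition of the metric (not spelled out in this essay): a main-thread task longer than 50 ms is a long task, and only the time beyond those 50 ms counts as blocking. The function name and the input format are assumptions made for the example.

```python
from typing import Sequence

# A task is "long" when it runs for more than 50 ms; only the time beyond
# those 50 ms counts as blocking.
LONG_TASK_THRESHOLD_MS = 50


def total_blocking_time(task_durations_ms: Sequence[float]) -> float:
    """Sum the blocking portion of every long task.

    Simplification: assumes the caller passes only main-thread tasks that
    ran inside the measurement window (after First Contentful Paint).
    """
    return sum(
        duration - LONG_TASK_THRESHOLD_MS
        for duration in task_durations_ms
        if duration > LONG_TASK_THRESHOLD_MS
    )


# Three tasks of 30 ms, 120 ms and 250 ms: the first is not a long task,
# the other two block for 70 ms and 200 ms.
print(total_blocking_time([30, 120, 250]))  # 270
```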

You can read more about how Google Web Vitals work in Nicolás Peña Moreno's Google Web Vitals deep dive presentation.

Collecting Google Web Vitals

We have three ways of making sure that Wikipedia meets Google's criteria:

Collect data from the Chrome User Experience Report - the data/metrics that Google collects from users and shares publicly.
Collect data from synthetic performance tests - tests that we run in a lab environment where we can get more valuable data/metrics from the browser.
Collect data from our users - we can collect data from users who use Chrome(ium) using our Navigation Timing extension.

Chrome User Experience Report

The Chrome User Experience Report contains metrics aggregated from users who use Chrome, have opted in to syncing their browsing history, have not set up a sync passphrase, and have usage statistic reporting enabled. The data is a 28-day rolling average of aggregated metrics, which means that the data presented in the report at any given time is actually data for the past 28 days aggregated together. We collect that data once a day for 28 Wikipedia domains on mobile and 28 on desktop. We store the data for two years and you can check it out at https://grafana.wikimedia.org/d/t_bhsNGMk/chrome-user-experience-report
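One practical consequence of the 28-day window is that a regression only shows up gradually in the report. The sketch below illustrates this with made-up numbers. It is a simplification that treats the report as a plain mean of daily values, each day weighted equally; the function name and the data are hypothetical.

```python
from typing import List, Sequence

WINDOW_DAYS = 28


def rolling_window_mean(
    daily_values: Sequence[float], window: int = WINDOW_DAYS
) -> List[float]:
    """Mean of the last `window` daily values, for each day with a full window."""
    return [
        sum(daily_values[end - window:end]) / window
        for end in range(window, len(daily_values) + 1)
    ]


# Hypothetical data: LCP is 2000 ms for 28 days, then regresses to 2560 ms.
daily_lcp = [2000.0] * 28 + [2560.0] * 28
reported = rolling_window_mean(daily_lcp)

print(reported[0])   # 2000.0 (window holds only pre-regression days)
print(reported[1])   # 2020.0 (one regressed day out of 28)
print(reported[-1])  # 2560.0 (window holds only regressed days)
```

In this example a 560 ms regression moves the reported value by only 20 ms on the first day, and the full effect is visible only after 28 days.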

We collect four metrics: First Contentful Paint, Largest Contentful Paint, Cumulative Layout Shift and First Input Delay. When the Chrome User Experience Report includes more metrics we plan to collect them too.
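This essay does not say how the data is fetched. As one possible approach, the sketch below assumes Google's public Chrome UX Report API and shows how the four metrics listed above could be requested for one origin and reduced to their 75th percentile values. The metric names are the ones the API exposed in 2021. The helper names and the API key handling are illustrative, not a description of the actual collection job.

```python
import json
import urllib.request

# Public CrUX API endpoint; an API key from Google is required.
CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

# CrUX API names for the four metrics.
METRICS = [
    "first_contentful_paint",
    "largest_contentful_paint",
    "cumulative_layout_shift",
    "first_input_delay",
]


def build_query(origin: str, form_factor: str) -> dict:
    """Request body for one origin and form factor ("PHONE" or "DESKTOP")."""
    return {"origin": origin, "formFactor": form_factor, "metrics": METRICS}


def extract_p75(response: dict) -> dict:
    """Map each metric name in a CrUX API response to its p75 as a float."""
    metrics = response["record"]["metrics"]
    # The API returns CLS percentiles as strings, so convert every value.
    return {
        name: float(data["percentiles"]["p75"]) for name, data in metrics.items()
    }


def fetch_p75(origin: str, form_factor: str, api_key: str) -> dict:
    """Query the CrUX API (needs network access and a valid key)."""
    request = urllib.request.Request(
        f"{CRUX_ENDPOINT}?key={api_key}",
        data=json.dumps(build_query(origin, form_factor)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_p75(json.load(reply))
```

A daily job could call fetch_p75 once per domain and form factor, for example fetch_p75("https://en.m.wikipedia.org", "PHONE", api_key), and store the result in a time series database.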

We have automated alerts for finding regressions in the Chrome User Experience report data for en.m.wikipedia.org and en.wikipedia.org: https://grafana.wikimedia.org/d/hJ5ZbhrMk/chrome-user-experience-alerts

Synthetic performance test

We continuously monitor the performance of Wikipedia using synthetic tools: we simulate a user accessing different Wikipedia pages on desktop and mobile and collect performance metrics. For the Google Web Vitals we use Chrome to collect those metrics. The synthetic tools make it easier to identify what we are actually measuring and give more detail about what is happening.

We have two different test strategies. First, we use Chrome (on desktop, emulated mobile and mobile) to access a couple of Wikipedia pages and measure Google Web Vitals (and other metrics). Second, we use a replay proxy: we record the content of a Wikipedia page and then replay the page back to the browser from the proxy. That way we only measure changes in front-end performance, which makes it easier to find regressions.

The synthetic tools help us understand what Chrome is actually measuring. Here we highlight the largest contentful paint element in red.

You can see all the pages that we measure using the page drilldown dashboard.

We have automated alerts for Largest Contentful Paint for group 0, group 1 and enwiki using a replay proxy and directly through the WebPageTest dashboard.
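The alert rules themselves live in the dashboards and are not described here. As a rough illustration of the idea, the sketch below compares the median Largest Contentful Paint of recent synthetic runs against a baseline and flags a regression when the increase exceeds a tolerance. The function name and the 5% default are hypothetical and are not the values used by the actual alerts.

```python
from statistics import median
from typing import Sequence


def lcp_regressed(
    baseline_ms: Sequence[float],
    recent_ms: Sequence[float],
    max_increase: float = 0.05,  # hypothetical tolerance of 5%
) -> bool:
    """True when the recent median LCP exceeds the baseline median by more
    than the allowed relative increase."""
    return median(recent_ms) > median(baseline_ms) * (1 + max_increase)
```

Using the median of several runs rather than a single run keeps one slow outlier from triggering an alert.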

Collecting metrics from our users

TODO