From Wikitech

This is a living document that explains web performance terms.

Word Abbreviation Description Dashboard Read more
Backend Performance The time it takes for the server to generate the page and start sending the HTML. Usually a user spends 20% of the time in the backend and 80% in the frontend.
Bare metal server Bare metal servers, or physical servers, are dedicated machines that run our synthetic performance tests. Using dedicated physical servers helps us get stable metrics (we can pin the CPU speed) so we can find smaller performance regressions. Running on cloud instances often introduces noise in the metrics.
Chrome User Experience Report CrUX Google collects metrics from Chrome users who have opted in by syncing with their Google account. Google Search uses these metrics as a real-world signal of how a website performs in practice. The data is publicly available through the Chrome User Experience Report and through an API. We collect the data from the API once a day and store it in our time series database to make trends easy to see. The data from Google is a 28-day rolling average. CrUX dashboard https://developer.chrome.com/docs/crux
Content size / transfer size The size of an asset (HTML, JavaScript, CSS or image). The content size is the actual size when it is unpacked. The transfer size is the size over the wire (sent to the user). At Wikimedia we make sure that all content that can be compressed is compressed when it is sent to the user, so it can be downloaded faster.
CPU benchmark The CPU benchmark is a JavaScript loop that we run while measuring how long it takes. We run it for a small portion of our real users and in synthetic testing. The metric gives us insight into device performance for users around the world (and over time) and helps us configure our synthetic tests to run with the same performance. CPU benchmark dashboard Blog post about the CPU benchmark
CPU throttling In our synthetic tests (and potentially in real device testing) we can throttle the CPU to better match the CPU speed of our users with slower devices. We also pin the CPU speed on our devices to make sure they run at the same speed all the time.
Cumulative Layout Shift CLS Measures visual stability by quantifying how often users experience layout shifts (content moving around on the screen). A low CLS means a better user experience. CLS is one of the Google Core Web Vitals.
Element Timing API A browser API (only supported in Chrome/Chromium at the moment) that lets you measure when elements are painted on the screen. We don't use it today, but it's an easy way to get more paint information from the browser. By adding the attribute elementtiming to an element, we can get the paint information in RUM. In our synthetic tests we pick up the attribute independently of the browser and get the paint information. https://developer.mozilla.org/en-US/docs/Web/API/Element/elementTiming
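The difference between the two sizes can be illustrated with a minimal Python sketch. It assumes gzip compression and counts only the response body (real transfer sizes also include headers); the `sizes` helper and the sample HTML are hypothetical and not part of any Wikimedia tooling.

```python
import gzip
from typing import Tuple


def sizes(content: bytes) -> Tuple[int, int]:
    """Return (content_size, transfer_size) in bytes for a gzip-compressed body."""
    compressed = gzip.compress(content)
    return len(content), len(compressed)


# Hypothetical, highly repetitive HTML body used only for illustration.
html = b"<p>Wikipedia is a free online encyclopedia.</p>\n" * 200
content_size, transfer_size = sizes(html)
print(f"content size: {content_size} bytes, transfer size: {transfer_size} bytes")
```

Repetitive text such as HTML compresses well, so the transfer size is typically much smaller than the content size.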


First Contentful Paint FCP First Contentful Paint (FCP) measures the time from navigation to when the browser renders and shows the first bit of content from the DOM. FCP exists in Chrome, Edge, Firefox and Safari (Wikimedia's performance team actually paid for the implementation in Safari), so this is the best cross-browser metric we have today. We collect it from both real users and synthetic testing. It's a Google Web Vital metric. https://web.dev/articles/fcp
First Input Delay FID First Input Delay measures the time from when a user first interacts with your site (when they click a link, tap on a button etc.) to the time when the browser is actually able to respond to that interaction. It used to be a Google Core Web Vital but was replaced by Interaction To Next Paint. FID can still be a part of INP, so it is still important to keep it low. https://web.dev/articles/fid
First Paint This is when the first paint happens on the screen for the user (it can potentially include painting white on a white background, but we have not seen that for Wikipedia). It was the first metric that gave us information on when something is painted on the screen. It's collected from real users and in synthetic testing. It's not supported by Safari. For Wikipedia, first paint is often a paragraph on the page.
First view tests First view tests are synthetic tests that we run with an empty browser cache, simulating a user visiting a Wikipedia page for the first time.
First Visual Change First Visual Change is when something first is painted on the screen. It's measured using a video recording of the screen and is collected using synthetic testing.
Frontend performance The time it takes for the browser to parse and create the page.
Google (core) Web Vitals Google Web Vitals is Google's guidance on quality signals that are essential to delivering a great user experience on the web. Google says these metrics can affect ranking algorithms. The Core Web Vitals metrics are Largest Contentful Paint, Cumulative Layout Shift and Interaction To Next Paint. Two more metrics, Time To First Byte and First Contentful Paint, are (only) Web Vitals. https://web.dev/articles/vitals
Interaction To Next Paint INP Interaction To Next Paint observes the latency of all click, tap, and keyboard interactions with a page throughout its lifespan and reports the longest duration, ignoring outliers. A low INP means the page is consistently able to respond quickly to the vast majority of user interactions. We do not collect INP from our real users yet, but we plan to do that in T327246. In synthetic testing we can collect INP using user journey tests, and we can use TBT as a proxy. INP is one of the Core Web Vitals. https://web.dev/articles/inp
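As a rough illustration of "the longest duration, ignoring outliers", here is a minimal Python sketch. It assumes the outlier rule described on web.dev, where one of the highest interactions is ignored for every 50 interactions on the page. The `estimate_inp` function and the sample latencies are hypothetical; this is not how we or the browser collect the metric.

```python
from typing import Optional, Sequence


def estimate_inp(latencies_ms: Sequence[float]) -> Optional[float]:
    """Pick the INP candidate from a list of interaction latencies (ms).

    Assumption: one of the worst interactions is skipped per 50 interactions.
    """
    if not latencies_ms:
        return None  # no interactions, so no INP value
    ranked = sorted(latencies_ms, reverse=True)
    skip = min(len(ranked) - 1, len(ranked) // 50)
    return ranked[skip]


# A page with few interactions reports its single worst latency.
print(estimate_inp([40, 80, 300]))
# With 50 interactions, one extreme hiccup is ignored.
print(estimate_inp([50] * 49 + [900]))
```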
Largest Contentful Paint LCP The Largest Contentful Paint (LCP) metric reports the render time of the largest content element visible in the viewport. We collect LCP from real users and our synthetic tests. The largest paint is often a paragraph or an image for us, but it can be other elements too. It depends on the page and the user's viewport size. See this image for an example of LCP. LCP is a Google Core Web Vital. https://web.dev/articles/lcp
Mann Whitney U Test We use the Mann-Whitney U test to determine whether we have a performance regression in our synthetic tests. The Mann-Whitney U test doesn't assume that our data follows a specific distribution (like the normal distribution), which makes it well suited to analysing performance metrics.

The test tells us if a regression is statistically significant. That means that the results or changes observed in our data are unlikely to have occurred by random chance alone. In simpler terms, it's a measure that tells us whether the changes or differences we see in our metrics (like first visual change) are meaningful and not just due to random fluctuations.
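To make the idea concrete, here is a minimal, self-contained Python sketch of the test. It uses the normal approximation without a tie correction, which is a simplification and not necessarily what our tooling does. The function name and the sample timings are hypothetical, and both samples are assumed to be non-empty.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for sample a, two-sided p-value).

    The p-value uses the normal approximation with no tie correction.
    """
    n1, n2 = len(a), len(b)
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u_a, p_value


# Hypothetical first visual change timings (ms) before and after a change.
baseline = [500, 510, 495, 505, 498, 502, 507, 499, 503, 501]
candidate = [530, 540, 525, 535, 528, 532, 537, 529, 533, 531]
u, p = mann_whitney_u(baseline, candidate)
print(f"U = {u}, p = {p:.5f}, significant at 0.05: {p < 0.05}")
```

A small p-value (for example below 0.05) suggests that the difference between the two runs is unlikely to be random noise.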

Navigation Timing API The Navigation Timing API was the first API in the browser that reported some kind of performance metrics and the first one we adopted (that's why our performance extension is called the Navigation Timing Extension). The API is focused on when the browser has finished certain tasks and doesn't necessarily tell us anything about what the user is experiencing. https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/Navigation_timing
Network throttling When we run our synthetic tests we throttle the network connection. That means we make the connection slower (and keep it at the same speed all the time) so that our tests simulate the performance experienced by users with a slower type of connection. A slower connection also makes it easier to find regressions.
Performance device lab The performance device lab consists of mobile devices that we use for synthetic testing. We have tried two providers in the past and we plan to start with a new hosting provider sometime in 2024. Real devices make it easier to get the same performance as our users, and we try to match the users with slower devices. That way we can make sure that our changes do not create performance regressions for those users.
Performance monitoring We monitor the performance of Wikipedia in two ways: real user measurements (RUM), with metrics that we collect from real users 24/7, and synthetic tests that also run around the clock. The metrics are stored in time series databases and visualised in our Grafana instance.
Real user measurements RUM Collecting performance metrics from real users who use Wikipedia. With RUM we can collect the metrics that browser APIs expose. We use the Navigation Timing extension to collect metrics from some of our users. RUM dashboard Performance/Real user monitoring
Synthetic testing Synthetic testing (or lab testing) is collecting web performance metrics using web browsers in a controlled environment. Since you are in full control, you can record a video of the screen, collect metrics by analysing the video and collect timeline/trace logs from the browser to get in-depth information. We run synthetic tests using Chrome and Firefox to find performance regressions. Synthetic test dashboard Performance/Synthetic testing
Speed Index The Speed Index is the average time at which visible parts of the page are displayed. It is expressed in milliseconds and depends on the size of the viewport. We get Speed Index from our synthetic testing, but it has been too complicated to implement in a browser API, so in Chrome Largest Contentful Paint is the substitute.
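A common way to express this is as the area above the visual progress curve. The following Python sketch assumes we already have visual completeness samples taken from the video frames; the `speed_index` function and the sample values are hypothetical and only illustrate the calculation.

```python
from typing import Iterable, Tuple


def speed_index(progress: Iterable[Tuple[float, float]]) -> float:
    """Compute Speed Index (ms) from (time_ms, completeness) samples.

    completeness is a fraction between 0.0 and 1.0, and samples are sorted by
    time. Each interval contributes its length multiplied by the share of the
    viewport that was still incomplete during that interval.
    """
    total = 0.0
    last_time, last_complete = 0.0, 0.0
    for time_ms, complete in progress:
        total += (time_ms - last_time) * (1.0 - last_complete)
        last_time, last_complete = time_ms, complete
    return total


# Half of the viewport is painted at 500 ms and the rest at 1000 ms.
print(speed_index([(0, 0.0), (500, 0.5), (1000, 1.0)]))  # 750.0
```

In this example half of the viewport is displayed at 500 ms and the other half at 1000 ms, so the average display time is 750 ms.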
Time To First Byte TTFB The time it takes for the server to deliver the first byte.
Total Blocking Time TBT The blocking time of a given long task is its duration in excess of 50 ms. The total blocking time for a page is the sum of the blocking time of each long task that happens after First Contentful Paint. This is a useful metric in synthetic measurement for understanding potential blocking time on the main thread. TBT depends on the CPU speed, so it's important to measure using the same CPU speed as our users.
Trace/timeline log In synthetic testing we can collect the browser timeline trace (the data that you can see in Chrome's performance panel), and that's the most useful data you can get when debugging your frontend code. We can collect it from Chrome and from Firefox. Tracing adds some overhead to the browser and will increase other metrics, but it's useful because it gives us so much more to work with, and you can configure what kind of data you want.
User Journey tests User journey tests are synthetic tests where we simulate the user visiting multiple pages and interacting with the pages. Historically we haven't focused much on user journeys, although they better match what real users actually do. User Journey login dashboard
User Timing API A browser API that lets developers create their own timing metrics to measure performance. In RUM we collect some metrics using the API (mwStartup and mwCentralNoticeBanner). If you create a new measurement to collect from our real users, we need to adjust our backend code to receive the metric. In our synthetic testing the metric will be picked up automatically. https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/User_timing
Visual Metrics Metrics that tell us when content is painted on the screen. They are collected using synthetic tests where we record a video of the browser screen and analyse it, which gives us metrics like Speed Index, First Visual Change and Last Visual Change. These metrics have historically been more accurate than the paint metrics reported by the browser APIs.
WebPageReplay WebPageReplay is a replay proxy created by the Chrome team that we use in our performance regression tests. It replays Wikipedia pages locally on a server so we can measure changes in frontend performance. WebPageReplay repo
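The definition above can be sketched in a few lines of Python. The task list and the `total_blocking_time` function are hypothetical. As a simplification, only tasks that start at or after First Contentful Paint are counted, so a task that straddles FCP is ignored.

```python
from typing import Iterable, Tuple


def total_blocking_time(
    long_tasks: Iterable[Tuple[float, float]],
    fcp_ms: float,
    threshold_ms: float = 50.0,
) -> float:
    """Sum the blocking time (ms) of main-thread tasks that start after FCP.

    long_tasks holds (start_ms, duration_ms) pairs. The blocking time of a
    task is the part of its duration that exceeds threshold_ms.
    """
    return sum(
        max(0.0, duration - threshold_ms)
        for start, duration in long_tasks
        if start >= fcp_ms
    )


# Hypothetical main-thread tasks as (start_ms, duration_ms).
tasks = [(100, 120), (900, 70), (1500, 250), (2000, 30)]
# The first task runs before FCP and the last one is shorter than 50 ms,
# so only 20 ms + 200 ms count.
print(total_blocking_time(tasks, fcp_ms=800))  # 220.0
```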