Performance/Guides/WebPageReplay alert

From Wikitech

We use Grafana alerts for WebPageReplay tests to get notified of potential frontend performance regressions, based on the metrics we collect from the Browsertime and WebPageReplay infrastructure.

Meta

Alert setup

We use the Mann-Whitney U test to determine whether a regression is statistically significant, and then we alert if Cliff's delta is larger than 0.3.
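
As a rough illustration of the two statistics named above, the following sketch computes a two-sided Mann-Whitney p-value (normal approximation, no tie correction) and Cliff's delta using only the Python standard library. The function names, the significance level `alpha=0.05` and the sign convention (positive delta means the current runs are slower than the baseline) are assumptions made for this example; only the 0.3 threshold comes from the text. It is not the code that runs in the alerting pipeline.

```python
import math
from statistics import NormalDist


def cliffs_delta(baseline, current):
    """Cliff's delta: P(current > baseline) - P(current < baseline)."""
    greater = sum(1 for c in current for b in baseline if c > b)
    less = sum(1 for c in current for b in baseline if c < b)
    return (greater - less) / (len(current) * len(baseline))


def mann_whitney_p(baseline, current):
    """Two-sided p-value from the normal approximation of the U statistic."""
    n1, n2 = len(baseline), len(current)
    # U counts the pairs where current > baseline; a tie counts as one half.
    u = sum(
        1.0 if c > b else 0.5 if c == b else 0.0
        for c in current
        for b in baseline
    )
    mean_u = n1 * n2 / 2
    # Simplification: the variance is not corrected for ties.
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_regression(baseline, current, alpha=0.05, delta_threshold=0.3):
    """Significant difference AND a large enough effect in the slower direction.

    alpha is an assumed significance level; 0.3 is the threshold from the text.
    """
    return (
        mann_whitney_p(baseline, current) < alpha
        and cliffs_delta(baseline, current) > delta_threshold
    )
```

For example, `is_regression(baseline_ms, current_ms)` with two lists of 21 timings each returns `True` only when both conditions hold. A library routine such as `scipy.stats.mannwhitneyu` would normally be preferable to the hand-rolled approximation.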

We have alerts for FirstVisualChange and Largest Contentful Paint that will fire if three or more pages have a significant regression. We run tests using Firefox and Chrome for desktop and emulated mobile Chrome for mobile. The tests run against a baseline test. The baseline is collected every Sunday, and throughout the week we test against that baseline.
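
The "three or more pages" rule can be sketched as a simple count over per-page results. The function name, the input shape and the page names in the usage note are hypothetical; only the threshold of three pages is taken from the text.

```python
def should_fire(page_results, min_pages=3):
    """Return (fire, regressed_pages).

    page_results maps a page name to True when that page shows a
    significant regression for the metric being checked.
    """
    regressed = sorted(page for page, significant in page_results.items() if significant)
    return len(regressed) >= min_pages, regressed
```

Calling `should_fire({"PageA": True, "PageB": True, "PageC": False})` would not fire, because only two pages regressed.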

We do 21 runs for all URLs that we test to get statistically significant data.

The main test repo holds the URLs for all pages tested.

You can find all the alerts under Quality and test in Grafana.

WebPageReplay alert fired

Our WebPageReplay tests measure the front end performance of Wikipedia (using a WebPageReplay proxy). If an alert fires, it can be caused by:

  • A front end performance regression of Wikipedia
  • A regression in the browser that is used for the test
  • Instability on the server that runs the tests

Front end performance regression

  1. Go to the WebPageReplay alert Grafana dashboard to see/verify the alert.
  2. Go to the individual page dashboard and zoom in on the regression. Try to find the time of the regression (to within roughly ±2 hours). Check all tested URLs and see if they all have the regression.
  3. Verify the regression on the Browsertime/sitespeed.io tests that run directly against Wikipedia, and check if you can see anything in the RUM data (which normally lags, since we switch browser versions fast and for users it takes time).
    1. If you can't find anything in the other tools, check if it's a browser regression or a test server regression.
  4. Check the Server Admin Log to see if there's been a change that correlates with the regression.
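
Once the regression time is narrowed down, matching it against logged changes amounts to a time-window filter. The sketch below assumes the log entries have already been collected by hand as `(timestamp, message)` pairs; the function name and the data shape are hypothetical, and it does not read the Server Admin Log itself.

```python
from datetime import datetime, timedelta


def entries_near(regression_time, log_entries, window_hours=2):
    """Keep the log entries within +/- window_hours of the regression time."""
    window = timedelta(hours=window_hours)
    return [
        (timestamp, message)
        for timestamp, message in log_entries
        if abs(timestamp - regression_time) <= window
    ]
```

The two-hour default mirrors the rough window suggested in step 2; widen it if the dashboards do not let you pin the time down that closely.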

If you can verify that it is a regression, create a Phabricator task and include everything you know. Please take screenshots of the dashboards and include links. If you can identify the code change that caused the regression, please include the team/person in the task.

Browser performance regression

  1. Go to the dashboard for WebPageReplay tests.
  2. Make sure the domain, page and browser match the alert that fired (that is, you are looking at the right data).
  3. Zoom in using the time dropdown: use the last 24 hours or two days, and make sure the regression happened within that time window.
  4. Click on Show each tests and wait a couple of seconds until you see the green vertical lines appearing on the graphs.
  5. Hover the mouse over the green lines before and after the regression. Hovering shows a screenshot of the test and which versions of sitespeed.io and the browser were used when the test was executed. It will look something like this: 20.3.0 - 95.0.4638.54. The first part is the sitespeed.io version and the second part is the browser version.
  6. Verify that it is the exact same browser version before the regression and after the regression.
  7. If the browser version differs, verify the regression on all tested URLs and check if you can see the same thing on the tests running without WebPageReplay.
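
The version comparison in steps 5 to 7 can be sketched as a small parser for the hover label. It assumes the label always has the `<sitespeed.io version> - <browser version>` shape shown in step 5; the function names are hypothetical, and the second label in the usage note is made up for illustration.

```python
def parse_label(label):
    """Split a label such as '20.3.0 - 95.0.4638.54' into its two versions."""
    sitespeed, browser = (part.strip() for part in label.split(" - ", 1))
    return {"sitespeed": sitespeed, "browser": browser}


def browser_changed(before_label, after_label):
    """True when the browser version differs between the two test runs."""
    return parse_label(before_label)["browser"] != parse_label(after_label)["browser"]
```

For instance, `browser_changed("20.3.0 - 95.0.4638.54", "20.3.0 - 96.0.0.0")` returns `True`, which would point towards checking for a browser regression.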

If we can see that the browser caused the regression, we can roll back the version running WebPageReplay (look at the changelog to see which sitespeed.io version includes which browser version) to fully verify the regression. If the regression is verified, you should create an upstream bug for the browser.

Test server performance regression

If the regression is on emulated mobile, make sure the dashboard type is emulatedMobile and Test type is webpagereplay in the dashboard. The default links are for desktop.

  1. Check the standard deviation of the CPU benchmark; it should be around 1 ms.
  2. Look at the min/median/max values of the CPU benchmark.
  3. If the standard deviation is high, contact the performance team, which needs to deploy the tests on another server.
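
If you have the raw CPU benchmark values at hand, the same checks can be reproduced with the standard library. The function name, the input (a list of benchmark timings in milliseconds) and the `max_stdev_ms=2.0` cut-off are assumptions for this sketch; the text only says that the standard deviation should be around 1 ms.

```python
from statistics import median, stdev


def summarize_benchmark(samples_ms, max_stdev_ms=2.0):
    """Summarize CPU benchmark timings and flag an unstable test server.

    max_stdev_ms is an assumed cut-off, not a documented limit.
    """
    spread = stdev(samples_ms)  # sample standard deviation
    return {
        "min": min(samples_ms),
        "median": median(samples_ms),
        "max": max(samples_ms),
        "stdev": spread,
        "stable": spread <= max_stdev_ms,
    }
```

A result with `"stable": False`, or a wide gap between the min and max values, suggests the server rather than Wikipedia or the browser is the source of the noise.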