This is the runbook for WebPageTest alerts.
WebPageTest alert fired
Our WebPageTest tests measure the front-end performance of Wikipedia by running against our Wikipedia servers. If an alert fires, it can be caused by:
- A performance regression of Wikipedia
- A regression in the browser that is used for the test, or an automatic update of WebPageTest
- Instability on the server that runs the test
To investigate the alert:
- Go to the WebPageTest alert Grafana dashboard to see and verify the alert.
- Go to the individual page dashboard and zoom in on the regression. Try to narrow down the time of the regression (to within roughly ±2 hours). Check all tested URLs and see whether they all show the regression.
- Is the TTFB the same before and after the regression? Then the regression is in the front end. If the TTFB differs, check whether the data is served from the same data center.
- Verify the regression on WebPageReplay and check whether you can see anything in the RUM data (RUM data normally lags, since the tests switch browser versions quickly while it takes time for users to update).
- Check the Server Admin Log to see whether there has been a change that correlates with the regression.
If you can verify that it is a regression, create a Phabricator task and include everything you know. Please take screenshots of the dashboards and include links. If you can identify the code change that caused the regression, please include the responsible team or person in the task.
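The TTFB check in the steps above can be sketched as a small helper. This is only an illustration: the function name, the sample values, and the 50 ms tolerance are assumptions, not values taken from the dashboards, and the samples would have to be read off Grafana by hand.

```python
from statistics import median


def classify_regression(ttfb_before_ms, ttfb_after_ms, tolerance_ms=50.0):
    """Suggest where to look next, based on TTFB before and after a regression.

    tolerance_ms is a hypothetical threshold for "the TTFB is the same";
    pick a value that matches the normal noise on the dashboard.
    """
    delta = median(ttfb_after_ms) - median(ttfb_before_ms)
    if abs(delta) <= tolerance_ms:
        # TTFB is unchanged, so the regression happened after the first byte.
        return "front end"
    # TTFB moved: check whether the data is served from the same data center.
    return "check data center"


# Made-up TTFB samples in milliseconds.
print(classify_regression([200, 210, 205], [204, 212, 208]))
print(classify_regression([200, 210, 205], [400, 390, 410]))
```

Using the median rather than the mean keeps a single slow run from changing the verdict.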
Browser performance regression
- Go to the dashboard for WebPageTest tests.
- Make sure the wiki, page and location match the alert that fired (that is, you are looking at the right data).
- Zoom in using the time dropdown: use the last 24 hours or two days, and make sure the regression happened within that time window.
- Click on Show each tests and wait a couple of seconds until green vertical lines appear on the graphs.
- Hover the mouse over the green lines before and after the regression. Hovering shows a screenshot of the test and which versions of sitespeed.io and the browser were used when the test was executed. It will look something like this: 20.3.0 - 95.0.4638.54. The first part is the sitespeed.io version and the second part is the browser version.
- Verify that it is the exact same browser version before the regression and after the regression.
- If the browser version differs, verify the regression on all tested URLs.
Browsers update automatically on WebPageTest when a new version is available, and there is no way for us to revert them. However, we can use the sitespeed.io and WebPageReplay tests to revert to an old version.
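The version comparison from the steps above can be written down as a short sketch. It assumes the hover label always has the form shown in the example, with the two versions separated by " - ". The function names and the second browser version below are made up for illustration.

```python
def parse_version_label(label):
    """Split a hover label such as '20.3.0 - 95.0.4638.54' into its two parts.

    Returns (sitespeed_version, browser_version). Raises ValueError if the
    label does not contain the ' - ' separator.
    """
    sitespeed_version, browser_version = (
        part.strip() for part in label.split(" - ", 1)
    )
    return sitespeed_version, browser_version


def browser_changed(before_label, after_label):
    """Return True if the browser version differs between the two labels."""
    return parse_version_label(before_label)[1] != parse_version_label(after_label)[1]


before = "20.3.0 - 95.0.4638.54"  # label from the example above
after = "20.3.0 - 95.0.4638.69"   # hypothetical label after the regression
print(parse_version_label(before))
print(browser_changed(before, after))
```

If `browser_changed` returns True, the browser update is a likely cause, and the regression should be verified on all tested URLs.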
Test server performance regression
If the regression is on emulated mobile, make sure the type selected in the dashboard is webpagetestEmulatedMobile. The default links are for desktop.
- Check the median of the CPU benchmark. It should differ by 1-2 ms between runs at the most.
- If the median variation is high, contact the performance team, who will need to deploy the tests on another server.
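The CPU benchmark check can be expressed as a simple spread test. This is a minimal sketch: the function name and the sample values are assumptions, and the 2 ms default comes from the "1-2 ms" guidance above.

```python
def cpu_benchmark_is_stable(medians_ms, max_spread_ms=2.0):
    """Return True if the CPU benchmark medians stay within max_spread_ms.

    medians_ms holds the benchmark median from each run, in milliseconds,
    as read from the dashboard.
    """
    spread = max(medians_ms) - min(medians_ms)
    return spread <= max_spread_ms


# Made-up benchmark medians in milliseconds.
print(cpu_benchmark_is_stable([68.0, 69.0, 68.5]))
print(cpu_benchmark_is_stable([68.0, 75.0, 69.0]))
```

A False result suggests the test server itself is unstable, which is the case where the performance team should be contacted.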