Performance/Guides/RUM Alert

From Wikitech

This is the runbook for RUM alerts (real user measurements performance alerts).

RUM performance regression alert

We alert on three performance metrics: first paint, response start, and load event end. When you investigate a regression, make sure the dashboards are showing the same metric that triggered the alert.

Use these steps to get more information about the regression:

  1. Identify when the regression started, so you can use that timestamp when looking at other dashboards. You should be able to see when it happened in the navigation timing alert dashboard.
  2. Is the regression on desktop, on mobile, or on both? Check the navigation metrics by platform dashboard.
  3. Check different percentiles and different metrics in the navigation timing dashboard to understand what has changed.
  4. Is the regression caused by one browser or by a specific browser version? Check the navigation timing by browser dashboard.
  5. Are we receiving more or fewer metrics than before? Check the report rate by metric dashboard.
  6. Can we see the regression using our synthetic tools? Look at the synthetic tests dashboard. If the issue also shows up on WebPageReplay, we know it is a front-end regression.
  7. Is the regression caused by a code change? Use the timestamp or time span in which you think the regression started, and check the Server Admin Log and the "Show sync-wikiversions" toggle.
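
Steps 1 and 3 above can also be approximated offline if you export the metric values from a dashboard. The following is a minimal sketch, not part of the alerting setup: the function names, the 10% threshold, the baseline size, and the sample data format (a list of `(timestamp, value)` pairs in milliseconds) are all assumptions made for illustration.

```python
from statistics import median, quantiles


def find_regression_start(samples, baseline_size=6, threshold=0.10, sustain=3):
    """Return the timestamp where a sustained regression begins, or None.

    samples: list of (timestamp, value) pairs, ordered by time.
    The first `baseline_size` values define the baseline median
    (hypothetical choice). A regression starts at the first sample of a
    run of `sustain` consecutive values that are more than `threshold`
    (as a fraction) above that baseline.
    """
    if len(samples) < baseline_size + sustain:
        return None
    baseline = median(value for _, value in samples[:baseline_size])
    limit = baseline * (1 + threshold)

    run_start = None
    run_length = 0
    for timestamp, value in samples[baseline_size:]:
        if value > limit:
            if run_length == 0:
                run_start = timestamp
            run_length += 1
            if run_length >= sustain:
                return run_start
        else:
            # A single value back under the limit resets the run.
            run_length = 0
    return None


def percentile_change(before, after, percentiles=(50, 75, 95)):
    """Return the relative change per percentile between two value lists.

    Each list needs at least two values. A result of 0.2 for key 95
    means the 95th percentile is 20% higher in `after` than in `before`.
    """
    before_cuts = quantiles(before, n=100, method="inclusive")
    after_cuts = quantiles(after, n=100, method="inclusive")
    return {
        p: (after_cuts[p - 1] - before_cuts[p - 1]) / before_cuts[p - 1]
        for p in percentiles
    }
```

Requiring several consecutive values above the limit avoids flagging a single noisy data point as the start of the regression. Comparing several percentiles, rather than only the median, helps show whether the change affects all users or mostly the slowest ones.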

Create a Phabricator task and include everything you know. Please take screenshots of the dashboards and include links to them. If you can identify the code change that caused the regression, please add the responsible team or person to the task.
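
To keep task descriptions consistent, the findings can be collected into a plain-text summary before filing. This is only a sketch: the function name, the fields, and the layout are hypothetical, it does not call any Phabricator API, and the values in the usage example are made up.

```python
def build_task_description(metric, started_at, platforms, dashboards,
                           suspected_change=None):
    """Build a plain-text task description from the investigation findings.

    dashboards: list of (name, url) pairs for the dashboards you checked.
    suspected_change: free-text description of the code change, if known.
    """
    lines = [
        f"RUM performance regression in {metric}",
        "",
        f"Regression started: {started_at}",
        f"Platforms affected: {', '.join(platforms)}",
        "",
        "Dashboards (attach screenshots):",
    ]
    for name, url in dashboards:
        lines.append(f"* {name}: {url}")
    lines.append("")
    lines.append(f"Suspected cause: {suspected_change or 'not yet identified'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example values; replace them with your own findings.
    print(build_task_description(
        metric="first paint",
        started_at="2024-01-01 12:00 UTC",
        platforms=["desktop", "mobile"],
        dashboards=[("Navigation timing alerts", "https://example.org/d/alerts")],
    ))
```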