
WebPageReplay/Runbook

From Wikitech

This is the runbook for deploying new versions of WebPageReplay/Browsertime/Firefox/Chrome.

Meta

Update to new version

Firefox and Chrome are bundled in the sitespeed.io Docker container. When there's a new version, check the changelog and update like this:

  1. Clone the Gerrit repo: ssh://USER@gerrit.wikimedia.org:29418/performance/synthetic-monitoring-tests.git
  2. Go into the newly cloned repo: cd synthetic-monitoring-tests
  3. Create a new branch named after the new version number: git checkout -b my_new_version
  4. Edit the run.sh file and find the line near the top that looks something like this: DOCKER_CONTAINER=sitespeedio/sitespeed.io:10.3.2
  5. Change the version number (10.3.2 in this case) to your new version.
  6. Commit the change and submit it for review.
  7. When the change is approved, the new version is picked up automatically on the next test iteration.
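
The edit in steps 4 and 5 can also be scripted. The following is a minimal sketch, not the team's actual tooling: the function name update_container_version and the target version 10.3.3 are hypothetical, and it assumes run.sh contains a line that starts with DOCKER_CONTAINER=sitespeedio/sitespeed.io: as shown above. The surrounding Git commands appear only as comments, and the review step assumes git-review is installed.

```shell
# Hypothetical helper: rewrite the tag on the DOCKER_CONTAINER line of a file.
# Usage: update_container_version FILE NEW_VERSION
update_container_version() {
  file="$1"
  new_version="$2"
  tmp="$(mktemp)"
  # Replace everything after "sitespeed.io:" on the DOCKER_CONTAINER line.
  # Writing via a temp file avoids relying on sed -i, which differs between
  # GNU and BSD sed.
  sed -E "s|^(DOCKER_CONTAINER=sitespeedio/sitespeed\.io:).*|\1${new_version}|" \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Workflow from the steps above (not executed here; 10.3.3 is a made-up version):
#   git clone ssh://USER@gerrit.wikimedia.org:29418/performance/synthetic-monitoring-tests.git
#   cd synthetic-monitoring-tests
#   git checkout -b my_new_version
#   update_container_version run.sh 10.3.3
#   git commit -a -m "Update sitespeed.io container version"
#   git review    # assumes git-review is installed
```

Check the result with git diff before committing; only the DOCKER_CONTAINER line should have changed.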

Deploy first time

On a new server you need to install the dependencies (Docker) and follow the instructions.

Debug missing metrics

If metrics stop arriving in Grafana, there are two likely causes: either something is wrong with Graphite, or something is broken on the WebPageReplay server. First check the #synthetic-tests-error-reporting Slack channel for related errors.

If you can't see any errors, focus on the WebPageReplay server. WebPageReplay#Servers lists which server runs which test. Log into the server and check whether any tests are running. Do that by running docker ps

If everything is ok it should look something like:

CONTAINER ID   IMAGE                         COMMAND                  CREATED         STATUS         PORTS     NAMES
1ae4f0e6946c   sitespeedio/sitespeed.io:35   "/start.sh --config …"   6 minutes ago   Up 6 minutes             sitespeedio

We start (and stop) the container for every new test, so the container should have been created at most a couple of minutes ago. If the CREATED column shows a couple of hours (or days), something is wrong: the container is stuck, probably because something happened with the browser. You can fix the test by killing the container: docker kill sitespeedio


The root cause of the stuck container is still there, but restarting the test usually works. After you have killed the container, wait a minute and run docker ps again to check that a new container is running.
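
The check described above (is the container older than expected?) can be expressed as a small helper. This is a sketch under stated assumptions rather than an existing script: the function name container_is_stale is hypothetical, and the 1800-second threshold in the commented example is an arbitrary choice between "a couple of minutes" and "a couple of hours". The commented wiring assumes GNU date and a container named sitespeedio.

```shell
# Hypothetical helper: succeeds (exit status 0) if a container is older than
# max_age seconds. created and now are Unix epoch seconds.
# Usage: container_is_stale CREATED NOW MAX_AGE
container_is_stale() {
  created="$1"
  now="$2"
  max_age="$3"
  [ $((now - created)) -gt "$max_age" ]
}

# Example wiring (not executed here; assumes GNU date can parse the timestamp
# that docker inspect prints, and that 1800 seconds is an acceptable limit):
#   created="$(date -d "$(docker inspect --format '{{.Created}}' sitespeedio)" +%s)"
#   if container_is_stale "$created" "$(date +%s)" 1800; then
#     docker kill sitespeedio
#     sleep 60
#     docker ps
#   fi
```

Keeping the age comparison separate from the Docker calls makes it possible to try the logic without a running container.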

Check Grafana over the coming hour to make sure it doesn't get stuck again.