WebPageReplay/Runbook
This is the runbook for deploying new versions of WebPageReplay/Browsertime/Firefox/Chrome.
Meta
- Issue tracker (Phabricator): synthetic-performance-testing
- Documentation: WebPageReplay
Update to new version
Firefox and Chrome are bundled in the sitespeed.io Docker container. When there's a new version, check the changelog and update like this:
- Clone the Gerrit repo: ssh://USER@gerrit.wikimedia.org:29418/performance/synthetic-monitoring-tests.git
- Go into the newly cloned repo:
cd synthetic-monitoring-tests
- Create a new branch named with the new version number:
git checkout -b my_new_version
- Edit the run.sh file and change one of the first lines that looks something like this:
DOCKER_CONTAINER=sitespeedio/sitespeed.io:10.3.2
- Change the version number (10.3.2 in this case) to your new version.
- Commit the file and submit the commit for review.
- When the change is approved, the new version is picked up automatically on the next iteration of the tests.
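The edit to run.sh can also be scripted. The following is a minimal sketch, assuming the DOCKER_CONTAINER line has exactly the form shown above. The function name bump_sitespeedio_version is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): rewrite the sitespeed.io image
# tag on the DOCKER_CONTAINER line of a run.sh-style file.
# Usage: bump_sitespeedio_version FILE NEW_VERSION
# Assumption: NEW_VERSION is a plain tag such as 10.3.3 (no "|" or "&").
bump_sitespeedio_version() {
    file=$1
    new_version=$2
    # Replace everything after "sitespeedio/sitespeed.io:" on that line only.
    sed "s|^\(DOCKER_CONTAINER=sitespeedio/sitespeed\.io:\).*|\1${new_version}|" \
        "$file" > "$file.tmp" || return 1
    # Write back with cat so the original file keeps its permissions.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}
```

For example, running bump_sitespeedio_version run.sh 10.3.3 (the version here is only an illustration) and then git diff shows the one-line change before you commit it.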
Deploy first time
On a new server you need to install the dependencies (Docker) and follow the instructions.
Debug missing metrics
If metrics stop arriving in Grafana, there are two likely causes: either something is wrong with Graphite, or something is broken on the WebPageReplay server. First check the #synthetic-tests-error-reporting Slack channel for related errors.
If you can't see any errors, focus on the WebPageReplay server. WebPageReplay#Servers lists which server runs which test. Log in to the server and check whether any tests are running by running docker ps
If everything is OK, the output should look something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1ae4f0e6946c sitespeedio/sitespeed.io:35 "/start.sh --config …" 6 minutes ago Up 6 minutes sitespeedio
We start (and stop) the container for every new test, so the container should have been created at most a couple of minutes ago. If the CREATED column shows a couple of hours (or days) ago, something is wrong: the container is stuck, probably because something happened to the browser. You can fix the test by killing the container: docker kill sitespeedio
The root cause of the container getting stuck is still there, but restarting the test usually works. After you have killed the container, wait a minute and check that a new container is running by using docker ps
Check Grafana over the coming hour to make sure the test doesn't get stuck again.
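The manual check of the CREATED column can be sketched as a small script. This is an illustration under assumptions, not an existing tool: the function names is_stuck and check_container are hypothetical, and treating anything reported in hours or longer as stuck is an assumed threshold based on the description above. It relies on the RunningFor placeholder of docker ps --format, which prints the same text as the CREATED column.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the CREATED text from docker ps
# reports hours or longer, which this runbook treats as a stuck container.
is_stuck() {
    case "$1" in
        *hour*|*day*|*week*|*month*|*year*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report the state of the sitespeedio container.
# Note: the name filter matches substrings, so other containers with
# "sitespeedio" in their name would be listed as well.
check_container() {
    age=$(docker ps --filter "name=sitespeedio" --format '{{.RunningFor}}')
    if [ -z "$age" ]; then
        echo "no sitespeedio container running"
    elif is_stuck "$age"; then
        echo "stuck (created $age); consider: docker kill sitespeedio"
    else
        echo "ok (created $age)"
    fi
}
```

Calling check_container on the WebPageReplay server prints a single status line. The script deliberately only suggests the kill command instead of running it, so the decision stays with the operator.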