Performance/WebPageReplay

Background

On the path to more stable metrics in our synthetic testing, we have tried out mahimahi, mitmproxy and WebPageReplay for recording and replaying Wikipedia. For mahimahi we used a version patched by Gilles on top of Benedikt Wolters' HTTP/2 version, https://github.com/worenga/mahimahi-h2o. With mitmproxy and WebPageReplay we use the default versions. The work has been tracked in T176361.

We have put mahimahi on ice because it is currently too much of a hack to get HTTP/2 working, whereas WebPageReplay supports HTTP/2 out of the box. mitmproxy worked fine but offered no clear benefit over WebPageReplay.

Replaying vs not replaying

Let us compare what the metrics look like for WebPageTest vs WebPageReplay (Chrome).

Comparison graphs:

  * First Visual Change, emulated mobile, on the Obama page
  * Speed Index, emulated mobile, on the Obama page
  * First Visual Change, desktop, WPT vs WebPageReplay
  * Speed Index, desktop, WPT vs WebPageReplay

WebPageReplay setup

The current setup that collects the data for https://grafana.wikimedia.org/dashboard/db/webpagereplay is a Docker container, defined in https://github.com/soulgalore/browsertime-replays/tree/master/webpagereplay. The setup looks like this:

[Diagram: WebPageReplay setup]
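
As an illustration, a single replay run against that container looks roughly like this. All flags are taken from the job script further down; this is a hedged sketch, not the exact production invocation:

# Sketch of one replay run (flags copied from the job script below).
# REPLAY and LATENCY are environment variables the container understands.
docker run --cap-add=NET_ADMIN --shm-size=2g --rm \
  -v "$(pwd)":/browsertime \
  -e REPLAY=true -e LATENCY=100 \
  sitespeedio/browsertime:3.0.2 \
  -b chrome -n 11 --cacheClearRaw \
  --videoParams.framerate 30 \
  https://en.wikipedia.org/wiki/Barack_Obama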

Running on AWS (instance type c4.large) we get stable metrics. We have tried running the same code on WMCS, on bare metal and on Google Cloud, and in all those cases metric stability over time was at least 2 to 4 times worse than on AWS. This difference remains unexplained and probably lies somewhere in AWS's secret sauce (custom hypervisor, custom kernel).

On desktop we can use 30 frames per second for the video and get a metric stability span of 33 ms for First Visual Change, which is one frame of accuracy, since at 30 fps one frame represents 1000/30 ≈ 33.33 ms. Speed Index's stability span is a little wider but still acceptable (less than 50 points, though it depends on the content).

For emulated mobile, we can use 60 frames per second and get the same First Visual Change and Speed Index stability spans as desktop at 30 fps. We run both desktop and mobile with 100 ms of simulated latency during the replays.

Server setup

Here are the details of our current setup. We currently run the tests on a c4.large instance on AWS, using Ubuntu 16.04.

First-time install

To make it work, we need to install the following (a consolidated sketch of these steps follows the list):

  1. Install Docker and give the user that will run the tests the privileges to start Docker.
  2. Install Node.js and npm (latest LTS).
  3. Install bttostatsv: npm install bttostatsv -g
  4. Install directory-to-s3: npm install directory-to-s3 -g
  5. Export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with the S3 credentials for the user that will run the tests.
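
A consolidated sketch of the steps above on Ubuntu. The NodeSource setup script and the exact package names are assumptions, so adjust to your environment:

# 1. Docker, plus docker group membership so the test user can start containers.
sudo apt-get update
sudo apt-get install -y docker.io
sudo usermod -aG docker ubuntu   # log out and in again for this to take effect

# 2. Node.js + npm, latest LTS (assumed install method: NodeSource).
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs

# 3 + 4. The two helper tools used by the job script.
npm install bttostatsv -g
npm install directory-to-s3 -g

# 5. S3 credentials for the user that runs the tests (values elided on purpose),
# e.g. in ~/.bashrc.
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
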
Access

Access the server:

ssh -i "webpagereplay.pem" ubuntu@50.19.169.203
Log

You can find the log file at /tmp/webpagereplay.log.
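
To follow the log live:

tail -f /tmp/webpagereplay.log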

Job setup

We run this job as an infinite loop; when we want to update the script, we remove the control file and the script exits after finishing the current run.

#!/bin/bash
CONTROL_FILE=/home/ubuntu/browsertime.run
LOG_FILE=/tmp/webpagereplay.log
# Redirect all script output to the log file.
exec > $LOG_FILE 2>&1

# The control file guards against overlapping runs: the loop below
# keeps going for as long as this file exists.
if [ -f "$CONTROL_FILE" ]
then
  echo "$CONTROL_FILE exists, do you have tests running already?"
  exit 1;
else
  touch $CONTROL_FILE
fi

# Number of runs per URL and video frame rate, per browser/test type.
CHROME_RUNS=11
FIREFOX_RUNS=11
MOBILE_RUNS=7
CHROME_FRAMERATE=30
MOBILE_FRAMERATE=60
FIREFOX_FRAMERATE=30
WIKI=enwiki
CONTAINER=sitespeedio/browsertime:3.0.2
DOCKER_SETUP="--cap-add=NET_ADMIN --shm-size=2g -v /etc/localtime:/etc/localtime:ro --name browsertime"

# Remove old containers, images and volumes, then pull a fresh container image.
function cleanup() {
  docker system prune --all --volumes -f
  docker pull $CONTAINER
}

# Keep going while the control file exists; otherwise clean up and exit.
function control() {
  if [ -f "$CONTROL_FILE" ]
  then
    echo "$CONTROL_FILE found. Make another run ..."
  else
    echo "$CONTROL_FILE not found - stopping after cleaning up ..."
    cleanup
    echo "Exit"
    exit 0;
  fi
}

# Send the metrics to statsv and upload the result files (videos/HAR) to S3.
function sendMetrics() {
    bttostatsv result/browsertime.json $GRAPHITE_PREFIX.$GRAPHITE_KEY https://www.wikimedia.org/beacon/statsv >> /tmp/s.log 2>&1
    sleep 3
    sudo mkdir -p data/$WIKI/$TYPE/$BROWSER/$LATENCY/$GRAPHITE_KEY/$DATE
    sudo mv result/* data/$WIKI/$TYPE/$BROWSER/$LATENCY/$GRAPHITE_KEY/$DATE
    directory-to-s3 -d data webpagereplay-wikimedia
    sudo rm -fR result
    sudo rm -fR data
    control
}
# Run the desktop Chrome tests and send the metrics if Browsertime succeeds.
function runChrome() {
    GRAPHITE_KEY=$(basename $URL)
    DATE=$(date '+%Y-%m-%d-%H-%M')
    docker run $DOCKER_SETUP --rm -v "$(pwd)":/browsertime -e REPLAY=true -e LATENCY=$LATENCY $CONTAINER -b $BROWSER -n $RUNS --resultDir result --cacheClearRaw --videoParams.framerate $FRAMERATE --connectivity.alias $LATENCY --chrome.timeline true --gzipHar --videoParams.nice 8 --videoParams.createFilmstrip false --resultURL https://s3.amazonaws.com/webpagereplay-wikimedia/$WIKI/$TYPE/$BROWSER/$LATENCY/$GRAPHITE_KEY/$DATE/ --screenshot true $URL
    if [ $? -eq 0 ]
    then
        sendMetrics
    else
        echo 'Browsertime returned an error, not sending metrics'
        sudo rm -fR result
    fi
}

# Run the desktop Firefox tests; note --skipHar instead of --gzipHar.
function runFirefox() {
    GRAPHITE_KEY=$(basename $URL)
    DATE=$(date '+%Y-%m-%d-%H-%M')
    docker run $DOCKER_SETUP --rm -v "$(pwd)":/browsertime -e REPLAY=true -e LATENCY=$LATENCY $CONTAINER --resultDir result -n $RUNS -b $BROWSER --cacheClearRaw --videoParams.framerate $FRAMERATE --connectivity.alias $LATENCY --skipHar --videoParams.nice 8 --videoParams.createFilmstrip false --resultURL https://s3.amazonaws.com/webpagereplay-wikimedia/$WIKI/$TYPE/$BROWSER/$LATENCY/$GRAPHITE_KEY/$DATE/ --screenshot true $URL
    if [ $? -eq 0 ]
    then
        sendMetrics
    else
        echo 'Browsertime returned an error, not sending metrics'
        sudo rm -fR result
    fi
}

# Run the emulated mobile tests in Chrome, emulating an iPhone 6.
function runMobile() {
    GRAPHITE_KEY=$(basename $URL)
    DATE=$(date '+%Y-%m-%d-%H-%M')
    docker run $DOCKER_SETUP --rm -v "$(pwd)":/browsertime -e REPLAY=true -e LATENCY=$LATENCY $CONTAINER --resultDir result -b $BROWSER -n $RUNS --cacheClearRaw --videoParams.framerate $FRAMERATE --chrome.mobileEmulation.deviceName 'iPhone 6' --videoParams.nice 8 --connectivity.alias $LATENCY --gzipHar --chrome.timeline true --videoParams.createFilmstrip false --resultURL https://s3.amazonaws.com/webpagereplay-wikimedia/$WIKI/$TYPE/$BROWSER/$LATENCY/$GRAPHITE_KEY/$DATE/ --screenshot true $URL
    if [ $? -eq 0 ]
    then
        sendMetrics
    else
        echo 'Browsertime returned an error, not sending metrics'
        sudo rm -fR result
    fi
}

while true
do
  declare -a DESKTOP_URLS=(https://en.wikipedia.org/wiki/Barack_Obama https://en.wikipedia.org/wiki/Facebook https://en.wikipedia.org/wiki/Sweden https://en.wikipedia.org/wiki/Aretha_Franklin https://en.wikipedia.org/wiki/Metalloid)

  declare -a MOBILE_URLS=(https://en.m.wikipedia.org/wiki/Barack_Obama https://en.m.wikipedia.org/wiki/Facebook https://en.m.wikipedia.org/wiki/Sweden https://en.m.wikipedia.org/wiki/Aretha_Franklin https://en.m.wikipedia.org/wiki/Metalloid)

  LATENCY=100
  echo "Run Chrome tests (100 ms latency)"
  FRAMERATE=$CHROME_FRAMERATE
  RUNS=$CHROME_RUNS
  BROWSER=chrome
  TYPE=desktop
  GRAPHITE_PREFIX=browsertime.enwiki.$TYPE.$BROWSER.anonymous.replay.$LATENCY
  for URL in "${DESKTOP_URLS[@]}"
  do
    runChrome
  done

  echo "Run Firefox tests 100"
  FRAMERATE=$FIREFOX_FRAMERATE
  RUNS=$FIREFOX_RUNS
  BROWSER=firefox
  TYPE=desktop
  GRAPHITE_PREFIX=browsertime.enwiki.$TYPE.$BROWSER.anonymous.replay.$LATENCY
  
  for URL in "${DESKTOP_URLS[@]}"
  do
    runFirefox
  done

  echo "Run emulates mobile tests"
  FRAMERATE=$MOBILE_FRAMERATE
  RUNS=$MOBILE_RUNS
  BROWSER=chrome
  TYPE=mobile
  GRAPHITE_PREFIX=browsertime.enwiki.$TYPE.$BROWSER.anonymous.replay.$LATENCY
  
  for URL in "${MOBILE_URLS[@]}"
  do
    runMobile
  done

  sleep 30
  control
  cleanup
done
Start and restart

Start the script: nohup ./run.sh &

Restart: first remove /home/ubuntu/browsertime.run, then tail the log and wait for the script to exit. Then start it as usual, as sketched below.
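
The same procedure as a sketch:

# Ask the running script to stop after the current run ...
rm /home/ubuntu/browsertime.run
# ... follow the log and wait until the script prints "Exit" ...
tail -f /tmp/webpagereplay.log
# ... then start it again.
nohup ./run.sh &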

Also make sure the script starts on server reboot: crontab -e

And add: @reboot rm /home/ubuntu/browsertime.run;/home/ubuntu/run.sh

That will remove the run file and restart everything after a reboot.

Store the data

The metrics, videos and HAR files are sent to S3, where they are kept for one week.

http://webpagereplay-wikimedia.s3-website-us-east-1.amazonaws.com/
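
The one-week retention can be implemented with an S3 lifecycle rule. A hedged sketch using the AWS CLI, assuming the bucket from the URLs above; the rule ID is made up:

# Expire objects after 7 days via an S3 lifecycle rule (rule ID is hypothetical).
aws s3api put-bucket-lifecycle-configuration \
  --bucket webpagereplay-wikimedia \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-after-one-week",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 7}
    }]
  }'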

Alerts

We also run alerts on the metrics we collect from WebPageReplay. Check out Performance/WebPageReplay/Alerts.