Performance/Mobile Device Lab

From Wikitech

The Mobile Device Lab is provided by the Performance Team for executing tests on real mobile devices, in a variety of network environments. We use this to measure how changes impact mobile users.


To get more realistic performance metrics we run tests on real mobile devices, which makes it easier to find performance regressions. We use BitBar as the provider of our devices. All tasks are reported under the Performance Device Lab tag in Phabricator.

Performance testing on mobile phones

When running tests on mobile phones we want a stable environment that does not change, so that we can detect performance regressions. To make that happen we use:

  • A stable network: We throttle the connection to look like a 3G or 4G connection. By limiting the upload/download speed and adding delay, the phone receives requests in a more realistic scenario. By making sure the network is the same all the time, the network will not affect our metrics.
  • Low phone temperature: We measure the battery temperature as a proxy for CPU temperature. Android phones change behavior when the CPU gets warm and we want to avoid that. Some of our phones are rooted to make sure the phone has the same performance characteristics over time. We use the same settings as the Mozilla performance team to set up the rooted phones. We measure the temperature before and after we start a test. If the temperature is too high we wait X minutes and try again.
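The temperature gate described above could be sketched roughly as follows, reading the battery temperature from adb's battery report. The threshold and wait time are illustrative placeholders, not the lab's real settings:

```shell
#!/usr/bin/env bash
# Rough sketch of the temperature check (illustrative values only).

MAX_TEMP=350   # tenths of a degree Celsius, i.e. 35.0 C (placeholder)

# Extract the value from the "temperature: NNN" line that
# `adb shell dumpsys battery` prints (tenths of a degree Celsius).
parse_temp() {
  awk '/temperature:/ {print $2}' | tr -d '\r'
}

# Read the battery temperature of the attached phone.
get_battery_temp() {
  adb shell dumpsys battery | parse_temp
}

# Block until the phone has cooled down enough to give stable metrics.
wait_until_cool() {
  until [ "$(get_battery_temp)" -le "$MAX_TEMP" ]; do
    echo "Phone too warm, waiting 5 minutes before checking again..."
    sleep 300
  done
}

# wait_until_cool   # run before a test, and check again afterwards
```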


We use five mobile phones, a server and two WiFi networks set up with throttled connections to simulate 3G and 4G traffic. The WiFi connections are provided by two Raspberry Pi 4s running humble.
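Conceptually, the throttling amounts to traffic shaping of the kind the tc/netem sketch below prints. The interface name and numbers are assumptions for illustration (the real profiles are managed by the humble setup on the Raspberry Pis), and the script only prints the commands rather than applying them:

```shell
#!/usr/bin/env bash
# Illustrative 3G-like shaping with tc/netem (placeholder values).

IFACE="${IFACE:-wlan0}"  # assumed wifi interface name
RATE="1.6mbit"           # assumed 3G-like bandwidth cap
DELAY="150ms"            # assumed added latency

# Print the shaping commands; pipe the output to `sudo sh` to apply them.
shape_cmds() {
  cat <<EOF
tc qdisc add dev $IFACE root handle 1: htb default 12
tc class add dev $IFACE parent 1:1 classid 1:12 htb rate $RATE ceil $RATE
tc qdisc add dev $IFACE parent 1:12 netem delay $DELAY
EOF
}

shape_cmds
```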

Setup showing the mobile performance device lab.

The workflow: Jobs are started on the server, which drives the phones using WebDriver. The configuration and the URLs to test exist in a public Git repo. The tests run on the phones and access Wikipedia; we record a video of the screen and analyze it to get visual metrics. The metrics are sent to our Graphite instance and the test results (screenshots, HTML result pages) are stored on S3. We also run one test using WebPageReplay, where we record and replay Wikipedia locally on the server, to get metrics that are as stable as possible between runs.
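A run of the kind described above could look roughly like the sitespeed.io sketch below. The URL, iteration count and Graphite host are placeholders, the script only echoes the command, and the real flags live in the public configuration repo:

```shell
#!/usr/bin/env bash
# Sketch of a sitespeed.io invocation (placeholder URL and hosts).

URL="https://en.m.wikipedia.org/wiki/Barack_Obama"   # example page
ARGS="--android --video --visualMetrics -n 5 --graphite.host graphite.example.org"

# --android: run the browser on a connected phone over adb
# --video --visualMetrics: record the screen and compute visual metrics
# -n 5: number of runs per URL
echo sitespeed.io "$URL" $ARGS   # drop `echo` to actually run the test
```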

Performance device lab at BitBar.

Setting up the phones

BitBar is handling the setup of phones. If we need to change anything we need to contact them.

We have five phones running at the Performance Device Lab with the following setup.

Id          | Type                | Internet connection             | Extras | OS version | Usage
ZY322DJBW9  | Motorola Moto G5 #1 | Simulated 4G (wifi wikimedia93) | Root   | 8.1.0      | Used for testing
ZY322GXQN8  | Motorola Moto G5 #2 | Simulated 4G (wifi wikimedia94) | Root   | 8.1.0      | All tests against Wikipedia
ZY322H9XKL  | Motorola Moto G5 #3 | Simulated 4G (wifi wikimedia93) | Root   | 8.1.0      | All tests against Wikipedia
R58NC31FK3Y | Samsung A51         | WebPageReplay                   | Root   | 11         | Running against WebPageReplay
            | Samsung A51         | WebPageReplay                   | Root   | 11         | Running against WebPageReplay

Using rooted phones makes it possible to stabilise CPU and GPU performance by configuring governors.
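As a sketch of what rooting enables (in the spirit of the Mozilla setup, though the exact values used in the lab may differ), the CPU governor can be pinned over adb:

```shell
#!/usr/bin/env bash
# Sketch: pin the CPU governor on a rooted Android phone (assumed sysfs paths).

# Write the wanted governor to every core's cpufreq sysfs file (needs root).
set_governor() {
  local gov="${1:-performance}"
  adb shell su -c "for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo $gov > \$f; done"
}

# Read back the first core's governor to verify the change took effect.
current_governor() {
  adb shell cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
}

# set_governor performance && current_governor
```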

Moto G5 #2 and #3 are set up as one group. We add tests to that group and the first available phone starts the tests. Both Samsung phones are also in the same group, meaning they act as one: as soon as one phone is available, it will take on jobs.

Raspberry Pi

See also: Performance/Mobile Device Lab/Raspberry Pi Image.

BitBar setup

At BitBar our tests use a generic setup with a start bash file, a secrets.json file and a Slack bash file that are uploaded in the BitBar GUI in a zip file. The start bash file is called when a test is started and looks like this:

# Unpack the zip that contains the secrets.json configuration file

# Clone the git repo where we have all the code
git clone
cd performance-mobile-synthetic-monitoring-tests

# There's a hack on BitBar where you can pass on a parameter
touch sitespeed.log
../ sitespeed.log "$1" &
./ "$1" 2>&1 |  tee sitespeed.log

The secrets.json file contains the configuration needed to send metrics to our Graphite instance and data to S3. All tests then extend that configuration file, so the per-test configuration files can be open to the public in our Git repo. We also have a bash script that reads the log from the test and reports all errors to a Slack channel:

tail -n0 -F "$1" | while read LINE; do
  (echo "$LINE" | grep -A 3 -e "ERROR:") && curl -X POST --silent --data-urlencode \
  "payload={\"text\": \"[bitbar $2] $(echo $LINE | sed "s/\"/’/g")\"}";
done

We then have a cron job that calls the BitBar API using curl, with settings to use our zip file and one extra parameter that chooses which test to run. At the moment the job is fired from
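The shape of that cron job could be sketched like this. The endpoint path and field names are placeholders (check BitBar's API documentation for the real ones); only the overall idea (API-key auth plus a CALABASH_TAGS parameter selecting the test) comes from this page, and the script only prints the request:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cron-driven kick-off (placeholder endpoint).

API_KEY="PLACEHOLDER"
TEST_TO_RUN="wikipedia-4g"   # ends up as $1 in the start bash file

# Build (but do not send) the request; drop the `echo` to fire it.
build_request() {
  echo curl -s -u "$API_KEY:" -X POST \
    "https://cloud.bitbar.com/api/PLACEHOLDER/runs" \
    -F "testRunParameters=[{\"key\":\"CALABASH_TAGS\",\"value\":\"$TEST_TO_RUN\"}]"
}

build_request
```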

When we set up the job at BitBar we need to make sure the job runs on the first available device (which is not the default), so you need to change that in the GUI.

You also need to pass on which test to run. The BitBar documentation is wrong here: the only way to do it is to use the CALABASH_TAGS run parameter, which you set in the GUI. In this example we tell the script to run the
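Inside the start script, the received value can then be used to pick a test; a minimal sketch (the test names and config paths are made up):

```shell
#!/usr/bin/env bash
# Sketch of dispatching on the CALABASH_TAGS value (hypothetical names).

# Map the incoming test name to a configuration file.
pick_config() {
  case "$1" in
    wikipedia-4g)  echo "tests/wikipedia-4g.json" ;;
    webpagereplay) echo "tests/webpagereplay.json" ;;
    *) echo "Unknown test: $1" >&2; return 1 ;;
  esac
}

# pick_config "$1"   # the start bash file would pass the BitBar parameter here
```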

Performance tests

All configuration and setups for our tests lives in Gerrit in performance/mobile-synthetic-monitoring-tests. To add or change tests you clone the repo and send in your change for review.

Add a test

All configuration files exist in our synthetic monitoring tests repo. Clone the repo and go into the tests folder:

git clone ssh://

cd mobile-synthetic-monitoring-tests/tests

Change configuration

If you need to change the secrets.json file, the easiest way is:

  1. Upload a new file to BitBar and check the id for that file.
  2. Login to the runner server and edit the bash scripts that kick off the tests. Change FILE_ID= so it uses the new file id.
  3. Done.

Alert and error reporting

We have three alerts set up to verify that the network and the tests work as they should. The alerts verify that we get metrics into Graphite for the phones that use the 3G, 4G and WebPageReplay tests. The alerts exist in:

We also report error log messages to the #bitbar-error-reporting Slack channel (you need to be invited by the performance team). Here we can see smaller failures, like a single run that didn't work or instability in Firefox/Chrome. We use those errors to know if there's something that needs to be tuned or reported upstream.


All phone troubleshooting is handled by BitBar. If a phone is "offline" or "online dirty" we contact BitBar using the BitBar/Wikimedia Slack channel.


The data is reported under the android key in Graphite. Make sure the type is set to android in the dashboard.

Outstanding issues

No outstanding issues at the moment. One thing to keep track of is that maxAutoRetriesCount needs to be set to 0 (not null, which is the default) so that BitBar can debug if something fails; otherwise the failing test is just deleted.