Performance/Mobile Device Lab
The Mobile Device Lab is provided by the Performance Team for executing tests on real mobile devices, in a variety of network environments. We use this to measure how changes impact mobile users.
Summary
To get more realistic performance metrics we run tests on real mobile devices. That makes it easier to find performance regressions. We use BitBar as the provider of our devices. All tasks are reported under the Performance Device Lab tag in Phabricator.
Performance testing on mobile phones
When running tests on mobile phones we want a stable environment that does not change, so that we can spot performance regressions. To make that happen we use:
- A stable network: We throttle the connection to look like a 3g or 4g connection. By limiting the upload/download speed and adding delay, the phone gets requests in a more realistic scenario. By making sure the network is the same all the time, the network will not affect our metrics (see the throttling sketch after this list).
- Low phone temperature: We measure the battery temperature as a proxy for CPU temperature. Android phones change behavior when the CPU gets warm and we want to avoid that. Some of our phones are rooted to make sure the phone keeps the same performance characteristics. We use the same settings as the Mozilla performance team to set up the rooted phones. We measure the temperature before and after we run a test (see the temperature check after this list).
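To illustrate what the throttling involves, a connection can be shaped with tc on Linux roughly like this. This is only a minimal sketch, not our actual setup (the real throttling is done on the wifi network described under Setup); the interface name and the numbers are assumptions:
# Sketch only: add delay and limit bandwidth on wlan0 to roughly emulate a 4g connection
tc qdisc add dev wlan0 root handle 1:0 netem delay 85ms
tc qdisc add dev wlan0 parent 1:1 handle 10: tbf rate 9mbit buffer 32000 limit 125000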
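The temperature check can be illustrated with adb. This is a sketch only; the exact command our scripts use may differ:
# Read the battery temperature (reported in tenths of a degree Celsius)
# as a proxy for how warm the CPU is before and after a test
adb shell dumpsys battery | grep temperature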
Setup
We use five mobile phones, a server and two wifi networks set up with a throttled connection to simulate 4g traffic. The wifi connections are provided by two Raspberry Pi 4s running Humble.
The workflow: The jobs are started on the server, which runs sitespeed.io that drives the phones using WebDriver. The configuration and URLs to test exist in a public Git repo. The tests run on the phones and access Wikipedia; we record a video of the screen and analyze the result to get visual metrics. The metrics are sent to our Graphite instance and the test results (screenshots, HTML result pages) are stored on S3. We also run one test using WebPageReplay, where we record and replay Wikipedia locally on the server to get metrics that are as stable as possible between runs.
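To give an idea of how the server drives a phone, a simplified sitespeed.io invocation could look like the following. This is a sketch with placeholder values, not the exact command we run; the real runs use the configuration files in the Git repo:
# Run a test on an Android phone over adb, using the shared configuration file
sitespeed.io --android --browser chrome --config secrets.json https://en.m.wikipedia.org/wiki/Barack_Obama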
Setup the phones
BitBar handles the setup of the phones. If we need to change anything we contact them.
We have five phones running at the Performance Device Lab with the following setup.
| Id | Type | Internet connection | Extras | OS version | Usage |
|---|---|---|---|---|---|
| ZY322DJBW9 | Motorola Moto G5 #1 | Simulated 4g (wifi wikimedia93) | Root | 8.1.0 | Used for testing |
| ZY322GXQN8 | Motorola Moto G5 #2 | Simulated 4g (wifi wikimedia94) | Root | 8.1.0 | Direct Wikipedia |
| ZY322H9XKL | Motorola Moto G5 #3 | Simulated 4g (wifi wikimedia93) | Root | 8.1.0 | Direct Wikipedia |
| R58NC31FK3Y | Samsung A51 | WebPageReplay | Root | 11 | Using WebPageReplay |
| | Samsung A51 | WebPageReplay | Root | 11 | Using WebPageReplay |
Using rooted phones makes it possible to stabilise CPU and GPU performance by configuring governors.
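As an illustration of what that means in practice, the governor can be pinned over adb on a rooted phone. This is only a sketch (and only covers CPU0); the real configuration follows the Mozilla performance team's settings mentioned above:
# Sketch: lock the CPU frequency scaling governor to "performance" on a rooted phone
adb shell su -c 'echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'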
Moto G5 #2 and #3 are set up as one group. We then add tests to that group and the first available phone starts the tests. Both Samsung phones are also in one group, meaning they act as one: as soon as one phone is available, it will take on jobs.
BitBar setup
At BitBar our tests use a generic setup with a start bash file (run-tests.sh), a secrets.json file and a Slack bash file (slack.sh) that are uploaded in the BitBar GUI as a zip file. The start script is called when a test is started and looks like this:
# We unpack tests.zip that contains the secrets.json configuration file
unzip tests.zip
# Clone the git repo where we have all the code
git clone https://github.com/wikimedia/performance-mobile-synthetic-monitoring-tests.git
cd performance-mobile-synthetic-monitoring-tests
# BitBar has a hack that lets us pass on parameters to the script ($1, $2 and $3)
# Create the log file and start the Slack error reporter in the background
touch sitespeed.log
../slack.sh sitespeed.log "$1" "$2" "$3" &
# Start the tests and tee the output to the log that the Slack script tails
./start.sh "$1" "$2" "$3" 2>&1 | tee sitespeed.log
The secrets.json file contains configuration that lets sitespeed.io send metrics to our Graphite instance and store data on S3. All tests then extend that configuration file, so the test configuration files in our Git repo can stay public. We also have a bash script that reads the log from the test and reports all errors to a Slack channel:
#!/bin/bash
tail -n0 -F "$1" | while read LINE; do
(echo "$LINE" | grep -A 3 -e "ERROR:") && curl -X POST --silent --data-urlencode \
"payload={\"text\": \"[bitbar $2] $(echo $LINE | sed "s/\"/’/g")\"}" https://hooks.slack.com/services/OUR_SERVICE;
done
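The secrets.json follows sitespeed.io's JSON configuration format, where the keys map to command line options. A rough sketch with placeholder values only; the exact keys in our file may differ:
{
  "graphite": {
    "host": "graphite.example.org",
    "port": 2003
  },
  "s3": {
    "key": "PLACEHOLDER_KEY",
    "secret": "PLACEHOLDER_SECRET",
    "bucketname": "placeholder-results-bucket"
  }
}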
We then have a cronjob calling the BitBar API using curl with settings that point to our zip file. At the moment the job is fired from gpsi.webperf.eqiad1.wikimedia.cloud using the bitbar user. The job is kicked off by the bitbarcaller.sh script that lives in the repo.
We can pass on at most five parameters from the API to our scripts; today we use three of them. They are named PARAM1, PARAM2, PARAM3, PARAM4 and PARAM5.
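As an example of what the cron entry could look like (a hypothetical sketch; the path is a placeholder and the real entry lives in the bitbar user's crontab):
# Hypothetical crontab entry: start the BitBar runs once an hour
0 * * * * /home/bitbar/performance-mobile-synthetic-monitoring-tests/bitbarcaller.sh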
Performance tests
All configuration and setup for our tests lives in Gerrit in performance/mobile-synthetic-monitoring-tests. To add or change tests, clone the repo and send in your change for review.
Add a test
All configuration files exist in our synthetic monitoring tests repo. Clone the repo and go into the tests folder:
git clone ssh://USERNAME@gerrit.wikimedia.org:29418/performance/mobile-synthetic-monitoring-tests.git
cd mobile-synthetic-monitoring-tests/tests
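What a test file looks like depends on the existing files in the tests folder, so follow their layout. As a rough sketch only (assuming a plain sitespeed.io URL file; the file name and URLs are made up), a test could be a list of URLs, one per line, in a file such as tests/example-urls.txt:
https://en.m.wikipedia.org/wiki/Barack_Obama
https://en.m.wikipedia.org/wiki/Sweden
Send the change for review in Gerrit as described above.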
Change configuration
If you need to change the secrets.json file, the easiest way to do it is like this:
- Upload a new file to BitBar and check the id for that file.
- Update the bitbarcaller.sh script to use the new file (use the id).
- Done.
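As a hypothetical illustration of the second step (the variable name below is made up; check bitbarcaller.sh for how the file id is actually referenced):
# In bitbarcaller.sh, point at the id of the newly uploaded file (hypothetical variable name)
FILE_ID=123456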
Alert and error reporting
We have two alerts set up to verify that the network and the tests work as they should. The alerts verify that we get metrics into Graphite for the phones that run the 4g and WebPageReplay tests. The alerts exist in: https://grafana.wikimedia.org/d/frWAt6PMz/synthetic-tool-alerts
We also report error log messages to the #synthetic-tests-error-reporting Slack channel (you need to be invited by the performance team). Here we can see smaller kinds of failures, like one of the runs not working or instability in Firefox/Chrome. We use those errors to know if there's something that needs to be tuned or reported upstream.
Troubleshooting
All phone troubleshooting is handled by BitBar. If a phone is "offline", "online dirty" or does not have an internet connection, we contact BitBar using the BitBar/Wikimedia Slack channel.
Dashboards
The data is reported under the android key in Graphite. Make sure type is android in the dashboard.
Outstanding issues
There's a problem at BitBar if one phone loses the internet connection: with their current setup that phone still takes on jobs. Tracked in T334710.
One thing to keep track of is that maxAutoRetriesCount needs to be set to 0 (not null, which is the default) so that BitBar can debug if something fails; otherwise the failing test is just deleted.