Performance/AS Report


The AS Report (Autonomous Systems Performance Report) provides insights into the effective performance of Internet providers around the world. It represents a step toward formalizing collaborations with browser vendors and ISPs.

The AS Report takes the form of an anonymised dataset, generated monthly from visits to Wikipedia and other Wikimedia Foundation sites. The dataset is generated from normalized RUM metrics (Navigation Timing and a CPU benchmark) measured in the web browser during sampled page views.

Methodology

Performance metrics are collected using the Navigation Timing API.

CPU microbenchmark

Our world is not evenly distributed in outcome. Whether by circumstance or by choice, there is a wide range of different experiences. This includes the effective speed of a device at any given moment (e.g. laptop or phone). In order for the AS Report to be able to compare Internet providers in the same geographic region fairly, even if their audiences differ in their device choices, we have to account for device CPU performance. In the wild, we have observed wide ranges of CPU scores even for the exact same device model. For example, background memory usage and battery levels can greatly influence the performance of mobile devices at any given time.

Our CPU microbenchmark (source code) is run for a short time in a Worker thread, to avoid disrupting the visitor's pageview experience. The benchmark assesses the overall performance of the device at the time of the measurement.

Using this information, we select a common subset of CPU scores, and then compile medians of Navigation Timing metrics only from devices with comparable performance. This is done separately for each autonomous system, based on the visitors' IP addresses.
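
As an illustration, the slicing step might look like the following minimal sketch, assuming a pandas DataFrame of sampled pageviews with hypothetical cpu_score, as_org and ttfb columns (the actual asoranking.py implementation may differ):

import pandas as pd

def per_as_medians(samples: pd.DataFrame, score_band: tuple) -> pd.Series:
    # Keep only devices whose CPU benchmark score falls inside a common band,
    # so that each autonomous system is measured on comparable hardware.
    lo, hi = score_band
    comparable = samples[samples["cpu_score"].between(lo, hi)]
    # Median Navigation Timing metric per AS, computed from those devices only.
    return comparable.groupby("as_org")["ttfb"].median()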

Mobile vs desktop

We separate mobile and desktop experiences, as they have significantly different page weights. As such, scores aren't comparable between mobile and desktop for a given country, nor are they comparable between countries, since a different slice of CPU benchmark scores is selected for each. Reports are per country, as Internet services tend to be sold to consumers on a national basis, and networks in the same country face the same infrastructural challenges and distances to our data centers.

Mobile networks are those with at least one sampled visit having a "cellular" connection type, as reported by the Network Information API. When calculating the scores for mobile, we only consider data from sampled pageviews to the mobile site. Desktop networks are those with at least one sampled visit having a "wifi" connection type. When calculating scores for desktop, we only consider data from desktop pageviews.
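
A rough sketch of that classification rule, assuming each sampled pageview record carries hypothetical as_org, connection_type and site fields:

def split_mobile_desktop(samples):
    # An AS counts as "mobile" if any of its sampled visits reported a cellular
    # connection, and as "desktop" if any reported wifi; an AS can be in both groups.
    mobile_as = {s["as_org"] for s in samples if s["connection_type"] == "cellular"}
    desktop_as = {s["as_org"] for s in samples if s["connection_type"] == "wifi"}
    # Mobile scores only use pageviews to the mobile site, desktop scores only
    # use pageviews to the desktop site.
    mobile = [s for s in samples if s["as_org"] in mobile_as and s["site"] == "mobile"]
    desktop = [s for s in samples if s["as_org"] in desktop_as and s["site"] == "desktop"]
    return mobile, desktop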

We don't want an AS belonging to an ISP that widely sells femtocell devices to be unfairly advantaged or disadvantaged just because differences between the mobile and desktop experience result in lighter or heavier pages on average.

We avoid exposing small population groups to privacy risks, and remove noise from the rare use of mobile connections on desktop devices and vice versa. This is done by only publishing results for combinations of country, AS organization, and mobile/desktop when sufficient pageview samples exist in the dataset.

RUM metrics

We report the medians for two core RUM metrics measured by the visitors' browsers: Time to first byte and Page Load Time, collected using the Navigation Timing API.

Time to first byte (TTFB) is how long it takes between the client requesting the page and it receiving the first byte of data from us. Page load time (PLT) is how long it takes to load the whole page, including all images and critical styles/scripts.

TTFB is the metric closest to latency, which is something ASOs might improve by peering with us or tweaking their routes to us. PLT is the metric that correlates most strongly with visitors' perception of performance, as shown by research we've conducted. It's the metric that best captures the ASOs' quality of service to their customers.

We also report the median transferSize as reported by the Navigation Timing API as a sanity check, to ensure that the RUM metrics comparison is fair between ASOs, and that the differences aren't caused by visitors using a particular ASO accessing much smaller or much bigger pages on average.
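
Putting the three metrics together, the aggregation step could be sketched as follows (column names are illustrative, not the actual schema used by asoranking.py):

import pandas as pd

def compile_ranking(samples: pd.DataFrame) -> pd.DataFrame:
    # Median TTFB, page load time and transferSize for each combination of
    # country, AS organization, and mobile/desktop.
    grouped = samples.groupby(["country", "as_org", "platform"])
    return grouped[["ttfb", "plt", "transfer_size"]].median().reset_index()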

Privacy

In order to respect the privacy of our visitors, we only report ASOs for which we aggregate more than 500 unique pageviews to generate scores.
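
In other words, a group is only kept if it exceeds that threshold; a minimal sketch, assuming groups maps a (country, ASO, mobile/desktop) key to its list of sampled pageviews:

def apply_privacy_threshold(groups, min_pageviews=500):
    # Discard any group that does not have more than 500 unique pageviews,
    # so scores are never published for small population groups.
    return {key: views for key, views in groups.items() if len(views) > min_pageviews}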

Service

How it works

The asoranking.py script runs on the first of every month from an Analytics Team stat machine.

Puppet provisions and schedules the Python script, which is deployed via Scap from a deployment server. It runs as a monthly cron job under the analytics-privatedata user (a requirement for access to the Kerberos keytab).

The script queries Hive to read the Navigation Timing data that we originally collected from web browsers via EventLogging.
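
As a hedged illustration of that step only, a Hive read might look like the sketch below; the client, host, table and column names are placeholders, not the actual ones used in production:

from pyhive import hive  # illustrative client choice, not necessarily what the script uses

def fetch_navtiming(year, month):
    conn = hive.Connection(host="hive.example.invalid", username="analytics-privatedata")
    cursor = conn.cursor()
    # Hypothetical table/columns standing in for the EventLogging NavigationTiming data.
    cursor.execute(
        "SELECT event.origincountry, event.responsestart, event.loadeventend "
        f"FROM event.navigationtiming WHERE year = {year} AND month = {month}"
    )
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return rows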

The reports are expected to be generated once a month. Our monitoring alert checks that the latest published dataset is less than 32 days old. The alert also checks that the report is at least 1KB in size.
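
The check itself amounts to something like the following sketch (the path is a placeholder; the real alert lives in our monitoring stack, not in this helper):

import os
import time

def report_is_fresh(path, max_age_days=32, min_size_bytes=1024):
    # True if the newest published report is recent enough and at least 1KB.
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    return age_days < max_age_days and os.path.getsize(path) >= min_size_bytes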

Output

When used as follows:

python asoranking.py --publish

This will output the generated ranking for the previous calendar month in the form of a TSV file, published to /srv/published-datasets/performance/autonomoussystems/.

As per Analytics/Web publication, these are publicly web-accessible at https://analytics.wikimedia.org/published/datasets/performance/autonomoussystems/.
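
For example, a published report can be loaded directly from that location with pandas; the file name below is hypothetical and should be replaced with one from the directory listing:

import pandas as pd

BASE = "https://analytics.wikimedia.org/published/datasets/performance/autonomoussystems/"
report = pd.read_csv(BASE + "2021-07-01.tsv", sep="\t")  # hypothetical file name
print(report.head())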

This data is also periodically used for the web page at https://performance.wikimedia.org/asreport/ (performance.wikimedia.org/Runbook).
