Data Platform/Data Lake/Traffic/Unique Devices
How is this data computed
We compute this data using the Last-Access cookie. For details, see Analytics/Data Lake/Traffic/Unique Devices/Last access solution and m:Research:Unique Devices.
Tables schema
As of 2017-07, there are 4 'unique devices' tables available in the wmf database on Hive:
- unique_devices_per_domain_daily stores unique devices counts per domain (e.g. en.m.wikipedia.org) split by country per day
- unique_devices_per_domain_monthly stores unique devices counts per domain split by country per month
- unique_devices_per_project_family_daily stores unique devices counts per project (e.g. Wikipedia) split by country per day
- unique_devices_per_project_family_monthly stores unique devices counts per project split by country per month
unique_devices_per_domain_daily / unique_devices_per_domain_monthly | ||
---|---|---|
domain | string | Lower-cased domain accessed (en.wikipedia.org for instance) |
country | string | Country name of the accessing agents (computed using the MaxMind GeoIP database) |
country_code | string | 2-letter country code |
uniques_underestimate | int | Under-estimation of unique devices based on the Last-Access cookie and the nocookies header. Unique devices that came to a given host at least twice. |
uniques_offset | int | Unique devices offset computed as 1-action sessions without cookies. |
uniques_estimate | int | Estimate of total unique devices seen, computed as uniques_underestimate plus uniques_offset |
year | int | Unpadded year of requests |
month | int | Unpadded month of requests |
day | int | Unpadded day of requests (only for the unique_devices_..._daily tables) |
unique_devices_per_project_family_daily / unique_devices_per_project_family_monthly | ||
---|---|---|
project_family | string | Lower-cased project accessed (wikipedia or wikivoyage for instance) |
country | string | Country name of the accessing agents (computed using the MaxMind GeoIP database) |
country_code | string | 2-letter country code |
uniques_underestimate | int | Under-estimation of unique devices based on the Last-Access global cookie and the nocookies header. Unique devices that came to a given project family at least twice. |
uniques_offset | int | Unique devices offset computed as 1-action sessions without cookies. |
uniques_estimate | int | Estimate of total unique devices seen, computed as uniques_underestimate plus uniques_offset |
year | int | Unpadded year of requests |
month | int | Unpadded month of requests |
day | int | Unpadded day of requests (only for unique_devices_per_project_family_daily) |
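The relationship between the three count columns can be sketched in a few lines of Python. This is only an illustration of the column descriptions above: the class name UniqueDevicesCounts and the numbers are hypothetical, and the sketch assumes the stored estimate is exactly the sum of the two other columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniqueDevicesCounts:
    """Hypothetical holder for the count columns of one table row."""

    # Devices seen at least twice (Last-Access cookie present).
    uniques_underestimate: int
    # 1-action sessions without cookies.
    uniques_offset: int

    @property
    def uniques_estimate(self) -> int:
        # Assumption: estimate = underestimate + offset, as the schema describes.
        return self.uniques_underestimate + self.uniques_offset


# Invented numbers, for illustration only.
counts = UniqueDevicesCounts(uniques_underestimate=700, uniques_offset=300)
print(counts.uniques_estimate)
```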
Sample query to get total uniques for a given host or project_family for a day
SELECT SUM(uniques_estimate) FROM wmf.unique_devices_per_domain_daily WHERE year=2015 AND month=12 AND day=24 AND domain = 'es.wikipedia.org';
SELECT SUM(uniques_estimate) FROM wmf.unique_devices_per_project_family_daily WHERE year=2017 AND month=4 AND day=1 AND project_family = 'wikipedia';
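To try the per-domain query without access to Hive, a small local sketch can help. The Python example below uses SQLite as a stand-in for Hive, attaching an in-memory database under the name wmf so that the table can be addressed the same way. The rows, the counts and the helper name total_uniques are invented for illustration; only the table and column names come from the schema above.

```python
import sqlite3

# Stand-in for Hive: an in-memory SQLite database attached as "wmf",
# so the table is reachable as wmf.unique_devices_per_domain_daily.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS wmf")
conn.execute(
    """
    CREATE TABLE wmf.unique_devices_per_domain_daily (
        domain TEXT, country TEXT, country_code TEXT,
        uniques_underestimate INTEGER, uniques_offset INTEGER,
        uniques_estimate INTEGER,
        year INTEGER, month INTEGER, day INTEGER
    )
    """
)

# Hypothetical rows; all counts are made up.
rows = [
    ("es.wikipedia.org", "Spain", "ES", 700, 300, 1000, 2015, 12, 24),
    ("es.wikipedia.org", "Mexico", "MX", 400, 100, 500, 2015, 12, 24),
    ("es.wikipedia.org", "Spain", "ES", 650, 250, 900, 2015, 12, 25),
    ("en.wikipedia.org", "Spain", "ES", 50, 20, 70, 2015, 12, 24),
]
conn.executemany(
    "INSERT INTO wmf.unique_devices_per_domain_daily "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def total_uniques(conn, domain, year, month, day):
    """Sum the per-country estimates for one domain on one day."""
    cur = conn.execute(
        "SELECT SUM(uniques_estimate) "
        "FROM wmf.unique_devices_per_domain_daily "
        "WHERE year = ? AND month = ? AND day = ? AND domain = ?",
        (year, month, day, domain),
    )
    # SUM yields NULL (None in Python) when no row matches.
    return cur.fetchone()[0]


total = total_uniques(conn, "es.wikipedia.org", 2015, 12, 24)
print(total)
```

Because the tables are split by country, the SUM is what collapses the per-country rows into a single figure for the domain.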
Data Quality
The Last-Access based uniques metric has proven to be highly variable for small projects.
For details, see Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution#Data_Quality_Analysis.
Changes and Known Problems with Dataset
- 2016-02-19: Monthly per-domain data is available as of January 2016.
Date from | Date until | Task | Details |
---|---|---|---|
2021-02-09 | 2022-06-30 | task T316572 | Unique devices by family metrics were overcounted by approximately 5% globally. For more details, read Analytics/Data Lake/Data Issues/2021-02-09 Unique Devices By Family Overcount |
2020-06-24 (daily) / 2020-06-01 (monthly) | | task T250744 | Quality improvement through removal of automated traffic. See Analytics/Data Lake/Traffic/Unique Devices/Automated traffic correction |
2018-05-30 | 2018-06-03 | task T199517 | June Unique devices increase of 170% for wikidata |
start | 2017-05-18 | task T165661 | Per-domain unique-devices computation excluded countries that didn't have either underestimates or offset until 2017-05-18. |
start | 2017-06-11 | task T167005 | Per-domain unique-devices computation was under-counting fresh sessions (offset) by about 10% until 2017-06-11. |
2016-11-04 | 2017-02-14 | task T165560 | Artificial spike in offset of unique devices from November to February on wikidata, likely related to the varnish4 rollout |
See also
- Dashiki dashboard (example), global data only: https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=MonthlyUniqueDevices
- Wikistats v2 dashboard (example), global data only: https://stats.wikimedia.org/v2/#/all-wikipedia-projects
- Unique devices in the Wikimedia Analytics API documentation
- April 2016 announcement