Traffic refers to pageviews on the pages of a wiki project. This page links to detailed information about traffic datasets in the Data Lake.
Most of the datasets below are updated at hourly granularity: a new hour of data arrives every hour, with a delay of 2 to 3 hours (the hour must first finish, and the data must then be computed).
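To make the delay concrete, here is a minimal sketch that estimates the most recent hour likely to be queryable at a given moment. It assumes the delay is counted from the end of the hour and uses the conservative 3-hour figure by default; the function name `latest_available_hour` and its parameters are hypothetical, not part of any Data Lake tooling.

```python
from datetime import datetime, timedelta, timezone


def latest_available_hour(now, delay_hours=3):
    """Return the start (UTC) of the latest hour expected to be available.

    Assumption: an hour of data becomes available `delay_hours` after that
    hour has ended. The real delay varies between roughly 2 and 3 hours.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    # Step back by the one-hour window itself plus the computation delay,
    # then truncate to the top of the hour.
    cutoff = now.astimezone(timezone.utc) - timedelta(hours=1 + delay_hours)
    return cutoff.replace(minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    print(latest_available_hour(datetime.now(timezone.utc)))
```

Under these assumptions, a job running at 14:20 UTC would treat the hour starting at 10:00 UTC as the newest one safe to read. Checking that the partition actually exists remains the more reliable test.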
These datasets are available as Hive tables and can be queried using one of the available SQL engines, or accessed directly through HDFS.
|webrequest hive table
- See also a separate list of Hive tables derived from webrequest
|pageview_actor hive table||The wmf.pageview_actor table is a smaller version of the webrequest table with fewer columns.|
|pageview_hourly hive table||The wmf.pageview_hourly table contains 'pre-aggregated' webrequest data, filtered to keep only pageviews, and aggregated over a predefined set of dimensions.|
|projectview_hourly hive table||The |
|unique devices||This dataset counts how many distinct devices visit our projects|
|browser general||This dataset gives you pageview statistics broken down by user-agent related dimensions like OS family, OS major, browser family, browser major|
|mobile apps session metrics||Contains aggregate stats about pageview sessions on the Android and iOS Wikipedia mobile apps|
|mobile apps uniques||Counts how many distinct Android and iOS Wikipedia mobile app installs accessed Wikimedia sites during a given day or month|
|inter language||Traffic between different languages on the same project family|
|virtualpageview_hourly||Provides data about page previews on desktop Wikipedia|
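As an illustration of querying one of the Hive tables listed above, the sketch below builds a SQL statement that sums pageviews per project for a single hour of wmf.pageview_hourly. The column names `project` and `view_count` and the partition columns `year`, `month`, `day` and `hour` are assumptions not stated on this page, and the helper `pageview_hourly_query` is hypothetical; check the table's own documentation before relying on them.

```python
def pageview_hourly_query(year, month, day, hour, limit=10):
    """Build a SQL string for the top projects by pageviews in one hour.

    Assumed schema (not confirmed by this page): columns `project` and
    `view_count`, partitioned by `year`, `month`, `day`, `hour`.
    """
    # Coerce to int so only numeric values are interpolated into the SQL.
    year, month, day, hour, limit = (int(v) for v in (year, month, day, hour, limit))
    if not (1 <= month <= 12 and 1 <= day <= 31 and 0 <= hour <= 23):
        raise ValueError("month, day or hour out of range")
    return (
        "SELECT project, SUM(view_count) AS views\n"
        "FROM wmf.pageview_hourly\n"
        # Filtering on all partition columns limits the scan to one hour.
        f"WHERE year = {year} AND month = {month} AND day = {day} AND hour = {hour}\n"
        "GROUP BY project\n"
        "ORDER BY views DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(pageview_hourly_query(2024, 1, 1, 5))
```

The resulting string can be pasted into whichever SQL engine is in use. Restricting the query to explicit partition values keeps it from scanning the whole table.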
These datasets are made available as files, updated at regular intervals.
- Pageviews and Projectviews dumps [To be updated]
- Compressed pageviews dumps [To be updated]
Deprecated or Obsolete Datasets
The following datasets are no longer in use, but the pages are kept to document history:
Some partial information about the evolution of publishing analytics data at WMF is recorded here in a timeline.