In 2014/15, the Analytics and Research Teams at Wikimedia developed a new, more comprehensive definition and algorithm for counting pageviews. "Pageviews" or "Current Pageviews" refers to a tally made with the new algorithm. As of 2015, many dashboards and reports continue to use the legacy definition of pageviews, and those counts should be referred to as "Legacy Pageviews".
|Definition||Legacy Pageviews||Current Pageviews|
|Data Source||sampled web-request logs||unsampled web-request logs|
|Examples||WMF Quarterly Report|
Eventually some dashboards will be deprecated or migrated to use the current pageview definition.
- The Research team has developed a new definition of Pageviews: https://meta.wikimedia.org/wiki/Research:Page_view
- The new definition is better because:
  - it is based on all the web-request logs (not a 1:1000 sampling of them);
  - it can detect and flag more spiders (which can be a significant fraction of total traffic on some wikis).
- The new definition will evolve over time as new ways of viewing content are developed (e.g. a mobile app using APIs)
- The new definition is generated on the analytics cluster using Hadoop and web request logs
- Current pageview counts have been tallied since May 1, 2015
- Where appropriate, we will gradually migrate uses of legacy pageviews to current pageviews; where legacy pageviews remain in use, we will point that out.
- The previous definition is documented here: https://phabricator.wikimedia.org/diffusion/ANME/browse/master/pageviews/webstatscollector/pageview_definition.png
- The previous definition relied on a tool named webstatscollector: Analytics/Webstatscollector
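To make the difference between the two data sources concrete, here is a minimal sketch in Python. It is not the actual Wikimedia implementation: the function names, the record layout, and the spider pattern are hypothetical, and the legacy side is simplified to do no spider filtering at all. It only illustrates scaling up a 1:1000 sample versus counting every request while excluding flagged spiders.

```python
import re

# The legacy data source kept roughly 1 request in 1000 (per the text above).
SAMPLING_FACTOR = 1000

# Hypothetical pattern for illustration only; the real definition
# uses a richer set of rules to detect spiders.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)


def is_spider(user_agent: str) -> bool:
    """Flag a request as automated if its user agent matches the pattern."""
    return bool(SPIDER_PATTERN.search(user_agent))


def legacy_estimate(sampled_requests) -> int:
    """Scale a 1:1000 sample up to an estimated total (no spider filtering here)."""
    return len(sampled_requests) * SAMPLING_FACTOR


def current_count(all_requests) -> int:
    """Count every request in the unsampled log, excluding flagged spiders."""
    return sum(1 for r in all_requests if not is_spider(r["user_agent"]))


# Made-up traffic: 2000 browser requests followed by 1000 crawler requests.
requests = (
    [{"user_agent": "Mozilla/5.0"}] * 2000
    + [{"user_agent": "Googlebot/2.1"}] * 1000
)
sample = requests[::SAMPLING_FACTOR]  # every 1000th request

print(legacy_estimate(sample))   # estimate from the sample, spiders included
print(current_count(requests))   # exact count, spiders excluded
```

With this made-up traffic the legacy-style estimate comes out higher than the current-style count, because the crawler requests are included in the former and excluded from the latter.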
Comparing current and legacy pageviews
Legacy pageview counts are typically larger than current pageview counts because they include more traffic from spiders. The current definition makes a better effort at counting traffic from real people and excluding automata. Please take this into account when plotting year-over-year changes in traffic. For example, http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm shows a discontinuity in traffic in May 2015, because that is when reporting switched to the current pageview definition.
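One way to avoid misreading that discontinuity is to compute year-over-year changes only when both months were counted under the same definition. The sketch below assumes monthly totals keyed by the first day of each month; the helper names and the sample numbers are hypothetical, and only the May 2015 switch date comes from the text above.

```python
from datetime import date

# Reports switch to the current definition in May 2015 (per the text above).
DEFINITION_SWITCH = date(2015, 5, 1)


def definition_for(month: date) -> str:
    """Return which pageview definition a monthly total was counted under."""
    return "current" if month >= DEFINITION_SWITCH else "legacy"


def year_over_year_change(counts: dict, month: date):
    """Fractional change versus the same month one year earlier.

    Returns None when either month is missing or when the two months
    were counted under different definitions and are not comparable.
    """
    previous = month.replace(year=month.year - 1)
    if month not in counts or previous not in counts:
        return None
    if definition_for(month) != definition_for(previous):
        return None
    return (counts[month] - counts[previous]) / counts[previous]


# Made-up monthly totals, for illustration only.
counts = {
    date(2014, 4, 1): 100,
    date(2015, 4, 1): 110,  # legacy vs legacy: comparable
    date(2014, 6, 1): 100,
    date(2015, 6, 1): 80,   # current vs legacy: not comparable
}

print(year_over_year_change(counts, date(2015, 4, 1)))
print(year_over_year_change(counts, date(2015, 6, 1)))
```

Returning None for mixed-definition pairs is a deliberate choice here: it forces the caller to handle the gap explicitly rather than plot an apparent drop that is only an artifact of the definition change.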