Analytics/Data Lake/Traffic/Pagecounts-ez


This dataset is described on its dumps download page.

This dataset is a compressed form of the best pageview data that the Wikimedia Foundation had available at each point in its history:

  • From 2007 to December 2015, it compressed the now-deprecated pagecounts-raw dataset (which provides pageviews per project from December 2007 on, and pageviews per article from late 2011 on).
  • From December 2015 to the present, it compresses the pageviews dataset.

More information about each of those datasets can be found on their pages.

One hour skewing issue

The data in this dataset, when compared to the canonical Pageviews API, is skewed one hour to the left. This means that the value Pagecounts-EZ reports for midnight in reality corresponds to 11 PM of the previous day, and likewise for every other hour:

Dataset        12am  1am  2am  3am  4am  5am  6am  7am  8am  9am  10am  11am  12pm
Pageview API     23  234   43  345   64   12  534  654   43  645    98    65    75
Pagecounts EZ    89   23  234   43  345   64   12  534  654   43   645    98    65
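
One way to compensate for the skew when comparing the two sources is to move every Pagecounts-EZ timestamp back by one hour. The following Python sketch illustrates this with the sample values from the table above. The function name `realign_ez_hours` and the date used are hypothetical and chosen only for illustration; the sketch assumes the hourly counts have already been parsed into a dictionary keyed by timestamp.

```python
from datetime import datetime, timedelta


def realign_ez_hours(ez_counts):
    """Shift each Pagecounts-EZ hourly timestamp back by one hour.

    ez_counts maps a datetime (the hour as labelled in Pagecounts-EZ)
    to a count. The result maps the hour the count is assumed to
    belong to (per the skew described above) to the same count.
    """
    return {ts - timedelta(hours=1): count for ts, count in ez_counts.items()}


# Sample values from the table above; the date itself is arbitrary.
day = datetime(2016, 1, 1)
api_values = [23, 234, 43, 345, 64, 12, 534, 654, 43, 645, 98, 65, 75]
ez_values = [89, 23, 234, 43, 345, 64, 12, 534, 654, 43, 645, 98, 65]

api = {day + timedelta(hours=h): v for h, v in enumerate(api_values)}
ez = {day + timedelta(hours=h): v for h, v in enumerate(ez_values)}

realigned = realign_ez_hours(ez)

# After realignment, every hour present in both sources should agree.
matches = all(realigned[ts] == api[ts] for ts in realigned if ts in api)
print(matches)
```

Note that the realigned midnight value of Pagecounts-EZ lands on 11 PM of the previous day, so the first hour of any extracted range falls outside that range after correction.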

See also