Analytics/Data Lake

From Wikitech

This page is the entry point of the Analytics Data Lake (ADL) documentation. The ADL is a large, analytics-oriented repository of data, both raw and aggregated, about Wikimedia projects (in industry terms, a data lake). All of the data in the lake can be accessed through systems that allow the different datasets to be joined together.

  • Traffic data -- webrequest, pageviews, unique devices ...
  • Edits data -- Historical data about revisions, pages, and users [in beta as of 2017-04-07].
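Because traffic and edits data live in the same lake, they can be combined in one query. The sketch below only illustrates the join semantics on made-up rows: it totals views per page, counts revisions per page, and joins the two on a (project, page title) key, keeping pages that have views but no edits. The record types and field names (PageviewRow, RevisionRow, project, page_title, views, user) are hypothetical and are not the actual Data Lake table schemas.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record shapes for illustration only; NOT the real Data Lake schemas.
@dataclass(frozen=True)
class PageviewRow:
    project: str      # e.g. "en.wikipedia"
    page_title: str
    views: int


@dataclass(frozen=True)
class RevisionRow:
    project: str
    page_title: str
    user: str


def join_views_and_edits(pageviews, revisions):
    """Hash-join per-page view totals with per-page edit counts.

    Pages are keyed on (project, page_title). This is a left outer join on
    the traffic side: a page with views but no edits gets an edit count of 0.
    """
    views = Counter()
    for row in pageviews:
        views[(row.project, row.page_title)] += row.views

    edits = Counter((r.project, r.page_title) for r in revisions)

    # Counter returns 0 for missing keys (without inserting them),
    # which supplies the outer-join default.
    return {key: (total, edits[key]) for key, total in views.items()}


if __name__ == "__main__":
    pageviews = [
        PageviewRow("en.wikipedia", "Data_lake", 120),
        PageviewRow("en.wikipedia", "Data_lake", 30),
        PageviewRow("en.wikipedia", "Hadoop", 75),
    ]
    revisions = [
        RevisionRow("en.wikipedia", "Data_lake", "Alice"),
        RevisionRow("en.wikipedia", "Data_lake", "Bob"),
    ]
    joined = join_views_and_edits(pageviews, revisions)
    for (project, title), (v, e) in sorted(joined.items()):
        print(f"{project} {title}: {v} views, {e} edits")
```

Run as a script, this prints 150 views and 2 edits for Data_lake, and 75 views and 0 edits for Hadoop. In practice such a join would be run by the lake's query systems over the full datasets; the in-memory version is only meant to show what "joining traffic and edits data" means.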

As the Data Lake matures, we will add any and all data we can, and will try to make as much of it public as can be done safely.

For technical aspects of the data lake pipelines, see Analytics/Systems/Data Lake.