The Wikimedia Foundation's Analytics Engineering team is part of the Technology department.
The Analytics Engineering Team's primary responsibility is to "empower and support data informed decision making across the Foundation and the Community".
We make Wikimedia-related data available for querying and analysis to both the WMF and the different wiki communities and stakeholders.
We develop infrastructure so all our users, both within the Foundation and within the different communities, can access data in a self-service fashion that is consistent with the values of the movement.
We keep all our documentation here on Wikitech. See also the FAQ.
About us - Analytics/Team
If you have questions about our work or the infrastructure we provide, you can contact us in three ways:
- on our public mailing list, email@example.com (subscribe, archives)
- in our public IRC channel, #wikimedia-analytics. You can use the keyword a-team to ping us, so we notice your question.
- during our office hours, which we have hosted on the second Monday of every month since January 14, 2019. Add them to your calendar, or let us know if that time is too early for you, and we can hold a second session when needed.
The analytics team uses Phabricator to track its projects.
- https://phabricator.wikimedia.org/tag/analytics/ for backlog triage
- https://phabricator.wikimedia.org/tag/analytics-kanban/ for in-progress tasks
We maintain various datasets, which you can browse in two ways:
By access system
- Data Lake [Hadoop cluster]
- AQS - Analytics Query Service [TODO - Create new page instead of redirect with Systems]
- Druid and Turnilo (formerly Pivot)
- ReportUpdater reports
- Wikistats 2
- Ad hoc datasets published with documentation by researchers and analysts at WMF
By data type
- Webrequests [Traffic logs] and derived tables, including:
- MediaWiki raw databases
- EventLogging (in the event database in Hive)
- Edits history, Page history, User history
- Other reports
Systems - Analytics/Systems
We maintain various systems that allow our datasets to be queried in different ways.
|System name and link||Type||Accessibility|
|Cluster (Hadoop, Camus, Hive, Oozie, Spark...)||Hadoop||Private|
|AQS - Analytics Query Service||REST API||Public|
|Druid - Fast OLAP||API + User Interface||Private|
|EventLogging||Ad-hoc streaming pipeline||Private|
|EventStreams||MediaWiki event streams||Public|
|Matomo (was Piwik)||Web Analytics (small scale)||Private|
|Wikistats (1 and 2)||Community Dashboard with top level metrics||Public|
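As a rough illustration of what querying a public system such as AQS can look like, here is a minimal Python sketch that builds a per-article pageviews request. The base URL, the path layout and the `timestamp`/`views` response fields are assumptions based on the public Wikimedia REST API rather than anything stated on this page, and the helper names `per_article_url` and `fetch_daily_views` are hypothetical. Check the AQS documentation before relying on any of it.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the public pageviews endpoints served through AQS.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Page titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([AQS_BASE, "per-article", project, access, agent,
                     title, granularity, start, end])


def fetch_daily_views(url, user_agent="aqs-example/0.1 (placeholder contact)"):
    """Fetch the URL and return (timestamp, views) pairs.

    Performs a network request. The User-Agent value is a placeholder;
    replace it with something that identifies your tool.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the response carries an "items" list with these two fields.
    return [(item["timestamp"], item["views"]) for item in payload["items"]]
```

For example, `per_article_url("en.wikipedia.org", "Albert Einstein", "20240101", "20240107")` builds the URL for one week of daily views, and passing that URL to `fetch_daily_views` performs the actual request.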
Try it out! Analytics/Tutorials
We'd love for you to have fun with our data :)
Please check the link above for something that might help you, and let us know if you don't find what you're after.
Table of Contents
Go to the Analytics/TOC page for a list of all the pages we have under Analytics.