Analytics

The Wikimedia Foundation's Analytics Engineering team is part of the Technology department.

The Analytics Engineering Team's primary responsibility is to "empower and support data informed decision making across the Foundation and the Community".

We make Wikimedia-related data available for querying and analysis to the WMF, the various wiki communities, and other stakeholders.

We develop infrastructure so all our users, both within the Foundation and within the different communities, can access data in a self-service fashion that is consistent with the values of the movement.

We keep all our documentation here on Wikitech. See also the FAQ.

About us - Analytics/Team

Contact

If you have questions about our work or the infrastructure we provide, you can contact us in two ways:

Work organization

The Analytics team uses Phabricator to track its projects.

Prioritization

Datasets

We maintain various datasets, and we provide two ways to access them:

By access system

By data type

Systems - Analytics/Systems

We maintain various systems that allow our datasets to be queried in different ways.

System name and link | Type | Accessibility
Cluster (Hadoop, Camus, Hive, Oozie, Spark...) | Hadoop | Private
AQS - Analytics Query Service | REST API | Public
Druid - Fast OLAP | API + User Interface | Private
EventLogging | Ad-hoc streaming pipeline | Private
EventStreams | MediaWiki event streams | Public
ReportUpdater | Job Scheduler | Private
Matomo (was Piwik) | Web Analytics (small scale) | Private
Archiva | Jar repository | Private
Kafka | Distributed log | Private
Dashiki | Dashboard builder | Public
Wikistats (1 and 2) | Community Dashboard with top level metrics | Public
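
As an illustration of what "Public" means for AQS, here is a minimal Python sketch of a client for its REST API. It assumes the per-article pageviews endpoint served under wikimedia.org/api/rest_v1 and a response whose "items" entries each carry a "views" count; check the AQS documentation before relying on either. The function names, default parameters, and User-Agent string are hypothetical and not part of any official client.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base path of the public per-article pageviews endpoint.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    start and end are date strings such as "20240101" (YYYYMMDD).
    """
    # Titles use underscores instead of spaces; everything else that is
    # not URL-safe (including "/") is percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([AQS_BASE, project, access, agent,
                     title, granularity, start, end])


def total_views(payload):
    """Sum the "views" field over the items of a decoded response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_pageviews(url):
    """Fetch and decode one response; performs a network request."""
    # Placeholder User-Agent: replace it with one that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "analytics-doc-example/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as total_views(fetch_pageviews(pageviews_url("en.wikipedia", "Albert Einstein", "20240101", "20240107"))) would then return the week's total, provided the endpoint behaves as assumed above.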

Try it out! Analytics/Tutorials

We'd love for you to have fun with our data :)

Please check the link above for something that might help you, and let us know if you don't find what you're after.

Table of Contents

See the Analytics/TOC page for a list of all pages under Analytics.