Analytics/Ad hoc datasets
This page describes ad hoc datasets – a directory hosting datasets published by researchers and analysts working with the Wikimedia Foundation. For guidelines on how to formally release an open dataset (with metadata and persistent identifiers), please refer to Data releases. For regular, structured, and maintained datasets, please see Analytics#Datasets.
This data is produced by people studying private data on the WMF Analytics Cluster. It is aggregated so that it poses no privacy concern and then shared here.
If you're looking for data here, note that some of it may not be maintained or documented. If possible, please reach out to the authors of the data, or to Analytics/Team, for help. If you're publishing data here, follow the guidelines in the README on the server:
- Please name your folders in a friendly way; think of strangers browsing through this data
- Take a look at https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater for ongoing reports
- Always remember: be careful what you share here
To share data via this server, copy safe, public data to /srv/published-datasets/ on any of the stat machines (e.g. stat1007.eqiad.wmnet). As an example, Analytics/Reportupdater jobs copy their output to