Data Platform Engineering
This page is currently a draft. More information and discussion about changes to this draft can be found on the talk page.
For a demo of the proposed next iteration of how to structure this page and related content, see User:Triciaburmeister/Sandbox/Data_platform and related project updates in phab:T350911.
The infrastructure and services maintained by the Data Platform Engineering team support data producers and consumers in collecting, discovering, and using trustworthy data to derive insights, conduct research, and build new data products. To contact us, please use the following intake process.
Data Documentation
- Discover WMF data
- Traffic
- Content
- Edits and contributors
- Instrumentation
- Core & Essential Metrics
- MediaWiki replicas
- Access data
Analytics, Research and Reporting
- Query engines
- Choosing a query engine
- Reporting and Dashboards
- Superset
- Wikistats
- Web Analytics with Matomo
- Turnilo
- Exploration
- Jupyter notebooks
- Python, Conda & WMFdata Python library
- Analytics compute
Producing and Publishing Data
- Policies and guidelines
- Data Retention
- Data Publication
- Data Access
- Data Documentation
- Data Classification
- Privacy Policy
- Data lifecycle management process
- Data Issue Reporting
Data Collection and Transformation
- Data Collection
- Event Instrumentation
- Metrics Platform & Experimentation
- Data Ingestion
- Batch Transforms
- Airflow User Guide
- Report Updater / Refine / SystemD
- Streaming Transforms
- Event Platform
Data Platform Systems
- System architecture
- Data systems
- Runbooks
- Query and coding conventions
- Team processes
- Maintainer onboarding
- How we work
Search
- Search platform
- Wikidata Query Service (WDQS)