Data Platform/Systems
These subpages explain in technical detail the systems that process data for analytics at the Wikimedia Foundation.
They include information about setup, maintenance, architecture, and more.
Child Pages of Data Platform/Systems
AQS · Airflow · Analytics Meta · Archiva · Bigtop Packages · Ceph · Clients · CloudnativePG · Cluster · Conda · Coordinator · DB Replica · Dashiki · DataHub · Data Quality · Data deletion and sanitization · Dealing with data loss alarms · Druid · Edit data loading · Edit history administration · Edit serving layer · EventLogging · Event Data retention · Exporting from HDFS to Swift · Geolocation · Gobblin · Hadoop · Hadoop Event Ingestion Lifecycle · Hive · Hive to Druid Ingestion Pipeline · Iceberg · Java · Jupyter · Kerberos · Maintenance Schedule · Managing systemd timers · Manual maintenance · MariaDB · Matomo · Mediawiki History Snapshot Check · Mediawiki history reduced algorithm · Page and user history reconstruction · Page and user history reconstruction algorithm · Presto · Refine · Reportupdater · Revision augmentation and denormalization · Siege · Spark · Superset · System Users · Turnilo · Varnishkafka · Wikistats · Wikistats 2 · analytics.wikimedia.org · ua-parser
All Subpages of Data Platform/Systems
- AQS
- AQS/OpenAPI spec style guide
- AQS/Scaling
- AQS/Scaling/2016/Hardware Refresh
- AQS/Scaling/2017/Cluster Expansion
- AQS/Scaling/2020/Cluster Expansion
- AQS/Scaling/LoadTesting
- Airflow
- Airflow/Developer guide
- Airflow/Developer guide/Normalize a DAG
- Airflow/Developer guide/Python Job Repos
- Airflow/Instances
- Airflow/Kubernetes
- Airflow/Upgrading
- Analytics Meta
- Archiva
- Bigtop Packages
- Ceph
- Clients
- CloudnativePG
- CloudnativePG/Clusters
- Cluster
- Cluster/Geotagging
- Cluster/Hadoop/Load
- Cluster/Spark History
- Conda
- Coordinator
- DB Replica
- Dashiki
- Dashiki/Configuration
- DataHub
- DataHub/Administration
- DataHub/Data Catalog Documentation Guide
- DataHub/Upgrading
- Data Quality
- Data deletion and sanitization
- Dealing with data loss alarms
- Druid
- Druid/Alerts
- Druid/Load test
- Edit data loading
- Edit history administration
- Edit serving layer
- EventLogging
- EventLogging/Administration
- EventLogging/Architecture
- EventLogging/Backfilling
- EventLogging/Data representations
- EventLogging/EventCapsule
- EventLogging/Monitoring
- EventLogging/NotErrorLogging
- EventLogging/Outages
- EventLogging/Performance
- EventLogging/Publishing
- EventLogging/Sanitization vs Aggregation
- EventLogging/Schema Guidelines
- EventLogging/Sensitive Fields
- EventLogging/TestingOnBetaCluster
- EventLogging/User agent sanitization
- Event Data retention
- Event Data retention/AppInstallId
- Exporting from HDFS to Swift
- Geolocation
- Gobblin
- Hadoop
- Hadoop/Administration
- Hadoop/Alerts
- Hadoop/Test
- Hadoop Event Ingestion Lifecycle
- Hive
- Hive/Alerts
- Hive/Avro
- Hive/Compression
- Hive/Counting uniques
- Hive/Queries
- Hive/Queries/Wikidata
- Hive/Querying using UDFs
- Hive to Druid Ingestion Pipeline
- Iceberg
- Iceberg/Migration Dependencies
- Java
- Jupyter
- Jupyter/Administration
- Kerberos
- Kerberos/Administration
- Maintenance Schedule
- Managing systemd timers
- Manual maintenance
- MariaDB
- Matomo
- Mediawiki History Snapshot Check
- Mediawiki history reduced algorithm
- Page and user history reconstruction
- Page and user history reconstruction algorithm
- Presto
- Presto/Administration
- Presto/Query Logger
- Refine
- Refine/Deploy Refinery
- Refine/Deploy Refinery-source
- Reportupdater
- Revision augmentation and denormalization
- Siege
- Spark
- Spark/Administration
- Superset
- Superset/Administration
- Superset/Date functions
- System Users
- Turnilo
- Varnishkafka
- Wikistats
- Wikistats/Deprecation of Wikistats 1
- Wikistats/Traffic
- Wikistats 2
- Wikistats 2/Map Component
- Wikistats 2/Metrics/FAQ
- Wikistats 2/Smoke Testing
- analytics.wikimedia.org
- ua-parser
- ua-parser/2019-09-18 Update