Data Platform/Systems
These subpages explain in technical detail the systems that process data for analytics at the Wikimedia Foundation.
They include information about setup, maintenance, architecture, and more.
Child Pages of Data Platform/Systems
AQS · Airflow · Analytics Meta · Archiva · Bigtop Packages · Ceph · Clients · CloudnativePG · Cluster · Conda · Coordinator · DB Replica · Dashiki · DataHub · Data Quality · Data deletion and sanitization · Dealing with data loss alarms · Druid · Edit data loading · Edit history administration · Edit serving layer · EventLogging · Event Data retention · Exporting from HDFS to Swift · Geolocation · Gobblin · Hadoop · Hadoop Event Ingestion Lifecycle · Hive · Hive to Druid Ingestion Pipeline · Iceberg · Java · Jupyter · Kerberos · Maintenance Schedule · Managing systemd timers · Manual maintenance · MariaDB · Matomo · Mediawiki History Snapshot Check · Mediawiki history reduced algorithm · Page and user history reconstruction · Page and user history reconstruction algorithm · Presto · Refine · Reportupdater · Revision augmentation and denormalization · Siege · Spark · Superset · System Users · Turnilo · Varnishkafka · Wikistats · Wikistats 2 · analytics.wikimedia.org · ua-parser
All Subpages of Data Platform/Systems
- AQS
- AQS/OpenAPI spec style guide
- AQS/Scaling
- AQS/Scaling/2016/Hardware Refresh
- AQS/Scaling/2017/Cluster Expansion
- AQS/Scaling/2020/Cluster Expansion
- AQS/Scaling/LoadTesting
- Airflow
- Airflow/Developer guide
- Airflow/Developer guide/Normalize a DAG
- Airflow/Developer guide/Python Job Repos
- Airflow/Instances
- Airflow/Kubernetes
- Airflow/Kubernetes/Operations
- Airflow/Upgrading
- Analytics Meta
- Archiva
- Bigtop Packages
- Ceph
- Clients
- CloudnativePG
- CloudnativePG/Clusters
- Cluster
- Cluster/Geotagging
- Cluster/Hadoop/Load
- Cluster/Spark History
- Conda
- Coordinator
- DB Replica
- Dashiki
- Dashiki/Configuration
- DataHub
- DataHub/Administration
- DataHub/Data Catalog Documentation Guide
- DataHub/Upgrading
- Data Quality
- Data deletion and sanitization
- Dealing with data loss alarms
- Druid
- Druid/Alerts
- Druid/Load test
- Edit data loading
- Edit history administration
- Edit serving layer
- EventLogging
- EventLogging/Administration
- EventLogging/Architecture
- EventLogging/Backfilling
- EventLogging/Data representations
- EventLogging/EventCapsule
- EventLogging/Monitoring
- EventLogging/NotErrorLogging
- EventLogging/Outages
- EventLogging/Performance
- EventLogging/Publishing
- EventLogging/Sanitization vs Aggregation
- EventLogging/Schema Guidelines
- EventLogging/Sensitive Fields
- EventLogging/TestingOnBetaCluster
- EventLogging/User agent sanitization
- Event Data retention
- Event Data retention/AppInstallId
- Exporting from HDFS to Swift
- Geolocation
- Gobblin
- Hadoop
- Hadoop/Administration
- Hadoop/Alerts
- Hadoop/Test
- Hadoop Event Ingestion Lifecycle
- Hive
- Hive/Alerts
- Hive/Avro
- Hive/Compression
- Hive/Counting uniques
- Hive/Queries
- Hive/Queries/Wikidata
- Hive/Querying using UDFs
- Hive to Druid Ingestion Pipeline
- Iceberg
- Iceberg/Migration Dependencies
- Java
- Jupyter
- Jupyter/Administration
- Kerberos
- Kerberos/Administration
- Maintenance Schedule
- Managing systemd timers
- Manual maintenance
- MariaDB
- Matomo
- Mediawiki History Snapshot Check
- Mediawiki history reduced algorithm
- Page and user history reconstruction
- Page and user history reconstruction algorithm
- Presto
- Presto/Administration
- Presto/Query Logger
- Refine
- Refine/Deploy Refinery
- Refine/Deploy Refinery-source
- Reportupdater
- Revision augmentation and denormalization
- Siege
- Spark
- Spark/Administration
- Superset
- Superset/Administration
- Superset/Date functions
- System Users
- Turnilo
- Varnishkafka
- Wikistats
- Wikistats/Deprecation of Wikistats 1
- Wikistats/Traffic
- Wikistats 2
- Wikistats 2/Map Component
- Wikistats 2/Metrics/FAQ
- Wikistats 2/Smoke Testing
- analytics.wikimedia.org
- ua-parser
- ua-parser/2019-09-18 Update