User:Ottomata/Data Platform History
Timeline
2007 Webrequest logs over UDP
2009-2010 ClickTracking and contributor data metrics
2010 Analytics Upgrade (strategy) defined as part of Wikimedia Strategy (warehouse for GLAM, base metrics, A/B testing)
2012 Editor Engagement Experimentation team (E3) launched
2012 E3 creates EventLogging
2012 Analytics team formed
2012 Analytics takes on Wikistats 1
2012 Research team begins standardizing editor metrics
2012-2013 WikiMetrics tool for analyzing cohorts of wiki users
2012-2013 Analytics team builds Limn for public dashboarding
2012-2013 - Hadoop (kraken -> analytics-hadoop cluster) + Oozie, Pig
2013 Kafka chosen as replacement for UDP logging
2013-2014 varnishkafka and kafkatee
2013-2014 Camus deployed to import webrequest and eventlogging data from Kafka into Hadoop
2013-2014 Hive replaces Pig
2015 Dashiki for public dashboards
2015 Parquet introduced for refined Webrequests
2015 EventBus project (events for production)
2015 change-propagation
2015 Wikidata Query Service deployed (Blazegraph)
2015 Spark introduced
2016 Matomo
2016 AQS 1.0
2016 Wikistats 2.0 on AQS
2016 MediaWiki History Denormalized datasets
2017 Refine job created for EventLogging data in Hive
2017 MediaWiki Change-propagation JobQueue
2017 “Data Lake” term appears
2017 JupyterHub
2018 Superset
2018 Presto
2018 Modern Event Platform program begins
2018 EventGate replaces eventlogging-service-eventbus
2019 EventLogging analysis in MySQL sunsetted
2020 WDQS Flink-based streaming updater
2020 Metrics Platform program begins
2021 Analytics team renamed Data Engineering Team
2021 Airflow introduced
2022 AQS 2.0 project starts
2022 DataHub
2022 Iceberg introduced
2023 dse-k8s Kubernetes cluster
2023 Data Platform Engineering team creation and reorg
2023-2024 Flink enrichment and Search Update Pipeline
See also
Original timeline draft (more detail)