User:Ottomata/Data Platform History

From Wikitech

Timeline

2007 Webrequest logs over UDP

2010 Analytics Upgrade (strategy) defined as part of Wikimedia Strategy (warehouse for GLAM, base metrics, A/B testing)

2009-2010 ClickTracking and contributor data metrics

2012 Editor Engagement Experimentation team (E3) launched

2012 E3 creates EventLogging

2012 Analytics team formed

2012 Analytics takes on Wikistats 1

2012 Research team begins standardizing editor metrics

2012-2013 Wikimetrics tool for analyzing cohorts of wiki users

2012-2013 Analytics team builds Limn for public dashboarding

2012-2013 Hadoop (kraken -> analytics-hadoop cluster) + Oozie, Pig

2013 Wikimetrics for user cohort analysis

2013 Kafka chosen as replacement for UDP logging

2013-2014 varnishkafka and kafkatee

2013-2014 Camus deployed to import webrequest and eventlogging data from Kafka into Hadoop

2013-2014 Hive replacing Pig

2015 Dashiki for public dashboards

2015 Parquet introduced for refined Webrequests

2015 EventBus project (events for production)

2015 change-propagation

2015 Wikidata Query Service deployed (BlazeGraph)

2015 Spark introduced

2016 Matomo

2016 Druid + Turnilo (slides)

2016 AQS 1.0

2016 Wikistats 2.0 on AQS

2016 MediaWiki History Denormalized datasets

2017 Refine job created for EventLogging data in Hive

2017 MediaWiki Change-propagation JobQueue

2017 “Data Lake” term appears

2017 JupyterHub

2018 Superset

2018 Presto

2018 Modern Event Platform program begins

2018 EventGate replaces eventlogging-service-eventbus

2019 EventLogging analysis in MySQL sunsetted

2020 WDQS Flink-based streaming updater

2020 Metrics Platform program begins

2021 Analytics team renamed Data Engineering Team

2021 Airflow introduced

2022 AQS 2.0 project starts

2022 DataHub

2022 Iceberg introduced

2023 dse-k8s Kubernetes cluster

2023 Data Platform Engineering team creation and reorg

2023-2024 Flink enrichment and Search Update Pipeline

See also

Event_Platform#History

Original timeline draft (more detail)