Analytics deployment train
- Only add items here that have already been merged.
- Link the Phabricator task and the Gerrit patch (or GitLab merge request).
- List the systems that need deploying, jar versions that need bumping, and jobs that need restarting, if any.
Extra points if you include what to run and where to run it (e.g. stat1007, an-coord1001, ...).
- Do you have a way of checking that the deployment was successful? If so, note it.
- Don't move items to "ready to deploy" on the kanban board unless they are documented here.
- See the "The Data Engineering deployment train" section of Data_Engineering/Ops_week for a pointer about Wikistats, as well as links for various types of deployments.
- To see the old log, go to https://etherpad.wikimedia.org/p/analytics-weekly-train/timeslider#59747.
Use the log below from now on. Eventually we could move to sub-pages or templates to streamline this.
YYYY-MM-DD NEXT TUESDAY TRAIN (REPLACE THIS AFTER DEPLOY)
Next deployment
2025-12-03
Deployer: Antoine
Refinery:
- task T409584 Add JA3N User-Agent queries https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1212214 and https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1213488 and https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1213522 (no need to do anything else!)
2025-11-18
Deployer: Marcel and Javier
Refinery:
- task T405039 - Add HQL for edit_per_editor_per_page_daily and pageview_per_editor_per_page_daily https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1196892
DONE
2025-11-12
Deployer: Joal
Refinery-source:
- task T406531 - Add new referral sources to pageview data https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203389
- task T408178 - Remove mediawiki.wikistories_* sanitization allowlist entries https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1202718
- task T407239 - Fix duplicate Pageview metrics records in data quality tables https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203129
- task T406000 - Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1 (Gerrit change 1202334)
- 1203124: Fix bug in MW Dumper where vertical bars (`|`) were not being honored | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203124
- After the refinery-source release, we should:
- merge https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1795 so that the File Export DAGs pick up this fix
- wait until the merge request is deployed to the main Airflow instance
- delete the DagProperties variable at https://airflow.wikimedia.org/variable/edit/372, so that the auto-regenerated one points to the new jar
- resume the following DAGs, which have been cleared and are ready to go:
Airflow:
- task T406531 - Add new referral sources to pageview data - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1796
- task T409470 - Fix mediawiki_history_dumps failure - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1797
- https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1795 (see above in refinery-source section)
2025-11-05
Deployer: Joseph
Refinery Source:
- 1199485: Add data quality check for Pageview human/bot ratio anomaly | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1199485
- task T406531 - Add new referral sources to pageview data https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198313
- Mediawiki-History Bug fix: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1202191
Airflow:
- T407239 - Add DAG to run a daily human-to-bot pageview ratio check https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1776 Deploy this MR after Refinery Source is deployed; it needs refinery-job jar v0.3.7.
- task T406531 - Add new referral sources to pageview data https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1780 Deploy this MR after Refinery Source is deployed; it needs refinery-hive jar v0.3.7.
2025-10-29
Deployer: Sandra
Refinery Source:
- 1198080: Fix various bugs in the MW Dumper code | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198080
- 1198152: Add utility to create SHA256 fingerprints of the files in a given HDFS folder | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198152
2025-10-22
To-be deployer: Aleksander
- Refinery Source
- Add user_central_id to the mediawiki_history dataset(s) https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1194951
2025-10-14
To-be deployer: Marcel
- Refinery
- task T405533 - Unique devices data uses non-standard domains for Wikidata, Wikifunctions, and MediaWiki.org https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1194885 . Note: This task has a pending Airflow patch to be merged/deployed once this one is deployed: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1743 [DONE]
- task T406000 - Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1196716 Nullify sha1 in Sqoop [DONE]
- Refinery Source
- T365203 - Add check for wikis count to Mediawiki history data quality checks https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1193440 [DONE]
- T365203 - Bug Fix: Add support for Deequ Metric value Distribution data type https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1195268 [DONE]
- task T406000 - Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196049 and https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196469. Note: This patch needs a related Airflow patch: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1750. Also related: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196485 [DONE]
- task T384945 - Modify code to dump all slots, and adapt MW Content pipelines to the removal of upstream revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1195330 [DONE]