Data Platform/Data Lake/Edits/Metrics

From Wikitech

This table stores daily and monthly metrics computed over the denormalized MediaWiki history dataset. It is partitioned by snapshot, metric name, and wiki_db, which makes its data easier to use outside of Hive, namely for display in Dashiki.

Schema


col_name	data_type	comment
dt                  	string              	The date of this measurement, as YYYY-MM-DD
value               	bigint              	The measurement     
snapshot            	string              	Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              	string              	The metric being computed to measure
wiki_db             	string              	The wiki this measurement pertains to
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
snapshot            	string              	Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric              	string              	The metric being computed to measure
wiki_db             	string              	The wiki this measurement pertains to
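Because the table is partitioned on snapshot, metric, and wiki_db, a consumer outside of Hive can in principle locate the files for one metric of one wiki without scanning the rest. The following is a minimal sketch of that idea, assuming the table uses Hive's default column=value partition directory layout with the columns in the order listed above; this page does not confirm the storage layout, and the base directory in the usage example is a hypothetical placeholder.

```python
from pathlib import PurePosixPath

# Partition columns, in the order shown under "Partition Information".
PARTITION_COLUMNS = ("snapshot", "metric", "wiki_db")


def partition_path(base_dir: str, snapshot: str, metric: str, wiki_db: str) -> str:
    """Build a Hive-style partition directory path.

    Assumes the default Hive layout (one `column=value` directory per
    partition column). `base_dir` is supplied by the caller; the real
    location of the table is not given on this page.
    """
    values = {"snapshot": snapshot, "metric": metric, "wiki_db": wiki_db}
    path = PurePosixPath(base_dir)
    for column in PARTITION_COLUMNS:
        path = path / f"{column}={values[column]}"
    return str(path)


# Usage with a hypothetical base directory:
example = partition_path("/example/base/dir", "2023-04", "daily_edits", "enwiki")
# /example/base/dir/snapshot=2023-04/metric=daily_edits/wiki_db=enwiki
```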

As of May 2023, possible values for metric include:

1. daily_edits
2. daily_edits_by_anonymous_users
3. daily_edits_by_bot_users
4. daily_edits_by_registered_users
5. daily_unique_anonymous_editors
6. daily_unique_bot_editors
7. daily_unique_editors
8. daily_unique_page_creators
9. daily_unique_registered_editors
10. monthly_new_editors
11. monthly_new_registered_users
12. monthly_surviving_new_editors
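To read one of these metrics from Hive, a query would typically filter on all three partition columns so that only a single partition is scanned. The sketch below builds such a query as a string. The table name is passed in by the caller because this page does not state it, and `build_metric_query` and the name used in the usage example are hypothetical helpers for illustration, not part of any existing tooling.

```python
import re

# Metric names listed on this page as of May 2023.
KNOWN_METRICS = frozenset({
    "daily_edits",
    "daily_edits_by_anonymous_users",
    "daily_edits_by_bot_users",
    "daily_edits_by_registered_users",
    "daily_unique_anonymous_editors",
    "daily_unique_bot_editors",
    "daily_unique_editors",
    "daily_unique_page_creators",
    "daily_unique_registered_editors",
    "monthly_new_editors",
    "monthly_new_registered_users",
    "monthly_surviving_new_editors",
})

# Conservative whitelist for values interpolated into the query text.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")


def build_metric_query(table: str, wiki_db: str, metric: str, snapshot: str) -> str:
    """Return a HiveQL query for one metric of one wiki in one snapshot."""
    if metric not in KNOWN_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    for name, value in (("table", table), ("wiki_db", wiki_db), ("snapshot", snapshot)):
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsafe value for {name}: {value!r}")
    return "\n".join([
        "SELECT dt, value",
        f"FROM {table}",
        f"WHERE snapshot = '{snapshot}'",
        f"  AND metric = '{metric}'",
        f"  AND wiki_db = '{wiki_db}'",
        "ORDER BY dt",
    ])


# Usage with a hypothetical table name:
query = build_metric_query("example_db.example_metrics", "enwiki", "daily_edits", "2023-04")
```

Rejecting names that are not in the list catches typos early, at the cost of needing an update whenever a new metric is added.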

Definition

The Hive queries that generate these metrics are in the wikimedia/analytics-refinery repository, under oozie/mediawiki/history/metrics.