The Commons Impact Metrics dumps consist of 5 datasets updated on a monthly schedule. They are formatted as TSV (tab-separated values) and compressed with Bzip2. Some fields contain lists of strings, in which case the strings are separated by | (pipe) symbols. You can download them from https://dumps.wikimedia.org/other/commons_impact_metrics/readme.html.
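As a rough illustration of the format described above, the following Python sketch reads a Bzip2-compressed TSV file and splits the pipe-separated list fields. The helper names (`read_dump`, `parse_list`, `LIST_FIELDS`) are hypothetical, and the sketch assumes that the first line of each file is a header row and that values are not quoted; if the files have no header, pass the field names explicitly.

```python
import bz2
import csv
import io

# Fields documented as list<string>; their values are pipe-separated.
LIST_FIELDS = ("parent_categories", "primary_categories", "categories")


def parse_list(value):
    """Split a pipe-separated list field; empty or missing values give []."""
    return value.split("|") if value else []


def read_dump(source, fieldnames=None):
    """Yield one dict per row of a Bzip2-compressed TSV file.

    source is a path or a binary file object. If fieldnames is None, the
    first line is treated as a header row (an assumption about the files).
    """
    with bz2.open(source, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.DictReader(
            text, fieldnames=fieldnames, delimiter="\t", quoting=csv.QUOTE_NONE
        )
        for row in reader:
            for name in LIST_FIELDS:
                if name in row:
                    row[name] = parse_list(row[name])
            yield row


# Usage with a small in-memory sample instead of a downloaded file.
sample = "category\tparent_categories\tmedia_file_count\nFoo_Bar\tA|B\t3\n"
rows = list(read_dump(io.BytesIO(bz2.compress(sample.encode("utf-8")))))
print(rows[0]["parent_categories"])
```

All values other than the list fields are left as strings, so numeric fields such as media_file_count need an explicit `int(...)` conversion before use.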
Category metrics snapshot
| Field | Type | Description |
|---|---|---|
| category | string | The name of the category this row refers to. Matches the page title of the category page in Commons, in URL form (with underscores). |
| parent_categories | list<string> | The immediate ancestor (parent) category names of this row’s category. |
| primary_categories | list<string> | The top ancestor category names of this row’s category. They should be in the Commons institution category allow-list. Ideally, there should be only one primary category, but since we cannot control that from MediaWiki, we accept multiple primary categories. |
| media_file_count | int | The number of media files contained in this (shallow) category. |
| media_file_count_deep | int | The number of media files contained in this (deep) category tree. Only available for primary allow-listed categories. |
| used_media_file_count | int | The number of media files from this (shallow) category featured in at least one wiki page. |
| used_media_file_count_deep | int | The number of media files from this (deep) category tree featured in at least one wiki page. Only available for primary allow-listed categories. |
| leveraging_wiki_count | int | The number of wikis featuring at least one of this (shallow) category’s media files. |
| leveraging_wiki_count_deep | int | The number of wikis featuring at least one of this (deep) category tree’s media files. Only available for primary allow-listed categories. |
| leveraging_page_count | int | The number of (namespace=0) pages featuring at least one of this (shallow) category’s media files. |
| leveraging_page_count_deep | int | The number of (namespace=0) pages featuring at least one of this (deep) category tree’s media files. Only available for primary allow-listed categories. |
| month | string | The month after the end of which we calculate the data (YYYY-MM). For example, if we are calculating the data after March 2024 (even if it’s, e.g., April 4th), the value should be “2024-03”. This keeps the value consistent with the sibling incremental datasets (Pageviews by category, Pageviews by media file, and Edits). |
Notes:
- Each row corresponds to a category or sub-category.
- The metric values (int) are not aggregatable. All queries to this table should always filter or break down by category and month.
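The month convention and the "filter by category and month" rule can be sketched as follows. This is only an illustration: `snapshot_month` and `category_snapshot` are hypothetical helpers, the sketch assumes the calculation runs during the month right after the one being labeled (as in the April 4th example), and it assumes there is at most one row per category and month.

```python
from datetime import date


def snapshot_month(run_date):
    """Return the YYYY-MM label of the month that ended before run_date."""
    year, month = run_date.year, run_date.month - 1
    if month == 0:  # a January run labels the previous December
        year, month = year - 1, 12
    return f"{year:04d}-{month:02d}"


def category_snapshot(rows, category, month):
    """Return the snapshot row for (category, month), or None if absent.

    The metrics are read from one row as-is; they are never summed
    across categories or months.
    """
    for row in rows:
        if row["category"] == category and row["month"] == month:
            return row
    return None


# Made-up sample rows, with values kept as strings as read from a TSV file.
rows = [
    {"category": "Foo", "month": "2024-02", "media_file_count": "10"},
    {"category": "Foo", "month": "2024-03", "media_file_count": "12"},
]
label = snapshot_month(date(2024, 4, 4))
latest = category_snapshot(rows, "Foo", label)
print(label, latest["media_file_count"])
```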
Media file metrics snapshot
| Field | Type | Description |
|---|---|---|
| media_file | string | The name of the media file this row refers to. Matches the page title of the media file page in Commons, in URL form (with underscores). |
| media_type | string | The media type of the media file, coming from the Image table (img_media_type): BITMAP, VIDEO, etc. |
| categories | list<string> | The category names that the media file is directly associated with. |
| primary_categories | list<string> | The top ancestor category names of the media file. They should be in the Commons institution category allow-list. Ideally, there should be only one primary category, but since we cannot control that from MediaWiki, we accept multiple primary categories. |
| leveraging_wiki_count | long | The number of wikis featuring this media file in at least one (namespace=0) page. |
| leveraging_page_count | long | The number of (namespace=0) pages featuring this media file across all wikis. |
| month | string | The month after the end of which we calculate the data (YYYY-MM). For example, if we are calculating the data after March 2024 (even if it’s, e.g., April 4th), the value should be “2024-03”. This keeps the value consistent with the sibling incremental datasets (Pageviews by category, Pageviews by media file, and Edits). |
Notes:
- Each row corresponds to a media file. Media files that are not used in any wiki (leveraging_wiki_count=0) do not appear in this dataset.
- The metric values are not aggregatable. Queries to this dataset should always filter or break down by media_file and month.
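Since the values cannot be summed, a typical query over this dataset compares media files within one month rather than totaling them. The sketch below ranks one month's media files in a primary category by leveraging_page_count. The function name `top_media_files` is hypothetical, and the rows are assumed to have primary_categories already split into a list.

```python
def top_media_files(rows, month, primary_category, limit=10):
    """Rank one month's media files in a primary category by page usage."""
    selected = [
        row
        for row in rows
        if row["month"] == month and primary_category in row["primary_categories"]
    ]
    # Sort by the per-file metric; nothing is summed across files.
    selected.sort(key=lambda row: int(row["leveraging_page_count"]), reverse=True)
    return [
        (row["media_file"], int(row["leveraging_page_count"]))
        for row in selected[:limit]
    ]


# Made-up sample rows.
rows = [
    {"media_file": "A.jpg", "primary_categories": ["Museum_X"],
     "leveraging_page_count": "4", "month": "2024-03"},
    {"media_file": "B.jpg", "primary_categories": ["Museum_X", "Museum_Y"],
     "leveraging_page_count": "9", "month": "2024-03"},
    {"media_file": "B.jpg", "primary_categories": ["Museum_X", "Museum_Y"],
     "leveraging_page_count": "7", "month": "2024-02"},
]
ranking = top_media_files(rows, "2024-03", "Museum_X")
print(ranking)
```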
Pageviews by category
| Field | Type | Description |
|---|---|---|
| category | string | The name of the category this row refers to. Matches the page title of the category page in Commons, in URL form (with underscores). |
| category_scope | string | Either “shallow” (meaning only media files directly associated with the category were used to aggregate pageviews) or “deep” (meaning all media files within the category and all its recursive subcategories were used to aggregate pageviews). |
| primary_categories | list<string> | The top ancestor category names of this row’s category. They should be in the Commons institution category allow-list. Ideally, there should be only one primary category, but since we cannot control that from MediaWiki, we accept multiple primary categories. |
| wiki | string | The canonical name of the viewed wiki, e.g. “en.wikipedia” or “fr.wiktionary”. Only wikis that feature at least one media file of the corresponding category will appear here. |
| page_title | string | The title of the viewed (namespace=0) page, in URL form (with underscores). Only (namespace=0) pages featuring at least one media file of the corresponding category will appear here. |
| pageview_count | long | Aggregated pageview count for (namespace=0) pages featuring at least one media file from the category/scope. Rows with pageview_count=0 should be omitted. |
| month | string | The month for which we aggregate the data (YYYY-MM). |
Notes:
- This dataset aggregates pageview counts for (namespace=0) wiki pages that include media files belonging to the specified category.
- Each category (or sub-category) has 1 row for each page that includes its media files. Each page will have the corresponding pageview count.
- Primary categories have data for category_scope=shallow (media files associated directly with them) and for category_scope=deep (media files belonging to their whole category tree). Sub-categories only have shallow data.
- You can aggregate the pageview_count value only across the wiki, page_title, and month dimensions. All queries to this table should always filter or break down by category and category_scope.
- Pageviews to the Main page are not counted.
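The aggregation rule in the notes above can be sketched as follows: fix one category and one category_scope, then sum pageview_count over wikis and pages, here grouped by month. The function name `monthly_pageviews` is hypothetical, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def monthly_pageviews(rows, category, category_scope):
    """Sum pageview_count per month for one category and one scope.

    Shallow and deep rows are never mixed, and rows of different
    categories are never added together.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["category"] == category and row["category_scope"] == category_scope:
            totals[row["month"]] += int(row["pageview_count"])
    return dict(totals)


# Made-up sample rows.
rows = [
    {"category": "Museum_X", "category_scope": "deep", "wiki": "en.wikipedia",
     "page_title": "Page_A", "pageview_count": "100", "month": "2024-03"},
    {"category": "Museum_X", "category_scope": "deep", "wiki": "fr.wiktionary",
     "page_title": "Page_B", "pageview_count": "20", "month": "2024-03"},
    {"category": "Museum_X", "category_scope": "shallow", "wiki": "en.wikipedia",
     "page_title": "Page_A", "pageview_count": "100", "month": "2024-03"},
    {"category": "Museum_X", "category_scope": "deep", "wiki": "en.wikipedia",
     "page_title": "Page_A", "pageview_count": "70", "month": "2024-02"},
]
deep_totals = monthly_pageviews(rows, "Museum_X", "deep")
print(deep_totals)
```

Summing the shallow and deep rows together would count the views of Page_A twice, which is why the scope is always part of the filter.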
Pageviews by media file
| Field | Type | Description |
|---|---|---|
| media_file | string | The name of the media file this row refers to. Matches the page title of the media file page in Commons, in URL form (with underscores). |
| categories | list<string> | The category names that the media file is directly associated with. |
| primary_categories | list<string> | The top ancestor category names of the media file. They should be in the Commons institution category allow-list. Ideally, there should be only one primary category, but since we cannot control that from MediaWiki, we accept multiple primary categories. |
| wiki | string | The canonical name of the viewed wiki, e.g. “en.wikipedia” or “fr.wiktionary”. Only wikis that feature the media file at least once will appear here. |
| page_title | string | The title of the viewed (namespace=0) page, in URL form (with underscores). Only (namespace=0) pages featuring the media file will appear here. |
| pageview_count | long | Aggregated pageview count for (namespace=0) pages featuring the media file. Rows with pageview_count=0 should be omitted. |
| month | string | The month for which we aggregate the data (YYYY-MM). |
Notes:
- This dataset aggregates pageview counts for (namespace=0) wiki pages that include the specified media files.
- Each media file has 1 row for each page that includes it. Each page will have the corresponding pageview count.
- You can aggregate the pageview_count value only across the wiki, page_title, and month dimensions. All queries to this table should always filter or break down by media_file.
- Pageviews to the Main page are not counted.
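As an illustration of the notes above, the sketch below fixes one media file and sums its pageview_count over months to find the pages that bring it the most views. The function name `top_pages` is hypothetical, and the sample rows are made up.

```python
from collections import Counter


def top_pages(rows, media_file, limit=5):
    """Return the (wiki, page_title) pairs with the most views for one file."""
    totals = Counter()
    for row in rows:
        if row["media_file"] == media_file:
            # Summing over months is allowed once media_file is fixed.
            totals[(row["wiki"], row["page_title"])] += int(row["pageview_count"])
    return totals.most_common(limit)


# Made-up sample rows.
rows = [
    {"media_file": "A.jpg", "wiki": "en.wikipedia", "page_title": "Page_A",
     "pageview_count": "50", "month": "2024-02"},
    {"media_file": "A.jpg", "wiki": "en.wikipedia", "page_title": "Page_A",
     "pageview_count": "60", "month": "2024-03"},
    {"media_file": "A.jpg", "wiki": "fr.wiktionary", "page_title": "Page_B",
     "pageview_count": "30", "month": "2024-03"},
    {"media_file": "B.jpg", "wiki": "en.wikipedia", "page_title": "Page_A",
     "pageview_count": "60", "month": "2024-03"},
]
best = top_pages(rows, "A.jpg")
print(best)
```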
Edits
| Field | Type | Description |
|---|---|---|
| user_name | string | The user name of the user who performed the edit. This is resolved from the actor table’s actor_name. If no actor is found, it is set to ‘anonymous’. If it has been suppressed, it is set to ‘redacted’. |
| edit_type | string | Either “create” (for the first revision of a media file page), or “update” (for all other revisions of the media file page). |
| media_file | string | The name of the edited media file. Matches the page title of the media file page in Commons, in URL form (with underscores). |
| categories | list<string> | The category names that the media file is directly associated with. |
| primary_categories | list<string> | The top ancestor category names of the media file. They should be in the Commons institution category allow-list. Ideally, there should be only one primary category, but since we cannot control that from MediaWiki, we accept multiple primary categories. |
| dt | timestamp | The timestamp of the edit. |
Notes:
- This is an event-based dataset: each row corresponds to an edit event.
- You can aggregate across any set of dimensions.
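Because each row is a single edit event, counting rows along any combination of fields is a valid aggregation. The sketch below shows two such counts. The function names are hypothetical, primary_categories is assumed to be already split into a list, and counting an edit once for each of its primary categories is a choice made for this example.

```python
from collections import Counter


def edits_by_user_and_type(rows):
    """Count edit events per (user_name, edit_type) pair."""
    return Counter((row["user_name"], row["edit_type"]) for row in rows)


def edits_by_primary_category(rows):
    """Count edit events per primary category.

    An edit to a file with several primary categories is counted once
    for each of them, so these counts can add up to more than the
    number of rows.
    """
    counts = Counter()
    for row in rows:
        counts.update(row["primary_categories"])
    return counts


# Made-up sample rows.
rows = [
    {"user_name": "Alice", "edit_type": "create", "media_file": "A.jpg",
     "primary_categories": ["Museum_X"]},
    {"user_name": "Alice", "edit_type": "update", "media_file": "A.jpg",
     "primary_categories": ["Museum_X"]},
    {"user_name": "anonymous", "edit_type": "update", "media_file": "B.jpg",
     "primary_categories": ["Museum_X", "Museum_Y"]},
]
per_user = edits_by_user_and_type(rows)
per_category = edits_by_primary_category(rows)
print(per_user, per_category)
```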