Analytics/AQS/Wikistats 2/Metrics Definition

From Wikitech

This page defines the metrics presented in Wikistats 2 and its API.

Dimensions

Time boundaries - [start; end [

The time interval over which a metric is computed: start is included and end is excluded. Every metric defined below requires it.
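A minimal Python sketch of the half-open interval, assuming timestamps are naive `datetime` values. The function name `in_time_boundaries` is hypothetical. Because the end is excluded, two consecutive periods that share a boundary never count the same instant twice.

```python
from datetime import datetime


def in_time_boundaries(ts: datetime, start: datetime, end: datetime) -> bool:
    """Return True if ts lies in the half-open interval [start; end[."""
    return start <= ts < end


# Two consecutive monthly periods sharing the 2023-02-01 boundary.
january = (datetime(2023, 1, 1), datetime(2023, 2, 1))
february = (datetime(2023, 2, 1), datetime(2023, 3, 1))
boundary = datetime(2023, 2, 1)

# The boundary instant belongs to February only.
print(in_time_boundaries(boundary, *january), in_time_boundaries(boundary, *february))  # False True
```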

Time granularity - {daily, monthly}

Aggregation period for the metric. Daily and monthly aggregation periods are available for almost all metrics.
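The sketch below illustrates how a granularity assigns each event to one aggregation period. The `period_key` helper and its label format (`YYYY-MM-DD` for daily, `YYYY-MM` for monthly) are hypothetical choices for illustration, not the format used by the Wikistats 2 API.

```python
from datetime import datetime


def period_key(ts: datetime, granularity: str) -> str:
    """Return the label of the daily or monthly period containing ts."""
    if granularity == "daily":
        return ts.strftime("%Y-%m-%d")
    if granularity == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unsupported granularity: {granularity!r}")


ts = datetime(2023, 2, 14, 9, 30)
print(period_key(ts, "daily"), period_key(ts, "monthly"))  # 2023-02-14 2023-02
```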

Project

Filter for the wiki project you're interested in, represented as its main internet domain: en.wikipedia.org or www.wikidata.org for instance. The metrics cover almost all wiki projects (see Analytics/AQS/Wikistats 2/Data Quality#Other things).

For metrics that make sense across multiple projects, you can use all-projects to get the value across all projects. Project families can be used for all metrics except number of edited pages and number of editors (for technical reasons; we plan to overcome that limitation in the future). For instance, all-wikipedia-projects gives statistics aggregated over all Wikipedias, and all-wikivoyage-projects gives statistics aggregated over the Wikivoyage project family. Note: Wikidata is not a project family but a project on its own, so all-wikidata-projects should not be used.
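A sketch of how all-projects and project-family values relate to per-project values for an additive metric such as number of edits. The family-from-domain rule and the `STANDALONE_PROJECTS` set are hypothetical simplifications, not how Wikistats 2 actually resolves families, and the numbers are made up. Wikidata is kept out of every family, as the note above requires, but still counts toward all-projects.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical: projects that belong to no family (the page names Wikidata as one).
STANDALONE_PROJECTS = {"www.wikidata.org"}


def family_key(domain: str) -> Optional[str]:
    """Map e.g. 'en.wikipedia.org' to 'all-wikipedia-projects'; None for standalone projects."""
    if domain in STANDALONE_PROJECTS:
        return None
    return f"all-{domain.split('.')[-2]}-projects"


def aggregate(per_project: Dict[str, int]) -> Dict[str, int]:
    """Sum an additive per-project value into all-projects and per-family totals."""
    totals: Dict[str, int] = defaultdict(int)
    for domain, value in per_project.items():
        totals["all-projects"] += value
        key = family_key(domain)
        if key is not None:
            totals[key] += value
    return dict(totals)


per_project = {"en.wikipedia.org": 120, "fr.wikipedia.org": 80,
               "en.wikivoyage.org": 5, "www.wikidata.org": 300}
print(aggregate(per_project))
# {'all-projects': 505, 'all-wikipedia-projects': 200, 'all-wikivoyage-projects': 5}
```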

Editor-type - {anonymous, group-bot, name-bot, user, all-editor-types}

Filter for the type of editor who performed the related action. Can be anonymous for users who are not logged in, group-bot for logged-in users who are part of the bot group, name-bot for logged-in users whose name contains `bot` (a high probability of being a bot, even though counterexamples exist), and user for logged-in users in neither the group-bot nor the name-bot set. Finally, you can use all-editor-types not to filter by editor type.
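A minimal sketch of the editor-type classification described above. Two details are assumptions rather than statements from this page: bot-group membership is checked before the name test, and the name test is a case-insensitive substring match. The group name `"bot"` is also assumed. The last example shows the kind of counterexample the page mentions.

```python
from typing import Optional, Set


def editor_type(user_name: Optional[str], groups: Set[str]) -> str:
    """Classify an editor; user_name is None for editors who are not logged in."""
    if user_name is None:
        return "anonymous"
    # Assumption: bot-group membership takes precedence over the name check.
    if "bot" in groups:
        return "group-bot"
    # Assumption: the name check is a case-insensitive substring match.
    if "bot" in user_name.lower():
        return "name-bot"
    return "user"


print(editor_type(None, set()), editor_type("Alice", {"bot"}),
      editor_type("ExampleBot", set()), editor_type("Alice", set()))
# anonymous group-bot name-bot user

# A counterexample: a human whose name happens to contain "bot".
print(editor_type("Abbott", set()))  # name-bot
```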

Page-type - {content, non-content, all-page-types}

Filter for the type of page on which the action is performed. Can be content for pages belonging to content namespaces. This page type is also referred to as articles, and for most wikis it includes pages in namespace 0 only. Can also be non-content for pages in namespaces not considered content (talk pages, user pages, etc.). Finally, you can use all-page-types not to filter by page type.
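A small sketch of the page-type split. It assumes the set of content namespaces is known per wiki and defaults to namespace 0 only, which the page says holds for most wikis. The helper name `page_type` is hypothetical.

```python
from typing import AbstractSet

# Assumption: namespace 0 (the main namespace) is the only content namespace,
# which the page says holds for most wikis.
DEFAULT_CONTENT_NAMESPACES = frozenset({0})


def page_type(namespace: int,
              content_namespaces: AbstractSet[int] = DEFAULT_CONTENT_NAMESPACES) -> str:
    """Return 'content' or 'non-content' for a page in the given namespace."""
    return "content" if namespace in content_namespaces else "non-content"


# In MediaWiki, namespace 1 is Talk and namespace 2 is User.
print(page_type(0), page_type(1), page_type(2))  # content non-content non-content
```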

Activity-level - {1..4-edits, 5..24-edits, 25..99-edits, 100..-edits, all-activity-levels}

Filter for the activity level of editors or pages, meaning their number of edits over the aggregation time period of the metric. The values are self-explanatory: 1..4-edits, 5..24-edits, 25..99-edits and 100..-edits (100 edits or more). As for the other filtering dimensions, you can use all-activity-levels not to filter by activity level.
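The bucket boundaries above translate directly into a lookup function. This is a sketch: the helper name is hypothetical, and rejecting counts below 1 is an assumption, based on the idea that an editor or page only appears in a period in which it was edited.

```python
def activity_level(edit_count: int) -> str:
    """Map an edit count over the aggregation period to its activity-level bucket."""
    if edit_count < 1:
        # Assumption: only editors or pages with at least one edit are bucketed.
        raise ValueError("activity levels apply only to editors or pages with at least one edit")
    if edit_count <= 4:
        return "1..4-edits"
    if edit_count <= 24:
        return "5..24-edits"
    if edit_count <= 99:
        return "25..99-edits"
    return "100..-edits"


print([activity_level(n) for n in (1, 4, 5, 24, 25, 99, 100, 5000)])
```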

Metrics

Number of edits

The count of edits (or revisions), including edits on redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
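A sketch of the edit count with the time-boundary, editor-type and page-type dimensions applied as filters. The `Edit` record shape and the `count_edits` helper are hypothetical. There is deliberately no redirect filter, because this metric includes edits on redirects.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Edit:  # hypothetical record shape
    timestamp: datetime
    editor_type: str  # anonymous, group-bot, name-bot or user
    page_type: str    # content or non-content


def count_edits(edits: Iterable[Edit], start: datetime, end: datetime,
                editor_type: str = "all-editor-types",
                page_type: str = "all-page-types") -> int:
    """Count edits in [start; end[ matching the filters; redirect edits are included."""
    return sum(
        1
        for e in edits
        if start <= e.timestamp < end
        and editor_type in ("all-editor-types", e.editor_type)
        and page_type in ("all-page-types", e.page_type)
    )


edits = [
    Edit(datetime(2023, 1, 3), "user", "content"),
    Edit(datetime(2023, 1, 9), "anonymous", "non-content"),
    Edit(datetime(2023, 2, 1), "user", "content"),  # outside January
]
jan_start, jan_end = datetime(2023, 1, 1), datetime(2023, 2, 1)
print(count_edits(edits, jan_start, jan_end))                      # 2
print(count_edits(edits, jan_start, jan_end, editor_type="user"))  # 1
```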

Sum of net bytes difference

The sum of the difference in bytes made by each edit (or revision), including edits on redirects. This difference can be positive when content is added, or negative when content is removed. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
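The sketch below derives per-edit byte differences from the sizes of successive revisions of one page and sums them. It assumes that a page's first revision is compared against an empty (0-byte) page, and the revision sizes are made up. The example shows that the net sum can be zero even when the page was edited heavily.

```python
from typing import Iterable, List


def byte_diffs(revision_sizes: Iterable[int]) -> List[int]:
    """Per-edit byte difference for successive revisions of one page.

    Assumption: the page's first revision is compared against an empty (0-byte) page.
    """
    diffs = []
    previous = 0
    for size in revision_sizes:
        diffs.append(size - previous)
        previous = size
    return diffs


diffs = byte_diffs([1200, 900, 945, 0])
print(diffs)       # [1200, -300, 45, -945]
print(sum(diffs))  # 0: net bytes difference
```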

Sum of absolute bytes difference

The sum of the absolute difference in bytes made by each edit (or revision), including edits on redirects. Unlike the previous metric, whose value can be positive or negative, this metric uses the absolute value of modified bytes and is therefore never negative. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
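Applied to made-up per-edit differences for a page that is created, trimmed, extended and then blanked, the two byte metrics diverge sharply. The helper name is hypothetical.

```python
from typing import Iterable


def absolute_bytes_difference(byte_diffs: Iterable[int]) -> int:
    """Sum of the absolute per-edit byte differences; never negative."""
    return sum(abs(d) for d in byte_diffs)


diffs = [1200, -300, 45, -945]
print(sum(diffs), absolute_bytes_difference(diffs))  # 0 2490
```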

Number of new edited pages

The count of pages created, excluding redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
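A sketch of the redirect exclusion for this metric. It assumes each page creation carries an `is_redirect` flag. The `PageCreation` record and `count_new_pages` helper are hypothetical, and the time and dimension filters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageCreation:  # hypothetical record shape
    title: str
    is_redirect: bool


def count_new_pages(creations: Iterable[PageCreation]) -> int:
    """Count page creations, ignoring pages created as redirects."""
    return sum(1 for c in creations if not c.is_redirect)


creations = [PageCreation("Alpha", False),
             PageCreation("Alpha (disambiguation)", False),
             PageCreation("Alfa", True)]
print(count_new_pages(creations))  # 2
```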

Number of edited pages

The count of pages edited, excluding redirects. The activity-level dimension breaks this count down by how many edits each page received. Dimensions available for this metric are time boundaries, time granularity, project, editor-type, page-type and activity-level.
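The sketch below counts distinct edited pages and breaks them down by activity level. It assumes the input has one title per edit, with redirects already filtered out. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def activity_level(edit_count: int) -> str:
    if edit_count <= 4:
        return "1..4-edits"
    if edit_count <= 24:
        return "5..24-edits"
    if edit_count <= 99:
        return "25..99-edits"
    return "100..-edits"


def edited_pages_by_activity(edited_titles: Iterable[str]) -> Dict[str, int]:
    """edited_titles: one entry per edit, redirects already excluded."""
    edits_per_page = Counter(edited_titles)
    result = Counter(activity_level(n) for n in edits_per_page.values())
    result["all-activity-levels"] = len(edits_per_page)
    return dict(result)


titles = ["Alpha"] * 3 + ["Beta"] * 7 + ["Gamma"]
print(edited_pages_by_activity(titles))
# {'1..4-edits': 2, '5..24-edits': 1, 'all-activity-levels': 3}
```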

List of most edited pages

The list of the 100 most edited pages, excluding redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
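A top-N list of this kind can be sketched with `collections.Counter.most_common`, again assuming one title per edit with redirects already excluded. The helper name is hypothetical. `n` defaults to 100 to match the list size above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_edited_pages(edited_titles: Iterable[str], n: int = 100) -> List[Tuple[str, int]]:
    """Top-n pages by edit count; edited_titles has one entry per edit, redirects excluded."""
    return Counter(edited_titles).most_common(n)


titles = ["Alpha", "Beta", "Alpha", "Gamma", "Alpha", "Beta"]
print(most_edited_pages(titles, n=2))  # [('Alpha', 3), ('Beta', 2)]
```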

List of pages with biggest net bytes difference

The list of the 100 pages with the highest sum of net bytes difference, excluding redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.

List of pages with biggest absolute bytes difference

The list of the 100 pages with the highest sum of absolute bytes difference, excluding redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
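Both byte-based page lists can be sketched with one hypothetical helper that sums either signed or absolute differences per page and keeps the `n` largest totals with `heapq.nlargest`. The input is assumed to be (title, per-edit byte difference) pairs with redirects already excluded, and the data is made up. The two rankings differ because heavy additions and removals on one page cancel out in the net sum.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_pages_by_bytes(edits: Iterable[Tuple[str, int]], n: int = 100,
                       absolute: bool = False) -> List[Tuple[str, int]]:
    """Top-n pages by summed net (signed) or absolute byte difference.

    edits: (page title, per-edit byte difference) pairs, redirects already excluded.
    """
    totals: Dict[str, int] = defaultdict(int)
    for title, diff in edits:
        totals[title] += abs(diff) if absolute else diff
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


edits = [("Alpha", 500), ("Alpha", -450), ("Beta", 120), ("Gamma", -30)]
print(top_pages_by_bytes(edits))                 # [('Beta', 120), ('Alpha', 50), ('Gamma', -30)]
print(top_pages_by_bytes(edits, absolute=True))  # [('Alpha', 950), ('Beta', 120), ('Gamma', 30)]
```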

Number of editors

The count of editors who performed edits, including edits on redirects. The activity-level dimension breaks this count down by how many edits each editor made. Dimensions available for this metric are time boundaries, time granularity, project, editor-type, page-type and activity-level.
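A sketch of the editor count split by activity level. Unlike the edited-pages count, redirect edits are kept here. The input shape, (editor key, is_redirect) pairs, and the helper names are hypothetical. The IP address is taken from a documentation range.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def activity_level(edit_count: int) -> str:
    if edit_count <= 4:
        return "1..4-edits"
    if edit_count <= 24:
        return "5..24-edits"
    if edit_count <= 99:
        return "25..99-edits"
    return "100..-edits"


def editors_by_activity(edits: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """edits: (editor key, is_redirect) pairs; redirect edits are deliberately kept."""
    edits_per_editor = Counter(editor for editor, _is_redirect in edits)
    result = Counter(activity_level(n) for n in edits_per_editor.values())
    result["all-activity-levels"] = len(edits_per_editor)
    return dict(result)


edits = [("Alice", False), ("Alice", True), ("198.51.100.7", False)] + [("Bob", False)] * 6
print(editors_by_activity(edits))
# {'1..4-edits': 2, '5..24-edits': 1, 'all-activity-levels': 3}
```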

List of editors with biggest number of edits

The list of the 100 editors (user IDs, or IP addresses for anonymous editors) who performed the most edits, including edits on redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.

List of editors with biggest net bytes difference

The list of the 100 editors (user IDs, or IP addresses for anonymous editors) with the highest sum of net bytes difference, including edits on redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.

List of editors with biggest absolute bytes difference

The list of the 100 editors (user IDs, or IP addresses for anonymous editors) with the highest sum of absolute bytes difference, including edits on redirects. Dimensions available for this metric are time boundaries, time granularity, project, editor-type and page-type.
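The three editor lists can be sketched with one hypothetical helper. It keys registered editors by user ID and anonymous editors by IP address, then ranks them by edit count, net bytes or absolute bytes. The `user:`/`ip:` key prefixes, the input tuple shape and the sample data (IP addresses from documentation ranges) are illustrative assumptions. Ties are left in whatever order `heapq.nlargest` returns them.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def editor_key(user_id: Optional[int], ip: str) -> str:
    """Registered editors are keyed by user id, anonymous editors by IP address."""
    return f"user:{user_id}" if user_id is not None else f"ip:{ip}"


def top_editors(edits: Iterable[Tuple[Optional[int], str, int]], metric: str,
                n: int = 100) -> List[Tuple[str, int]]:
    """edits: (user_id or None, ip, byte difference); metric is 'edits', 'net' or 'absolute'."""
    totals: Dict[str, int] = defaultdict(int)
    for user_id, ip, diff in edits:
        key = editor_key(user_id, ip)
        if metric == "edits":
            totals[key] += 1
        elif metric == "net":
            totals[key] += diff
        elif metric == "absolute":
            totals[key] += abs(diff)
        else:
            raise ValueError(f"unknown metric: {metric!r}")
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


edits = [(42, "203.0.113.5", 300), (42, "203.0.113.5", -250),
         (None, "198.51.100.7", 100), (7, "192.0.2.1", -80)]
for metric in ("edits", "net", "absolute"):
    print(metric, top_editors(edits, metric))
```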

Newly registered users

The count of users who registered themselves. This metric excludes user accounts created automatically through the auto-login system, as well as accounts created by other users. Dimensions available for this metric are time boundaries, time granularity and project.

Note: thanks to CentralAuth, the same username on different wiki projects is now known to belong to the same user. Only the first registration is counted in the new-registered-users metric, because the system then automatically creates accounts when the user visits other projects, and automatic account creations are not counted. However, for data from before CentralAuth, there is no way to know whether two accounts with the same name on different wikis belonged to the same user. The metric for periods before CentralAuth does NOT deduplicate accounts by name across projects, and may therefore overcount when used with project families.
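The sketch below, using made-up account names, shows why a project-family value can overcount for pre-CentralAuth periods. `registrations_summed` adds up per-project registrations, as the note above says the pre-CentralAuth metric effectively does, so a name registered on two wikis counts twice. `distinct_names` is shown only for comparison and is not a reliable person count either, since matching names need not mean the same person. Both helper names are hypothetical.

```python
from typing import Dict, Set


def registrations_summed(accounts: Dict[str, Set[str]]) -> int:
    """Per-project registrations added up: a name on two wikis counts twice."""
    return sum(len(names) for names in accounts.values())


def distinct_names(accounts: Dict[str, Set[str]]) -> int:
    """Distinct account names across all projects."""
    return len(set().union(*accounts.values()))


# Made-up pre-CentralAuth registrations for one project family.
accounts = {
    "en.wikipedia.org": {"Alice", "Bob"},
    "fr.wikipedia.org": {"Alice", "Chloé"},
}
print(registrations_summed(accounts), distinct_names(accounts))  # 4 3
```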