Retention
There are different definitions of what is traditionally considered a "user retention" metric on web properties. Below are some of these definitions and, where possible, an explanation of how we could calculate them using the data we have.
Definition 1: The ratio of returning visitors to all visitors (daily and monthly)
We have this data as part of the Unique Devices Table[1]. The "uniques_underestimate" column counts devices that have come to Wikimedia projects more than once within the period after which our WMF-Last-Access cookie expires. That is, in the unique devices data, a "seen-before" device means that a user used that device to access Wikipedia at least twice in the last 30 days.
The "uniques_offset" column in the Unique Devices Table[2] represents fresh sessions, that is, non-returning users.
Returning visitors are thus defined as unique devices that had already visited a project, on mobile or desktop, within the last 30 days. We can report on this metric daily and monthly.
Non-returning visitors are defined as fresh sessions (no cookies) on the day for which the metric is reported.
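As a minimal sketch of Definition 1, assuming that all visitors for a period are the returning devices ("uniques_underestimate") plus the fresh sessions ("uniques_offset"), the daily or monthly ratio could be computed as below. The function name and the sample counts are hypothetical, not real Unique Devices data.

```python
def returning_ratio(uniques_underestimate: int, uniques_offset: int) -> float:
    """Share of visitors that are returning (Definition 1).

    Assumes all visitors = returning devices + fresh sessions.
    """
    total = uniques_underestimate + uniques_offset
    if total == 0:
        raise ValueError("no visitors recorded for this period")
    return uniques_underestimate / total


# Hypothetical counts for one project on one day.
print(returning_ratio(850, 150))  # 0.85
```

The same function applies to daily and monthly rows; only the input period changes.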
See plot for eswiki:
Findings:
- The daily ratio of returning visitors is fairly constant on both mobile and desktop, although it fluctuates more on desktop. In plain English: on any given day, about 85% of our visitors have read Wikipedia at least once before in the past 30 days.
- As we know, pageviews exhibit a strong weekly pattern, but this pattern is not visible in the return ratios.
How to measure an increase in readers using retention
An increase in readership for a project would first show up as a simultaneous increase in Unique Devices and a decrease in the percentage of returning visitors for that project; at the same time, we might also see a short-term increase in pageviews. If we are able to retain those new users, both unique devices and pageviews should stay consistently higher, while the percentage of returning visitors should not change very much.
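The heuristic above could be sketched as follows, assuming a daily series of Unique Devices rows for one project. The `DailyStats` type, the 7-day trailing baseline, and the 5% growth and 2-point ratio-drop thresholds are all hypothetical choices for illustration; pageviews could be added as a third, corroborating signal in the same way.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DailyStats:
    day: str
    uniques_underestimate: int  # returning devices
    uniques_offset: int         # fresh, non-returning devices

    @property
    def unique_devices(self) -> int:
        return self.uniques_underestimate + self.uniques_offset

    @property
    def returning_ratio(self) -> float:
        return self.uniques_underestimate / self.unique_devices


def flag_readership_increase(series, window=7, min_growth=0.05, min_ratio_drop=0.02):
    """Return days where unique devices rose AND the returning ratio fell,
    both compared against the mean of the previous `window` days."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        today = series[i]
        base_uniques = mean(d.unique_devices for d in baseline)
        base_ratio = mean(d.returning_ratio for d in baseline)
        uniques_up = today.unique_devices >= base_uniques * (1 + min_growth)
        ratio_down = today.returning_ratio <= base_ratio - min_ratio_drop
        if uniques_up and ratio_down:
            flagged.append(today.day)
    return flagged


# Hypothetical data: a flat week, then a day with many fresh visitors.
series = [DailyStats(f"d{i}", 850, 150) for i in range(7)]
series.append(DailyStats("d7", 900, 400))
print(flag_readership_increase(series))  # ['d7']
```

A trailing average is used instead of the previous day so that a single noisy day is less likely to trigger a flag; the thresholds would need tuning against real data.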
There would probably be too much noise to see such changes in the overall numbers, so the increase would need to be measured per country. We do have per-country Unique Devices statistics, so we could calculate retention per country too.
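A per-country version could look like the sketch below. It assumes one row per country for a single project and a single day (or month), with a hypothetical "country" key next to the two columns named above. Rows are deliberately not summed across projects or days, because unique device counts are not additive: the same device can appear in several rows. The 5-point drop threshold and the sample rows are likewise illustrative only.

```python
def returning_ratio_by_country(rows):
    """rows: one dict per country for ONE project and ONE period (hypothetical shape)."""
    ratios = {}
    for row in rows:
        total = row["uniques_underestimate"] + row["uniques_offset"]
        if total > 0:
            ratios[row["country"]] = row["uniques_underestimate"] / total
    return ratios


def countries_with_ratio_drop(baseline, current, min_drop=0.05):
    """Countries whose returning ratio fell by at least `min_drop` vs. the baseline."""
    return sorted(
        country
        for country, ratio in current.items()
        if country in baseline and baseline[country] - ratio >= min_drop
    )


baseline = returning_ratio_by_country([
    {"country": "ES", "uniques_underestimate": 850, "uniques_offset": 150},
    {"country": "MX", "uniques_underestimate": 800, "uniques_offset": 200},
])
current = returning_ratio_by_country([
    {"country": "ES", "uniques_underestimate": 840, "uniques_offset": 160},
    {"country": "MX", "uniques_underestimate": 600, "uniques_offset": 400},
])
print(countries_with_ratio_drop(baseline, current))  # ['MX']
```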
Definition 2: The number of returning visitors (devices) after a month or a day
We also have this raw number as part of the Unique Devices data. We can plot it for every project, on desktop or mobile, daily and monthly. An absolute number is far less useful than a ratio, though.
Definition 3: Average frequency of the returned visitors
In plain English: we want to know the percentage of our users who return to read Wikipedia after a day, a week, or a month. We already know the monthly return rate (see Definition 1), but not the other two.
To calculate this we would need an extra piece of information from Varnish: the value the WMF-Last-Access cookie had before we updated it. That way we would know when the user last came to Wikipedia.
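Assuming Varnish could pass along that previous cookie value, the daily, weekly, and monthly return rates could be derived roughly as follows. This is a sketch only: the "dd-Mon-yyyy" date format for the cookie value, the bucket boundaries, and all function names are assumptions. It also assumes one observation per device (for example, only its first request of the day) so that frequent visitors are not over-counted.

```python
from collections import Counter
from datetime import date, datetime
from typing import Optional


def days_since_last_access(previous_cookie_value: Optional[str], today: date) -> Optional[int]:
    """Days between the previous WMF-Last-Access value and today.

    Assumes the cookie value is a date like "17-Feb-2016"; None means no cookie.
    """
    if previous_cookie_value is None:
        return None  # fresh session: no earlier visit recorded
    last = datetime.strptime(previous_cookie_value, "%d-%b-%Y").date()
    return (today - last).days


def return_bucket(days: Optional[int]) -> str:
    """Map days since the last visit to a hypothetical return-frequency bucket."""
    if days is None:
        return "new"
    if days <= 1:
        return "within a day"
    if days <= 7:
        return "within a week"
    if days <= 30:
        return "within a month"
    return "older"  # should be rare, since the cookie expires after 30 days


def return_rate_shares(previous_values, today: date) -> dict:
    """Share of observed devices in each bucket (one value per device)."""
    counts = Counter(return_bucket(days_since_last_access(v, today)) for v in previous_values)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


# Hypothetical previous cookie values for four devices seen on 17-Feb-2016.
values = [None, "16-Feb-2016", "12-Feb-2016", "01-Feb-2016"]
print(return_rate_shares(values, date(2016, 2, 17)))
```

The buckets are exclusive; a cumulative "returned within a week" rate would add the "within a day" and "within a week" shares.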