Data Platform/Data Lake/Traffic/mobile apps session metrics
The tables mobile_apps_session_metrics and mobile_apps_session_metrics_by_os (available on Hive) contain aggregate stats about pageview sessions on the Android and iOS Wikipedia mobile apps, updated weekly.
- mobile_apps_session_metrics_by_os calculates them for a timespan of 7 days each, for iOS and Android separately. Deprecated as of June 2023 (T329310).
- mobile_apps_session_metrics has numbers for overlapping 30-day timespans, without distinguishing between the two apps. Deprecated as of 2018.
A session is defined as a sequence of pageviews from the same app ID in which no gap between consecutive pageviews exceeds 30 minutes.
In both tables, each row provides the minimum, maximum, and four percentiles (10th, 50th, 90th, 99th) for one of the following three metrics, as well as an overall count:
- Sessions per user (i.e. app UUID)
- Pageviews per session
- Session length (the time between the first and last pageview; not reported for sessions consisting of only one pageview)
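The session definition and the three metrics above can be illustrated with a minimal Python sketch. This is not the production Scala job: the function names, the input shape of (app ID, Unix timestamp) pairs, and the treatment of a gap of exactly 30 minutes as still belonging to the same session are all assumptions made for illustration.

```python
from collections import defaultdict

# Inactivity gap (in seconds) after which a new session starts.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(pageviews):
    """Group (app_id, unix_timestamp) pairs into per-app sessions.

    Returns a dict mapping app_id to a list of sessions, each session
    being an ascending list of pageview timestamps.
    """
    by_app = defaultdict(list)
    for app_id, ts in pageviews:
        by_app[app_id].append(ts)

    sessions = {}
    for app_id, stamps in by_app.items():
        stamps.sort()
        current = [stamps[0]]
        app_sessions = [current]
        for ts in stamps[1:]:
            if ts - current[-1] > SESSION_TIMEOUT_SECONDS:
                # Too long since the previous pageview: open a new session.
                current = [ts]
                app_sessions.append(current)
            else:
                current.append(ts)
        sessions[app_id] = app_sessions
    return sessions


def metric_series(sessions):
    """Build the three series over which the summary stats are computed."""
    all_sessions = [s for app in sessions.values() for s in app]
    return {
        "sessions_per_user": [len(app) for app in sessions.values()],
        "pageviews_per_session": [len(s) for s in all_sessions],
        # Single-pageview sessions have no measurable length and are skipped.
        "session_length": [s[-1] - s[0] for s in all_sessions if len(s) > 1],
    }


# Made-up example: app "a" has two sessions, app "b" has one.
events = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
series = metric_series(sessionize(events))
```

In this example the 3400-second gap between the second and third pageview of app "a" splits its activity into two sessions, and only the first of them (two pageviews, 600 seconds apart) contributes to the session length series.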
Schema
> DESCRIBE wmf.mobile_apps_session_metrics_by_os;
col_name     data_type   comment
year         int         Unpadded year of report run date
month        int         Unpadded month of report run date
day          int         Unpadded day of report run date
date_range   string      Period for which report was run
type         string      Type of session metric
os_family    string      OS family breakdown
count        int         Value of count for given metric
min          int         Min value for given metric
max          int         Max value for given metric
p_1          string      1st Percentile for given metric
p_50         string      50th Percentile for given metric
p_90         string      90th Percentile for given metric
p_99         string      99th Percentile for given metric
Notes:
- For each metric, the count variable refers to the length of the series that was used to calculate the quantiles. So for sessions per user, the count is the number of users; for pageviews per session, the count is the number of sessions; for session length, the count is the number of sessions that have more than one pageview (see the code).
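To make the meaning of count concrete, here is a small Python sketch that summarizes one metric series into the fields of a table row. The nearest-rank percentile method and the function names are assumptions chosen for simplicity; the production job uses its own quantile function, which may give different values.

```python
def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list.

    pct is an integer between 1 and 100. Integer arithmetic is used to
    compute ceil(pct * n / 100) without floating-point rounding.
    """
    n = len(sorted_values)
    rank = (pct * n + 99) // 100
    return sorted_values[rank - 1]


def summarize(series):
    """Return count, min, max and the four reported percentiles."""
    values = sorted(series)
    return {
        # count is simply the length of the series being summarized.
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        # Keys mirror the table columns; p_1 holds the 10th percentile.
        "p_1": nearest_rank(values, 10),
        "p_50": nearest_rank(values, 50),
        "p_90": nearest_rank(values, 90),
        "p_99": nearest_rank(values, 99),
    }


# Made-up series: pageviews per session for five sessions.
row = summarize([1, 1, 2, 3, 8])
```

Passing a sessions-per-user series yields a count equal to the number of users, while passing a session-length series yields the number of sessions with more than one pageview, since single-pageview sessions never enter that series.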
Caveats
- The field name p_1 and its description are inaccurate - it contains the 10th percentile, not the 1st.
- The stats only cover users who have opted in to usage data (iOS) / have not opted out (Android).
- On Jan 11, 2018, we found a bug (T184768) in the quantiles function which would result in imprecise ranges of quantiles.
- Session length is calculated as the difference between the first and last pageview timestamp in a session. This means 1) only sessions with at least two pageviews are counted and 2) the recorded session length should be less than or equal to the actual session length on the client side.
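When working with rows exported from either table, the misnamed p_1 column and the string-typed percentile columns are easy to trip over. The following Python sketch shows one way to normalize a row client-side. The helper name, the p_10 key, and the sample values are hypothetical, and it assumes the percentile strings parse as plain decimal numbers.

```python
def normalize_row(row):
    """Return a copy of a row dict with usable percentile fields.

    The p_1 column is renamed to p_10, since it actually holds the 10th
    percentile, and all percentile strings are converted to floats.
    """
    out = dict(row)  # copy, so the caller's row is left untouched
    out["p_10"] = float(out.pop("p_1"))
    for col in ("p_50", "p_90", "p_99"):
        out[col] = float(out[col])
    return out


# Made-up row for illustration only; not real data from the table.
raw = {
    "type": "example_metric",
    "count": 5,
    "min": 1,
    "max": 8,
    "p_1": "1.0",
    "p_50": "2.0",
    "p_90": "8.0",
    "p_99": "8.0",
}
clean = normalize_row(raw)
```

Renaming at read time keeps downstream analysis code from silently treating the 10th percentile as the 1st.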
See also
- Scala code that calculates the data
- phab:T86535 2015 task about the introduction of these metrics, with an outline of the calculation method
- phab:T117615 2015/16 task about the creation of the by_os variant (providing the data separately for iOS and Android, and for weekly instead of monthly timespans)