Analytics/Data Lake/Traffic/mobile apps session metrics

The tables mobile_apps_session_metrics and mobile_apps_session_metrics_by_os (available on Hive) contain aggregate stats about pageview sessions on the Android and iOS Wikipedia mobile apps, updated weekly.

  • mobile_apps_session_metrics_by_os reports the stats for 7-day timespans, separately for iOS and Android. Deprecated as of June 2023 (T329310).
  • mobile_apps_session_metrics reports the stats for overlapping 30-day timespans, without distinguishing between the two apps. Deprecated as of 2018.

A session is defined as a sequence of pageviews from the same app ID in which no period of inactivity between consecutive pageviews exceeds 30 minutes.
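The session definition can be illustrated with a minimal Python sketch. This is not the production code (which is written in Scala); the function name sessionize and the sample timestamps are hypothetical, and it assumes that a gap strictly greater than 30 minutes starts a new session.

```python
from datetime import datetime, timedelta

# Assumption: a gap strictly greater than 30 minutes starts a new session.
SESSION_GAP = timedelta(minutes=30)


def sessionize(timestamps, max_gap=SESSION_GAP):
    """Split the pageview timestamps of one app ID into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            # Still within the inactivity limit: extend the current session.
            sessions[-1].append(ts)
        else:
            # First pageview, or the gap was too long: start a new session.
            sessions.append([ts])
    return sessions


# Invented timestamps for a single app ID.
views = [
    datetime(2023, 1, 1, 9, 0),
    datetime(2023, 1, 1, 9, 10),
    datetime(2023, 1, 1, 9, 45),  # 35 minutes after the previous pageview
    datetime(2023, 1, 1, 9, 50),
]
sessions = sessionize(views)
print([len(s) for s in sessions])  # [2, 2]
```

In this example the 35-minute gap splits the four pageviews into two sessions of two pageviews each.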

Example application: Impact of the 2016 release of the Android app's feed feature on session length

In both tables, each row provides the minimum, maximum, and four percentiles (10th, 50th, 90th, 99th) for one of the following three metrics, as well as an overall count:

  • Sessions per user (i.e. app UUID)
  • Pageviews per session
  • Session length (the time between the first and last pageview; not reported for sessions consisting of only one pageview)
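To make the relationship between the three metrics and the per-row statistics concrete, here is a rough Python sketch. The input structure sessions_by_user and the helpers percentile and summarize are hypothetical, and the nearest-rank percentile is an assumed method that may differ from what the production job uses.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (an assumed method, not necessarily the production one)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(values):
    """Return count, min, max and the four percentiles for one metric."""
    ordered = sorted(values)
    stats = {"count": len(ordered), "min": ordered[0], "max": ordered[-1]}
    for p in (10, 50, 90, 99):
        stats[f"p_{p}"] = percentile(ordered, p)
    return stats


# Hypothetical input: app UUID -> sessions, each a list of pageview times in seconds.
sessions_by_user = {
    "user-a": [[0, 60, 300], [10_000]],
    "user-b": [[0, 120]],
}
all_sessions = [s for user in sessions_by_user.values() for s in user]

sessions_per_user = summarize([len(user) for user in sessions_by_user.values()])
pageviews_per_session = summarize([len(s) for s in all_sessions])
# Single-pageview sessions have no measurable length and are left out.
session_length = summarize([s[-1] - s[0] for s in all_sessions if len(s) > 1])

print(sessions_per_user["count"])      # 2 (users)
print(pageviews_per_session["count"])  # 3 (sessions)
print(session_length["count"])         # 2 (sessions with more than one pageview)
```

Note that the count differs per metric because each metric is computed over a different series.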


Schema

> DESCRIBE wmf.mobile_apps_session_metrics_by_os;

col_name	data_type	comment
year	int	Unpadded year of report run date
month	int	Unpadded month of report run date
day	int	Unpadded day of report run date
date_range	string	Period for which report was run
type	string	Type of session metric
os_family	string	OS family breakdown
count	int	Value of count for given metric
min	int	Min value for given metric
max	int	Max value for given metric
p_1	string	1st Percentile for given metric
p_50	string	50th Percentile for given metric
p_90	string	90th Percentile for given metric
p_99	string	99th Percentile for given metric

Notes:

  • For each metric, the count variable refers to the length of the series that was used to calculate the quantiles. So for sessions per user, the count is the number of users; for pageviews per session, the count is the number of sessions; for session length, the count is the number of sessions that have more than one pageview (see the code).

Caveats

  • The field name p_1 and its description are inaccurate: the field contains the 10th percentile, not the 1st.
  • The stats only cover users who have opted in to sharing usage data (iOS) or have not opted out (Android).
  • On Jan 11, 2018, we found a bug (T184768) in the quantiles function which would result in imprecise ranges of quantiles.
  • Session length is calculated as the difference between the first and last pageview timestamp in a session. This means 1) only sessions with at least two pageviews are counted and 2) the recorded session length should be less than or equal to the actual session length on the client side.
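When working with query results, it can help to guard against the misleading p_1 name and the string-typed percentile columns early on. The following Python sketch shows one way to do so; the helper normalize_row and the sample row are hypothetical, the values are invented, and it assumes the percentile strings parse as plain decimal numbers.

```python
def normalize_row(row):
    """Return a copy with p_1 exposed as p_10 and percentile strings parsed as floats."""
    # Keep every non-percentile column unchanged.
    out = {key: value for key, value in row.items() if not key.startswith("p_")}
    # Despite its name, p_1 holds the 10th percentile.
    out["p_10"] = float(row["p_1"])
    for name in ("p_50", "p_90", "p_99"):
        out[name] = float(row[name])
    return out


# Hypothetical row with invented values, for illustration only.
row = {
    "count": 100,
    "min": 1,
    "max": 50,
    "p_1": "2.0",
    "p_50": "5.0",
    "p_90": "20.0",
    "p_99": "45.0",
}
normalized = normalize_row(row)
print(normalized["p_10"])  # 2.0
```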

See also

  • Scala code that calculates the data
  • phab:T86535 2015 task about the introduction of these metrics, with an outline of the calculation method
  • phab:T117615 2015/16 task about the creation of the by_os variant (providing the data separately for iOS and Android, and for weekly instead of monthly timespans)