Talk:Analytics/Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis

From Wikitech

UA Approach Recommendation

I see a couple of options listed for the UA: removing user_agent_map altogether, or applying some threshold for inclusion. I recommend that we retain the following fields, for every distinct combination of their values that has at least 1000 daily members:

  • os_family
  • os_major
  • os_minor
  • browser_family
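To make the recommendation concrete, here is a minimal Python sketch of the threshold rule. It rests on assumptions not stated above: that a "member" is one record in a single day's data, that a missing field is written as "-", and that maps falling below the threshold are dropped entirely (returned as None). The names `reduce_ua` and `filter_by_threshold` are hypothetical, not part of any existing pipeline.

```python
from collections import Counter

# The four fields proposed for retention.
RETAINED_FIELDS = ("os_family", "os_major", "os_minor", "browser_family")
# Proposed minimum number of daily members per distinct reduced map.
MIN_DAILY_MEMBERS = 1000


def reduce_ua(ua_map):
    """Project a full user_agent_map onto the retained fields only."""
    # Assumption: a missing field is represented as "-".
    return tuple(ua_map.get(field, "-") for field in RETAINED_FIELDS)


def filter_by_threshold(daily_ua_maps, threshold=MIN_DAILY_MEMBERS):
    """Return one reduced map per input record, or None below the threshold.

    Assumption: each element of daily_ua_maps is one record from a single day.
    """
    reduced = [reduce_ua(ua_map) for ua_map in daily_ua_maps]
    counts = Counter(reduced)
    return [
        dict(zip(RETAINED_FIELDS, key)) if counts[key] >= threshold else None
        for key in reduced
    ]
```

Whether a below-threshold map should be dropped, nulled, or bucketed into a catch-all value is left open by the proposal; the sketch picks None only to keep the example short.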

As part of the UA transformation, the Wikipedia apps UA info should be transformed such that browser_family (can we rename to "browser" to avoid confusion?) becomes "WikipediaApp" (instead of, for example, "Mobile Safari" or "Android"). Here are examples of some user_agent_map values for the apps today.

{"browser_major":"-","os_family":"iOS","device_family":"Generic Smartphone","os_major":"7","browser_family":"Mobile Safari","wmf_app_version":"4.1.3","os_minor":"1"}

{"browser_major":"4","os_family":"Android","device_family":"Generic Smartphone","os_major":"4","browser_family":"Android","wmf_app_version":"2.0.110-r-2015-08-31","os_minor":"4"}
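The app transformation proposed above could look roughly like the following Python sketch. It assumes that an app request can be recognised by a non-empty wmf_app_version other than "-"; that detection rule and the function name `normalize_app_ua` are illustrative assumptions, not the existing refinery logic.

```python
APP_BROWSER_FAMILY = "WikipediaApp"


def normalize_app_ua(ua_map):
    """Return a copy of ua_map with browser_family rewritten for app requests."""
    normalized = dict(ua_map)  # leave the caller's map untouched
    # Assumption: app traffic carries a wmf_app_version that is not "-" or empty.
    app_version = normalized.get("wmf_app_version", "-")
    if app_version not in ("-", "", None):
        normalized["browser_family"] = APP_BROWSER_FAMILY
    return normalized


# Usage with the first example map above.
ios_app = {
    "browser_major": "-",
    "os_family": "iOS",
    "device_family": "Generic Smartphone",
    "os_major": "7",
    "browser_family": "Mobile Safari",
    "wmf_app_version": "4.1.3",
    "os_minor": "1",
}
normalized_ios_app = normalize_app_ua(ios_app)
```

Here `normalized_ios_app["browser_family"]` becomes "WikipediaApp" while the OS fields are left as they were.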

That covers part of this example but leaves other vectors of attack open. This is an instance of a wider problem explained here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly/K_Anonymity_Threshold_Analysis Nuria (talk) 17:38, 11 May 2017 (UTC)