Talk:Analytics/Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis
Latest comment: 6 years ago by Nuria in topic UA Approach Recommendation
UA Approach Recommendation
I see two options listed for the UA data: removing user_agent_map altogether, or keeping it subject to some threshold for inclusion. I recommend that we retain the following fields:
- os_family
- os_major
- os_minor
- browser_family
for all such distinct maps having at least 1000 daily members.
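To make the threshold idea concrete, here is a minimal sketch in Python. It assumes that each input record counts as one "member" and that maps below the threshold are simply not retained; neither point is settled in this thread. The names KEPT_FIELDS, reduce_ua_map and retained_ua_maps are hypothetical and are not part of any existing job.

```python
from collections import Counter

# The four fields proposed for retention; everything else is dropped.
KEPT_FIELDS = ("os_family", "os_major", "os_minor", "browser_family")


def reduce_ua_map(ua_map):
    """Project a user_agent_map onto the kept fields ("-" if a field is missing)."""
    return tuple(ua_map.get(field, "-") for field in KEPT_FIELDS)


def retained_ua_maps(daily_ua_maps, threshold=1000):
    """Return the distinct reduced maps seen at least `threshold` times in one day.

    Assumption: one input record equals one "member" of that map.
    """
    counts = Counter(reduce_ua_map(m) for m in daily_ua_maps)
    return {key for key, n in counts.items() if n >= threshold}
```

What should happen to records whose reduced map falls below the threshold (drop the map, or replace it with a placeholder) is left open here.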
As part of the UA transformation, the Wikipedia apps' UA info should be transformed such that browser_family (can we rename it to "browser" to avoid confusion?) becomes "WikipediaApp" (instead of, for example, "Mobile Safari" or "Android"). Here are some examples of user_agent_map values for the apps today.
{"browser_major":"-","os_family":"iOS","device_family":"Generic Smartphone","os_major":"7","browser_family":"Mobile Safari","wmf_app_version":"4.1.3","os_minor":"1"}
{"browser_major":"4","os_family":"Android","device_family":"Generic Smartphone","os_major":"4","browser_family":"Android","wmf_app_version":"2.0.110-r-2015-08-31","os_minor":"4"}
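The app transformation could look roughly like the sketch below. It assumes that an app request can be recognised by a wmf_app_version value other than "-", which matches the two examples above but is not confirmed anywhere in this thread. The function name normalize_app_browser is hypothetical.

```python
import json


def normalize_app_browser(ua_map):
    """Return a copy of ua_map with browser_family set to "WikipediaApp" for app requests.

    Assumption: a wmf_app_version other than "-" marks a Wikipedia app request.
    """
    normalized = dict(ua_map)
    if normalized.get("wmf_app_version", "-") != "-":
        normalized["browser_family"] = "WikipediaApp"
    return normalized


# Usage with the iOS example above.
ios_example = json.loads(
    '{"browser_major":"-","os_family":"iOS","device_family":"Generic Smartphone",'
    '"os_major":"7","browser_family":"Mobile Safari","wmf_app_version":"4.1.3","os_minor":"1"}'
)
ios_normalized = normalize_app_browser(ios_example)
```

The input map is copied and not modified in place, so the original value stays available for comparison.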
- That covers part of this example but leaves other attack vectors open. It is one instance of a wider problem explained here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly/K_Anonymity_Threshold_Analysis Nuria (talk) 17:38, 11 May 2017 (UTC)