Talk:Analytics/Data Lake/Traffic/Unique Devices/Last access solution

From Wikitech

Scope of reports

The current version of Analytics/Unique_clients/Last_access_solution [1] says that the objective of this project is, "The analytics team aims to count unique clients per project per month in a way that does not uniquely identify, fingerprint or otherwise track users." I think this is reasonable. However, the implementation is tracking users at a day granularity, persisted by a cookie that you're attempting to make last 1000 years. I think the implementation is too invasive of the user's privacy, and goes beyond the stated purpose of the project.

For comparison, with anonymous users, we currently only set session cookies, so all evidence that a user has visited a wiki goes away when they close the browser. Setting a cookie that lasts until the end of the month, identifying that the browser has visited within the month seems like a reasonable tradeoff, and fulfills the objective of this project.

Very sorry, it should say "daily and monthly", my mistake, corrected now. Nuria (talk) 18:39, 23 March 2015 (UTC)
We always wanted monthly and daily, but our initial approach to solve this made counting daily too expensive to implement, so we wanted to make clear to stakeholders we would only deliver monthly. Around March 9 2015, we came up with a slightly different approach that made reporting daily trivial, so we're adding it as a requirement. Kevinator (talk) 18:47, 23 March 2015 (UTC)
Kevin, can you point me to the projects / stakeholders who are driving the daily requirement? We should be invading our users' privacy to the very minimum necessary to meet our business needs, so I want to make sure the need is well documented when we make significant changes like this. csteipp (talk) 01:46, 8 April 2015 (UTC)
Please let us know in what way having the date you last accessed one of the Wikipedia projects is a privacy concern. The idea of not expiring the cookie after 30 days is to avoid time calculations at the Varnish layer. Nuria (talk) 18:46, 23 March 2015 (UTC)
Yes, it's a privacy concern. The time calculation for expiring 31 days in the future should be fairly trivial; I think that is well worth limiting the impact on our users. The calculation to get to the end of the month is slightly more complicated, but again, I think it's well worth it to limit the impact this has on our users. csteipp (talk) 23:47, 23 March 2015 (UTC)
We can certainly change it, but can you explain why that is a privacy concern? The information contained in the cookie is the same, regardless of expiration date. It has two bits: 1) when you last accessed WMF projects and 2) how long it has been since your last access (derived from 1). Nuria (talk) 17:43, 25 March 2015 (UTC)
Chris, I just wanted to clear this up a bit. So, we have to set an expiration date no matter what, because otherwise the cookie would expire with the browser session. So the question is how far in the future we should set it. The minimum required for this feature is "at the beginning of next month". However, that means extra computation, so we could simplify to 31 days. That would be fine for us, and I think performance-wise what we would get from time saved in date arithmetic we would lose in bandwidth used up to receive the unexpired cookie. So we're happy to follow your guidance here, but would a 31-day expiration be ok? milimetric (talk) 20:07, 25 March 2015 (UTC)
Nuria / Milimetric, since we're going from only setting session cookies to setting a permanent marker on the user's computer that identifies them as having visited a WMF site, we should try to impact the users as minimally as possible. Having a cookie stick around for up to 31 extra days unnecessarily does not seem worth it just to avoid a little extra work when setting the cookie to expire at the end of the month. If you're worried about the performance in Varnish for doing the calculation, let's work together to figure it out. I'm pretty sure we can tune the code you have to set an appropriate expiration with the same performance. csteipp (talk) 01:46, 8 April 2015 (UTC)
Chris, I'm sorry I missed this message. I should've paid closer attention, and I realize now it seems like we ignored your request. Maybe in the future we can talk about implementation details in Phabricator. For now, the cookie is live but I can still work on a patch and make it better. Ping me on IRC and we can talk. --milimetric (talk) 16:24, 7 May 2015 (UTC)
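
For reference, the two expiration options discussed in this thread ("31 days from now" versus "at the beginning of next month") can be sketched as follows. This is only a minimal Python illustration under stated assumptions, not the Varnish code that is deployed; the function names are hypothetical and chosen for this example, and it assumes the expiry is computed in UTC.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_in_31_days(now):
    """Simpler option: a fixed 31-day lifetime from the current request."""
    return now + timedelta(days=31)


def expires_at_start_of_next_month(now):
    """Minimal option: midnight on the first day of the following month."""
    if now.month == 12:
        year, month = now.year + 1, 1  # roll over into January
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=now.tzinfo)


def cookie_expires_attr(when):
    """Format a timezone-aware datetime as an HTTP cookie Expires attribute."""
    return "Expires=" + format_datetime(when.astimezone(timezone.utc), usegmt=True)


if __name__ == "__main__":
    # Hypothetical request time, used only for illustration.
    now = datetime(2015, 3, 23, 18, 39, tzinfo=timezone.utc)
    print(cookie_expires_attr(expires_in_31_days(now)))
    print(cookie_expires_attr(expires_at_start_of_next_month(now)))
```

With the end-of-month variant the cookie never outlives the month it was set in, whereas the 31-day variant can keep it on the device for up to a month longer than the monthly count requires.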

Incognito mode and fresh sessions

The following statement in the "Nocookie Offset" section seems a bit confused regarding the relation between incognito browsing and browser sessions:

"Per x-analytics documentation every request that comes in without cookies whatsoever is tagged with nocookie=1. These are requests are either bots, users browsing with cookies off or users using an "incognito" mode and thus a fresh browser session."

At least on Chrome and Firefox, incognito/private sessions have cookies like any other browser session, except that these cookies are deleted completely at the end of the incognito session (whereas cookies without session expiration, such as our Last-Access cookies, will survive the end of a non-incognito session). I think what is meant here is that a nocookie-tagged request could be the first visit within a new incognito session. But there are other possibilities: a device visiting the site for the very first time, or a device having shed all previous cookies because both of the following two things happened: more than a month passed since the previous visit (i.e. the Last-Access cookie expired) and a new browser session was started (i.e. other, session-based WMF cookies expired).

Regards, HaeB (talk) 08:21, 10 July 2017 (UTC)

It should say "fresh incognito session". It is just an example of a request that will come without cookies, not an exhaustive list.
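
To make the quoted rule concrete, here is a rough Python sketch of "a request that comes in without cookies whatsoever is tagged with nocookie=1". It is an illustration only, not the actual tagging code; the function names and the example cookie values are hypothetical.

```python
def has_any_cookie(cookie_header):
    """True if the raw Cookie request header carries at least one cookie."""
    if not cookie_header:
        return False  # header missing or empty
    # A Cookie header is a ';'-separated list of name=value pairs.
    return any(part.strip() for part in cookie_header.split(";"))


def nocookie_tag(cookie_header):
    """Return {'nocookie': '1'} only for requests that carry no cookies at all."""
    return {} if has_any_cookie(cookie_header) else {"nocookie": "1"}


if __name__ == "__main__":
    print(nocookie_tag(None))                       # first request of a fresh session
    print(nocookie_tag("Last-Access=10-Jul-2017"))  # returning device
```

Note that such a check cannot tell the cases listed above apart: a bot, a browser with cookies disabled, the first request of a fresh incognito session, and a device whose cookies have all expired look the same at this level.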