Talk:Data Engineering/Systems/Event Data retention

From Wikitech

Outdated parts

This page seems a bit outdated in some aspects:

  • The clientIp field (hashed IP) was removed from the event capsule more than a year ago.
  • "It is being reviewed right now" (now = July 2016) - should note the outcome of the review.

Regards, HaeB (talk) 15:34, 3 May 2017 (UTC)

Thanks for spotting this! I've updated the reference to clientIp. Sadly, the "It is being reviewed right now" part is still valid :], meaning it is still being reviewed. This task has depended on the DBA team for a long time and stalled because of a lack of resources (they have lots of urgent work). As you saw, we in Analytics have recently taken over that task and will be tackling it ourselves. Mforns (talk) 20:03, 3 May 2017 (UTC)

Browsing history

Regarding "browsing history: the pages visited by a user": I understand that this part refers to the information that a particular page was viewed by a particular user. Clearly, the names of the viewed pages per se are not sensitive personal information (we even publish them all the time as part of our public pageview data). Regards, HaeB (talk) 15:34, 3 May 2017 (UTC)

Yes, exactly. "Any information that both" refers to data that includes PII (meaning anything that can potentially identify a user, like ID, editCount, userAgent, etc.) AND contains browsing patterns or other similar data that can convey personal preferences. I changed the text a bit to make clear that we refer to the combination of browsing history AND the identified user. Thanks for the comment :] Mforns (talk) 20:11, 3 May 2017 (UTC)

Configuration for EventLogging purge strategy

One piece of information I couldn't find in this otherwise comprehensive document is where the purging strategy is configured for each EventLogging schema. I see the per-field whitelist, but not the overall strategy corresponding to Purging Strategies. Awight (talk) 14:55, 4 March 2020 (UTC)