Experimentation Lab/Privacy considerations

From Wikitech

The Data Collection Guidelines outline best practices at the Wikimedia Foundation for managing privacy risk in data collection. Some of the criteria in that policy depend on the specific data an instrument collects, so the attributes collected can raise the instrument's risk level. Because contextual attributes are the main way to collect data with Experimentation Lab, the contextual attributes you select may affect the risk level of your instrument.

The following combinations of contextual attributes increase the risk level of an instrument. An instrument that uses none of these combinations can be classified as Tier 3: Low risk.

| Combination | Risk level |
|---|---|
| agent_ua_string + performer_id/performer_name/agent_app_install_id | Tier 2: Medium risk |
| page_id/page_title + performer_id/performer_name | Tier 2: Medium risk |
| page_id/page_title + agent_app_install_id | Tier 2: Medium risk (only if end-user is logged-in) |
| agent_ua_string + performer_id/performer_name + page_id/page_title | Tier 1: High risk |
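
The combinations above can be read as a small decision procedure. The following is a minimal sketch of that reading, not xLab's actual validation code: the function name `risk_tier`, the `logged_in` parameter, and its conservative default of `True` are assumptions made for illustration. Only the attribute names and tier labels come from the table.

```python
from typing import Iterable

# Attribute groups taken from the combinations table.
USER_AGENT = {"agent_ua_string"}
PERFORMER = {"performer_id", "performer_name"}
APP_INSTALL = {"agent_app_install_id"}
PAGE = {"page_id", "page_title"}


def risk_tier(attributes: Iterable[str], logged_in: bool = True) -> str:
    """Return the risk tier for a set of contextual attributes.

    Hypothetical helper: `logged_in` models the "only if end-user is
    logged-in" condition and defaults to the more cautious value.
    """
    attrs = set(attributes)
    has_ua = bool(attrs & USER_AGENT)
    has_performer = bool(attrs & PERFORMER)
    has_install = bool(attrs & APP_INSTALL)
    has_page = bool(attrs & PAGE)

    # Check the highest tier first so it is not masked by a Tier 2 match.
    if has_ua and has_performer and has_page:
        return "Tier 1: High risk"
    if has_ua and (has_performer or has_install):
        return "Tier 2: Medium risk"
    if has_page and has_performer:
        return "Tier 2: Medium risk"
    if has_page and has_install and logged_in:
        return "Tier 2: Medium risk"
    # No listed combination matched.
    return "Tier 3: Low risk"


if __name__ == "__main__":
    print(risk_tier(["agent_ua_string", "performer_id", "page_title"]))
    print(risk_tier(["page_id", "agent_app_install_id"], logged_in=False))
```

Checking the Tier 1 combination before the Tier 2 ones matters in this sketch, because every Tier 1 selection also satisfies at least one Tier 2 row.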

When you register or modify an instrument in xLab, it validates the selected contextual attributes and gives advice based on the combinations above.

[Figure: xLab giving advice about selected contextual attributes and risk level]

With a custom schema, an instrument may collect additional attributes besides the contextual ones. The instrument owner should review those additional attributes to check whether they increase the risk level of the instrument. xLab does not currently support this case.

See the Regulation section guide for details on how to fill in that section when registering an instrument or experiment.