Wiki Replica redaction
This page is currently a draft. More information and discussion about changes to this draft can be found on the talk page.
This page documents how data is sanitized for the public Wiki Replicas databases that Wikimedia Cloud Services provides.
Main admin docs for this are in Portal:Data Services/Admin/Wiki Replicas.
Sanitariums
Replication into the Sanitarium hosts uses triggers and filters to remove sensitive columns, tables, and databases in the simple case where the redaction is unconditional (e.g. ensuring that user_password does not end up in Wiki Replicas).
More technical details can be found at Portal:Data_Services/Admin/Wiki_Replicas#Step_1:_sanitization.
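As a rough illustration of the trigger idea only, the sketch below blanks a sensitive column whenever a row is written. It uses Python's built-in sqlite3 module as a stand-in for the real database servers, and the trigger names, the reduced user table, and the helper make_sanitized_db are assumptions made for this example, not the actual Sanitarium configuration.

```python
import sqlite3


def make_sanitized_db():
    """Create an in-memory database whose triggers blank user_password.

    Hypothetical stand-in: SQLite replaces the real replication setup,
    and the table is reduced to three columns for the example.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user (
            user_id INTEGER PRIMARY KEY,
            user_name TEXT,
            user_password TEXT
        );

        -- Blank the sensitive column as soon as a row arrives.
        CREATE TRIGGER user_insert_redact AFTER INSERT ON user
        BEGIN
            UPDATE user SET user_password = '' WHERE user_id = NEW.user_id;
        END;

        -- Blank it again if a later write sets a non-empty value.
        CREATE TRIGGER user_update_redact AFTER UPDATE OF user_password ON user
        WHEN NEW.user_password <> ''
        BEGIN
            UPDATE user SET user_password = '' WHERE user_id = NEW.user_id;
        END;
        """
    )
    return conn


if __name__ == "__main__":
    db = make_sanitized_db()
    db.execute(
        "INSERT INTO user (user_name, user_password) VALUES (?, ?)",
        ("Alice", "not-a-real-hash"),
    )
    print(db.execute("SELECT user_name, user_password FROM user").fetchall())
```

The redaction here is unconditional: the trigger never inspects the row, which is why this kind of rule suits the Sanitarium stage.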
There is also a check_private_data_report script that verifies that redaction happened properly. It runs weekly via cron and emails the results to the DBAs when a mismatch is found.
More technical details can be found at Portal:Data_Services/Admin/Wiki_Replicas#Step_2:_evaluation.
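The kind of check described above can be sketched as follows. This is not the check_private_data_report script itself: the list PRIVATE_COLUMNS, the function find_unredacted, and the use of SQLite are all assumptions made to keep the example self-contained.

```python
import sqlite3

# Hypothetical list of (table, column) pairs that should be blank
# on a sanitized host. The real script has its own definitions.
PRIVATE_COLUMNS = [("user", "user_password"), ("user", "user_email")]


def find_unredacted(conn, private_columns):
    """Return (table, column, row_count) for private columns that still hold data."""
    mismatches = []
    for table, column in private_columns:
        # Identifiers come from the fixed list above, never from user input.
        query = (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {column} IS NOT NULL AND {column} <> ''"
        )
        (count,) = conn.execute(query).fetchone()
        if count:
            mismatches.append((table, column, count))
    return mismatches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (user_id INTEGER PRIMARY KEY, "
        "user_password TEXT, user_email TEXT)"
    )
    # One properly redacted row and one row with a leaked email address.
    conn.execute("INSERT INTO user (user_password, user_email) VALUES ('', '')")
    conn.execute(
        "INSERT INTO user (user_password, user_email) VALUES ('', 'a@example.org')"
    )
    for table, column, count in find_unredacted(conn, PRIVATE_COLUMNS):
        print(f"MISMATCH: {table}.{column} has {count} non-empty row(s)")
```

A report like this only needs to be sent when the returned list is non-empty, which matches the "email on mismatch" behaviour described above.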
Wiki Replica views
In operations/puppet.git, the file modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml defines the views that determine what is public. These views apply conditional redactions that cannot be done in the Sanitarium (e.g. revision deletion), and also serve as defense in depth in case one of the Sanitarium redactions fails.
More technical details can be found at Portal:Data_Services/Admin/Wiki_Replicas#Step_6:_setting_views.
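To show what a conditional redaction looks like, here is a minimal sketch of a view that hides fields depending on the rev_deleted bit field. It runs on SQLite through Python's sqlite3 module rather than on the real replica servers, the revision table is reduced to a few columns, and the view name revision_public and the helper make_demo_db are hypothetical. The actual definitions live in maintain-views.yaml and may differ.

```python
import sqlite3

# MediaWiki revision-deletion bit flags stored in rev_deleted.
DELETED_COMMENT = 2
DELETED_USER = 4


def make_demo_db():
    """Build a simplified revision table plus a redacting view (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        f"""
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_actor INTEGER,
            rev_comment_id INTEGER,
            rev_deleted INTEGER NOT NULL DEFAULT 0
        );

        -- Hide each field only when its deletion bit is set on that row.
        CREATE VIEW revision_public AS
        SELECT
            rev_id,
            CASE WHEN (rev_deleted & {DELETED_USER}) <> 0
                 THEN NULL ELSE rev_actor END AS rev_actor,
            CASE WHEN (rev_deleted & {DELETED_COMMENT}) <> 0
                 THEN NULL ELSE rev_comment_id END AS rev_comment_id,
            rev_deleted
        FROM revision;
        """
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    db.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        [(1, 10, 100, 0), (2, 20, 200, DELETED_USER)],
    )
    for row in db.execute("SELECT * FROM revision_public ORDER BY rev_id"):
        print(row)
```

Because the decision depends on the value of rev_deleted in each row, this kind of rule cannot be expressed as a blanket column filter, which is why it belongs at the view layer.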
Document redaction decisions
TODO: include documentation and rationale for any information that is publicly exposed here but not publicly exposed by MediaWiki.
Other
Note: operations/software/redactron.git and operations/software/labsdb-auditor.git contain historical software which is no longer used.