MariaDB/Sanitarium and Labsdbs

From Wikitech
Jump to: navigation, search

Sanitarium is the name used for the MySQL/MariaDB instances and configuration that redact private information from the production wiki databasess before replicating to the Wiki Replicas instances which are accessible from the Cloud Services network. Main article is Portal:Data Services/Admin/Wiki Replicas.

Sanitarium

role::mariadb::sanitarium_multisource

role::mariadb::sanitarium_multisource (db1095.eqiad.wmnet) replicates via single instance using multi-source replication.

role::mariadb::sanitarium_multiinstance

role::mariadb::sanitarium_multiinstance (db1102.eqiad.wmnet) replicates via separate MariaDB instances per shard. It is intended to eventually replace role::mariadb::sanitarium_multisource for all replication to Cloud Services.

2017-10-18: runs 4 mysqld instances on ports 331[2467], corresponding to the shards s2, s4, s6, s7:

 $ netstat -tlp | grep mysqld
 tcp6       0      0 [::]:3312               [::]:*                  LISTEN      43118/mysqld    
 tcp6       0      0 [::]:3314               [::]:*                  LISTEN      440/mysqld      
 tcp6       0      0 [::]:3316               [::]:*                  LISTEN      2751/mysqld     
 tcp6       0      0 [::]:3317               [::]:*                  LISTEN      3837/mysqld

datadir is in /srv/sqldata.s[2467] and tmpdir in /srv/tmp.s[2467]

All but certain filtered tables are replicated.

FIXME: Why separate instances?

Triggers

The filters (triggers) used to redact the data are created by Redactatron, specifically redact_standard_output.sh. The following triggers are defined for each wiki's schema:

 +-----------------------------+----------------------+---------------+
 | TRIGGER_NAME                | TABLE                | TYPE          |
 +-----------------------------+----------------------+---------------+
 | abuse_filter_log_insert     | abuse_filter_log     | BEFORE INSERT |
 | abuse_filter_log_update     | abuse_filter_log     | BEFORE UPDATE |
 | aft_article_feedback_insert | aft_article_feedback | BEFORE INSERT |
 | aft_article_feedback_update | aft_article_feedback | BEFORE UPDATE |
 | archive_insert              | archive              | BEFORE INSERT |
 | archive_update              | archive              | BEFORE UPDATE |
 | mark_as_helpful_insert      | mark_as_helpful      | BEFORE INSERT |
 | mark_as_helpful_update      | mark_as_helpful      | BEFORE UPDATE |
 | recentchanges_insert        | recentchanges        | BEFORE INSERT |
 | recentchanges_update        | recentchanges        | BEFORE UPDATE |
 | revision_insert             | revision             | BEFORE INSERT |
 | revision_update             | revision             | BEFORE UPDATE |
 | user_insert                 | user                 | BEFORE INSERT |
 | user_update                 | user                 | BEFORE UPDATE |
 +-----------------------------+----------------------+---------------+

Each trigger includes several tasks, e.g. SET NEW.user_password = ''

  • Monitor that all schemas have their filters in place—produce an error on icinga otherwise. There were some wikis that didn't have the filter.
 SELECT SCHEMATA.SCHEMA_NAME FROM SCHEMATA WHERE SCHEMATA.SCHEMA_NAME NOT IN (SELECT EVENT_OBJECT_SCHEMA FROM TRIGGERS)

Wiki Replica views

The Wiki Replica hosts filter again by using custom views which set certain fields to NULL and preventing normal users from seeing the underlying tables directly. For example:

customviews:
  abuse_filter:
    source: abuse_filter
    view: >
      select af_id, if(af_hidden,null,af_pattern) as af_pattern,
      af_user, af_user_text, af_timestamp, af_enabled,
      if(af_hidden,null,af_comments) as af_comments, af_public_comments,
      af_hidden, af_hit_count, af_throttled, af_deleted, af_actions, af_global, af_group
CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `abuse_filter` AS select `enwiki`.`abuse_filter`.`af_id` AS `af_id`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_pattern`) AS `af_pattern`,`enwiki`.`abuse_filter`.`af_user` AS `af_user`,`enwiki`.`abuse_filter`.`af_user_text` AS `af_user_text`,`enwiki`.`abuse_filter`.`af_timestamp` AS `af_timestamp`,`enwiki`.`abuse_filter`.`af_enabled` AS `af_enabled`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_comments`) AS `af_comments`,`enwiki`.`abuse_filter`.`af_public_comments` AS `af_public_comments`,`enwiki`.`abuse_filter`.`af_hidden` AS `af_hidden`,`enwiki`.`abuse_filter`.`af_hit_count` AS `af_hit_count`,`enwiki`.`abuse_filter`.`af_throttled` AS `af_throttled`,`enwiki`.`abuse_filter`.`af_deleted` AS `af_deleted`,`enwiki`.`abuse_filter`.`af_actions` AS `af_actions`,`enwiki`.`abuse_filter`.`af_global` AS `af_global`,`enwiki`.`abuse_filter`.`af_group` AS `af_group` from `enwiki`.`abuse_filter`

When a labsdb host has to be rebuilt from zero or if a new wiki is created, maintain-views.py is used to create/update the views.