MariaDB/Sanitarium and Labsdbs

From Wikitech
Jump to navigation Jump to search

Sanitarium is the name used for the MySQL/MariaDB instances and configuration that redact private information from the production wiki databasess before replicating to the Wiki Replicas instances which are accessible from the Cloud Services network. Main article is Portal:Data Services/Admin/Wiki Replicas.

Sanitarium

role::mariadb::sanitarium_multisource

role::mariadb::sanitarium_multisource (db1095.eqiad.wmnet) replicates via single instance using multi-source replication.

role::mariadb::sanitarium_multiinstance

role::mariadb::sanitarium_multiinstance (db1102.eqiad.wmnet) replicates via separate MariaDB instances per shard. It is intended to eventually replace role::mariadb::sanitarium_multisource for all replication to Cloud Services.

2017-10-18: runs 4 mysqld instances on ports 331[2467], corresponding to the shards s2, s4, s6, s7:

 $ netstat -tlp | grep mysqld
 tcp6       0      0 [::]:3312               [::]:*                  LISTEN      43118/mysqld    
 tcp6       0      0 [::]:3314               [::]:*                  LISTEN      440/mysqld      
 tcp6       0      0 [::]:3316               [::]:*                  LISTEN      2751/mysqld     
 tcp6       0      0 [::]:3317               [::]:*                  LISTEN      3837/mysqld

datadir is in /srv/sqldata.s[2467] and tmpdir in /srv/tmp.s[2467]

All but certain filtered tables are replicated.

FIXME: Why separate instances?

Triggers

The filters (triggers) used to redact the data are created by Redactatron, specifically redact_standard_output.sh. The following triggers are defined for each wiki's schema:

 +-----------------------------+----------------------+---------------+
 | TRIGGER_NAME                | TABLE                | TYPE          |
 +-----------------------------+----------------------+---------------+
 | abuse_filter_log_insert     | abuse_filter_log     | BEFORE INSERT |
 | abuse_filter_log_update     | abuse_filter_log     | BEFORE UPDATE |
 | aft_article_feedback_insert | aft_article_feedback | BEFORE INSERT |
 | aft_article_feedback_update | aft_article_feedback | BEFORE UPDATE |
 | archive_insert              | archive              | BEFORE INSERT |
 | archive_update              | archive              | BEFORE UPDATE |
 | mark_as_helpful_insert      | mark_as_helpful      | BEFORE INSERT |
 | mark_as_helpful_update      | mark_as_helpful      | BEFORE UPDATE |
 | recentchanges_insert        | recentchanges        | BEFORE INSERT |
 | recentchanges_update        | recentchanges        | BEFORE UPDATE |
 | revision_insert             | revision             | BEFORE INSERT |
 | revision_update             | revision             | BEFORE UPDATE |
 | user_insert                 | user                 | BEFORE INSERT |
 | user_update                 | user                 | BEFORE UPDATE |
 +-----------------------------+----------------------+---------------+

Each trigger includes several tasks, e.g. SET NEW.user_password = ''

  • Monitor that all schemas have their filters in place—produce an error on icinga otherwise. There were some wikis that didn't have the filter.
 SELECT SCHEMATA.SCHEMA_NAME FROM SCHEMATA WHERE SCHEMATA.SCHEMA_NAME NOT IN (SELECT EVENT_OBJECT_SCHEMA FROM TRIGGERS)

Wiki Replica views

The Wiki Replica hosts filter again by using custom views which set certain fields to NULL and preventing normal users from seeing the underlying tables directly. For example:

customviews:
  abuse_filter:
    source: abuse_filter
    view: >
      select af_id, if(af_hidden,null,af_pattern) as af_pattern,
      af_user, af_user_text, af_timestamp, af_enabled,
      if(af_hidden,null,af_comments) as af_comments, af_public_comments,
      af_hidden, af_hit_count, af_throttled, af_deleted, af_actions, af_global, af_group
CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `abuse_filter` AS select `enwiki`.`abuse_filter`.`af_id` AS `af_id`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_pattern`) AS `af_pattern`,`enwiki`.`abuse_filter`.`af_user` AS `af_user`,`enwiki`.`abuse_filter`.`af_user_text` AS `af_user_text`,`enwiki`.`abuse_filter`.`af_timestamp` AS `af_timestamp`,`enwiki`.`abuse_filter`.`af_enabled` AS `af_enabled`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_comments`) AS `af_comments`,`enwiki`.`abuse_filter`.`af_public_comments` AS `af_public_comments`,`enwiki`.`abuse_filter`.`af_hidden` AS `af_hidden`,`enwiki`.`abuse_filter`.`af_hit_count` AS `af_hit_count`,`enwiki`.`abuse_filter`.`af_throttled` AS `af_throttled`,`enwiki`.`abuse_filter`.`af_deleted` AS `af_deleted`,`enwiki`.`abuse_filter`.`af_actions` AS `af_actions`,`enwiki`.`abuse_filter`.`af_global` AS `af_global`,`enwiki`.`abuse_filter`.`af_group` AS `af_group` from `enwiki`.`abuse_filter`

When a labsdb host has to be rebuilt from zero or if a new wiki is created, maintain-views.py is used to create/update the views.

Sanitarium's master failover (for a multisource sanitarium)

This is an example of failing over a server that acts as a Sanitarium master.

Current master: db1066

Future master: db1106

Sanitarium server (multisource): db1095

  • Downtime db1065 and db1106
  • Depool db1065 and db1106 https://gerrit.wikimedia.org/r/#/c/420964/
  • Log the maintenance on #wikimedia-operations
    !log Stop db1065 and db1106 in sync - this will generate lag on labs
    
  • stop db1065 and db1106 in sync:
    ./repl.pl --stop-siblings-in-sync --host1=db1065.eqiad.wmnet:3306 --host2=db1106.eqiad.wmnet:3306
    
  • Verify they stopped on the same position
    mysql --skip-ssl -hdb1106 -e "show slave status\G" | grep Exec_Master ; mysql --skip-ssl -hdb1065 -e "show slave status\G" | grep Exec_Master
    
  • Check db1106 master log: 
    mysql --skip-ssl -hdb1106 -e "show master status\G"
    
  • Connect to db1095 (note: the below command is only for multisource sanitarium, not for multi-instance)
SET @@default_master_connection='s1'; stop slave; reset slave all;
change master 's1' to master_host='db1106.eqiad.wmnet', master_user='repl', master_password='xx' ,master_port=3306, master_log_file='db1106-bin.000009', master_log_pos=474802220, master_ssl=1;
  • check filters on the new s1 thread
  • start slave s1 on db1095
  • start slave on db1106
  • start slave on db1065
  • Repool db1106: https://gerrit.wikimedia.org/r/#/c/420974/