Tool Labs/Database plan

From Wikitech

This page is obsolete: as of 10 August 2013, database replication to Tool Labs is live for some wikis. See https://www.mediawiki.org/wiki/File:Tool_Labs_presentation_%28Hackaton_2013%29.pdf.

Production Data Replication

Overview

  • All public wikis will be replicated to the LabsDB servers, with private user data redacted.
  • First, data will be replicated to a special set of database servers (PreLabsDBDBS) that use triggers to rewrite or remove private data. These servers will write row-based binary logs (binlogs). Each production shard will map 1:1 to a MySQL instance, unlike on the Toolserver, where some shards are combined via a custom replication engine.
  • Triggers will be created with the help of the Redactatron schema review tool.
  • The LabsDB servers themselves will replicate from the PreLabsDBDBS servers. Users will access data through views that include only reviewed tables and columns, so that unreviewed tables (such as those added by a new extension) are not exposed without prior review.
  • Replicated data will be stored on flash storage, while each system will also have a traditional disk array attached for Labs project data. Users will be able to join project tables against wiki tables, but only within the same shard.
  • The Labs team will integrate these databases with Labs, automating database creation and access on a per-project basis.
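The view layer described above can be sketched as follows. This is a minimal, hypothetical Python example, not Redactatron's actual output: it assumes the review produces a mapping from table name to approved columns, and that views live in a separate database named `<wiki>_p`. The function names, the `_p` suffix, and the sample column lists are illustrative assumptions.

```python
"""Sketch: generate view DDL that exposes only reviewed tables and columns.

Hypothetical assumptions (not from the plan): the review result is a
mapping of table name -> approved columns, and views are created in a
separate database named "<wiki>_p".
"""


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_views(wiki: str, reviewed: dict[str, list[str]]) -> list[str]:
    """Return one CREATE VIEW statement per reviewed table.

    Tables absent from `reviewed` get no view at all, so a table added by a
    new extension stays invisible until someone reviews it.
    """
    src_db = quote_ident(wiki)
    view_db = quote_ident(wiki + "_p")  # hypothetical naming convention
    statements = []
    for table in sorted(reviewed):
        columns = reviewed[table]
        if not columns:
            continue  # reviewed, but no column approved: expose nothing
        col_list = ", ".join(quote_ident(c) for c in columns)
        statements.append(
            f"CREATE OR REPLACE VIEW {view_db}.{quote_ident(table)} AS "
            f"SELECT {col_list} FROM {src_db}.{quote_ident(table)};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical review result: user_password and user_email are left out.
    sample = {
        "page": ["page_id", "page_namespace", "page_title"],
        "user": ["user_id", "user_name", "user_editcount"],
    }
    for stmt in build_views("enwiki", sample):
        print(stmt)
```

Because the generator iterates over the reviewed mapping rather than over the tables that exist in the source database, anything not yet reviewed is excluded by default.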

Implementation Steps

  • Convert a set of production shards to innodb_file_per_table by dumping them with mysqldump and reloading them.
    • Required so that tables from different logical databases can easily be hosted on different storage systems.
    • Status: Done
  • Puppet support for maintaining multiple MySQL instances per server
    • Status: Done
  • Provision PreLabsDBDBS servers
    • db1053, db1054, db1057
    • Status: Done
  • Write schema review tool (Redactatron)
    • Status: Needs to be rewritten for ongoing maintenance after the initial LabsDB configuration. It also needs support for advanced filtering logic in views and for removing whole tables via replication filtering.
  • Review schema, generate triggers, cleanup scripts, and views
    • Status: Done
  • Apply triggers to PreLabsDBDBS and delete all prior private data
    • Status: Done, except for CentralAuth
    • ETA: Soon after the Amsterdam (AMS) hackathon (early June 2013)
  • Provision LabsDB hardware with external disk arrays
    • Status: Done
  • Configure MySQL on LabsDB hardware and set up data replication
    • Status: Done for s1, s2, s4, and s5
    • ETA for s3 and s6: 2013-05-25
    • ETA for s7 (with CentralAuth): 2013-06-12
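The "generate triggers, cleanup scripts" step can be illustrated with a similar sketch. Again this is hypothetical: the input format (table, then private column, then replacement value), the trigger naming scheme, and the sample columns are assumptions, not the plan's actual Redactatron output. Each table gets BEFORE INSERT and BEFORE UPDATE triggers that overwrite private columns on incoming rows, plus a one-off UPDATE that redacts rows copied before the triggers existed.

```python
"""Sketch: derive redaction triggers and a one-off cleanup from a review.

Hypothetical input format: {private column: replacement value} per table.
The real Redactatron output format is not described in the plan.
"""


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def sql_literal(value) -> str:
    """Render a Python value as a MySQL literal (None becomes NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_redaction(table: str, private: dict) -> list[str]:
    """Return trigger DDL for new/changed rows plus an UPDATE for existing rows."""
    new_assignments = ", ".join(
        f"NEW.{quote_ident(col)} = {sql_literal(val)}" for col, val in private.items()
    )
    statements = []
    for event in ("INSERT", "UPDATE"):
        # Hypothetical naming scheme: <table>_redact_<event>
        name = quote_ident(f"{table}_redact_{event.lower()}")
        statements.append(
            f"CREATE TRIGGER {name} BEFORE {event} ON {quote_ident(table)} "
            f"FOR EACH ROW SET {new_assignments};"
        )
    # Cleanup for rows that were replicated before the triggers were created.
    assignments = ", ".join(
        f"{quote_ident(col)} = {sql_literal(val)}" for col, val in private.items()
    )
    statements.append(f"UPDATE {quote_ident(table)} SET {assignments};")
    return statements


if __name__ == "__main__":
    # Hypothetical review result for the MediaWiki user table.
    for stmt in build_redaction("user", {"user_password": "", "user_email": None}):
        print(stmt)
```

A single SET with comma-separated assignments keeps each trigger body to one statement, which avoids the BEGIN ... END block (and the DELIMITER change in the mysql client) that a multi-statement trigger would need.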

Integration of replicated LabsDB systems with Tool Labs

Steps to be listed [Marc-André Pelletier]