Jump to content

MariaDB/Provisioning a host

From Wikitech
This document is a work in progress and may be partially incorrect / incomplete
If you want to clone a MariaDB host please refer to this documentation https://wikitech.wikimedia.org/wiki/MariaDB/Clone_a_host
All patches are to be applied to the operations/puppet repository

Host preparation

Downtime the host in Icinga

Semi-automatically:

cookbook sre.hosts.downtime DBNAME --days 1 --task-id TXXXXXX --reason "provisioning - TXXXXXX"

Manually:

  1. Open https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=DBNAME.
  2. Select "Schedule downtime for checked hosts" from the top right menu.
  3. TBD: next steps

Adding a new host

  1. TBD

Replacing a failed host

  1. Identify an available host in the same DC
    1. TBD: how?
    2. TBD: What to do if there aren't any?
  2. Match the OS version the replaced host was running
    1. Reimage the host to Stretch if needed, following the instructions in Server_Lifecycle/Reimage

Section and replication configuration

Add the host to a section:

  1. In manifests/site.pp:
    1. Add the DBNAME to the appropriate regexps
    2. Remove the insetup role from DBNAME
  2. In hieradata/hosts/DBNAME.yaml add
    1. For single-instance hosts: mariadb::shard: 'SECTIONNAME'
    2. For multi-instance hosts:

Configure the correct replication mode:

  1. In hieradata/hosts/DBNAME.yaml add: mariadb::binlog_format: 'BINLOG_FORMAT', where BINLOG_FORMAT is:
    1. (TBD: when?) STATEMENT
    2. (TBD: when?)ROW

Add the host to dbctl config (example)

  1. In conftool-data/dbconfig-instance/instances.yaml add - DBNAME in the appropriate location

Add the host to Mediawiki's database loadbalancer configuration (example)

  1. Log on to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
  2. Run sudo dbctl --scope DCNAME instance DBNAME edit, where DCNAME is eqiad or codfw
  3. Fill the template:
    1. Set pooled: false
    2. Replacing a failed host?
      1. Yes: mirror the replaced host configuration
      2. No: For now, just ask (TBD: guidelines?)
  4. Run dbctl config commit -m "COMMIT_MESSAGE"

Data population

Cloning

  1. Clone via the sre.mysql.clone cookbook
  2. Start MariaDB
  3. Wait until replication has caught up (all green on Icinga)

Restoring from backup

  1. MariaDB/Backups#Provision_a_precompressed_and_prepared_snapshot_(preferred)

Replication from master

Enabling notifications (example):

  1. In hieradata/hosts/DBNAME.yaml delete profile::base::notifications: disabled

Add it to zarcillo DB

Example host: db1208 going to core s3, lives in eqiad in A5

Zarcillo DB lives in db1215

set session binlog_format=ROW;
INSERT INTO instances (name, server, port, `group`) VALUES ('db1208','db1208.eqiad.wmnet',3306, 'core');
INSERT INTO section_instances (instance, section) VALUES ('db1208','s3');
INSERT INTO servers (fqdn, hostname, dc, rack) VALUES ('db1208.eqiad.wmnet', 'db1208', 'eqiad', 'a5');

Next steps

  1. Ask other team members about schema changes that were executed since the timestamp of the backup you restored the host from and apply them (if needed)
  2. Pool the host in (TBD: link needed)