MariaDB/Clone a host

Derived from MariaDB/Provisioning a host

Good to know before starting

Please make sure you are aware of the following page(s):

Host preparation

Here is an overview of what a host depends on:

[Diagram: overview of host dependencies ("arrow hell")]

Downtime the hosts

The following chained command runs the downtime cookbook to prevent $source_server and $destination_server from sending out alerts:

cookbook sre.hosts.downtime $source_server --days 2 --task-id T$Phabricator_task_id --reason "provisioning - T$Phabricator_task_id" \
&& cookbook sre.hosts.downtime $destination_server --days 2 --task-id T$Phabricator_task_id --reason "provisioning - T$Phabricator_task_id"

Either open https://alerts.wikimedia.org/ and look for your hosts, or follow this URL after replacing $source_host and $destination_host; once there, you can adjust the filter by double-clicking on it at the top of the screen.

Move the host into the right groups with the right role

  1. In manifests/site.pp:
    1. Add the DBNAME to the appropriate regexps
    2. Remove the insetup role from DBNAME
  2. In hieradata/hosts/$destination_host.yaml, add the configuration content of hieradata/hosts/$source_host.yaml. You can check this example if needed.

Add the host to dbctl config (example)

In conftool-data/dbconfig-instance/instances.yaml, add - DBNAME in the appropriate location and then run the dbctl commit command on a cumin host.
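
For instance, committing that change from a cumin host could look roughly like this (the commit message is only illustrative):

sudo dbctl config commit -m "Add $destination_host to dbctl - T$Phabricator_task_id"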

Fetch information about the host and its cluster

On Zarcillo

select `section` from section_instances where instance like '%$source_host_shortname%';
select `group` from instances where server like '%$source_host_shortname%';

On Netbox

Look for the destination server FQDN and note its Rack and DC slug; you will need them later for the servers INSERT in the Zarcillo DB.

On the MySQL cluster

sudo mysql -e 'show slave status\G' | grep -iE 'master'

and get the FQDN of the cluster's replication source (the Master_Host value).
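
If you only need that single field, a slightly narrower filter on the same command works too:

sudo mysql -e 'show slave status\G' | grep -i 'master_host'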

Cloning

Clone via the sre.mysql.clone cookbook in a tmux:

sudo cookbook sre.mysql.clone --source $source_server --target $destination_server --primary $cluster_replication_source

Monitor the run output

After cloning

Once replication has fully caught up, you can validate that everything is OK with:

sudo mysql -e 'show slave status\G' | grep -iE 'run|secon|slave|master|pos'
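
In particular, a healthy replica will typically report something like the following (illustrative values):

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0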

Then re-enable notifications by removing the profile::base::notifications: disabled line in your hieradata/hosts/$destination_host.yaml.

Add it to zarcillo DB

Zarcillo DB lives on db1215. For instance, these INSERTs:

set session binlog_format=ROW;
INSERT INTO instances (name, server, port, `group`) VALUES ('db1208','db1208.eqiad.wmnet',3306, 'core');
INSERT INTO section_instances (instance, section) VALUES ('db1208','s3');
INSERT INTO servers (fqdn, hostname, dc, rack) VALUES ('db1208.eqiad.wmnet', 'db1208', 'eqiad', 'a5');
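
To double-check that the rows landed as expected, a few quick SELECTs against the same tables (reusing the hostname from the example above) can help:

select * from servers where hostname = 'db1208';
select * from instances where name = 'db1208';
select * from section_instances where instance = 'db1208';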

After a few days of uptime

If required (e.g. upon decommissioning) and if everything works properly, we can now depool our old host:

sudo dbctl instance $source_host depool
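
As with the earlier dbctl change, the depool only takes effect once committed; a sketch of that step from a cumin host (the message is illustrative):

sudo dbctl config commit -m "Depool $source_host - T$Phabricator_task_id"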

We will also repool the destination host with the repool script, which has to be executed in a tmux/screen session from a cumin host:

./repool $destination_host 'Host warmup' 15 30 45 60 75 90 100

Each value at the end of the command line is a "step", i.e. a traffic percentage to repool with; in this example the host is warmed up gradually from 15% up to 100%.