Fundraising/techops/procedures/services-mariadb clone

From Wikitech

MariaDB Cloning Process

Every so often, we need to clone a database to another host. This could be to rebuild a machine after hardware failure, or to build out a new database replica. Here we'll walk through the steps needed to do so.

In the examples, we will use frsource1001 as the source of cloning, frdest1001 as the host receiving the updated data, and frprimary1001 as the primary/master database. Additionally, all shell commands are prefaced with a # and all mysql commands are prefaced with --. This is designed as a safety mechanism in case they are copy/pasted straight in. Make sure to exclude those characters if you wish for the commands to execute.

Prep the destination host (frdest1001)

We first need to prep the destination hots so that it is ready for the SQL data. To do that, we want to ensure that mariadb/mysql is stopped and that the SQL data directory is empty.

On frdest1001 as root:

   # stop mysql
   # rm -rf /srv/mysql/*

Cloning the data across

At this point, we will have steps that will need to happen on both frsource1001 and frdest1001. Most steps need to happen chronologically and will be listed her as such. Our cloning process will be using tar over netcat. This tends to have the highest throughput when doing transfers. One thing that must be taken into account with this process is that there is no encryption used. Please ensure that the data you are transferring is encrypted or that you have encryption on the links between the hosts if they pass outside of a data center or secure environment. We are doing this transfer on port 3306 since that is already open in central and host based firewalls between the hosts that run dbs.

Gather mysql replication information from source

On frsource1001:

   # mysql
     -- stop slave
     -- show slave status \G
        /* Capture the output of the replication status specifically master log name, exec master log position */
        /* This will be used to re-establish replication on the dest host after cloning */
     -- exit
   # sudo systemctl stop mysql

Prepare frdest1001 to receive the data

This should be done in a screen/tmux session in case of any interruption in your network connection.

On frdest1001 as root:

   # screen -S mysql_clone
   # cd /srv/mysql
   # nc -l -p 3306 | tar xvBpf -

We should now have a process listening on port 3306 that will output a tar archive sent to it onto disk.

Send the data across from frsource1001 to frdest1001

You will need to verify the list of databases and files you wish to clone since it may vary depending on the database and it's purpose. This should be done in a screen/tmux session in case of any interruption in your network connection.

On frsource1001 as root:

   # screen -S mysql_clone
   # cd /srv/mysql
   # tar cvBpf - aria_log.00000001 aria_log_control ibdata1 civicrm drupal fredge fruec geonames mysql performance_schema smashpig sys master.info | nc -q 100 frdest1001 3306

You should now see tar output in both screen sessions if the transfer is flowing.

Wait for the data to clone across

This can take quite some time depending on the databases, hosts, and network distances involved. Grab a cup of tea and check in regularly.

After the clone is complete

Restart mariadb/mysql on the source host

Once we are certain the data is across, we generally want to get the source host back online and caught up with replication. This way it can be put back into service as quickly as possible.

   # sudo systemctl start mysql
   # mysql
     -- show slave status
        /* verify replication is stopped */
     -- start slave
     -- show slave status
        /* verify replication has restarted and, over some successive runnings of the command, it is catching up */
     -- exit

Finally, we will await replication catchup. This could take some time depending on how busy the database is and how long the clone process took. You can verify the progress by looking at Seconds_Behind_Master and verifying that it decreases to 0. You can also look at the Replication Lag panel on the host's MySQL monitoring page.

Set destination to not start replication on start

[] verify on disk permissions
[] set skip_slave_start in my.cnf
[] start mariadb/mysql

Re-establish replication with primary database

[] log in to mysql
[] reset slave (shouldn't be set at this point anyway)
[] execute change master command
     -- change master to MASTER_HOST = 'frprimary1001.frack.eqiad.wmnet', MASTER_LOG_FILE = 'frprimary1001-bin.XXXXXX', MASTER_LOG_POS = YYYYYYYYY, MASTER_USER='repl', MASTER_PASSWORD='ZZZZZZZZZ', Master_SSL = 1, MASTER_SSL_CA = '/etc/mysql/cacert.pem', MASTER_SSL_CERT = , MASTER_SSL_KEY = , MASTER_SSL_VERIFY_SERVER_CERT = 1;
[] verify master command set properly
     -- show slave status \G
[] start slave
[] verify replication has started
[] await replication catchup

Cleanup

After we are all set, we need to clean up after ourselves. This usually just means exiting out of any screeen/tmux sessions we still have open on frsource1001 and frdest1001. We'll also want to delete any copy we had of the replication status (from show slave status \G) that we may have captured to disk.