Data Platform/Systems/DB Replica
The Analytics DB replica is the host running the role::mariadb::misc::analytics::backup Puppet role.
At the time of writing, this is db1208.
The host runs a mariadb instance for every database host to replicate; to see the config, check profile::mariadb::misc::analytics::multiinstance
in puppet.
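A quick way to see which instances are currently running on the host is to list the per-instance sockets; the systemd unit pattern below is an assumption about how the multiinstance setup names its units:
# List the per-instance sockets (one per replicated db host)
ls /run/mysqld/
# If the instances are managed as mariadb@<instance> units (assumption), check their state
systemctl list-units 'mariadb@*'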
Start/Stop replicas for maintenance
If you have to do maintenance (like rebooting, changing ports of the mysql instances, etc.) you'll need to stop replication first. You can do this easily with:
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock # or any other socket/instance combination of course
set session sql_log_bin=0;
STOP SLAVE;
# do maintenance;
START SLAVE;
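The same statements can also be run non-interactively with mysql -e; a minimal sketch (the instance name is just an example):
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "STOP SLAVE;"
# ... do maintenance ...
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "START SLAVE;"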
Check status of replication
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock # or any other socket/instance combination of course
set session sql_log_bin=0;
show slave status \G;
Pay attention to:
- Slave_IO_Running: if not "Yes", something is not working.
- Slave_SQL_Running_State: if the above is not "Yes", it should contain the reason for the problem.
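For a quick check from the shell, the same output can be filtered down to the interesting fields; a minimal sketch (the socket name is just an example):
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_.*Error'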
Replicated databases
There are two instances currently running on db1208, which replicate:
- the piwik database from matomo1002 (for Matomo)
- the druid, druid_public_eqiad, hue, search_airflow, superset_production and hive_metastore databases from an-mariadb1001 (for Analytics Meta)
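To double-check which databases a given instance actually replicates, you can list them directly; a minimal sketch (the matomo socket name is an assumption, the analytics_meta one is used elsewhere on this page):
sudo mysql -S /run/mysqld/mysqld.matomo.sock -e "SHOW DATABASES;"
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "SHOW DATABASES;"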
Add a new database to replicate
In this case, there are two options:
- There is a new db host to replicate: if so, follow "Set up replication" later on.
- A new db is added to a db host that already has a replica instance on db1208 (for example, Analytics Meta). In this case the new DB should be replicated correctly without any problem. Please remember that things like grants are replicated as well, so you need to be sure that things like users are created/replicated beforehand (see the example below).
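For example, to verify on the replica that the relevant user and its grants already exist before relying on the new database, you can run something like this (the user name is hypothetical):
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "SELECT User, Host FROM mysql.user;"
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "SHOW GRANTS FOR 'some_app_user'@'%';"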
Set up replication
This procedure assumes that:
- A new mariadb instance is running on db1208.
- The mariadb instance is configured correctly (see User:Elukey/Analytics/DBs#First_run_of_MariaDB for example).
The procedure to follow for each new mysql instance running on the replica is the following (a consolidated example follows the list):
- On the master, execute mysqldump --single-transaction --master-data=2 -u $USER -p$PASSWORD -h $HOST --databases $DATABASE1 $DATABASE2 etc.. >> file.sql. Note that usually we connect via UNIX socket on localhost as root to have full admin privileges, so the user/password parameters can be removed when connecting as root@localhost (typically via sudo mysql etc..).
- The file.sql dump should contain in its header something like: CHANGE MASTER TO MASTER_LOG_FILE='matomo1002-bin.000023', MASTER_LOG_POS=29517516;. Keep a note of it since it will be needed later on.
- Ship the file from the master to db1208. This can be done in several ways; the quickest one is with transfer.py on cumin1001.
- On the replica, execute cat file.sql | sudo mysql -S /run/mysqld/mysqld.$INSTANCE.sock ($INSTANCE is the name that you set up in puppet).
- On the replica, run sudo mysql_upgrade -S /run/mysqld/mysqld.$INSTANCE.sock
- On the replica, log in as root to the instance that you are configuring (sudo mysql -S /run/mysqld/mysqld.$INSTANCE.sock) and then run:
change master to master_host='an-coord1001.eqiad.wmnet', master_user='$REPLICATION_USER', master_password='$PASSWORD', master_port=3306, MASTER_SSL=1, master_log_pos=SEE-PREVIOUS-STEP, master_log_file='SEE-PREVIOUS-STEP';
(master_user and master_password can be found in the puppet private repo, looking for class passwords::analytics)
start slave;
show slave status \G;
(make sure that you see Slave_IO_Running: Yes)
STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=Slave_pos; START SLAVE;
show slave status \G;
(make sure that you see Slave_IO_Running: Yes)
- Done!
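Putting the steps together, here is a minimal end-to-end sketch for the Matomo instance; the hostnames, instance/socket names and binlog coordinates are examples, and the replication user/password come from the puppet private repo as noted above:
# On the master (matomo1002 in this example), dump the databases with the
# binlog coordinates embedded in the dump header:
sudo mysqldump --single-transaction --master-data=2 --databases piwik > /srv/file.sql
# Extract the coordinates that the replica will need later:
grep -m1 "CHANGE MASTER TO" /srv/file.sql
# Ship file.sql to the replica (e.g. with transfer.py from cumin1001), then on db1208:
cat /srv/file.sql | sudo mysql -S /run/mysqld/mysqld.matomo.sock
sudo mysql_upgrade -S /run/mysqld/mysqld.matomo.sock
# Log in to the instance and configure replication with the coordinates from the dump:
sudo mysql -S /run/mysqld/mysqld.matomo.sock
CHANGE MASTER TO master_host='matomo1002.eqiad.wmnet', master_user='$REPLICATION_USER', master_password='$PASSWORD', master_port=3306, MASTER_SSL=1, master_log_file='matomo1002-bin.000023', master_log_pos=29517516;
START SLAVE;
SHOW SLAVE STATUS \G
STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=Slave_pos; START SLAVE;
SHOW SLAVE STATUS \G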
Bacula backups
There is a weekly Bacula backup job that pulls the databases listed above into Bacula. You can check what gets saved on dbprov1002:
root@dbprov1002:/srv/backups/dumps/latest/dump.matomo.2020-07-31--07-54-57# ls
metadata piwik.piwik_archive_blob_2019_01.sql.gz piwik.piwik_archive_numeric_2017_07-schema.sql.gz
piwik.piwik_access-schema.sql.gz piwik.piwik_archive_blob_2019_02-schema.sql.gz piwik.piwik_archive_numeric_2017_07.sql.gz
[..]
Why weekly? If we need to recover anything that happened in the days since the last dump, we'll have 14 days of binlogs on the master and replica database hosts to replay. See https://dev.mysql.com/doc/refman/8.0/en/point-in-time-recovery-binlog.html
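As a rough illustration of that point-in-time recovery path (the binlog path, dates and socket name are purely examples/assumptions):
# Replay binlog events from after the last weekly dump up to just before the incident
sudo mysqlbinlog --start-datetime="2020-07-31 08:00:00" --stop-datetime="2020-08-02 14:00:00" \
  /srv/sqldata.analytics_meta/db1208-bin.000042 \
  | sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock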