Data Platform/Systems/Analytics Meta
The Analytics MySQL Meta instance is a MariaDB database currently running on two dedicated hosts:
- an-mariadb1001
- an-mariadb1002
It hosts some databases that are required by the Analytics Cluster and related subsystems such as Hive, Druid, Superset, and DataHub.
At present an-mariadb1001 is the primary server for the database and an-mariadb1002 is a real-time replica. There is not yet any facility for high availability, and failover to the replica host is a manual operation.
The databases are also replicated to a backup replica (db1208), from which weekly logical backups are created as part of the normal MariaDB/Backups process.
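A quick way to double check which host is currently acting as primary and which as replica (a minimal sketch, assuming root access on the hosts and the default mysql socket):
# On an-mariadb1001 (expected primary): shows the current binlog file and position
sudo mysql -e "show master status\G"
# On an-mariadb1002 (expected replica): replication from an-mariadb1001 should be running
sudo mysql -e "show slave status\G" | grep -E 'Master_Host|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'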
Databases
MariaDB [(none)]> show databases;
+-------------------------+
| Database                |
+-------------------------+
| airflow_analytics       |
| airflow_platform_eng_v2 |
| airflow_research        |
| database                |
| datahub                 |
| druid                   |
| druid_public_eqiad      |
| hive_metastore          |
| hue                     |
| information_schema      |
| mpic                    |
| mpic_next               |
| mysql                   |
| oozie                   |
| percona                 |
| performance_schema      |
| search_airflow          |
| superset_production     |
| superset_staging        |
+-------------------------+
Dependent services
The MariaDB instance stores metadata for the following systems:
- Both Druid clusters (Analytics/Systems/Druid)
- Superset production and staging services (Analytics/Systems/Superset)
- Hive Server2 and Metastore (Analytics/Systems/Cluster/Hive)
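If needed, connectivity from a dependent service host can be verified with a simple client call (a minimal sketch; the user name below is a placeholder, not a real credential, and the actual account and grants depend on the service):
# Hypothetical check from e.g. the Superset host: confirm the metadata DB answers
mysql -h an-mariadb1001.eqiad.wmnet -u <service_user> -p -e "SELECT 1" superset_production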
Maintenance
Please remember that any maintenance on the database needs to take into account that it is replicated to db1208.
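For example, before and after any maintenance it is worth checking that replication on db1208 is healthy (a minimal sketch, using the analytics_meta instance socket also shown in the failover section below):
# On db1208: check that the analytics_meta replication threads are running and not lagging
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "show slave status\G" \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'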
Backup
There are currently two backup workflows running:
- All of the above databases are replicated to an-mariadb1002.
- All of the above databases are replicated to db1208, see Analytics/Systems/DB_Replica. Dumps of db1208 are taken periodically onto the db provisioning/recovery hosts (dbprov1003) and stored long term in Bacula.
Restore a Backup
There are multiple ways to restore data:
- Analytics/Systems/DB_Replica shows that we have a replica on db1208 and Bacula backups. This replica will be decommissioned soon, so we shouldn't rely on it.
- MariaDB/Backups#Pre-requisites before recovering the backup (and the sections below it) shows how to do it from the dbprov hosts; otherwise it is always possible to run mysqldump or xtrabackup against the replica on db1208.
- We keep 14 days of binlogs, which means we can recover to a specific transaction in time if needed. See https://dev.mysql.com/doc/refman/8.0/en/point-in-time-recovery-binlog.html.
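To confirm the binlog retention currently in effect on the replica, you can check the expire_logs_days server variable (a minimal sketch, assuming the retention is set via the variable rather than only in the config file):
# On db1208: binlogs older than this number of days are purged automatically
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "show variables like 'expire_logs_days'"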
Let's say that we have a database on analytics-meta that got modified in an unexpected way, for example the superset_production db was upgraded to a new version of Superset that ended up not working. We cannot rely on the copy stored on db1208 (the replica), since it gets updated via MariaDB replication a few seconds after its master.
Check on dbprov100x when the last analytics-meta dump was taken.
In puppet you should find the dbprov host hosting the analytics-meta dumps. Currently it is dbprov1003:
elukey@dbprov1003:~$ sudo ls /srv/backups/dumps/latest
dump.analytics_meta.2020-10-06--02-08-27
elukey@dbprov1003:~$ sudo cat /srv/backups/dumps/latest/dump.analytics_meta.2020-10-06--02-08-27/metadata
Started dump at: 2020-10-06 02:08:27
SHOW MASTER STATUS:
Log: db1208-bin.000150
Pos: 685545104
GTID:0-171971944-198484076,171974727-171974727-7482
SHOW SLAVE STATUS:
Connection name:
Host: an-mariadb1001.eqiad.wmnet
Log: analytics-meta-bin.017481
Pos: 10114597
GTID:0-171971944-198484076,171974727-171974727-7482
We have some good info from the metadata file: the name of the binlog file and the position of the last transaction recorded in the dump. The next step is to find all the transactions that happened between when the dump was taken and the last known good state of the database.
Make a list of binlogs that were created AFTER the binlog registered in the metadata file
In this case, on db1208 we have to check all the binlogs in MariaDB's data dir after db1208-bin.000150:
elukey@db1208:/srv/sqldata.analytics_meta$ ls -lht db1208-bin.[0-9]*
-rw-rw---- 1 mysql mysql 969M Oct 12 14:51 db1208-bin.000160
-rw-rw---- 1 mysql mysql 1001M Oct 12 00:52 db1208-bin.000159
-rw-rw---- 1 mysql mysql 1001M Oct 11 08:59 db1208-bin.000158
-rw-rw---- 1 mysql mysql 1001M Oct 10 17:12 db1208-bin.000157
-rw-rw---- 1 mysql mysql 1001M Oct 10 01:00 db1208-bin.000156
-rw-rw---- 1 mysql mysql 1001M Oct 9 10:37 db1208-bin.000155
-rw-rw---- 1 mysql mysql 1001M Oct 8 21:15 db1208-bin.000154
-rw-rw---- 1 mysql mysql 1001M Oct 8 06:24 db1208-bin.000153
-rw-rw---- 1 mysql mysql 1001M Oct 7 14:53 db1208-bin.000152
-rw-rw---- 1 mysql mysql 1001M Oct 6 23:27 db1208-bin.000151
-rw-rw---- 1 mysql mysql 1001M Oct 6 07:35 db1208-bin.000150
-rw-rw---- 1 mysql mysql 1001M Oct 5 16:17 db1208-bin.000149
-rw-rw---- 1 mysql mysql 1001M Oct 5 01:12 db1208-bin.000148
-rw-rw---- 1 mysql mysql 1001M Oct 4 09:57 db1208-bin.000147
-rw-rw---- 1 mysql mysql 1001M Oct 3 18:55 db1208-bin.000146
-rw-rw---- 1 mysql mysql 1001M Oct 3 04:53 db1208-bin.000145
[..more..]
From the above ls, it looks like we have 11 binlogs to check, from 000150 to 000160.
Find the last good transaction from the binlog
We should now find the last transaction for the superset_production db before the problem occurred (basically the last known good state). A quick and easy way to spot the right file is to start from the latest binlogs, dump their content and look for the first one containing ALTER TABLE statements.
elukey@db1208:/srv/sqldata.analytics_meta$ sudo mysqlbinlog --base64-output=decode-rows --database superset_production db1208-bin.000160 | grep -i alter
ALTER TABLE dashboard_email_schedules ADD COLUMN slack_channel TEXT
ALTER TABLE slice_email_schedules ADD COLUMN slack_channel TEXT
ALTER TABLE row_level_security_filters DROP FOREIGN KEY row_level_security_filters_ibfk_3
ALTER TABLE row_level_security_filters DROP COLUMN table_id
ALTER TABLE query ADD COLUMN ctas_method VARCHAR(16)
ALTER TABLE dbs ADD COLUMN allow_cvas BOOL
ALTER TABLE dbs ADD CHECK (allow_cvas IN (0, 1))
ALTER TABLE dbs DROP COLUMN perm
You should be able to find a position (end_log_pos) in the binlog right before the alters, record it and proceed with the next step.
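To locate the exact end_log_pos to stop at, one option is to dump the binlog again and look at the event headers just before the first unwanted ALTER (a minimal sketch; grep -B simply prints some lines of context before each match):
# Print the events around the ALTER statements; the end_log_pos of the last
# event before the first ALTER is the stop position to use in the next step.
sudo mysqlbinlog --base64-output=decode-rows --database superset_production db1208-bin.000160 \
  | grep -i -B 10 'ALTER TABLE' | grep -iE 'end_log_pos|ALTER TABLE' | head -n 40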
Use mysqlbinlog to create a dedicated recovery file
Every binlog has its own range of positions, which may overlap across files, so to avoid losing data you should not apply --start-position and --stop-position to all of them: use the start position only on the first binlog and the stop position only on the last one. Let's assume that the end_log_pos found in the previous step is 70620843, in db1208-bin.000160. The procedure then should be:
sudo mysqlbinlog --database superset_production --start-position 685545104 db1208-bin.000150 >> recovery_binlog_superset_production
sudo mysqlbinlog --database superset_production db1208-bin.000151 ... db1208-bin.000159 >> recovery_binlog_superset_production
sudo mysqlbinlog --database superset_production --stop-position 70620843 db1208-bin.000160 >> recovery_binlog_superset_production
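Before applying it, it can be worth sanity-checking the generated recovery file, for example verifying that it does not contain the unwanted ALTERs and inspecting the last events it includes (a minimal check, reusing the file name from above):
# Expect 0 here, since we stopped right before the problematic statements
grep -ci 'ALTER TABLE' recovery_binlog_superset_production
# Eyeball the last recorded events/positions
tail -n 30 recovery_binlog_superset_production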
Recover the database
At this point you should be able to use the guide outlined earlier to restore the last dump of the database needed, and then apply the recovery binlog simply with cat recovery_binlog_superset_production | sudo mysql -D superset_production
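After the restore, a quick sanity check is to confirm that the schema is back to the pre-upgrade state, for instance that a column dropped by the unwanted migration (dbs.perm in the example above) exists again (a minimal sketch, run on the host where the database was restored):
# The perm column was dropped by the unwanted ALTER; it should be present again
sudo mysql -D superset_production -e "show create table dbs\G" | grep -i perm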
Failover
This procedure should be used only if an-mariadb1001 is not available for some reason, for example an unrecoverable hardware failure.
There is no automatic failover, only a manual one (described by this procedure). The high level idea is the following:
- Check what daemons are running on an-mariadb1001 vs an-mariadb1002.
- Ssh to db1208 and dump the status of replication to a file (sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "show slave status\G" > replication_status).
- Then stop replication on the Analytics Meta mysql instance (sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock and then stop slave; reset slave all;).
- Do the same on an-mariadb1002.
- Look for all occurrences of an-mariadb1001 in puppet (see the grep sketch after this list) and figure out which ones need to be moved to an-mariadb1002. A high level list is:
- hive metastore mysql configuration (hive-site.xml)
- druid clusters configuration (druid common config, in hiera)
- superset configs (superset_config.py)
- Merge the above change and run puppet across the hosts; you'll likely need to roll-restart the impacted daemons.
- At this point, the service should be recovered.
- Important: Bacula backups happen weekly and they may interfere with daemons using the database, due to locking etc. To temporarily disable backups, comment out the lines related to analytics meta in https://gerrit.wikimedia.org/r/c/operations/puppet/+/617650/1/modules/profile/templates/mariadb/backup_config/dbprov1002.cnf.erb
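A quick way to find the occurrences of an-mariadb1001 mentioned above is to grep the puppet repository (a minimal sketch, run from a checkout of operations/puppet; the directories actually containing the references may vary):
# List the puppet/hiera files that reference the current primary host
git grep -l 'an-mariadb1001' -- hieradata modules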