MariaDB/Backups

Presentation summary

Backup and recovery methods

The following types of backups are part of the design:

  • Bi-daily compressed binary backups (snapshots) for disaster recovery (1 month retention)
  • Weekly logical backups for long term recovery (3 month retention)
  • Binlog backups (for point in time recovery/incremental-ish backups) [Implementation ongoing, we have 10+ copies of 30 last days over multiple servers on 2 DCs]
  • Incremental backups for append-only external storage servers [Not implemented yet, lower priority, external store servers are backed up using regular logical backups right now]

Physical servers and services part of, or related to MariaDB backups

Architecture of Wikimedia Foundation infrastructure Database backups (high level overview of services and data and control flows)
  • The many production database servers that are used by Mediawiki and many other analytics, cloud-support and other misc services. These are normally called db[12]* (database) servers, in addition to the larger es[12]* (external storage) servers
    • There are other mysql-focused servers, like pc* and clouddb* (formerly labsdb*) hosts, but those are not backed up: in the first case because they are essentially a disk cache (data can be lost without losing user data), and in the second because they contain only a filtered replica of production data
    • es servers follow the same logical backup process as metadata servers, although in the future we may switch to more optimized incremental backups
    • For a summary of the functionality of the different data servers, see: MariaDB#Sections_and_shards
  • There are specific dedicated backup replicas, whose only function is to mirror the production datasets and provide a source of backups with enough performance and reliability, without affecting the production hosts due to the load and locking on generating backups
  • Bacula service, storing backups (only the logical and incremental ones) long term, reading from the provisioning servers and sending them to the bacula storage nodes
  • Provisioning servers: storing backups short term for quick recovery, provisioning and the post-processing needs (e.g. rotation, xtrabackup --prepare, compression, consolidation if many files are created). They, themselves, orchestrate the logical (dumps) backups.
  • Testing databases: db1133 and db2102 (one host per datacenter is procured). They are non-production hosts that regularly recover logical backups and snapshots and set up replication to verify backups are working properly [not yet implemented]
  • Cluster management servers (aka cumin hosts: cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet): provide orchestration, specifically for snapshots, as they require remote execution (beyond the mysql protocol) to execute root commands on the source servers and send files to the provisioning servers
  • dbbackups database (currently on the m1 section): stores metadata about the generated backups at generation time and reports the current ongoing status as well as the a priori success or failure of each backup. It also stores the size and name of each file generated, for further data and backup analysis and trending. An icinga check is also set up there, which alerts if the latest fresh backup is older than the configured amount of time.

Software and deployment architecture

WMFBackup is the main class controlling the generation of backups. It is an extensible backup class that, at the moment, can use 3 backup method classes:

  • NullBackup: Does nothing
  • MariaBackup: Uses the mariabackup utility, a fork of XtraBackup recompiled to support the MariaDB-specific InnoDB format. MariaBackup/XtraBackup allows (in theory) a lower time to recovery, as it is as fast as putting the files into the data directory and starting mysql. It is, at the moment, the chosen method to generate what we generically call "snapshots" (binary or raw backups).
  • MyDumper: Uses mydumper to generate a fast, highly parallel, compressed logical dump. It is, at the moment, the chosen method to generate what we call "dumps" (logical backups).

The backup process has three main functions (a minimal sketch of this flow follows the list):

  • Generate the backup files (e.g. run mydumper or mariabackup)
  • Post-process them, with a number of compulsory and optional tasks: check that the backup seems complete, prepare it (in the case of mariabackup), consolidate (tar) per database, compress, and rotate it to its final location.
  • Generate state and metadata about the backup and the files it generates (via the BackupStatistics class)
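
As a rough illustration only (this is not the actual wmfbackups code; everything except the WMFBackup, NullBackup, MariaBackup and MyDumper names is hypothetical), the three phases can be sketched like this:

 # Illustrative sketch of the three-phase flow described above; this is
 # not the real wmfbackups implementation.
 class NullBackup:
     """Backend that does nothing; handy for testing the machinery."""
     def run(self, section):
         return 0

 class WMFBackupSketch:
     """Hypothetical simplification of the WMFBackup driver class."""
     def __init__(self, section, backend):
         self.section = section    # e.g. 's1', 'x1', 'es4'
         self.backend = backend    # a NullBackup-, MariaBackup- or MyDumper-like object

     def generate(self):
         # Phase 1: run mydumper/mariabackup, writing files to .../ongoing
         return self.backend.run(self.section)

     def postprocess(self):
         # Phase 2: check completeness, prepare (snapshots only),
         # consolidate (tar) per database, compress, rotate to .../latest
         pass

     def gather_statistics(self):
         # Phase 3: record status, file list and sizes in the dbbackups db
         # (done by the BackupStatistics class in the real software)
         pass

     def run(self):
         if self.generate() != 0:
             return 1
         self.postprocess()
         self.gather_statistics()
         return 0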

Normally the class functionality is controlled by the command line utility backup-mariadb, documented at Backup-mariadb.

The backup-mariadb utility and its libraries are deployed on the provisioning hosts. However, snapshotting requires remote root execution, so we can only take local mysql snapshots with it.

To generate remote snapshots, the transfer.py script, installed on the cluster management servers (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet), is used for the first part of the backup. Every day, a scheduled job runs the remote-backup-mariadb script, which sends the snapshots to the provisioning hosts using transfer.py. It then runs backup-mariadb locally on the provisioning host in order to post-process the generated files and gather the metadata statistics.

transfer.py is a generic utility, installed on the cluster management (orchestration) hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet), to transfer files over the network, but it has a switch, --type=xtrabackup, that also allows transmitting the mysql files of a live mysql server in a consistent way. Please note that the post-processing done by backup-mariadb is still needed after transfer.py, as xtrabackup (or mariabackup) --prepare has to be run before the files can be recovered on a server.
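
Conceptually, the remote snapshot workflow chains those two tools roughly as follows (a sketch only: paths and argument lists are illustrative, and in reality the two steps run on different hosts via remote execution):

 # Conceptual sketch of the remote snapshot workflow; NOT the real
 # remote-backup-mariadb script. Hosts, paths and options are examples.
 import subprocess

 def remote_snapshot(source_host, source_socket, dbprov_host, snapshot_dir):
     # Step 1 (initiated from a cumin host): stream a consistent copy of
     # the live datadir from the database host to the provisioning host.
     subprocess.run([
         'transfer.py', '--type=xtrabackup',
         f'{source_host}:{source_socket}',
         f'{dbprov_host}:/srv/backups/snapshots/ongoing',
     ], check=True)

     # Step 2 (on the provisioning host): prepare, compress, rotate and
     # gather metadata for the already-transferred files.
     subprocess.run([
         'backup-mariadb', snapshot_dir,
         '--type=snapshot', '--only-postprocess',
     ], check=True)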

The backups are stored in the following location:

/srv
 /backups
   /dumps
     /ongoing  # ongoing logical backups
     /latest   # latest completed logical backup
     /archive  # recent, but not latest, logical backups (in case the latest has issues), regularly purged
   /snapshots
     /ongoing  # ongoing mariabackup files
     /latest   # latest completed and prepared mariabackup tarballs
     /archive  # recent, but not latest, mariabackup tarballs (in case the latest has issues), regularly purged
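
As an illustration of how a finished backup moves between these directories (a hypothetical helper, not part of the backup software; the real rotation also purges old archive entries according to the retention policy):

 # Hypothetical sketch of the ongoing -> latest -> archive rotation
 # implied by the directory layout above.
 import os
 import shutil

 def rotate(base, name):
     """Move a finished backup (e.g. 'dump.s1.2020-01-01--00-00-00') from
     ongoing/ to latest/, demoting the previous copy of the same section
     to archive/."""
     section = name.split('.')[1]
     latest = os.path.join(base, 'latest')
     for entry in os.listdir(latest):
         if entry.split('.')[1:2] == [section]:
             shutil.move(os.path.join(latest, entry), os.path.join(base, 'archive'))
     shutil.move(os.path.join(base, 'ongoing', name), latest)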
     

Finally, bacula regularly backs up the configured path on the chosen active datacenter (right now, eqiad) for long-term storage, copying only the latest logical backups (/srv/backups/dumps/latest).

Puppet code is distributed into the following profiles:

  • mariadb::backup::bacula [provisioning host] to move logical dumps to long term storage (bacula)
  • mariadb::backup::check [currently on the alerting hosts, but could really be run from almost anywhere] icinga checks that backups are generated correctly and are fresh
  • mariadb::backup::mydumper [provisioning host] automation of logical backups, and backup and recovery software in general
  • mariadb::backup::snapshot [provisioning host] snapshotting environment
  • mariadb::backup::transfer [orchestration hosts] automation of snapshots

Incrementals and binlogs are not productionized/fully automated yet (although one can find binlogs and manual es backups in several locations).

A summary of the flow of control and data can be seen at the diagram: Database_backups_overview.svg

Logical backups

Mydumper is used for creating logical backups. Not only is it faster than mysqldump for creating and recovering backups, it also allows table-level or even lower-level granularity, which is very useful when a single table needs to be restored.

On the hosts with the profile mariadb::backup::mydumper (provisioning hosts), a configuration file at /etc/mysql/backups.yaml defines what, where and how to back up. Weekly, the backup-mariadb script reads that config file and backs up the core and misc sections into /srv/backups/dumps.

The backup hosts (at the moment dbstore1001 and es2001) only keep the last or latest backups so they can be sent to bacula, not long term. The backups are generated asynchronously from the bacula director runs because, with the old system, generating them synchronously used to block all other backup and recovery processes for a long time. Now bacula only has to retrieve the latest directory to get the latest successful backups.

Each backup is a single directory with the following name:

dump.section_name.timestamp

The timestamp is not in ISO format, as ':' is a special character in some contexts, such as the command prompt. It is in the following format instead:

YYYY-MM-DD--HH-mm-ss
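
For illustration, such names can be generated or parsed with standard strftime/strptime patterns:

 # Example only: building and parsing the timestamp format above.
 from datetime import datetime

 TS_FORMAT = '%Y-%m-%d--%H-%M-%S'          # e.g. 2018-07-24--23-30-38

 name = 'dump.x1.' + datetime.now().strftime(TS_FORMAT)
 started = datetime.strptime('2018-07-24--23-30-38', TS_FORMAT)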

Each directory contains the same structure that a regular mydumper run generates, except that, if so configured, it can tar all objects of a single database to avoid having hundreds of thousands of files. This currently happens for x1 and s3, which have many thousands of objects; tarring reduces that to only ~900 files, one per database.

To understand the status of a backup, mydumper creates a metadata file that contains the start time of the backup process (the point in time the dump should be consistent with), the exact binlog and GTID coordinates for that consistency point, and the time the backup finished. If the backup fails, normally the database dir will not be copied to latest, and a log file in the ongoing directory with the name of the section will show some kind of error. If for some reason mydumper was successful but the overall process was not, one can retry the backup with the --only-postprocess option to rotate, consolidate, compress and/or generate statistics.

Orchestration and scheduling

At the moment, logical dumps of all Mediawiki metadata sections, as well as content, happen every week on Tuesday UTC mornings (check puppet for the latest schedule). This is initiated with a scheduled job on each db provisioning host, which reads /etc/wmfbackups/backups.cnf on the local dbprov host and controls which sections to back up, and with which options.

Adding a new dump

  1. Add the EVENT, LOCK TABLES, SELECT, SHOW VIEW and TRIGGER grants to the given backup user for the database to back up
  2. Also add FILE, RELOAD, REPLICATION CLIENT and SUPER for the same user on *.*
  3. Make sure, if a filter exists, that the regex includes (or excludes) the appropriate objects
  4. Add the host, or otherwise make sure it is backed up in eqiad and/or codfw, on puppet://modules/profile/templates/mariadb/backups-*.cnf.erb. Choose the location so as to balance the load and disk space used across the several hosts used for backups
  5. Make sure your backups are generated as scheduled by looking at the dbbackups db on m1, tables backups and backup_files.

Recovering a logical backup

To recover, there is a script called recover-dump which automates the decompression (if a .tar.gz), untarring (if the databases were consolidated) and running of myloader. If for some reason that does not work, myloader can be called directly (assuming the directory is not compressed or tarred, or that that has already been handled), or individual objects can even be loaded directly (mydumper creates a .sql file with the structure and a .sql data file per table).
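
Schematically, what recover-dump automates looks roughly like the following (a simplified sketch, not the real script; it assumes the usual mydumper/myloader command line options):

 # Simplified sketch of what recover-dump automates; paths and option
 # handling are illustrative only.
 import os
 import subprocess
 import tarfile

 def recover_dump(path, host, user, password, port=3306):
     if path.endswith('.tar.gz'):
         # Decompress/untar first if the backup was archived as a tarball
         # (assuming the tarball contains a directory of the same name).
         with tarfile.open(path) as tar:
             tar.extractall('/srv/backups/dumps/ongoing')
         path = os.path.join('/srv/backups/dumps/ongoing',
                             os.path.basename(path)[:-len('.tar.gz')])
     # Consolidated (per-database) tarballs inside the directory would
     # also need to be untarred here before loading.
     subprocess.run([
         'myloader', '--directory', path, '--host', host,
         '--user', user, '--password', password, '--port', str(port),
     ], check=True)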

This is a step by step guide of how to recover a backup.

For this example we will recover: x1 on db1120 from the backup host es2001

Pre-requisites before recovering the backup

  • db1120 must have a MySQL server up and running (if it is a new host follow MariaDB#Setting_up_a_fresh_server to set it up).
  • db1120 must have privileges to let the source host (e.g. dbprov1001) connect, create the databases and tables (and other objects) and import the rows
    • This can be tested with dbprov1001$ mysql -hdb1120.eqiad.wmnet -uroot -pREDACTED

Recovering the data

It is recommended to run the following command in a screen session:

dbprov1001:~# recover-dump --host db1120.eqiad.wmnet --user root --password REDACTED --port 3306 x1
Attempting to recover "dump.x1.2018-07-24--23-30-38"
Running myloader...

Alternatively, if a specific backup has to be recovered, different from the one under latest for the given section, the script also accepts absolute paths:

dbprov1001:~# recover-dump --host db1120.eqiad.wmnet --user root --password REDACTED --port 3306 /srv/backups/dumps/archive/dump.x1.2018-07-24--23-30-38

Remember to provide an absolute path to the directory or compressed .tar.gz if using the second format. It doesn't have to be inside /srv/backups/; it can be anywhere your current user can read.

This will start recovering all the data on db1120 for the whole x1 section. We can check that this is actually working by checking whether the data directory on db1120 is growing:

db1120:~# du -s /srv/sqldata/

This can take several hours; once finished, recover-dump will return to the prompt and the data will be ready on db1120 to be analyzed and processed if needed.

Recovering the data (Mariadb 10.6)

This method should work for both 10.4 and 10.6 (or other versions), but it is the only one that works reliably on 10.6. It is actually faster and avoids some bugs in myloader, but it is not properly implemented yet (it is a puppet bash script with very few options).

  • First, create a credentials file under /root/.my.cnf, like this:
 [client]
 host = db1024.eqiad.wmnet
 port = 3306
 user = root
 password = <same password as the user grant added to the server for recovery>

This file has the same options as the mysql command line client, and it should point to the running mysql server the data will be sent to.

  • Second, run the command: mini_loader.sh recovery_directory where the only parameter is the mysql dump directory to recover. For example:
 mini_loader.sh /srv/backups/dumps/dump.x1.2018-02-28--01-20-46
  • The recovery will inform of the progress by printing the start and finish of each file processed, or errors detected, like this:
 Starting recovery at 2022-12-02 12:57:56+00:00
 Creating database testwiki...
 Database testwiki created successfully
 ...
 Creating table testwiki.revision...
 Table testwiki.revision created successfully
 Importing data to table testwiki.revision...
 Table testwiki.revision imported successfully
 ...
 Finishing recovery at 2022-12-02 23:27:59+00:00
 Remember to remove /root/.my.cnf to prevent accidental loads
  • The script will remind you to remove /root/.my.cnf after the load, to prevent accidental recoveries after the fact

Enabling replication on the recovered server

If the server needs to be pooled in production, we first have to enable replication so it can catch up. Before that, we should create the heartbeat table on our server:

CREATE DATABASE heartbeat;
USE heartbeat;
CREATE TABLE `heartbeat` (
 `ts` varbinary(26) NOT NULL,
 `server_id` int(10) unsigned NOT NULL,
 `file` varbinary(255) DEFAULT NULL,
 `position` bigint(20) unsigned DEFAULT NULL,
 `relay_master_log_file` varbinary(255) DEFAULT NULL,
 `exec_master_log_pos` bigint(20) unsigned DEFAULT NULL,
 `shard` varbinary(10) DEFAULT NULL,
 `datacenter` binary(5) DEFAULT NULL,
 PRIMARY KEY (`server_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

Now we should insert a couple of rows with the server_id of the replication chain masters (eqiad and codfw, just to be sure). To find them we can check https://tendril.wikimedia.org/tree for our section and its masters. For x1 our masters are db1069 and db2034. Let's find out their server_id values:

root@cumin1001:~# mysql.py -hdb2034 -e "SELECT @@hostname; SELECT @@server_id" -BN
db2034
180355159
root@cumin1001:~# mysql.py -hdb1069 -e "SELECT @@hostname; SELECT @@server_id" -BN
db1069
171966572

Now let's insert those two rows on db1120

USE heartbeat;
SET SESSION sql_log_bin=0;
INSERT INTO heartbeat (server_id) VALUES (171966572);
INSERT INTO heartbeat (server_id) VALUES (180355159);

Next, to configure replication, we have to gather the coordinates from the backup, which are stored in a file called metadata. In our case the backup is at dump.x1.2018-07-24--23-30-38:

es2001:~# cat /srv/backups/latest/dump.x1.2018-07-24--23-30-38/metadata
Started dump at: 2018-07-24 23:30:38
SHOW SLAVE STATUS:
Connection name:
Host: db2034.codfw.wmnet
Log: db2034-bin.000196
Pos: 978121166
GTID:0-171970580-683331037,1-171970580-1,171966572-171966572-191034075,171970580-171970580-596994206,171974681-171974681-198565537,180355159-180355159-13448767,180363268-180363268-40608909
Finished dump at: 2018-07-25 00:32:34

What we need to enable replication (a small parsing sketch follows this list):

  • Host
  • Log
  • Pos
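
For illustration, a small hypothetical helper (not part of the backup tooling) that extracts those three fields from the metadata file shown above:

 # Hypothetical helper: read the replication coordinates out of a
 # mydumper metadata file.
 def read_coordinates(metadata_path):
     values = {}
     with open(metadata_path) as f:
         for line in f:
             key, _, value = line.partition(':')
             values[key.strip()] = value.strip()
     return values['Host'], values['Log'], int(values['Pos'])

 host, log_file, log_pos = read_coordinates(
     '/srv/backups/latest/dump.x1.2018-07-24--23-30-38/metadata')
 print(f"CHANGE MASTER TO MASTER_HOST='{host}', "
       f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};")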

Once we have those, we can execute the following command on db1120 (the password is in the repl-password file on the pw repo):

CHANGE MASTER to MASTER_HOST='db2034.codfw.wmnet', MASTER_USER='repl', MASTER_PASSWORD='REDACTED' ,MASTER_PORT=3306, MASTER_LOG_FILE='db2034-bin.000196', MASTER_LOG_POS=978121166, MASTER_SSL=1;
START SLAVE;
SHOW SLAVE STATUS\G

We should see Seconds_Behind_Master decreasing (sometimes it can take a while to start decreasing).

If we have recovered an eqiad host from a codfw source, once the server has caught up we need to move it under the eqiad master. In our case, db1120 is replicating from db2034, which is a codfw master, so we need to move them to the same level, that is: db1120 must be a sibling of db2034 and not a child. To do so we can use the 'repl.pl' script on 'neodymium':

./marostegui/git/software/dbtools/repl.pl  --switch-child-to-sibling --parent=db2034.codfw.wmnet:3306 --child=db1120.eqiad.wmnet:3306

Snapshotting and disaster recovery

While logical backups have a lot of advantages:

  • Small disk footprint (especially if compressed): high compression ratio and no space wasted on indexes or fragmented data.
  • Fast to generate: if enough data is in memory, reads can be very fast, especially if done with enough parallelism
  • Very low granularity on recovery: because we use one (or several) separate files per table, we can recover single databases, single tables or even individual rows if properly filtered. We can even recover structure and data separately.
  • Not prone to corruption: a physical file copy would carry over certain kinds of corruption and make them very difficult to detect. Because a logical dump requires reading all rows, once exported, corruption cannot happen unless the exports are themselves corrupted at a later time.
  • Software independence and portability: because in the end we are just generating text files, the format has great portability. It can be used on different MySQL/MariaDB versions, vendors or even different database software. The files can also be converted if needed, as they are human-readable text manageable with third-party software.

Because of many of the above, this format is ideal for long-term preservation.

However, logical dumps have some important weaknesses:

  • They are slow to recover, as rows have to be reimported one by one and indexes have to be regenerated.
  • Taking a dump can have a large performance impact, both due to the amount of logical reads needed and the state in which they leave the buffer pool afterwards

Because of this, logical backups tend to greatly increase the time to recovery in case of a full disaster. That is where snapshots (our generic name for raw or binary backups) come into play: while larger in size, they tend to be faster to recover as they use the native database format, only requiring shutting down the original server and copying back the files, making recovery as fast as a regular file copy can be sent over the network. Snapshots are the basis of our "fast" disaster recovery method.

Snapshots can be generated in several ways:

  • Cold backup (shutdown the server and copy the files)
  • lvm snapshots + recovery
  • MySQL Enterprise backup/Xtrabackup

After some research (task T206204), we chose the last option, specifically MariaDB's fork of the free-software Percona XtraBackup, MariaBackup, which is mostly identical in usage and functionality to XtraBackup but is compiled with MariaDB and thus supports its internal format better (XtraBackup started having issues with MariaDB from 10.0 onwards).

MariaBackup is one of the supported methods in the WMFBackup class, which means it can be run using the backup-mariadb command line utility. The main difference between taking snapshots and logical backups is that snapshots require raw access to the underlying mariadb files as a privileged user, in addition to access to mysql itself as a privileged account. Thus, backup-mariadb by itself is only able to create localhost snapshots (more on remote backups later).

When used, files are copied to /srv/backups/snapshots/ongoing, in what would appear to be a complete datadir copy. However, at this point the backup process is not complete: as documented on the XtraBackup documentation page, the backup needs to be prepared before being used. The preparation step of the backup process takes care of that before continuing with the rest of the post-processing steps. At the moment preparation always happens right after the backup, but there are reasons to postpone it (e.g. to generate incremental or differential backups, or to export individual tables). This would be possible in the future, but is not currently supported.

It is worth noting that one normally wants to compress the final set of files, for several reasons. The most obvious is space saving: snapshots are as large as the original datadir. The second is that, while mydumper output can be used to recover individual parts of the database, snapshots are normally used for full recoveries and provisioning only, so having them pre-compressed speeds up the later full recovery.

On Remote Snapshotting

With the above implementation, local snapshotting would be possible; however, a backup is only a backup if it is stored fully off the original host. Several options and designs were considered for this, in particular the possibility of preparing backups locally on the host before sending them away in some way. This was discarded for the following reasons: 1) read access to the datadir was needed, making it difficult to do privilege separation; 2) it is not out of the question that a server may not have enough space to temporarily store a copy of its own database (especially during an emergency, which is what snapshotting was intended for); and 3) if preparation was compulsory from the start, it would make incremental/differential backups impossible in the future.

Based on that, database hosts only run xtrabackup (mariabackup) --backup and then use its streaming capabilities to send the data away, to be prepared in a second step. For that purpose, the existing tool transfer.py, used in the past to perform cold backup transfers as well as general file transfers between hosts over the network, was modified to allow xtrabackup as the source of the data transfer. Once the files are transferred to the provisioning host(s), to /srv/backups/snapshots/ongoing, the backup-mariadb utility is run with the --only-postprocess option, so it finds and processes the now-local files but does not attempt to generate a new snapshot.

So, in summary, while logical backups and snapshots share most of the workflow, the particularities of creating a remote snapshot require an extra initial step where files are first transferred out of the source hosts (unlike logical backups, which can do everything just by using the mysql query protocol).

Orchestration and Scheduling

Because of the remote copy mentioned in the previous section, remote execution is needed, something that is not available at the moment (and probably never will be, for security reasons) from the provisioning hosts. This is why the transfer itself has to be initiated from the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet). Probably not surprisingly, cumin is used for remote execution.

At the moment, snapshotting of all Mediawiki metadata sections happens every 1 or 2 days (4 days a week): new backups currently finish on Mondays, Wednesdays, Thursdays and Saturdays (check puppet for the latest schedule). This is initiated by a systemd timer that reads the /etc/wmfbackups/remote_backups.cnf yaml configuration file on the cluster management server, which controls which sections to back up, with which options, as well as to which server those backups are sent. The systemd timer, rather than directly calling transfer.py and backup-mariadb, uses the simple remote-backup-mariadb script, which performs the above steps, for simplicity. transfer.py has locking logic to decide which port to open for each transfer, starting with port 4400 and going up if more than one transfer is happening simultaneously.
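
The "first free port starting at 4400" behaviour can be pictured with a sketch like this (a simplification; the real locking in transfer.py is more involved):

 # Illustration of choosing a transfer port starting at 4400 and going
 # up when several transfers run at the same time; not transfer.py code.
 import socket

 def next_free_port(start=4400, attempts=10):
     for port in range(start, start + attempts):
         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
             try:
                 s.bind(('', port))
             except OSError:
                 continue          # already used by another ongoing transfer
             return port
     raise RuntimeError('no free transfer port found')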

remote-backup-mariadb itself will not return an error code (!= 0) just because one or more backups fail, to prevent alert spam. As long as the basic execution and parsing of config and arguments happens correctly, and there are no uncaught exceptions, it will always return 0. See the monitoring section for how to track that backups are happening correctly.

Manpages are available for all executables and config files, with more details about the available options and their meaning.

Adding a new database to Snapshot

Snapshotting doesn't require any special privileges other than the regular ones for a production mysql, as it is done (and must be done) as root.

To add additional hosts to snapshot, just edit the puppet://modules/profile/templates/mariadb/backups-cuminXXXX.cnf.erb templates so they are run on the provided hosts, in the given order.

Recovering a Snapshot

At the moment, there is no specific utility to fully automate snapshot recoveries. There is, however, a transfer utility (transfer.py --type=decompress) that can automate the initial transfer and decompression from an existing snapshot:

For now, the rest of the setup steps have to be done manually:

  • [from a cumin host]:
transfer.py --type=decompress dbprov1001.eqiad.wmnet:/srv/backups/snapshots/snapshot.s1.2020-09-11--23-45-01.tar.gz db1051.eqiad.wmnet:/srv/sqldata

Remember the transfer will fail if existing data is left in that dir; it should be deleted or moved away first.

  • Chown the datadir recursively to be owned by the mysql user
  • Just to be safe:
 systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
  • Start mysql
  • Set up replication based on the GTID coordinates (remember that GTID tracks already-executed transactions, while binlogs track offsets or "gaps between transactions"; do not confuse the two methods). The GTID coordinates are normally stored in the xtrabackup_slave_info file.
SET GLOBAL gtid_slave_pos = 'XXXXXX';
CHANGE MASTER TO MASTER_HOST='FQDN', MASTER_USER='repl', MASTER_PASSWORD='REDACTED', MASTER_SSL=1, MASTER_USE_GTID=slave_pos;
START SLAVE;

Binary log backups and point in time recovery

NOT YET IMPLEMENTED

Incremental backups and external storage disaster recovery

NOT YET IMPLEMENTED

Monitoring and metadata gathering

Metadata

On backup generation, an entry is added to the `backups` metadata table (and several to `backup_files`), at the moment in the dbbackups db on m1. It contains the up-to-date status (ongoing, finished, failed) of the backup generation and, when finished, some information about the backup and its generated files (date, size, etc.).

The dbbackups database now has 3 tables:

  • backups
  • backup_files
  • backup_objects (not in use)

backups contains an id and properties of each backup (status: ongoing, finished (correctly), failed or deleted; dir name, source, section, start_date, end_date, total_size, etc.):

CREATE TABLE `backups` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
  `status` enum('ongoing','finished','failed', 'deleted') COLLATE utf8mb4_unicode_ci NOT NULL,
  `source` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
  `host` varchar(300) CHARACTER SET latin1 DEFAULT NULL,
  `type` enum('dump','snapshot', 'cold') COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `section` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `start_date` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
  `end_date` timestamp NULL DEFAULT NULL,
  `total_size` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `last_backup` (`type`,`section`,`status`,`start_date`)
) ENGINE=InnoDB AUTO_INCREMENT=174 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

backup_files contains a backup id and the list of files for that backup with their properties (date, size, name):

CREATE TABLE `backup_files` (
  `backup_id` int(10) unsigned NOT NULL,
  `file_path` varchar(300) CHARACTER SET latin1 NOT NULL DEFAULT '',
  `file_name` varchar(300) CHARACTER SET latin1 NOT NULL,
  `size` bigint(20) unsigned DEFAULT NULL,
  `file_date` timestamp NULL DEFAULT NULL,
  `backup_object_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`backup_id`,`file_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

backup_objects will link the backup files to specific objects (tables, databases, triggers, etc.) for further checking (at the moment it is not filled in). This will be useful when we maintain an inventory of database objects for all servers, so we can make sure no objects are left uncopied, that they have an appropriate size, etc.

CREATE TABLE `backup_objects` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `backup_id` int(10) unsigned NOT NULL,
  `db` varchar(100) CHARACTER SET latin1 NOT NULL,
  `name` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
  `size` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `backup_id_db_name` (`backup_id`,`db`,`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6352 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Metadata is collected during generation by the same script that generates the dumps, backup-mariadb. It connects to the m1 database (db1080:dbbackups at the moment) and logs the information. If the logging fails, the backup itself continues, but a backup cannot be recorded as successful unless both the backup and its metadata logging succeed.
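
The "metadata logging must never abort the backup" behaviour could look roughly like this (a hypothetical sketch using pymysql as an example client; the real code is the BackupStatistics class):

 # Sketch of non-fatal metadata logging: a failure to update the
 # dbbackups database is logged, but the backup run itself continues.
 import logging
 import pymysql

 def record_backup_status(db_config, name, section, source, status):
     conn = None
     try:
         conn = pymysql.connect(**db_config)   # host, user, password, database
         with conn.cursor() as cursor:
             cursor.execute(
                 "UPDATE backups SET status = %s, end_date = now() "
                 "WHERE name = %s AND section = %s AND source = %s",
                 (status, name, section, source))
         conn.commit()
     except (pymysql.err.MySQLError, OSError) as error:
         # Non-fatal: the backup continues, but it will later show up as
         # failed/stale in monitoring (a false positive).
         logging.error('Could not record backup metadata: %s', error)
     finally:
         if conn is not None:
             conn.close()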

The metadata database ("statistics" db) is configured in the backup configuration, /etc/wmfbackups/[remote_]backups.cnf, the same place that controls which backups are to be done.

For separation, the actual db connection configuration is set up in a separate file, as configured by the statistics-file option, as it most likely contains user and password details. In the WMF infrastructure, this lives in the /etc/wmfbackups/statistics.cnf file.

Alerting

Icinga checks that database backups are generated correctly every week and that there is a recent, successful, non-zero-sized backup no older than 8 days.

Icinga checks are run through nrpe on backupmon1001, and are controlled by a hiera setting: /hieradata/role/common/dbbackups/monitoring.yaml

This is used at the moment to alert if backups fail for any reason (the icinga checks query the m1 database with check-mariadb-backups), and it verifies that backups are generated correctly every week and have a reasonable size. Further checks could be added later, but thorough backup testing should be done through proper recovery validation (full recovery to test hardware, starting replication and maybe some checksums/smoke tests).

The test will fail if (a simplified sketch of this logic follows the list):

  • The latest backup is older than the configured threshold (based on the end_date of the full backup process, not the moment it is consistent with). At the moment that is 8 days (7 + 1 day of buffer) for dumps and 3 (2 + 1) for snapshots; check Puppet for the latest values used in production
  • The latest backup is not in a finished state (meaning it completed without errors, including metadata gathering); backups in any other state are ignored. As part of the backup, the check includes verifying that one of the last files generated is present, aside from the exit code and the absence of errors in the logs (warnings are accepted)
  • The backup does not run at all, or it runs but cannot insert its metadata into the backup tables
  • The backup is smaller than a certain size (1 MB min_size), so a backup of 0 or only a few bytes is rejected
  • The backup is not about the same size as the previous finished one (more than ±15% difference for a critical, more than ±3% for a warning), so that something weird, like an explosion in growth or large missing/deleted datasets, is detected
  • There are not at least 2 recent correct backups (it warns if there is only 1, as the size then cannot be compared; a human should check that the first backup was generated correctly)
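
A simplified sketch of that logic (for illustration; the actual check is check-mariadb-backups, which queries the dbbackups metadata database):

 # Illustration of the freshness/size logic described above; thresholds
 # are the ones mentioned in the list, not necessarily the current ones.
 from datetime import datetime, timedelta

 MIN_SIZE = 1024 * 1024                         # reject ~empty backups
 WARN_DEVIATION, CRIT_DEVIATION = 0.03, 0.15
 FRESHNESS = {'dump': timedelta(days=8), 'snapshot': timedelta(days=3)}

 def check_section(backups, backup_type, now=None):
     """backups: finished backups of one section/type/datacenter, newest
     first, as (end_date, total_size) tuples."""
     now = now or datetime.now()
     if not backups:
         return 'CRITICAL: no finished backup found'
     end_date, size = backups[0]
     if now - end_date > FRESHNESS[backup_type]:
         return 'CRITICAL: latest backup is stale'
     if size is None or size < MIN_SIZE:
         return 'CRITICAL: latest backup is too small'
     if len(backups) < 2:
         return 'WARNING: only one backup, cannot compare sizes'
     previous_size = backups[1][1]
     deviation = abs(size - previous_size) / previous_size
     if deviation > CRIT_DEVIATION:
         return 'CRITICAL: size changed more than 15% vs the previous backup'
     if deviation > WARN_DEVIATION:
         return 'WARNING: size changed more than 3% vs the previous backup'
     return 'OK'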

Backup checks are configured separately from backup runs; this is on purpose, so that backups being deleted by accident (or by a bug) does not go unnoticed. This can lead to some inconveniences, to be revisited later.

Metadata gathering into the database is allowed to fail (it is a non-fatal error), letting the backup process continue. This means a backup can be successful but, as no metadata has been gathered, it will show as a failure (a false positive). A false negative shouldn't occur. If the backup finished correctly, one can regenerate its metadata by rerunning the backup with the backup-mariadb --only-postprocess option, which will analyze the files and write the information to the database, once the problem (grants, database down, network down, bug, etc.) is corrected.

More details can be seen in the source code of the check.

Dashboard

Home page
Backup job details

To more easily check the status of backups and their files, running jobs, failed backups, etc., there is a work-in-progress dashboard.

To access it (currently it is not publicly deployed) one has to run:

ssh -L 8000:localhost:8000 backupmon1001.eqiad.wmnet

This forwards your local port 8000 to port 8000 on backupmon1001.eqiad.wmnet, where the dashboard currently runs as an unprivileged user in debug mode. Then open your browser at:

http://localhost:8000

The code for the dashboard rests on GitLab.

Backup testing

NOT YET IMPLEMENTED

Backups quick cheatsheet

This is WIP

The assignment of each backup and its preprocessing server can be located on Puppet at /modules/profile/templates/dbbackups.

  • dbprov* contain metadata and misc db backups
  • backupXXX2 contain content db backups

Dumps are stored long term on Bacula.

Provision a precompressed and prepared snapshot (preferred)

  • Once the snapshot has been generated go to one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet)
transfer.py --type=decompress dbprov2002.codfw.wmnet:/srv/backups/snapshots/latest/xxxx.tar.gz DESTINATION.FQDN:/srv/sqldata
  • Once the data has been copied over successfully, ssh to the host.
systemctl start mariadb
cat /srv/sqldata/xtrabackup_slave_info | grep GLOBAL | mysql
  • Now from the mysql prompt of that host (or from cumin) configure the replication thread:
CHANGE MASTER TO MASTER_HOST='FQDN', MASTER_USER='<user>', MASTER_PASSWORD='<pass>', MASTER_SSL=1, master_use_gtid = slave_pos;
start slave;

Copy data from a backup source to a host

You should not use this method unless there is a good reason to (it is much slower, more computationally intensive and more error prone):

  • Stop replication
  • Gather the coordinates
  • From one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet): Run transfer.py with type xtrabackup and using: SOURCE_HOST:SOCKET DESTINATION_HOST:DATA_DIR (if the transfer will not happen within the same datacenter please remove --no-encrypt):
transfer.py --type=xtrabackup db1140.eqiad.wmnet:/run/mysqld/mysqld.x1.sock db1127.eqiad.wmnet:/srv/sqldata
  • Once the transfer is done, you must prepare the backup before starting mysql. Ssh to the host:
xtrabackup --prepare --use-memory=300GB --target-dir=/srv/sqldata

The prepare step must be run with a mariabackup version newer than or equal to the source server version, and preferably of the same major version.

Rerun a failed backup

It is not uncommon for a backup run to fail. Even if the backup process itself were 100% reliable, there can be external factors that make it impossible for the backup to finish correctly (especially during the resource-intensive process that a large copy is), such as:

  • The network goes down during the backup process, making it impossible to continue
  • A backup source db server suddenly gets a lot of load, or maintenance (like a schema change) happens
  • The copy process is so intensive that the network gets saturated and produces so many errors that the copy cannot continue
  • A server process or its hardware fails in the middle of the copy

Snapshots (xtrabackup) are more likely to fail due to temporary unavailability than mydumper backups, as they involve 3 servers (the backup scheduler, the database and the storage) and require around 10x the amount of bandwidth. For this reason, after the batch of xtrabackup backups is finished, failed copies are attempted once more. After a backup has failed twice, no new attempts happen automatically (to prevent infinite loops and taking over all resources).
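
Schematically, the retry policy amounts to something like this (a sketch of the idea, not the actual scheduler code):

 # Failed snapshots are retried exactly once after the first batch; after
 # two failures nothing retries automatically, to avoid infinite loops.
 def run_batch(sections, run_backup):
     failed = [s for s in sections if not run_backup(s)]
     still_failed = [s for s in failed if not run_backup(s)]
     return still_failed   # left for monitoring / human follow-up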

As of March 2022, failed backups are not cleaned up automatically for 2 reasons: allowing later debugging and saving the data, in case it could be reused later with human intervention. dbprov hosts should have enough space to accommodate this.

Failed backups do not immediately alert; they are only logged to the server logs and to the database backups metadata db. Only if backups fail repeatedly will an alert be generated, as that backup will then be detected as stale.

Rerunning a failed snapshot

  • Research why the snapshot failed in the first place. Just rerunning it without looking at the logs may waste your time (e.g. if the issue is a lack of space). Confirm there is a reason that has been corrected (a software bug, service down, network issue, etc.). The most common issue with snapshots is network errors, as snapshots tend to use a lot of bandwidth. To debug issues one can look at:
    • The icinga alert called "snapshot of [section] in eqiad/codfw": alerting of violation of freshness expectations
    • The database of the database backups metadata (dbbackups) in m1 (db1128 for all datacenters), useful to know where the backup was running: dbbackups> SELECT * FROM backups WHERE section='[section]' and type='snapshot' ORDER BY id DESC LIMIT 10;
    • The log of the software automation on the server executing the backup (cuminXXXX): journalctl -u database-backups-snapshots, or, if the host has been recently restarted: grep database-backups-snapshots daemon.log*
    • The partial backup itself (whether it is empty, what had been backed up so far, etc.) on dbprovXXXX:/srv/backups/snapshots/ongoing
  • If the transfer of files was ok, and the only thing that failed was xtrabackup --prepare or any of the post-processing steps (e.g. compression, a write permission error), one can rerun exclusively those steps by executing:
sudo chown -R dump: /srv/backups/snapshots/ongoing/snapshot.s9.2022-04-01--09-31-45
sudo -u dump backup-mariadb /srv/backups/snapshots/ongoing/snapshot.s9.2022-04-01--09-31-45 --type=snapshot --rotate --compress --retention=4 --only-postprocess --stats-file=/etc/wmfbackups/statistics.ini
  • In all cases, one can always generate a new backup from zero. The easiest way is to run from the cumin host on the local dc:
sudo remote-backup-mariadb s9
Where s9 is the section (or list of sections, separated by spaces to retry). To retry all sections (do a full manual run): sudo remote-backup-mariadb all
  • Cleanup: if there are leftover files from previous failed or partially completed backups, and they will no longer be useful for debugging, they can (and should) be removed manually. Make sure when deleting files that you don't accidentally delete "good" ones, e.g. cd to the ongoing directory so you don't affect latest/archive ones.

Rerunning a failed dump

  • Research why the dump failed in the first place. Just rerunning it without looking at the logs may waste your time (e.g. if the reason is a lack of space). Confirm there is a reason that has been corrected (a software bug, service down, network issue, etc.). To debug issues one can look at:
    • The icinga alert called "dump of [section] in eqiad/codfw": alerting of violation of freshness expectations
    • The database of the database backups metadata (dbbackups) in m1 (db1128 for all datacenters), useful to know where the backup was running: dbbackups> SELECT * FROM backups WHERE section='[section]' and type='dump' ORDER BY id DESC LIMIT 10;
    • The log of the software automation on the server executing the backup (dbprovXXXX:/var/log/mariadb-backups/backups.log)
    • The mydumper log, found on the /srv/backups/dumps/ongoing/dump_[section].log file
    • The partial database itself (if it is empty, what had been backed up so far, etc.) on dbprovXXXX:/srv/backups/dumps/ongoing
  • If the mydumper output itself was ok and only a later step failed (e.g. a permission error, or something else in the post-processing job), one can continue the backup process with --only-postprocess on the existing backup, which will only archive, rotate and analyze the files without generating a new backup, like this, in a screen session:
sudo -u dump backup-mariadb /srv/backups/dumps/ongoing/dump.es5.2022-05-03--14-34-46 --only-postprocess --type=dump --rotate --retention=8 --stats-file=/etc/wmfbackups/statistics.ini
This also works if the backup completed but we want to regenerate its statistics.
  • In all cases, one can always generate a new backup from zero. The easiest way is to edit the configuration file (/etc/wmfbackups/backups.cnf) and only leave the backups one wants to retry, then execute in a screen session:
sudo -u dump backup-mariadb
A new dump can take 2-3 hours for a metadata database, and up to 24 hours for a full ES server.
  • Cleanup: if there are leftover files from previous failed or partially completed backups, and they will no longer be useful for debugging, they can (and should) be removed manually. Make sure when deleting files that you don't accidentally delete "good" ones, e.g. cd to the ongoing directory so you don't affect latest/archive ones.

See also