This page is primarily for WMCS admins for database and server operations on the ToolsDB database cluster that is currently operating in the clouddb-services Cloud VPS project. For general information and background, see https://wikitech.wikimedia.org/wiki/Portal:Data_Services#ToolsDB and https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases
Re-importing data after replication failures
When the database crashes, bad things can happen. One thing that has happened before is that replication was interrupted in a way that prevented it from simply picking up where it left off (see task T253738). When that happens, replication filters for the problem tables need to be added until replication is able to continue. This can be done either via the puppetized config file followed by a restart of mariadb, or by stopping the slave process and setting the global variable Replicate_Wild_Ignore_Table to include more table patterns. The only operation to run on the primary (master) server is mysqldump, which can be done from Toolforge bastions and tool accounts, just to be on the safe side!
For example, suppose we are adding "s52861__fnbot.review_filter" to the existing four database filters. On the replica (usually clouddb1002):
MariaDB [(none)]> stop slave;
MariaDB [(none)]> set global Replicate_Wild_Ignore_Table='s51412\_\_data.%,s51071\_\_templatetiger\_p.%,s52721\_\_pagecount\_stats\_p.%,s51290\_\_dpl\_p.%,s52861\_\_fnbot.review\_filter';
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status \G  # To check whether replication has errored out again
Any added filters should be committed to the puppet template with appropriate comments for the record.
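For reference, the puppetized version of these filters is the same set of patterns expressed as a replicate-wild-ignore-table option in the server's my.cnf. A sketch of what the resulting config line might look like (the exact puppet template path is not shown here, and the comment text is illustrative):

```ini
# Replication filters for tables broken by past crashes; see task IDs in commit messages.
# Patterns use the same escaping as the Replicate_Wild_Ignore_Table global variable.
replicate-wild-ignore-table = s51412\_\_data.%,s51071\_\_templatetiger\_p.%,s52721\_\_pagecount\_stats\_p.%,s51290\_\_dpl\_p.%,s52861\_\_fnbot.review\_filter
```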
To get the table working again, you need to work with the user to ensure that no writes reach the table for the entire time you are resyncing it.
Then, export the table with something like (in this case dumping the maj_articles_recents table from the totoazero tool):
tools.totoazero@tools-sgebastion-08:~$ mysqldump --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.eqiad.wmflabs s51245__totoazero maj_articles_recents | gzip > T257273-maj_articles_recents.sql.gz
As you can see, this can be done straight from a tool account as long as you move the file to where you can download it and send it to the replica server. You can also do this on the active ToolsDB server (usually clouddb1001) as root, if you like, but you will still need to transfer the resulting file to the replica.
Then you need to ship the resulting gzipped dump to the ToolsDB replica. Once you have copied the file (via scp or similar) onto the replica server (usually clouddb1002), run the following general procedure there:
me@clouddb1002 $ sudo -i mysql
MariaDB [(none)]> show slave status \G  # Make sure that Seconds_Behind_Master is a numerical value not terribly far from zero -- hopefully zero.
MariaDB [(none)]> stop slave;
MariaDB [(none)]> set global Replicate_Wild_Ignore_Table='s51412\_\_data.%,s51071\_\_templatetiger\_p.%,s52721\_\_pagecount\_stats\_p.%,s51290\_\_dpl\_p.%';  # Remove only the filter you added earlier for the table you are fixing now; leave the rest in place.
MariaDB [(none)]> drop table <offending_db>.<offending table>;
Drop out of the mysql shell and restore the table you are working on:
me@clouddb1002 $ gunzip tableback.sql.gz  # In the example above: 'gunzip T257273-maj_articles_recents.sql.gz'
me@clouddb1002 $ sudo -i mysql <user's database> < tableback.sql  # In the example above: 'sudo -i mysql s51245__totoazero < T257273-maj_articles_recents.sql'
At this point, as long as the user didn't write to that table, you should be in good shape to start replication!
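If you want extra assurance before restarting replication, you can compare the restored table against the primary with CHECKSUM TABLE (table name taken from the example above); a mismatch means writes slipped in during the resync:

```sql
-- Run on both the primary (clouddb1001) and the replica (clouddb1002);
-- the two checksums should match before replication is restarted.
CHECKSUM TABLE s51245__totoazero.maj_articles_recents;
```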
me@clouddb1002 $ sudo -i mysql
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status \G  # Did it blow up? You may have a problem to troubleshoot. Is Seconds_Behind_Master moving to zero when you run this again? If yes, and it reaches zero, then you are done!
Failing over ToolsDB
Failing over requires manual intervention, not only to fall back afterwards but also to migrate table data (see ToolsDB_Backups_and_Replication). It is often simpler and more useful to think of the secondary in the ToolsDB cluster as a backup of the primary, which is something of a single point of failure. This should be changed in the future.
It is basically a matter of following the standard MySQL/MariaDB switchover process:
- Set the primary server to read-only mode, either by disabling puppet and changing the config by hand, or by changing the config in puppet and then manually restarting the service (puppet does not manage mariadb's service directly). At this point, whatever you are doing has user impact, obviously.
To do it via the config, edit my.cnf and, under the [mysqld] section, add:

read-only=ON
Then restart the service.
It can also be done directly in MySQL, without restarting the service, by running the following at the MySQL prompt:
set global read_only=ON;
NOTE: This setting will be lost if MySQL is restarted, unless the read-only=ON option is also added to my.cnf as described above.
- Change the DNS entry (see Portal:Data_Services/Admin/Wiki_Replica_DNS for the procedure -- the record is in the same location as the configuration files for the replicas) to point at the secondary (by default, clouddb1002).
- Stop replication on the secondary from the mysql shell, recording the Relay_Master_Log_File and Exec_Master_Log_Pos values in case you change your mind abruptly. Record them with:

SHOW SLAVE STATUS \G

then stop replication with:

STOP SLAVE;
Also write down the binlog coordinates of the secondary itself, so you can connect any other host to replicate from this new master if needed; run the following on the secondary:
SHOW MASTER STATUS\G
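If you ever need to attach another host as a replica of the newly promoted master, those recorded coordinates go into a CHANGE MASTER TO statement. A sketch with placeholder values (the hostname, replication user, password, and coordinates below are all examples, not real values):

```sql
-- On the new replica host; File and Position come from the
-- SHOW MASTER STATUS output on the newly promoted master.
CHANGE MASTER TO
  MASTER_HOST='clouddb1002.example.eqiad.wmflabs',  -- example hostname
  MASTER_USER='repl',                               -- example replication user
  MASTER_PASSWORD='********',
  MASTER_LOG_FILE='log.000123',                     -- example File value
  MASTER_LOG_POS=456789;                            -- example Position value
START SLAVE;
```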
Reset all replication on secondary to start clean, from the MySQL prompt:
RESET SLAVE ALL;
To promote this host to master, you need to make the host writable.
Edit my.cnf, remove read-only=ON from it, and restart the service.
Once MySQL is back, you can check that the host is writable by running the following from the MySQL prompt; the result should be 0:

SELECT @@read_only;
The above may not be 100% correct yet -- please edit if not.