Nova Resource:Deployment-prep/Databases
The beta cluster currently (May 2021) has a pair of MariaDB instances in a fairly standard master-replica setup running Buster and MariaDB 10.4.
Creating new instances
Derived from Setting up a MySQL replica and lessons learned from task T216067, slightly adjusted from task T276968. Still may be out of date, the last phabricator ticket may be useful.
- Create a new
g3.cores8.ram16.disk20
instance in horizon named deployment-dbNN where NN is an unused number, add a large enough Cinder volume and mount it to /srv, get its puppet certs signed, etc.- TODO: determine if the VM needs that many resources. We used cores8.ram16 on g2 since we needed the extra disk, Cinder is now a thing.
On New Replica
- set
mariadb::config::basedir: /opt/wmf-mariadb106
in hiera, adjust the version number based on the used MariaDB version (this may be wrong, best to check existing instances on the same distro version) - ensure puppet runs cleanly
- then disable puppet so that it will not start up mariadb until you are ready
/opt/wmf-mariadb106/scripts/mysql_install_db --user=mysql --basedir=/opt/wmf-mariadb106 --datadir=/srv/sqldata
cd /srv/sqldata
nc -l -p 9210 | mbstream -x
On Existing DB
Make sure the db is read only.
- If cloning from the primary, on that host:
- do
FLUSH TABLES WITH READ LOCK;
- now
SET GLOBAL read_only = OFF;
- check that there are no long running connections still around that may be writing (
SHOW FULL PROCESSLIST;
works for this); the read_only variable only applies to new connections
- do
- If cloning from a replica, on that host:
- verify that read_only is OFF by doing
SHOW GLOBAL VARIABLES LIKE 'read_only';
- if not, first off, there's a problem with the replica, but also, do
SET GLOBAL read_only = OFF;
- verify that read_only is OFF by doing
Ensure that replication is stopped.
- On the primary, run
STOP ALL REPLICAS;
- Verify that replication has stopped by running on the primary,
SHOW ALL REPLICAS STATUS;
and check that no connections are listed
Run mariabackup/innobackupex
mariabackup --innobackupex --stream=xbstream /srv/sqldata --user=root --host=127.0.0.1 --slave-info | nc NEW-SERVER 9210
- if you see errors like "InnoDB: Operating system error number 24", this indicates too many open files; you may need to edit /etc/my.cnf to add in the
mysqld
section,open-files-limit = 4094
and perhaps check the ulimit too.
On New replica, After mariabackup
Do this before starting mysql!
mariabackup --innobackupex --apply-log --use-memory=10G /srv/sqldata
chown -R mysql: /srv
systemctl start mariadb
- tail the error log in /srv/sqldata and check for any errors
- or maybe: do
journalctl -n 20 -u mariadb.service
and check that for errors - if you see some sort of field definition error like "Incorrect definition of table mysql.event: expected column (some field) to have type (some type)" etc, see the step below.
- If this is newer version of MariaDB, run
/opt/wmf-mariadb106/bin/mysql_upgrade --host=127.0.0.1
to ensure compatibility. You might need to use the--force
option. Verify that mariadb starts up with no errors by stopping and restarting it.
Starting Replication
- Check /srv/sqldata/xtrabackup_slave_info on the new host, it provides a partial CHANGE MASTER statement based on whichever host the target was replicating from. If this is correct, use this, adding MASTER_HOST and the repl account/password portions that are missing.
- e.g.
CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='...', MASTER_PORT=3306, MASTER_HOST='deployment-db1', MASTER_LOG_FILE='deployment-db1-bin.000026', MASTER_LOG_POS=191406295;
- The master repl password should be in the file /var/lib/git/labs/private/modules/secret/secrets/mysql/repl_password on the deployment-puppetmaster (currently
deployment-puppetmaster04
) BUT it currently is in /home/ladsgroup/repl_password on db11 while a better final location is worked out.
- e.g.
- If you ran the hot backup from the master, use the information in /srv/sqldata/xtrabackup_binlog_info as slave_info won't exist.
- If you copied from the secondary master, xtrabackup_slave_info will point to the active master; use xtrabackup_binlog_info instead.
Serving traffic from the new replica
If you cloned from the primary, you should have set read_only to OFF on the primary; now set it to ON so that writes can resume.
You'll need to modify the wmf-config/db-labs.php file and add the new host. Give it the same weight as the old replicas. Example patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1006593
Read only
Per phab:T110115 all database servers default to read only when they start. This is expected for the replicas, but the primary server needs to be set to read-write after making sure it is ok (nothing worrying in logs, replication looks fine). This can be performed by sshing into the current primary which can be found from mediawiki-config repo.
Remember to !log any actions on #wikimedia-releng connect and check before setting it to read only if someone is doing maintenance which expects it to be read only, on the same IRC channel.
taavi@deployment-db07:~$ sudo -i mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 329
Server version: 10.4.18-MariaDB-log MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
root@BETA[(none)]> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 1 |
+-------------+
1 row in set (0.000 sec)
root@BETA[(none)]> set global read_only = false;
Query OK, 0 rows affected (0.000 sec)
root@BETA[(none)]> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
1 row in set (0.000 sec)