Portal:Cloud VPS/Admin/Galera

From Wikitech

The databases for OpenStack services are stored on a Galera cluster hosted on the cloudcontrol nodes.

Deployment

This cluster does not use custom WMF packages; it runs the standard packages from upstream Debian.

The cluster is active/active/active, which means that writes can be accepted on any of the cloudcontrol nodes. In practice, the only thing preventing that is the haproxy instance in front of them, which sends all requests to a single node at a time.
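The routing idea can be sketched in haproxy terms. This is a hypothetical fragment, not our actual config: the real backend names, ports, and health-check method live in the haproxy configuration managed by puppet and will differ.

```
# Hypothetical haproxy backend: all three cloudcontrols are known,
# but only the first receives traffic; the others are "backup"
# servers used only when the primary fails its health check.
listen mariadb
    bind *:3306
    mode tcp
    server cloudcontrol1 cloudcontrol1:3306 check
    server cloudcontrol2 cloudcontrol2:3306 check backup
    server cloudcontrol3 cloudcontrol3:3306 check backup
```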

Directions for standing up a new cluster are included in puppet/modules/galera/manifests/init.pp.

DB setup

OpenStack services tend to use connection pooling, opening many long-lived connections to each database. For this reason, our Galera config has extremely long connection timeouts and very high connection limits.
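To see what a node is actually configured with, you can query the relevant server variables. A minimal sketch: the variable name is a standard MariaDB one, the awk parsing assumes mysql's default tab-separated batch output, and the sample value below is illustrative, not our real setting.

```shell
# Against a live node you would run something like:
#   sudo -i mysql -u root -e 'SHOW GLOBAL VARIABLES LIKE "max_connections"'
# which prints tab-separated "Variable_name<TAB>Value" rows.
# The helper below extracts the value for a given variable name;
# a printf with sample data stands in for real mysql output here.
variable_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

printf 'Variable_name\tValue\nmax_connections\t4096\n' |
  variable_value max_connections   # prints the sample value, 4096
```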

General Operations

Restarting the local mariadb process

Don't configure puppet to restart this service. Restart only one node at a time to avoid split-brain.

  • To begin, disable puppet so it can't mess with things.
  • Tell haproxy the database is down, just to be safe: sudo touch /tmp/galera.disabled
  • sudo systemctl stop mariadb
  • In another shell, run sudo journalctl -u mariadb.service -f to verify that it exits cleanly. Shutdown can take anywhere from a few moments to quite a while.
  • sudo systemctl start mariadb
  • Again, watch journalctl to verify that it comes back up cleanly.
  • Once it is up, access the mysql shell with sudo -i mysql -u root
  • Run SHOW STATUS LIKE "wsrep_local_state_comment"; and SHOW STATUS LIKE "wsrep_ready";. If the first is not "Joined" or "Synced", the node is not ready. The second must be "ON", or almost all queries against the node will fail. If you are managing haproxy manually to keep this node out of the cluster, do not repool it until both conditions are met.
  • Repool the database in haproxy by removing /tmp/galera.disabled, then re-enable puppet.
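The readiness check in the last steps can be condensed into a small helper. This is a sketch, not an existing script: the function name is hypothetical, and the state values follow the casing Galera actually reports ("Synced", "Joined", "Donor/Desynced", etc.).

```shell
# safe_to_repool STATE READY: succeed only when the node can serve traffic.
# STATE is the wsrep_local_state_comment value, READY is wsrep_ready.
safe_to_repool() {
  state="$1"
  ready="$2"
  case "$state" in
    Synced|Joined) [ "$ready" = "ON" ] ;;
    *) return 1 ;;
  esac
}

safe_to_repool Synced ON && echo "ok to repool"
safe_to_repool Donor  ON || echo "keep depooled"
```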

Depool a single node

  • Disable Puppet on the node
  • sudo touch /tmp/galera.disabled
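The depool flag and its removal pair naturally into two tiny helpers. A sketch under assumptions: sudo is omitted so the snippet runs standalone, the GALERA_FLAG override is hypothetical, and on a real cloudcontrol you would also disable puppet first, as above.

```shell
# Path haproxy's health check looks at; overridable for testing.
FLAG="${GALERA_FLAG:-/tmp/galera.disabled}"

# depool: tell haproxy's health check this node is down.
depool() { touch "$FLAG"; }

# repool: let haproxy send traffic here again.
repool() { rm -f "$FLAG"; }

depool
[ -e "$FLAG" ] && echo "depooled"
repool
[ -e "$FLAG" ] || echo "repooled"
```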

Troubleshooting

Please read Portal:Cloud_VPS/Admin/Troubleshooting#Galera.

See also

TODO.