Portal:Cloud VPS/Admin/Galera
The databases for OpenStack services are stored on a Galera cluster hosted on the cloudcontrol nodes.
Deployment
This cluster does not use custom WMF packages; it runs the standard packages from upstream Debian.
The cluster is active/active/active, which means that writes can be made on any of the cloudcontrol nodes. In practice, the haproxy in front of the cluster sends all requests to just one of them.
Directions for standing up a new cluster are included in puppet/modules/galera/manifests/init.pp.
DB setup
OpenStack services tend to use connection pooling, opening many long-lived connections to each database. For this reason, our Galera config has extremely long connection timeouts and very high connection limits.
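To see what limits a node is actually running with, something like the following sketch may help. It assumes the relevant settings are the standard MariaDB variables max_connections, wait_timeout and interactive_timeout; the page does not say which variables our config sets, so treat those names (and the helper name show_connection_settings) as illustrative.

```shell
# Command for the local MariaDB shell, as used elsewhere on this page.
# Override MYSQL_CMD if access works differently on your host (assumption).
MYSQL_CMD="${MYSQL_CMD:-sudo -i mysql -u root}"

# Print the connection-related globals as "name<TAB>value" lines.
# The three variable names are assumptions about which settings matter;
# they are standard MariaDB variables, not values read from the puppet config.
show_connection_settings() {
    # -N drops the header row, -B gives tab-separated output.
    $MYSQL_CMD -N -B -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('max_connections', 'wait_timeout', 'interactive_timeout')"
}
```

Running show_connection_settings on a cloudcontrol node should list the three values currently in effect.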
General Operations
Restarting the local mariadb process
Don't configure puppet to restart this. Only restart one node at a time to avoid split-brain.
- To begin, disable puppet so it can't mess with things.
- Tell haproxy the database is down, just to be safe, with
sudo touch /tmp/galera.disabled
- Stop the service with
sudo systemctl stop mariadb
- In another shell run
sudo journalctl -u mariadb.service -f
to verify it cleanly exits. It can take a few moments or quite a while.
- Start the service again with
sudo systemctl start mariadb
- Again, watch journalctl to see that it comes up alright.
- Once it is up, access the mysql shell with
sudo -i mysql -u root
- Run
SHOW STATUS LIKE "wsrep_local_state_comment";
and
SHOW STATUS LIKE "wsrep_ready";
If the first isn't "joined" or "synced", the node isn't ready. The second needs to be "ON" or almost all queries against it will fail. If you are manually handling haproxy to keep the node out of the cluster, don't include it until you get one of those statuses.
- Repool the database in haproxy by removing the /tmp/galera.disabled file, then re-enable puppet.
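The readiness check at the end of the procedure can be scripted. The following is a minimal sketch, not an existing tool: the function names wsrep_status and is_node_ready are hypothetical, and the state comparison is case-insensitive on the assumption that MariaDB may report the state capitalised (for example "Synced").

```shell
# Command used to reach the local MariaDB shell; taken from the steps above.
# Override MYSQL_CMD on hosts where access differs (assumption).
MYSQL_CMD="${MYSQL_CMD:-sudo -i mysql -u root}"

# Print the value of a single status variable, e.g. wsrep_ready.
wsrep_status() {
    # -N drops the header row, -B gives tab-separated output;
    # awk keeps the second whitespace-separated field (the value).
    $MYSQL_CMD -N -B -e "SHOW STATUS LIKE '$1'" | awk '{print $2}'
}

# Succeed only when the state is joined/synced and wsrep_ready is ON.
# $1 = wsrep_local_state_comment value, $2 = wsrep_ready value.
is_node_ready() {
    state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    ready=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case "$state" in
        joined|synced) ;;
        *) return 1 ;;
    esac
    [ "$ready" = "ON" ]
}

# Hypothetical usage on a cloudcontrol node:
#   if is_node_ready "$(wsrep_status wsrep_local_state_comment)" \
#                    "$(wsrep_status wsrep_ready)"; then
#       echo "node looks ready to repool"
#   fi
```

Keeping the decision in is_node_ready separate from the query makes it possible to try the logic without a running database.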
Depool a single node
- Disable Puppet on the node
- Tell haproxy the database is down with
sudo touch /tmp/galera.disabled
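Depooling and repooling both come down to creating or removing the flag file, so they can be wrapped in small helpers. This is only a sketch: the function names and the SUDO and GALERA_FLAG overrides are hypothetical conveniences, and disabling or re-enabling Puppet is still a separate manual step.

```shell
# Flag file named in the steps above. The override exists only so the
# helpers can be tried outside a cloudcontrol node (assumption).
GALERA_FLAG="${GALERA_FLAG:-/tmp/galera.disabled}"
# Privilege wrapper; set SUDO to an empty string to run without sudo.
SUDO="${SUDO-sudo}"

# Mark the local database as down for haproxy.
depool_galera() {
    $SUDO touch "$GALERA_FLAG"
}

# Remove the flag so haproxy may use the node again.
repool_galera() {
    $SUDO rm -f "$GALERA_FLAG"
}

# Succeed if the node is currently flagged as depooled.
is_depooled() {
    [ -e "$GALERA_FLAG" ]
}
```

For example, running depool_galera and then is_depooled should succeed until repool_galera is called.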
Troubleshooting
Please read Portal:Cloud_VPS/Admin/Troubleshooting#Galera.
See also
TODO.