The databases for OpenStack services are stored on a Galera cluster hosted on the cloudcontrol nodes.
This cluster does not use custom WMF packages; it runs the standard packages from upstream Debian.
The cluster is active/active/active, which means that writes can be made on any of the cloudcontrol nodes. In practice, the haproxy in front of them sends all requests to just one node.
Directions for standing up a new cluster are included in puppet/modules/galera/manifests/init.pp.
OpenStack services tend to use connection pooling, opening many long-lived connections to each database. For this reason, our Galera config has extremely long connection timeouts and very high connection limits.
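To see what those limits look like on a given node, the live values can be read from MariaDB itself. The snippet below is a minimal sketch: `show_connection_settings` is a hypothetical helper name, and the three variables queried (`max_connections`, `wait_timeout`, `interactive_timeout`) are the standard MariaDB settings for connection limits and idle timeouts. Whether these are exactly the ones our Galera config tunes is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print the connection-related settings of the local MariaDB node.
# show_connection_settings is a hypothetical helper, not an existing tool.
show_connection_settings() {
    # SHOW GLOBAL VARIABLES reports the server-wide values,
    # not the per-session overrides of this client.
    sudo -i mysql -u root -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('max_connections', 'wait_timeout', 'interactive_timeout')"
}
```

Calling `show_connection_settings` on a cloudcontrol node should print one row per variable.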
Restarting the local mariadb process
Don't let puppet restart this.
- To begin, disable puppet so it can't mess with things.
- Tell haproxy the database is down, just to be safe, with
sudo touch /tmp/galera.disabled
- Stop the service with
sudo systemctl stop mariadb
- In another shell run
sudo journalctl -u mariadb.service -f
to verify it cleanly exits. It can take a few moments or quite a while.
- Start the service again with
sudo systemctl start mariadb
- Again, watch journalctl to see that it comes up alright.
- Once it is up, access the mysql shell with
sudo -i mysql -u root
and run
SHOW STATUS LIKE "wsrep_local_state_comment";
and
SHOW STATUS LIKE "wsrep_ready";
If the first isn't "Joined" or "Synced", the node isn't ready. The second needs to be "ON" or almost all queries against the node will fail. If you are manually handling haproxy to keep the node out of the cluster, don't repool it until you get one of those statuses.
- Repool the database in haproxy by removing the /tmp/galera.disabled file, then re-enable puppet.
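The readiness check at the end of the procedure can be scripted so that the node is only repooled when both status values look right. This is a rough sketch, not an existing tool: `query_status`, `node_is_ready` and `repool_if_ready` are hypothetical helper names, and the comparison is made case-insensitively on the assumption that the values may be reported with different capitalisation.

```shell
#!/usr/bin/env bash
# Sketch: only repool a Galera node once it reports itself as ready.
# All function names here are hypothetical helpers.

# Print the value of one status variable, e.g. wsrep_ready.
# -N drops the column headers, -B gives tab-separated output.
query_status() {
    sudo -i mysql -u root -N -B -e "SHOW STATUS LIKE '$1'" | cut -f2
}

# Succeed only if the state comment is joined/synced and wsrep_ready is ON.
# $1: value of wsrep_local_state_comment, $2: value of wsrep_ready
node_is_ready() {
    local state ready
    state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    ready=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$state" in
        joined|synced) ;;
        *) return 1 ;;
    esac
    [ "$ready" = "on" ]
}

# Remove the haproxy "disabled" marker only when the node is ready.
repool_if_ready() {
    local state ready
    state=$(query_status wsrep_local_state_comment)
    ready=$(query_status wsrep_ready)
    if node_is_ready "$state" "$ready"; then
        sudo rm -f /tmp/galera.disabled
        echo "node ready (state=$state, wsrep_ready=$ready); repooled"
    else
        echo "node not ready (state=$state, wsrep_ready=$ready); left depooled" >&2
        return 1
    fi
}
```

After starting mariadb and watching the journal, run `repool_if_ready`. If it fails, wait and run it again. Re-enabling puppet is left as a manual step.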
For troubleshooting, please read Portal:Cloud_VPS/Admin/Troubleshooting#Galera.