Toolforge Redis runs on three nodes (tools-redis-5/6/7). One of them is the master and the other two are replicas. Keepalived ensures that a virtual (floating) IP address, which clients connect to, is always assigned to the master.
If the current master goes down, Redis Sentinel should notice within five seconds and automatically fail over to a replica. It may take an additional 10 seconds for the floating IP to move to the new master.
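A minimal sketch for waiting out a failover, assuming bash. The floating IP is not given on this page, so it has to be passed in by hand, and the 15-second timeout used below is simply the ~5 s Sentinel detection plus the ~10 s IP move described above, not a configured value. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until a Redis node reports role:master, giving up after a timeout.
# Function names are illustrative, not existing Toolforge tooling.

# Print the "role" field from `redis-cli info replication` output read on stdin.
replication_role() {
  # INFO replies use CRLF line endings, so strip the trailing \r from the value.
  awk -F: '$1 == "role" { sub(/\r$/, "", $2); print $2 }'
}

# wait_for_master TIMEOUT_SECONDS CMD [ARGS...]
# Runs CMD about once per second until its output reports role:master.
# Returns 0 on success, 1 if the timeout expires first.
wait_for_master() {
  local timeout=$1
  shift
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ "$("$@" 2>/dev/null | replication_role)" = "master" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

For example, wait_for_master 15 redis-cli -h <floating IP> info replication succeeds once whichever node currently holds the floating IP reports role:master, which confirms both the Sentinel failover and the IP move.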
If Sentinel does not fail over to a new node (check with redis-cli info replication), look in /var/log/redis/redis-sentinel.log on any node that is still up. If the IP address does not move, check the output of sudo systemctl status keepalived and verify that the /usr/local/bin/wmcs-check-redis-master script exits with code 0 on the master and 1 on the replicas.
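The checks above can be bundled into a small per-node helper. This is a sketch, assuming bash, redis-cli on the PATH and sudo rights on the node; the function names are hypothetical and not part of any existing Toolforge tooling. Note that INFO reports a replica's role as slave.

```shell
#!/usr/bin/env bash
# Sketch of the troubleshooting checks, run on a single Redis node.
# Hypothetical helpers; assumes bash, redis-cli on PATH and sudo rights.

# Extract the "role" field (master or slave) from `redis-cli info replication`.
replication_role() {
  awk -F: '$1 == "role" { sub(/\r$/, "", $2); print $2 }'
}

# consistent ROLE EXIT_CODE: succeed if the wmcs-check-redis-master exit code
# matches the replication role (0 on the master, 1 on a replica).
consistent() {
  case "$1:$2" in
    master:0 | slave:1) return 0 ;;
    *) return 1 ;;
  esac
}

# Gather this node's role and check-script result; on a mismatch, dump the
# keepalived status and the tail of the Sentinel log to stderr.
diagnose_node() {
  local role code
  role=$(redis-cli info replication | replication_role)
  /usr/local/bin/wmcs-check-redis-master >/dev/null 2>&1
  code=$?
  echo "role=${role:-unknown} check-exit=${code}"
  if ! consistent "$role" "$code"; then
    echo "check script disagrees with replication role; keepalived status:" >&2
    sudo systemctl status keepalived --no-pager >&2
    echo "recent Sentinel log:" >&2
    sudo tail -n 50 /var/log/redis/redis-sentinel.log >&2
    return 1
  fi
}
```

After sourcing the file, run diagnose_node on each of tools-redis-5/6/7; in a healthy cluster exactly one node prints role=master check-exit=0 and the others print role=slave check-exit=1.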
Note that Sentinel needs a quorum to perform any action, so it will not work while two of the three nodes are down. Additionally, Redis is configured to refuse writes on the replicas, and on the master when no replicas are connected to it.
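To make the quorum arithmetic concrete: Redis Sentinel only carries out a failover once a strict majority of all Sentinels has elected a leader. The sketch below assumes one Sentinel per Redis node (three in total); the helper names are illustrative.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the quorum note: a failover needs a strict majority
# of all configured Sentinels. Assumes one Sentinel per Redis node.

# majority N: smallest number of Sentinels that is a strict majority of N.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# can_fail_over TOTAL DOWN: succeed if the Sentinels still running can
# still authorise a failover.
can_fail_over() {
  local total=$1 down=$2
  [ $(( total - down )) -ge "$(majority "$total")" ]
}
```

With three nodes, can_fail_over 3 1 succeeds (2 Sentinels remain and 2 is a majority of 3), while can_fail_over 3 2 fails (1 < 2). On a live node, redis-cli -p 26379 sentinel ckquorum toolforge asks Sentinel itself whether it can currently reach both the quorum and the majority. The write restriction is presumably implemented with the replica-read-only and min-replicas-to-write settings, which redis-cli config get min-replicas-to-write would show; this is an assumption, as the actual configuration is not shown here.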
If you need to force a failover or perform other Sentinel actions, you can connect to it using redis-cli on port 26379:
    taavi@toolsbeta-redis-1:~$ redis-cli -p 26379
    127.0.0.1:26379>
Sentinel commands are listed in the Redis documentation; use toolforge as the "master name".
The most useful command is sentinel failover toolforge, which forces a failover to any other available node. Alternatively, you can add the IP address of the node to fail over to.
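A sketch of forcing a failover and confirming that Sentinel now reports a different master, assuming it is run on one of the Redis nodes with Sentinel listening on port 26379. It relies on SENTINEL GET-MASTER-ADDR-BY-NAME, which replies with the master's IP and port; the 15-second wait mirrors the timings described above and is an assumption, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: force a Sentinel failover and wait for the reported master to change.
# Assumes Sentinel on port 26379 and the master name "toolforge".

# Join the two-line reply (IP, then port) into a single ip:port string.
# redis-cli prints raw lines when its output is not a terminal.
join_addr() {
  paste -sd: -
}

# Print the master address Sentinel currently reports as ip:port.
current_master() {
  redis-cli -p 26379 sentinel get-master-addr-by-name toolforge | join_addr
}

# Trigger the failover, then poll once per second for up to 15 s.
force_failover() {
  local before after i
  before=$(current_master)
  redis-cli -p 26379 sentinel failover toolforge
  for i in $(seq 1 15); do
    sleep 1
    after=$(current_master)
    if [ -n "$after" ] && [ "$after" != "$before" ]; then
      echo "master moved: $before -> $after"
      return 0
    fi
  done
  echo "master still $before after 15 s" >&2
  return 1
}
```

force_failover prints the old and new ip:port pair on success, or returns 1 if Sentinel still reports the original master after the wait; the Sentinel log mentioned above is the next place to look in that case.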