Portal:Toolforge/Admin/Redis
Toolforge Redis is running on three nodes (tools-redis-5/6/7). One of them is the master and the others are replicas. Keepalived ensures that a virtual IP address, which clients connect to, is always assigned to the master.
If the current master goes down, Redis Sentinel should notice within five seconds and automatically fail over to a replica. It may take an additional 10 seconds for the floating IP to move to the new master.
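To see which node currently holds the floating IP, one option is to filter the output of ip -o -4 addr show on each node. The following is a minimal sketch, not part of the deployed tooling. has_address is a hypothetical helper name, and the actual virtual IP is not given on this page, so you have to supply it yourself.

```shell
# Hypothetical helper (not deployed on the Redis hosts).
# Reads `ip -o -4 addr show` output on stdin and exits 0 if the given
# IPv4 address is assigned to any interface, 1 otherwise.
# Field 4 of each line looks like "172.16.1.9/21", so only the part
# before the slash is compared.
has_address() {
    awk -v addr="$1" '
        { split($4, parts, "/"); if (parts[1] == addr) found = 1 }
        END { exit !found }
    '
}
```

Usage: ip -o -4 addr show | has_address "$VIP" && echo "this node holds the floating IP", with VIP set to the floating address. After a failover, only the new master should report it.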
If Sentinel does not fail over to a new node (check with redis-cli info replication), look into /var/log/redis/redis-sentinel.log on any node that is still up. If the IP address does not move, check the Keepalived status with sudo systemctl status keepalived, and verify that the /usr/local/bin/wmcs-check-redis-master script exits with code 0 on the master and 1 on the replicas.
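The checks above rely on the role field reported by redis-cli info replication. The sketch below extracts fields from that output. is_master mimics the exit-code contract described for /usr/local/bin/wmcs-check-redis-master (0 on the master, 1 on replicas), but it is only an illustration, not that script, whose contents are not shown here. Both helper names are hypothetical.

```shell
# Hypothetical helpers; they read `redis-cli info replication` output on stdin.
# INFO replies use CRLF line endings, so carriage returns are stripped first.

# Print the value of one "key:value" line, e.g. role or connected_slaves.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Exit 0 if the node reports role:master, 1 otherwise (replicas report role:slave).
is_master() {
    [ "$(info_field role)" = "master" ]
}
```

Usage: redis-cli info replication | info_field role prints master or slave. On the master, redis-cli info replication | info_field connected_slaves shows how many replicas are attached, which matters because the master refuses writes when none are connected.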
Note that Sentinel requires a quorum to perform any action, so it will not function with two of the three nodes down. Additionally, Redis has been configured to refuse writes on the replicas, and also on the master if no replicas are connected to it.
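As a worked example of the quorum rule: the upstream Redis Sentinel documentation states that, whatever the configured quorum value (not given on this page), a failover must be authorized by a majority of all Sentinel processes. With three Sentinels the majority is two, so a single surviving node cannot fail over on its own. A minimal sketch of that arithmetic, using hypothetical helper names:

```shell
# Smallest number of Sentinels that forms a strict majority of $1.
sentinel_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Exit 0 if $1 reachable Sentinels out of $2 can still authorize a failover.
can_failover() {
    [ "$1" -ge "$(sentinel_majority "$2")" ]
}
```

Usage: sentinel_majority 3 prints 2. can_failover 2 3 succeeds (one node down), while can_failover 1 3 fails (two nodes down).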
Systemd unit
We are not using the default systemd unit redis-server.service that comes with the Debian package. Instead, we use a custom unit named redis-instance-tcp_6379.service, which is deployed via Puppet.
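Because the unit name differs from the Debian default, commands aimed at redis-server.service will not report on the running instance. The following is a small sketch that assumes the unit name follows the tcp_<port> pattern seen in redis-instance-tcp_6379.service and /etc/redis/tcp_6379.conf. Only port 6379 is documented here, and redis_unit is a hypothetical helper.

```shell
# Hypothetical helper: print the systemd unit name for a Redis port.
# Assumes the redis-instance-tcp_<port>.service naming seen for port 6379;
# defaults to 6379, the only instance documented on this page.
redis_unit() {
    printf 'redis-instance-tcp_%s.service\n' "${1:-6379}"
}
```

Usage: sudo systemctl status "$(redis_unit)" or sudo journalctl -u "$(redis_unit)".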
Manual failover
If you need to force a failover or perform other Sentinel actions, you can connect to it using redis-cli on port 26379:
taavi@toolsbeta-redis-1:~$ redis-cli -p 26379
127.0.0.1:26379>
Sentinel commands are listed in the Redis Sentinel documentation; use toolforge as the "master name".
The most useful command is sentinel failover toolforge, which forces a failover to any other available node. Alternatively, you can add the IP address of the specific node to fail over to.
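To confirm which node Sentinel currently considers the master, before or after a forced failover, you can query it with the standard sentinel get-master-addr-by-name command. When redis-cli writes to a pipe, it prints the reply as two lines (IP address, then port). The hypothetical helper below joins them into one address for easier comparison across nodes.

```shell
# Hypothetical helper: join the two-line reply of
#   redis-cli -p 26379 sentinel get-master-addr-by-name toolforge
# (IP address, then port) into a single "ip:port" string.
sentinel_master_addr() {
    tr -d '\r' | paste -s -d : -
}
```

Usage: run redis-cli -p 26379 sentinel get-master-addr-by-name toolforge | sentinel_master_addr on each node. All Sentinels should agree, and the address should change after a forced failover.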
Puppet configuration
Puppet is configured to never update (replace => false) the config files /etc/redis/tcp_6379.conf and /etc/redis/sentinel-toolforge.conf, to prevent clashes with Redis Sentinel, which can also modify those files.
This means that if you change or add a config value in modules/profile/manifests/toolforge/redis_sentinel.pp, it will not end up in the actual config files unless you manually remove them on all Redis servers and let Puppet recreate them.
# # Repeat on all Redis hosts, starting from the current primary
# rm /etc/redis/sentinel-toolforge.conf /etc/redis/tcp_6379.conf
# run-puppet-agent
These commands restart Redis and will likely cause a short Redis outage.
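The comment in the commands above asks you to start with the current primary. One way to get that order is sketched below. It assumes you have already collected each host's role, for example from the role field of redis-cli info replication on each host. primary_first is a hypothetical helper that reads "host role" lines and prints the master first.

```shell
# Hypothetical helper: read "host role" lines on stdin and print the host
# names with the master(s) first, then the remaining hosts in input order.
primary_first() {
    awk '
        $2 == "master" { print $1; next }
        { rest[++n] = $1 }
        END { for (i = 1; i <= n; i++) print rest[i] }
    '
}
```

Usage: for input such as "tools-redis-5 slave", "tools-redis-6 master", "tools-redis-7 slave", it prints tools-redis-6 first, followed by tools-redis-5 and tools-redis-7.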
A possible improvement to the current setup is tracked in phab:T366365.