Portal:Toolforge/Admin/Runbooks/Redis


This runbook explains how to debug and fix common issues with Toolforge Redis.

Error / Incident

  • checker.tools.wmflabs.org/toolschecker: Redis set/get is a paging alert that triggers when toolschecker is unable to talk to Redis.

Debugging

SSH to Redis hosts at tools-redis-X.tools.eqiad1.wikimedia.cloud and check the output of redis-cli info replication.
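
As a rough aid for reading that output, the sketch below condenses it to one line per host. The function name summarise_replication is hypothetical and not part of any Toolforge tooling. It relies only on the standard role, connected_slaves and master_link_status fields of the Redis INFO replication section, and assumes redis-cli can reach the local instance without extra options.

```shell
# Hypothetical helper: summarise `redis-cli info replication` output read from stdin.
summarise_replication() {
    # Redis terminates INFO lines with CRLF, so strip carriage returns first.
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "connected_slaves"   { slaves = $2 }
        $1 == "master_link_status" { link = $2 }
        END {
            if (role == "master")     printf "role=master connected_slaves=%s\n", slaves
            else if (role == "slave") printf "role=slave master_link_status=%s\n", link
            else                      print "role=unknown"
        }'
}

# Usage on a Redis host (assumes no authentication options are needed):
#   redis-cli info replication | summarise_replication
```

A replica reporting master_link_status=down, or a primary reporting fewer connected replicas than expected, is a reasonable place to start looking.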

The Redis service runs under a non-standard Systemd unit name: redis-instance-tcp_6379.service

Common issues

max number of clients reached

When this happens, the following message is logged: ERR max number of clients reached. Check the logs on all servers with sudo journalctl -g "max number of clients". If the message appears repeatedly, restart the Systemd unit on the hosts where it is logged:

$ sudo systemctl restart redis-instance-tcp_6379.service
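
To make "repeatedly" a little more concrete, the following sketch counts occurrences of the message in journal output and prints a suggestion. The helper names and the THRESHOLD default of 5 are assumptions made for illustration, not an established Toolforge policy. The script never restarts anything itself.

```shell
# Hypothetical threshold: how many occurrences count as "repeatedly".
THRESHOLD="${THRESHOLD:-5}"

# Count lines on stdin that contain the error message.
count_max_clients_errors() {
    # grep -c exits non-zero when nothing matches; still print the 0 it reports.
    grep -c "max number of clients reached" || true
}

# Print a suggestion based on the count; the restart is left to the operator.
suggest_action() {
    count="$1"
    if [ "$count" -ge "$THRESHOLD" ]; then
        echo "restart suggested: $count occurrences"
    else
        echo "no restart needed: $count occurrences"
    fi
}

# Usage on a host (the one-hour window is an arbitrary choice):
#   n=$(sudo journalctl --since "1 hour ago" --no-pager | count_max_clients_errors)
#   suggest_action "$n"
```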

Related information

Portal:Toolforge/Admin/Redis

Support contacts

#wikimedia-cloud-admin is the main communication channel for Toolforge admins.

If Redis is down, follow the Wikimedia Cloud Services team/Incident Response Process.

Old incidents

Add any new tasks for incidents you encounter here.