Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate
The procedures in this runbook require admin permissions to complete.
The ToolforgeWebHighErrorRate alert fires when 5xx responses make up more than 25% of all responses to Toolforge tool web requests.
Debugging
The alert links to the HAProxy Grafana dashboard. If the dashboard shows nothing obviously wrong, the issue is most likely on the Kubernetes cluster side.
Note that the 5xx count includes individual broken tools. That is "normal", and it is up to the maintainers of those tools to fix. This alert exists to detect large, platform-wide issues that need urgent attention.
HAProxy debugging
Some tricks to help debug HAProxy:
- Show currently active connections per tool
$ echo "show table limit-per-tool" | sudo socat unix-connect:/run/haproxy/admin.sock stdio
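The raw table dump can be long, so ranking entries by a counter may help spot a single tool that is hogging connections. The following is a minimal sketch, not an established part of this runbook: the helper name `rank_table_entries` is hypothetical, and the default field name `conn_cur` is an assumption about what the limit-per-tool table stores. Check the header line of the `show table` output for the actual field names.

```shell
# Hypothetical helper: rank HAProxy stick-table entries by a numeric field.
# Reads `show table <name>` output on stdin and prints "<value> <key>",
# highest value first, limited to the top 20 entries.
# ASSUMPTION: the field name (default: conn_cur) exists in the table;
# adjust it to match the data types listed in the table header.
rank_table_entries() {
    field="${1:-conn_cur}"
    awk -v field="$field" '
        {
            key = ""; val = ""
            for (i = 1; i <= NF; i++) {
                # Entry lines carry space-separated name=value tokens.
                if ($i ~ /^key=/) key = substr($i, 5)
                if (index($i, field "=") == 1) val = substr($i, length(field) + 2)
            }
            # The "# table: ..." header has no key= token and is skipped.
            if (key != "" && val != "") print val, key
        }
    ' | sort -rn | head -n 20
}
```

With the function defined in the current shell, it can be appended to the command above:

$ echo "show table limit-per-tool" | sudo socat unix-connect:/run/haproxy/admin.sock stdio | rank_table_entries conn_cur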
