Portal:Cloud VPS/Admin/Runbooks/MainProxyDown
Appearance
The MainProxyDown alert fires when the monitoring system cannot connect to the primary web proxy service address is unreachable. This is a highly user-facing outage and likely consitutes an incident.
The procedures in this runbook require admin permissions to complete.
Debugging
In general, there are two components that could fail:
- The proxy stack itself (i.e. nginx + Redis). This should be detected by the separate per-instance
MainProxyInstanceDown(runbook) alert: that alert firing generally means you can focus on this and ignore keepalived-related issues.- If one of the nginx has failed, and the other is alive, you might be able to save some time by stopping the
keepalivedservice (and disabling Puppet) on the broken one to fail over traffic to the working one.
- If one of the nginx has failed, and the other is alive, you might be able to save some time by stopping the
keepalived, which is responsible for ensuring that exactly one of the instances has the service IPs allocated.- If neither, or both of the instances, have the service IPs assigned to the interface, try kicking the service.
- If the IPs are correctly allocated on exactly one instance, check that the Neutron ports are configured correctly.
