Portal:Cloud VPS/Admin/Runbooks/OpenstackAPIs

From Wikitech
The procedures in this runbook require admin permissions to complete.

Note: Some of these steps may require extra access to internal infrastructure systems. We are working on improving the runbooks; until then, treat this page as a guideline.

Error

This page is specifically for managing load, DDOS, or other malicious use of the OpenStack APIs.

There is currently no special-purpose monitoring for API responsiveness, so load issues will most likely show up as failures of the nova-fullstack service or as errors in Horizon.

Debugging

All APIs are accessed via a single service address, openstack.eqiad1.wikimediacloud.org. This is mapped to a single cloudcontrol node running HAProxy. The first thing to check is whether the API outage is caused by a service actually being down; the HAProxy log will show this:

$ sudo tail /var/log/haproxy/haproxy.log

This will show messages about any API services going up or down. Services that show as down can be safely restarted.
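Because tail only shows the last few lines, state changes can scroll out of view on a busy node. The following is a minimal sketch for pulling only the up/down messages out of a log. It assumes the stock HAProxy wording "Server <backend>/<server> is UP" or "is DOWN"; the function name is made up for this example.

```shell
# Print only HAProxy health-check state changes.
# Assumption: messages use the stock wording "Server <backend>/<server> is UP|DOWN".
# Reads the files given as arguments, or stdin when called without arguments.
haproxy_state_changes() {
    grep -E 'Server [^ ]+ is (UP|DOWN)' "$@"
}
```

Since the log is usually only readable by root, one way to use it is: sudo cat /var/log/haproxy/haproxy.log | haproxy_state_changes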

If all services are up but overwhelmed by traffic, you can log the traffic to a particular frontend by altering that frontend's config. For example, for nova-api, the HAProxy config is /etc/haproxy/conf.d/nova_api.cfg. Enable per-request logging by uncommenting this line:

# log /dev/log local0 debug

and then running

$ sudo systemctl restart haproxy.service

After that, you should start seeing one line per connection in haproxy.log.
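Once per-request lines are flowing, a quick count per client address helps spot a single heavy source. This is a rough sketch, not a tested tool: it assumes each request line carries a syslog tag of the form "haproxy[PID]: " immediately followed by "client_ip:port [", as in HAProxy's default HTTP and TCP log formats. If the local log format differs, the pattern needs adjusting. The function name is hypothetical.

```shell
# Count requests per client address and print the ten busiest clients.
# Assumption: request lines look like
#   "... haproxy[PID]: <client_ip>:<port> [<timestamp>] <frontend> ..."
# Lines that do not match (for example UP/DOWN notices) are skipped.
# Reads the files given as arguments, or stdin when called without arguments.
top_clients() {
    sed -nE 's/.*haproxy\[[0-9]+\]: ([0-9a-fA-F.:]+):[0-9]+ \[.*/\1/p' "$@" \
        | sort | uniq -c | sort -rn | head -n 10
}
```

For example: sudo cat /var/log/haproxy/haproxy.log | top_clients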

Note: HAProxy log rotation is a bit broken. Sometimes the active logfile is haproxy.log and sometimes it is haproxy.log.1. If haproxy.log is empty or seems unusually quiet, check for the active log with 'ls -ltrah /var/log/haproxy/'.
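The check for the active log can be wrapped in a small helper. This is a sketch that simply picks whichever of the two candidate files was modified most recently; the function name is made up for this example.

```shell
# Print the more recently modified of haproxy.log and haproxy.log.1.
# The log directory defaults to /var/log/haproxy and can be overridden
# with the first argument.
active_haproxy_log() {
    logdir="${1:-/var/log/haproxy}"
    # ls -t sorts newest first; a missing file is silently ignored.
    ls -t "$logdir"/haproxy.log "$logdir"/haproxy.log.1 2>/dev/null | head -n 1
}
```

For example: sudo tail -f "$(active_haproxy_log)"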

Solutions

HAProxy supports a variety of throttling and blocking options. There are ready-made block lists for either IP address or user agent; these blocks apply uniformly to all HAProxy-backed services. Throttling and/or more nuanced blocking is left as an exercise for the reader, and is best saved for after any immediate downtime is resolved.

Solution: block a user agent

The /etc/haproxy/agentblocklist.txt file contains a simple list of user agent strings. User agents matching any string in that file are blocked.

agentblocklist.txt is managed by Puppet and comes from modules/openstack/files/haproxy/agentblocklist.txt. Lines preceded by # will be ignored.
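To see which entries are actually in effect on a node after Puppet has run, it can help to strip the comment lines. This is a minimal sketch; skipping blank lines as well is an assumption of this example, and the function name is hypothetical.

```shell
# Print the entries of a block list, skipping lines that start with '#'.
# Assumption: blank lines carry no meaning, so they are skipped as well.
# Reads the files given as arguments, or stdin when called without arguments.
active_blocklist_entries() {
    grep -vE '^[[:space:]]*(#|$)' "$@"
}
```

For example: active_blocklist_entries /etc/haproxy/agentblocklist.txt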

Solution: block an IP

The /etc/haproxy/ipblocklist.txt file contains a simple list of IP addresses; any HTTP request from an IP address listed there receives a 403 response.

ipblocklist.txt is managed by Puppet and comes from modules/openstack/files/haproxy/ipblocklist.txt. Don't forget to add a comment explaining why a given IP is blocked! Lines preceded by # will be ignored.
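Before or after a Puppet run, a quick check shows whether a given address is already present as an active entry. The sketch below only does an exact, whole-line comparison against non-comment lines; it does not understand CIDR ranges or trailing whitespace. The function name is hypothetical.

```shell
# Succeed (exit 0) if the IP appears as an active line in the block list.
# Usage: ip_is_listed <ip> [blocklist-file]
# Lines starting with '#' are ignored; matching is exact and whole-line only.
ip_is_listed() {
    ip="$1"
    file="${2:-/etc/haproxy/ipblocklist.txt}"
    grep -v '^[[:space:]]*#' "$file" | grep -qFx -- "$ip"
}
```

For example, using a documentation-range address: ip_is_listed 192.0.2.10 && echo "already blocked"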

Contacts

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)
