Incident documentation/20151218-payments
On December 18, 2015 at 11:15 UTC, Icinga started sending alerts about timeouts on the payments web servers. We found Apache processes backing up while waiting on trivial queries to the fundraising Drupal database.
Sequence of Events
- 11:15 Icinga alerts begin; Faidon investigates
- 11:38 Jeff is contacted and starts investigating
- 11:40 Peter takes banners down
- 12:20 Jeff adjusts the MySQL table_open_cache setting and flushes tables on the MySQL server; services promptly recover
- 12:30 Peter puts banners back up
While the database was sluggish, payment processing performance was degraded, and donors may have seen error messages or timeouts.
We appear to have overrun MySQL's configured open table cache (table_open_cache). Once the cache was full, MySQL had to close cached tables before it could open new ones, which made it slow to open and close tables for queries.
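As a rough illustration of how this condition can be spotted, the sketch below compares the MySQL status counters Open_tables and Opened_tables against the table_open_cache setting over a sampling interval. The TableCacheSample container, the cache_pressure function, and both thresholds are hypothetical. The sketch assumes the counter values have already been fetched (for example with SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES), and it is not the check that was used during this incident.

```python
from dataclasses import dataclass


@dataclass
class TableCacheSample:
    """One snapshot of MySQL table-cache counters (hypothetical container)."""
    open_tables: int       # SHOW GLOBAL STATUS LIKE 'Open_tables'
    opened_tables: int     # SHOW GLOBAL STATUS LIKE 'Opened_tables' (cumulative)
    table_open_cache: int  # SHOW GLOBAL VARIABLES LIKE 'table_open_cache'


def cache_pressure(before: TableCacheSample, after: TableCacheSample,
                   interval_s: float, fill_ratio: float = 0.95,
                   max_opens_per_s: float = 1.0) -> bool:
    """Return True if the table cache looks saturated.

    Assumed heuristic with hypothetical thresholds: the cache is nearly
    full AND tables are still being opened at a steady rate, which
    suggests cached tables are being closed and reopened (churn).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Is the cache (almost) at its configured limit?
    full = after.open_tables >= fill_ratio * after.table_open_cache
    # Opened_tables is cumulative, so the delta is the open rate.
    opens_per_s = (after.opened_tables - before.opened_tables) / interval_s
    return full and opens_per_s > max_opens_per_s


if __name__ == "__main__":
    # Cache well below its limit: no pressure.
    print(cache_pressure(TableCacheSample(300, 1000, 2000),
                         TableCacheSample(310, 1010, 2000), 60))
    # Cache full and 100 table opens per second: pressure.
    print(cache_pressure(TableCacheSample(2000, 50_000, 2000),
                         TableCacheSample(2000, 56_000, 2000), 60))
```

A steadily climbing Opened_tables on its own is not necessarily a problem after a restart, which is why this sketch also requires the cache to be close to full before flagging it.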
Additional MySQL tuning was performed after the outage to prevent a recurrence.