On December 29, 2015, at about 21:57 UTC, Silicon (the fundraising ActiveMQ server) stopped accepting connections. FR tech received a flood of email alerts from Barium, the queue consumer. Elliott investigated immediately and paged Jeff.
Sequence of Events
- 22:00 email alerts begin
- 22:05 (time unconfirmed) Peter takes the banners down and pages Jeff
- 22:40 Jeff finds the CPUs on Silicon pinned at 100% and ActiveMQ unresponsive, and restarts ActiveMQ
- 22:41 queue and payment processing return to normal
- 22:45 Peter puts the banners back up
While the queue was locked up, no donations could be processed, and donors may have seen error messages or timeouts. Any resulting data irregularities should be corrected by the jobs that download and process the audit files.
ActiveMQ appears to have crashed.
Further investigation into potential ActiveMQ bugs is needed.