ORES was down or slow to respond for ~2 hours today.
- 1930 UTC: New deployment begins
- 2005 UTC: ORES begins to be overloaded
- 2025 UTC: A problem with old Jessie installs is discovered (Phab:T130463). It turned out to be a pip versioning issue: https://github.com/pypa/pip/issues/214
- 2130 UTC: A new cluster is built and requests are being served at the rate that they come in
- 2300 UTC: A new cluster configuration is complete.
- Pip does not remove old versions when installing new wheels, so old versions will need to be removed manually.
- Our precaching utility will back up during a short outage and unleash its whole backlog of requests on the service when it comes back online.
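
One possible way to keep the precaching backlog from flooding the service after an outage is to cap how many requests are held and to discard the ones that have gone stale. The following is only a minimal sketch of that idea in Python. The class name `BoundedBacklog` and its parameters (`max_items`, `max_age_seconds`, `clock`) are hypothetical and are not taken from ORES or its precaching utility.

```python
import time
from collections import deque


class BoundedBacklog:
    """Hypothetical buffer for pending precache requests.

    Keeps at most `max_items` requests and skips any request older than
    `max_age_seconds`, so a recovering service is not hit with the full
    backlog that built up during an outage.
    """

    def __init__(self, max_items, max_age_seconds, clock=time.monotonic):
        # A deque with maxlen silently drops the oldest entry when full.
        self._items = deque(maxlen=max_items)
        self._max_age = max_age_seconds
        self._clock = clock

    def add(self, request):
        """Queue a request together with the time it was queued."""
        self._items.append((self._clock(), request))

    def drain(self, batch_size):
        """Return up to `batch_size` fresh requests, oldest first.

        Stale requests encountered along the way are discarded.
        """
        now = self._clock()
        batch = []
        while self._items and len(batch) < batch_size:
            queued_at, request = self._items.popleft()
            if now - queued_at <= self._max_age:
                batch.append(request)
        return batch

    def __len__(self):
        return len(self._items)
```

A caller would `add()` requests as they arrive and, once the service responds again, call `drain()` with a small batch size at a fixed interval instead of sending everything at once. Whether dropping stale precache requests is acceptable depends on how the cache is used, so treat the limits as assumptions to be tuned.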