From Wikitech


The Swift frontends were maxed out on CPU following an rsyslog configuration change, impacting both the image scaler cluster and regular Swift traffic.


  • 2014-09-10T08:52 https://gerrit.wikimedia.org/r/#/c/159348/ is submitted and merged, changing rsyslog and swift proxy configuration
  • 2014-09-10T08:58 first alarm, HTTP timeout on ms-fe1002
  • 2014-09-10T08:59 impact seen on image scalers, LVS alarm for rendering.svc.eqiad.wmnet
  • 2014-09-10T09:01 HTTP 5xx alarm
  • 2014-09-10T09:11 rolling restart of swift frontends
  • 2014-09-10T09:12 LVS recovery for ms-fe.eqiad.wmnet
  • 2014-09-10T09:14 LVS recovery for rendering.svc.eqiad.wmnet


This was a recurrence of Incident_documentation/20131205-Swift, in which Swift busy-loops when the syslog socket goes away. The issue was thought to be fixed in the latest upstream Swift release, and the fix appeared to be confirmed during testing. As this incident demonstrates, however, that testing did not replicate the exact conditions that trigger the problem. Extra care should be taken when deploying rsyslog configuration changes that restart rsyslog as a side effect.
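The failure mode described above, a process spinning on CPU once its syslog socket disappears, can in general be guarded against by bounding retries and sleeping between them. The following is a minimal Python sketch of that general idea only. It is not Swift's logging code and not the upstream fix; the helper `send_with_backoff`, its parameters, and the delay values are all hypothetical.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[bytes], None],
    message: bytes,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver one log message; return True on success.

    Hypothetical helper, not Swift or rsyslog code. On OSError (for
    example, the syslog socket vanished during an rsyslog restart) it
    waits with exponential backoff between attempts and gives up after
    max_attempts, so a missing socket cannot turn into a tight loop.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except OSError:
            if attempt == max_attempts - 1:
                break
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    # Drop the message rather than spin on CPU.
    return False
```

Here `send` stands in for whatever writes to the log socket. Dropping a log line after a few attempts is a deliberate trade-off in this sketch: losing a message is cheaper than saturating the frontend CPU.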


See related actions for Incident_documentation/20131205-Swift