Incidents/2019-08-20 logstash

From Wikitech

document status: in-review


For about 30 minutes, Logstash was not getting any messages from the MediaWiki servers.


During the Logstash outage, we had only partial visibility for operational monitoring. Developers were also unable to use WikimediaDebug or to deploy new code for MediaWiki and most other services.

While this impacted scheduling and developer productivity, it did not directly affect end users of any public services. The logs were also eventually recovered into Logstash after it was restarted, because the Logstash-Kafka consumer picks up where it left off.
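The recovery behaviour described above can be illustrated with a minimal in-memory sketch. It assumes the messages were buffered in Kafka during the outage and that the consumer resumes from its last committed offset. The `Topic`, `OffsetStore`, and `consume` names are hypothetical stand-ins for illustration only, not the actual Logstash or Kafka API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for one Kafka partition: an append-only log."""
    messages: list = field(default_factory=list)

    def append(self, message):
        self.messages.append(message)


@dataclass
class OffsetStore:
    """Hypothetical stand-in for a consumer group's committed offset."""
    committed: int = 0


def consume(topic, offsets, sink):
    """Read from the committed offset to the end of the log, then commit."""
    start = offsets.committed
    for message in topic.messages[start:]:
        sink.append(message)
    offsets.committed = len(topic.messages)
    return len(topic.messages) - start


topic = Topic()
offsets = OffsetStore()
indexed = []  # what the log store has received so far

topic.append("log 1")
consume(topic, offsets, indexed)  # consumer is healthy

# Consumer is down: producers keep appending, nothing is consumed.
topic.append("log 2")
topic.append("log 3")

# After the restart, the consumer resumes from the committed offset.
recovered = consume(topic, offsets, indexed)
```

Because the offset is stored separately from the consumer process, a restart only delays delivery; the backlog is read in order once the consumer is running again.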


  • Detected through Icinga alerts.


All times in UTC.


What went well?

  • Detected early and fixed quickly by restarting Logstash.

How many people were involved in the remediation?

  • 1 SRE.