Incident response/Process improvement

From Wikitech

ONgoing reFormulation of Incident Response Efforts (ONFIRE)

The problem

The WMF SRE team has repeatedly identified incident response as an area for improvement. In January 2019, a discussion among SREs raised the following key pain points:

  • “I don’t know if I need to do something”
  • very unequal distribution of the burden
  • unclear what to do, how to do it, or whom to escalate to
  • no shared understanding of process and definitions

Paging & Incident Response Working Group

At the January 2019 offsite, the SRE team agreed that a working group, with Joel facilitating and acting as project manager, would identify and propose solutions to the problems with status quo incident reporting, with the goal of having something to implement before June 2019. Meeting notes for the group are at Incident response/Process improvement/Meetings.

View our charter for more information.