User:BCornwall/Incident action items risk assessment
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
Wikimedia must maintain control over, and visibility into, open and closed incident follow-ups to reduce the risk of items being left unaddressed. Risk scoring of unaddressed incident follow-up items, together with periodic risk review, is part of the incident review ritual.
This will encourage us to:
- Follow up on forgotten action items
- Raise concerns over action item neglect and the damage that may result
- Promote accountability in ownership of action items
- Assign previously-unclaimed action items
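Parts of the periodic review could be automated. The Python below is a minimal sketch that makes two assumptions: each follow-up item already has a rating from the risk matrix, and it may or may not have an owner. The sketch lists unaddressed items from highest to lowest risk and separates out the ones nobody has claimed. `FollowUp`, its fields, `RATING_ORDER`, and `review` are hypothetical names. They do not reflect an existing Wikimedia tool or Phabricator API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Ratings from the risk matrix, highest priority first.
RATING_ORDER = ("Unbreak Now!", "High", "Medium", "Low")


@dataclass
class FollowUp:
    """Hypothetical record for one incident follow-up (not an existing API)."""
    title: str
    rating: str                  # assumed to be one of RATING_ORDER
    owner: Optional[str] = None  # None means nobody has claimed the item
    resolved: bool = False


def review(items: List[FollowUp]) -> Tuple[List[FollowUp], List[FollowUp]]:
    """Return (open items sorted by descending risk, open items with no owner)."""
    open_items = [item for item in items if not item.resolved]
    # sorted() is stable, so items with equal ratings keep their input order.
    open_items = sorted(open_items, key=lambda item: RATING_ORDER.index(item.rating))
    unclaimed = [item for item in open_items if item.owner is None]
    return open_items, unclaimed


if __name__ == "__main__":
    # Illustrative items only.
    items = [
        FollowUp("Example item A", "Low", owner="alice"),
        FollowUp("Example item B", "High"),
        FollowUp("Example item C", "Unbreak Now!", owner="bob"),
        FollowUp("Example item D", "Low", resolved=True),
    ]
    ranked, unclaimed = review(items)
    for item in ranked:
        print(item.rating, item.title, item.owner or "UNCLAIMED")
```

Listing the unclaimed items separately gives reviewers a concrete queue for assigning owners, which is one of the goals listed above.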
Risk factors
- Severity: the amount of damage caused to systems if the item is not addressed.
  - Marginal - Risks may cause minor damage but little overall effect
    - Minor performance/error concerns that would remain within SLA/budget
    - Non-user-impacting degradation of service
    - Negligible impact on systems (e.g. unhelpful log messages)
  - Serious - Risks may cause major damage
    - User-facing service degradation/outages
    - Major performance/error concerns that exceed budgets
  - Catastrophic - Risks will cause extensive damage and long-term effects to systems
    - Extended user-facing service outage
    - Systems breach
    - Leak of sensitive data
- Probability: the likelihood that the related incident could occur again if the item is not addressed.
  - Possible - Not expected to occur
  - Probable - May occur
  - Certain - Expected to occur eventually
Risk assessment systems often use around five rankings each for severity and probability. However, to limit subjective variance in scoring (see Limitations of risk matrices below), we use three. Three rankings give us enough flexibility to prioritize action items without getting lost in semantics or differences of opinion.
Risk matrix
Probability | Severity: Marginal | Severity: Serious | Severity: Catastrophic |
---|---|---|---|
Certain | Medium | High | Unbreak Now! |
Probable | Low | Medium | High |
Possible | Low | Low | Medium |
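The matrix is effectively a lookup keyed on the two three-level scales. The Python below is a minimal sketch and is not part of any existing tooling. The names `Severity`, `Probability`, `RISK_MATRIX`, and `risk_rating` are hypothetical, and the integer values are assumed only so that each level can double as a table index.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Damage caused to systems if the action item is not addressed."""
    MARGINAL = 0
    SERIOUS = 1
    CATASTROPHIC = 2


class Probability(IntEnum):
    """Likelihood that the related incident recurs if the item is not addressed."""
    POSSIBLE = 0
    PROBABLE = 1
    CERTAIN = 2


# Hypothetical encoding of the risk matrix: rows are indexed by Probability,
# columns by Severity (Marginal, Serious, Catastrophic).
RISK_MATRIX = (
    ("Low", "Low", "Medium"),            # Possible
    ("Low", "Medium", "High"),           # Probable
    ("Medium", "High", "Unbreak Now!"),  # Certain
)


def risk_rating(severity: Severity, probability: Probability) -> str:
    """Return the rating for one action item by looking up its matrix cell."""
    return RISK_MATRIX[probability][severity]


if __name__ == "__main__":
    print(risk_rating(Severity.SERIOUS, Probability.PROBABLE))  # Medium
```

Keeping the matrix as data rather than as nested conditionals makes it easy to audit against the table if the rankings ever change.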
Limitations of risk matrices
From "What's Wrong with Risk Matrices?" by Louis Anthony Cox Jr.:
Categorizations of severity cannot be made objectively for uncertain consequences. Inputs to risk matrices (e.g., frequency and severity categorizations) and resulting outputs (i.e., risk ratings) require subjective interpretation, and different users may obtain opposite ratings of the same quantitative risks. These limitations suggest that risk matrices should be used with caution, and only with careful explanations of embedded judgments.
Risk scoring matrices provide a somewhat standardized way to evaluate at-risk action items. However, their ratings still require scrutiny before they are used to prioritize an engineer's time.