User:BCornwall/Incident action items risk assessment

This page is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

Wikimedia must maintain control of, and visibility into, open and closed incident follow-ups to reduce the risk of open items going unaddressed. Risk scoring for unaddressed incident follow-up items and periodic risk review are part of the incident review ritual.

This will encourage us to:

Follow up on forgotten action items
Raise concerns over action item neglect and the damage that may result
Promote accountability in ownership of action items
Assign previously-unclaimed action items

Risk factors

Severity

The amount of damage caused to systems if the item is not addressed.

Marginal - Risks may cause minor damage but little overall effect
- Minor performance/error concerns that would remain within SLA/budget
- Non-user-impacting degradation of service
- Negligible impact on systems (e.g. unhelpful log messages)
Serious - Risks may cause major damage
- User-facing service degradation/outages
- Major performance/error concerns that exceed budgets
Catastrophic - Risks will cause extensive damage and long-term effects to systems
- Extended user-facing service outage
- Systems breach
- Leak of sensitive data

Probability

The likelihood that the related incident will occur again if the item is not addressed.

Possible - Not expected to occur
Probable - May occur
Certain - Expected to occur eventually

Risk assessment systems often have around five rankings each for Severity and Probability. However, to limit subjective variance in scoring (see #Limitations of risk matrices), we use three. Three rankings give us the flexibility to prioritize action items without getting lost in semantics or differences of opinion.

Risk matrix

Probability \ Severity	Marginal	Serious	Catastrophic
Certain	Medium	High	Unbreak Now!
Probable	Low	Medium	High
Possible	Low	Low	Medium
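
The matrix above can be read as a plain lookup from a (Probability, Severity) pair to a priority. The following is a minimal sketch of that lookup, not an existing Wikimedia tool: the names `Severity`, `Probability`, `RISK_MATRIX`, `PRIORITY_ORDER`, and `risk_priority`, as well as the sample action items, are assumptions made for illustration. Only the nine cell values are taken from the table.

```python
from enum import Enum


class Severity(Enum):
    MARGINAL = "Marginal"
    SERIOUS = "Serious"
    CATASTROPHIC = "Catastrophic"


class Probability(Enum):
    POSSIBLE = "Possible"
    PROBABLE = "Probable"
    CERTAIN = "Certain"


# Transcribed cell by cell from the risk matrix above.
RISK_MATRIX = {
    (Probability.CERTAIN, Severity.MARGINAL): "Medium",
    (Probability.CERTAIN, Severity.SERIOUS): "High",
    (Probability.CERTAIN, Severity.CATASTROPHIC): "Unbreak Now!",
    (Probability.PROBABLE, Severity.MARGINAL): "Low",
    (Probability.PROBABLE, Severity.SERIOUS): "Medium",
    (Probability.PROBABLE, Severity.CATASTROPHIC): "High",
    (Probability.POSSIBLE, Severity.MARGINAL): "Low",
    (Probability.POSSIBLE, Severity.SERIOUS): "Low",
    (Probability.POSSIBLE, Severity.CATASTROPHIC): "Medium",
}

# Assumed ordering for sorting, most urgent first.
PRIORITY_ORDER = ["Unbreak Now!", "High", "Medium", "Low"]


def risk_priority(probability: Probability, severity: Severity) -> str:
    """Return the matrix cell for a probability/severity pair."""
    return RISK_MATRIX[(probability, severity)]


# Hypothetical action items, used only to show the lookup in a review.
items = [
    ("Add alerting for queue depth", Probability.PROBABLE, Severity.SERIOUS),
    ("Rotate leaked credentials", Probability.CERTAIN, Severity.CATASTROPHIC),
    ("Clean up unhelpful log messages", Probability.POSSIBLE, Severity.MARGINAL),
]

# Sort so the most urgent items surface first during periodic review.
ranked = sorted(
    items,
    key=lambda item: PRIORITY_ORDER.index(risk_priority(item[1], item[2])),
)

for title, probability, severity in ranked:
    print(f"{risk_priority(probability, severity)}: {title}")
```

A lookup table is used rather than a computed score so that each cell stays exactly as written in the matrix. The Severity and Probability inputs remain subjective judgments, so the output is a starting point for discussion rather than a final ranking.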

Limitations of risk matrices

From "What's Wrong with Risk Matrices?" by Louis Anthony Cox Jr.:

Categorizations of severity cannot be made objectively for uncertain consequences. Inputs to risk matrices (e.g., frequency and severity categorizations) and resulting outputs (i.e., risk ratings) require subjective interpretation, and different users may obtain opposite ratings of the same quantitative risks. These limitations suggest that risk matrices should be used with caution, and only with careful explanations of embedded judgments.

Risk scoring matrices are a tool for somewhat standardized evaluation of at-risk action items, but the resulting scores still need scrutiny when deciding how to prioritize an engineer's time.