User talk:BCornwall/Incident action items risk assessment

From Wikitech

Keeping the incident follow-up action item list slim

One idea raised in the most recent ONFIRE meeting was to avoid creating yet another list full of items that never get worked on. A slim, focused list that is continually reviewed and acted upon might serve us better. Non-urgent improvements need not be on the list: tickets scoring "medium" or "low" would simply become regular tickets. BCornwall (talk) 18:56, 2 May 2023 (UTC)

The term "Risk assessment" is confusing: is this the risk of doing it or the risk of not doing it?

Some feedback from someone outside of the ONFIRE team: we typically do a risk assessment when rolling out a change, so at first this seemed to be the same kind of assessment. Some clarification would be nice. BCornwall (talk) 18:58, 2 May 2023 (UTC)

Broader context not addressed: How much work will a ticket take?

One person for two hours? A team for three months? How does that factor into our prioritization? A quicker fix could be ranked higher so that it gets out the door sooner. BCornwall (talk) 19:00, 2 May 2023 (UTC)

This might be irrelevant to the risk scoring and more relevant to the individual teams' prioritization. BCornwall (talk) 19:16, 2 May 2023 (UTC)

Concrete definitions of probability

Some more feedback:

Probability should offer a short menu of options for estimating how often the incident will happen again, e.g. "If we don't fix this, it will occur every day". Instead of "certain", "probable", "possible", maybe "daily", "weekly", "monthly".

BCornwall (talk) 19:01, 2 May 2023 (UTC)
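To make the suggestion above a bit more concrete, here is a rough sketch of what a frequency-based menu could look like if it were ever encoded in tooling. Everything in it is an assumption for illustration only: the `Recurrence` name, the three options, the per-year figures, and the example ticket labels are hypothetical and not an agreed ONFIRE scale.

```python
from enum import Enum


class Recurrence(Enum):
    """Hypothetical menu: how often the incident is expected to recur if unfixed.

    The values are approximate expected occurrences per year (an assumption).
    """
    DAILY = 365
    WEEKLY = 52
    MONTHLY = 12

    @property
    def per_year(self) -> int:
        # Expose the enum value under a more descriptive name.
        return self.value


def most_frequent_first(items):
    """Sort (label, Recurrence) pairs so the most frequent recurrence comes first."""
    return sorted(items, key=lambda item: item[1].per_year, reverse=True)


# Placeholder action items, not real tickets.
tickets = [
    ("ticket A", Recurrence.MONTHLY),
    ("ticket B", Recurrence.DAILY),
    ("ticket C", Recurrence.WEEKLY),
]
ordered = most_frequent_first(tickets)
```

A menu like this asks people to pick an expected recurrence interval rather than interpret words such as "probable", which might make estimates easier to compare across teams.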

How is responsibility managed? Who may dictate assignments to which teams?

There should probably be some sort of accountability baked into this process: it's all well and good to have a risk score, but it doesn't do much good if no actual work is being done to address the underlying issue. BCornwall (talk) 19:03, 2 May 2023 (UTC)