RT Flow

Summary

This is the current RT queue life-cycle flow, as discussed and adopted in August 2011. With this adoption, there will be:

  • Increased visibility
    • Increased visibility of the work queue for the TechOps team
    • Visibility for customers into the status of their tickets
  • Defined process
    • Process supports capacity planning
    • Defined prioritization and scheduling mechanics
    • Decomposition of larger composite tickets into finer-grained tasks
  • Managed expectations
    • Ability to communicate SLAs for different ticket types
    • Consistent RT management across the organization

Flow Description

The ticket is created by a user on rt.wikimedia.org. The ticket must contain details that describe the requirements and the end state (completion criteria). The information required usually includes a short description of the requirement, the drop-dead date, the 'nice to have' date, capacity and performance needs (which usually translate to hardware and software requirements), security and access requirements, and dependencies.

Once the ticket is created, an acknowledgement email is sent to the user and to the whole TechOps team, and the triage and resolution process begins. The ticket is reviewed for completeness and assessed for complexity. If the ticket lacks sufficient detail, the user is contacted to provide more.
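The completeness review described above can be pictured as a check over the fields a ticket is expected to carry. The following is only a minimal sketch: the `TicketRequest` class, its field names, and the `missing_details` helper are hypothetical illustrations, not part of RT or of the adopted process, and it assumes every listed item is mandatory even though the text says only that it is "usually" required.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class TicketRequest:
    """Hypothetical model of the details a new ticket should contain."""
    description: str = ""
    completion_criteria: str = ""
    drop_dead_date: Optional[date] = None
    nice_to_have_date: Optional[date] = None
    capacity_needs: str = ""      # hardware/software requirements
    security_access: str = ""
    dependencies: str = ""


def missing_details(ticket: TicketRequest) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name)]
```

A reviewer could use the returned list to tell the user exactly which details to supply; an empty list would mean the ticket is complete under these assumptions.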

There are several ways a ticket comes to be 'taken by' or 'assigned to' an owner. It is common for TechOps engineers to 'take' a ticket from the RT queue before the Triager has even reviewed it. The Bugmeister may also assign a ticket to a TechOps engineer, after getting that engineer's agreement. Normally, the Triager assigns tickets that have an obvious and logical owner (mainly a skill-set match); tickets that remain unassigned are discussed and assigned during the staff meeting.
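The assignment paths above can be sketched as a small decision function. This is an illustration under stated assumptions, not the team's actual tooling: `pick_owner`, the skill sets, and the rule that an "obvious" owner means exactly one engineer whose skills cover the ticket are all hypothetical.

```python
from typing import Dict, Optional, Set


def pick_owner(
    required_skills: Set[str],
    engineers: Dict[str, Set[str]],
    taken_by: Optional[str] = None,
) -> Optional[str]:
    """Return the ticket owner, or None if the staff meeting must decide."""
    # An engineer who already took the ticket from the queue keeps it.
    if taken_by is not None:
        return taken_by
    # Assumption: an owner is "obvious" only when exactly one engineer
    # has every required skill.
    matches = sorted(
        name for name, skills in engineers.items() if required_skills <= skills
    )
    if len(matches) == 1:
        return matches[0]
    # Zero or several candidates: leave unassigned for the staff meeting.
    return None
```

Returning `None` rather than guessing among several candidates mirrors the text: tickets without an obvious owner are deferred to discussion instead of being assigned automatically.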

The TechOps engineer updates the ticket while working on it, and each update triggers a status email to the ticket's creator.

Should the engineer decide to escalate the ticket, they will find another engineer to take it over. If no one can take it, the engineer will inform the Triager.

Next Steps

  • Create and review aging reports on tickets to troubleshoot ticket and process bottlenecks
  • Institute work-in-progress (WIP) limits on tickets for Ops team members
  • Create a Bugmeister role as the customer-facing point person for the ticket process, triage, categorization, and initial research and follow-up (tentatively Mark and CT)
  • Create a new customer-facing external queue with complete visibility
  • Open up RT membership through LDAP to allow anyone to file tickets and view their status
  • Continue with biweekly triage meetings
  • Be more diligent about updating tickets with an ETA and reason at every step in the ticket’s lifecycle
  • Implement "postpone ticket" and cron jobs
  • Help customers define acceptance criteria on a ticket to be able to know when a ticket is “done”; document what is needed to close each bug
  • Build a sandbox environment that allows the organization to deploy code that doesn’t meet WMF deployment criteria
  • Research RT plugins to support decomposition, reporting, and analytics process
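As a rough illustration of the aging reports and WIP limits listed above, the sketch below works on plain in-memory records rather than on RT itself. The `OpenTicket` record, the 14-day idle threshold, and the default limit of 5 are hypothetical values chosen for the example; the source does not specify them.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class OpenTicket(NamedTuple):
    """Hypothetical minimal view of an open ticket."""
    ticket_id: int
    owner: Optional[str]
    last_updated: date


def aging_report(
    tickets: List[OpenTicket], today: date, max_idle_days: int = 14
) -> List[OpenTicket]:
    """Return tickets idle longer than max_idle_days, stalest first."""
    stale = [t for t in tickets if (today - t.last_updated).days > max_idle_days]
    return sorted(stale, key=lambda t: t.last_updated)


def over_wip_limit(tickets: List[OpenTicket], limit: int = 5) -> Dict[str, int]:
    """Return owners holding more open tickets than the limit, with counts."""
    counts = Counter(t.owner for t in tickets if t.owner is not None)
    return {owner: n for owner, n in counts.items() if n > limit}
```

Reviewing the stalest tickets first at a triage meeting is one way to surface bottlenecks; the WIP check flags engineers who may need tickets reassigned.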

Cross-reference page: RT