SWAT deploys

From Wikitech
Putting the prod in production.

SWAT deploys are deployments made during a SWAT deploy window (naturally) by one or more members of the SWAT deploy team (see below).

The purpose is to provide a known window in which people can get bug fixes deployed ahead of the normal (currently weekly) cadence without having to beg someone to do the deployment for them; the SWAT team is there to handle the deployment part.

Guidelines

  • There will be at least one SWAT deploy team member available and active during the window.
  • If you have a patch proposed for a window, you MUST be in #wikimedia-operations on Freenode to communicate with the SWAT team member.
  • All communication MUST happen in #wikimedia-operations on Freenode (not in separate team or area-specific channels).
  • Allowed types of patches (things not fitting these criteria should use the standard deploy window process):
    • Everything should be already committed into master and backported to the relevant release branch
    • No new features/extensions
    • Fixes of regressions
    • Simple config changes (that don't turn on any new features)
    • Nothing that needs prior public communication
  • The SWAT team MUST be comfortable with the patch going out and CAN veto any proposed patch they are not comfortable with for any reason
  • Our windows have a limit of 8 patches.
    • If you're cherry-picking a patch to both release branches, that counts as 2. Speak to the Foundation's Release Manager (Greg Grossmeier aka "greg-g") in #wikimedia-operations on Freenode.
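The patch limit can be tallied mechanically. The Python sketch below is a hypothetical helper: the `Backport` class, `slots_used`, and `fits_in_window` are illustrative names, not part of any Wikimedia tooling. It assumes each cherry-pick to a release branch uses one slot, which generalizes the "counts as 2" rule above. The second change number and the `1.23wmf19` branch name are made up for the example.

```python
from dataclasses import dataclass

# Per-window limit from the guideline above.
WINDOW_LIMIT = 8


@dataclass
class Backport:
    gerrit_change: int           # Gerrit change number, e.g. 118741
    branches: tuple[str, ...]    # live release branches it is cherry-picked to


def slots_used(backports: list[Backport]) -> int:
    """Count slots, assuming one slot per cherry-pick to each release branch."""
    return sum(len(b.branches) for b in backports)


def fits_in_window(backports: list[Backport], limit: int = WINDOW_LIMIT) -> bool:
    """True if the proposed backports stay within the window's patch limit."""
    return slots_used(backports) <= limit


window = [
    Backport(118741, ("1.23wmf18",)),
    Backport(123456, ("1.23wmf18", "1.23wmf19")),  # hypothetical change and branch
]
print(slots_used(window), fits_in_window(window))  # 3 True
```

The SWAT team can still veto patches for any reason, so fitting within the count is necessary but not sufficient.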

Patch submission

  • Prepare patches in gerrit against the current live (usually wmf.NN) branches (or only a subset of them if the bug is limited to certain branches)
    • Depending on the patch, positive reviews beforehand are necessary (the SWAT team is not responsible for code review)
  • Add the gerrit URL and your IRC name to the Deployments calendar page in the correct SWAT deploy slot, e.g.:
    • gerrit:118741 backport bug 62634 (Special:Contributions fatal for some mw.org Flow users) fix to 1.23wmf18 - Erik (ebernhardson)
  • Be sure to note who will be present for the deployment and able to test the patch, especially if different from the author of the patch.
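If you submit entries often, a small formatter keeps them consistent with the example above. This is a hypothetical Python sketch; `calendar_entry` and its parameters are illustrative names. The optional `tester` suffix is an assumed convention for noting a tester who differs from the author, not a documented format.

```python
from typing import Optional


def calendar_entry(change: int, bug: int, summary: str, branch: str,
                   author: str, irc_nick: str,
                   tester: Optional[str] = None) -> str:
    """Format a SWAT slot entry in the style of the calendar example."""
    entry = (f"gerrit:{change} backport bug {bug} ({summary}) "
             f"fix to {branch} - {author} ({irc_nick})")
    # Assumed convention: name the tester only when it is not the author.
    if tester and tester != irc_nick:
        entry += f", tested by {tester}"
    return entry


print(calendar_entry(118741, 62634,
                     "Special:Contributions fatal for some mw.org Flow users",
                     "1.23wmf18", "Erik", "ebernhardson"))
```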

Doing the deploy

  • The SWAT team coordinates the merging and deploying of the patches. The order to deploy the patches is decided by them.
  • The SWAT team may ask questions regarding the patches to understand the implications and assess risk. The relevant developers should ideally be on IRC in the hour prior to the SWAT window.
  • The SWAT team will ping the relevant developers at the start of the window and again when their patch is up; they MUST be available. If they are not available, the patch will not be deployed.
  • The relevant developers should have their test cases ready to run as soon as their patches are deployed.
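The availability rule amounts to a simple filter over the SWAT team's chosen order. The Python sketch below is hypothetical (`QueuedPatch`, `plan_window`, and the `somedev` nick are illustrative names). It assumes the queue is already in the order the SWAT team decided and that "present" means the developer answered the ping on IRC.

```python
from dataclasses import dataclass


@dataclass
class QueuedPatch:
    gerrit_change: int
    developer: str  # IRC nick of the developer who must be present to test


def plan_window(queue: list[QueuedPatch],
                present: set[str]) -> tuple[list[QueuedPatch], list[QueuedPatch]]:
    """Split the queue into patches to deploy and patches to skip.

    The queue order (decided by the SWAT team) is preserved in both lists.
    """
    deploy: list[QueuedPatch] = []
    skipped: list[QueuedPatch] = []
    for patch in queue:
        if patch.developer in present:
            deploy.append(patch)
        else:
            # Developer did not respond to the ping: the patch is not deployed.
            skipped.append(patch)
    return deploy, skipped


queue = [QueuedPatch(118741, "ebernhardson"), QueuedPatch(123456, "somedev")]
deploy, skipped = plan_window(queue, present={"ebernhardson"})
print([p.gerrit_change for p in deploy], [p.gerrit_change for p in skipped])
```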

The team

Membership in the SWAT Team is managed by the WMF Release Manager.

08:00 SF window

  • Brad Jorsch
  • Chad Horohoe
  • Mark Holmquist
  • Alex Monk
  • Tyler Cipriani
  • Katie Filbert

16:00 SF window

  • Roan Kattouw
  • Chad Horohoe
  • Alex Monk
  • Adam Wight (Thursdays only)
  • Sébastien Santoro (Dereckson)

SWAT Team members roles, responsibilities, and tips

Trust

  • Being a member of the SWAT team places a large amount of trust in the person. In some ways it requires more trust than simple access to deploy on the Wikimedia cluster, as others are encouraged to ask you to deploy things on their behalf, and you must be willing to say "No" when you are uncomfortable. To err is human, but SWAT deployers who do not learn from their mistakes will lose their deployment access.

Knowledge

  • SWAT deployers need not be experts in all parts of our infrastructure, but they must be comfortable with assessing the general risk of a given patch. If needed, they should ask probing questions to the developer submitting the patch to learn more.
    • Experience with MediaWiki and MediaWiki config changes is a plus, as these make up the vast majority of changes that come through the SWAT process.
  • Some unintuitive situations include:
    • a "simple" config change causing a load spike in a dependent system that the deployer or developer is not familiar with
    • a "simple" config change going against the wishes of "the community", the Wikimedia Foundation, or both
      • controversial changes can easily be skipped and referred to the WMF Release Manager for next steps; there is no need to rush these
  • If a SWAT deployer is uncomfortable with a certain area of the code-base they are free to skip that backport at their own discretion (or have another SWAT deployer review and/or deploy it).

Decisiveness

  • SWAT deployers should not feel obligated to help a developer debug a situation, especially if there is a user-facing issue or outage.
  • When in doubt: Revert and ask questions later

Humour

SWAT, or the Setting Wikis Ablaze Team, is responsible for breaking the site on a regular basis so that none of us get too used to a stable platform upon which to develop. Compare Chaos Monkey or unit testing (which are similar to each other); this is less automatic but much more effective at breaking things.

New SWAT Team member check-list

  • Read and be comfortable with the above roles, responsibilities and tips
  • Shell and deploy access in production, see
  • Access to merge changes in wmf deploy branches by being added to the wmf-deployments gerrit group
    • Ask any existing wmf-deployments group member to do this.
  • Join (and read) the operations mailing list (ops@lists.wikimedia.org)
    • This is because announcements that could impact how and/or when to deploy things are primarily sent there.
  • Read the docs