User:Thcipriani/Deployments/Essay

From Wikitech
Opinions™ about deployments ahead. Proceed with caution.

Everyone involved in MediaWiki deployment (developers, deployers, and SREs) wishes our deploys were faster and safer.

Almost all medium and large companies separate the activities of development and infrastructure management [...] the most important thing to keep in mind is that all stakeholders have a common goal: making the release of valuable software a low-risk activity.

– Continuous Delivery, Jez Humble (pg 280)

Our deployment process is faster and safer than at any time in the past, but it is still neither fast nor safe.

We've released 24 versions in the 30 weeks of the 1.36 release cycle as of February 2021. In that cycle there have been 286 backports, and we've rolled back the train 27 times.
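To put those figures in proportion, here is a minimal sketch that turns them into rates. The variable and function names are illustrative only, and the inputs are just the numbers quoted above, not a fresh query of any deployment data.

```python
# Figures quoted above for the 1.36 release cycle (as of February 2021).
WEEKS = 30
VERSIONS_RELEASED = 24
BACKPORTS = 286
TRAIN_ROLLBACKS = 27


def rate(count: int, units: int) -> float:
    """Return how many `count` events happened per one of `units`."""
    if units <= 0:
        raise ValueError("units must be positive")
    return count / units


# Hypothetical summary keys, chosen for this sketch.
rates = {
    "versions_per_week": rate(VERSIONS_RELEASED, WEEKS),
    "backports_per_week": rate(BACKPORTS, WEEKS),
    "rollbacks_per_version": rate(TRAIN_ROLLBACKS, VERSIONS_RELEASED),
}

for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

By this arithmetic we ship a new version in roughly four weeks out of five, handle between nine and ten backports a week, and roll back the train slightly more than once per version released.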

We exhibit many anti-patterns in our approach to deployment. One quote from Continuous Delivery, which I've circled, starred, and marked with "¡OMG!" in the margin, captures the current state well:

As pressure increases the defined process for collaboration between the development and deployment teams is subverted, in order to get the deployment done within the time allocated to the deployment team

– Continuous Delivery, Jez Humble (pg 8)

Organizational Impediments

The goal of the Deployment Pipeline is to deliver code to production faster. This is a technical solution to a bigger problem.

Some impediments to delivering software lie outside our technical ability to deliver safely (although there is much room for improvement there). On the capabilities that drive higher software delivery performance:

[DevOps Research and Assessment -- a team inside of Google Cloud]'s research shows that a high-trust, generative culture predicts software delivery and organizational performance in technology.

https://cloud.google.com/solutions/devops/devops-culture-westrum-organizational-culture

Simply put: we need more communication and more empathy to deploy faster. The triage meetings are designed to give teams an informal way to talk to each other about our delivery.

The way to change culture is not to first change how people think, but instead to start by changing how people behave—what they do.

– John Shook, Toyota

The train log triage meeting’s goals are:

  1. to ensure that developers are looking at production logs
  2. to communicate that to deployers

Any process that achieves those two goals will make deployment faster...eventually.