Services/Deployment

From Wikitech

Regular Deployment

Our production stack has many moving parts: MediaWiki, its extensions, various back-end services, HTTPS handlers and caches, to name a few. It is therefore important that you announce your deployment schedule on the Deployments page.

Preparing the Deploy Repository

The deployment process starts with updating the deploy repository. Go into your source repository and update it with:

$ ./server.js build --deploy-repo --force --review

The build script updates the deploy repository's submodule pointer, creates a Docker container in which it installs the module dependencies, and sends the resulting changes to Gerrit. Review and merge them.

BetaCluster Deployment

Before deploying to production, remember to update and test the deployment in BetaCluster. Log onto deployment-tin.deployment-prep.eqiad.wmflabs and update the repo there:

$ cd /srv/deployment/<service-name>/deploy
$ git pull && git submodule update --init

Time to deploy to BetaCluster:

$ scap deploy '<a-message-here-to-describe-the-changes-being-deployed>'

After the deploy, check the output of your service in BetaCluster.

Deploying to Production

Next, log onto deployment.eqiad.wmnet and update the repo there:

$ cd /srv/deployment/<service-name>/deploy
$ git pull && git submodule update --init

Announce the deployment in the #wikimedia-operations IRC channel by logging it to the Server Admin Log with !log <service-name> deploying <deploy-repo-sha1>. Then run the deployment from deployment.eqiad.wmnet:

$ scap deploy '<a-message-here-to-describe-the-changes-being-deployed>'

Scap3 will deploy the code, restart the service and check its port and health. If it detects problems on the canary node, it will suggest rolling back. Otherwise it will proceed to deploy to the rest of the nodes, which completes the deployment process.
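
The announcement line mentioned above can be assembled from the revision currently checked out in the deploy repository. The following is only a minimal sketch: the function name sal_message is hypothetical and not part of Scap3 or any Wikimedia tooling, and it assumes you run it from /srv/deployment/<service-name>/deploy when no SHA1 is passed.

```shell
# Hypothetical helper (not part of scap): print the Server Admin Log line
# for a deployment.
# Usage: sal_message <service-name> [<deploy-repo-sha1>]
sal_message() {
    service="$1"
    # If no SHA1 is given, fall back to the commit checked out in the
    # current directory (assumed to be the deploy repository).
    sha="${2:-$(git rev-parse HEAD)}"
    printf '!log %s deploying %s\n' "$service" "$sha"
}
```

Paste the printed line into the IRC channel yourself; the helper only formats the message and does not post anything.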

Dealing with Problems

Deployment Debugging

Scap3 includes a utility which can be used to monitor the output of the commands executed on the target nodes. Fire up a second terminal, connect to deployment.eqiad.wmnet and execute the scap deploy-log command from /srv/deployment/<service-name>/deploy before starting the deployment. The output should help you figure out what went wrong.

If you did not start scap deploy-log before a deploy that went badly, you can still retrieve the logs by running scap deploy-log --latest.

Reverting a Deployment

Sometimes the deployment process goes well, but the deployed code does not function properly. To revert a deployment and bring the code on the target nodes back to a previous state, find the SHA1 of the deploy repository commit that contained the good code and deploy it with:

$ scap deploy --rev <sha1>
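
One way to look up a candidate SHA1 is to ask git for an ancestor of the current commit in the deploy repository. This is a sketch under assumptions: the helper name previous_rev is hypothetical, and it presumes the last good state is a direct ancestor of HEAD, which you should confirm (for example with git log) before deploying it.

```shell
# Hypothetical helper: print the SHA1 of the commit N steps before HEAD
# in the current repository (N defaults to 1).
# Assumption: the known-good code is a direct ancestor of HEAD.
previous_rev() {
    steps="${1:-1}"
    git rev-parse "HEAD~${steps}"
}
```

The printed value is what you would pass as <sha1> to the command above.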

Service Status Inspection

Each service that provides a monitoring specification (via spec.yaml) can be directly checked for health on each of the target nodes by issuing:

$ check-<service-name>

where service-name is the name of the service in ops/puppet. If all is well, you should receive the response:

All endpoints are healthy
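
If you want to use this check in a script, one option is to match on the response text. The sketch below assumes the check prints its result to standard output; this page does not document the exit status of the check command, so the hypothetical helper endpoints_healthy relies only on the message quoted above.

```shell
# Hypothetical wrapper: reads the check output on stdin and succeeds only
# if it contains the healthy-status line.
# Intended use: pipe the output of the check command for your service into it.
endpoints_healthy() {
    grep -q 'All endpoints are healthy'
}
```

The function returns a zero exit status when the line is present and non-zero otherwise, so it can be combined with && or used in an if statement.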

For services that locally log their entries, there is an additional command that allows you to look at logs on a target node in a human-readable format:

$ tail-<service-name>

This command accepts all of the arguments that tail does, so to follow the logs as they arrive, use:

$ tail-<service-name> -f