User:Mobrovac/My Guide To The Galaxy
TODO
- general intro
- before first deployment
- set up
- mw-vagrant
- beta cluster
- new service request
- deployment
- service operation
You should start thinking about the deployment process more than a month before the date on which you want to see your service deployed in production for the first time. There are a number of steps to be completed before this can happen. This collection of documents will guide you through this process.
MediaWiki Vagrant
MediaWiki Vagrant is a very convenient way for developers to rapidly set up a development environment containing a MediaWiki instance and any needed dependencies in a virtualised environment. Your service needs to be present there as well. Luckily, setting it up is very easy. First, in the vagrant directory, create the directory for your service's module and place this code inside <vagrant-dir>/puppet/modules/<service-name>/manifests/init.pp:
# == Class: <service-name>
#
# <a-short-description-of-the-service-here>
#
# === Parameters
#
# [*port*]
#   Port the service listens on for incoming connections.
#
# [*log_level*]
#   The lowest level to log (trace, debug, info, warn, error, fatal)
#
class <service-name>(
    $port,
    $log_level = undef,
) {
    service::node { '<service-name>':
        port      => $port,
        log_level => $log_level,
        config    => {},
    }
}
This is the minimum amount of code your service's Puppet module should have. As you can see, this definition does not provide any extra configuration for the service. If that is needed, simply add the configuration stanzas to the config hash as key/value pairs. Note that only configuration specific to your service should be listed here and not the whole configuration file, i.e. only the configuration parameters that your service code accesses via app.conf.*.
In order to configure the port (and any other parameters that you might have declared for the class), add the following contents to puppet/hieradata/common.yaml:
<service-name>::port: <service-port>
The last step is to create the role so that users may (de)activate it easily. Place the following Puppet code in puppet/modules/role/manifests/<service-name>.pp:
# == Class: role::<service-name>
# This role installs <service-name>
#
class role::<service-name> {
    include ::<service-name>
}
Finally, the service's port must be exposed to the host environment; create the file puppet/modules/role/settings/<service-name>.yaml with:
forward_ports:
  <service-port>: <service-port>
You are done! You can now submit the patch for review, and once it is merged anybody will be able to use the service in the MediaWiki-Vagrant environment.
First Deployment
Repositories
We require that all services are hosted on our Gerrit servers. Gerrit does not have to be your primary development tool, even though you are strongly encouraged to use it as such.
Because Node.js services use npm dependencies which can be binary, these need to be pre-built. Therefore, two repositories are needed: one for the source code of your service, and the other, the so-called deploy repository, for the pre-built code. Both should be available as Wikimedia Gerrit repositories with the paths mediawiki/services/your-service-name and mediawiki/services/your-service-name/deploy, respectively. When requesting them, ask for the former to be a clone of the service template (or of your own service repository) and the latter to be empty.
It is important to note that the deploy repository is only to be updated directly before (re-)deploying the service, and not on each patch merge entering the master branch of the regular repository. In other words, the deploy repository mirrors the code deployed in production at all times.
The remainder of this guide assumes these two repositories have been created and that you have cloned them using your Gerrit account, i.e. not anonymously, with the following outline:
~/code/
|- your-service
`- deploy
This guide refers to these two repositories as the source repository and the deploy repository, respectively.
Source Repo Configuration
The service template includes an automation script which updates the deploy repository, but it needs to be configured properly in order to work.
package.json
The first part of the configuration involves keeping your source repository's package.json updated. Look for its deploy stanza. Depending on the exact machine on which your service will be deployed, you may need to set target to either ubuntu or debian (the latter is the most likely choice and the default if target is missing).
If you want to use a version of Node.js other than the official distribution package, set the value of the node stanza to the desired version, following nvm's version naming. To explicitly force the official distribution package, use the "system" version.
The important thing is keeping the dependencies field up to date at all times. There you should list all of the extra packages that are needed in order to build the npm module dependencies. The _all field denotes packages which should be installed regardless of the target distribution, but you can add other, distribution-specific package lists, e.g.:
"deploy": {
  "target": "ubuntu",
  "node": "system",
  "dependencies": {
    "ubuntu": ["pkg1", "pkg2"],
    "debian": ["pkgA", "pkgB"],
    "_all": ["pkgOne", "pkgTwo"]
  }
}
In this example, with the current configuration, packages pkg1, pkg2, pkgOne and pkgTwo are going to be installed before building the dependencies. If, instead, the target is changed to debian, then pkgA, pkgB, pkgOne and pkgTwo are selected.
As a rule of thumb, whenever you need to install extra packages into your development environment for satisfying node module dependencies, add them to deploy.dependencies to ensure the successful build and update of the deploy repository.
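The selection rule described above can be sketched in a few lines of Python. This only illustrates how the package lists combine; it is not the build script's actual code. The function name select_packages is hypothetical, and the fallback to debian follows the default mentioned earlier.

```python
import json

# The example "deploy" stanza from above, wrapped in braces so it parses as JSON.
PACKAGE_JSON = """
{
  "deploy": {
    "target": "ubuntu",
    "node": "system",
    "dependencies": {
      "ubuntu": ["pkg1", "pkg2"],
      "debian": ["pkgA", "pkgB"],
      "_all": ["pkgOne", "pkgTwo"]
    }
  }
}
"""


def select_packages(deploy):
    """Return the packages to install for the configured target.

    Hypothetical helper: target-specific packages come first, followed by
    the "_all" packages. A missing target is treated as "debian".
    """
    target = deploy.get("target", "debian")
    deps = deploy.get("dependencies", {})
    return list(deps.get(target, [])) + list(deps.get("_all", []))


deploy = json.loads(PACKAGE_JSON)["deploy"]
print(select_packages(deploy))
```

With the stanza as written, this prints the four packages named in the example (pkg1, pkg2, pkgOne, pkgTwo); changing target to debian swaps the first two for pkgA and pkgB.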
Local Git
The script needs to know where to find your local copy of the deploy repository. To that end, when in your source repository, run:
$ git config deploy.dir /absolute/path/to/deploy/repo
Using the aforementioned local outline, you would type:
$ git config deploy.dir /home/YOU/code/deploy
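If the build later complains about the deploy repository, a quick check of the configured value can help. The sketch below reflects an assumption about what is worth checking (an absolute path that points at a Git checkout); read_deploy_dir and check_deploy_dir are hypothetical helpers and are not part of the service template.

```python
import os
import subprocess


def read_deploy_dir(source_repo="."):
    """Read deploy.dir from the source repository's Git configuration."""
    try:
        result = subprocess.run(
            ["git", "config", "--get", "deploy.dir"],
            cwd=source_repo, capture_output=True, text=True,
        )
    except OSError:
        return None  # git is not installed or could not be started
    return result.stdout.strip() or None


def check_deploy_dir(path):
    """Return a list of problems found with a deploy.dir value (empty if none)."""
    if not path:
        return ["deploy.dir is not set"]
    problems = []
    if not os.path.isabs(path):
        problems.append("deploy.dir should be an absolute path")
    if not os.path.exists(os.path.join(path, ".git")):
        problems.append("no Git checkout found at " + path)
    return problems
```

Calling check_deploy_dir(read_deploy_dir()) from inside the source repository returns an empty list when the setting looks sane.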
Deploy Repo Set-up
If you haven't yet done so, initialise the deploy repository:
$ cd ~/code/deploy
$ git review -s
$ touch README.md
$ git add README.md
$ git commit -m "Initial commit"
$ git push -u origin master # or git review -R if this fails
# go to Gerrit and +2 your change, if needed and then:
$ git pull
Next, you need to prepare the deploy repository for usage with Scap3. Create the scap directory inside your deploy repository and fill scap/scap.cfg with:
[global]
git_repo: <service-name>/deploy
git_deploy_dir: /srv/deployment
git_repo_user: deploy-service
ssh_user: deploy-service
server_groups: canary, default
canary_dsh_targets: target-canary
dsh_targets: targets
git_submodules: True
service_name: <service-name>
service_port: <service-port>
lock_file: /tmp/scap.<service-name>.lock

[wmnet]
git_server: tin.eqiad.wmnet
This represents the basic configuration needed by Scap3 to deploy the service. We still need to tell Scap3 on which nodes to deploy and which checks to perform after the deployment on each of the nodes. First, the list of nodes. Two files need to be created: scap/target-canary and scap/targets. In the former, you need to put the FQDN of the node that will act as the canary deployment node, i.e. the node that will first receive the new code, while in the latter file put the remainder of the nodes. For example, if your target nodes are in the SCB cluster, these files should look like this:
$ cat target-canary
scb1001.eqiad.wmnet
$ cat targets
scb1002.eqiad.wmnet
scb2001.codfw.wmnet
scb2002.codfw.wmnet
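Before committing, it can be worth checking that the target files agree with scap.cfg. The following is a minimal sketch, assuming the file layout described above; load_targets is a hypothetical helper and not part of Scap3. It relies on Python's standard configparser module, which accepts the key: value syntax used in scap.cfg.

```python
import configparser
import os


def load_targets(scap_dir):
    """Return (canary_hosts, other_hosts) as named by scap.cfg in scap_dir.

    Raises ValueError if a host appears in both target files.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(os.path.join(scap_dir, "scap.cfg"))

    def read_hosts(option):
        # The option value is the name of a file inside the scap directory.
        file_name = cfg.get("global", option)
        with open(os.path.join(scap_dir, file_name)) as handle:
            return [line.strip() for line in handle if line.strip()]

    canary = read_hosts("canary_dsh_targets")
    others = read_hosts("dsh_targets")
    duplicated = set(canary) & set(others)
    if duplicated:
        raise ValueError("listed in both files: " + ", ".join(sorted(duplicated)))
    return canary, others
```

Run from the root of the deploy repository, load_targets("scap") returns the canary list and the list of remaining nodes, or raises if a node was accidentally put in both files.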
Finally, enable the automatic checker script to check the service after each deployment by placing the following in scap/checks.yaml:
checks:
  endpoints:
    type: nrpe
    stage: promote
    command: check_endpoints_<service-name>
Commit your changes, send them to Gerrit for review and merge them.
The deployment process includes a script that builds the deployment repository using Docker containers, so make sure you have the latest version installed. Additionally, you need to add your user to the `docker` group after installation so that you don't need to use `sudo` when running the build script:
$ sudo usermod -a -G docker <your-user>
You need to log out of all of your terminals and log back in for the change to take effect.
New Service Request
There are various prerequisites that need to be taken care of on the operational side before your service can see the light of day in production: machine allocation, IPs, LVS, etc. In order to express the intent of deployment, you need to complete a new service request by filing a task against the service-deployment-requests project in Phabricator. Be prepared to give the following information:
- name: the name of the service to be deployed
- description: a paragraph explaining clearly what the service does and why it is needed
- timeline: the desired deployment timeline; note that you should allow at least two to three weeks of lead time
- point person: the person responsible for the service; this is the person that will get called when there are problems with the service when running in production
- technologies: additional information about the service itself, including, but not limited to, the language used for development and any frameworks used
- request flow diagram: a link to a request flow diagram that explains the interaction between your service and any other parts of the operational stack inside the production cluster, such as requests made to MediaWiki, RESTBase, etc.
For some example tickets see task T105538, task T117560, task T128463.
Role and Module Creation
While you are waiting for the service request to be completed, do not fear: you still have useful things to do. You may start by creating your service's Puppet role and module in the operations/puppet repository. First, add your service's deploy repository to the list of repositories deployed in production by appending the following block to hieradata/common/role/deployment.yaml (note the extra spaces at the beginning of each line):
  <service-name>/deploy:
    upstream: https://gerrit.wikimedia.org/r/mediawiki/services/<service-name>/deploy
    checkout_submodules: true
Next, create modules/<service-name>/manifests/init.pp and put the following content in it:
# == Class: <service-name>
#
# Describe the service here ...
#
# === Parameters
#
# [*param_name1*]
#   Description of param_name1
#
# [*param_name2*]
#   Description of param_name2
#
class <service-name>(
    $param_name1 = 'def_val1',
    $param_name2 = 'def_val2',
) {
    service::node { '<service-name>':
        port            => <service-port>,
        config          => {
            param_name1 => $param_name1,
            param_name2 => $param_name2,
        },
        healthcheck_url => '',
        has_spec        => true,
        deployment      => 'scap3',
    }
}
Note that only configuration specific to your service should be listed here and not the whole configuration file, i.e. only the configuration parameters that your service code accesses via app.conf.*. Instead of in-lining it directly in the module, you can also store the configuration in the form of an ERB YAML template in modules/<service-name>/templates/config.yaml.erb. Then, simply use it directly for the config parameter of the service::node resource like so:
config => template('<service-name>/config.yaml.erb'),
You will also need a role for your service. Put the following code fragment into manifests/role/<service-name>.pp:
# Role class for <service-name>
class role::<service-name> {
    system::role { 'role::<service-name>':
        description => 'short description',
    }

    include ::<service-name>
}
You can now submit the patch for review. Don't forget to mention the service request bug in your commit message.
Access Rights
As the service owner and maintainer, you need to be able to log onto the nodes where your service is running. Once the exact list of target nodes is known, you need to file an access request ticket with the following information:
- Title: Access Request for <list-of-maintainers> for <service-name>
- Description: <list-of-maintainers> needs access to <list-of-nodes> for operating <service-name>. We need to be able to read the logs at /srv/log/<service-name> and be able to start/stop/restart it. The task asking for the service's deployment is {<service-request-task-number>}
This request implies sudo rights on the target nodes, so you will need the approval from your manager on the task.
Beta Cluster
TODO at a later time...
Deployment
Regular Deployment
There are a lot of moving parts in our production stack -- MediaWiki, its extensions, various back-end services, HTTPS handlers, caches, just to name a few. It is thus important that you communicate your deployment schedules on the Deployments page.
The deployment process starts with updating the deploy repository. Go into your source repository and update it with:
$ ./server.js build --deploy-repo --force --review
The build script will update the pointer of the deploy repository's submodule, create a Docker container in which it will install the module dependencies and send the changes to Gerrit. Review them and merge. Next, log onto `deployment.eqiad.wmnet` and update the repo there:
$ cd /srv/deployment/<service-name>/deploy
$ git pull && git submodule update --init
In the #wikimedia-operations IRC channel, announce the deployment by logging it into the Server Admin Log with !log <service-name> deploying <deploy-repo-sha1>. Now, proceed to do the deployment from deployment.eqiad.wmnet:
$ deploy
Scap3 will deploy the code, restart the service and check its port and health. If it detects problems on the canary node, it will suggest performing a rollback. Otherwise it will proceed to deploy the code to the rest of the nodes, which completes the deployment process.
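The SHA1 for the !log message mentioned above can be read from the deploy repository checkout. A small sketch, assuming Git is available on the host where it runs; current_sha1 and sal_message are hypothetical helpers that only produce the text to paste into IRC.

```python
import subprocess


def current_sha1(repo_dir):
    """Return the commit currently checked out in repo_dir."""
    output = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, text=True
    )
    return output.strip()


def sal_message(service_name, sha1):
    """Format the Server Admin Log line in the form described above."""
    return "!log {} deploying {}".format(service_name, sha1)
```

For example, sal_message("my-service", current_sha1("/srv/deployment/my-service/deploy")) yields the line to post, where my-service stands in for your service's name.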
Dealing with Problems
Deployment Debugging
Scap3 includes a utility which can be used to monitor the output of the commands executed on the target nodes. Fire up a second terminal, connect to deployment.eqiad.wmnet and execute the deploy-log command from /srv/deployment/<service-name>/deploy before starting the deployment. The output should help you figure out what went wrong.
If you did not have an instance of deploy-log running during a deployment that went badly, you can still recover the logs by running deploy-log --latest.
Reverting a Deployment
Sometimes the deployment process goes well, but the code that was deployed isn't functioning properly. To revert a deployment and bring the code on the target nodes to a previous state, find the deploy repository's SHA1 that contained the good code and then deploy it with:
$ deploy --rev <sha1>
Service Operation
Starting, Stopping, Restarting
If you have sudo rights on the target machines, then that's as simple as logging onto each of the targets and issuing the respective commands:
$ sudo service <service-name> start
$ sudo service <service-name> stop
$ sudo service <service-name> restart
Monitoring
Logs
The service's logs are stored locally in /srv/log/<service-name>/main.log. To take a look, simply tail it:
$ tail -f /srv/log/<service-name>/main.log
Since the log entries are JSON-formatted, you may want to see them in a more presentable form. Use bunyan for that:
$ tail -f /srv/log/<service-name>/main.log | /srv/deployment/<service-name>/deploy/node_modules/.bin/bunyan
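If bunyan is not at hand, the entries can also be read with a few lines of Python. This is a rough sketch rather than a replacement for the bunyan CLI; it assumes the standard Bunyan record fields (time, level, name, msg) and Bunyan's numeric levels. format_entry is a hypothetical helper and the sample record is made up.

```python
import json

# Numeric levels used in Bunyan log records.
LEVELS = {10: "TRACE", 20: "DEBUG", 30: "INFO", 40: "WARN", 50: "ERROR", 60: "FATAL"}


def format_entry(line):
    """Turn one JSON log line into a short human-readable line."""
    try:
        entry = json.loads(line)
    except ValueError:
        return line.rstrip("\n")  # not JSON: pass the line through unchanged
    if not isinstance(entry, dict):
        return line.rstrip("\n")
    level = entry.get("level")
    level_name = LEVELS.get(level, str(level)) if isinstance(level, int) else str(level)
    return "{} {} {}: {}".format(
        entry.get("time", "-"), level_name, entry.get("name", "-"), entry.get("msg", "")
    )


# Made-up sample record, for illustration only.
sample = '{"name":"my-service","level":30,"msg":"listening","time":"2016-01-01T00:00:00.000Z"}'
print(format_entry(sample))
```

To use it as a filter, loop over sys.stdin, print format_entry(line) for each line, and pipe the tail -f output into the script.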