What are we deploying
When deploying new code to the cluster, we might be deploying refinery (e.g. just Oozie jobs), refinery-source (e.g. new Java code for changes to the pageview definition), or both.
- If you are deploying refinery-source, do that first by following the procedure here: Analytics/Cluster/Refinery-source
- If we are deploying refinery but not refinery-source, we can just do the scap deploy described below.
- If Oozie jobs are affected, you might need to restart them.
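The decision above can be sketched as a small POSIX shell function that prints the steps from this page, in order, for a given change. The function name `refinery_deploy_plan` and its yes/no arguments are hypothetical, made up for illustration; it only prints a plan and runs nothing.

```shell
# Hypothetical helper: print, one per line and in order, the steps this page
# describes for a change. Each argument is "yes" or "no".
# Usage: refinery_deploy_plan SOURCE_CHANGED REFINERY_CHANGED OOZIE_AFFECTED
refinery_deploy_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: refinery_deploy_plan SOURCE REFINERY OOZIE (yes/no each)" >&2
        return 2
    fi
    if [ "$1" != yes ] && [ "$2" != yes ]; then
        echo "nothing to deploy" >&2
        return 1
    fi
    # New Java code (e.g. the pageview definition) goes out first,
    # following Analytics/Cluster/Refinery-source.
    [ "$1" = yes ] && echo "deploy refinery-source"
    # Refinery itself (e.g. Oozie jobs): scap deploy, then copy to HDFS.
    if [ "$2" = yes ]; then
        echo "scap deploy refinery"
        echo "refinery-deploy-to-hdfs"
    fi
    # Affected Oozie jobs may need a restart
    # (see Analytics/Cluster/Oozie/Administration).
    [ "$3" = yes ] && echo "restart affected oozie jobs"
    return 0
}
```

For example, `refinery_deploy_plan no yes no` prints the scap deploy and HDFS steps only.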
How to deploy
The refinery now uses Scap so the deployments are as simple as:
- SSH into deployment.eqiad.wmnet
- First, make sure the scap config is up to date:
- Tell the #wikimedia-operations IRC channels that you are deploying (using
- Create a screen/tmux session to prevent network failures from ruining your deployment.
scap deploy "YOUR DEPLOYMENT MESSAGE"
Scap will deploy to the canary host (stat1007) first and then, if everything goes fine, ask you to proceed with the rest of the hosts. To double-check the status of the deployment, run scap deploy-log and check what is happening. Note that Scap creates another copy of the repository for each deployment and uses symlinks to switch between versions.
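The scap deploy step can be wrapped in a guard that refuses to start outside a screen/tmux session, since a dropped connection mid-deploy is what that advice protects against. This is a minimal sketch: `safe_scap_deploy`, `in_multiplexer` and the `DRY_RUN` variable are hypothetical names; it relies on tmux setting `$TMUX` and GNU screen setting `$STY` inside their sessions, and it assumes you run it from wherever you normally run `scap deploy`.

```shell
# Succeeds if this shell runs inside tmux (sets $TMUX) or GNU screen (sets $STY).
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

# Hypothetical wrapper around `scap deploy "YOUR DEPLOYMENT MESSAGE"`.
# With DRY_RUN=1 it only prints the command instead of running it.
safe_scap_deploy() {
    if [ -z "${1:-}" ]; then
        echo 'usage: safe_scap_deploy "DEPLOYMENT MESSAGE"' >&2
        return 2
    fi
    if ! in_multiplexer; then
        echo "refusing to deploy: start a screen/tmux session first" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'scap deploy "%s"\n' "$1"
        return 0
    fi
    scap deploy "$1"
}
```

Inside a tmux session, `safe_scap_deploy "Deploying refinery"` runs the real command; `scap deploy-log` in a second window still shows progress.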
- Make sure that the whole deployment went fine, and roll back in case of fireworks.
- After the deployment, SSH into stat1007. Wait until the entire scap deployment has completed before you do: if you cd into the refinery directory before the symlinks change, you'll still see the old git log and be very confused.
- Change directory into /srv/deployment/analytics/refinery and check (git log) that the code has been pulled.
- Create a screen/tmux session to prevent network failures from interfering with the execution of the following command:
sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run
- This step copies the refinery code to HDFS, but it does not resubmit Oozie jobs; if you need to do so, see the Analytics/Cluster/Oozie/Administration page.
- This step needs to be done only on one host (stat1007 is fine).
- Finally, consider changing any documentation that needs to be updated. This may include: Analytics/Data/Webrequest, Analytics/Data/Pageview_hourly, Research:Page_view (and its sub-pages).
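The stat1007 steps can be sketched as two shell functions: one confirming the checkout is at the commit you just deployed (the scripted version of the git log check), and one running the HDFS copy only after that. The names `refinery_at_commit`, `refinery_deploy_to_hdfs` and `REFINERY_DIR` are hypothetical. The dry run before the real run is an ASSUMPTION based only on the `--no-dry-run` flag's name; check the script's help before relying on it. Because `git -C` resolves the path on every call, it sees the new symlink target, unlike a shell that was already cd'd into the old directory.

```shell
# Path from this page; override REFINERY_DIR to point elsewhere (e.g. in tests).
REFINERY_DIR=${REFINERY_DIR:-/srv/deployment/analytics/refinery}

# Succeed only if the refinery checkout's HEAD starts with the expected
# commit SHA (full or abbreviated), i.e. scap has switched the symlink.
refinery_at_commit() {
    if [ -z "${1:-}" ]; then
        echo "usage: refinery_at_commit EXPECTED_SHA" >&2
        return 2
    fi
    head_sha=$(git -C "$REFINERY_DIR" rev-parse HEAD) || return 1
    case "$head_sha" in
        "$1"*) return 0 ;;
        *) echo "refinery is at $head_sha, not $1: has scap finished?" >&2
           return 1 ;;
    esac
}

# Copy refinery to HDFS once the checkout is confirmed. The first call omits
# --no-dry-run, ASSUMING the script then only reports what it would do;
# the second call is the documented real deploy.
refinery_deploy_to_hdfs() {
    refinery_at_commit "${1:-}" || return 1
    deploy_script="$REFINERY_DIR/bin/refinery-deploy-to-hdfs"
    sudo -u hdfs "$deploy_script" --verbose || return 1
    printf 'Dry run done. Proceed with the real deploy? [y/N] '
    read -r answer
    [ "$answer" = y ] || return 1
    sudo -u hdfs "$deploy_script" --verbose --no-dry-run
}
```

On stat1007, inside screen/tmux, `refinery_deploy_to_hdfs <sha-you-deployed>` stops early if scap has not switched the symlink yet.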
Remember to log the successful deploy in the analytics IRC channel, for instance:
Deploying to notebook* hosts
We deploy refinery to notebook/SWAP servers separately from our main deploy targets, to save disk space on the notebook servers. The notebook servers only have refinery so they can use the analytics-mysql wrapper, which abstracts connections to the various MySQL replicas. As such, a deploy to notebook servers is not regularly needed. If you do need to deploy there, you must deploy via scap with the notebook environment:
scap deploy -e notebook
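Since the only difference between the two deploys is the scap environment, a small wrapper can pick it explicitly so the notebook hosts are only targeted on purpose. This is a sketch: `refinery_scap_deploy`, its `main`/`notebook` targets and `DRY_RUN` are hypothetical, and passing a message alongside `-e notebook` assumes scap accepts one there as it does for the main deploy.

```shell
# Hypothetical wrapper: run the scap deploy for the main targets or, with
# "notebook", for the notebook environment (-e notebook).
# With DRY_RUN=1 it only prints the command.
refinery_scap_deploy() {
    target=${1:-} msg=${2:-}
    case "$target" in
        main)     set -- scap deploy ;;
        notebook) set -- scap deploy -e notebook ;;
        *)  echo "unknown target '$target' (expected main or notebook)" >&2
            return 2 ;;
    esac
    # Append the deployment message only if one was given.
    [ -n "$msg" ] && set -- "$@" "$msg"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With `DRY_RUN=1`, `refinery_scap_deploy notebook` prints `scap deploy -e notebook`, the command above.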
How to deploy Oozie jobs
You can find test / production deployment information here: Analytics/Cluster/Oozie/Administration
For a tutorial or introduction to Oozie, read this page first: Analytics/Cluster/Oozie.