Obsolete:1.17 deployment plan
Outline for deployment of 1.17
General plan
1.17 has two features that are complicated from a deployment perspective:
- Improved category collation
- Resource Loader
The category collation code is not likely to change the load characteristics of the site, but it is difficult to back out once deployed. Resource Loader could have a large impact on site performance, but is relatively easy to back out. Deploying the two together would mean running code with unknown performance that we can't back out.
We decided the safest path would be to:
- Make category collation configurable, so that we could remain on the legacy collation code for a bit longer
- Deploy 1.17 with legacy collation (dark launching the new category collation code)
- Enable new category collation some time after 1.17 launches
This plan makes it relatively straightforward to back out Resource Loader in case we need to do so for performance reasons.
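The dark-launch idea above can be sketched roughly as follows. This is only an illustration in Python, not the MediaWiki implementation: the `use_new_collation` switch and both sort-key functions are hypothetical stand-ins, since this plan does not name the actual setting or describe the collation algorithms.

```python
from dataclasses import dataclass


@dataclass
class SiteConfig:
    # Hypothetical switch; the real setting name is not given in this plan.
    use_new_collation: bool = False


def legacy_sort_key(title: str) -> str:
    """Stand-in for the legacy behaviour: sort on the raw title."""
    return title


def new_sort_key(title: str) -> str:
    """Stand-in for the new collation: a simple case-insensitive key."""
    return title.casefold()


def category_sort_key(title: str, config: SiteConfig) -> str:
    """Both code paths are deployed; the switch decides which one runs."""
    if config.use_new_collation:
        return new_sort_key(title)
    return legacy_sort_key(title)


def sorted_members(titles, config):
    """Order category members using whichever collation is enabled."""
    return sorted(titles, key=lambda t: category_sort_key(t, config))
```

With the switch off, the new code is present but never executed, so a back-out of Resource Loader does not have to consider it. Turning the switch on later is a configuration change rather than a code deployment.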
Preparation work
Development
- See dev planning docs:
Dev tasks for ops:
- make a 1.17wmf1 deployment branch (Tim/Roan)
- Update extension list in make-wmf-branch/default.conf
- Check if extensions need schema updates (unassigned)
- Make it possible to conditionally deploy collation support (Tim)
- build a test infrastructure to validate 1.17 deployment - prototype (Priyanka)
- turn on profiler on 1.17 during testing (Priyanka & Roan)
- Figure out 1.17 post-deployment shifts for devs so that there's continuous coverage immediately after deploy (RobLa)
- Make sure collation code handles having only default values gracefully (Tim)
Operations
- RT #530: 1.17 deployment tracking
- 531: (Mark and Ryan) Set up URL rewriting scheme on bits
- 532: (catrope) Configure $wgLoadScript
- 533: (Tim) Apply schema changes for 1.17
- Some schema changes can be done in advance
- Others needed with code changes, to prevent breakage of live sites
- Need to separate category collation changes from others
- Perform load testing for Varnish - (Mark+ops)
- Decide whether we use ESI/Varnish or bits (RT ticket needed)
- Double check necessary backups (Mark)
Prior Testing
- Test environment: prototype.wikimedia.org
Schedule
Current target (subject to change based on data center move):
- Tuesday, February 8, 2011 at 07:00:00 UTC
- 1.17 deployment (Only core and existing extensions)
- Date TBD
- Collation made live
- Date TBD
- New extensions go live
Deployment Steps/Sequence
Below is the checklist for this deployment. See How to deploy code for details on the checklist. Each item should have an owner, and a time that it's scheduled to be done.
- Finish/check database schema updates. (Tim/Ryan)
- Procedure: apply the changes to a slave DB first, then switch that slave over to be the master
- Get the code on fenari (owner? time?)
- Configuration and other prep work
- Add a configuration switch for new extensions (owner? time?)
- Add new extensions to extension-list (owner? time?)
- scap (owner? time?)
- 24x7 developer coverage for the first few days after deployment
- Once ops is happy that we won't need to back out, enable the new category collation feature.
- Run maintenance/updateCollation.php. Previous testing (r69961) indicates that this will take at least a few days to run.
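The slave-first schema procedure in the checklist can be sketched as below. This is a minimal model under stated assumptions, not the actual ops procedure: host names are hypothetical, `apply_change` stands in for whatever applies the schema patch to one host, and updating the old master after it has been demoted is an assumption that the checklist does not spell out.

```python
def rolling_schema_change(master, slaves, apply_change):
    """Apply a schema change slave-first, then swap masters.

    Returns (new_master, new_slaves).
    """
    slaves = list(slaves)
    if not slaves:
        raise ValueError("need at least one slave to promote")

    # 1. Change every slave while the old master keeps serving writes.
    for host in slaves:
        apply_change(host)

    # 2. Promote an already-updated slave to master.
    new_master = slaves[0]

    # 3. Assumption: the old master is updated once it no longer takes writes.
    apply_change(master)

    new_slaves = slaves[1:] + [master]
    return new_master, new_slaves
```

The point of the ordering is that the host taking live writes is never the one being altered.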
Backing out
Category collation changes probably cannot be backed out, because of the nature of the changes. Other database additions/changes should have no effect on prior changes.
Risks & Mitigations
Identified risks and mitigations:
- DB errors on partially deployed version of 1.17
- Make sure code handles having only default values gracefully
- Perform DB backups before any changes
- Load testing of bits and Varnish
- Deploying the new categorylinks collation code before running updateCollation.php will create some extra write pressure on the DB servers, because the categorylinks rows will automatically be upgraded on edit. There is a small chance of this leading to uncontrollable replication lag.
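The write-pressure risk can be illustrated with a toy model. This is a sketch under assumptions, not the categorylinks implementation: the `CategoryLinks` class, the collation labels, and the one-row-per-page simplification are all hypothetical. It shows why edits cost an extra write while rows are still on the old collation, and why that cost disappears once a batch update has converted them.

```python
CURRENT_COLLATION = "new"  # hypothetical label for the new collation


class CategoryLinks:
    """Toy stand-in for the categorylinks table: page_id -> collation label."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.extra_writes = 0  # writes caused only by upgrade-on-edit

    def on_edit(self, page_id):
        """Upgrade a stale row when its page is edited (one extra write)."""
        if self.rows[page_id] != CURRENT_COLLATION:
            self.rows[page_id] = CURRENT_COLLATION
            self.extra_writes += 1

    def batch_update(self, batch_size):
        """Convert all stale rows in batches; returns the number of batches."""
        stale = [p for p, c in self.rows.items() if c != CURRENT_COLLATION]
        batches = 0
        for start in range(0, len(stale), batch_size):
            for page_id in stale[start:start + batch_size]:
                self.rows[page_id] = CURRENT_COLLATION
            batches += 1
            # A real script would likely pause here so replication can catch up.
        return batches
```

In this model the extra writes arrive at whatever rate editors produce them, which is what makes the lag hard to control, while the batch path can be throttled between batches.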