Wikimedia Cloud Services team/EnhancementProposals/Decision record T316866 Openstack Upgrade Cadence
Origin task: phab:T316866
Date of the decision: 5 October 2022
People in the decision meeting (alphabetical order):
- User:Andrew_Bogott
- User:Arturo_Borrero_Gonzalez
- User:David_Caro
- User:FNegri
- User:Nskaggs
- User:Vivian_Rook
- ...
Decision taken
Option 3
Rationale
Folks were interested in exploring both option 3 and option 4. Option 1 was removed during discussion. Ultimately option 4 was left as a future idea, and option 3 was selected.
Problem
Historically, WMCS has upgraded Openstack on a loose cadence, intending to follow the stable -1 version. Without a tighter cadence, WMCS has generally lagged 1-2+ versions behind stable. At times this has delayed adoption of new features; for example, Trove and most recently Magnum required newer versions of Openstack before they could be deployed.
Goals:
- Better manage upgrades. Openstack releases in April and October. We should also plan consistent times of the year to do upgrades in response.
- Run a generally newer Openstack version on average, while still seeking lag time for stability.
- Make it easier to patch or run newer versions of Openstack as needed in response to a bug or desired feature
Constraints and Risks
- A stable system is prioritized over features
- Doing nothing will mean k8s clusters run by Magnum will almost always be EOL during operation.
- The reasoning is as follows. Kubernetes supports each release for 18 months. Openstack adopts a 9-month-old Kubernetes release for stable. We upgrade to that Openstack version 6-9 months later, so up to 18 months have elapsed since the upstream Kubernetes release, leaving it at or near EOL.
- It doesn't seem possible to upgrade the Magnum k8s version without upgrading Openstack. This means our Openstack and Kubernetes versions will be tied together.
- Note: our existing k8s version is currently EOL.
- Today, WMCS depends on Debian to package Openstack. In the past, this packaging work has caused delays, and not all point releases have been packaged.
- WMCS currently patches Openstack, and will continue to do so
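The Magnum EOL arithmetic in the constraints above can be checked with a short sketch. The month figures are taken from the text; the function and constant names are illustrative only, and the model assumes simple whole-month lags rather than real release dates.

```python
# Illustrative sketch of the k8s EOL arithmetic described in the constraints.
# Figures come from the text above; names are hypothetical.
K8S_SUPPORT_MONTHS = 18       # upstream Kubernetes support window
OPENSTACK_ADOPTION_LAG = 9    # age of the k8s release when Openstack adopts it


def k8s_age_at_upgrade(wmcs_lag_months):
    """Months since the upstream k8s release when WMCS deploys it."""
    return OPENSTACK_ADOPTION_LAG + wmcs_lag_months


def remaining_support(wmcs_lag_months):
    """Months of upstream k8s support left at deployment time."""
    return K8S_SUPPORT_MONTHS - k8s_age_at_upgrade(wmcs_lag_months)


if __name__ == "__main__":
    # 1-3 months matches the option 3/4 target; 6-9 months is the status quo.
    for lag in (1, 3, 6, 9):
        print(f"WMCS lag {lag} months: k8s is {k8s_age_at_upgrade(lag)} months old, "
              f"{remaining_support(lag)} months of support remain")
```

Under these assumptions, a 6-9 month lag leaves between 3 and 0 months of upstream support, while a 1-3 month lag leaves 6-8 months.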
Proposals:
Option 1:
Do nothing. Accept the status quo, including the ad hoc upgrade cycle for Openstack releases.
Pros
- No change required
- Maximum flexibility for planning work
- Ability to defer upgrades and run EOL without missing expectations
Cons
- No set expectations for ourselves or users
- Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature
- No goals met
Option 2:
Maintain n-1 target. Accept running EOL k8s. Schedule twice yearly upgrade months to set expectations.
Pros
- Same as option 1, with only minor change and flexibility loss
- Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
Cons
- Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature. Even if delay is accepted on adding a feature (according to the upgrade schedule), critical or security issues may still force unplanned upgrades
- Addresses only the first of the three stated goals
Option 3:
Create new n-0.5 target cadence. Upgrade to stable version 1-3 months after release.
Pros
- Ensures openstack versions are up to date and supported during the entire time of operation
- Ensures availability of a kubernetes version that is not EOL during the entire time of operation
- Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
- No change to upgrade process required
Cons
- Patching burden isn't improved
- Maintains dependency on Debian packaging
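As a rough illustration of the n-0.5 cadence, the sketch below derives the upgrade window that follows each Openstack release. It assumes releases land on the first day of April and October (the real release day varies) and uses the 1-3 month target from this option; the helper names are hypothetical.

```python
from datetime import date

# Openstack release months, per the goals section (April and October).
RELEASE_MONTHS = (4, 10)


def add_months(d, n):
    """Return the first day of the month n months after d."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


def upgrade_window(year, release_month, earliest=1, latest=3):
    """Earliest and latest target month for upgrading after a release.

    Assumes the release lands on the 1st of the month (a simplification).
    """
    release = date(year, release_month, 1)
    return add_months(release, earliest), add_months(release, latest)


if __name__ == "__main__":
    for month in RELEASE_MONTHS:
        start, end = upgrade_window(2023, month)
        print(f"Release {2023}-{month:02d}: upgrade between "
              f"{start:%Y-%m} and {end:%Y-%m}")
```

With these assumptions, an April release maps to a May-July window and an October release to a November-January window, which gives two predictable upgrade periods per year.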
Option 4:
Create new n-0.5 target cadence. Upgrade to stable version 1-3 months after release. Use Docker or similar for deployment.
Pros
- Everything under option 3
- Lower patching burden
- Improved flexibility to upgrade or respond to issues
- Meets all stated goals
- Better target for automation
Cons
- Requires changing how we deploy Openstack; this will require research and design