Wikimedia Cloud Services team/EnhancementProposals/Decision record T316866 Openstack Upgrade Cadence

From Wikitech

Origin task: phab:T316866

Date of the decision: 5 October 2022

People in the decision meeting (alphabetical order):

Decision taken

Option 3

Rationale

Folks were interested in exploring both option 3 and option 4. Option 1 was removed during discussion. Ultimately option 4 was left as a future idea, and option 3 was selected.

Problem

Historically, WMCS has upgraded Openstack on a loose cadence, intending to follow the stable -1 version. Without a tighter cadence, WMCS has generally lagged one to two or more versions behind stable. At times this has caused issues with new feature adoption, for example with Trove and, most recently, Magnum, which required newer versions of Openstack before they could be deployed.

Goals:

  • Better manage upgrades. Openstack releases in April and October. We should also plan consistent times of the year to do upgrades in response.
  • Run a generally newer Openstack version on average, while still seeking lag time for stability.
  • Make it easier to patch or run newer versions of Openstack as needed in response to a bug or desired feature.

Constraints and Risks

  • A stable system is prioritized over features
  • Doing nothing will mean k8s clusters run by Magnum will almost always be EOL during operation.
    • The reason is as follows. Kubernetes supports each release for 18 months. Openstack adopts a 9-month-old Kubernetes release for stable. We upgrade to that Openstack version 6-9 months later, by which point roughly 18 months have elapsed since the upstream Kubernetes release, making it EOL.
    • It doesn't seem possible to upgrade the Magnum k8s version without upgrading Openstack. This means our Openstack and Kubernetes versions will be tied together.
    • Note, currently our existing k8s version is EOL.
  • Today, WMCS depends on Debian to package Openstack. In the past, this packaging work has caused delays, and not all point releases have been packaged.
  • WMCS currently patches Openstack and will continue to do so.
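The Kubernetes EOL arithmetic in the list above can be sketched in a few lines. This is only an illustration using the approximate month counts quoted there; the function names are hypothetical, and the shorter lags are included purely for comparison.

```python
# Illustrative timeline for a Kubernetes release reaching WMCS through Magnum.
# All figures are the approximate month counts quoted in the list above.

K8S_SUPPORT_MONTHS = 18      # upstream Kubernetes support window
OPENSTACK_ADOPTION_LAG = 9   # age of the k8s release when Openstack stable adopts it


def k8s_age_at_upgrade(wmcs_upgrade_lag: int) -> int:
    """Months since the upstream k8s release at the time WMCS deploys it."""
    return OPENSTACK_ADOPTION_LAG + wmcs_upgrade_lag


def remaining_support(wmcs_upgrade_lag: int) -> int:
    """Months of upstream support left at deployment (0 or less means EOL)."""
    return K8S_SUPPORT_MONTHS - k8s_age_at_upgrade(wmcs_upgrade_lag)


# Compare a few hypothetical lags between the Openstack release and our upgrade.
for lag in (1, 3, 6, 9):
    print(
        f"upgrade lag {lag} months: k8s is {k8s_age_at_upgrade(lag)} months old, "
        f"{remaining_support(lag)} months of support left"
    )
```

With a lag of 6-9 months the remaining support is between 3 and 0 months at the moment of deployment, so the cluster spends most or all of its operating life on an EOL Kubernetes version.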

Proposals:

Option 1:

Do nothing. Accept the status quo, including the ad hoc upgrade cycle for Openstack releases.

Pros

  • No change required
  • Maximum flexibility for planning work
  • Ability to defer upgrades and run EOL without missing expectations

Cons

  • No set expectations for ourselves or users
  • Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature
  • No goals met

Option 2:

Maintain the n-1 target. Accept running EOL k8s. Schedule twice-yearly upgrade months to set expectations.

Pros

  • Same as option 1, with only minor change and flexibility loss
  • Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts

Cons

  • Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature. Even if a delay in adding a feature is accepted (according to the upgrade schedule), critical or security issues may still force unplanned upgrades
  • Addresses only the first of the three stated goals

Option 3:

Create a new n-0.5 target cadence. Upgrade to the stable version 1-3 months after release.

Pros

  • Ensures Openstack versions are up to date and supported during the entire time of operation
  • Ensures availability of a Kubernetes version that is not EOL during the entire time of operation
  • Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
  • No change to upgrade process required

Cons

  • Patching burden isn't improved
  • Maintains dependency on Debian packaging
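As a rough illustration of what this cadence implies for scheduling, the sketch below derives the upgrade window for each Openstack release, assuming the April and October release months stated in the goals and the 1-3 month lag of this option. The function names are hypothetical and the years are arbitrary examples, not commitments.

```python
from datetime import date

RELEASE_MONTHS = (4, 10)  # Openstack releases in April and October


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date forward by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def upgrade_window(year: int, release_month: int,
                   min_lag: int = 1, max_lag: int = 3) -> tuple[date, date]:
    """First and last month in which the upgrade should happen (n-0.5 cadence)."""
    released = date(year, release_month, 1)
    return add_months(released, min_lag), add_months(released, max_lag)


# Example years only; real dates depend on the upstream release schedule.
for year, month in ((2022, 10), (2023, 4)):
    start, end = upgrade_window(year, month)
    print(f"release {year}-{month:02d}: upgrade between "
          f"{start:%B %Y} and {end:%B %Y}")
```

Under these assumptions the two upgrade windows each year fall in May-July and November-January.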

Option 4:

Create a new n-0.5 target cadence. Upgrade to the stable version 1-3 months after release. Use Docker or a similar container tool for deployment.

Pros

  • Everything under option 3
  • Lower patching burden
  • Improved flexibility to upgrade or respond to issues
  • Meets all stated goals
  • Better target for automation

Cons

  • Requires changing how we deploy Openstack; this will require research and design