User:Bstorm/plans/labstore-upgrades

This is not actually how this all went down. This was brainstorming.

Primary cluster

The primary cluster is currently on 1G Ethernet and doesn't seem to have a definition in install_server.

We may want to add IPv6, and we probably should put it on 10G Ethernet. To that end, labstore1004 is already in a 10G rack, but labstore1005 is not and would need to move. ~~I doubt they have 10G Ethernet cards installed, though.~~ They have 10G cards installed, so this should be done, which implies we should re-image the servers.
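One way to check that a host really ends up on a 10G link after the rack move is to read the negotiated speed from sysfs. This is only a minimal sketch: the function name, the interface name eth0 in the usage line, and the optional second argument are placeholders for illustration, not taken from these hosts.

```shell
#!/bin/sh
# Sketch: succeed if an interface negotiated at least 10G.
# The kernel reports link speed in Mb/s in /sys/class/net/<iface>/speed.
# Returns 0 for >= 10000 Mb/s, 1 for a slower or down link,
# 2 if the speed file cannot be read at all.
link_is_10g() {
    iface="$1"
    # Second argument is an assumption of this sketch: it lets the
    # function be tried against a scratch directory instead of sysfs.
    sysfs_root="${2:-/sys/class/net}"
    speed=$(cat "$sysfs_root/$iface/speed" 2>/dev/null) || return 2
    [ "$speed" -ge 10000 ]
}
```

For example, `link_is_10g eth0 && echo "10G link"` could be run on each host once it is back up.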

There are three hardware RAID volumes on the devices. One, around a TB, holds root and swap; the other two are data volumes for NFS/DRBD. I think we should leave the name-change to cloudstore out of this round, since it would add a variable that would complicate the change for clients.

With that layout, re-imaging safely seems like it might be relatively simple. So we should:

Set up a partman recipe to reimage from
Stop backups
Stop puppet on labstore1004
Re-image labstore1005 to stretch, merging a config change that enables IPv6 for firewall and DNS
Fail over to labstore1005
Re-image labstore1004 and enable puppet
Fail back to labstore1004
Re-enable backups once checks are complete

Dumps cluster

I confirmed that the nfsd-ldap custom package doesn't apply to this cluster because this cluster has all_squash set. That allows us to upgrade to buster instead of stretch in this case. If upgrading in place, we'd best upgrade from jessie to stretch and then from stretch to buster.

If puppet was already installed but cannot finish its postinst script, because the script triggers a puppet run that in turn tries to run the same postinst script, and so on, run "rm /var/lib/dpkg/info/puppet.postinst" to clear that up at least.
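The workaround above can be wrapped so that it is harmless to run twice. This is a minimal sketch, not a tested procedure: the function name and the DPKG_INFO_DIR override are made up here so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: remove the puppet postinst script that keeps re-triggering itself.
# DPKG_INFO_DIR is an assumption of this sketch, not a dpkg setting;
# it defaults to the real dpkg info directory.
clear_puppet_postinst() {
    info_dir="${DPKG_INFO_DIR:-/var/lib/dpkg/info}"
    postinst="$info_dir/puppet.postinst"
    if [ -e "$postinst" ]; then
        rm -- "$postinst"
        echo "removed $postinst"
    else
        echo "nothing to remove at $postinst"
    fi
}
```

Running `dpkg --configure -a` afterwards is a plausible follow-up to let dpkg finish any half-configured packages, though that step is an assumption and not part of the original notes.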

Fail over all services to labstore1006
Downtime labstore1007
Disable puppet on labstore1007
Run apt-get upgrade and dist-upgrade, then check that all is OK
Switch the main sources.list to stretch
Run upgrade/dist-upgrade again
Enable puppet and run it until things work right
Fix whatever is broken
This will include running sudo rm /opt/puppetlabs/facter/cache/cached_facts/operating\ system on facter 3.11
Validate the config, reboot, etc.
Upgrade again, this time to buster
Fail all services over to labstore1007
Upgrade labstore1006 (wash, rinse, repeat)
Return to the usual combined services spread
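The sources.list switch in the steps above could be sketched as a small helper that keeps a backup of the old file. The function name and argument order are invented for illustration, and the plain substitution assumes the codename appears only as a suite name in the file.

```shell
#!/bin/sh
# Sketch: point a sources.list at the next Debian release.
# Usage: switch_release FROM TO [FILE]
# FILE defaults to the main sources.list; a copy of the old file is
# kept next to it as FILE.FROM.bak.
switch_release() {
    from="$1"
    to="$2"
    file="${3:-/etc/apt/sources.list}"
    cp -- "$file" "$file.$from.bak" || return 1
    # Rewrite from the backup so the original is never edited in place.
    sed "s/$from/$to/g" "$file.$from.bak" > "$file"
}
```

On the host this would be `switch_release jessie stretch`, followed by `apt-get update`, `apt-get upgrade` and `apt-get dist-upgrade`, and later `switch_release stretch buster` for the second hop.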