As of July 2014 there are two swift clusters running: esams and eqiad. esams receives occasional, manual syncs of files from eqiad (the cluster is currently pending expansion due to lack of disk space), whereas eqiad is in production and serves originals and thumbnails.
upgrade to icehouse
As part of an overhaul of swift it is necessary to upgrade to the latest upstream version (v1.13.1, codename icehouse). The new version will let us, among other things, set up proper geocluster replication between eqiad and codfw (see the full changelog).
The upgrade has been tested on a minimal cluster in labs and subsequently in esams without any adverse effects. As per the recommended upgrade procedure, one backend was upgraded and left running for some time, followed by all the remaining backends, then one frontend, then the remaining frontends.
proposed timeline for eqiad
eqiad will follow a similar upgrade procedure to what has been used in esams, with the more conservative timeline outlined below:
- 2014-07-02T08:00Z (Wed): upgrade ms-be1001
- 2014-07-07T08:00Z (Mon): upgrade ms-be1002/1008 (zone1) + ms-be1003/1004/1012 (zone2)
- 2014-07-09T08:00Z (Wed): upgrade ms-be1005/1006/1007 (zone3) + ms-be1009/1010/1011 (zone4) + ms-be1013/1014/1015 (zone5)
- 2014-07-14T08:00Z (Mon): upgrade ms-fe1001
- 2014-07-16T08:00Z (Wed): upgrade ms-fe1002/1003/1004
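The host shorthand used in the timeline (for example ms-be1002/1008) can be expanded into individual hostnames before working through a batch. The following is a minimal sketch in POSIX shell; the function name expand_hosts is hypothetical and not part of any existing tooling.

```shell
# Hypothetical helper: expand "ms-be1002/1008" into one hostname per line.
expand_hosts() {
    spec="$1"
    prefix="${spec%%[0-9]*}"      # everything before the first digit, e.g. "ms-be"
    numbers="${spec#"$prefix"}"   # the slash-separated numbers, e.g. "1002/1008"
    old_ifs="$IFS"
    IFS=/
    for n in $numbers; do
        printf '%s%s\n' "$prefix" "$n"
    done
    IFS="$old_ifs"
}

# Example: list the zone3 backends from the 2014-07-09 batch.
expand_hosts ms-be1005/1006/1007
```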
upgrade procedure and rollback
The upgrade itself is straightforward because OpenStack makes Debian packages available for precise, which we pin at a particular priority so that they are installable along with all their dependencies. Thus:
- get the proper pinning in place in puppet (something similar to https://gerrit.wikimedia.org/r/#/c/141677/)
- on each machine that is going to be upgraded:
apt-get -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install swift
swift-init all restart
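The two per-machine commands above can be wrapped so that they are printed first and only executed once reviewed. This is a sketch under assumptions: the names run, upgrade_swift and the DRY_RUN variable are hypothetical, and only the apt-get and swift-init invocations come from the procedure itself.

```shell
# Hypothetical wrapper around the per-machine upgrade steps.
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # show the command instead of running it
    else
        "$@"
    fi
}

upgrade_swift() {
    # Keep existing config files, install the pinned swift package,
    # and restart all swift services only if the install succeeded.
    run apt-get -o Dpkg::Options::="--force-confdef" \
                -o Dpkg::Options::="--force-confold" install swift &&
    run swift-init all restart
}
```

Calling upgrade_swift prints both commands; running it with DRY_RUN=0 as root executes them for real.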
Rollback is similarly straightforward: revert the pinning and execute the same commands to get the older version installed.
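Before and after either an upgrade or a rollback, it can help to confirm which version is installed and which one apt would pick. Note that apt only downgrades a package on its own when the older version's pin priority is 1000 or higher, so the candidate is worth checking after the pinning is reverted. The sketch below parses the output of apt-cache policy; the function name policy_versions is hypothetical.

```shell
# Hypothetical helper: read `apt-cache policy <package>` output on stdin
# and print the installed and candidate versions.
policy_versions() {
    awk '
        $1 == "Installed:" { inst = $2 }
        $1 == "Candidate:" { cand = $2 }
        END {
            print "installed=" inst
            print "candidate=" cand
        }
    '
}

# Usage on a swift host:
#   apt-cache policy swift | policy_versions
```

If the two values differ, the next apt-get install swift will move the machine to the candidate version.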
As of 2014-07-18 the upgrade has been completed with no issues reported so far. However, the backend bandwidth (ms-be* RX/TX) has increased steadily since the frontends were upgraded to icehouse.