Swift/Icehouse

As of July 2014 there are two swift clusters running: esams and eqiad. eqiad is in production and serves originals and thumbnails. esams receives files synced manually from eqiad every now and then, although the cluster is currently pending expansion due to lack of disk space.

upgrade to icehouse

As part of an overhaul of swift it is necessary to upgrade to the latest upstream version (v1.13.1, codename icehouse). Among other things, the new version will let us set up proper geocluster replication between eqiad and codfw (see the full changelog).

The upgrade has been tested on a minimal cluster in labs and subsequently in esams without any adverse effect. As per the recommended upgrade procedure, a single backend was upgraded first and left running for some time, followed by all the remaining backends, then a single frontend, and finally the remaining frontends.

proposed timeline for eqiad

eqiad will follow an upgrade procedure similar to the one used in esams, on the more conservative timeline outlined below:

  • 2014-07-02T08:00Z (Wed): upgrade ms-be1001
  • 2014-07-07T08:00Z (Mon): upgrade ms-be1002/1008 (zone1) + ms-be1003/1004/1012 (zone2)
  • 2014-07-09T08:00Z (Wed): upgrade ms-be1005/1006/1007 (zone3) + ms-be1009/1010/1011 (zone4) + ms-be1013/1014/1015 (zone5)
  • 2014-07-14T08:00Z (Mon): upgrade ms-fe1001
  • 2014-07-16T08:00Z (Wed): upgrade ms-fe1002/1003/1004
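
The batching above can also be written down as data, which makes it easier to check that no host is skipped. The following is a minimal sketch only: the host names come from the timeline, but the `BATCHES` variable and the `print_plan` function are made up for illustration and nothing is executed on any host.

```shell
#!/bin/sh
# Staged rollout order from the timeline, one batch per line.
# Illustrative only: this prints the plan, it does not touch any host.
BATCHES='ms-be1001
ms-be1002 ms-be1008 ms-be1003 ms-be1004 ms-be1012
ms-be1005 ms-be1006 ms-be1007 ms-be1009 ms-be1010 ms-be1011 ms-be1013 ms-be1014 ms-be1015
ms-fe1001
ms-fe1002 ms-fe1003 ms-fe1004'

# Print one "batch N: host" line per host, in upgrade order.
print_plan() {
    n=0
    printf '%s\n' "$BATCHES" | while read -r hosts; do
        n=$((n + 1))
        for host in $hosts; do
            echo "batch $n: $host"
        done
    done
}

print_plan
```

Backends come first (batches 1 to 3) and frontends last (batches 4 and 5), matching the order used in esams.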

upgrade procedure and rollback

The upgrade itself is straightforward because openstack makes debian packages available for precise; we pin these at a particular priority so that they are installable along with all their dependencies. Thus:

   apt-get update
   apt-get -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install swift
   swift-init all restart
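
For readers unfamiliar with apt pinning, here is a rough sketch of what a pin stanza can look like. Every value in it is an assumption made for illustration: the release name `precise-updates/icehouse`, the priority `1001` and the `write_pin` helper are placeholders, not the pin actually used in production.

```shell
#!/bin/sh
# Write an example apt pin stanza to the file given as $1.
# All values are placeholders: the release name and the priority are
# assumptions for illustration, not the production configuration.
write_pin() {
    cat > "$1" <<'EOF'
Package: *
Pin: release n=precise-updates/icehouse
Pin-Priority: 1001
EOF
}

# Write to a scratch file and show it, rather than touching /etc/apt.
pin_file=$(mktemp)
write_pin "$pin_file"
cat "$pin_file"
```

On a real host a stanza like this would live under `/etc/apt/preferences.d/`, and `apt-cache policy swift` shows which version apt would pick as the install candidate once the pin is in place.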

rollback

Rollback is similarly straightforward: revert the pinning and execute the same commands to get the older version installed.
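
The rollback steps can be sketched as a dry run. This is an illustration under assumptions: the pin file path, the `DRY_RUN` switch and the `run`/`rollback` helpers are hypothetical, and "revert the pinning" is modelled here as simply removing a pin file, which may not match how the pin is actually managed.

```shell
#!/bin/sh
# Rollback sketch. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN=${DRY_RUN:-1}
# Hypothetical location of the pin file; adjust to wherever the pin lives.
PIN_FILE=${PIN_FILE:-/etc/apt/preferences.d/swift-icehouse}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rollback() {
    # Assumption: reverting the pin means removing the pin file.
    run rm -f "$PIN_FILE"
    # Same commands as for the upgrade.
    run apt-get update
    run apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install swift
    run swift-init all restart
}

rollback
```

One caveat worth keeping in mind: apt only downgrades an installed package when the older version's pin priority is above 1000 or when the version is requested explicitly, so the reverted pin needs to account for that.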


post-upgrade

As of 2014-07-18 the upgrade has been completed with no issues reported so far. However, the backend bandwidth (ms-be* RX/TX) has increased steadily since the frontends were upgraded to icehouse.