Swift/Setup New Swift Cluster

The "Swift" project is Current as of 2012-04-01. Owner: Bhartshorne. See also RT:1384


This page describes the steps necessary to set up owa1-3 and ms1-3 as a swift cluster (owa -> proxies, ms -> storage).

Setting up swift in labs is similar; Swift/Setup New Swift Cluster (labs) describes the differences from this document.

update DNS

Create a name that will be balanced across all the proxy servers, using either round robin DNS or an LVS server. For testing, I have created entries like msfe-pmtpa-test, an RRDNS entry pointing to the Tampa proxies owa1-3.
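Once the record exists, a quick sanity check (a sketch; substitute your own entry) is to confirm it resolves to all the proxies:

 dig +short msfe-pmtpa-test.wikimedia.org   # should return the addresses of owa1-3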

set up filesystems

Puppet will take care of all disks that have a single partition used entirely for data - pass it all non-OS disks. You must create the partitions for swift storage on the OS disks by hand. The following is what I ran on ms-be1 (where the BIOS partition is sda1 and sdb1, the OS partition is RAIDed across 120GB partitions on sda2 and sdb2, and sda3 and sdb3 are swap):

 # parted
 (parted) help
 (parted) print free
 (parted) mkpart swift-sda4 121GB 2000GB
 (parted) select /dev/sdb
 (parted) print free
 (parted) mkpart swift-sdb4 121GB 2000GB
 (parted) quit
 # mkfs -t xfs -i size=512 -L swift-sda4 /dev/sda4
 # mkfs -t xfs -i size=512 -L swift-sdb4 /dev/sdb4
 # mkdir /srv/swift-storage/sd{a,b}4
 # vi /etc/fstab # <-- add in a line for sda4 and sdb4 with the same xfs options as the rest
 # mount -a
 # chown -R swift:swift /srv/swift-storage/sd{a,b}4
 # chmod 750 /srv/swift-storage/sd{a,b}4
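For reference, a hypothetical pair of fstab entries; the XFS options shown are an assumption - copy whatever the puppet-managed swift disks on the host already use:

 LABEL=swift-sda4 /srv/swift-storage/sda4 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0   # options are an example only
 LABEL=swift-sdb4 /srv/swift-storage/sdb4 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0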

create the cluster hash

Each cluster has a random string it uses to seed the hashes that determine where objects are placed. Generate this string for use in the puppet configs:

od -t x8 -N 8 -A n </dev/random
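This prints the 8 random bytes as 16 hex digits with some leading whitespace; strip the whitespace and use the result as the hash_path_suffix value in the puppet config below.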

update puppet

Use ms-fe[12] and ms-be1-5 in puppet/manifests/site.pp as examples. You will have to create a class for your cluster, following the examples in puppet/manifests/role/swift.pp.

  • make sure to define the list of drives for the storage nodes
  • set up a class for your cluster's base, proxy, and storage configs in site.pp (model after pmtpa-test)
  • make sure to set all variables for the proxy config, even if you don't have the real values yet
class { "swift::base": hash_path_suffix => "1234deadbeef5678" }         <---- the cluster hash you just made
class proxy inherits from swift-cluster::your-cluster {
        bind_port => "80",                                              <---- the port on which swift will listen
        num_workers => "8",                                             <---- should be double the number of cores
        proxy_address => "http://msfe-pmtpa-test.wikimedia.org",        <---- the DNS entry you made
        super_admin_key => "some-secret-key",                           <---- choose a strong password here
        memcached_servers => [ "owa1.wikimedia.org:11211", "owa2.wikimedia.org:11211", "owa3.wikimedia.org:11211" ] <-- all proxy servers
        rewrite_account => "placeholder",                               <---- you will change this to its real value later
        rewrite_url => "http://127.0.0.1/auth/v1.0",                    <---- this should actually be localhost
        rewrite_user => "place:holder",                                 <---- you will change this later
        rewrite_password => "placeholder",                              <---- you will change this later
        rewrite_thumb_server => "ms5.pmtpa.wmnet",                      <---- where swift goes to get thumbnails
        shard_containers => "some",                                     <---- whether to shard any of the containers (all, some, none)
        shard_container_list => "wikipedia-commons-local-thumb"         <---- comma separated list of containers to shard (or empty if none)
}
  • on all puppetmasters, create placeholder files for the rings in /var/lib/puppet/volatile/
cd /var/lib/puppet/volatile
mkdir -p swift/clustername
touch swift/clustername/{account,container,object}.{builder,ring.gz}
  • load the new puppet configs onto each server
for host in owa{1..3} ms{1..3}
do
  ssh $host puppetd --test
  sleep 30 && ssh $host puppetd --test & #run puppet twice just for good measure.
done

build the rings

On any proxy server:

  • create the ring files
    • 16: indicates 2^16 partitions total. Pick the smallest power of two that is at least round-up(max num drives ever * 100); a quick check is sketched after the create commands below.
      • eg 50 servers * 12 drives each * 100 = 60,000; 2^16 = 65,536 covers that, so 16 is the partition power.
    • 3: replica count - the number of copies of each piece of data to store
    • 3: min-part-hours - the minimum time before a partition can be moved again
cd /etc/swift
swift-ring-builder account.builder create 16 3 3
swift-ring-builder container.builder create 16 3 3
swift-ring-builder object.builder create 16 3 3
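If you'd rather compute the partition power than eyeball it, a one-liner along these lines (using the hypothetical 50 servers * 12 drives from the example above) finds the smallest power of two covering max-drives * 100:

awk 'BEGIN { n = 50 * 12 * 100; p = 0; while (2^p < n) p++; print p }'   # prints 16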
  • add the storage nodes
    • this setup is one zone per server
    • assuming all storage devices are the same size, they all want weight 100
    • the format for the command is swift-ring-builder <ring-file> add z<zone-number>-<ip>:<port>/<device-name> <weight> - the builder wants an IP address, which is why the script below resolves each hostname with dig
      • eg: swift-ring-builder account.builder add z2-<ip of ms2.pmtpa.wmnet>:6002/sde1 100
      • device name is the basename of the path to the mountpoint; eg /srv/swift-storage/abcd -> abcd.
    • rebalance the rings once they're created (this can take a while)
cd /etc/swift
for num in 1 2 3
do
  host="ms${num}.pmtpa.wmnet"
  hostip=$(dig +short $host)
  zone="${num}-${hostip}"
  weight=100

  for dev in $(ssh $host ls /srv/swift-storage/)
  do
    swift-ring-builder account.builder add z${zone}:6002/${dev} $weight
    swift-ring-builder container.builder add z${zone}:6001/${dev} $weight
    swift-ring-builder object.builder add z${zone}:6000/${dev} $weight
  done
done
swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance
chown swift:swift *.ring.gz 
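To verify the result, run swift-ring-builder with just a builder file as its argument; it prints the ring's zones, devices, weights, and balance:

swift-ring-builder account.builder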

distribute the rings

Copy the three .builder and the three .ring.gz files into puppet, which will distribute them to all nodes in the cluster. They live in the volatile section of puppet (used for big binary files) on all puppetmasters; within that, swift/ contains a directory named for the location and role of the cluster (eg eqiad-test, pmtpa-prod, etc.).

 cd /etc/swift
 scp {account,container,object}.{builder,ring.gz} puppetmaster__node__name:/var/lib/puppet/volatile/swift/eqiad-test/

Check them in and do all the normal puppet stuff.

reboot

It is always good practice to reboot a new server as proof that it functions correctly on system start.

set up auth tokens for the cluster

  • Initialize swauth using the super_admin_key from the config above (see the sketch after this list)
  • add the user for thumbnails
    • generate a password: pass=$(pwgen -s 12 1)
    • add the user: swauth-add-user -A http://127.0.0.1/auth/ -K thisshouldbesecret -a mw thumb testing
      • account: mw, user: thumb, password: testing (you should use your generated password instead)
  • test it and retrieve the account id:
    • swauth-list -A http://127.0.0.1/auth/ -K thisshouldbesecret mw
      • you're looking for: "account_id": "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120"
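The initialization in the first bullet is done with swauth-prep; a sketch, using the same placeholder super_admin_key as elsewhere on this page:

swauth-prep -A http://127.0.0.1/auth/ -K thisshouldbesecret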

tell puppet about the auth tokens

Update the puppet config with the authentication tokens you just made

    rewrite_account => "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120",
    rewrite_user => "mw:thumb",
    rewrite_password => "testing",

update the proxies and restart the proxy service

  • run puppet on all proxy servers
  • reload the proxy config on all proxy servers
    • swift-init proxy reload
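A sketch of those two steps across the owa1-3 proxies from this example cluster:

for host in owa{1..3}
do
  ssh $host puppetd --test
  ssh $host swift-init proxy reload
done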

create dispersion objects and containers

  • run swift-dispersion-populate on a proxy node once to populate the initial list of containers and objects for dispersion detection
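A sketch of the commands; swift-dispersion-populate reads its auth url and credentials from /etc/swift/dispersion.conf (which should already be in place, eg via puppet), and swift-dispersion-report can be run afterwards to check coverage:

swift-dispersion-populate
swift-dispersion-report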

make the containers necessary for thumbnails

Until BZ:33206 is resolved, we have to make all thumbnail containers by hand.

ssh ms5
cd /export/thumbs
for i in */*/*; do echo $i; done | grep "thumb$" | tr / - > /tmp/container-list
scp /tmp/container-list my-proxy-server:/tmp/

ssh my-proxy-server
for cont in $(cat /tmp/container-list); do 
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post $cont; 
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post ${cont/thumb/temp}; 
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post ${cont/thumb/public}; 
done

make the containers readable by anonymous users

The swift rewrite middleware doesn't authenticate requests for public containers, so every non-private container needs an anonymous-read ACL (.r:*):

grep -v private /tmp/container-list > /tmp/public-container-list
for cont in $(cat /tmp/public-container-list)
do     
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont}
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont/thumb/temp}
  swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont/thumb/public}
done
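To spot-check one container, stat it with the example credentials from above and look for the read ACL (a sketch; use your real password):

swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing stat wikipedia-commons-local-thumb   # should show Read ACL: .r:*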

test the cluster

todo
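Until this section is written, a minimal smoke test (a sketch, reusing the example mw:thumb credentials and a container created above) is to push an object through the proxy and read it back:

echo "hello swift" > /tmp/swift-test
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing upload wikipedia-commons-local-thumb /tmp/swift-test
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing list wikipedia-commons-local-thumb
# the client strips the leading / from the uploaded path, so the object is named tmp/swift-test
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing download wikipedia-commons-local-thumb tmp/swift-test -o -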