Swift/Hackathon Installation Notes

The "Swift" project is Current as of 2011-12-31. Owner: Bhartshorne. See also RT:1384

This page describes specifics on how we set up the servers for swift at the NOLA hackathon.

hardware

We have 3 misc servers (magnesium, copper, and zinc), each with 2 drives:

  • 50GB on both drives RAID 1 for the OS
  • 450GB with no RAID for the storage bricks (swift likes direct access to the disks and the docs discourage RAID)

copper and zinc are configured as proxy nodes. all three are configured as storage nodes.

swift install

following instructions on swift's website: http://swift.openstack.org/howto_installmultinode.html

packages

after adding the recommended package repo (the swift-core PPA), aptitude install swift brought in the following packages from that non-wmf repository:

Get:10 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-greenlet 0.3.1-1ubuntu1~lucid0 [15.1kB]
Get:11 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-eventlet 0.9.13-0ubuntu1~lucid0 [115kB]
Get:12 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-webob 1.0.8-0~ppalucid2 [283kB]
Get:13 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-swift 1.4.3-0ubuntu1~lucid1~ppa1 [263kB]
Get:14 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift 1.4.3-0ubuntu1~lucid1~ppa1 [39.2kB]

The rest of the packages needed are:

Get:1 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-proxy 1.4.3-0ubuntu1~lucid1~ppa1 [8,568B]
Get:1 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-account 1.4.3-0ubuntu1~lucid1~ppa1 [8,172B]
Get:2 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-container 1.4.3-0ubuntu1~lucid1~ppa1 [8,000B]
Get:3 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-object 1.4.3-0ubuntu1~lucid1~ppa1 [9,534B]

I downloaded binary swauth packages from github and installed them in our repo. Only the python-swauth package is necessary; swauth-doc is only documentation. There are .debs in Ubuntu's precise repo, so eventually these will no longer need to be manually tracked.

  python-swauth 1.0.2-1
  swauth-doc 1.0.2-1

puppet

  • created manifests/swift.pp
  • created files/swift
  • created private:files/swift

proxy nodes

TODO:

  • memcached should only be available to other swift proxy servers - the port should be firewalled off to enforce that.
  • move proxy-server.conf from files to template to templatize account, login, and password
  • puppetize netfilter settings

high capacity conntracking

To let the firewall do stateful connection tracking, the kernel maintains a table of all open connections. If this table is too small, the server starts refusing connections well before exhausting other resources. The default should be raised to the point where the server can be saturated before the table fills. See high performance linux networking for more detail.

  • /sbin/sysctl -w net.netfilter.nf_conntrack_max=262144
  • echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
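
To make these settings survive a reboot, something like the following should work (not yet puppetized; a sketch, not what is currently deployed):

 # in /etc/sysctl.conf (or a file under /etc/sysctl.d/):
 net.netfilter.nf_conntrack_max = 262144

 # hashsize is a module parameter rather than a sysctl, so e.g. in /etc/modprobe.d/nf_conntrack.conf:
 options nf_conntrack hashsize=32768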

memcached

Memcached will run on all the proxy servers. /etc/swift/proxy-server.conf lists all the memcached servers, so it must (eventually) be updated when adding or removing proxy servers. Everything is supposed to keep working with a missing memcached shard, so the list does not need to be updated immediately during an outage. (This must be validated through testing.)
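
For reference, the memcached list lives in the cache filter of the proxy config; a sketch with placeholder addresses (the real list is the proxy hosts' IPs, port 11211):

[filter:cache]
use = egg:swift#memcache
memcache_servers = <proxy1-ip>:11211,<proxy2-ip>:11211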

created a class to differentiate configs for different clusters, with hardcoded values like the list of memcached servers for the proxy servers. lame, but I'd rather get it running and then improve it than block on that.

rings

initial setup of the rings needs to be done by hand. This is more of a cluster operation than a per-server operation. Once built, the rings are modified when additional servers are added. See the proxy node configuration section (step 4) of http://swift.openstack.org/howto_installmultinode.html

ran the following (the arguments to create are the partition power, 18, giving 2^18 = 262144 partitions; the number of replicas, 3; and the minimum number of hours before a partition can be reassigned, 1):

 swift-ring-builder account.builder create 18 3 1
 swift-ring-builder container.builder create 18 3 1
 swift-ring-builder object.builder create 18 3 1

then ran:

cat ~/build-swift-rings
#!/bin/bash

# each zone is "<zone number>-<storage node IP>"
for zone in 1-208.80.154.136  2-208.80.154.5  3-208.80.154.146
do
        for dev in sda3 sdb3
        do
                weight=100

                # ports 6002/6001/6000 are the account, container, and object servers
                swift-ring-builder account.builder add z${zone}:6002/$dev $weight
                swift-ring-builder container.builder add z${zone}:6001/$dev $weight
                swift-ring-builder object.builder add z${zone}:6000/$dev $weight
        done
done

and got this output:

root@copper:/etc/swift# bash ~/build-swift-rings 
Device z1-208.80.154.136:6002/sda3_"" with 100.0 weight got id 0
Device z1-208.80.154.136:6001/sda3_"" with 100.0 weight got id 0
Device z1-208.80.154.136:6000/sda3_"" with 100.0 weight got id 0
Device z1-208.80.154.136:6002/sdb3_"" with 100.0 weight got id 1
Device z1-208.80.154.136:6001/sdb3_"" with 100.0 weight got id 1
Device z1-208.80.154.136:6000/sdb3_"" with 100.0 weight got id 1
Device z2-208.80.154.5:6002/sda3_"" with 100.0 weight got id 2
Device z2-208.80.154.5:6001/sda3_"" with 100.0 weight got id 2
Device z2-208.80.154.5:6000/sda3_"" with 100.0 weight got id 2
Device z2-208.80.154.5:6002/sdb3_"" with 100.0 weight got id 3
Device z2-208.80.154.5:6001/sdb3_"" with 100.0 weight got id 3
Device z2-208.80.154.5:6000/sdb3_"" with 100.0 weight got id 3
Device z3-208.80.154.146:6002/sda3_"" with 100.0 weight got id 4
Device z3-208.80.154.146:6001/sda3_"" with 100.0 weight got id 4
Device z3-208.80.154.146:6000/sda3_"" with 100.0 weight got id 4
Device z3-208.80.154.146:6002/sdb3_"" with 100.0 weight got id 5
Device z3-208.80.154.146:6001/sdb3_"" with 100.0 weight got id 5
Device z3-208.80.154.146:6000/sdb3_"" with 100.0 weight got id 5

After building them, the rings look like this (note - only showing account, but did this for all three):

root@copper:/etc/swift# swift-ring-builder account.builder
account.builder, build version 6
262144 partitions, 3 replicas, 3 zones, 6 devices, 100.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  zone      ip address  port      name weight partitions balance meta
             0     1  208.80.154.136  6002      sda3 100.00          0 -100.00
             1     1  208.80.154.136  6002      sdb3 100.00          0 -100.00
             2     2    208.80.154.5  6002      sda3 100.00          0 -100.00
             3     2    208.80.154.5  6002      sdb3 100.00          0 -100.00
             4     3  208.80.154.146  6002      sda3 100.00          0 -100.00
             5     3  208.80.154.146  6002      sdb3 100.00          0 -100.00

The next step is rebalancing, after which they look like this (note - only showing account, but did this for all three):

root@copper:/etc/swift# swift-ring-builder account.builder rebalance
Reassigned 262144 (100.00%) partitions. Balance is now 0.00.
root@copper:/etc/swift# swift-ring-builder account.builder
account.builder, build version 6
262144 partitions, 3 replicas, 3 zones, 6 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  zone      ip address  port      name weight partitions balance meta
             0     1  208.80.154.136  6002      sda3 100.00     131072    0.00 
             1     1  208.80.154.136  6002      sdb3 100.00     131072    0.00 
             2     2    208.80.154.5  6002      sda3 100.00     131072    0.00 
             3     2    208.80.154.5  6002      sdb3 100.00     131072    0.00 
             4     3  208.80.154.146  6002      sda3 100.00     131072    0.00 
             5     3  208.80.154.146  6002      sdb3 100.00     131072    0.00 

finally, copy the ring data files to the other proxy (zinc):

root@copper:/etc/swift# scp *.ring.gz zinc:/etc/swift/
account.ring.gz                                100%  308KB 308.0KB/s   00:00    
container.ring.gz                              100%  308KB 308.0KB/s   00:00    
object.ring.gz                                 100%  308KB 307.7KB/s   00:00    

chown /etc/swift/* to swift:swift on both zinc and copper, then run 'swift-init proxy start'. ps shows it running and netstat shows it listening on port 8080.
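
in commands, roughly:

 chown swift:swift /etc/swift/*
 swift-init proxy start
 netstat -nlp | grep 8080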

swauth

TODO:

  • change the swauth master password


  • got the swauth package from github: https://github.com/gholt/swauth
  • added a swauth stanza to the proxy-server.conf file and changed the pipeline to use it (see the pipeline sketch after this list):
[filter:swauth]
use = egg:swauth#swauth
default_swift_cluster = local#http://127.0.0.1:8080/v1
set log_name = swauth
super_admin_key = mymadeupkey
  • restarted the proxy server
  • prepped swauth with swauth-prep -K mymadeupkey
  • created a test user with swauth-add-user -A http://127.0.0.1:8080/auth/ -K mymadeupkey -a test tester testing
  • got the test user's credentials with:
root@copper:/etc/swift# swauth-list -K mymadeupkey
{"accounts": [{"name": "test"}]}
root@copper:/etc/swift# swauth-list -K mymadeupkey test
{"services": {"storage": {"default": "local", "local": "http://127.0.0.1:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881"}}, "account_id": "AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881", "users": [{"name": "tester"}]}
root@copper:/etc/swift# swauth-list -K mymadeupkey test tester
{"groups": [{"name": "test:tester"}, {"name": "test"}, {"name": ".admin"}], "auth": "plaintext:testing"}
    • note that the password is stored in plaintext here
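
The pipeline change mentioned above ends up looking roughly like this (the exact filter list depends on what else is enabled in our proxy-server.conf; the point is that swauth takes the place of tempauth):

[pipeline:main]
pipeline = healthcheck cache swauth proxy-server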

storage nodes

TODO:

  • change mountpoints from /mnt/sd?3 to /srv/swift-storage/?
  • get xfs options (noatime,nodiratime,nobarrier,logbufs=8) into puppet
  • firewall off storage nodes so only the proxy nodes can speak to them via swift protocols
  • investigate running rsync at different nice / ionice levels
  • make sure the firewall covers both ipv4 and ipv6
  • get puppet to start swift services

filesystems

each host has /mnt/sda3 and /mnt/sdb3 (representing raw disk partitions /dev/sda3 and /dev/sdb3). these are already formatted xfs.

  • remount them with the appropriate xfs options:

 for dev in sda3 sdb3 ; do mount /mnt/${dev} -o remount,noatime,nodiratime,nobarrier,logbufs=8; done

moved mountpoints to /srv/swift-storage/* because rsyncd bases its access on the parent directory of the mountpoints. Leaving the swift storage in /mnt would mean that anything else we might mount in /mnt/ would automatically be available to the rsync daemon; moving the swift mountpoints also puts them in a more standard location. make all the swift storage writable:

 chown -R swift:swift /srv/swift-storage/
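
a sketch of the mountpoint move (the directory names under /srv/swift-storage/ have to match the device names in the rings, i.e. sda3 and sdb3; assumes matching fstab entries, so adjust as needed):

 for dev in sda3 sdb3
 do
         umount /mnt/${dev}
         mkdir -p /srv/swift-storage/${dev}
         # update the /etc/fstab entry for ${dev} to point at the new mountpoint, then:
         mount /srv/swift-storage/${dev}
 done
 chown -R swift:swift /srv/swift-storage/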

rsyncd

used the rsyncd.conf from the installation manual, except that I removed the address line and changed the mountpoint. rsyncd will bind to the default address, which is fine, and means we don't need to puppet-template the config just to insert each host's address (we can use a plain file instead of a template).
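
For reference, the resulting rsyncd.conf is roughly the manual's version with the path changed and the address line dropped (account module shown; container and object are analogous):

uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[account]
max connections = 2
path = /srv/swift-storage/
read only = false
lock file = /var/run/account.lock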

used the rsync file from /etc/default with the one change the installation manual suggested.
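
(that one change is, presumably, just enabling the daemon so it starts at boot:)

 RSYNC_ENABLE=true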

swift servers

used the templates from the install guide for /etc/swift/{account,container,object}-server.conf but added:

  • bind_ip = 0.0.0.0
  • devices = /srv/swift-storage/

to override the storage nodes' default device location of /srv/node/
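
Putting that together, the object server config (for example) ends up roughly like the install guide's template plus the two overrides above; a sketch, not the exact file:

[DEFAULT]
bind_ip = 0.0.0.0
bind_port = 6000
workers = 2
devices = /srv/swift-storage/

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]

[object-updater]

[object-auditor]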


testing

using tempauth

Note that there are two different sets of headers that you can pass in to swift for authentication:

  • X-Storage-User and X-Storage-Pass
  • X-Auth-User and X-Auth-Key (often referred to as user and api-key)

They are the same; when given the option we should use the X-Auth-User/Key instead of X-Storage.

curl to test stuff (nice verbose output):

 curl -k -v -H 'X-Storage-User: xxx:yyy' -H 'X-Storage-Pass: zzz' https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0
 AUTH='AUTH_abcdef0123456789'
 curl -k -v -H "X-Auth-Token: $AUTH" https://copper.wikimedia.org:8080/v1/AUTH_system

various commands to do stuff:

 PROXY_LOCAL_NET_IP='copper.wikimedia.org'
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz stat
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz upload myfiles build-swift-rings
 find /srv/swift-storage/
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz upload builders /etc/swift/*.builder
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list builders

the output of one of them:

root@copper:~# swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list
builders
myfiles

using swauth

testing the auth system

note that the password 'mymadeupkey' is in proxy-server.conf. The '&& echo' tacked onto the curl commands is just to force a newline after the HTTP output. watch /var/log/messages for output from swauth. the user '.super_admin' is built in as the master root-admin-style account. Be careful about when to use 127.0.0.1 and when to use copper while setting this up; the choice is sticky, since it ends up in the storage URL stored for the account. Running

 swauth-set-account-service -U .super_admin -K mymadeupkey test storage local http://copper.wikimedia.org:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881

after the fact can fix it.

  • create an account/user/password combo - account: test, user: tester, password: testing.
  • list accounts
  • examine that account
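
in commands (the same ones used in the swauth section above):

 swauth-add-user -A http://127.0.0.1:8080/auth/ -K mymadeupkey -a test tester testing
 swauth-list -K mymadeupkey
 swauth-list -K mymadeupkey test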

testing the object store

using the swift commands
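
a quick sketch using the swauth test account created above (the container name here is just an example):

 swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload testcontainer /etc/swift/proxy-server.conf
 swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing list testcontainer
 swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing stat testcontainer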

using curl
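
the same thing with curl, roughly (the token and storage URL come back in the X-Auth-Token and X-Storage-Url headers of the auth request; the account URL below is the one swauth-list reported):

 curl -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
 TOKEN='AUTH_tk_placeholder'    # use the X-Auth-Token value returned above
 curl -v -H "X-Auth-Token: $TOKEN" http://127.0.0.1:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881/testcontainer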

testing upload.wikimedia.org style URLs

The containers into which the images will be cached must already exist. For testing, create a few by hand; for real use we'll have to script it. NOTE: containers for public wikis must allow unauthenticated read (.r:*) in the ACL.
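
creating one by hand with a public-read ACL looks roughly like this (container name taken from the debugging example below):

 swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post -r '.r:*' wikipedia-commons-thumb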

Performance Testing

pmtpa test cluster

hardware setup:

  • owa1-3 for proxy nodes
  • ms1-3 for storage nodes

methods:

  • geturls.py -t 30 filelists/wikipedia-filelist-urls.txt
  • ab -n 10000 -c 50 -L filelists/wp19k.txt

Initial performance findings

  • with ~1m objects, 50-60qps write throughput, 1100qps read throughput
  • with ~6m objects, 50-60qps write throughput, 1100qps read throughput

Moved container storage onto ramdisks on owa1-3

  • with ~11m objects, 50-60qps write throughput, 750qps read throughput

Moved container storage back onto ms1-3

  • with ~11m objects, ~45qps write throughput, 500-700qps read throughput

debugging

To enable more logging in the swift middleware we've written, modify /usr/local/lib/python2.6/dist-packages/wmf/rewrite.py and add:

  • at the head of the file, add:
    • from swift.common.utils import get_logger
  • to the class WMFRewrite in __init__, add
    • self.logger = get_logger(conf)
  • where you want to add logging (eg in WMFRewrite:__call__):
    • self.logger.warn( "Env: %s" % env)

tail /var/log/messages and look for messages from the proxy-server process
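
for example:

 tail -f /var/log/messages | grep proxy-server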

if you get a 401 response, check the ACL on the container - make sure Read ACL has '.r:*':

root@copper:/usr/share/pyshared/swift/common# swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing stat wikipedia-commons-thumb
  Account: AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881
Container: wikipedia-commons-thumb
  Objects: 2
    Bytes: 4
 Read ACL: .r:*
Write ACL: 
  Sync To: 
 Sync Key: 
Accept-Ranges: bytes