Swift/How To

General Prep

Nearly all of these commands are best executed from a swift proxy and stats reporter host (e.g. ms-fe1005.eqiad.wmnet or ms-fe2005.codfw.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the specific users' passwords we have at Wikimedia are accessible in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.

All of these tasks are explained in more detail and with far more context in the official swift documentation. This page is intended as a greatly restricted version of that information, directed specifically at tasks we'll need to carry out in the Wikimedia cluster. For this reason many options and caveats have been left out, and assumptions (such as the authentication type in use) have been made to restrict it to what's correct for our installation. It may or may not be useful for a wider audience.

Set up an entire swift cluster

This is documented elsewhere: Swift/Setup_New_Swift_Cluster

Individual Commands - interacting with Swift

Impersonating a specific swift account

Some swift proxy servers (e.g. ms-fe2005 / ms-fe1005) carry an extra stats reporter role, which stores account credentials in /etc/swift/account_*

You can do . /etc/swift/account_AUTH_dispersion.env to read the dispersion account's authentication URL (ST_AUTH), username (ST_USER) and key (ST_KEY) into your environment, which the swift binary will then use to authenticate your commands, similar to nova/designate/openstack used in labs administration.
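
A minimal sketch, assuming the dispersion env file is present on the host you are logged into:

 # read ST_AUTH, ST_USER and ST_KEY into the current shell
 . /etc/swift/account_AUTH_dispersion.env
 # subsequent swift commands now authenticate as the dispersion account
 swift stat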

Create a container

You create a container by POSTing to it. You modify a container by POSTing to an existing container. Only users with admin rights (aka users in the .admin group) are allowed to create or modify containers.

Run the following commands on any host with the swift binaries installed (any host in the swift cluster or iron)

  • create a container with default permissions (r/w by owner and nobody else)
    • swift post container-name
  • create a container with global read permissions (see the sketch after this list)
    • swift post -r '.r:*' container-name
  • The WikimediaMaintenance extension's filebackend/setZoneAccess.php script creates most wiki-specific containers; SwiftFileBackend gives its own user read and write privileges, along with global read for public containers.
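
For example, a hedged sketch of creating a container with global read and then checking the resulting ACL (the container name here is purely illustrative):

 # create the container and allow anonymous reads
 swift post -r '.r:*' test-container
 # verify: the output should include "Read ACL: .r:*"
 swift stat test-container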

List containers and contents

It's easiest to do all listing from a frontend host on the cluster you wish to list.

list of all containers

  • ask for a listing of all containers: swift list

list the contents of one container

  • ask for a listing of the container: swift list wikipedia-commons-local-thumb.a2

list specific objects within a container

example: look for all thumbnails for the file Little_kitten_.jpg

  • start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
  • Pull out the project, "language", thumb, and shard to form the correct container, adding -local into the middle (see the sketch after this list)
    • eg wikipedia-commons-local-thumb.a2
    • Note - only some containers are sharded: grep shard /etc/swift/proxy-server.conf to find out if your container should be sharded
    • unsharded containers leave off the shard eg wikipedia-commons-local-thumb
  • ask swift for a listing of the correct container with the --prefix option (it must come before the container name)
    • swift list --prefix a/a2/Little_kit wikipedia-commons-local-thumb.a2
    • note that --prefix is a substring anchored to the beginning of the shard; it doesn't have to be a complete name.
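
Putting it together, a worked sketch (assuming a thumbnail URL of the usual upload.wikimedia.org form, using the example file above):

 # URL: https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg
 # project=wikipedia, "language"=commons, zone=thumb, shard=a2 -> wikipedia-commons-local-thumb.a2
 swift list --prefix a/a2/Little_kit wikipedia-commons-local-thumb.a2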

Show specific info about a container or object

Note - these instructions will only show containers or objects the account has permission to see.

  • log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
  • ask for statistics about all containers: swift stat
  • ask for statistics about the container: swift stat wikipedia-commons-local-thumb.a2
  • ask for statistics about an object in a container: swift stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg

Delete a container or object

Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!

Deleting uses the same syntax as 'stat'. I recommend running stat on the object first to get the command right, then using history substitution (^stat^delete^ in bash).

  • log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
  • run swift stat on the object you want to delete
    • swift stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
  • swap stat for delete in the same command.
    • ^stat^delete^

When you call delete for a container it will first delete all objects within the container and then delete the container itself.
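
For reference, the history substitution above expands to an explicit delete of the same object:

 swift delete wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg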

Setup temp url key on an account

MediaWiki uses a temporary URL key to download files; the key must be set on the mw:media account. On swift machines that report statistics you can find several .env files to "su" to each account, e.g.

 source /etc/swift/account_AUTH_mw.env
 swift post -m 'Temp-URL-Key:<your key>'
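
To check that the key was applied, stat the account afterwards (a sketch; the exact metadata name shown may vary by swift version):

 source /etc/swift/account_AUTH_mw.env
 swift stat | grep -i temp-url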

Individual Commands - Managing Swift

Show current swift ring layout

There are a minimum of three rings in Swift: account, object, and container. Running the swift-ring-builder command with a builder file will list the current state of that ring. Additional rings (e.g. object-1) may be present when storage policies are in use.

 # swift-ring-builder /etc/swift/account.builder
 # swift-ring-builder /etc/swift/container.builder
 # swift-ring-builder /etc/swift/object.builder

Rebalance the rings

You only have to rebalance the rings after you have made a change to them. If there are no changes pending, the attempt to rebalance will fail with the error message "Cowardly refusing to save rebalance as it did not change at least 1%."

To rebalance the rings, run the actual rebalance on a copy of the ring files, then distribute the rings to the rest of the cluster (via puppet).

The canonical copy of the rings is kept in operations/software/swift-ring.git with instructions on how to make changes and send them for review. After a change has been reviewed and merged it can be deployed (i.e. pushed to the puppet master)
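
For reference, a minimal sketch of the rebalance itself when run by hand against a checkout of the ring files (in practice, follow the swift-ring repo README and its Makefile):

 # rebalance each ring; swift-ring-builder refuses to save if the change is below 1%
 for ring in account container object ; do
   swift-ring-builder ${ring}.builder rebalance
 done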

Add a proxy node to the cluster

  • Assign the swift proxy role in site.pp in puppet to make the new proxy match existing proxies in that cluster.
  • Add the host to swift::proxyhosts in puppet for ferm to know about the host
  • Add the host to swift::proxy::memcached_servers to let swift know it should consider that host for memcache
  • Run puppet on the host twice, reboot, and run puppet again
  • Test the host
  • Run puppet on the remaining frontend hosts to make sure firewall rules are updated
  • Roll-restart swift on the remaining frontend hosts for the config change to be applied
  • Pool the new proxy (full details)

Remove a failed proxy node from the cluster

  • Take the failed node out of the load balancer
  • If the host is being decommissioned, revert the "add" steps above

Add a storage node to the cluster

The host will come out of standard dc-ops provisioning with role::spare applied. After that:

  • Apply role::swift::storage instead of role::spare
  • Add the host to swift::storagehosts
  • Run puppet on the host to partition and format all disks; after puppet is done, reboot the host and verify all filesystems are mounted.

The host is now ready to be put in service by adding its disks to the rings.

Add device(s) to the rings

You will need the operations/software/swift-ring repository and the swift utilities (see also #Rebalance_the_rings).

  • On your local host inside the swift-ring repo, cd into the cluster you are operating on (e.g. eqiad-prod)
  • Obtain the host ip address in production e.g. 10.192.32.32
  • Add the brand-new host to all rings, setting an initial weight of 100 for the object rings and 10 for the non-object rings. This ensures the host receives only a small load initially.
    • object_weight=100 nonobject_weight=10 ../bin/swift-add-machine IP
  • Rebuild the rings: make -C ..
  • Send out the change for review (if applicable) and push the ring change to production (see the swift-ring repo README)

After the rings have been pushed and swift has converged (check "swift dispersion" on the swift dashboard), you can increase the host's weights for the object and non-object rings in steps (e.g. steps of 1000 for the object rings) up to their desired values. The target value is the number of gigabytes in the filesystem, e.g. 92 == a 92GB filesystem.
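
For example, a subsequent weight bump might look like the following (a sketch, assuming the swift-set-weight helper shown in the removal section below; the values are illustrative):

 # raise the new host's weights one step, then rebuild the rings and push them out
 object_weight=1000 nonobject_weight=20 ../bin/swift-set-weight 10.192.32.32
 make -C ..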

Remove a failed storage node from the cluster

  • From your swift-ring.git clone directory, cd into the desired swift cluster
  • Note the host's ip address
  • object_weight=0 nonobject_weight=0 ssd_weight=0 ../bin/swift-set-weight IP
  • The command above sets all weights for the host to 0; after the rings are pushed, swift will gradually drain all of the host's disks from service. Check the swift dashboards for dispersion to reach 100% (i.e. the rebalance has completed)
  • rebalance the rings and distribute them.

Remove (fail out) a drive from a ring

If a drive fails and will be replaced by dc-ops in the next two or three days, it is usually not necessary to remove it from the rings: once the disk is put back and formatted/mounted, swift will rebuild it.

However if the disk is going to be out of service for longer then it needs to be taken out of the rings to maintain the expected redundancy.

remove working devices for maintenance

To remove a device for maintenance, you set the weight on the device to 0, rebalance, wait a while (a day or two), then do your maintenance. The examples here assume you're removing all the devices on a node. Note that the example only checks one of the three rings but takes action on all three; to be completely sure we should check all three rings, but by policy we keep all three rings the same.

  • find the IDs for the devices you want to remove (in this example, I'm pulling out ms-be5)
 root@ms-fe1:/etc/swift# swift-ring-builder /etc/swift/account.builder search 10.0.6.204
 Devices:    id  zone      ip address  port      name weight partitions balance meta
            186     8      10.0.6.204  6002      sda4  95.00       1993  -12.24
            187     8      10.0.6.204  6002      sdb4  95.00       1993  -12.24
            188     8      10.0.6.204  6002      sdc1 100.00       2098  -12.23
            189     8      10.0.6.204  6002      sdd1 100.00       2097  -12.27
            190     8      10.0.6.204  6002      sde1 100.00       2097  -12.27
            191     8      10.0.6.204  6002      sdf1 100.00       2097  -12.27
            192     8      10.0.6.204  6002      sdg1 100.00       2097  -12.27
            193     8      10.0.6.204  6002      sdh1 100.00       2097  -12.27
            194     8      10.0.6.204  6002      sdi1 100.00       2097  -12.27
            195     8      10.0.6.204  6002      sdj1 100.00       2097  -12.27
            196     8      10.0.6.204  6002      sdk1 100.00       2097  -12.27
            197     8      10.0.6.204  6002      sdl1 100.00       2097  -12.27
  • set their weight to 0

cd [your swift-ring.git checkout]/[swift instance] (e.g. eqiad-prod)

 for id in {186..197}; do 
   for ring in account object container ; do 
     swift-ring-builder ${ring}.builder set_weight d${id} 0
   done
 done
Alternatively you can, for a given ring,
 swift-ring-builder ${ring}.builder set_weight 10.0.6.204 0
It will prompt you with a list of the devices that will be affected and give you a chance to confirm or cancel.
  • check what you've done
 git diff -w
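
After reviewing the diff, rebuild the rings and push them out as in the add-device section (assuming you are still inside the per-cluster directory of the swift-ring repo):

 make -C ..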

Replacing a disk without touching the rings

If the time span for replacement is short enough, the failed disk can be left unmounted and swapped with a working one. After successful replacement the disk should be added back to the raid controller and the raid cache discarded (megacli case):

 megacli -GetPreservedCacheList -a0
 megacli -DiscardPreservedCache -L'disk_number' -a0
 megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0

Cleanup fully used root filesystem

There is a race condition that can lead to the / filesystem filling up: a faulty disk (as determined by looking at dmesg) is automatically unmounted by swift-drive-audit. When this happens, rsync may keep writing the remaining files into /srv/swift-storage/PARTITION, which is now just a directory on the root filesystem, and as a result / fills up.

How to fix (downtime the host first in Icinga to avoid swift daemon alerts):

 PARTITION=<e.g. sdc1>
 systemctl stop 'swift*'
 systemctl stop rsync
 umount /srv/swift-storage/$PARTITION || true
 rm -rf /srv/swift-storage/${PARTITION}/*
 mount /srv/swift-storage/$PARTITION
 puppet agent --test # will restart swift/rsync