Swift/How To

From Wikitech

General Prep

Nearly all of these commands are best executed from a swift proxy and stats reporter host (e.g. ms-fe1005.eqiad.wmnet or ms-fe2005.codfw.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the individual account passwords used at Wikimedia are available in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.

All of these tasks are explained in more detail, and with far more context, in the official swift documentation. This page is a greatly restricted version of that information, aimed specifically at tasks we'll need to carry out in the Wikimedia cluster. For this reason many options and caveats have been left out, and settings such as the authentication type are assumed to be whatever is correct for our installation. It may or may not be useful for a wider audience.

Set up an entire swift cluster

This is documented elsewhere: Swift/Setup_New_Swift_Cluster

Individual Commands - interacting with Swift

Impersonating a specific swift account

Some swift proxy servers (e.g. ms-fe2009 / ms-fe1009) have an extra stats reporter profile (run git grep stats_reporter_host hieradata/ in the puppet repository to find out which), and these record account credentials in /etc/swift/account_*.

You can do . /etc/swift/account_AUTH_dispersion.env to read the dispersion account's authentication URL (ST_AUTH), username (ST_USER) and key (ST_KEY) into your environment, which the swift binary will then use to authenticate your commands, similar to nova/designate/openstack used in labs administration.

If you need access to mediawiki files use . /etc/swift/account_AUTH_mw.env as the credentials file. BE CAREFUL WITH THIS as you will have the same access that mediawiki has.

The same environment variables can be used to access swift via its command line client from anywhere in the cluster. The variables are:

 ST_AUTH=https://<swift url>/auth/v1.0
 ST_USER=<account>:<user>
 ST_KEY=<password>
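Before running swift commands it can save confusion to confirm that the three variables are actually set. The sketch below is a hypothetical helper (check_swift_env is not part of the swift client); it only checks that each variable is non-empty and that ST_USER has the <account>:<user> shape shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that ST_AUTH, ST_USER and ST_KEY are set
# before using the swift client. Returns non-zero if anything is missing.
check_swift_env() {
  local v rc=0
  for v in ST_AUTH ST_USER ST_KEY; do
    if [ -z "${!v:-}" ]; then          # indirect expansion: value of $v
      echo "$v is not set" >&2
      rc=1
    fi
  done
  # ST_USER is expected to look like <account>:<user>
  if [ -n "${ST_USER:-}" ] && [[ $ST_USER != *:* ]]; then
    echo "ST_USER should be <account>:<user>, got '$ST_USER'" >&2
    rc=1
  fi
  return "$rc"
}

# Usage:
#   . /etc/swift/account_AUTH_dispersion.env && check_swift_env && swift stat
```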

Create a new swift account (Thanos Cluster)

  1. Add the new user in puppet hieradata/common/profile/thanos/swift.yaml
  2. Add fake credentials in labs/private hieradata/common/profile/thanos/swift.yaml
  3. Add real credentials in hieradata/common/profile/thanos/swift.yaml in private puppet
  4. Rolling restart swift-proxy.service in thanos-fe* servers

Note: Check with T279621, since in the future new accounts will be created on the M.O.S.S cluster

Create a container

You create a container by POSTing to it. You modify a container by POSTing to an existing container. Only users with admin rights (aka users in the .admin group) are allowed to create or modify containers.

Run the following commands on any host with the swift binaries installed (any host in the swift cluster or iron)

  • create a container with default permissions (r/w by owner and nobody else)
    • swift post container-name
  • create a container with global read permissions
    • swift post -r '.r:*' container-name
  • create a container with a specific storage policy
    • swift post -H 'X-Storage-Policy: POLICY' container-name
  • The WikimediaMaintenance extension's filebackend/setZoneAccess.php script creates most wiki-specific containers, and SwiftFileBackend gives its own user read and write privileges, along with global read for public containers.

List containers and contents

It's easiest to do all listing from a frontend host on the cluster you wish to list.

list of all containers

  • ask for a listing of all containers: swift list

list the contents of one container

  • ask for a listing of the container: swift list wikipedia-commons-local-thumb.a2

list specific objects within a container

example: look for all thumbnails for the file 1-month-old_kittens_32.jpg

  • start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
  • Pull out the project, "language", thumb, and shard to form the correct container and add -local into the middle
    • eg wikipedia-commons-local-thumb.1d
    • Note - only some containers are sharded: grep shard /etc/swift/proxy-server.conf to find out if your container should be sharded
    • unsharded containers leave off the shard eg wikipedia-commons-local-thumb
  • ask swift for a listing of the correct container with the --prefix option (it must come before the container name)
    • swift list --prefix 1/1d/1-month wikipedia-commons-local-thumb.1d
    • note that --prefix matches the beginning of the object name; it doesn't have to be a complete name.
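The steps above can be sketched as a small helper that turns a thumbnail URL into a container name and a --prefix value. This is a hypothetical function that assumes a URL of the form https://upload.wikimedia.org/<project>/<language>/thumb/<x>/<xy>/<file>/<size>-<file>, a sharded container, and a file name that is not percent-encoded; unsharded containers would need the shard suffix dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<container> <prefix>" for a thumbnail URL.
thumb_location() {
  local path=${1#*://*/}          # drop scheme and host
  local project lang zone d1 shard file rest
  IFS=/ read -r project lang zone d1 shard file rest <<< "$path"
  # container: <project>-<language>-local-<zone>.<shard>
  # prefix:    <x>/<xy>/<file>/ (all thumbnails of that one file)
  printf '%s-%s-local-%s.%s %s/%s/%s/\n' \
    "$project" "$lang" "$zone" "$shard" "$d1" "$shard" "$file"
}

# Usage (on a frontend with credentials loaded):
#   read -r container prefix < <(thumb_location "$url")
#   swift list --prefix "$prefix" "$container"
```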

Find an object in swift

Sometimes it's useful to be able to work out which swift container an object will be in. Where containers are sharded (e.g. Commons), the container is chosen from the md5sum of the object name, un-encoded and with spaces converted to underscores. For example, printf "Café_de_Flore.jpg" | md5sum returns fddbfcc9dfba48ced649752953030d5a, from which we know that this Commons image will be in the container wikipedia-commons-local-public.fd and that its object name in that container will be f/fd/Café_de_Flore.jpg.
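The same calculation can be wrapped in a helper. This is a minimal sketch, assuming two-hex-digit shards as on Commons and a plain (not URL-encoded) file name; public_location is a hypothetical name, and the first argument is the container name prefix such as wikipedia-commons.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<container> <object>" for an original file,
# using the md5 of the un-encoded name with spaces replaced by underscores.
public_location() {
  local prefix=$1 name=${2// /_}   # e.g. wikipedia-commons; spaces -> underscores
  local hash
  hash=$(printf '%s' "$name" | md5sum | cut -c1-32)
  printf '%s-local-public.%s %s/%s/%s\n' \
    "$prefix" "${hash:0:2}" "${hash:0:1}" "${hash:0:2}" "$name"
}

# Usage:
#   read -r container object < <(public_location wikipedia-commons "Café de Flore.jpg")
#   swift stat "$container" "$object"
```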

Find a deleted object in swift

Deleted objects are stored in separate containers, with objects named for the sha of their contents. So to work out where a deleted object is stored, you first have to query the filearchive table for the fa_storage_key field of the row whose fa_name is the un-encoded file name with spaces converted to underscores. For example, given an fa_storage_key of r0vsvm23suew6epzh4pm39iuy7dzyh4.png, the object would be called r/0/v/r0vsvm23suew6epzh4pm39iuy7dzyh4.png in the wikipedia-commons-local-deleted.r0 container.

For a worked example of using this in practice, see phab:T350020 and the linked code snippets.
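The mapping from fa_storage_key to container and object name in the example above is purely mechanical, so it can be sketched as a helper. deleted_location is a hypothetical name, and the sketch assumes the deleted container is sharded on the first two characters of the key, as in the Commons example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<container> <object>" for a deleted file,
# given the container name prefix and the filearchive fa_storage_key.
deleted_location() {
  local prefix=$1 key=$2
  printf '%s-local-deleted.%s %s/%s/%s/%s\n' \
    "$prefix" "${key:0:2}" "${key:0:1}" "${key:1:1}" "${key:2:1}" "$key"
}

# deleted_location wikipedia-commons r0vsvm23suew6epzh4pm39iuy7dzyh4.png
# prints: wikipedia-commons-local-deleted.r0 r/0/v/r0vsvm23suew6epzh4pm39iuy7dzyh4.png
```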

Find an object in the backups database

Sometimes it can be useful to see what the backups system knows about an object; for that, see Media_storage/Backups#Querying_files

Show specific info about a container or object

Note - these instructions will only show containers or objects the account has permission to see.

  • log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
  • ask for statistics about all containers: swift stat
  • ask for statistics about the container: swift stat wikipedia-commons-local-thumb.a2
  • ask for statistics about an object in a container: swift stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg

Delete a container or object

Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!

Deleting uses the same syntax as 'stat'. I recommend running stat on the object first to get the command right, then using bash quick substitution (^stat^delete^) to turn it into a delete.

  • log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
  • run swift stat on the object you want to delete
    • swift stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
  • swap stat for delete in the same command.
    • ^stat^delete^

When you call delete on a container, it first deletes all objects within the container and then deletes the container itself. Containers with lots of objects can take a very long time to delete (days or weeks).
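Because a missing object name turns an object delete into a container delete, one way to reduce the risk is a wrapper that refuses to run without both arguments and stats the object first. This is a hypothetical helper, not an existing tool; SWIFT can be set to echo for a dry run, and it assumes swift stat exits non-zero when the object does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete exactly one object, never a whole container.
SWIFT=${SWIFT:-swift}               # set SWIFT=echo for a dry run
swift_delete_object() {
  if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: swift_delete_object CONTAINER OBJECT" >&2
    return 2
  fi
  # stat first: stop here if the object is missing or the name is wrong
  "$SWIFT" stat "$1" "$2" || return 1
  "$SWIFT" delete "$1" "$2"
}

# Usage:
#   swift_delete_object wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
```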

Fine-grained object deletions with Swiftly

If you need more control over what is deleted in a container (such as only deleting objects under the /foo path), you can use the Swiftly client, specifically its fordo feature.

Example read-only command with Swiftly fordo

 swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do head "<item>"

Note that "<item>" is typed literally; swiftly replaces it with each object name in turn.

Example deletion command with Swiftly fordo

 swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do delete "<item>"

Example .swiftly.conf

 [swiftly]
 auth_user = wdqs:flink
 auth_key = [REDACTED]
 auth_url = https://thanos-swift.discovery.wmnet/auth/v1.0

More details in this phab ticket.

Setup temp url key on an account

MediaWiki uses a temporary URL key to download files; the key must be set on the mw:media account. On swift machines that report statistics you can find several .env files to "su" to each account, e.g.

 source /etc/swift/account_AUTH_mw.env
 swift post -m 'Temp-URL-Key:<your key>'
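To see what the key is used for, the sketch below derives a temp URL signature in the classic Swift way: an HMAC-SHA1 over the method, expiry timestamp and object path, joined by newlines. It is an illustration only; the account, container and object in the usage comment are hypothetical, newer Swift releases may be configured to accept other digests, and the swift client's tempurl subcommand can generate such URLs directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute a Swift temp URL signature (HMAC-SHA1 over
# "METHOD\nEXPIRES\nPATH") using openssl.
tempurl_sig() {
  local method=$1 expires=$2 path=$3 key=$4
  printf '%s\n%s\n%s' "$method" "$expires" "$path" \
    | openssl dgst -sha1 -hmac "$key" | awk '{print $NF}'
}

# Usage (hypothetical object path):
#   key=$(openssl rand -hex 32)   # one way to generate a random key
#   expires=$(( $(date +%s) + 3600 ))
#   path=/v1/AUTH_mw/example-container/example-object
#   sig=$(tempurl_sig GET "$expires" "$path" "$key")
#   echo "https://<swift url>${path}?temp_url_sig=${sig}&temp_url_expires=${expires}"
```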

Checking / Fixing container ACLs for private wikis

The setup process for private wikis does not set permissions correctly (see phab:T340189 for details). Make sure each container for the wiki has both read and write ACLs set to mw:thumbor-private,mw:media. To do this, gain access to the mw account as described above, and then check the existing ACLs:

 for i in $(swift list | grep wikipedia-FOO) #replace FOO with wiki name
   do echo "$i:" 
   swift stat "$i" | grep "ACL"
 done

And update them:

 for c in wikipedia-FOO-local-public wikipedia-FOO-local-thumb wikipedia-FOO-local-transcoded wikipedia-FOO-timeline-render
   do swift post "$c" --read-acl 'mw:thumbor-private,mw:media' --write-acl 'mw:thumbor-private,mw:media'
 done

Because both swift clusters are independent, you must make this change in both DCs.
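Reading the grep output by eye is error-prone across many containers, so the check can be sketched as a filter over swift stat output that fails unless both ACLs match exactly. acls_ok is a hypothetical helper; it assumes the "Read ACL:" / "Write ACL:" lines printed by swift stat and ACL values without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `swift stat CONTAINER` output on stdin and
# succeed only if both Read ACL and Write ACL equal the expected value.
acls_ok() {
  awk -v want="${1:-mw:thumbor-private,mw:media}" '
    BEGIN { bad = 0; seen = 0 }
    /^ *(Read|Write) ACL:/ {
      seen++
      value = $3                     # fields: "Read" "ACL:" "<value>"
      if (value != want) {
        printf "%s ACL is \"%s\", expected \"%s\"\n", $1, value, want
        bad = 1
      }
    }
    END {
      if (seen != 2) { print "did not find both ACL lines"; bad = 1 }
      exit bad
    }'
}

# Usage, with the mw account credentials loaded (FOO = wiki name):
#   for c in $(swift list | grep wikipedia-FOO); do
#     swift stat "$c" | acls_ok || echo "^^^ $c needs fixing"
#   done
```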

Individual Commands - Managing Swift

Show current swift ring layout

There is a minimum of three rings in Swift: account, object, and container. In practice, our production (MS and Thanos) clusters all have the object-1 ring too. Running the swift-ring-builder command on a builder file lists the current state of that ring. Additional rings (e.g. object-1) may be present when using storage policies.

 # swift-ring-builder /etc/swift/account.builder
 # swift-ring-builder /etc/swift/container.builder
 # swift-ring-builder /etc/swift/object.builder
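Since the set of rings varies (object-1 or other storage-policy rings), a loop over every builder file avoids missing one. This is a sketch with a hypothetical show_rings helper; RING_BUILDER is only there so the loop can be exercised without swift installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run swift-ring-builder on every *.builder file
# in a directory (default /etc/swift).
RING_BUILDER=${RING_BUILDER:-swift-ring-builder}
show_rings() {
  local dir=${1:-/etc/swift} b found=0
  for b in "$dir"/*.builder; do
    [ -e "$b" ] || continue      # glob matched nothing
    found=1
    echo "== $b"
    "$RING_BUILDER" "$b"
  done
  [ "$found" -eq 1 ]             # non-zero status if no builder files exist
}
```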

Managing the Rings

If you need to add or remove devices (or whole systems) from the swift cluster, you will need to update the rings. See Swift/Ring_Management for how to do this.

Add a proxy node to the cluster

  • Assign the swift proxy role in site.pp in puppet to make the new proxy match existing proxies in that cluster (in practice the regex-based matching in site.pp probably already does this as long as the DC team has removed a more specific role(insetup) assignment).
  • Add the host to swift::proxyhosts in puppet for ferm to know about the host
  • Add the host to swift::proxy::memcached_servers to let swift know it should consider that host for Memcached
  • Add the host into conftool-data/node/SITE.yaml so that confctl knows about the node (so you can pool it when ready)
  • Run puppet on the host twice, reboot, and run puppet again
  • Test the host
    • curl for a file that exists in swift from both a working host and the new host
    • eg: curl -o /tmp/foo -v -H "Host: upload.wikimedia.org" http://$(hostname -f)/wikipedia/commons/thumb/1/1d/1-month-old_kittens_32.jpg/800px-1-month-old_kittens_32.jpg
  • Run puppet on the remaining frontend hosts to make sure firewall rules are updated
  • Do a rolling restart of swift on the remaining frontend hosts so the config change is applied
  • Pool the new proxy (full details)
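The "test the host" step can be sketched as a comparison of HTTP status codes for the same path through a known-good proxy and the new one. This only checks status codes, not content, and compare_proxies is a hypothetical helper; the host names in the usage comment are examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a path via two proxies and require HTTP 200 from both.
CURL=${CURL:-curl}
http_status() {
  "$CURL" -s -o /dev/null -w '%{http_code}' \
    -H "Host: upload.wikimedia.org" "http://$1$2"
}
compare_proxies() {
  local path=$1 good=$2 new=$3 a b
  a=$(http_status "$good" "$path")
  b=$(http_status "$new" "$path")
  echo "$good: $a  $new: $b"
  [ "$a" = 200 ] && [ "$b" = 200 ]
}

# Usage:
#   compare_proxies /wikipedia/commons/thumb/1/1d/1-month-old_kittens_32.jpg/800px-1-month-old_kittens_32.jpg \
#     ms-fe1009.eqiad.wmnet "$(hostname -f)"
```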

Reimage/provision the ring manager proxy host

When reimaging the swift ring manager host you'll also need to force-generate new rings (otherwise puppetmaster will not be able to find /var/cache/swift_rings/new_rings.tar.bz2):

 # Verify no changes to the rings are expected
 /usr/local/bin/swift_ring_manager -o /var/cache/swift_rings/
 # force-write new rings
 /usr/local/bin/swift_ring_manager --doit --rebalance -o /var/cache/swift_rings/

Remove a failed proxy node from the cluster

  • Take the failed node out of the load balancer
  • If the host is being decommissioned, revert the "add" steps above
  • Check whether the host being removed is the swift-repl node (it'll have profile::swift::proxy::enable_swiftrepl: true in its host hieradata) or the profile::swift::stats_reporter_host node for its cluster; if so, move these roles to a newer node. It's important to make these changes at the same time as the other puppet changes, otherwise the stats_reporter produces a lot of cron spam!

Add a storage node to the cluster

As of 2023, we are moving to mounting and managing swift filesystems by their /dev/disk/by-path entry instead of relying on filesystem labels and drive ordering, which has proved fragile because drive ordering is inconsistent between boots. Migrating existing nodes would involve draining them, removing them from the rings, and reimaging them, so at the moment only new nodes are installed this way. This means that new nodes need to be matched by the swift_new_storage regex in hieradata/regex.yaml, added to the case statement in modules/install_server/files/autoinstall/scripts/partman_early_command.sh, and be set to partman/custom/ms-be_simple.cfg by the case statement in hieradata/role/common/apt_repo.yam. This should be done before DC-Ops try to install the new nodes. Once all backends have been converted, we can change the default value of profile::swift::storage::disks_by_path, but on our current hardware refresh cycle this is likely to be around the 2028-2029 FY...

New hosts now typically come out of standard dc-ops provisioning with role::swift::storage applied. After that:

  • Add the host to swift::storagehosts
  • Run puppet on the host to partition and format all disks. Once puppet is done, reboot the host and verify that all filesystems are mounted.

The host is now ready to be put in service by adding its disks to the rings (see Swift/Ring_Management for how).

Removing a storage node from the cluster

Remove (fail out) a drive from a ring

If a drive fails and will be replaced by dc-ops in the next two or three days, removing it from the rings is usually not necessary: once the disk is put back and formatted/mounted, swift will rebuild it.

However if the disk is going to be out of service for longer then it needs to be taken out of the rings to maintain the expected redundancy; see Swift/Ring_Management for how to do this.

Replacing a disk without touching the rings

If the replacement will happen soon enough, the failed disk can be left unmounted and swapped for a working one. After a successful replacement, add the disk back to the RAID controller and discard the preserved RAID cache (megacli case):

 megacli -GetPreservedCacheList -a0
 megacli -DiscardPreservedCache -L'disk_number' -a0
 megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0

On HP hosts, (hp)ssacli needs to be used instead:

 ssacli
 # then within the command's CLI
 set target controller slot=3 # could be slot=0 on older hosts
 ld all show # list all logical drives, there will be one/more marked as failed
 ld <NUMBER> modify reenable
 # hit 'y'

Check that the drive is back in the OS (e.g. with dmesg) and run puppet. The partitions and filesystems will be created. Make sure the filesystem is in /etc/fstab, then run mount -a, and finally run puppet again. Shortly afterwards, df -h output should show swift filling up the filesystem.
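The fstab and mount checks can be sketched as one helper that looks for the mount point in field 2 of /etc/fstab and of /proc/mounts. fs_status is a hypothetical name; the alternate file arguments exist only so it can be tried against sample files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a swift mount point is in fstab (rc 1 if not)
# and currently mounted (rc 2 if not).
fs_status() {
  local mnt=$1 fstab=${2:-/etc/fstab} mounts=${3:-/proc/mounts}
  # field 2 of both files is the mount point
  if ! awk -v m="$mnt" '$2 == m { f = 1 } END { exit !f }' "$fstab"; then
    echo "$mnt is not in $fstab"; return 1
  fi
  if ! awk -v m="$mnt" '$2 == m { f = 1 } END { exit !f }' "$mounts"; then
    echo "$mnt is not mounted (try mount -a)"; return 2
  fi
  echo "$mnt ok"
}

# Usage:
#   fs_status /srv/swift-storage/sdc1
```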

The new drive may not come back with the same letter as before; in these cases a reboot of the host "fixes" the enumeration problem.

Cleanup fully used root filesystem

A race condition can lead to the root (/) filesystem being completely filled: a faulty disk (as determined by looking at dmesg) is automatically unmounted by swift-drive-audit, but rsync may keep writing the remaining files into /srv/swift-storage/PARTITION, which is now unmounted, so the data lands on the root filesystem and fills it up.

How to fix (downtime the host in Icinga first to avoid alerts from the swift daemons):

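 PARTITION=<e.g. sdc1>
 disable-puppet "fixing disk on host XXX TYYYYY"
 systemctl stop 'swift*'   # quoted so the shell does not expand the glob itself
 systemctl stop rsync
 umount /srv/swift-storage/$PARTITION || true
 # must print nothing: do not delete anything while the disk is still mounted
 findmnt /srv/swift-storage/$PARTITION
 # ${PARTITION:?} aborts if PARTITION is empty, instead of wiping /srv/swift-storage/*
 rm -rf /srv/swift-storage/"${PARTITION:?}"/*
 mount /srv/swift-storage/$PARTITION
 run-puppet-agent -e "same message as above" # will restart swift/rsync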

Rollover a Swift key

These are defined in private puppet in hieradata/common/profile/swift.yaml for Swift (and hieradata/common/profile/thanos/swift.yaml for Thanos). They may well be also defined elsewhere in the client's codebase - for example, MediaWiki has private/PrivateSettings.php on the deploy servers. So the rollover needs to be co-ordinated. For example, updating a Swift credential used by MediaWiki might look in outline like:

  • create new key
  • commit (with sudo) to /srv/private/hieradata/common/profile/swift.yaml on puppetmaster
  • commit to /srv/mediawiki-staging/private/PrivateSettings.php on deployment host
  • run puppet-agent on Swift frontends
  • rolling restart Swift frontends
  • deploy new credential to production - scap sync-file private/PrivateSettings.php "change message and phab number"

Rebooting backends / Puppet is failing on a recently-booted backend

Filesystems are mounted by label on swift backends, but puppet assumes that e.g. the filesystem labelled swift-sda3 is in fact on /dev/sda3; this assumption is especially picky about the SSDs, /dev/sda and /dev/sdb. Unfortunately, drives do not always appear in the same order, which then causes puppet to fail. So if a node has recently rebooted and puppet is unhappy, it's worth checking that the drive ordering, especially of /dev/sd{a,b}, is correct; similarly, check this after rebooting swift nodes. If the ordering is wrong, reboot until the drives come up in the right order. You can check by eye from df -lh output, or use this tasteful shell snippet:

  check_drive_order() {
    local i src
    for i in sd{a,b}{3,4}; do
      src=$(findmnt -n -o SOURCE LABEL="swift-$i")
      if [ "$src" != "/dev/$i" ]; then
        echo "(at least) swift-$i misplaced (mounted from: ${src:-nowhere})"; return 1
      fi
    done
  }
  check_drive_order

This returns 0 if all is well, and emits a message and returns 1 if there's a problem (and the node needs rebooting).

Rolling restart of swift frontends

Documented at Service_restarts#Swift (there's a cookbook which is typically what you want to use).

Handle alerts

Rise in 5xx responses reported from ATS

First check that it isn't simply 5xxs from thumbor (which are passed to the client via ms-swift); then the first port of call is to do a rolling restart of the swift frontends (documented above).

mediawiki originals uploads

This alert fires whenever originals uploads exceed a threshold. It is used as an early warning system for excessive upload traffic, which can throw off capacity planning calculations.

Note that typically nothing is immediately on fire, although such alerts should be investigated and followed up on. A good next step is to check Commons for the latest uploads and see whether any particularly active users or bots stand out.
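One quick way to see who stands out is to count uploaders in the most recent upload log entries. This is a sketch that assumes the MediaWiki Action API (list=logevents with letype=upload) and jq are available; the 500-entry window is an arbitrary choice and the User-Agent string is a placeholder to replace with your own contact details.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: fetch recent Commons upload log users and count them.
top_counts() {
  # reads one name per line, prints "<count> <name>" for the top N (default 10)
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

recent_uploaders() {
  curl -s -A 'swift-howto-check (your-email@example.org)' \
    'https://commons.wikimedia.org/w/api.php?action=query&list=logevents&letype=upload&leprop=user&lelimit=500&format=json' \
    | jq -r '.query.logevents[].user'
}

# Usage:
#   recent_uploaders | top_counts 10
```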

See also bug T248151 for an example of excessive traffic from a bot.