Swift/How To
General Prep
Nearly all of these commands are best executed from a swift proxy and stats reporter host (e.g. ms-fe1005.eqiad.wmnet or ms-fe2005.codfw.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the specific users' passwords we have at Wikimedia are accessible in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.
All of these tasks are explained in more detail and with far more context in the official swift documentation. This page is intended as a greatly restricted version of that information directed specifically at tasks we'll need to carry out in the Wikimedia cluster. For this reason many options and caveats have been left out, and things like the authentication type are assumed to be what's correct for our installation. It may or may not be useful for a wider audience.
Set up an entire swift cluster
This is documented elsewhere: Swift/Setup_New_Swift_Cluster
Individual Commands - interacting with Swift
Impersonating a specific swift account
Some swift proxy servers (e.g. ms-fe2009 / ms-fe1009) have an extra stats reporter profile (git grep stats_reporter_host hieradata/ in puppet to find out), which records account credentials in /etc/swift/account_*
You can do . /etc/swift/account_AUTH_dispersion.env
to read the dispersion account's authentication URL (ST_AUTH), username (ST_USER) and key (ST_KEY) into your environment, which the swift binary will then use to authenticate your commands, similar to nova/designate/openstack used in labs administration.
If you need access to mediawiki files use . /etc/swift/account_AUTH_mw.env
as the credentials file. BE CAREFUL WITH THIS as you will have the same access that mediawiki has.
The same environment variables can be used to access swift via its command line client from anywhere in the cluster; the variables are:
ST_AUTH=https://<swift url>/auth/v1.0 ST_USER=<account>:<user> ST_KEY=<password>
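For example, to impersonate the dispersion account and poke around (assumes you are on a host that has the .env files, i.e. a stats reporter host):
# read ST_AUTH/ST_USER/ST_KEY for the dispersion account into the current shell
. /etc/swift/account_AUTH_dispersion.env
# the swift client now authenticates as that account
swift stat
swift list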
Create a new swift account (Thanos Cluster)
- Add the new user in puppet: hieradata/common/profile/thanos/swift.yaml
- Add fake credentials in labs/private: hieradata/common/profile/thanos/swift.yaml
- Add real credentials in hieradata/common/profile/thanos/swift.yaml in private puppet
- Rolling restart of swift-proxy.service on thanos-fe* servers
Note: Check with T279621, since in the future new accounts will be created on the M.O.S.S cluster
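A sketch of the rolling-restart step with cumin (the thanos-fe* host selection and the batching values are assumptions, adjust as needed; a roll-restart cookbook may also cover these hosts):
# restart one proxy at a time, sleeping 30s between hosts
sudo cumin -b 1 -s 30 'thanos-fe*' 'systemctl restart swift-proxy.service'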
Create a container
You create a container by POSTing to a container name that doesn't exist yet, and you modify a container by POSTing to an existing one. Only users with admin rights (aka users in the .admin group) are allowed to create or modify containers.
Run the following commands on any host with the swift binaries installed (any host in the swift cluster or iron)
- create a container with default permissions (r/w by owner and nobody else)
- swift post container-name
- create a container with global read permissions
- swift post -r '.r:*' container-name
- create a container with a specific storage policy
- swift post -H 'X-Storage-Policy: POLICY' container-name
- The WikimediaMaintenance extension's filebackend/setZoneAccess.php script creates most wiki-specific containers, and SwiftFileBackend gives its own user read and write privileges along with global read for public containers.
List containers and contents
It's easiest to do all listing from a frontend host on the cluster you wish to list.
list of all containers
- ask for a listing of all containers visible to the account:
swift list
list the contents of one container
- ask for a listing of the container:
swift list wikipedia-commons-local-thumb.a2
list specific objects within a container
example: look for all thumbnails for the file 1-month-old_kittens_32.jpg
- start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
- Pull out the project, "language", thumb, and shard to form the correct container and add -local into the middle
- eg wikipedia-commons-local-thumb.1d
- Note - only some containers are sharded:
grep shard /etc/swift/proxy-server.conf
to find out if your container should be sharded - unsharded containers leave off the shard eg wikipedia-commons-local-thumb
- ask swift for a listing of the correct container with the --prefix option (it must come before the container name)
swift list --prefix 1/1d/1-month wikipedia-commons-local-thumb.1d
- note that --prefix is anchored to the beginning of the object name; it doesn't have to be a complete name.
Find an object in swift
Sometimes it's useful to be able to work out which container in swift an object will be in; where containers are sharded (e.g. Commons) the container an object will be in is based on the md5sum of the object name, un-encoded and with spaces converted to underscores - e.g. printf "Café_de_Flore.jpg" | md5sum
returns fddbfcc9dfba48ced649752953030d5a
, from which we know the container of this commons image will be wikipedia-commons-local-public.fd
and the object name in that container will be f/fd/Café_de_Flore.jpg
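A small shell sketch of the same calculation, using the example filename above (the name must already have spaces converted to underscores):
NAME="Café_de_Flore.jpg"
MD5=$(printf '%s' "$NAME" | md5sum | cut -c1-32)
echo "container: wikipedia-commons-local-public.${MD5:0:2}"
echo "object:    ${MD5:0:1}/${MD5:0:2}/${NAME}"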
Find a deleted object in swift
Deleted objects are stored in separate containers, with objects named for the sha of their contents. So to work out where a deleted object is stored, you have to first query the filearchive
table for the fa_storage_key
field (where fa_storage_key
is the un-encoded object name with spaces converted to underscores). For example, given the fa_storage_key
of r0vsvm23suew6epzh4pm39iuy7dzyh4.png
, the object would be called r/0/v/r0vsvm23suew6epzh4pm39iuy7dzyh4.png
in the wikipedia-commons-local-deleted.r0
container.
For a worked example of using this in practice, see phab:T350020 and the linked code snippets.
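Similarly, a sketch that turns an fa_storage_key into its deleted-object location, using the example key above:
KEY="r0vsvm23suew6epzh4pm39iuy7dzyh4.png"
echo "container: wikipedia-commons-local-deleted.${KEY:0:2}"
echo "object:    ${KEY:0:1}/${KEY:1:1}/${KEY:2:1}/${KEY}"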
Find an object in the backups database
Sometimes it can be useful to see what the backups system knows about an object; for that, see Media_storage/Backups#Querying_files
Show specific info about a container or object
Note - these instructions will only show containers or objects the account has permission to see.
- log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
- ask for statistics about all containers:
swift stat
- ask for statistics about the container:
swift stat wikipedia-commons-local-thumb.a2
- ask for statistics about an object in a container:
swift stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg
Delete a container or object
Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!
Deleting uses the same syntax as 'stat'. I recommend running stat on an object to get the command right, then using history substitution (^stat^delete^ in bash)
- log into a swift frontend host on the cluster you want to use, set ST_AUTH/ST_USER/ST_KEY
- run swift stat on the object you want to delete:
swift stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
- swap stat for delete in the same command.
^stat^delete^
When you call delete for a container it will first delete all objects within the container and then delete the container itself. Containers with lots of objects can take a very long time to delete (days/weeks).
Fine-grained object deletions with Swiftly
If you need more control over what is deleted in a container (such as only deleting objects in the /foo path) you can use the Swiftly client, specifically its fordo feature.
Example read-only command with Swiftly fordo
swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do head "<item>"
Note that "<item>"
is literal.
Example deletion command with Swiftly fordo
swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do delete "<item>"
Example .swiftly.conf
[swiftly]
auth_user = wdqs:flink
auth_key = [REDACTED]
auth_url = https://thanos-swift.discovery.wmnet/auth/v1.0
More details in this phab ticket.
Setup temp url key on an account
MediaWiki makes use of a temporary url key to download files; the key must be set on the mw:media account. On swift machines that report statistics you can find several .env files to "su" to each account, e.g.
source /etc/swift/account_AUTH_mw.env
swift post -m 'Temp-URL-Key:<your key>'
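To check whether a key is already set, it should appear as account metadata in the stat output (a sketch; note that the key value has to be coordinated with what the MediaWiki side is configured to use):
source /etc/swift/account_AUTH_mw.env
swift stat | grep -i temp-url-key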
Checking / Fixing container ACLs for private wikis
The setup process for private wikis does not get permissions correct (see phab:T340189 for details). Make sure each container for the wiki has both read and write ACLs set to mw:thumbor-private,mw:media
. To do this, gain access to the mw
account as described above, and then you can check the existing ACLs:
for i in $(swift list | grep wikipedia-FOO)  # replace FOO with wiki name
do
  echo "$i:"
  swift stat "$i" | grep "ACL"
done
And update them:
for c in wikipedia-FOO-local-public wikipedia-FOO-local-thumb wikipedia-FOO-local-transcoded wikipedia-FOO-timeline-render
do
  swift post "$c" --read-acl 'mw:thumbor-private,mw:media' --write-acl 'mw:thumbor-private,mw:media'
done
Because both swift clusters are independent, you must make this change in both DCs.
Individual Commands - Managing Swift
Show current swift ring layout
There are a minimum of three rings in Swift: account, object, and container. Additional rings (e.g. object-1) are present when storage policies are used; in practice, our production (MS and Thanos) clusters all have the object-1 ring too. The swift-ring-builder command with a builder file will list the current state of the ring.
# swift-ring-builder /etc/swift/account.builder
# swift-ring-builder /etc/swift/container.builder
# swift-ring-builder /etc/swift/object.builder
Managing the Rings
If you need to add or remove devices (or whole systems) from the swift cluster, you will need to update the rings. See Swift/Ring_Management for how to do this.
Add a proxy node to the cluster
- Assign the swift proxy role in site.pp in puppet to make the new proxy match existing proxies in that cluster (in practice the regex-based matching in site.pp probably already does this as long as the DC team has removed a more specific role(insetup) assignment).
- Add the host to swift::proxyhosts in puppet for ferm to know about the host
- Add the host to swift::proxy::memcached_servers to let swift know it should consider that host for Memcached
- Add the host into conftool-data/node/SITE.yaml so that confctl knows about the node (so you can pool it when ready)
- Run puppet on the host twice, reboot, and run puppet again
- Test the host
- curl for a file that exists in swift from both a working host and the new host
- eg:
curl -o /tmp/foo -v -H "Host: upload.wikimedia.org" http://$(hostname -f)/wikipedia/commons/thumb/1/1d/1-month-old_kittens_32.jpg/800px-1-month-old_kittens_32.jpg
- Run puppet on the remaining frontend hosts to make sure firewall rules are updated
- Roll restart swift on the remaining frontend hosts for config change to be applied
- Pool the new proxy (full details)
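A sketch of the pooling step with confctl (the ms-feNNNN hostname is a placeholder for the new proxy):
# check the node's current state
sudo confctl select 'name=ms-feNNNN.eqiad.wmnet' get
# pool it
sudo confctl select 'name=ms-feNNNN.eqiad.wmnet' set/pooled=yes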
Reimage/provision the ring manager proxy host
When reimaging the swift ring manager host you'll also need to force-generate new rings (otherwise puppetmaster will not be able to find /var/cache/swift_rings/new_rings.tar.bz2):
# Verify no changes to the rings are expected
/usr/local/bin/swift_ring_manager -o /var/cache/swift_rings/
# force-write new rings
/usr/local/bin/swift_ring_manager --doit --rebalance -o /var/cache/swift_rings/
Remove a failed proxy node from the cluster
- Take the failed node out of the load balancer (see the depool sketch after this list)
- If the host is being decommissioned, revert the "add" steps above
- Check if the host being removed is the swift-repl node (it'll have profile::swift::proxy::enable_swiftrepl: true in its host's hieradata) or the profile::swift::stats_reporter_host node for its cluster; if so, then move these to a newer node. It's important to do these changes at the same time as the other puppet changes, otherwise the stats_reporter produces a lot of cron spam!
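A sketch of the depool step from the first item above (ms-feNNNN is a placeholder; if the host is still up and has the usual conftool wrapper scripts, running depool on the host itself should also work):
sudo confctl select 'name=ms-feNNNN.eqiad.wmnet' set/pooled=no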
Add a storage node to the cluster
As of 2023, we are moving to mounting and managing swift filesystems by their /dev/disk/by-path entry instead of relying on fs-labels and drive ordering (which has proved to be fragile, since drive ordering is inconsistent between boots); migrating existing nodes would involve draining, removing from the rings, and reimaging, so at the moment we are only installing new nodes this way. This means that new nodes need to be matched by the swift_new_storage regex in hieradata/regex.yaml, added to the case statement in modules/install_server/files/autoinstall/scripts/partman_early_command.sh, and be set to partman/custom/ms-be_simple.cfg by the case statement in hieradata/role/common/apt_repo.yam; this should be done before DC-Ops try to install the new nodes. Once all backends have been migrated, we can change the default value of profile::swift::storage::disks_by_path, but on our current hardware refresh cycle, this is likely to be around the 2028-2029 FY...
New hosts now typically come out of standard dc-ops provisioning with role::swift::storage applied. After that:
- Add the host to swift::storagehosts
- Run puppet on the host to partition and format all disks; after puppet is done, reboot the host and verify all filesystems are mounted.
The host is now ready to be put in service by adding its disks to the rings (see Swift/Ring_Management for how).
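A quick way to check the result from the host itself (a sketch; the /srv/swift-storage mount-point layout is the one referenced elsewhere on this page):
# count the mounted swift filesystems and eyeball their sizes
grep -c /srv/swift-storage /proc/mounts
df -h | grep /srv/swift-storage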
Removing a storage node from the cluster
- Make sure the host is out of the rings (see Swift/Ring_Management for how).
- Remove the host from swift::storagehosts
- Decommission the host as usual (see Server_Lifecycle#Reclaim_to_Spares_OR_Decommission for details)
Remove (fail out) a drive from a ring
If a drive fails and it will be replaced by dc-ops in the next two/three days then usually removing it from the rings is not necessary: once the disk is put back and formatted/mounted then swift will rebuild it.
However if the disk is going to be out of service for longer then it needs to be taken out of the rings to maintain the expected redundancy; see Swift/Ring_Management for how to do this.
Replacing a disk without touching the rings
If the time span for replacement is short enough the failed disk can be left unmounted and swapped with a working one. After successful replacement the disk should be added back to the raid controller and the raid cache discarded (megacli case):
megacli -GetPreservedCacheList -a0
megacli -DiscardPreservedCache -L'disk_number' -a0
megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0
In the case of an HP host, (hp)ssacli needs to be used instead:
ssacli
# then within the command's CLI
set target controller slot=3   # could be slot=0 on older hosts
ld all show                    # list all logical drives, there will be one/more marked as failed
ld <NUMBER> modify reenable    # hit 'y'
Check that the drive is back in the OS (e.g. with dmesg) and run puppet: the partitions and filesystems will be created. Make sure the filesystem is in /etc/fstab, then run mount -a, and finally run puppet again. You'll see swift filling up the filesystem shortly afterwards by checking df -h output.
It may happen that the new drive doesn't come back with the same letter as before; in these cases a reboot of the host "fixes" this enumeration problem.
Cleanup fully used root filesystem
There is a race condition that can lead to the / filesystem filling up: a faulty disk (as determined by looking at dmesg) is automatically unmounted by swift-drive-audit. When this happens, rsync might keep writing the remaining files into /srv/swift-storage/PARTITION, which is now unmounted, so the data lands on the root filesystem and fills it up.
How to fix (downtime the host first in Icinga to avoid swift daemon alerts):
PARTITION=<e.g. sdc1>
disable-puppet "fixing disk on host XXX TYYYYY"
systemctl stop swift*
systemctl stop rsync
umount /srv/swift-storage/$PARTITION || true
rm -rf /srv/swift-storage/${PARTITION}/*
mount /srv/swift-storage/$PARTITION
run-puppet-agent -e "same message as above"  # will restart swift/rsync
Rollover a Swift key
These are defined in private puppet in hieradata/common/profile/swift.yaml for Swift (and hieradata/common/profile/thanos/swift.yaml for Thanos). They may well also be defined elsewhere in the client's codebase - for example, MediaWiki has private/PrivateSettings.php on the deploy servers. So the rollover needs to be co-ordinated. For example, updating a Swift credential used by MediaWiki might look in outline like:
- create new key
- commit (with sudo) to /srv/private/hieradata/common/profile/swift.yaml on puppetmaster
- commit to /srv/mediawiki-staging/private/PrivateSettings.php on deployment host
- run puppet-agent on Swift frontends
- rolling restart Swift frontends
- deploy new credential to production - scap sync-file private/PrivateSettings.php "change message and phab number"
Rebooting backends / Puppet is failing on a recently-booted backend
Filesystems are mounted by label on swift backends. But puppet assumes that e.g. the filesystem labelled swift-sda3
is in fact on /dev/sda3
; this assumption is especially picky about the SSDs, /dev/sda
, and /dev/sdb
. Unfortunately, drives do not always appear in the same order, which then causes puppet to fail. So if a node has recently rebooted and puppet is unhappy, it's worth checking that the drive ordering (especially of /dev/sd{a,b}) is correct; similarly, check this when rebooting swift nodes. If not, reboot until the drives come up in the right order. You can check by eye from df -lh
output, or use this tasteful shell snippet:
for i in sd{a,b}{3,4} ;
do if [ $(findmnt -o SOURCE -n LABEL="swift-$i") != "/dev/$i" ];
then echo "(at least) swift-$i misplaced" ; break -1 2>/dev/null ;
fi ;
done
This returns 0 if all is well, and emits a message and returns 1 if there's a problem (and the node needs rebooting).
Rolling restart of swift frontends
Documented at Service_restarts#Swift (there's a cookbook which is typically what you want to use).
Update internal TLS certificates
These last for 5 years, so don't need updating very often. The documentation for the process is at Cergen#Update_a_certificate, but, briefly:
- clear the old certificate out of the puppet CA (otherwise it won't sign the new one): puppet cert clean swift_eqiad (or codfw)
- remove the old CRT and CSR files from modules/secret/secrets/certificates/swift_eqiad/
- run cergen -c swift_eqiad [...] to generate new CSR and CRT, git commit them.
- copy the new CRT file into modules/profile/files/ssl/ (with a slightly different filename - lose the .pem from the end) in production-puppet, get reviewed & merge
- run puppet agent on frontends to deploy (which automatically refreshes envoy)
You can check the state of the deployed cert with something like
openssl s_client -connect ms-fe1009.eqiad.wmnet:443 -showcerts </dev/null 2>/dev/null | openssl x509 -text -noout
Handle alerts
Rise in 5xx responses reported from ATS
First check that it isn't simply 5xxs from thumbor (which are passed to the client via ms-swift); then the first port of call is to do a rolling restart of the swift frontends (documented above).
mediawiki originals uploads
This alert fires whenever originals uploads exceed a threshold; it is used as an early warning system for excessive upload traffic, which can throw off capacity planning calculations.
Note that typically nothing is immediately on fire, although such alerts should be investigated and followed up on. A good next step is to check commons for the latest uploads and see whether particularly active users or bots stand out.
See also T248151 for an example of excessive traffic from a bot.