Portal:Data Services/Admin/Runbooks/Create an NFS server
NFS is hosted on physical hardware as well as virtual servers. As of 2022-01-24 we are transitioning most NFS workloads to virtual servers. If a server will only provide files to a particular project, the new server should be built inside that project; if it is providing shared services to multiple projects (e.g. 'scratch') then it should reside in the cloudinfra-nfs project.
The dumps servers (currently labstore1006/1007, soon to be clouddumps1001/1002) will remain on hardware for some time.
See also Portal:Data_Services/Admin/Shared_storage.
Creating a new server relies on a spicerack cookbook. The cookbook is in the wmcs branch and is typically run locally rather than on a cumin host. Details about setting up a local cookbook exec environment can be found at Wikimedia_Cloud_Services_team/EnhancementProposals/Operational_Automation#Local_setup.
Create a Server for a new service
Typically you will want a separate server for each NFS volume; multiple volumes per server might work but are largely untested.
Each NFS server consists of a persistent cinder volume, a service IP, a service name, and a replaceable VM. Before building the new server you'll need to take the following steps:
- Decide the name of the new volume. This will be hard to change later!
- Decide how much storage is needed for the new volume.
- Decide what flavor of server you need. For light use a 1-core/1-GB server may be sufficient, but for heavy concurrent use a larger flavor is needed.
- Check and adjust the 'gigabytes' quota to provide for the new NFS volume.
- Check and adjust the 'cores' and 'RAM' quotas to support the new server.
- Make sure that a service domain exists in the target project: svc.<projectname>.eqiad1.wikimedia.cloud
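The quota and service-domain checks above can be partly scripted. The following is a minimal sketch, assuming the standard `openstack` command-line client is installed and credentials with access to the target project are loaded. The function names `svc_domain` and `preflight` are hypothetical and not part of any existing tooling.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative only.

# Print the service domain expected for a project, following the
# svc.<projectname>.eqiad1.wikimedia.cloud pattern.
svc_domain() {
    local project="$1"
    if [ -z "$project" ]; then
        echo "usage: svc_domain <projectname>" >&2
        return 1
    fi
    printf 'svc.%s.eqiad1.wikimedia.cloud\n' "$project"
}

# Show the current quotas for manual review, then print the domain
# that should already exist in the project.
# Assumes the openstack CLI and suitable credentials are available.
preflight() {
    local project="$1"
    openstack quota show "$project" || return 1
    echo "Confirm that this domain exists: $(svc_domain "$project")"
}
```

Running `preflight <target project>` prints the quotas so that 'gigabytes', 'cores', and 'RAM' can be compared against the planned volume size and flavor. The sketch does not adjust any quota itself.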
The following command will create the VM, volume, service name, and service IP:
$ cookbook -c ~/.config/spicerack/cookbook_config.yaml wmcs.nfs.add_server --create-storage-volume-size <size in GB> --project <target project> --prefix <name of volume>-nfs --flavor <server flavor id> --image <glance image id> --network 7425e328-560c-4f00-8e99-706f3fb90bb4 --service-ip <name of volume>
As of 2022-02-15 a bug in spicerack will result in a failure if the 'nfs' security group does not exist in the target project. Nevertheless the group is created, and a second run should work fine.
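Because a second run is expected to succeed once the security group exists, the retry can be wrapped in a small helper. This is only a sketch: `run_with_one_retry` is a hypothetical function, not part of spicerack or the cookbooks, and it retries on any failure, not just the missing-security-group case.

```shell
#!/bin/bash
# Run a command; if it fails, run it exactly one more time.
# Hypothetical helper, not part of spicerack or the wmcs cookbooks.
run_with_one_retry() {
    if "$@"; then
        return 0
    fi
    echo "First attempt failed; retrying once: $*" >&2
    # The exit status of the second attempt becomes the function's status.
    "$@"
}
```

To use it, prefix the command above with `run_with_one_retry` and keep all arguments unchanged. If the first failure has a cause other than the missing 'nfs' security group, inspect the output by hand instead of relying on the retry.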
The newly created server will also run an nfs-exportd service to maintain exports to the new volume. The behavior of that service is configured via the puppet file nfs-mounts.yaml.erb, and the resulting export definitions can be found in /etc/exports.d.
Create a replacement server for an existing service
To upgrade or replace the VM hosting a given NFS service, first create a detached server. This will contain all the necessary services but will NOT create a service name, a service IP, or a cinder volume. Instead it creates a VM available for failover from an existing server with storage and service name attached:
$ cookbook -c ~/.config/spicerack/cookbook_config.yaml wmcs.nfs.add_server --project <target project> --prefix <name of volume> <name of volume> --flavor <server flavor id> --image <glance image id> --network 7425e328-560c-4f00-8e99-706f3fb90bb4
Note that the omission of --create-storage-volume-size prevents creation and attachment of the cinder volume, and the omission of --service-ip prevents the creation of a new service name or IP.
NFS service failover
For a particular NFS volume, service can be moved from an existing server (likely created using the command in the 'new service' section above) to a passive server (likely created using the 'replacement server' section above) like this:
$ cookbook -c ~/.config/spicerack/cookbook_config.yaml wmcs.nfs.migrate_service --project cloudinfra-nfs --from-host-id <current server ID> --to-host-id <future server id>
Most clients will handle that change gracefully due to the consistent name and IP. Some clients may seize up or otherwise misbehave if they are in the middle of file activity during the failover.
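After a failover it can be useful to check on a client which mount points are served by the service name, so that each one can be tested by hand. The sketch below assumes a Linux client where mounts are listed in /proc/mounts with the NFS source in `host:/path` form. The function name `nfs_mounts_from` is hypothetical.

```shell
#!/bin/bash
# List mount points whose NFS source is the given server name.
# Reads /proc/mounts by default; another file can be passed as the
# second argument (useful for testing).
nfs_mounts_from() {
    local server="$1"
    local mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    awk -v prefix="${server}:" \
        '$3 ~ /^nfs/ && index($1, prefix) == 1 { print $2 }' \
        "$mounts_file"
}
```

For example, `nfs_mounts_from <service name>` prints one mount point per line. Listing a directory on each of them is a quick way to spot a client that has seized up.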