Portal:Data Services/Admin/Runbooks/Create an NFS server

From Wikitech
The procedures in this runbook require admin permissions to complete.


NFS is hosted on physical hardware as well as virtual servers. As of 2022-01-24 we are transitioning most NFS workloads to virtual servers. If a server will only provide files to a particular project, the new server should be built inside that project; if it is providing shared services to multiple projects (e.g. 'scratch') then it should reside in the cloudinfra-nfs project.

The dumps servers will remain on hardware for some time.

See also Portal:Data_Services/Admin/Shared_storage


Creating a new server relies on a spicerack cookbook. Cookbooks can be executed from the cloudcumin hosts or from your local machine.

Create a Server for a new service

Typically you will want a separate server for each NFS volume; multiple volumes per server might work but that setup is largely untested.

Each NFS server consists of a persistent cinder volume, a service IP, a service name, and a replaceable VM. Before building the new server you'll need to take the following steps:

  • Decide the name of the new volume. This will be hard to change later!
  • Decide how much storage is needed for the new volume.
  • Decide what flavor of server you need. For light access a 1-core/1-GB server may be sufficient, but for scaled concurrent use a larger flavor is needed.
  • Check and adjust the 'gigabytes' quota to provide for the new NFS volume
  • Check and adjust 'cores' and 'RAM' to support the new server
  • Make sure that a service domain exists in the target project: svc.<projectname>.eqiad1.wikimedia.cloud
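As a sketch of the quota check in the steps above, the snippet below compares a planned volume size against hypothetical quota numbers. The values are illustrative only; the real numbers come from running `openstack quota show <target project>` with admin credentials.

```shell
# Pre-flight quota check with hypothetical numbers (illustrative only).
# Real values come from: openstack quota show <target project>
QUOTA_GB=1000        # current 'gigabytes' quota in the target project
USED_GB=800          # gigabytes already consumed by existing volumes
NEW_VOLUME_GB=300    # size planned for the new NFS volume

if [ $((USED_GB + NEW_VOLUME_GB)) -gt "$QUOTA_GB" ]; then
    echo "quota increase needed: $((USED_GB + NEW_VOLUME_GB - QUOTA_GB)) GB short"
else
    echo "quota is sufficient"
fi
```

The same arithmetic applies to the 'cores' and 'RAM' quotas when sizing the VM flavor.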

The following command will create the VM, volume, service name, and service IP:

$ cookbook wmcs.nfs.add_server --create-storage-volume-size <size in GB> --project <target project> --prefix <name of volume>-nfs --flavor <server flavor id> --image <glance image id> --network 7425e328-560c-4f00-8e99-706f3fb90bb4 --service-ip <name of volume>
As of 2022-02-15, a bug in spicerack causes the cookbook to fail if the 'nfs' security group does not exist in the target project. The group is nevertheless created during the failed run, so a second run should work fine.
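For concreteness, here is how the invocation might be assembled for a hypothetical 'scratch' volume in cloudinfra-nfs. The flavor and image names are placeholders, not real IDs; only the network ID comes from the command above.

```shell
# Hypothetical invocation for a 300 GB 'scratch' volume.
# Flavor and image are placeholders; substitute real IDs from OpenStack.
PROJECT=cloudinfra-nfs
VOLUME=scratch
SIZE_GB=300
FLAVOR=g3.cores1.ram2.disk20    # placeholder flavor name
IMAGE=debian-11.0-bullseye      # placeholder image name

CMD="cookbook wmcs.nfs.add_server --create-storage-volume-size ${SIZE_GB} --project ${PROJECT} --prefix ${VOLUME}-nfs --flavor ${FLAVOR} --image ${IMAGE} --network 7425e328-560c-4f00-8e99-706f3fb90bb4 --service-ip ${VOLUME}"
echo "$CMD"
```

Note that the `--prefix` is the volume name with an `-nfs` suffix, while `--service-ip` is the bare volume name.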

The newly created server will also run an nfs-exportd service to maintain exports for the new volume. That service's behavior is configured via the puppet template nfs-mounts.yaml.erb, and the resulting export definitions can be found in /etc/exports.d
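For orientation, an entry written by nfs-exportd into /etc/exports.d might look roughly like the following. The path, client range, and export options here are hypothetical; the authoritative source is the nfs-mounts.yaml.erb puppet template.

```
# hypothetical export for a 'scratch' volume; all values are illustrative
/srv/scratch 172.16.0.0/21(rw,sync,no_subtree_check,root_squash)
```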

Create a replacement server for an existing service

To upgrade or replace the VM hosting a given NFS service, first create a detached server. This will contain all the necessary services but will NOT create a service name, a service IP, or a cinder volume. Instead it creates a VM available for failover from an existing server with storage and service name attached:

$ cookbook wmcs.nfs.add_server --project <target project> --prefix <name of volume>-nfs --flavor <server flavor id> --image <glance image id> --network 7425e328-560c-4f00-8e99-706f3fb90bb4

Note that the omission of --create-storage-volume-size prevents creation and attachment of the cinder volume, and the omission of --service-ip prevents the creation of a new service name or IP.
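A hypothetical detached-server invocation for replacing the VM behind a 'scratch' volume might be assembled like this (flavor and image are placeholders). The point of the sketch is which flags are absent: no volume size and no service IP.

```shell
# Hypothetical replacement-VM invocation: note no --create-storage-volume-size
# and no --service-ip. Flavor and image are placeholders.
PROJECT=cloudinfra-nfs
VOLUME=scratch
FLAVOR=g3.cores2.ram4.disk20    # placeholder flavor name
IMAGE=debian-11.0-bullseye      # placeholder image name

CMD="cookbook wmcs.nfs.add_server --project ${PROJECT} --prefix ${VOLUME}-nfs --flavor ${FLAVOR} --image ${IMAGE} --network 7425e328-560c-4f00-8e99-706f3fb90bb4"
echo "$CMD"
```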

NFS service failover

Be careful! On 2024-03-26 the cookbook formatted the filesystem of the toolsbeta-nfs volume, requiring a restore from backups. gerrit:1014543 should prevent that from happening again, but the fix has not yet been properly tested.

For a particular NFS volume, service can be moved from an existing server (likely created using the command in the 'new service' section above) to a passive server (likely created using the 'replacement server' section above) like this:

$ cookbook wmcs.nfs.migrate_service --project cloudinfra-nfs --from-host-id <current server ID> --to-host-id <future server id>

Most clients will handle that change gracefully due to the consistent name and IP. Some clients may seize up or otherwise misbehave if they are in the middle of file activity during the failover.