Portal:Cloud VPS/Admin/notes/NAT loophole/NFS

Some specific ideas about NFS.

Dumps cluster

  • The dumps cluster is a single read-only (RO) share.
  • Data is generated in the prod realm. There will always be a fundamental prod --> cloud data flow.
  • The dumps servers don't care which IP the client uses: RO means no write locks.
  • Potentially, cloud VMs could access dumps using the routing_source_ip NAT address (or a floating IP). In either case, there is no need for dmz_cidr (i.e., no need for the dumps server to know the original VM address). See the sketch after this list.
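
For illustration, a minimal sketch of a dumps-side export under this model, assuming clients arrive from a single NAT/floating address (the path and address below are placeholders, not the real values):

    # Sketch only: a single RO export towards the cloud NAT address is enough;
    # no write locks, and no need to know the original VM addresses (no dmz_cidr).
    # 192.0.2.1 stands in for routing_source_ip (or a floating IP); /srv/dumps is a placeholder path.
    echo '/srv/dumps 192.0.2.1(ro,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra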

Primary cluster

  • Shares are RW.
  • Data is produced inside the cloud. There is no fundamental need for prod --> cloud or cloud --> prod data flows.
  • The primary cluster includes a share per CloudVPS project; at the time of this writing:
    • account-creation-assistance cvn dumps fastcci huggle math paws project-proxy public_scratch quarry snuggle testlabs toolsbeta tools twl utrs video wikidumpparse wmde-templates-alpha

This probably applies to the scratch cluster as well. TODO: does it apply to maps too?

Some ideas on how to address the service provided by the primary cluster follow, aiming at better prod <-> cloud network isolation.

idea 1: serve NFS from a network namespace connected to a new cloud subnet

  • The NFS hardware servers get 2 NICs: one for the control plane (ssh, management, etc.) and another for the data plane (NFS).
  • The control plane interface is kept connected to the production network, for example the cloud-host vlan, which allows us to keep using install servers, puppet, etc.
  • The data plane interface is attached to a linux network namespace. We then run the NFS daemon in this network namespace, thus adding an additional layer of isolation between prod and cloud realms.
  • We introduce a new NFS/share network, a physical vlan. The cloudgw device is the gateway for this new subnet. Let's call it cloud-nfs-share-subnet.
  • The netns in the NFS server is connected to this new cloud-nfs-share-subnet.
  • VM instances running in CloudVPS access the new cloud-nfs-share-subnet directly, without NAT.
  • NFS traffic therefore never leaves the cloud realm.

The data storage could be mounted from the ceph cluster, to avoid local NFS server storage being a single point of failure.
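
A rough sketch of the data-plane plumbing described above, assuming a dedicated VLAN on the second NIC. Interface names, the VLAN id and the addresses are placeholders, and whether the in-kernel NFS server can be fully confined to a network namespace (or whether a userspace server such as nfs-ganesha is needed instead) would have to be validated:

    # Sketch only: create a data-plane namespace and move the NFS-facing
    # VLAN interface into it (names, VLAN id and addresses are placeholders).
    ip netns add nfs-data
    ip link add link eno2 name eno2.2107 type vlan id 2107
    ip link set eno2.2107 netns nfs-data
    ip netns exec nfs-data ip addr add 192.0.2.10/24 dev eno2.2107
    ip netns exec nfs-data ip link set eno2.2107 up
    ip netns exec nfs-data ip route add default via 192.0.2.1   # cloudgw on cloud-nfs-share-subnet
    # run the NFS service inside the namespace so exports are only reachable
    # from the cloud-nfs-share-subnet (the control-plane NIC stays outside)
    ip netns exec nfs-data rpc.nfsd 8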

idea 1: pros & cons

Pros:

  • leverages the cloudgw project.
  • incremental change without major service architecture reworks.
  • simple approach, relatively low complexity from the engineering point of view.
  • labstore1004/1005 already have required NICs.
  • shares still defined using our custom code in ops/puppet.git, potentially keeping engagement from technical contributors.

Cons:

  • Hand-made setup and workflows: no openstack manila, which means no upstream cloud-native components.
  • Single point of failure for the data if using local NFS server storage. We could easily mount a volume from the ceph cluster to avoid this.

idea 2: manila with LVM driver using DHSS=false

manila is the openstack Shared Filesystems service: it provides shared filesystems as a service.

This idea would be a way to wrap our current NFS hardware servers behind the openstack manila API, while keeping full control over the network layout.

  • There should be network connectivity between VMs and the NFS server, but this is not managed by openstack manila, and should be managed by hand.
  • The manila-share service is installed and runs in the NFS server.
  • The driver expects to find a volume group named lvm-shares, which will be used by the driver for share provisioning, and should be managed by hand.
  • When a share is created using the manila API, it will be created in the NFS server, and VMs should be configured by hand with the export information generated by the manila API.

The introduction of manila here doesn't add any network isolation benefit. For that, we should also introduce the same network changes as in idea 1.

DHSS is an acronym for ‘driver handles share servers’. It distinguishes two share driver modes, depending on whether the driver handles share servers or not, i.e., whether manila should manage the final sharing NFS server itself.

The data storage could be mounted from the ceph cluster, to avoid local NFS server storage being a single point of failure, and that would be idea 4.
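
For illustration, the backend definition on the NFS server could look roughly like this (a sketch based on the upstream LVM driver documentation; the backend name and export IP are placeholders, and option names may vary between openstack releases):

    # /etc/manila/manila.conf on the NFS server running manila-share (excerpt, sketch only)
    [lvmnfs]
    share_backend_name = lvmnfs
    share_driver = manila.share.drivers.lvm.LVMShareDriver
    driver_handles_share_servers = False
    # the volume group managed by hand, as described above
    lvm_share_volume_group = lvm-shares
    # address on the NFS/share network reachable by VMs (placeholder)
    lvm_share_export_ips = 192.0.2.10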

idea 2: pros & cons

pros:

  • pure upstream cloud-native openstack solution, shares would be managed using the openstack API instead of our custom code in ops/puppet.git.
  • leverages the cloudgw project.
  • incremental change without major service architecture reworks.
  • labstore1004/1005 already have required NICs.

cons:

  • VMs still need a raw NFS connection from the VM to the NFS server share. Manila doesn't solve any of this by itself; it just provides the API for managing the share lifecycle and access control.
  • We would still need to introduce the network changes from idea 1.
  • Shares are no longer defined using our custom code in ops/puppet.git, which likely reduces engagement from technical contributors.
  • Single point of failure for the data if using local NFS server storage. We could easily mount a volume from the ceph cluster to avoid this (see idea 4).

idea 3: manila with generic driver using DHSS=true

manila is the openstack Shared Filesystems service: it provides shared filesystems as a service. If using the generic manila driver and DHSS=true, then:

  • manila will create a cinder volume (in the ceph backend cluster we already have).
  • manila will create a VM using nova (in the hypervisors we already have), and attach the cinder volume to it.
  • manila will create the share (as a network service) using the VM and the cinder volume, and export it to clients.
  • manila operates the required neutron configuration.
  • clients would need manual configuration, to use the settings generated by manila.
  • cinder volumes are attached to VMs via the hypervisor; the ceph servers are contacted by the cloudvirts.

The nova VM that manila uses needs to be procured beforehand (a glance service image), and this might clash with the way we handle VMs in CloudVPS.

It is unclear how neutron is modified by manila, and if that would clash with our existing neutron setup (flat topology, etc).

DHSS is an acronym for ‘driver handles share servers’. It distinguishes two share driver modes, depending on whether the driver handles share servers or not, i.e., whether manila should manage the final sharing NFS server itself.
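
For illustration, the share lifecycle could look roughly like this from the manila CLI (a sketch; names, ids and the client subnet are placeholders, and the exact CLI syntax may vary between releases):

    # sketch only: create a share type and share network for DHSS=true
    manila type-create generic-dhss true
    manila share-network-create --name cloud-share-net \
        --neutron-net-id <neutron network id> --neutron-subnet-id <neutron subnet id>
    # create a 100 GiB NFS share; manila builds the service VM and exports the share
    manila create NFS 100 --name example-share \
        --share-type generic-dhss --share-network cloud-share-net
    # grant access to client VMs and look up the export location to mount
    manila access-allow example-share ip <VM subnet CIDR>
    manila share-export-location-list example-share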

idea 3: pros & cons

pros:

  • pure upstream cloud-native openstack solution, shares would be managed using the openstack API instead of our custom code in ops/puppet.git.
  • leverages the ceph storage cluster, no data single point of failure.
  • leverages the nova setup we already have.
  • the whole share lifecycle is managed by manila.
  • client <-> server NFS traffic never leaves the CloudVPS virtual network.

cons:

  • it is unclear if manila would be too smart when managing neutron and somehow break our flat topology model (remember: we don't support tenant networks).
  • it is unclear how procuring an NFS-specific VM would work in this environment; we don't have workflows for this.
  • adapting several components (cinder, nova, manila, neutron) to work together could be more complex than idea #1.
  • apparently supports a maximum of 26 shares, because that's the highest number of cinder volumes that can be attached to a single VM (a KVM/nova PCI limitation). This should be investigated further.
  • shares are no longer defined using our custom code in ops/puppet.git, which likely reduces engagement from technical contributors.
  • there are IO performance concerns because data crosses many layers. We don't have any numbers yet; they should be generated in experiments.

idea 4: manila with cephfs driver using DHSS=false, using NFS as transport and thin NFS server

manila is the openstack Shared Filesystems service: it provides shared filesystems as a service. If using the cephfs manila driver and DHSS=false, then:

  • the (thin) NFS server runs both the manila-share and nfs-ganesha services.
  • the (thin) NFS export server is a client of the ceph cluster and mounts the volumes. This is done by hand (puppet).
  • manila will manage the nfs-ganesha exports in the (thin) NFS server.
  • When a share is created using the manila API, it will be created in the NFS server, and VMs should be configured by hand with the export information generated by the manila API.
  • VM clients will connect to the thin NFS server using the dedicated subnet under the cloudgw device, with no NAT involved.

DHSS is an acronym for ‘driver handles share servers’. It distinguishes two share driver modes, depending on whether the driver handles share servers or not, i.e., whether manila should manage the final sharing NFS server itself. In this model, the thin sharing NFS server is managed as a standard server using our puppet workflow.

The introduction of manila here doesn't add any network isolation benefit. For that, we should also introduce the same network changes as in idea 1.
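
For illustration, the manila-share backend on the thin NFS server could look roughly like this (a sketch based on the upstream cephfs driver documentation; names, paths and the ganesha address are placeholders):

    # /etc/manila/manila.conf on the thin NFS server (excerpt, sketch only)
    [cephfsnfs]
    share_backend_name = cephfsnfs
    share_driver = manila.share.drivers.cephfs.driver.CephFSDriver
    driver_handles_share_servers = False
    # use nfs-ganesha as the transport instead of native cephfs
    cephfs_protocol_helper_type = NFS
    cephfs_conf_path = /etc/ceph/ceph.conf
    cephfs_auth_id = manila
    # address on the cloud-nfs-share-subnet that VM clients will mount from (placeholder)
    cephfs_ganesha_server_ip = 192.0.2.10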

idea 4: pros & cons

pros:

  • pure openstack solution, shares would be managed using the openstack API instead of our custom code in ops/puppet.git.
  • leverages the ceph storage cluster.
  • leverages the nova setup we already have.
  • leverages the cloudgw project.
  • client <-> server NFS traffic never leaves the cloud realm.

cons:

  • VMs still need a raw NFS connection from the VM to the NFS server share. Manila doesn't solve any of this by itself; it just provides the API for managing the share lifecycle and access control.
  • We would still need to introduce the network changes from idea 1.
  • shares are no longer defined using our custom code in ops/puppet.git, which likely reduces engagement from technical contributors.

idea 5: serve NFS from inside the cloud 'by hand' without manila or ceph

This idea involves the following:

  • we create one or more virtual machines to serve NFS to other cloud VMs.
  • the NFS data is stored locally on the virtual machine disk, which in turn lives in ceph.
  • VM clients connect directly to the NFS server, without ever leaving the cloud virtual network.
  • There is no openstack manila or cinder involved.

The NFS server VM is managed using our current standard puppet workflows + horizon.

This approach is similar to the one used by other services like toolsdb. It is included here for completeness, but the architecture has been found to have several flaws.
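
For illustration, the by-hand workflow boils down to something like this (a sketch; paths, the hostname and the client subnet are placeholders):

    # on the NFS server VM: export a local directory to other cloud VMs (sketch only)
    exportfs -o rw,sync,no_subtree_check <VM subnet CIDR>:/srv/project-share
    # on a client VM: mount it over the cloud virtual network, no NAT involved
    mount -t nfs <NFS server VM fqdn>:/srv/project-share /mnt/project-share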

idea 5: pros & cons

pros:

  • client <-> server NFS traffic never leaves the CloudVPS virtual network.
  • it is relatively "easy" for technical contributors to help manage the system: standard puppet + horizon.
  • simple approach.

cons:

  • Hand-made setup and workflows: no openstack manila, which means no upstream cloud-native components.
  • Local VM storage is a single point of failure. No openstack cinder means we don't leverage the ceph cluster for persistent volume storage.
  • there are IO performance concerns because data might cross many layers. We don't have any numbers yet; they should be generated in experiments.

idea 6: manila with generic driver using DHSS=false

This idea involves the following:

  • we create one or more virtual machines to serve NFS to other cloud VMs.
  • the NFS data is stored in a cinder volume, which in turn is stored in the ceph cluster.
  • cinder volumes are attached to VMs via the hypervisor; the ceph servers are contacted by the cloudvirts.
  • VM clients connect directly to the NFS server virtual machine, without ever leaving the cloud virtual network.
  • We deploy the manila-share service in the NFS server virtual machine.

The NFS server VM is managed using our current standard puppet workflows + horizon.

This idea is similar to idea 3, but the DHSS=false setting should remove some of the concerns with that idea.
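
For illustration, the manila-share backend could be pointed at the pre-built NFS server VM roughly like this (a sketch based on the upstream generic driver documentation; ids, addresses and credentials are placeholders):

    # /etc/manila/manila.conf (excerpt, sketch only)
    [genericnfs]
    share_backend_name = genericnfs
    share_driver = manila.share.drivers.generic.GenericShareDriver
    driver_handles_share_servers = False
    # the NFS server VM we built by hand (placeholders)
    service_instance_name_or_id = <nova id of the NFS server VM>
    service_net_name_or_ip = <VM address reachable by manila-share>
    tenant_net_name_or_ip = <VM address reachable by client VMs>
    # credentials manila uses to ssh into the VM and manage exports
    service_instance_user = manila
    path_to_private_key = /etc/manila/ssh/id_rsa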

idea 6: pros & cons

pros:

  • pure upstream cloud-native openstack solution, shares would be managed using the openstack API instead of our custom code in ops/puppet.git.
  • leverages the ceph storage cluster, no data single point of failure.
  • leverages the nova setup we already have.
  • client <-> server NFS traffic never leaves the CloudVPS virtual network.

cons:

  • adapting several components (cinder, nova, manila) to work together could be more complex than other ideas.
  • apparently supports a maximum of 26 shares, because that's the highest number of cinder volumes that can be attached to a single VM (a KVM/nova PCI limitation). This should be investigated further.
  • shares are no longer defined using our custom code in ops/puppet.git, which likely reduces engagement from technical contributors.
  • there are IO performance concerns because data crosses many layers. We don't have any numbers yet; they should be generated in experiments.

Manila general notes

  • if using DHSS=true, manila will create standard nova VMs. We need to specify the usual instance parameters: <project id>, <nova flavor id>, <glance image id>, <neutron network id>, <cinder volume type id>, etc. (see the sketch after this list)
    • Question: how does this play with our puppet setup?
    • Question: project-local puppetmaster? self-signing?
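
For illustration, those instance parameters end up in the generic driver backend configuration, roughly like this (a sketch; all values are placeholders and option names may vary between releases):

    # /etc/manila/manila.conf (excerpt, sketch only)
    [generic_dhss]
    share_driver = manila.share.drivers.generic.GenericShareDriver
    driver_handles_share_servers = True
    # glance image and nova flavor used to build the service VMs (placeholders)
    service_image_name = manila-service-image
    service_instance_flavor_id = <nova flavor id>
    # credentials manila uses to log into the service VMs
    service_instance_user = manila
    path_to_private_key = /etc/manila/ssh/id_rsa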


See also