Portal:Cloud VPS/Admin/Network

This page explains how the CloudVPS network works, including the Neutron OpenStack component.

For the sake of explanation, this document uses the eqiad1 deployment as an example, but the same mechanisms apply to other deployments.

Network topology

Networks used by cloud servers and other elements can be broadly divided into two categories:

  • management plane networks: those used by physical servers for SSH, puppet, monitoring, etc. These are wiki-production networks, usually in the 10.x.x.x range.
  • control/data plane networks: these include the cloud-private networks, those used by CloudVPS virtual clients, and all the ingress/egress traffic at the edge of the network.

On the top-of-rack switches these networks are divided into separate VRFs, or routing-instances. This keeps them as two separate, private domains on the cloudsw devices, supporting the agreed separation of production and cloud realms. The CR (Core) routers are connected to both realms, and in some cases can forward traffic between them, although this is discouraged.

There are 3 routers involved:

  • The neutron virtual router (by means of neutron-l3-agent, neutron-linuxbridge-agent, neutron-server, etc). This router connects the internal software-defined networks to the cloud edge network.
  • The physical cloudgw router (a pair of linux servers). This router is the main gateway for the neutron virtual router, handling all CloudVPS ingress/egress traffic, and is the main network endpoint facing the public internet.
  • The physical cloudsw routers. These devices connect cloudgw to the rest of the internet, including wiki-production networks.

Datacenter network

Cloud racks vs WMF racks

True "cloud hosts" should be placed in dedicated WMCS racks, connected to a cloudsw (currently C8/D5/E4/F4 in eqiad and B1 in codfw).

Certain hosts have names starting with 'cloud', but are not solely managed by the cloud services team, and don't connect to any cloud networks. These exceptions need to be racked in non-WMCS racks and connected to regular WMF production vlans:

clouddb
clouddumps
cloudelastic
cloudweb


Aside from those, all hosts with names starting with 'cloud*' are considered cloud hosts, and should be connected as described below.


Physical Connection

Cloud hosts require only a single physical switch connection, with the exception of cloudceph* nodes.

All cloud hosts connect to multiple networks, using 802.1q vlan tagging/trunking for separation on their single link. The naming convention for host sub-interfaces is vlan<VLAN_ID>, e.g. "vlan1105".
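The naming convention above can be sketched as a small helper. This is only an illustration: the function name `subinterface_name` is hypothetical, and the range check reflects the usable 802.1q vlan ID range (1 to 4094) rather than anything specific to this deployment.

```python
def subinterface_name(vlan_id: int) -> str:
    """Return the host sub-interface name for a tagged vlan, e.g. 1105 -> 'vlan1105'."""
    # 802.1q reserves IDs 0 and 4095, so usable vlan IDs are 1..4094.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid 802.1q vlan ID: {vlan_id}")
    return f"vlan{vlan_id}"


print(subinterface_name(1105))
```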

Cloud ceph hosts are the exception to this rule, and will continue to require two links until consensus can be reached on moving them to one link. There are also certain hosts in production with 2 physical network connections, but we are in the process of migrating those to single-link (see T319184).

DC-Ops should select cloud-hosts as the vlan type when adding any cloud host to netbox.

Prior to the host reimage step some manual Netbox changes are required, to set the switch-port to mode "tagged" and add the additional vlans needed. This will be automated soon (see T346428); until then, feel free to reach out to Cathal to complete this step.

The vlan setup required for all cloud hosts is listed in the next section.

Networks by Server Type

Unless otherwise stated all cloud* hosts connect to these networks:

cloud-hosts (untagged)
cloud-private

Certain hosts differ from this as follows:

cloudvirts

cloud-hosts (untagged)
cloud-private
cloud-instance

cloudnet

cloud-hosts (untagged)
cloud-private
cloud-instance
cloud-instance-transport

cloudgw

cloud-hosts (untagged)
cloud-gw-transport
cloud-instance-transport

cephosd

Physical Link 1:
cloud-hosts (untagged)
cloud-private

Physical Link 2:
cloud-storage (untagged)

cephmon

Physical Link 1:
cloud-hosts (untagged)
cloud-private

Physical Link 2:
cloud-storage (untagged)
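As a rough sketch, the per-server-type lists above could be encoded as a lookup table, for example to work out which vlans need to be added as tagged on a switch port. The names `Attachment`, `NETWORKS_BY_TYPE` and `tagged_vlans` are hypothetical, and the sketch assumes that every network not marked "(untagged)" above is delivered tagged.

```python
from typing import NamedTuple


class Attachment(NamedTuple):
    link: int      # physical link number (1 = main link)
    network: str   # network name as used on this page
    tagged: bool   # True if delivered with an 802.1q vlan tag


def _main_link(*tagged_networks: str) -> list[Attachment]:
    """cloud-hosts untagged on link 1, plus the given tagged networks."""
    attachments = [Attachment(1, "cloud-hosts", False)]
    attachments += [Attachment(1, name, True) for name in tagged_networks]
    return attachments


# Default for any cloud* host not listed explicitly.
DEFAULT = _main_link("cloud-private")
# Ceph hosts add an untagged cloud-storage connection on a second link.
CEPH = DEFAULT + [Attachment(2, "cloud-storage", False)]

NETWORKS_BY_TYPE = {
    "cloudvirt": _main_link("cloud-private", "cloud-instance"),
    "cloudnet": _main_link("cloud-private", "cloud-instance", "cloud-instance-transport"),
    "cloudgw": _main_link("cloud-gw-transport", "cloud-instance-transport"),
    "cephosd": CEPH,
    "cephmon": CEPH,
}


def tagged_vlans(server_type: str) -> list[str]:
    """Networks that would need to be added as tagged vlans on the switch port."""
    attachments = NETWORKS_BY_TYPE.get(server_type, DEFAULT)
    return [a.network for a in attachments if a.tagged]


print(tagged_vlans("cloudnet"))
```

Server types missing from the table fall back to the default pair, matching the "unless otherwise stated" rule.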

Network / Vlan usage

Information on how these networks are built on the netops-managed physical network can be found here. Below is a description of what each vlan is used for:

cloud-hosts (wmf production realm)

Every cloud host's primary connection is to a cloud-host vlan, which uses 10.x IPv4 addressing and belongs to the WMF production realm. This provides the management plane connection for hosts, used to bootstrap the device during reimage, puppet, ssh, management, monitoring etc. Host IPs on this network belong to the <site>.wmnet. domain. Traffic for this network is sent untagged on the wire.

cloud-private (cloud realm)

Every cloud host also has a connection to a cloud-private vlan, which uses 172.20.x IPv4 addressing and belongs to the cloud realm/vrf. This network is used by the openstack control plane and related services such as dns. Host IPs on this network belong to the private.<site>.wikimedia.cloud. domain. The connection is delivered using a vlan tag over the main physical link.

Certain hosts announce additional service IPs (both public and private) to the attached cloudsw using BGP over this vlan. Public service IPs announced in this way are reachable from the internet.

cloud-instances (cloud realm)

CloudVPS instances/VMs connect to a dedicated cloud-instance network. These use IPv4 addressing in the 172.16.x range, and belong to the cloud realm. OpenStack takes care of all IPAM for these ranges, which belong to the <site>.wikimedia.cloud. domain. Netbox has no role in IP assignment for the vlan, and no physical network elements have an IP in the subnet. Unlike the other vlans, which are local to each cloud switch/rack, this vlan is stretched across all cloud racks at layer-2.

cloud-storage (cloud realm)

Cloud-storage is used on cloud ceph hosts to provide the ceph 'cluster' network. Ceph hosts use IPv4 addressing in the 192.168.x range on this interface, which belongs to the cloud realm/vrf. Host IPs on these networks are managed by the cloud team outside of Netbox. This is currently delivered on a separate, secondary physical link to hosts.

cloud-instance-transport (cloud realm)

Cloud-instance-transport is a special network used for edge routing between cloudsw (netops) and cloudgw (wmcs). The two CloudGW servers at each site run keepalived over this interface to share a VIP. The cloudsw route the VPS instance and public NAT ranges to this VIP. IPs on this network are public and in the <site>.wikimediacloud.org domain. This connects the CloudGWs to the cloud realm/vrf, removing the need for them to have a leg in the cloud-private vlan (see T338334). The connection is delivered using a vlan tag over the main physical link.

cloud-gw-transport (cloud realm)

Cloud-gw-transport is a special network used for routing between cloudgw and cloudnet (neutron). The servers on both sides use a VIP to route traffic between each other for high-availability. The vlan exists only at layer-2 on the cloud switches, i.e. no switches have an IP in this range. IPs on this network are public in the <site>.wikimediacloud.org domain. The connection is delivered using a vlan tag over the main physical link.
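The private address ranges described above can be turned into a quick classifier, which may help when reading logs or packet captures. This is a minimal sketch: the prefix lengths (/8 and /16) are assumptions derived from the "10.x", "172.20.x", "172.16.x" and "192.168.x" notation used on this page, and 10.x addresses in general are not exclusive to cloud hosts.

```python
import ipaddress

# (network name, realm, assumed covering prefix)
RANGES = [
    ("cloud-hosts", "wmf production realm", ipaddress.ip_network("10.0.0.0/8")),
    ("cloud-private", "cloud realm", ipaddress.ip_network("172.20.0.0/16")),
    ("cloud-instances", "cloud realm", ipaddress.ip_network("172.16.0.0/16")),
    ("cloud-storage", "cloud realm", ipaddress.ip_network("192.168.0.0/16")),
]


def classify(addr: str):
    """Return (network, realm) for a private IPv4 address, or None if no range matches."""
    ip = ipaddress.ip_address(addr)
    for name, realm, net in RANGES:
        if ip in net:
            return name, realm
    return None


print(classify("172.16.0.5"))
print(classify("185.15.56.1"))
```

Public addresses, such as those on the transport networks, fall outside these ranges and return None.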

Edge network

  • neutron manages floating IP NAT and all the software defined network in the virtual realm.
  • cloudgw handles routing_source_ip and dmz_cidr and connects neutron to cloudsw.
  • cloudsw connects to the internet and the rest of wiki-production networks.

Virtual network

TODO. Inside the virtual realm.


Topology data example

Eqiad

In the case of the eqiad1 deployment, the relevant elements for the cloud network are:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-eqiad | cloud-instances2-b-eqiad (vlan 1105) | 172.16.0.0/21 | vlan 1105, cidr
WAN for floating IPs | wan-transport-eqiad | cloud-eqiad1-floating | --- (no vlan) | 185.15.56.0/25 | cidr
WAN for transport | wan-transport-eqiad | cloud-gw-transport-eqiad | cloud-gw-transport-eqiad (vlan 1107) | 185.15.56.236/30 | vlan 1107, cidr
WAN for transport | --- (ignored by neutron) | --- (ignored by neutron) | cloud-instances-transport1-b-eqiad (vlan 1120) | 185.15.56.240/29 | vlan 1120, cidr
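To get a feel for the size of these ranges, the eqiad1 subnets can be loaded into Python's `ipaddress` module. The sketch below only uses the CIDRs listed above; the helper names `subnet_sizes` and `find_overlaps` are hypothetical.

```python
import ipaddress
from itertools import combinations

EQIAD1_SUBNETS = {
    "cloud-instances2-b-eqiad": "172.16.0.0/21",
    "cloud-eqiad1-floating": "185.15.56.0/25",
    "cloud-gw-transport-eqiad": "185.15.56.236/30",
    "cloud-instances-transport1-b-eqiad": "185.15.56.240/29",
}


def subnet_sizes(subnets: dict[str, str]) -> dict[str, int]:
    """Total number of addresses in each subnet (including network/broadcast)."""
    return {name: ipaddress.ip_network(cidr).num_addresses for name, cidr in subnets.items()}


def find_overlaps(subnets: dict[str, str]) -> list[tuple[str, str]]:
    """Pairs of subnet names whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


for name, size in subnet_sizes(EQIAD1_SUBNETS).items():
    print(f"{name}: {size} addresses")
print("overlaps:", find_overlaps(EQIAD1_SUBNETS))
```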


Per-rack networks are shown below. 'Legacy' LAN ranges in the production realm connect existing hosts, but no new hosts should be added to them. The new, per-rack vlans/subnets get used instead.

Rack C8:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-c8-eqiad (vlan 1128) | 10.64.151.0/24, 2620:0:861:11f::/64 | vlan 1128, ipv4, ipv6
Legacy LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-eqiad (vlan 1118) | 10.64.20.0/24, 2620:0:861:118::/64 | vlan 1118, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-eqiad (vlan 1106) | 192.168.4.0/24 | vlan 1106, N/A

Rack D5:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-d5-eqiad (vlan 1127) | 10.64.150.0/24, 2620:0:861:11e::/64 | vlan 1127, ipv4, ipv6
Legacy LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-eqiad (vlan 1118) | 10.64.20.0/24, 2620:0:861:118::/64 | vlan 1118, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-eqiad (vlan 1106) | 192.168.4.0/24 | vlan 1106, N/A

Rack E4:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-e4-eqiad (vlan 1123) | 10.64.148.0/24, 2620:0:861:11c::/64 | vlan 1123, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-e4-eqiad (vlan 1121) | 192.168.5.0/24 | vlan 1121, N/A

Rack F4:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-f4-eqiad (vlan 1124) | 10.64.149.0/24, 2620:0:861:11d::/64 | vlan 1124, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-e4-eqiad (vlan 1122) | 192.168.6.0/24 | vlan 1122, N/A

Codfw

In the case of the codfw1dev deployment, the relevant elements are:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-codfw | cloud-instances2-b-codfw (vlan 2105) | 172.16.128.0/24 | vlan 2105, cidr
WAN for floating IPs | wan-transport-codfw | cloud-codfw1dev-floating | --- (no vlan) | 185.15.57.0/29 | cidr
WAN for transport | wan-transport-codfw | cloud-gw-transport-codfw | cloud-gw-transport-codfw (vlan 2107) | 185.15.57.8/30 | vlan 2107, cidr
LAN provider (HW servers) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-b-codfw (vlan 2118) | 10.192.20.0/24 | vlan 2118, cidr
WAN for transport | --- (ignored by neutron) | --- (ignored by neutron) | cloud-instances-transport1-b-codfw (vlan 2120) | 185.15.56.240/29 | vlan 2120, cidr

Ingress & Egress

Some notes on the ingress & egress particularities.

routing_source_ip

By default, all the traffic from VMs to the Internet (egress) is source NATed using a single IPv4 address. This address is called routing_source_ip.

There are 2 cases in which this egress NAT is not applied: traffic matching dmz_cidr, and instances with a floating IP. Both are described below.

dmz_cidr

The dmz_cidr mechanism allows us to define certain IP ranges that VMs can talk to directly, without NAT being involved.

A typical configuration per deployment looks like this (please refer to ops/puppet.git for actual hiera values):

profile::openstack::eqiad1::cloudgw::dmz_cidr:
 # VMs --> wiki (text-lb.eqiad)
 - "172.16.0.0/21 . 208.80.154.224"
 # VMs --> wiki (upload-lb.eqiad)
 - "172.16.0.0/21 . 208.80.154.240"

You can read this config as: do not apply NAT to connections matching any of the listed "src . dst" pairs.

Please note that the dmz_cidr mechanism takes precedence over the routing_source_ip configuration.
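The matching logic can be illustrated with a short Python sketch. It parses entries in the "src . dst" format shown above and checks whether a given connection would skip NAT. This only models the rule semantics as described on this page; it is not how cloudgw implements them, and the function names `parse_dmz_cidr` and `skips_nat` are hypothetical.

```python
import ipaddress

# Entries copied from the example hiera snippet on this page.
DMZ_CIDR = [
    "172.16.0.0/21 . 208.80.154.224",
    "172.16.0.0/21 . 208.80.154.240",
]


def parse_dmz_cidr(entries):
    """Turn 'src . dst' strings into (source network, destination network) pairs."""
    rules = []
    for entry in entries:
        src, dst = (part.strip() for part in entry.split(" . "))
        # A bare address such as 208.80.154.224 is parsed as a /32 network.
        rules.append((ipaddress.ip_network(src), ipaddress.ip_network(dst)))
    return rules


def skips_nat(rules, src: str, dst: str) -> bool:
    """True if the connection src -> dst matches any dmz_cidr pair."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return any(s in src_net and d in dst_net for src_net, dst_net in rules)


rules = parse_dmz_cidr(DMZ_CIDR)
print(skips_nat(rules, "172.16.0.10", "208.80.154.224"))
print(skips_nat(rules, "172.16.0.10", "8.8.8.8"))
```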

A static route is required on the routers so return traffic knows what path to take to reach the Cloud Private IPs.

For example on cr1/2-eqiad: routing-options static route 172.16.0.0/21 next-hop 185.15.56.244/29
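As a sanity check on the example route, the next-hop address can be compared against the eqiad1 topology data listed earlier on this page. The sketch below is only a consistency check on those published values, not a router configuration tool.

```python
import ipaddress

# Values taken from the static route example and the eqiad1 topology table.
route_prefix = ipaddress.ip_network("172.16.0.0/21")
next_hop = ipaddress.ip_interface("185.15.56.244/29")
transport_subnet = ipaddress.ip_network("185.15.56.240/29")  # cloud-instances-transport1-b-eqiad
instance_subnet = ipaddress.ip_network("172.16.0.0/21")      # cloud-instances2-b-eqiad

# The next hop should be an address inside the transport subnet.
print(next_hop.network == transport_subnet)
# The routed prefix should be the instance LAN.
print(route_prefix == instance_subnet)
```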

Floating IPs

This mechanism allows us to create an additional public IPv4 address in Neutron. This new IP address is then associated with a given instance, and all of its egress/ingress traffic will use it (both SNAT and DNAT).

Because public addresses are a limited resource, a project needs to have a quota assigned before it can use floating IPs.

Please note that the dmz_cidr mechanism overrides floating IP NAT configurations, and you can see non-NATed packets arriving at VMs with a floating IP assigned.

Here is an example of 3 software-defined floating IPs created by Neutron. For brevity, the example uses the codfw1dev deployment instead of eqiad1, but it works exactly the same way:

root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-float-snat
table ip nat {
	chain neutron-l3-agent-float-snat {
		ip saddr 172.16.128.19 counter snat to 185.15.57.2 fully-random
		ip saddr 172.16.128.20 counter snat to 185.15.57.4 fully-random
		ip saddr 172.16.128.26 counter snat to 185.15.57.6 fully-random
	}
}
root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-OUTPUT
table ip nat {
	chain neutron-l3-agent-OUTPUT {
		ip daddr 185.15.57.2 counter dnat to 172.16.128.19
		ip daddr 185.15.57.4 counter dnat to 172.16.128.20
		ip daddr 185.15.57.6 counter dnat to 172.16.128.26
	}
}
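Putting routing_source_ip, dmz_cidr and floating IPs together, the egress source address selection can be modelled roughly as below. This sketch follows the precedence as described on this page (dmz_cidr first, then floating IP, then routing_source_ip) and is not the actual nftables or Neutron logic. The floating IP mappings come from the codfw1dev output above and the routing_source_ip from the codfw1dev data example on this page; the dmz_cidr entry is a made-up assumption for illustration only.

```python
import ipaddress

ROUTING_SOURCE_IP = "185.15.57.1"

# VM address -> floating IP, as seen in the neutron-l3-agent-float-snat chain.
FLOATING_IPS = {
    "172.16.128.19": "185.15.57.2",
    "172.16.128.20": "185.15.57.4",
    "172.16.128.26": "185.15.57.6",
}

# Hypothetical dmz_cidr entry (source network, destination network).
DMZ_CIDR = [
    (ipaddress.ip_network("172.16.128.0/24"), ipaddress.ip_network("208.80.154.224/32")),
]


def egress_source(vm_ip: str, dst_ip: str) -> str:
    """Source address a remote host would see for traffic from vm_ip to dst_ip."""
    vm, dst = ipaddress.ip_address(vm_ip), ipaddress.ip_address(dst_ip)
    # 1. dmz_cidr takes precedence: no NAT at all.
    if any(vm in src_net and dst in dst_net for src_net, dst_net in DMZ_CIDR):
        return vm_ip
    # 2. A floating IP, if one is associated with the instance.
    if vm_ip in FLOATING_IPS:
        return FLOATING_IPS[vm_ip]
    # 3. Otherwise the shared routing_source_ip.
    return ROUTING_SOURCE_IP


print(egress_source("172.16.128.19", "8.8.8.8"))
print(egress_source("172.16.128.50", "8.8.8.8"))
print(egress_source("172.16.128.19", "208.80.154.224"))
```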


traffic from instance to own floating IP

A VM instance may try to send traffic to its own floating IP. As described in T217681#5035533 (Cloud VPS instance with floating (public) IP can not ping that IP directly), this is not possible with the default configuration: the packet arriving at the VM instance would be a martian packet.

A workaround is to instruct the network stack to accept this kind of martian packet:

sysctl net.ipv4.conf.all.accept_local=1
accept_local - BOOLEAN
	Accept packets with local source addresses. In combination with
	suitable routing, this can be used to direct packets between two
	local interfaces over the wire and have them accepted properly.
	default FALSE
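To check the current value from a script, the sysctl key can be read through /proc/sys, where each dot in the key maps to a path separator. The following is a minimal sketch with hypothetical helper names; it assumes a Linux host and does not handle interface names that themselves contain dots.

```python
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl key as a string."""
    return sysctl_path(key, root).read_text().strip()


print(sysctl_path("net.ipv4.conf.all.accept_local"))
```

On a VM where the workaround is active, `read_sysctl("net.ipv4.conf.all.accept_local")` should return "1".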

ingress & egress data example

Some important IP addresses in the eqiad1 deployment:

Type | Name | Address | Explanation | Where is defined, where to change it | DNS FQDN
ingress | incoming gateway | 185.15.56.244/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org
egress | routing_source_ip | 185.15.56.1 | IP address for main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini in cloudnet nodes (puppet). No NIC has this IP assigned. | nat.openstack.eqiad1.wikimediacloud.org

Some important IP addresses in the codfw1dev deployment:

Type | Name | Address | Explanation | Where is defined, where to change it | DNS FQDN
ingress | incoming gateway | 208.80.153.190/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.codfw1dev.wikimediacloud.org
egress | routing_source_ip | 185.15.57.1 | IP address for main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini in cloudnet nodes (puppet). No NIC has this IP assigned. | nat.openstack.codfw1dev.wikimediacloud.org

What Neutron is doing

This section sheds some light on how Neutron implements our network topology under the hood, and what it does with all this configuration.

Neutron uses 2 specific boxes: cloudnetXXXX.site.wmnet and cloudnetXXXX.site.wmnet (active-standby).
The neutron-server service (daemon, API, etc) runs on cloudcontrol boxes. All the agents run on cloudnet boxes, except neutron-linuxbridge-agent, which runs on cloudvirt boxes.

When a virtual router is created and assigned to an l3-agent, a linux network namespace (netns for short) is created.

This netns will hold all the configuration: IP addresses (such as gateways, floating IPs), iptables rules (NAT, filtering, etc), and other information (static routes, etc).
Using virtual taps, this automatically-generated netns is connected to the main netns where the physical NICs live, along with bridges and vlan tagged subinterfaces.

All this is done in the eth1 interface, while eth0 is left for connection of the cloudnet box to the provider network.

When a virtual router is created, Neutron decides on which l3-agent to deploy it, taking into account HA parameters.
In our active-standby setup, only one l3-agent is active at a time, which means that all this netns/interfaces/iptables configuration is deployed by Neutron to just one node.

The 'q' prefix in netns names dates from earlier development stages, when Neutron was called Quantum.
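Neutron names router namespaces `qrouter-<router UUID>`, so the virtual routers present on a cloudnet box can be found by filtering the output of `ip netns list`. The sketch below parses such output from a string; the sample text and its UUIDs are made up for illustration, and the function name is hypothetical.

```python
def neutron_router_ids(ip_netns_output: str) -> list[str]:
    """Extract router UUIDs from 'ip netns list' output (qrouter-* namespaces only)."""
    routers = []
    for line in ip_netns_output.splitlines():
        fields = line.split()
        # The namespace name is the first field; an "(id: N)" suffix may follow.
        if fields and fields[0].startswith("qrouter-"):
            routers.append(fields[0].removeprefix("qrouter-"))
    return routers


# Made-up sample output, not taken from a real host.
SAMPLE = """\
qrouter-5712e22e-134a-40d3-a75a-1c9b441717ad (id: 0)
qdhcp-05a5494a-184f-4d5c-9e98-77ae61c56daa (id: 1)
"""

print(neutron_router_ids(SAMPLE))
```

Once the namespace name is known, commands such as the nft examples above can be run inside it with `ip netns exec <namespace> <command>`.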

Security policy

TODO: talk about security groups, dmz_cidr exclusion, core route filtering, etc

See also