Portal:Cloud VPS/Admin/Network

This page explains how the CloudVPS network works, including the Neutron OpenStack component.

For the sake of explanation, this document uses the eqiad1 deployment as an example, but other deployments work with the same mechanisms.

Network topology

There are 2 different kinds of networks involved:

  • control plane networks: those used by physical servers for SSH, puppet, monitoring, etc. These are wiki-production networks, usually in the 10.x.x.x range.
  • data plane networks: those used by CloudVPS virtual clients, and all the traffic doing ingress/egress through the edge of the network.

On the top-of-rack switches these networks are divided into separate VRFs, or routing-instances. This keeps them as two separate, private domains on the connected cloudsw devices. The CRs (core routers) provide a default route to the switches in both networks, so traffic that needs to route between the two networks flows via the CRs, where ACLs/filters are used for policy control.
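As a rough illustration of this separation only (this is not the actual cloudsw configuration; the instance name, interface and next-hop are placeholders, and the real devices use VRF-style routing-instances with more options), a Junos routing-instance holding the cloud realm looks something like:

set routing-instances cloud-vrf instance-type virtual-router
set routing-instances cloud-vrf interface irb.1105
set routing-instances cloud-vrf routing-options static route 0.0.0.0/0 next-hop <CR address>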

There are 3 routers involved:

  • The neutron virtual router (by means of neutron-l3-agent, neutron-linuxbridge-agent, neutron-server, etc). This router connects the internal software-defined networks to the cloud edge network.
  • The physical cloudgw router (a pair of linux servers). This router is the main gateway for all CloudVPS ingress/egress traffic, and is the main network endpoint facing the public internet.
  • The physical cloudsw routers. These devices connect cloudgw to the rest of the internet, including wiki-production networks.

Datacenter network

cloudvirts

  • control plane: primary interface (for example eth0) connected to the physical switch in their rack. The switch port connecting to this interface doesn't need any specific configuration.
  • data plane: secondary interface (for example eth1) connected to the physical switch in their rack. This switch port must be configured in VLAN tagged mode for vlan 1105.
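A quick, illustrative way to confirm the data plane wiring on a cloudvirt (interface and bridge names vary per host):

sudo ip -d link show type vlan   # expect a subinterface tagged with vlan id 1105 on the data-plane NIC
sudo bridge link show            # bridges managed by neutron-linuxbridge-agent enslave that subinterface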

There has been some research on whether we should collapse the 2 interfaces into one, aiming to reduce the usage of 10G ports on the switches. The initial research showed promising results, but this change has not been introduced yet.

cloudnet

Beware that as of today, our neutron setup uses VRRP over VXLAN to implement the HA mechanism. Both cloudnet nodes need to share the same VLAN for the control plane. See https://phabricator.wikimedia.org/T319539
  • control plane: primary interface (for example eth0) connected to the physical switch in their rack. The switch port connecting to this interface doesn't need any specific configuration.
  • data plane: secondary interface (for example eth1) connected to the physical switch in their rack. This switch port must be configured as a VLAN trunk with vlans 1105 and 1107.

cloudgw

  • control plane: primary interface (for example eth0) connected to the physical switch in their rack. The switch port connecting to this interface doesn't need any specific configuration.
  • data plane: secondary interface (for example eth1) connected to the physical switch in their rack. This switch port must be configured as a VLAN trunk with vlans 1120 and 1107.
  • These hosts fail over automatically, so for a full reboot just take one down, wait for it to come back up, run the tests cookbook (wmcs.openstack.network.tests, see this to set it up on your laptop), and then reboot the other.
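An example invocation of the tests cookbook, run from wherever the wmcs cookbooks are set up (for example your laptop, as mentioned above); the exact arguments may differ, so check the help first:

cookbook wmcs.openstack.network.tests --help   # list the accepted options for the deployment
cookbook wmcs.openstack.network.tests          # run the network tests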

ceph osd

  • ceph control plane (ssh, monitoring, mon communication, client communication):
    • Primary interface on external card (for example ens2f0np0)
    • 10.64.20.0/24 network
    • Connected to the physical switch in their rack
    • The switch port connecting to this interface needs to be configured with untagged vlan 1118 (cloud-hosts1-eqiad).
  • ceph data plane (osd to osd communication):
    • Secondary interface on external card (for example ens2f1np1)
    • 192.168.4.0/24 network
    • Connected to the physical switch in their rack
    • The switch port connecting to this interface needs to be configured with untagged vlan 1105 (cloud-storage1-eqiad).
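As a hedged sketch only (not copied from the real cluster configuration, and the per-rack storage subnets may widen the ranges), the two networks above typically map to ceph.conf like this:

[global]
public_network  = 10.64.20.0/24    # control plane: mon and client traffic
cluster_network = 192.168.4.0/24   # data plane: osd-to-osd replication and backfill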

ceph mons

  • ceph control plane (ssh, monitoring, client communication, osd communication):
    • Primary interface on external card (for example ens2f0np0)
    • 10.64.20.0/24 network
    • Connected to the physical switch in their rack
    • The switch port connecting to this interface needs to be configured with untagged vlan 1118 (cloud-hosts1-eqiad)

Edge network

  • neutron manages the floating IP NAT and all the software-defined networks in the virtual realm.
  • cloudgw handles routing_source_ip and dmz_cidr and connects neutron to cloudsw.
  • cloudsw connects to the internet and the rest of wiki-production networks.

Virtual network

TODO. Inside the virtual realm.


Topology data example

Eqiad

In the case of the eqiad1 deployment, the relevant elements for the cloud network are:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-eqiad | cloud-instances2-b-eqiad (vlan 1105) | 172.16.0.0/21 | vlan 1105, cidr
WAN for floating IPs | wan-transport-eqiad | cloud-eqiad1-floating | --- (no vlan) | 185.15.56.0/25 | cidr
WAN for transport | wan-transport-eqiad | cloud-gw-transport-eqiad | cloud-gw-transport-eqiad (vlan 1107) | 185.15.56.236/30 | vlan 1107, cidr
WAN for transport | --- (ignored by neutron) | --- (ignored by neutron) | cloud-instances-transport1-b-eqiad (vlan 1120) | 185.15.56.240/29 | vlan 1120, cidr
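A hedged way to inspect these Neutron objects with the standard OpenStack CLI (run wherever admin credentials are available; output omitted here):

openstack network list
openstack subnet list --network wan-transport-eqiad
openstack subnet show cloud-instances2-b-eqiad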


Per-rack networks are shown below. 'Legacy' LAN ranges in the production realm connect existing hosts, but no new hosts should be added to them; the new per-rack vlans/subnets are used instead.

Rack C8:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-c8-eqiad (vlan 1128) | 10.64.151.0/24, 2620:0:861:11f::/64 | vlan 1128, ipv4, ipv6
Legacy LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-eqiad (vlan 1118) | 10.64.20.0/24, 2620:0:861:118::/64 | vlan 1118, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-eqiad (vlan 1106) | 192.168.4.0/24 | vlan 1106, N/A

Rack D5:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-d5-eqiad (vlan 1127) | 10.64.150.0/24, 2620:0:861:11e::/64 | vlan 1127, ipv4, ipv6
Legacy LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-eqiad (vlan 1118) | 10.64.20.0/24, 2620:0:861:118::/64 | vlan 1118, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-eqiad (vlan 1106) | 192.168.4.0/24 | vlan 1106, N/A

Rack E4:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-e4-eqiad (vlan 1123) | 10.64.148.0/24, 2620:0:861:11c::/64 | vlan 1123, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-e4-eqiad (vlan 1121) | 192.168.5.0/24 | vlan 1121, N/A

Rack F4:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN provider (control plane) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-f4-eqiad (vlan 1124) | 10.64.149.0/24, 2620:0:861:11d::/64 | vlan 1124, ipv4, ipv6
Storage Network | --- | --- | cloud-storage1-e4-eqiad (vlan 1122) | 192.168.6.0/24 | vlan 1122, N/A

Codfw

In the case of the codfw1dev deployment, the relevant elements are:

What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-codfw | cloud-instances2-b-codfw (vlan 2105) | 172.16.128.0/24 | vlan 2105, cidr
WAN for floating IPs | wan-transport-codfw | cloud-codfw1dev-floating | --- (no vlan) | 185.15.57.0/29 | cidr
WAN for transport | wan-transport-codfw | cloud-gw-transport-codfw | cloud-gw-transport-codfw (vlan 2107) | 185.15.57.8/30 | vlan 2107, cidr
LAN provider (HW servers) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-b-codfw (vlan 2118) | 10.192.20.0/24 | vlan 2118, cidr
WAN for transport | --- (ignored by neutron) | --- (ignored by neutron) | cloud-instances-transport1-b-codfw (vlan 2120) | 185.15.56.240/29 | vlan 2120, cidr

Ingress & Egress

Some notes on the ingress & egress particularities.

routing_source_ip

By default, all the traffic from VMs to the Internet (egress) is source NATed using a single IPv4 address. This address is called routing_source_ip.
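The address has a DNS name (see the data example table further below), so a quick check of the expected egress address from anywhere is:

user@laptop:~$ dig +short nat.openstack.eqiad1.wikimediacloud.org
185.15.56.1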

There are 2 cases in which this egress NAT is not applied: the dmz_cidr ranges and floating IPs, both described below.

dmz_cidr

The dmz_cidr mechanism allows us to define certain IP ranges to which VMs can talk directly, without NAT being involved.

A typical configuration per deployment looks like this (please refer to ops/puppet.git for the actual hiera values):

profile::openstack::eqiad1::cloudgw::dmz_cidr:
 # VMs --> wiki (text-lb.eqiad)
 - "172.16.0.0/21 . 208.80.154.224"
 # VMs --> wiki (upload-lb.eqiad)
 - "172.16.0.0/21 . 208.80.154.240"

You can read this configuration as a list of src:dst pairs for which NAT is not applied.

Please note that the dmz_cidr mechanism takes precedence over the routing_source_ip configuration.
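As a hedged sketch of what that precedence looks like on cloudgw (the actual ruleset is generated by puppet and differs in detail), the exclusion rules sit above the SNAT rule:

table ip nat {
	chain postrouting {
		type nat hook postrouting priority srcnat; policy accept;
		# dmz_cidr exclusions are matched first...
		ip saddr 172.16.0.0/21 ip daddr 208.80.154.224 counter accept
		ip saddr 172.16.0.0/21 ip daddr 208.80.154.240 counter accept
		# ...so only the remaining traffic is NATed to routing_source_ip
		ip saddr 172.16.0.0/21 counter snat to 185.15.56.1
	}
}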

A static route is required on the routers so return traffic knows what path to take to reach the Cloud Private IPs.

For example on cr1/2-eqiad: routing-options static route 172.16.0.0/21 next-hop 185.15.56.244/29

Floating IPs

This mechanism allows us to allocate an additional public IPv4 address in Neutron. This new IP address is then associated with a given instance, and all of its ingress/egress traffic will use it (both SNAT and DNAT).

Since floating IPs are a limited resource, a quota needs to be assigned to the project beforehand.
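A hedged example of the usual workflow with the standard OpenStack CLI (the project name, server name and resulting address are placeholders):

openstack quota set --floating-ips 1 exampleproject        # admin: grant quota first
openstack floating ip create wan-transport-eqiad           # allocate an address from the floating range
openstack server add floating ip exampleinstance 185.15.56.x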

Please note that the dmz_cidr mechanism overrides floating IP NAT configurations, so you may see non-NATed packets arriving at VMs with a floating IP assigned.

Here is an example of 3 software-defined floating IPs created by Neutron in the codfw1dev deployment (not using eqiad1 for brevity, but it works exactly the same):

root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-float-snat
table ip nat {
	chain neutron-l3-agent-float-snat {
		ip saddr 172.16.128.19 counter snat to 185.15.57.2 fully-random
		ip saddr 172.16.128.20 counter snat to 185.15.57.4 fully-random
		ip saddr 172.16.128.26 counter snat to 185.15.57.6 fully-random
	}
}
root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-OUTPUT
table ip nat {
	chain neutron-l3-agent-OUTPUT {
		ip daddr 185.15.57.2 counter dnat to 172.16.128.19
		ip daddr 185.15.57.4 counter dnat to 172.16.128.20
		ip daddr 185.15.57.6 counter dnat to 172.16.128.26
	}
}


traffic from instance to own floating IP

A VM instance may try to send traffic to its own floating IP. As described in T217681#5035533 - Cloud VPS instance with floating (public) IP can not ping that IP directly, this is not possible with the default configuration.
The packet arriving back at the VM instance would be a martian packet.

A workaround for this is to instruct the network stack to accept this kind of martian packet:

sysctl net.ipv4.conf.all.accept_local=1
accept_local - BOOLEAN
	Accept packets with local source addresses. In combination with
	suitable routing, this can be used to direct packets between two
	local interfaces over the wire and have them accepted properly.
	default FALSE
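To make the workaround persist across reboots on the VM, something like the following can be used (the file name is arbitrary):

echo 'net.ipv4.conf.all.accept_local = 1' | sudo tee /etc/sysctl.d/99-accept-local.conf
sudo sysctl --system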

ingress & egress data example

Some important IP addresses in the eqiad1 deployment:

Type | Name | Address | Explanation | Where it is defined / where to change it | DNS FQDN
ingress | incoming gateway | 185.15.56.244/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org
egress | routing_source_ip | 185.15.56.1 | IP address for the main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini on cloudnet nodes (puppet); no NIC has this IP assigned | nat.openstack.eqiad1.wikimediacloud.org

Some important IP addresses in the codfw1dev deployment:

Type | Name | Address | Explanation | Where it is defined / where to change it | DNS FQDN
ingress | incoming gateway | 208.80.153.190/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.codfw1dev.wikimediacloud.org
egress | routing_source_ip | 185.15.57.1 | IP address for the main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini on cloudnet nodes (puppet); no NIC has this IP assigned | nat.openstack.codfw1dev.wikimediacloud.org

What Neutron is doing

This section tries to shed some light on how Neutron implements our network topology under the hood, and what it does with all this configuration.

Neutron uses 2 dedicated boxes, cloudnetXXXX.site.wmnet and cloudnetYYYY.site.wmnet (active-standby).
The neutron-server service (daemon, API, etc.) runs on the cloudcontrol boxes. All the agents run on the cloudnet boxes, except neutron-linuxbridge-agent, which runs on the cloudvirt boxes.

When a virtual router is created and assigned to an l3-agent, a Linux network namespace (netns for short) is created on that node.

This netns holds all the router configuration: IP addresses (such as gateways and floating IPs), iptables rules (NAT, filtering, etc.), and other information (static routes, etc.).
Using virtual taps, this automatically-generated netns is connected to the main netns, where the physical NICs live, along with bridges and vlan-tagged subinterfaces.
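An illustrative look at that netns on the active cloudnet node (the hostname placeholder and the router UUID below are made up):

root@cloudnetXXXX:~# ip netns list                                                                # shows a qrouter-<router-uuid> namespace
root@cloudnetXXXX:~# ip netns exec qrouter-d48c5dcd-0000-0000-0000-000000000000 ip -br addr       # gateway and floating IPs
root@cloudnetXXXX:~# ip netns exec qrouter-d48c5dcd-0000-0000-0000-000000000000 nft list ruleset  # NAT and filtering rules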

All of this is done on the eth1 interface, while eth0 is left for connecting the cloudnet box to the provider network.

When a virtual router is created, Neutron decides which l3-agent will deploy it, taking HA parameters into account.
In our active-standby setup, only one l3-agent is active at a time, which means that all of this netns/interfaces/iptables configuration is deployed by Neutron to just one node.
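A hedged way to check which l3-agent (i.e. which cloudnet node) currently hosts the router, using the standard OpenStack CLI (<router> is a placeholder for the router name or UUID):

openstack router list                                     # find the router name or UUID
openstack network agent list --router <router> --long     # lists the agents hosting it and their HA state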

The 'q-' prefix in the netns name comes from earlier development stages, when Neutron was called Quantum.

Security policy

TODO: talk about security groups, dmz_cidr exclusion, core route filtering, etc

See also