Portal:Cloud VPS/Admin/Neutron
This page explains how the Neutron OpenStack component is used in our Cloud VPS service.
For the sake of explanation, this document uses the eqiad1 deployment as an example, but other deployments use the same mechanisms.
Network topology
Mainly, there are 4 networks involved:
- LAN for instances. Each VM gets one IP from this network when created. Private addressing.
- WAN for instances transport. This subnet connects the Neutron virtual router with the external core router. Private addressing.
- WAN for floating IPs. This subnet consists of a pool of IP addresses used for internet->VM NATs. Public addressing.
- LAN for physical servers. Each physical net/virt Openstack server is wired to this subnet. AKA provider network. Private addressing.
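As an illustrative sketch (the `classify` helper is hypothetical, not part of any tooling), the eqiad1 addressing detailed later in this document can be used to check which of these four networks a given address belongs to:

```python
import ipaddress

# eqiad1 addressing, per the topology data table in this document
NETWORKS = {
    "LAN instances (cloud-instances2-b-eqiad)": "172.16.0.0/21",
    "WAN transport (cloud-instances-transport1-b-eqiad)": "185.15.56.240/29",
    "WAN floating IPs (cloud-eqiad1-floating)": "185.15.56.0/25",
    "LAN provider (cloud-hosts1-b-eqiad)": "10.64.20.0/24",
}

def classify(ip):
    """Return the names of the networks containing this address."""
    addr = ipaddress.ip_address(ip)
    return [name for name, cidr in NETWORKS.items()
            if addr in ipaddress.ip_network(cidr)]
```

Note that the four ranges are disjoint: for example, the transport subnet 185.15.56.240/29 sits outside the floating IP pool 185.15.56.0/25.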
Also, there are 2 routers involved:
- The Neutron virtual router (by means of l3-agents, neutron-server, etc). This router connects LAN and WANs.
- The external (and physical) core router. This router is the final gateway between the deployment networks and the rest of the WMF networks (and internet).
The main Neutron router is deployed in HA (active-standby). The relevant IP addresses (gateways, etc) are associated with a router object which is managed by Neutron and can be moved from one cloudnetXXXX node to another.
Physical connections
All physical servers in the deployment have an eth0 interface (usually 1G) connected to the physical switch in their rack. This is the interface used for ssh management (LAN provider network). The switch port connecting to this interface doesn't need any specific configuration.
Additionally, both cloudnet and cloudvirt servers have an eth1 interface (usually 10G in cloudnets, 10G in cloudvirts) which is connected to the physical switch in their rack using a trunk with several vlans (for transport subnet, instances, etc). The switch port connecting to eth1 needs this specific configuration active (cloud-virt-instance-trunk in asw2 and cloud-instance-ports in asw) in order for packets to circulate. Also, servers need all the vlan tagged interfaces and bridges created (this is done with puppet).
Usually this switch trunk contains all cloud-related vlans, so we can move cloudvirt servers between deployments (different vlans) without having to change switch configurations.
There has been some research on whether we should collapse the 2 interfaces into one, aiming to reduce usage of 10G ports on the switches. The initial research showed promising results, but we have not introduced this change yet.
Topology data example
In the case of the eqiad1 deployment, the relevant elements are:
What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox |
---|---|---|---|---|---|
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-eqiad | cloud-instances2-b-eqiad (vlan 1105) | 172.16.0.0/21 | vlan 1105 cidr |
WAN for transport | wan-transport-eqiad | cloud-instances-transport1-b-eqiad | cloud-instances-transport1-b-eqiad (vlan 1120) | 185.15.56.240/29 | vlan 1120 cidr |
WAN for floating IPs | wan-transport-eqiad | cloud-eqiad1-floating | --- (no vlan) | 185.15.56.0/25 | cidr |
LAN provider (HW servers) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-b-eqiad (vlan 1118) | 10.64.20.0/24 | vlan 1118 cidr |
In the case of the codfw1dev deployment, the relevant elements are:
What | Neutron network object | Neutron subnet object | Physical name | Addressing | Netbox |
---|---|---|---|---|---|
LAN for instances | lan-flat-cloudinstances2b | cloud-instances2-b-codfw | cloud-instances2-b-codfw (vlan 2105) | 172.16.128.0/24 | vlan 2105 cidr |
WAN for transport | wan-transport-codfw | cloud-instances-transport1-b-codfw | cloud-instances-transport1-b-codfw (vlan 2120) | 208.80.153.184/29 | vlan 2120 cidr |
WAN for floating IPs | wan-transport-codfw | cloud-codfw1dev-floating | --- (no vlan) | 185.15.57.0/29 | cidr |
LAN provider (HW servers) | --- (ignored by neutron) | --- (ignored by neutron) | cloud-hosts1-b-codfw (vlan 2118) | 10.192.20.0/24 | vlan 2118 cidr |
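As a hedged sketch (the `deployment_for` helper is illustrative, not part of any tooling), the instance CIDRs from the two tables above can be used to tell which deployment a given instance address belongs to:

```python
import ipaddress

# instance LAN CIDRs per deployment, from the tables above
INSTANCE_LANS = {
    "eqiad1": ipaddress.ip_network("172.16.0.0/21"),
    "codfw1dev": ipaddress.ip_network("172.16.128.0/24"),
}

def deployment_for(ip):
    """Return the deployment whose instance LAN contains this address, if any."""
    addr = ipaddress.ip_address(ip)
    for name, net in INSTANCE_LANS.items():
        if addr in net:
            return name
    return None
```

The two ranges do not overlap (172.16.0.0/21 ends at 172.16.7.255), so the lookup is unambiguous.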
Other topology considerations
Other information to take into account regarding topology.
2020 network refresh project
Please note that in the 2020 network refresh project we are experimenting with a new edge network setup. In particular, as of this writing:
- the eqiad1 deployment has a cloudsw device in the edge network, see Phabricator T265288 - Enable L3 routing on cloudsw nodes.
- the codfw1dev deployment has a cloudgw device in the edge network, see Phabricator T261724 - cloudgw: evaluate / validate setup in codfw1dev.
Compat networking
The compat networking is no longer relevant. The setup was originally developed in T202636 to allow communication between VMs in the nova-network and neutron deployments, and was later dropped in T223923.
Ingress & Egress
The ingress traffic is handled by the core router, which has an explicit static route pointing to the address of the Neutron router in the WAN transport subnet. The same applies to floating IPs.
routing_source_ip
By default, all the traffic from VMs to the Internet (egress) is source NATed using a single IPv4 address. This address is called routing_source_ip.
There are 2 cases in which this egress NAT is not applied:
- the VM->destination is some internal WMF network (#dmz_cidr exclusions)
- the VM has an explicit floating ip associated (the floating ip will be used as both SNAT and DNAT)
These mechanisms (routing_source_ip and dmz_cidr) have been customly added to Neutron, see section below for further details on this customization.
dmz_cidr
The dmz_cidr mechanism allows us to define certain IP ranges that VMs can talk to directly, without NAT being involved.
This allows us to offer services to VMs easily, implementing access control in those services, etc.
One classic example is NFS servers, which need to see actual VM IP addresses rather than a generic NAT address.
A typical configuration per deployment looks like (please refer to ops/puppet.git for actual hiera values):
profile::openstack::eqiad1::neutron::dmz_cidr:
- 172.16.0.0/21:91.198.174.0/24
- 172.16.0.0/21:198.35.26.0/23
- 172.16.0.0/21:10.0.0.0/8
- 172.16.0.0/21:208.80.152.0/22
- 172.16.0.0/21:103.102.166.0/24
You can read this config as: do not apply NAT to connections src:dst, src:dst, src:dst.
Please note that the dmz_cidr mechanism takes precedence over both floating IPs and routing_source_ip configurations. This means that no NAT will be applied if the packet src/dst addresses match any of the entries configured in dmz_cidr.
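The egress NAT decision described above can be sketched in a few lines of Python (addresses are eqiad1 values from this document; `parse_dmz` and `egress_source` are illustrative helpers, not Neutron code):

```python
import ipaddress

ROUTING_SOURCE_IP = "185.15.56.1"  # eqiad1 routing_source_ip

# same format as the hiera value: comma-separated "src_range:dst_range" pairs
DMZ_CIDR = "172.16.0.0/21:91.198.174.0/24,172.16.0.0/21:10.0.0.0/8"

def parse_dmz(dmz):
    """Split 'src:dst,src:dst' exclusions into network pairs."""
    pairs = []
    for exclusion in dmz.split(','):
        src, dst = exclusion.split(':')
        pairs.append((ipaddress.ip_network(src), ipaddress.ip_network(dst)))
    return pairs

def egress_source(src, dst, floating_ip=None):
    """Return the source address the destination sees (src itself if no NAT)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    # dmz_cidr takes precedence over both floating IPs and routing_source_ip
    for src_range, dst_range in parse_dmz(DMZ_CIDR):
        if s in src_range and d in dst_range:
            return src  # no NAT applied
    if floating_ip:
        return floating_ip  # per-VM SNAT to the floating IP
    return ROUTING_SOURCE_IP  # default single-address SNAT
```

For example, a VM talking to a WMF-internal 10.0.0.0/8 address keeps its own address even when it has a floating IP assigned.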
In addition, a static route is required on the routers so return traffic knows what path to take to reach the Cloud Private IPs.
For example, on cr1/2-eqiad:
routing-options static route 172.16.0.0/21 next-hop 185.15.56.244
Floating IPs
This mechanism allows us to create an additional public IPv4 address in Neutron. This new address is then associated with a given instance, and all of its egress/ingress traffic will use it (both SNAT and DNAT).
Since public IPv4 addresses are a limited resource, a quota needs to be assigned to the project beforehand.
Please note that the dmz_cidr mechanism overrides floating IP NAT configurations, and you can see non-NATed packets arriving at VMs with a floating IP assigned.
Here is an example of 3 software-defined floating IPs created by Neutron in the codfw1dev deployment (not eqiad1, for brevity; it works exactly the same):
root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-float-snat
table ip nat {
chain neutron-l3-agent-float-snat {
ip saddr 172.16.128.19 counter snat to 185.15.57.2 fully-random
ip saddr 172.16.128.20 counter snat to 185.15.57.4 fully-random
ip saddr 172.16.128.26 counter snat to 185.15.57.6 fully-random
}
}
root@cloudnet2003-dev:~ # nft -s list chain ip nat neutron-l3-agent-OUTPUT
table ip nat {
chain neutron-l3-agent-OUTPUT {
ip daddr 185.15.57.2 counter dnat to 172.16.128.19
ip daddr 185.15.57.4 counter dnat to 172.16.128.20
ip daddr 185.15.57.6 counter dnat to 172.16.128.26
}
}
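The two chains above implement a 1:1 bidirectional mapping between fixed and floating addresses. Sketched in Python with the same codfw1dev addresses (`snat`/`dnat` are illustrative helpers, not Neutron code):

```python
# fixed IP -> floating IP, as in the float-snat chain above
FLOATING = {
    "172.16.128.19": "185.15.57.2",
    "172.16.128.20": "185.15.57.4",
    "172.16.128.26": "185.15.57.6",
}
# the OUTPUT chain is simply the reverse mapping (DNAT)
DNAT = {flt: fixed for fixed, flt in FLOATING.items()}

def snat(src):
    """Egress: rewrite the VM's fixed address to its floating IP, if any."""
    return FLOATING.get(src, src)

def dnat(dst):
    """Ingress: rewrite a floating IP back to the VM's fixed address."""
    return DNAT.get(dst, dst)
```

Because the mapping is symmetric, a packet DNATed on ingress and its reply SNATed on egress always use the same floating address.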
Traffic from instance to own floating IP
A VM instance may try to send traffic to its own floating IP. As described in T217681#5035533 - Cloud VPS instance with floating (public) IP can not ping that IP directly, this is not possible with the default configuration.
Such a packet arriving at the VM instance would be a martian packet.
A workaround for this is to instruct the network stack to accept this kind of martian packet:
sysctl net.ipv4.conf.all.accept_local=1
accept_local - BOOLEAN
    Accept packets with local source addresses. In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly.
    Default: FALSE
Ingress & egress data example
Some important IP addresses in the eqiad1 deployment:
Type | Name | Address | Explanation | Where is defined, where to change it | DNS FQDN |
---|---|---|---|---|---|
ingress | incoming gateway | 185.15.56.244/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org |
egress | routing_source_ip | 185.15.56.1 | IP address for main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini in cloudnet nodes (puppet). No NIC has this IP assigned. | nat.openstack.eqiad1.wikimediacloud.org |
Some important IP addresses in the codfw1dev deployment:
Type | Name | Address | Explanation | Where is defined, where to change it | DNS FQDN |
---|---|---|---|---|---|
ingress | incoming gateway | 208.80.153.190/29 | neutron address in the WAN transport subnet for ingress | Core routers (static route) & neutron main router object | cloudinstances2b-gw.openstack.codfw1dev.wikimediacloud.org |
egress | routing_source_ip | 185.15.57.1 | IP address for main source NAT for VMs (mind dmz_cidr exclusions) | /etc/neutron/l3_agent.ini in cloudnet nodes (puppet). No NIC has this IP assigned. | nat.openstack.codfw1dev.wikimediacloud.org |
What Neutron is doing
This section tries to shed some light on how Neutron implements our network topology under the hood, and what it does with all this configuration.
Neutron uses 2 dedicated boxes, cloudnetXXXX.site.wmnet (an active-standby pair).
The neutron-server service (daemon, API, etc) runs on cloudcontrol boxes.
All the agents run on cloudnet boxes, except neutron-linuxbridge-agent, which also runs on cloudvirt boxes.
Example of running agents:
root@cloudcontrol1003:~# neutron agent-list
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| 468aef2a-8eb6-4382-abba-bc284efd9fa5 | DHCP agent | cloudnet1004 | nova | :-) | True | neutron-dhcp-agent |
| 601bef99-b53c-4e6a-b384-65d1feebedff | Metadata agent | cloudnet1003 | | :-) | True | neutron-metadata-agent |
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | L3 agent | cloudnet1003 | nova | :-) | True | neutron-l3-agent |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | L3 agent | cloudnet1004 | nova | :-) | True | neutron-l3-agent |
| 9f8833de-11a4-4395-8da5-f57fe8326659 | Linux bridge agent | cloudnet1003 | | :-) | True | neutron-linuxbridge-agent |
| ad3461d7-b79e-4279-921d-5a476e296767 | Linux bridge agent | cloudnet1004 | | :-) | True | neutron-linuxbridge-agent |
| b2f9da63-2f16-4aa5-9400-ae708a733f91 | Linux bridge agent | cloudvirt1021 | | :-) | True | neutron-linuxbridge-agent |
| d475e07d-52b3-476e-9a4f-e63b21e1075e | Metadata agent | cloudnet1004 | | :-) | True | neutron-metadata-agent |
| e382a233-e6a0-422e-9d2e-5651082783fc | Linux bridge agent | cloudvirt1022 | | :-) | True | neutron-linuxbridge-agent |
| ff2a8228-3748-4588-927b-4b6563da9ca0 | DHCP agent | cloudnet1003 | nova | :-) | True | neutron-dhcp-agent |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
When a virtual router is created, and assigned to an l3-agent, a linux network namespace (netns for short) will be created:
Example virtual router netns and l3-agents hosting routers:
root@cloudnet1004:~# ip netns list | grep router
qrouter-d93771ba-2711-4f88-804a-8df6fd03978a
root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router d93771ba-2711-4f88-804a-8df6fd03978a
+--------------------------------------+--------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True | :-) | active |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True | :-) | standby |
+--------------------------------------+--------------+----------------+-------+----------+
This netns will hold all the configuration: IP addresses (such as gateways, floating IPs), iptables rules (NAT, filtering, etc), and other information (static routes, etc).
Using virtual taps, this automatically-generated netns is connected to the main netns where the physical NICs live, along with bridges and vlan tagged subinterfaces.
All this happens on the eth1 interface, while eth0 is left for connecting the cloudnet box to the provider network.
When a virtual router is created, Neutron decides which l3-agent will host it, taking HA parameters into account.
In our active-standby setup, only one l3-agent is active at a time, which means that all this netns/interfaces/iptables configuration is deployed by Neutron to just one node.
The 'q-' prefix in the netns name comes from earlier development stages, when Neutron was called Quantum.
Security policy
TODO: talk about security groups, dmz_cidr exclusion, core route filtering, etc
Neutron customizations
Our Neutron has been customized to bring back (forward-port) functionality from the old nova-network days.
The two functionalities added are routing_source_ip and dmz_cidr, and their behavior is explained in the rest of the document.
In a nutshell, we get a couple more config options in /etc/neutron/l3_agent.ini which allow us to implement our CloudVPS use cases.
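For illustration, the resulting l3_agent.ini fragment might look like this (a hypothetical sketch assembled from values shown elsewhere in this document; the section placement is an assumption, and the real values live in hiera in ops/puppet.git):

```ini
# /etc/neutron/l3_agent.ini -- WMF-specific options added by our patch
[DEFAULT]
# single source NAT address for VM egress traffic
routing_source_ip = 185.15.56.1
# NAT exclusions, "src_range:dst_range" pairs separated by commas
dmz_cidr = 172.16.0.0/21:10.0.0.0/8,172.16.0.0/21:208.80.152.0/22
```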
Currently, these modifications are for the OpenStack Mitaka release, and can be found at:
- modules/openstack/files/mitaka/neutron/l3/router_info.py
- modules/openstack/files/mitaka/neutron/l3/config.py
Our router_info.py patch (from the puppet repo):
diff -u modules/openstack/files/mitaka/neutron/l3/original/router_info.original modules/openstack/files/mitaka/neutron/l3/router_info.py
--- modules/openstack/files/mitaka/neutron/l3/original/router_info.original 2018-04-02 11:15:44.473887748 +0200
+++ modules/openstack/files/mitaka/neutron/l3/router_info.py 2018-04-11 10:08:51.346400040 +0200
@@ -432,7 +432,7 @@
if 'subnets' in port:
for subnet in port['subnets']:
if (netaddr.IPNetwork(subnet['cidr']).version == 6 and
- subnet['cidr'] != l3_constants.PROVISIONAL_IPV6_PD_PREFIX):
+ subnet['cidr'] != l3_constants.PROVISIONAL_IPV6_PD_PREFIX):
return True
def enable_radvd(self, internal_ports=None):
@@ -695,26 +695,38 @@
gw_port = self._router.get('gw_port')
self._handle_router_snat_rules(gw_port, interface_name)
- def external_gateway_nat_fip_rules(self, ex_gw_ip, interface_name):
+ def external_gateway_nat_fip_rules(self, ex_gw_ip, interface_name, dmz_cidr, src_ip):
+ rules = []
+ # Avoid behavior where NAT applies to the actual router IP
+ rules.append(('POSTROUTING', '-s %s -j ACCEPT' % (ex_gw_ip)))
+ if dmz_cidr:
+ for nat_exclusion in dmz_cidr.split(','):
+ src_range, dst_range = nat_exclusion.split(':')
+ rules.append(('POSTROUTING', '-s %s -d %s -j ACCEPT' % (src_range, dst_range)))
+
dont_snat_traffic_to_internal_ports_if_not_to_floating_ip = (
'POSTROUTING', '! -i %(interface_name)s '
'! -o %(interface_name)s -m conntrack ! '
'--ctstate DNAT -j ACCEPT' %
{'interface_name': interface_name})
+ rules.append(dont_snat_traffic_to_internal_ports_if_not_to_floating_ip)
+
# Makes replies come back through the router to reverse DNAT
ext_in_mark = self.agent_conf.external_ingress_mark
snat_internal_traffic_to_floating_ip = (
'snat', '-m mark ! --mark %s/%s '
'-m conntrack --ctstate DNAT '
'-j SNAT --to-source %s'
- % (ext_in_mark, l3_constants.ROUTER_MARK_MASK, ex_gw_ip))
- return [dont_snat_traffic_to_internal_ports_if_not_to_floating_ip,
- snat_internal_traffic_to_floating_ip]
+ % (ext_in_mark, l3_constants.ROUTER_MARK_MASK, src_ip))
+ rules.append(snat_internal_traffic_to_floating_ip)
- def external_gateway_nat_snat_rules(self, ex_gw_ip, interface_name):
+ return rules
+
+ def external_gateway_nat_snat_rules(self, ex_ip, interface_name):
+ # source nat everything left to our chosen external ip
snat_normal_external_traffic = (
'snat', '-o %s -j SNAT --to-source %s' %
- (interface_name, ex_gw_ip))
+ (interface_name, ex_ip))
return [snat_normal_external_traffic]
def external_gateway_mangle_rules(self, interface_name):
@@ -732,29 +744,43 @@
def _add_snat_rules(self, ex_gw_port, iptables_manager,
interface_name):
+
self.process_external_port_address_scope_routing(iptables_manager)
if ex_gw_port:
- # ex_gw_port should not be None in this case
- # NAT rules are added only if ex_gw_port has an IPv4 address
for ip_addr in ex_gw_port['fixed_ips']:
ex_gw_ip = ip_addr['ip_address']
- if netaddr.IPAddress(ex_gw_ip).version == 4:
- if self._snat_enabled:
- rules = self.external_gateway_nat_snat_rules(
- ex_gw_ip, interface_name)
- for rule in rules:
- iptables_manager.ipv4['nat'].add_rule(*rule)
-
- rules = self.external_gateway_nat_fip_rules(
- ex_gw_ip, interface_name)
- for rule in rules:
- iptables_manager.ipv4['nat'].add_rule(*rule)
- rules = self.external_gateway_mangle_rules(interface_name)
- for rule in rules:
- iptables_manager.ipv4['mangle'].add_rule(*rule)
+ if netaddr.IPAddress(ex_gw_ip).version != 4:
+ msg = 'WMF: only ipv4 is supported'
+ raise n_exc.FloatingIpSetupException(msg)
+ break
- break
+ # ex_gw_port should not be None in this case
+ # NAT rules are added only if ex_gw_port has an IPv4 address
+ if self._snat_enabled and self.agent_conf.routing_source_ip:
+ LOG.debug('external_gateway_ip: %s', ex_gw_ip)
+ LOG.debug('routing_source_ip: %s', self.agent_conf.routing_source_ip)
+ self.routing_source_ip = self.agent_conf.routing_source_ip
+ self.dmz_cidr = self.agent_conf.dmz_cidr
+
+ if not netaddr.IPAddress(self.routing_source_ip).version == 4:
+ msg = 'foo %s is not ipv4' % (self.routing_source_ip)
+ raise n_exc.FloatingIpSetupException(msg)
+
+ rules = self.external_gateway_nat_snat_rules(
+ self.routing_source_ip, interface_name)
+ for rule in rules:
+ iptables_manager.ipv4['nat'].add_rule(*rule)
+
+ rules = self.external_gateway_nat_fip_rules(
+ ex_gw_ip, interface_name, self.dmz_cidr, self.routing_source_ip)
+ for rule in rules:
+ LOG.debug('foo self.external_gateway_nat_fip_rules rule: %s', str(rule))
+ iptables_manager.ipv4['nat'].add_rule(*rule)
+
+ rules = self.external_gateway_mangle_rules(interface_name)
+ for rule in rules:
+ iptables_manager.ipv4['mangle'].add_rule(*rule)
def _handle_router_snat_rules(self, ex_gw_port, interface_name):
self._empty_snat_chains(self.iptables_manager)
Our config.py patch (from the puppet repo):
diff -u modules/openstack/files/mitaka/neutron/l3/original/config.original modules/openstack/files/mitaka/neutron/l3/config.py
--- modules/openstack/files/mitaka/neutron/l3/original/config.original 2018-04-02 11:15:44.473887748 +0200
+++ modules/openstack/files/mitaka/neutron/l3/config.py 2018-04-02 11:15:44.473887748 +0200
@@ -100,6 +100,10 @@
help=_('Iptables mangle mark used to mark ingress from '
'external network. This mark will be masked with '
'0xffff so that only the lower 16 bits will be used.')),
+ cfg.StrOpt('routing_source_ip', default='',
+ help=_('WMF defined src nat IP option')),
+ cfg.StrOpt('dmz_cidr', default='',
+ help=_('WMF defined src nat exclusions "src_range:dst_range,<repeat>')),
]
OPTS += config.EXT_NET_BRIDGE_OPTS
Related phabricator tasks: T168580.