Nokia Networking/SR Linux Network Tests

This page details the individual tests carried out to validate that the Nokia SR-Linux platform supports all the features required for use in WMF datacentres.

Test Topology

The tests were carried out with the following network switches installed in rack A8 in codfw:

   Device name     Device Type          Role
   ssw2-a8-codfw   Nokia 7220-IXR-D3L   Spine Switch
   ssw3-a8-codfw   Nokia 7220-IXR-D3L   Spine Switch
   lsw2-a8-codfw   Nokia 7220-IXR-D2L   Leaf Switch
   lsw3-a8-codfw   Nokia 7220-IXR-D2L   Leaf Switch
   lsw4-a8-codfw   Nokia 7220-IXR-D2L   Leaf Switch
   lsw5-a8-codfw   Nokia 7220-IXR-D2L   Leaf Switch

Each Leaf has 1 uplink to each Spine device, running at either 40 or 100G. All devices are running Nokia SR-Linux v24.7.2.

Physical Tests

Transceiver Compatibility

The primary requirement here is to validate that the normal transceiver modules we use are compatible with the Nokia devices. Thankfully the switches are not programmed to reject modules based on their vendor coding, so we've been able to use Juniper-coded modules received from our various suppliers without problem.

We tested the modules below and had no issues. This does not represent every variant we have across our datacentres, but based on this experience we are confident that any standards-compliant module should work.

We can also see that all the DOM information is returned nicely by the switch.

Molex 40GBase-CR4 QSFP+ DAC

   A:cmooney@ssw3-a8-codfw# show interface ethernet-1/2 brief
   +---------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
   |        Port         |   Admin State   |   Oper State    |      Speed      |      Type       |   Description   |
   +=====================+=================+=================+=================+=================+=================+
   | ethernet-1/2        | enable          | up              | 40G             | 40GBASE-CR4     | Core: lsw3-a8-  |
   |                     |                 |                 |                 |                 | codfw:ethernet- |
   |                     |                 |                 |                 |                 | 1/55 {#dummy393 |
   |                     |                 |                 |                 |                 | 5839525}        |
   +---------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
   A:cmooney@ssw3-a8-codfw# info from state interface ethernet-1/2 transceiver
       tx-laser true
       oper-state up
       ddm-events true
       form-factor QSFPplus
       ethernet-pmd 40GBASE-CR4
       connector-type no-separable-connector
       vendor "Molex Inc."
       vendor-part-number 1110409056
       vendor-revision B
       serial-number MOC14036240473
       date-code "2014-01-18T00:00:00Z (11 years ago)"
       link-length-information "Cable Assembly: 3m"
       healthz {
           status healthy
       }

Nokia 10GBase-LR SFP+

   A:cmooney@ssw3-a8-codfw# show interface ethernet-1/33 brief
   +---------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
   |        Port         |   Admin State   |   Oper State    |      Speed      |      Type       |   Description   |
   +=====================+=================+=================+=================+=================+=================+
   | ethernet-1/33       | enable          | up              | 10G             | 10GBASE-LR      | Core: cr2-      |
   |                     |                 |                 |                 |                 | codfw:xe-       |
   |                     |                 |                 |                 |                 | 1/0/4:1         |
   |                     |                 |                 |                 |                 | {#56469769}     |
   +---------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
   --{ running }--[  ]--
   A:cmooney@ssw3-a8-codfw# info from state interface ethernet-1/33 transceiver
       tx-laser true
       oper-state up
       ddm-events true
       form-factor SFPplus
       ethernet-pmd 10GBASE-LR
       connector-type LC
       vendor NOKIA
       vendor-part-number 3HE09327AARA01
       vendor-revision 10
       serial-number O7I2005311
       date-code "2024-06-17T00:00:00Z (9 months ago)"
       fault-condition false
       wavelength 1310.00
       link-length-information "SMF: 10km"
       temperature {
           latest-value 30
           maximum 31
           maximum-time "2025-04-01T06:54:00.214Z (9 days ago)"
           high-alarm-condition false
           high-alarm-threshold 95
           low-alarm-condition false
           low-alarm-threshold -50
           high-warning-condition false
           high-warning-threshold 90
           low-warning-condition false
           low-warning-threshold -45
       }
       voltage {
           latest-value 3.2644
           high-alarm-condition false
           high-alarm-threshold 3.6000
           low-alarm-condition false
           low-alarm-threshold 3.0000
           high-warning-condition false
           high-warning-threshold 3.5500
           low-warning-condition false
           low-warning-threshold 3.0500
       }
       input-power {
           latest-value -1.16
           high-alarm-condition false
           high-alarm-threshold 3.50
           low-alarm-condition false
           low-alarm-threshold -17.40
           high-warning-condition false
           high-warning-threshold 2.50
           low-warning-condition false
           low-warning-threshold -16.40
       }
       output-power {
           latest-value -1.03
           high-alarm-condition false
           high-alarm-threshold 3.50
           low-alarm-condition false
           low-alarm-threshold -11.20
           high-warning-condition false
           high-warning-threshold 2.50
           low-warning-condition false
           low-warning-threshold -10.20
       }
       laser-bias-current {
           latest-value 40.794
           high-alarm-condition false
           high-alarm-threshold 95.000
           low-alarm-condition false
           low-alarm-threshold 3.000
           high-warning-condition false
           high-warning-threshold 90.000
           low-warning-condition false
           low-warning-threshold 5.000
       }
       healthz {
           status healthy
       }

W2W 100GBase-SR4 QSFP28

   A:cmooney@ssw3-a8-codfw# show interface ethernet-1/1 brief
   +---------------------+----------------+----------------+----------------+----------------+----------------+
   |        Port         |  Admin State   |   Oper State   |     Speed      |      Type      |  Description   |
   +=====================+================+================+================+================+================+
   | ethernet-1/1        | enable         | up             | 100G           | 100GBASE-SR4   | Core: lsw2-a8- |
   |                     |                |                |                |                | codfw:ethernet |
   |                     |                |                |                |                | -1/55 {#dummy3 |
   |                     |                |                |                |                | 929739524}     |
   +---------------------+----------------+----------------+----------------+----------------+----------------+
   --{ running }--[  ]--
   A:cmooney@ssw3-a8-codfw# info from state interface ethernet-1/1 transceiver
       tx-laser true
       oper-state up
       ddm-events true
       form-factor QSFP28
       ethernet-pmd 100GBASE-SR4
       connector-type MPO-1x12
       vendor W2W
       vendor-part-number 77J-Z010-SR4
       vendor-revision 01
       serial-number 240507100062
       date-code "2024-05-23T00:00:00Z (10 months ago)"
       fault-condition false
       link-length-information "OM3: 70m, OM4: 100m"
       temperature {
           latest-value 29
           maximum 31
           maximum-time "2025-03-31T20:25:22.215Z (9 days ago)"
           high-alarm-condition false
           high-alarm-threshold 80
           low-alarm-condition false
           low-alarm-threshold -10
           high-warning-condition false
           high-warning-threshold 75
           low-warning-condition false
           low-warning-threshold -5
       }
       voltage {
           latest-value 3.2086
           high-alarm-condition false
           high-alarm-threshold 3.6300
           low-alarm-condition false
           low-alarm-threshold 2.9700
           high-warning-condition false
           high-warning-threshold 3.4600
           low-warning-condition false
           low-warning-threshold 3.1300
       }
       input-power {
           high-alarm-threshold 4.40
           low-alarm-threshold -12.30
           high-warning-threshold 3.40
           low-warning-threshold -11.30
       }
       output-power {
           high-alarm-threshold 4.40
           low-alarm-threshold -10.40
           high-warning-threshold 3.40
           low-warning-threshold -9.40
       }
       channel 1 {
           wavelength 850.00
           input-power {
               latest-value 1.50
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -12.30
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -11.30
           }
           output-power {
               latest-value 2.18
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -10.40
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -9.40
           }
           laser-bias-current {
               latest-value 5.862
               high-alarm-condition false
               high-alarm-threshold 15.000
               low-alarm-condition false
               low-alarm-threshold 1.000
               high-warning-condition false
               high-warning-threshold 12.000
               low-warning-condition false
               low-warning-threshold 2.000
           }
       }
       channel 2 {
           wavelength 850.00
           input-power {
               latest-value 1.23
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -12.30
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -11.30
           }
           output-power {
               latest-value 1.82
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -10.40
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -9.40
           }
           laser-bias-current {
               latest-value 5.862
               high-alarm-condition false
               high-alarm-threshold 15.000
               low-alarm-condition false
               low-alarm-threshold 1.000
               high-warning-condition false
               high-warning-threshold 12.000
               low-warning-condition false
               low-warning-threshold 2.000
           }
       }
       channel 3 {
           wavelength 850.00
           input-power {
               latest-value 1.30
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -12.30
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -11.30
           }
           output-power {
               latest-value 1.85
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -10.40
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -9.40
           }
           laser-bias-current {
               latest-value 5.862
               high-alarm-condition false
               high-alarm-threshold 15.000
               low-alarm-condition false
               low-alarm-threshold 1.000
               high-warning-condition false
               high-warning-threshold 12.000
               low-warning-condition false
               low-warning-threshold 2.000
           }
       }
       channel 4 {
           wavelength 850.00
           input-power {
               latest-value 1.67
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -12.30
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -11.30
           }
           output-power {
               latest-value 1.34
               high-alarm-condition false
               high-alarm-threshold 4.40
               low-alarm-condition false
               low-alarm-threshold -10.40
               high-warning-condition false
               high-warning-threshold 3.40
               low-warning-condition false
               low-warning-threshold -9.40
           }
           laser-bias-current {
               latest-value 5.862
               high-alarm-condition false
               high-alarm-threshold 15.000
               low-alarm-condition false
               low-alarm-threshold 1.000
               high-warning-condition false
               high-warning-threshold 12.000
               low-warning-condition false
               low-warning-threshold 2.000
           }
       }
       healthz {
           status healthy
       }

W2W Copper 1Gbe RJ45 SFP

   A:cmooney@lsw4-a8-codfw# show interface ethernet-1/1 brief
   +---------------------+--------------+--------------+--------------+--------------+--------------+
   |        Port         | Admin State  |  Oper State  |    Speed     |     Type     | Description  |
   +=====================+==============+==============+==============+==============+==============+
   | ethernet-1/1        | enable       | up           | 1G           | GIGE-T       | nokiatest200 |
   |                     |              |              |              |              | 1            |
   +---------------------+--------------+--------------+--------------+--------------+--------------+
   A:cmooney@lsw4-a8-codfw# info from state interface ethernet-1/1 transceiver
       tx-laser true
       oper-state up
       ddm-events true
       form-factor SFP
       ethernet-pmd GIGE-T
       connector-type RJ45
       vendor W2W
       vendor-part-number 77J-S010-T
       vendor-revision 1.0
       serial-number 230404100133
       date-code "2023-04-25T00:00:00Z (1 year, 11 months ago)"
       link-length-information "OM4: 1000m"
       healthz {
           status healthy
       }

Interestingly, this one reports the media as "OM4: 1000m", but that's a quirk of the SFP, not the switch. It's definitely a regular copper module.

FS.com 10G 2m DAC

   A:cmooney@lsw4-a8-codfw# show interface ethernet-1/10 brief
   +---------------------+-------------+-------------+-------------+-------------+-------------+
   |        Port         | Admin State | Oper State  |    Speed    |    Type     | Description |
   +=====================+=============+=============+=============+=============+=============+
   | ethernet-1/10       | enable      | up          | 10G         | SFP+        | nokiatest20 |
   |                     |             |             |             | PASSIVE     | 02          |
   +---------------------+-------------+-------------+-------------+-------------+-------------+
   A:cmooney@lsw4-a8-codfw# info from state interface ethernet-1/10 transceiver
       tx-laser true
       oper-state up
       ddm-events true
       form-factor SFPplus
       ethernet-pmd "SFP+ PASSIVE"
       connector-type COPPER-PIGTAIL
       vendor FS
       vendor-part-number SFPP-PC02
       vendor-revision A
       serial-number F2011248014-1
       date-code "2022-03-10T00:00:00Z (3 years ago)"
       link-length-information "Cable Assembly: 2m"
       healthz {
           status healthy
       }
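
The DOM output shown for the modules above is regular enough to check with a script. The following is a minimal sketch, not part of the original tests: it assumes the output of "info from state interface ... transceiver" has been saved as text, and it walks the nested blocks looking for any "-condition" leaf set to true. The function name and the sample input (including the -18.20 reading) are hypothetical and only for illustration.

```python
def find_active_conditions(text):
    """Return dotted paths of every '*-condition true' leaf in the output."""
    path, active = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            # Opening a container such as 'temperature {' or 'channel 1 {'
            path.append(line[:-1].strip().replace(" ", "-"))
        elif line == "}":
            path.pop()
        else:
            key, _, value = line.partition(" ")
            if key.endswith("-condition") and value.strip() == "true":
                active.append(".".join(path + [key]))
    return active


# Hypothetical sample: same layout as the switch output, with an
# invented low receive-power reading to trigger one alarm.
sample = """
    fault-condition false
    temperature {
        latest-value 30
        high-alarm-condition false
        high-alarm-threshold 95
    }
    input-power {
        latest-value -18.20
        low-alarm-condition true
        low-alarm-threshold -17.40
    }
"""

print(find_active_conditions(sample))  # ['input-power.low-alarm-condition']
```

Run against the real outputs above, a check like this would be expected to return an empty list, since every condition shown is false.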

Functional Tests

Note that some of these tests involve using a 'VRF' or routing-instance. In SR-Linux all networks have to be within a "network-instance", so there is no real difference between something happening in the "default" network-instance or another one (unlike Juniper or Cisco, where there is a default table and separate routing-instances or VRFs). In an EVPN scenario the underlay can be any network-instance, with as many overlay network-instances as needed using it for transport.

L2 Access port

   A:cmooney@lsw2-a8-codfw# info interface ethernet-1/1
       description nokiatest2002
       admin-state enable
       mtu 9192
       vlan-tagging false
       subinterface 2051 {
           type bridged
           admin-state enable
       }
   A:cmooney@lsw2-a8-codfw# info network-instance vlan-2051
       type mac-vrf
       admin-state enable
       description private2-a8-codfw
       interface ethernet-1/1.2051 {
       }

We have a host connected on this interface with the following MAC:

   root@nokiatest2002:~# ip -br link show dev eno1 
   eno1             UP             2c:ea:7f:3f:ea:94 <BROADCAST,MULTICAST,UP,LOWER_UP> 

Which is learnt correctly on the switch:

   A:cmooney@lsw2-a8-codfw# show network-instance vlan-2051 bridge-table mac-table all
   ------------------------------------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2051
   ------------------------------------------------------------------------------------------------------------------------------------------
   +--------------------+-----------------------------------+------------+-----------+---------+--------+-----------------------------------+
   |      Address       |            Destination            | Dest Index |   Type    | Active  | Aging  |            Last Update            |
   +====================+===================================+============+===========+=========+========+===================================+
   | 2C:EA:7F:3F:EA:94  | ethernet-1/1.2051                 | 1          | learnt    | true    | 1159   | 2025-03-31T11:38:37.000Z          |
   | 78:1F:7C:A8:11:D4  | irb-interface                     | 0          | irb-      | true    | N/A    | 2025-03-31T11:38:33.000Z          |
   |                    |                                   |            | interface |         |        |                                   |
   | BC:97:E1:28:79:00  | ethernet-1/10.2051                | 2          | learnt    | true    | 1159   | 2025-04-08T16:14:06.000Z          |
   +--------------------+-----------------------------------+------------+-----------+---------+--------+-----------------------------------+

L2 Trunk port w/native vlan

An L2 trunk is created in SR-Linux by building multiple L2 sub-interfaces on a single port, each tied to a different 802.1q tag. We can define a "native" vlan by creating the sub-interface we want to be untagged in the same way as the others, but setting "vlan encap untagged" instead of a vlan-id.

So with a switch configuration like this:

   A:cmooney@lsw5-a8-codfw# info flat interface ethernet-1/1
   set / interface ethernet-1/1 description nokiatest2002
   set / interface ethernet-1/1 admin-state enable
   set / interface ethernet-1/1 vlan-tagging true
   set / interface ethernet-1/1 subinterface 2054 type bridged
   set / interface ethernet-1/1 subinterface 2054 admin-state enable
   set / interface ethernet-1/1 subinterface 2054 vlan encap single-tagged vlan-id 2054
   set / interface ethernet-1/1 subinterface 2057 type bridged
   set / interface ethernet-1/1 subinterface 2057 admin-state enable
   set / interface ethernet-1/1 subinterface 2057 vlan encap untagged
   set / interface ethernet-1/1 subinterface 2058 type bridged
   set / interface ethernet-1/1 subinterface 2058 admin-state enable
   set / interface ethernet-1/1 subinterface 2058 vlan encap single-tagged vlan-id 2058
   set / interface ethernet-1/1 sflow admin-state enable
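
As a rough illustration of the pattern above (one bridged sub-interface per vlan, with the native vlan marked untagged), the sub-interface lines could be generated like this. This is a sketch only: the function name is hypothetical, and using the vlan ID as the sub-interface index is simply the convention seen in the configuration above, not a platform requirement that has been verified here.

```python
def trunk_config(port, tagged_vlans, native_vlan=None):
    """Build 'set' lines for an L2 trunk with an optional native vlan."""
    vlans = set(tagged_vlans)
    if native_vlan is not None:
        vlans.add(native_vlan)

    lines = [f"set / interface {port} vlan-tagging true"]
    for vlan in sorted(vlans):
        base = f"set / interface {port} subinterface {vlan}"
        lines.append(f"{base} type bridged")
        lines.append(f"{base} admin-state enable")
        if vlan == native_vlan:
            # Native vlan: frames are sent and received without an 802.1q tag
            lines.append(f"{base} vlan encap untagged")
        else:
            lines.append(f"{base} vlan encap single-tagged vlan-id {vlan}")
    return lines


for line in trunk_config("ethernet-1/1", [2054, 2058], native_vlan=2057):
    print(line)
```

With these arguments the output reproduces the vlan-tagging and sub-interface lines of the switch configuration shown above (the description, admin-state and sflow lines are not generated).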

We can configure trunking on the host side by creating vlan sub-interfaces like this:

   root@nokiatest2002:~# ip -4 -br addr show | column -t | sort -nr | egrep ^e
   ens3f1np1                 UP       10.192.54.4/24
   ens3f1np1.2058@ens3f1np1  UP       10.192.55.4/24
   ens3f1np1.2054@ens3f1np1  UP       10.192.47.4/24

On the wire we can see the ARP and pings for each interface being sent with the appropriate encapsulation (tagged or untagged as required):

   root@nokiatest2002:~# tcpdump -e -i ens3f1np1 -l -p -nn icmp or arp 
   listening on ens3f1np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   18:47:28.053406 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.192.54.1 tell 10.192.54.4, length 28
   18:47:28.053683 00:00:5e:00:01:01 > bc:97:e1:28:54:51, ethertype ARP (0x0806), length 60: Reply 10.192.54.1 is-at 00:00:5e:00:01:01, length 46
   18:47:28.053695 bc:97:e1:28:54:51 > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 98: 10.192.54.4 > 10.192.54.1: ICMP echo request, id 35279, seq 1, length 64
   18:47:28.053956 78:1f:7c:a8:3b:d4 > bc:97:e1:28:54:51, ethertype IPv4 (0x0800), length 98: 10.192.54.1 > 10.192.54.4: ICMP echo reply, id 35279, seq 1, length 64
   18:47:34.493009 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 2058, p 0, ethertype ARP (0x0806), Request who-has 10.192.55.1 tell 10.192.55.4, length 28
   18:47:34.493424 00:00:5e:00:01:01 > bc:97:e1:28:54:51, ethertype 802.1Q (0x8100), length 60: vlan 2058, p 6, ethertype ARP (0x0806), Reply 10.192.55.1 is-at 00:00:5e:00:01:01, length 42
   18:47:34.493439 bc:97:e1:28:54:51 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 2058, p 0, ethertype IPv4 (0x0800), 10.192.55.4 > 10.192.55.1: ICMP echo request, id 35280, seq 1, length 64
   18:47:34.493835 00:00:5e:00:01:01 > bc:97:e1:28:54:51, ethertype 802.1Q (0x8100), length 102: vlan 2058, p 0, ethertype IPv4 (0x0800), 10.192.55.1 > 10.192.55.4: ICMP echo reply, id 35280, seq 1, length 64
   18:47:37.475981 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 2054, p 0, ethertype ARP (0x0806), Request who-has 10.192.47.1 tell 10.192.47.4, length 28
   18:47:37.476507 78:1f:7c:a8:3b:d4 > bc:97:e1:28:54:51, ethertype 802.1Q (0x8100), length 60: vlan 2054, p 6, ethertype ARP (0x0806), Reply 10.192.47.1 is-at 78:1f:7c:a8:3b:d4, length 42
   18:47:37.476521 bc:97:e1:28:54:51 > 78:1f:7c:a8:3b:d4, ethertype 802.1Q (0x8100), length 102: vlan 2054, p 0, ethertype IPv4 (0x0800), 10.192.47.4 > 10.192.47.1: ICMP echo request, id 35281, seq 1, length 64
   18:47:37.476715 78:1f:7c:a8:3b:d4 > bc:97:e1:28:54:51, ethertype 802.1Q (0x8100), length 102: vlan 2054, p 0, ethertype IPv4 (0x0800), 10.192.47.1 > 10.192.47.4: ICMP echo reply, id 35281, seq 1, length 64
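
To confirm the encapsulation at a glance rather than reading each line, the capture can be summarised per vlan. The sketch below is an assumption-laden helper, not something used in the original test: it relies on the "tcpdump -e" text format shown above, where tagged frames are printed with "ethertype 802.1Q (0x8100), length N: vlan ID", and it counts every other frame as untagged. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the 802.1Q header as printed by 'tcpdump -e' in the capture above
VLAN_RE = re.compile(r"ethertype 802\.1Q \(0x8100\), length \d+: vlan (\d+)")


def frames_per_vlan(lines):
    """Count captured frames per vlan ID; untagged frames use the key 'untagged'."""
    counts = Counter()
    for line in lines:
        if "ethertype" not in line:
            continue  # skip tcpdump banner lines
        match = VLAN_RE.search(line)
        counts[int(match.group(1)) if match else "untagged"] += 1
    return counts


# Three lines copied from the capture above
capture = [
    "18:47:28.053406 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.192.54.1 tell 10.192.54.4, length 28",
    "18:47:34.493009 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 2058, p 0, ethertype ARP (0x0806), Request who-has 10.192.55.1 tell 10.192.55.4, length 28",
    "18:47:37.475981 bc:97:e1:28:54:51 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 2054, p 0, ethertype ARP (0x0806), Request who-has 10.192.47.1 tell 10.192.47.4, length 28",
]

print(dict(frames_per_vlan(capture)))
```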

On the switch side all the MACs are populated correctly:

   A:cmooney@lsw5-a8-codfw# show network-instance * bridge-table mac-table all | grep ethernet-1/1
   | BC:97:E1:28:54:51 | ethernet-1/1.2054        | 19        | learnt | true   | 1080  | 2025-04-10T17:38:21.000Z |
   | BC:97:E1:28:54:51 | ethernet-1/1.2057        | 18        | learnt | true   | 1183  | 2025-04-10T17:35:12.000Z |
   | BC:97:E1:28:54:51 | ethernet-1/1.2058        | 20        | learnt | true   | 1080  | 2025-04-10T17:38:26.000Z |

Routing to end host from Vlan interface in VRF

This test checks local routing from the IRB interface to an end host in a network instance. The IRB config is as follows:

   A:cmooney@lsw5-a8-codfw# info flat interface irb0 subinterface 2054
   set / interface irb0 subinterface 2054 admin-state enable
   set / interface irb0 subinterface 2054 ipv4 admin-state enable
   set / interface irb0 subinterface 2054 ipv4 address 10.192.47.1/24
   set / interface irb0 subinterface 2054 ipv4 arp timeout 600
   set / interface irb0 subinterface 2054 ipv4 arp learn-unsolicited true
   set / interface irb0 subinterface 2054 ipv4 arp host-route populate dynamic
   set / interface irb0 subinterface 2054 ipv4 arp evpn advertise dynamic
   set / interface irb0 subinterface 2054 ipv6 admin-state enable
   set / interface irb0 subinterface 2054 ipv6 address 2620:0:860:126::1/64
   set / interface irb0 subinterface 2054 ipv6 neighbor-discovery stale-time 600
   set / interface irb0 subinterface 2054 ipv6 neighbor-discovery learn-unsolicited global
   set / interface irb0 subinterface 2054 ipv6 neighbor-discovery host-route populate dynamic
   set / interface irb0 subinterface 2054 ipv6 neighbor-discovery evpn advertise dynamic
   set / interface irb0 subinterface 2054 ipv6 router-advertisement router-role admin-state enable
   set / interface irb0 subinterface 2054 ipv6 router-advertisement router-role max-advertisement-interval 30
   set / interface irb0 subinterface 2054 ipv6 router-advertisement router-role router-lifetime 600
   set / interface irb0 subinterface 2054 ipv6 router-advertisement router-role prefix 2620:0:860:126::/64

Host configured as follows:

   root@nokiatest2002:~# ip -br addr show dev ens3f1np1.2054
   ens3f1np1.2054@ens3f1np1 UP             10.192.47.4/24 2620:0:860:126:be97:e1ff:fe28:5451/64 
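
The host's IPv6 address appears to be a SLAAC address built from the /64 prefix advertised by the IRB interface plus a modified EUI-64 interface identifier derived from the host MAC (bc:97:e1:28:54:51, as seen in the earlier capture). That is an inference from the values rather than something stated in the test, so the sketch below simply shows that the derivation reproduces the address; the function name is hypothetical.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with the modified EUI-64 identifier of a MAC."""
    net = ipaddress.IPv6Network(prefix)
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(iid, "big")
    )


print(slaac_address("2620:0:860:126::/64", "bc:97:e1:28:54:51"))
# 2620:0:860:126:be97:e1ff:fe28:5451
```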

IPv4

   A:cmooney@lsw5-a8-codfw# ping -c 2 -I 10.192.47.1 10.192.47.4 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 10.192.47.4 (10.192.47.4) from 10.192.47.1 : 56(84) bytes of data.
   64 bytes from 10.192.47.4: icmp_seq=1 ttl=64 time=0.399 ms
   64 bytes from 10.192.47.4: icmp_seq=2 ttl=64 time=0.416 ms

IPv6

   root@nokiatest2002:~# ip -6 -br addr show dev ens3f1np1.2054
   ens3f1np1.2054@ens3f1np1 UP             2620:0:860:126:be97:e1ff:fe28:5451/64 
   A:cmooney@lsw5-a8-codfw# ping -I 2620:0:860:126::1 2620:0:860:126:be97:e1ff:fe28:5451 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 2620:0:860:126:be97:e1ff:fe28:5451(2620:0:860:126:be97:e1ff:fe28:5451) from 2620:0:860:126::1 : 56 data bytes
   64 bytes from 2620:0:860:126:be97:e1ff:fe28:5451: icmp_seq=1 ttl=64 time=7.00 ms
   64 bytes from 2620:0:860:126:be97:e1ff:fe28:5451: icmp_seq=2 ttl=64 time=0.287 ms

Routed port in VRF

We configure a routed port in the VRF as follows:

   A:cmooney@ssw2-a8-codfw# info flat interface ethernet-1/33
   set / interface ethernet-1/33 description "Core: cr1-codfw:xe-1/1/1:3 {#7896541}"
   set / interface ethernet-1/33 admin-state enable
   set / interface ethernet-1/33 subinterface 0 admin-state enable
   set / interface ethernet-1/33 subinterface 0 ipv4 admin-state enable
   set / interface ethernet-1/33 subinterface 0 ipv4 address 10.192.254.15/31
   set / interface ethernet-1/33 subinterface 0 ipv6 admin-state enable
   set / interface ethernet-1/33 subinterface 0 ipv6 address 2620:0:860:137::2/64

IPv4

   A:cmooney@ssw2-a8-codfw# ping -M do -s 9150 -c 2 -I 10.192.254.15 10.192.254.14 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 10.192.254.14 (10.192.254.14) from 10.192.254.15 : 9150(9178) bytes of data.
   9158 bytes from 10.192.254.14: icmp_seq=1 ttl=64 time=1.37 ms
   9158 bytes from 10.192.254.14: icmp_seq=2 ttl=64 time=1.53 ms
   A:cmooney@ssw2-a8-codfw# ping -M do -s 9151 -c 2 -I 10.192.254.15 10.192.254.14 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 10.192.254.14 (10.192.254.14) from 10.192.254.15 : 9151(9179) bytes of data.
   ping: local error: message too long, mtu=9178
   ping: local error: message too long, mtu=9178
   A:cmooney@ssw2-a8-codfw# ping -s 9151 -c 2 -I 10.192.254.15 10.192.254.14 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 10.192.254.14 (10.192.254.14) from 10.192.254.15 : 9151(9179) bytes of data.
   9159 bytes from 10.192.254.14: icmp_seq=1 ttl=64 time=1.83 ms
   9159 bytes from 10.192.254.14: icmp_seq=2 ttl=64 time=3.55 ms

IPv6

   A:cmooney@ssw2-a8-codfw# ping -M do -s 9130 -c 2 -I 2620:0:860:137::2 2620:0:860:137::1 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 2620:0:860:137::1(2620:0:860:137::1) from 2620:0:860:137::2 : 9130 data bytes
   9138 bytes from 2620:0:860:137::1: icmp_seq=1 ttl=64 time=2.24 ms
   9138 bytes from 2620:0:860:137::1: icmp_seq=2 ttl=64 time=1.33 ms
   A:cmooney@ssw2-a8-codfw# ping -M do -s 9131 -c 2 -I 2620:0:860:137::2 2620:0:860:137::1 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 2620:0:860:137::1(2620:0:860:137::1) from 2620:0:860:137::2 : 9131 data bytes
   ping: local error: message too long, mtu: 9178
   ping: local error: message too long, mtu: 9178
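
The payload limits seen in both tests follow directly from the 9178-byte IP MTU reported in the error messages: the largest unfragmented ping payload is the MTU minus the IP header (20 bytes for IPv4 without options, 40 bytes for IPv6) and the 8-byte ICMP header. A small worked check, with a hypothetical helper name:

```python
ICMP_HEADER = 8                 # ICMP / ICMPv6 echo header
IP_HEADER = {4: 20, 6: 40}      # IPv4 header without options, IPv6 fixed header


def max_ping_payload(ip_mtu, version):
    """Largest 'ping -s' value that fits in one packet for the given IP MTU."""
    return ip_mtu - IP_HEADER[version] - ICMP_HEADER


print(max_ping_payload(9178, 4))  # 9150
print(max_ping_payload(9178, 6))  # 9130
```

These match the observed results: 9150 and 9130 bytes succeed with "-M do", while one byte more fails locally.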

For some reason, sending the 9131-byte packet without the do-not-fragment option does not get a response (this should work if the two fragments are transmitted and received):

   A:cmooney@ssw2-a8-codfw# ping -s 9131 -c 2 -I 2620:0:860:137::2 2620:0:860:137::1 network-instance PRODUCTION
   Using network instance PRODUCTION
   PING 2620:0:860:137::1(2620:0:860:137::1) from 2620:0:860:137::2 : 9131 data bytes
   
   --- 2620:0:860:137::1 ping statistics ---
   2 packets transmitted, 0 received, 100% packet loss, time 1040ms

The Nokia shows them being sent in a tcpdump:

   20:02:22.148456 IP6 (flowlabel 0x728ee, hlim 64, next-header Fragment (44) payload length: 9136) 2620:0:860:137::2 > 2620:0:860:137::1: frag (0xf6ed6310:0|9128) ICMP6, echo request, id 36249, seq 1
   20:02:22.148481 IP6 (flowlabel 0x728ee, hlim 64, next-header Fragment (44) payload length: 19) 2620:0:860:137::2 > 2620:0:860:137::1: frag (0xf6ed6310:9128|11)

I'm not 100% sure of the reason for this. The MTU itself seems correct, however, as from the Juniper side we top out at the same value:

   cmooney@re0.cr1-codfw> ping 2620:0:860:137::2 source 2620:0:860:137::1 do-not-fragment size 9131 count 2    
   PING6(9179=40+8+9131 bytes) 2620:0:860:137::1 --> 2620:0:860:137::2
   ping: sendmsg: Message too long
   ping6: wrote 2620:0:860:137::2 9139 chars, ret=-1
   ping: sendmsg: Message too long
   ping6: wrote 2620:0:860:137::2 9139 chars, ret=-1

And doing a capture on the Nokia, I can see that the maximum packet size each side allows us to transmit (9130) does equate to the same size on the wire:

   root@ssw2-a8-codfw:~# tcpdump -vvv -i e1-33.0 -l -p -nn -s0 icmp6 
   tcpdump: listening on e1-33.0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   20:04:30.111379 IP6 (flowlabel 0x728ee, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::2 > 2620:0:860:137::1: [icmp6 sum ok] ICMP6, echo request, id 15730, seq 1
   20:04:30.112927 IP6 (flowlabel 0x728ee, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::1 > 2620:0:860:137::2: [icmp6 sum ok] ICMP6, echo reply, id 15730, seq 1
   20:04:31.112837 IP6 (flowlabel 0x728ee, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::2 > 2620:0:860:137::1: [icmp6 sum ok] ICMP6, echo request, id 15730, seq 2
   20:04:31.114520 IP6 (flowlabel 0x728ee, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::1 > 2620:0:860:137::2: [icmp6 sum ok] ICMP6, echo reply, id 15730, seq 2
   20:04:47.786934 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::1 > 2620:0:860:137::2: [icmp6 sum ok] ICMP6, echo request, id 55236, seq 0
   20:04:47.786974 IP6 (flowlabel 0xde8ca, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::2 > 2620:0:860:137::1: [icmp6 sum ok] ICMP6, echo reply, id 55236, seq 0
   20:04:48.787532 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::1 > 2620:0:860:137::2: [icmp6 sum ok] ICMP6, echo request, id 55236, seq 1
   20:04:48.787570 IP6 (flowlabel 0xde8ca, hlim 64, next-header ICMPv6 (58) payload length: 9138) 2620:0:860:137::2 > 2620:0:860:137::1: [icmp6 sum ok] ICMP6, echo reply, id 55236, seq 1

So this should not trouble us. I may be forgetting some quirks of fragmentation in IPv6.
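The sizes seen in the two captures can be checked with some simple arithmetic. The sketch below assumes an interface IPv6 MTU of 9178 bytes (inferred from the largest unfragmented packet above: payload length 9138 plus the 40-byte IPv6 header) and that the fragmented ping used a size of 9131. The function names are illustrative only.

```python
IPV6_HEADER = 40      # fixed IPv6 header, bytes
ICMPV6_HEADER = 8     # ICMPv6 echo header, bytes
FRAGMENT_HEADER = 8   # IPv6 fragment extension header, bytes

# Assumption: MTU inferred from the largest unfragmented packet in the
# capture above (IPv6 payload length 9138 + 40-byte header).
MTU = 9178


def max_ping_size(mtu: int) -> int:
    """Largest ICMPv6 echo data size that fits in one unfragmented packet."""
    return mtu - IPV6_HEADER - ICMPV6_HEADER


def fragment_payload_lengths(ping_size: int, mtu: int) -> list:
    """IPv6 'payload length' field of each fragment for a given ping size."""
    remaining = ICMPV6_HEADER + ping_size
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    lengths = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        lengths.append(FRAGMENT_HEADER + chunk)
        remaining -= chunk
    return lengths


print(max_ping_size(MTU))                    # 9130
print(fragment_payload_lengths(9131, MTU))   # [9136, 19]
```

The two fragment lengths match the 9136 and 19 byte payload lengths in the first capture, so under these assumptions the fragments themselves fit within the MTU.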

Jumbo frames across L2 Vlan

The purpose of this test is to confirm that jumbo frames are transmitted within a vlan from host to host (switches just bridging frames).

Same Switch

Two IPs are known locally on the switch:

   A:cmooney@lsw2-a8-codfw# show arpnd arp-entries interface irb0 subinterface 2051
   +-----------+-----------+----------------+-----------+----------------------+-------------------------------------------+
   | Interface | Subinterf |    Neighbor    |  Origin   |  Link layer address  |                  Expiry                   |
   |           |    ace    |                |           |                      |                                           |
   +===========+===========+================+===========+======================+===========================================+
   | irb0      |      2051 |   10.192.44.21 |   dynamic | BC:97:E1:28:79:00    | 8 minutes from now                        |
   | irb0      |      2051 |   10.192.44.22 |   dynamic | 2C:EA:7F:3F:EA:94    | 8 minutes from now                        |
   +-----------+-----------+----------------+-----------+----------------------+-------------------------------------------+

We can see these MACs are learnt on local ports rather than via VXLAN:

   A:cmooney@lsw2-a8-codfw# show network-instance vlan-2051 bridge-table mac-table all
   ----------------------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2051
   ----------------------------------------------------------------------------------------------------------------------------
   +-------------------+-----------------------------+-----------+---------+--------+-------+-----------------------------+
   |      Address      |         Destination         |   Dest    |  Type   | Active | Aging |         Last Update         |
   |                   |                             |   Index   |         |        |       |                             |
   +===================+=============================+===========+=========+========+=======+=============================+
   | 2C:EA:7F:3F:EA:94 | ethernet-1/1.2051           | 1         | learnt  | true   | 1178  | 2025-04-10T20:17:33.000Z    |
   | BC:97:E1:28:79:00 | ethernet-1/10.2051          | 2         | learnt  | true   | 1178  | 2025-04-08T16:14:06.000Z    |
   +-------------------+-----------------------------+-----------+---------+--------+-------+-----------------------------+

If we have a large MTU on the host we can send jumbos fine:

   root@nokiatest2001:~# ip -4 -br addr show dev ens3f0np0
   ens3f0np0        UP             10.192.44.21/24 
   root@nokiatest2001:~# ip --json link show dev ens3f0np0 | jq .[].mtu
   9000
   root@nokiatest2001:~# ip neigh show 10.192.44.22
   10.192.44.22 dev ens3f0np0 lladdr 2c:ea:7f:3f:ea:94 REACHABLE 
   root@nokiatest2001:~# ping -c 2 -M do -s 8972 10.192.44.22
   PING 10.192.44.22 (10.192.44.22) 8972(9000) bytes of data.
   8980 bytes from 10.192.44.22: icmp_seq=1 ttl=64 time=0.628 ms
   8980 bytes from 10.192.44.22: icmp_seq=2 ttl=64 time=0.622 ms
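The ping size of 8972 used above follows from the 9000-byte host MTU. A minimal sketch of the calculation, assuming an IPv4 header without options; the function name is illustrative:

```python
IPV4_HEADER = 20   # IPv4 header without options, bytes
ICMP_HEADER = 8    # ICMP echo header, bytes


def max_icmp_data(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_icmp_data(9000))  # 8972
```

This is why ping reports "8972(9000) bytes of data": 8972 bytes of ICMP data become a 9000-byte IP packet.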

Remote via VXLAN

In this case Vlan 2057 is configured on two separate switches, lsw4-a8-codfw and lsw5-a8-codfw, and they are bound to a VXLAN VNI which is advertised in EVPN.

We have two connections to this vlan from test server nokiatest2002. These are on interfaces ens3f0np0 (lsw4) and ens3f1np1 (lsw5), as the MAC tables below confirm. We place the interfaces in separate network namespaces on the host side so it doesn't see the other one as local.

   # lsw5 connection
   
   root@nokiatest2002:~# ip -4 -br addr show dev ens3f1np1 
   ens3f1np1        UP             10.192.54.4/24 
   root@nokiatest2002:~# ip -br link show dev ens3f1np1
   ens3f1np1        UP             bc:97:e1:28:54:51 <BROADCAST,MULTICAST,UP,LOWER_UP> 
   root@nokiatest2002:~# ip --json link show dev ens3f1np1 | jq .[].mtu 
   9000
   # lsw4 connection
   
   root@nokiatest2002:~# ip -4 -br addr show dev ens3f0np0 
   ens3f0np0        UP             10.192.54.22/24 
   root@nokiatest2002:~# ip -br link show dev ens3f0np0
   ens3f0np0        UP             bc:97:e1:28:54:50 <BROADCAST,MULTICAST,UP,LOWER_UP> 
   root@nokiatest2002:~# ip --json link show dev ens3f0np0 | jq .[].mtu 
   9000

These MACs are known on each switch locally and via EVPN:

   A:cmooney@lsw4-a8-codfw# show network-instance vlan-2057 bridge-table mac-table all
   ----------------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2057
   ----------------------------------------------------------------------------------------------------------------------
   +-------------------+--------------------------+-----------+--------+--------+-------+--------------------------+
   |      Address      |       Destination        |   Dest    |  Type  | Active | Aging |       Last Update        |
   |                   |                          |   Index   |        |        |       |                          |
   +===================+==========================+===========+========+========+=======+==========================+
   | BC:97:E1:28:54:50 | ethernet-1/10.2057       | 2         | learnt | true   | 1163  | 2025-03-31T11:45:20.000Z |
   | BC:97:E1:28:54:51 | vxlan-                   | 6870904   | evpn   | true   | N/A   | 2025-04-10T18:26:23.000Z |
   |                   | interface:vxlan0.2057    |           |        |        |       |                          |
   |                   | vtep:10.192.252.39       |           |        |        |       |                          |
   |                   | vni:2002057              |           |        |        |       |                          |
   +-------------------+--------------------------+-----------+--------+--------+-------+--------------------------+
   A:cmooney@lsw5-a8-codfw# show network-instance vlan-2057 bridge-table mac-table all
   ----------------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2057
   ----------------------------------------------------------------------------------------------------------------------
   +-------------------+--------------------------+-----------+--------+--------+-------+--------------------------+
   |      Address      |       Destination        |   Dest    |  Type  | Active | Aging |       Last Update        |
   |                   |                          |   Index   |        |        |       |                          |
   +===================+==========================+===========+========+========+=======+==========================+
   | BC:97:E1:28:54:50 | vxlan-                   | 6756285   | evpn   | true   | N/A   | 2025-04-10T20:15:44.000Z |
   |                   | interface:vxlan0.2057    |           |        |        |       |                          |
   |                   | vtep:10.192.252.38       |           |        |        |       |                          |
   |                   | vni:2002057              |           |        |        |       |                          |
   | BC:97:E1:28:54:51 | ethernet-1/1.2057        | 18        | learnt | true   | 1157  | 2025-04-10T17:35:12.000Z |
   +-------------------+--------------------------+-----------+--------+--------+-------+--------------------------+

And we can ping with jumbo frames between the two interfaces:

   root@nokiatest2002:~# ping -s 8972 -c 2 -M do -I 10.192.54.4 10.192.54.22 
   PING 10.192.54.22 (10.192.54.22) from 10.192.54.4 : 8972(9000) bytes of data.
   8980 bytes from 10.192.54.22: icmp_seq=1 ttl=64 time=0.211 ms
   8980 bytes from 10.192.54.22: icmp_seq=2 ttl=64 time=0.210 ms
   root@nokiatest2002:~# ip neigh show 10.192.54.22
   10.192.54.22 dev ens3f1np1 lladdr bc:97:e1:28:54:50 REACHABLE
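For a 9000-byte packet to cross between switches, the leaf/spine links must carry it with the VXLAN encapsulation added. The sketch below estimates the resulting outer IP packet size. It assumes an IPv4 underlay without options (the VTEP addresses above are IPv4) and that the inner frame is carried untagged; the underlay MTU itself is not shown in this section, so treat the result as illustrative.

```python
INNER_ETHERNET = 14   # inner Ethernet header carried inside the tunnel
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4 = 20       # assumption: IPv4 underlay, no IP options


def underlay_ip_packet_size(host_mtu: int) -> int:
    """Outer IP packet size for a full-sized inner packet of host_mtu bytes."""
    return host_mtu + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4


print(underlay_ip_packet_size(9000))  # 9050
```

Under these assumptions the underlay links need an IP MTU of at least 9050 bytes for the jumbo ping above to succeed with the don't-fragment bit set.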

Jumbo frames L3 routing

In these tests we want to confirm we can send jumbo frames between hosts on different vlans, with the hosts sending the traffic via their configured default gateway (irb0 subinterface on the switch). Again we wish to test this works for two hosts on the same switch, and also across switches.

Same Switch

We ping from 10.192.46.21 on nokiatest2001 (vlan: private4-a8-codfw) to 10.192.54.22 on nokiatest2002 (vlan: private2-l-codfw). Both ports are connected to lsw4-a8-codfw, so the switch just routes from one vlan to the other.

   root@nokiatest2001:~# ip -4 -br addr show dev eno1 
   eno1             UP             10.192.46.21/24 
   root@nokiatest2001:~# ip route get fibmatch 10.192.54.22
   default via 10.192.46.1 dev eno1 onlink 
   root@nokiatest2001:~# mtr -b -w -c 2 10.192.54.22
   Start: 2025-04-11T17:01:13+0100
   HOST: nokiatest2001                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2053.lsw4-a8-codfw.codfw.wmnet (10.192.46.1)   0.0%     2    0.2   0.2   0.2   0.2   0.0
     2.|-- ens3f0np0.nokiatest2002.codfw.wmnet (10.192.54.22)  0.0%     2    0.2   0.3   0.2   0.3   0.0
   root@nokiatest2001:~# ip -json link show dev eno1 | jq .[].mtu
   9000
   root@nokiatest2001:~# ping -c 2 -s 8972 -M do 10.192.54.22
   PING 10.192.54.22 (10.192.54.22) 8972(9000) bytes of data.
   8980 bytes from 10.192.54.22: icmp_seq=1 ttl=63 time=0.615 ms
   8980 bytes from 10.192.54.22: icmp_seq=2 ttl=63 time=0.625 ms

Remote via VXLAN

We ping from 10.192.45.21 on nokiatest2001, connected to lsw3-a8-codfw eth-1/1 (vlan 2052 - private3-a8-codfw), to 10.192.44.22 on nokiatest2002, connected to lsw2-a8-codfw eth-1/1 (vlan 2051 - private2-a8-codfw).

   root@nokiatest2001:~# ip -4 -br addr show dev ens3f1np1
   ens3f1np1        UP             10.192.45.21/24 
   root@nokiatest2001:~# ip --json link show dev ens3f1np1 | jq .[].mtu 
   9000
   root@nokiatest2001:~# ip route get fibmatch 10.192.44.22
   default via 10.192.45.1 dev ens3f1np1 
   root@nokiatest2001:~# mtr -b -w -c 2 10.192.44.22 
   Start: 2025-04-11T19:03:54+0100
   HOST: nokiatest2001                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (10.192.45.1)  0.0%     2    0.2   0.2   0.2   0.2   0.0
     2.|-- anycast-gw-2056-codfw.codfw.wmnet (10.192.53.1)    0.0%     2    0.2   0.2   0.2   0.2   0.0
     3.|-- nokiatest2002.codfw.wmnet (10.192.44.22)           0.0%     2    0.2   0.2   0.2   0.2   0.0
   root@nokiatest2001:~# ping -M do -s 8972 10.192.44.22 
   PING 10.192.44.22 (10.192.44.22) 8972(9000) bytes of data.
   8980 bytes from 10.192.44.22: icmp_seq=1 ttl=62 time=0.619 ms
   8980 bytes from 10.192.44.22: icmp_seq=2 ttl=62 time=0.628 ms
   8980 bytes from 10.192.44.22: icmp_seq=3 ttl=62 time=0.628 ms
   8980 bytes from 10.192.44.22: icmp_seq=4 ttl=62 time=0.628 ms

Remote MAC Learning on Vlan/VNI & Client-Client L2 Forwarding

First we take two ports on nokiatest2002, one connected to lsw4 and one connected to lsw5 (in separate namespaces on the host side).

   root@nokiatest2002:~# ip -4 -br addr show dev ens3f0np0 
   ens3f0np0        UP             10.192.54.22/24 
   root@nokiatest2002:~# ip -br link show dev ens3f0np0 
   ens3f0np0        UP             bc:97:e1:28:54:50 <BROADCAST,MULTICAST,UP,LOWER_UP> 
   root@nokiatest2002:~# ip -4 -br addr show dev ens3f1np1
   ens3f1np1        UP             10.192.54.4/24 
   root@nokiatest2002:~# ip -br link show dev ens3f1np1 
   ens3f1np1        UP             bc:97:e1:28:54:51 <BROADCAST,MULTICAST,UP,LOWER_UP>

We add static ARP bindings on both sides so that the hosts do not need to ARP and the switches have no ARP traffic to snoop on. This lets us validate pure L2 unicast forwarding:

   root@nokiatest2002:~# ip neigh show 10.192.54.4
   10.192.54.4 dev ens3f0np0 lladdr bc:97:e1:28:54:51 PERMANENT 
   root@nokiatest2002:~# ip neigh show 10.192.54.22
   10.192.54.22 dev ens3f1np1 lladdr bc:97:e1:28:54:50 PERMANENT 

We can then ping, which results in just unicast Ethernet frames sent over the vlan:

   root@nokiatest2002:~# ping -c 2 10.192.54.22 
   PING 10.192.54.22 (10.192.54.22) 56(84) bytes of data.
   64 bytes from 10.192.54.22: icmp_seq=1 ttl=64 time=0.085 ms
   64 bytes from 10.192.54.22: icmp_seq=2 ttl=64 time=0.159 ms

Looking at the MAC forwarding table for vlan 2057 on both switches, we can see it looks as expected. Each switch learns the MAC of the locally connected device on the direct port, and sees the other one as reachable over VXLAN due to BGP EVPN updates from the other switch:

   A:lsw4-a8-codfw# show network-instance vlan-2057 bridge-table mac-table all
   ---------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2057
   ---------------------------------------------------------------------------------------------------------------
   +-------------------+----------------------------+-----------+---------+--------+-------+----------------------------+
   |      Address      |        Destination         |   Dest    |  Type   | Active | Aging |        Last Update         |
   |                   |                            |   Index   |         |        |       |                            |
   +===================+============================+===========+=========+========+=======+============================+
   | 00:00:5E:00:01:01 | irb-interface              | 0         | irb-int | true   | N/A   | 2025-04-11T01:18:20.000Z   |
   |                   |                            |           | erface- |        |       |                            |
   |                   |                            |           | anycast |        |       |                            |
   | 78:1F:7C:A8:2F:D4 | irb-interface              | 0         | irb-int | true   | N/A   | 2025-04-11T01:18:20.000Z   |
   |                   |                            |           | erface  |        |       |                            |
   | 78:1F:7C:A8:3B:D4 | vxlan-                     | 7235568   | evpn-   | true   | N/A   | 2025-04-11T13:15:48.000Z   |
   |                   | interface:vxlan0.2057      |           | static  |        |       |                            |
   |                   | vtep:10.192.252.39         |           |         |        |       |                            |
   |                   | vni:2002057                |           |         |        |       |                            |
   | BC:97:E1:28:54:50 | ethernet-1/10.2057         | 17        | learnt  | true   | 1173  | 2025-04-14T11:21:35.000Z   |
   | BC:97:E1:28:54:51 | vxlan-                     | 7235568   | evpn    | true   | N/A   | 2025-04-14T11:17:43.000Z   |
   |                   | interface:vxlan0.2057      |           |         |        |       |                            |
   |                   | vtep:10.192.252.39         |           |         |        |       |                            |
   |                   | vni:2002057                |           |         |        |       |                            |
   +-------------------+----------------------------+-----------+---------+--------+-------+----------------------------+
   A:lsw5-a8-codfw# show network-instance vlan-2057 bridge-table mac-table all
   ---------------------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2057
   ---------------------------------------------------------------------------------------------------------------
   +-------------------+----------------------------+-----------+---------+--------+-------+----------------------------+
   |      Address      |        Destination         |   Dest    |  Type   | Active | Aging |        Last Update         |
   |                   |                            |   Index   |         |        |       |                            |
   +===================+============================+===========+=========+========+=======+============================+
   | 00:00:5E:00:01:01 | irb-interface              | 0         | irb-int | true   | N/A   | 2025-04-11T13:10:33.000Z   |
   |                   |                            |           | erface- |        |       |                            |
   |                   |                            |           | anycast |        |       |                            |
   | 78:1F:7C:A8:3B:D4 | irb-interface              | 0         | irb-int | true   | N/A   | 2025-04-11T13:10:33.000Z   |
   |                   |                            |           | erface  |        |       |                            |
   | BC:97:E1:28:54:50 | vxlan-                     | 7257831   | evpn    | true   | N/A   | 2025-04-14T11:21:36.000Z   |
   |                   | interface:vxlan0.2057      |           |         |        |       |                            |
   |                   | vtep:10.192.252.38         |           |         |        |       |                            |
   |                   | vni:2002057                |           |         |        |       |                            |
   | BC:97:E1:28:54:51 | ethernet-1/1.2057          | 14        | learnt  | true   | 1192  | 2025-04-11T13:27:06.000Z   |
   +-------------------+----------------------------+-----------+---------+--------+-------+----------------------------+
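When comparing tables like these across several switches, it can help to extract the local/remote split programmatically. The following is a rough sketch that parses the text layout shown above, including cells wrapped over several rows. It assumes the column order seen in these outputs and is not based on any official output schema; the function names are hypothetical.

```python
import re

MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")

# Rows copied from the lsw4 output earlier on this page.
SAMPLE = """
| BC:97:E1:28:54:50 | ethernet-1/10.2057       | 2         | learnt | true   | 1163  | 2025-03-31T11:45:20.000Z |
| BC:97:E1:28:54:51 | vxlan-                   | 6870904   | evpn   | true   | N/A   | 2025-04-10T18:26:23.000Z |
|                   | interface:vxlan0.2057    |           |        |        |       |                          |
|                   | vtep:10.192.252.39       |           |        |        |       |                          |
|                   | vni:2002057              |           |        |        |       |                          |
"""


def parse_mac_table(text):
    """Map each MAC to its destination parts and type from a mac-table dump."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("+"):
            current = None          # a ruler line ends any wrapped entry
            continue
        if not line.startswith("|"):
            continue                # skip titles, prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 4:
            continue
        if MAC_RE.match(cells[0]):
            current = {"destination": [cells[1]], "type": cells[3]}
            entries[cells[0]] = current
        elif cells[0] == "" and current is not None:
            # Continuation row: collect wrapped destination and type text.
            if cells[1]:
                current["destination"].append(cells[1])
            current["type"] += cells[3]
    return entries


def remote_vtep(entry):
    """Return the VTEP address for a MAC reached over VXLAN, else None."""
    for part in entry["destination"]:
        if part.startswith("vtep:"):
            return part.split(":", 1)[1]
    return None


table = parse_mac_table(SAMPLE)
for mac, entry in table.items():
    print(mac, entry["type"], remote_vtep(entry) or "local")
```

For structured data it would likely be more robust to query the switch state in JSON rather than scrape the table; the text parser is only meant to illustrate the layout.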

Client to Client broadcast forwarding / ingress replication

Simple test that broadcasts are relayed to all hosts within a Vlan. In the EVPN context this tests that ingress-replication is working correctly and the ingress switch properly forwards broadcasts to all other participating switches in the vlan based on a type-3 EVPN route.

Same Switch (Pure L2)

We generate some ARPs from nokiatest2001 for an IP that is not on the vlan right now:

   root@nokiatest2001:~# arping -I eno1 10.192.46.199 
   ARPING 10.192.46.199
   Timeout
   Timeout
   Timeout
   Timeout

On another host connected to the same switch and vlan, we enable its port but don't configure any IP:

   root@nokiatest2002:~# ip -br addr show 
   lo               UNKNOWN        127.0.0.1/8 ::1/128 
   ens3f0np0        UP             

If we do a tcpdump we see the ARP broadcasts are being properly flooded within the vlan:

   root@nokiatest2002:~# tcpdump -e -i ens3f0np0 -l -p -nn 
   listening on ens3f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   19:25:16.982225 2c:ea:7f:3f:f7:e8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.192.46.199 tell 10.192.46.21, length 46
   19:25:17.983343 2c:ea:7f:3f:f7:e8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.192.46.199 tell 10.192.46.21, length 46

Remote Switch (VXLAN tunneled)

In this case we will use two ports on nokiatest2002, ens3f0np0 terminating on lsw4-a8 and ens3f1np1 terminating on lsw5-a8. On the switch side both are in the vlan 2057, with EVPN/VXLAN configuration to stretch between switches. On the host side we place the ports in different network namespaces for isolation.

Firstly on the switches we can see the required EVPN type-3 routes are being sent by each switch. On lsw4 we can see there is one there which originated from lsw5:

   A:lsw4-a8-codfw# show network-instance default protocols bgp routes evpn route-type 3 originating-router 10.192.252.39 detail
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Show report for the EVPN routes in network-instance  "default"
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Route Distinguisher: 10.192.252.39:2057
   Tag-ID             : 0
   Originating router : 10.192.252.39
   neighbor           : 10.192.252.34
   Received paths     : 1
     Path 1: <Best,Valid,Used,>
       Label             : 2002057
       Route source      : neighbor 10.192.252.34 (last modified 4d3h41m53s ago)
       Route preference  : No MED, LocalPref is 100
       Atomic Aggr       : false
       BGP next-hop      : 10.192.252.39
       AS Path           :  i
       Communities       : [target:64812:2057, bgp-tunnel-encap:VXLAN]
       RR Attributes     : Originator-ID 10.192.252.39, Cluster-List is [10.192.252.34]
       Aggregation       : None
       Unknown Attr      : None
       Invalid Reason    : None
       Tie Break Reason  : none

Likewise on lsw5 we can see the equivalent having been learnt from lsw4:

   A:lsw5-a8-codfw# show network-instance default protocols bgp routes evpn route-type 3 originating-router 10.192.252.38 detail
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Show report for the EVPN routes in network-instance  "default"
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Route Distinguisher: 10.192.252.38:2057
   Tag-ID             : 0
   Originating router : 10.192.252.38
   neighbor           : 10.192.252.34
   Received paths     : 1
     Path 1: <Best,Valid,Used,>
       Label             : 2002057
       Route source      : neighbor 10.192.252.34 (last modified 5d3h41m40s ago)
       Route preference  : No MED, LocalPref is 100
       Atomic Aggr       : false
       BGP next-hop      : 10.192.252.38
       AS Path           :  i
       Communities       : [target:64812:2057, bgp-tunnel-encap:VXLAN]
       RR Attributes     : Originator-ID 10.192.252.38, Cluster-List is [10.192.252.34]
       Aggregation       : None
       Unknown Attr      : None
       Invalid Reason    : None
       Tie Break Reason  : none


These routes tell the switches to make a copy of any broadcast received on the Vlan and send it to the remote device over VXLAN so that it gets to every host on the segment.
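As a conceptual model only (not how SR Linux implements it), ingress replication can be thought of as building a flood list from the originators of the received type-3 routes and sending one unicast VXLAN copy of each broadcast to every entry. The function names below are hypothetical; the addresses are taken from the route outputs above.

```python
def replication_targets(local_vtep, type3_originators):
    """VTEPs that should each receive one copy of a broadcast frame."""
    return sorted(v for v in type3_originators if v != local_vtep)


def flood(frame, local_vtep, type3_originators):
    """Return one (vtep, frame) pair per remote VTEP in the flood list."""
    targets = replication_targets(local_vtep, type3_originators)
    return [(vtep, frame) for vtep in targets]


# From the outputs above: lsw4 is 10.192.252.38, lsw5 is 10.192.252.39.
copies = flood(b"arp-request", "10.192.252.38", {"10.192.252.39"})
print(copies)
```

With only two switches in the vlan, each broadcast received on lsw4 results in a single copy sent to lsw5, and vice versa.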


We can test this from the host by generating some ARPs:

   root@nokiatest2002:~# arping -S 1.2.3.3 -I ens3f0np0 1.2.3.4
   ARPING 1.2.3.4
   Timeout
   Timeout

And we see them arriving on the interface connected to the other switch as expected.

   root@nokiatest2002:~# tcpdump -e -i ens3f1np1 -l -p -nn
   listening on ens3f1np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   19:35:30.433493 bc:97:e1:28:54:50 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 1.2.3.4 tell 1.2.3.3, length 46
   19:35:31.434609 bc:97:e1:28:54:50 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 1.2.3.4 tell 1.2.3.3, length 46

Client to Client L2 multicast forwarding / ingress replication

We do not make use of routed multicast on any of the WMF networks, however certain protocols (IPv6 NDP, several routing protocols) transmit multicast frames when operating over a LAN. Switches are required to either flood these to all ports, or maintain an internal tree of listeners based on IGMP snooping.

Same Switch (Pure L2)

In this case we simply ping a multicast IPv4 address to generate the multicast L2 frames:

   root@nokiatest2001:~# ping 224.0.0.1
   PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
   ^C
   --- 224.0.0.1 ping statistics ---
   27 packets transmitted, 0 received, 100% packet loss, time 26602ms

From another device connected to the same vlan on the same switch we see the multicasts being received:

   root@nokiatest2002:~# tcpdump -e -l -nn -i ens3f0np0
   listening on ens3f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   20:23:54.861696 2c:ea:7f:3f:f7:e8 > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 98: 10.192.46.21 > 224.0.0.1: ICMP echo request, id 3903, seq 1, length 64
   20:23:55.863354 2c:ea:7f:3f:f7:e8 > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 98: 10.192.46.21 > 224.0.0.1: ICMP echo request, id 3903, seq 2, length 64
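The destination MAC 01:00:5e:00:00:01 in the capture is derived from the group address: the low 23 bits of the IPv4 multicast address are placed into the fixed 01:00:5e prefix. A small sketch of that standard mapping, with an illustrative function name:

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Ethernet destination MAC used for an IPv4 multicast group."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # only the low 23 bits are mapped
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


print(ipv4_multicast_mac("224.0.0.1"))  # 01:00:5e:00:00:01
```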

Remote Switch (VXLAN tunneled)

Using the same setup from the broadcast test we can send some pings out our interface to lsw4:

   root@nokiatest2002:~# ping -c 2 224.0.0.1
   PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
   ^C
   --- 224.0.0.1 ping statistics ---
   2 packets transmitted, 0 received, 100% packet loss, time 1025ms

We see these received on the port connected to lsw5:

   root@nokiatest2002:~# tcpdump -e -i ens3f1np1 -l -p -nn
   listening on ens3f1np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   19:45:44.672064 bc:97:e1:28:54:50 > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 98: 10.192.54.22 > 224.0.0.1: ICMP echo request, id 38919, seq 1, length 64
   19:45:45.696712 bc:97:e1:28:54:50 > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 98: 10.192.54.22 > 224.0.0.1: ICMP echo request, id 38919, seq 2, length 64

Inter-Vlan/subnet routing via IRB interfaces on same switch

In this case we test the simple case of forwarding between two vlans on the same device, i.e. routing via the irb0 interface.

Switch config:

   set / interface irb0 subinterface 2053 ipv4 address 10.192.46.1/24
   set / interface irb0 subinterface 2053 ipv6 address 2620:0:860:125::1/64
   set / interface irb0 subinterface 2057 ipv4 address 10.192.54.1/24 anycast-gw true
   set / interface irb0 subinterface 2057 ipv4 address 10.192.54.2/24 primary
   set / interface irb0 subinterface 2057 ipv6 address 2620:0:860:129::1/64 anycast-gw true
   set / interface irb0 subinterface 2057 ipv6 address 2620:0:860:129::2/64 primary

The second interface is an anycast gw but that is not relevant for this test.

The first host is configured on vlan-2053 as follows:

   root@nokiatest2001:~# ip -br addr show dev eno1 scope global 
   eno1             UP             10.192.46.21/24 2620:0:860:125::21/64
   root@nokiatest2001:~# ip route show default
   default via 10.192.46.1 dev eno1 onlink 
   root@nokiatest2001:~# ip -6 route show default
   default via 2620:0:860:125::1 dev eno1 metric 1024 pref medium

The second host is configured on vlan-2057 as follows:

   root@nokiatest2002:~# ip -br addr show dev ens3f0np0 scope global 
   ens3f0np0        UP             10.192.54.22/24 2620:0:860:129:be97:e1ff:fe28:5450/64 
   root@nokiatest2002:~# ip -4 route show default 
   default via 10.192.54.1 dev ens3f0np0 
   root@nokiatest2002:~# ip -6 route show default 
   default via fe80::200:5eff:fe00:101 dev ens3f0np0 proto ra metric 1024 expires 582sec hoplimit 64 pref medium

IPv4

   root@nokiatest2001:~# mtr -b -w -c 2 10.192.54.22
   Start: 2025-04-16T20:33:15+0100
   HOST: nokiatest2001                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2053.lsw4-a8-codfw.codfw.wmnet (10.192.46.1)   0.0%     2   0.2   0.1   0.1   0.2   0.0
     2.|-- ens3f0np0.nokiatest2002.codfw.wmnet (10.192.54.22)  0.0%     2    0.2   0.3   0.2   0.3   0.0
   root@nokiatest2002:~# mtr -b -w -c 2 10.192.46.21 
   Start: 2025-04-16T20:31:31+0100
   HOST: nokiatest2002                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2057.lsw4-a8-codfw.codfw.wmnet (10.192.54.2)  0.0%     2    0.1   0.1   0.1   0.2   0.0
     2.|-- nokiatest2001.codfw.wmnet (10.192.46.21)           0.0%     2    0.2   0.2   0.2   0.2   0.0

IPv6

   root@nokiatest2001:~# mtr -b -w -c 2 2620:0:860:129:be97:e1ff:fe28:5450
   Start: 2025-04-16T20:35:17+0100
   HOST: nokiatest2001                                           Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2053.lsw4-a8-codfw.codfw.wmnet (2620:0:860:125::1)  0.0%     2    0.3   4.5   0.3   8.8   6.0
     2.|-- 2620:0:860:129:be97:e1ff:fe28:5450                       0.0%     2    0.3   0.3   0.3   0.3   0.0
   root@nokiatest2002:~# mtr -b -w -c 2 2620:0:860:125::21 
   Start: 2025-04-16T20:33:29+0100
   HOST: nokiatest2002                                           Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2057.lsw4-a8-codfw.codfw.wmnet (2620:0:860:129::2)  0.0%     2    0.1   0.1   0.1   0.2   0.0
     2.|-- nokiatest2001.codfw.wmnet (2620:0:860:125::21)           0.0%     2    0.2   0.2   0.2   0.2   0.0

Inter-Vlan/subnet routing via IRB interfaces on separate switches

This is a similar test to the last one, however in this case the destination subnet is connected to a separate switch, so the traffic has to traverse the leaf/spine links between racks.

We ping from the nokiatest2001 port connected to vlan 2052 on lsw3, which is configured as follows:

   set / interface irb0 subinterface 2052 ipv4 address 10.192.45.1/24
   set / interface irb0 subinterface 2052 ipv6 address 2620:0:860:124::1/64
   root@nokiatest2001:~# ip -br addr show scope global dev ens3f1np1 
   ens3f1np1        UP             10.192.45.21/24 2620:0:860:124::21/64

To nokiatest2002 port connected to vlan 2051 on lsw2, configured as follows on the switch:

   set / interface irb0 subinterface 2051 ipv4 address 10.192.44.1/24
   set / interface irb0 subinterface 2051 ipv6 address 2620:0:860:123::1/64
   root@nokiatest2002:~# ip -br addr show scope global dev eno1 
   eno1             UP             10.192.44.22/24 2620:0:860:123::22/64 

IPv4

   root@nokiatest2001:~# mtr -b -w -c 2 10.192.44.22 
   Start: 2025-04-16T20:43:22+0100
   HOST: nokiatest2001                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (10.192.45.1)  0.0%     2    0.2   0.2   0.2   0.2   0.0
     2.|-- lo50.lsw2-a8-codfw.codfw.wmnet (10.192.255.36)     0.0%     2    0.1   0.1   0.1   0.1   0.0
     3.|-- nokiatest2002.codfw.wmnet (10.192.44.22)           0.0%     2    0.2   0.2   0.2   0.2   0.0
   root@nokiatest2002:~# mtr -b -w -c 1 10.192.45.21 
   Start: 2025-04-16T21:00:05+0100
   HOST: nokiatest2002                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2051.lsw2-a8-codfw.codfw.wmnet (10.192.44.1)   0.0%     1    0.2   0.2   0.2   0.2   0.0
     2.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (10.192.45.1)   0.0%     1    0.2   0.2   0.2   0.2   0.0
     3.|-- ens3f1np1.nokiatest2001.codfw.wmnet (10.192.45.21)  0.0%     1    0.2   0.2   0.2   0.2   0.0

NOTE: During this test a bug was observed. Initially the traceroute showed a response from anycast-gw IP 10.192.53.1 at hop 2 (lsw3). The reverse DNS for this IP did not include the switch name, so I removed the irb sub-interface it was configured on entirely, to force lsw3 to choose another IP as the source of its TTL-exceeded messages. Oddly, even with that sub-interface deleted, lsw3 kept using the IP to source ICMP messages.

IPv6

   root@nokiatest2001:~# mtr -b -w -c 2 2620:0:860:123::22
   Start: 2025-04-16T21:03:28+0100
   HOST: nokiatest2001                                           Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (2620:0:860:124::1)  0.0%     2    0.2   0.7   0.2   1.3   0.8
     2.|-- irb0-2055.lsw2-a8-codfw.codfw.wmnet (2620:0:860:127::2)  0.0%     2    0.2   0.2   0.2   0.2   0.0
     3.|-- nokiatest2002.codfw.wmnet (2620:0:860:123::22)           0.0%     2    0.3   0.3   0.3   0.3   0.0
   root@nokiatest2002:~# mtr -b -w -c 2 2620:0:860:124::21 
   Start: 2025-04-16T21:01:38+0100
   HOST: nokiatest2002                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2051.lsw2-a8-codfw.codfw.wmnet (2620:0:860:123::1)   0.0%     2    0.2   0.2   0.2   0.3   0.0
     2.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (2620:0:860:124::1)  50.0%     2    0.3   0.3   0.3   0.3   0.0
     3.|-- ens3f1np1.nokiatest2001.codfw.wmnet (2620:0:860:124::21)  0.0%     2    0.3   0.3   0.3   0.3   0.0

IPv6 RA Generation from Vlan Interfaces

We generate IPv6 router-advertisements on vlan interfaces to give hosts their default route (and in some cases configure their address with SLAAC + IP Token).

We can configure this on an IRB sub-interface as follows:

   set / interface irb0 subinterface 2057 ipv6 router-advertisement router-role admin-state enable
   set / interface irb0 subinterface 2057 ipv6 router-advertisement router-role min-advertisement-interval 30
   set / interface irb0 subinterface 2057 ipv6 router-advertisement router-role router-lifetime 600
   set / interface irb0 subinterface 2057 ipv6 router-advertisement router-role prefix 2620:0:860:129::/64

And on a host we can see we receive the RAs every 30 seconds as desired:

   root@nokiatest2002:~# tcpdump -i ens3f0np0 -l -p -nn icmp6 and 'ip6[40] = 134'
   listening on ens3f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   21:06:32.976317 IP6 fe80::200:5eff:fe00:101 > ff02::1: ICMP6, router advertisement, length 56
   21:07:02.903555 IP6 fe80::200:5eff:fe00:101 > ff02::1: ICMP6, router advertisement, length 56
   21:07:32.932953 IP6 fe80::200:5eff:fe00:101 > ff02::1: ICMP6, router advertisement, length 56
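
As a quick check on the capture above, the gaps between the RA timestamps can be computed directly. This is only a minimal sketch: the timestamps are copied from the tcpdump output, and the helper name `ra_intervals` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the tcpdump capture above.
RA_TIMESTAMPS = ["21:06:32.976317", "21:07:02.903555", "21:07:32.932953"]


def ra_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive same-day timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in ra_intervals(RA_TIMESTAMPS):
        print(f"{gap:.2f} seconds between RAs")
```

Both gaps come out within a tenth of a second of the configured 30 second interval.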

If the host is configured for address auto-configuration through SLAAC, it works correctly:

   root@nokiatest2002:~# ip -6 -br addr show scope global dev ens3f0np0
   ens3f0np0        UP             2620:0:860:129:be97:e1ff:fe28:5450/64
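
The address above has the shape of a modified EUI-64 SLAAC address (note the `ff:fe` in the middle of the interface ID). The sketch below shows how such an address is built from the advertised prefix and the interface MAC. The MAC `bc:97:e1:28:54:50` is not shown anywhere in the test output; it is inferred from the address itself and should be treated as an assumption, as should the function name.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 address for a MAC inside a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(iid, "big")
    )


if __name__ == "__main__":
    # Assumed MAC, inferred from the address seen on nokiatest2002.
    print(slaac_eui64("2620:0:860:129::/64", "bc:97:e1:28:54:50"))
```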

NOTE: in the case of EVPN anycast GW it does not seem to be possible to configure the IPv6 link-local address manually on an interface. On the Juniper devices we set this to the same thing on every switch, so that all RAs flooded in the Vlan provide the same IP as the default gateway. If we try to configure this on Nokia it fails:

   set / interface irb0 subinterface 2057 ipv6 address fe80::2057:0:1/64 type link-local-unicast
   set / interface irb0 subinterface 2057 ipv6 address fe80::2057:0:1/64 anycast-gw true
   set / interface irb0 subinterface 2057 ipv6 address fe80::2057:0:3/64 type link-local-unicast
   
   A:cmooney@lsw5-a8-codfw# commit now
   Error in /interface[name=irb0]/subinterface[index=2057]/ipv6/address[ip-prefix=fe80::2057:0:1/64]/anycast-gw:
       not supported on link local address
   Error: Commit failed

However, this does not look to be an issue. If we check the irb subinterface state on lsw4 it shows us this:

   A:cmooney@lsw4-a8-codfw# info flat from state interface irb0 subinterface 2057 | grep fe80
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 type link-local-unicast
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 anycast-gw true
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 origin link-layer
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 status preferred
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:2fd4/64 type link-local-unicast
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:2fd4/64 origin link-layer
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:2fd4/64 status preferred

Notice that it has configured two link-local IPs, the first of which has "anycast-gw true" set. Now let's look at the equivalent on lsw5:

   A:cmooney@lsw5-a8-codfw# info flat from state interface irb0 subinterface 2057 | grep fe80
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 type link-local-unicast
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 anycast-gw true
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 origin link-layer
   / interface irb0 subinterface 2057 ipv6 address fe80::200:5eff:fe00:101/64 status preferred
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:3bd4/64 type link-local-unicast
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:3bd4/64 origin link-layer
   / interface irb0 subinterface 2057 ipv6 address fe80::7a1f:7cff:fea8:3bd4/64 status preferred

It also has two link-locals. The second is different, but the first one, again with "anycast-gw true", matches the one lsw4 is using. We can also see that this IP matches the source of the RAs we captured in the previous test. So while we cannot configure the same IP everywhere, the switches seem to auto-generate the same link-local anycast IP on each one and use it to source RAs. There should therefore be no issues or complications with RAs being sourced from different IPs.
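
Since the state output reports "origin link-layer" for these addresses, we can decode which MAC each one was derived from by reversing the modified EUI-64 encoding. The following is a rough sketch; the helper name is made up, and it only handles interface IDs that contain the `ff:fe` filler.

```python
import ipaddress


def mac_from_eui64(address: str) -> str:
    """Recover the MAC address embedded in a modified EUI-64 IPv6 address."""
    iid = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface ID is not in modified EUI-64 form")
    octets = bytearray(iid[:3] + iid[5:])
    octets[0] ^= 0x02  # undo the universal/local bit flip
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    for addr in ("fe80::200:5eff:fe00:101",     # shared, anycast-gw true
                 "fe80::7a1f:7cff:fea8:2fd4",   # lsw4 only
                 "fe80::7a1f:7cff:fea8:3bd4"):  # lsw5 only
        print(addr, "->", mac_from_eui64(addr))
```

The shared address decodes to 00:00:5e:00:01:01, which sits in the IANA range used for VRRP IPv4 virtual MACs (00-00-5E-00-01-XX). The other two decode to distinct MACs, one per switch. This suggests the common link-local is derived from a shared anycast gateway MAC, although that MAC is not shown explicitly in the output above.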

MC-LAG / ESI-LAG Function

We can create multi-chassis LAG functionality through the EVPN multi-homing feature by defining an Ethernet Segment and associating it with a lag interface. The example configuration here adds a single port on a switch to a lag interface, configured as an access port in vlan 2055:

   interface ethernet-1/12 description "qfxtest xe-0/0/12"
   interface ethernet-1/12 admin-state enable
   interface ethernet-1/12 ethernet aggregate-id lag1
   interface lag1 description "ESI-LAG to qfxtest"
   interface lag1 admin-state enable
   interface lag1 vlan-tagging false
   interface lag1 subinterface 2055 type bridged
   interface lag1 subinterface 2055 admin-state enable
   interface lag1 lag lag-type lacp
   interface lag1 lag lacp interval FAST
   interface lag1 lag lacp lacp-mode ACTIVE
   interface lag1 lag lacp admin-key 101
   interface lag1 lag lacp system-id-mac 00:00:00:00:00:11
   interface lag1 lag lacp system-priority 101
   system network-instance protocols evpn ethernet-segments bgp-instance 1 ethernet-segment LAG1 admin-state enable
   system network-instance protocols evpn ethernet-segments bgp-instance 1 ethernet-segment LAG1 esi 00:01:00:00:00:00:00:00:00:01
   system network-instance protocols evpn ethernet-segments bgp-instance 1 ethernet-segment LAG1 interface lag1
   system network-instance protocols bgp-vpn bgp-instance 1
   network-instance vlan-2055 interface lag1.2055

The configuration on both Nokia switches is the same, apart from the member port (ethernet-1/12 on lsw2, ethernet-1/24 on lsw3, as seen in the output below). We connected one port of a single Juniper QFX5100 to each Nokia with this configuration:

   set chassis aggregated-devices ethernet device-count 1
   
   set interfaces xe-0/0/12 description "lsw2-a8-codfw ethernet1/12"
   set interfaces xe-0/0/12 ether-options 802.3ad ae0
   
   set interfaces xe-0/0/24 description "lsw3-a8-codfw ethernet1/24"
   set interfaces xe-0/0/24 ether-options 802.3ad ae0
   
   set interfaces ae0 description srl_mc_lag_test
   set interfaces ae0 mtu 9192
   set interfaces ae0 aggregated-ether-options lacp active
   set interfaces ae0 aggregated-ether-options lacp periodic fast
   set interfaces ae0 unit 0 family ethernet-switching interface-mode access
   set interfaces ae0 unit 0 family ethernet-switching vlan members private1-l-codfw
   
   set vlans private1-l-codfw vlan-id 2055
   set vlans private1-l-codfw l3-interface irb.2055
   
   set interfaces irb unit 2055 description private1-l-codfw
   set interfaces irb unit 2055 family inet address 10.192.52.187/24
   
   set routing-options static route 0.0.0.0/0 next-hop 10.192.52.1

The interfaces show 'up' on the Juniper side and the LACP state looks like a normal LAG to a single device:

   root@qfxtest> show interfaces descriptions 
   Interface       Admin Link Description
   xe-0/0/12       up    up   lsw2-a8-codfw ethernet1/12
   xe-0/0/24       up    up   lsw3-a8-codfw ethernet1/24
   ae0             up    up   srl_mc_lag_test
   irb.2055        up    up   private1-l-codfw
   root@qfxtest> show lacp interfaces ae0 
   Aggregated interface: ae0
       LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
         xe-0/0/12      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
         xe-0/0/12    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
         xe-0/0/24      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
         xe-0/0/24    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
       LACP protocol:        Receive State  Transmit State          Mux State 
         xe-0/0/12                 Current   Fast periodic Collecting distributing
         xe-0/0/24                 Current   Fast periodic Collecting distributing


On the Nokia side we can see the LAG looks good on both switches:

   A:lsw2-a8-codfw# show lag lag1 lacp-state
   -------------------------------------------------------------------------------------------------------------------------------------
   LACP State for lag1
   -------------------------------------------------------------------------------------------------------------------------------------
   Lag Id         : lag1
   Interval       : FAST
   Mode           : ACTIVE
   System Id      : 00:00:00:00:00:11
   System Priority: 101
   +--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
   |   Members    |   Oper    | Activity  |  Timeout  |   State   | System Id | Oper key  |  Partner  |  Partner  |  Port No  |  Partner  |
   |              |   state   |           |           |           |           |           |    Id     |    Key    |           |  Port No  |
   +==============+===========+===========+===========+===========+===========+===========+===========+===========+===========+===========+
   | ethernet-    | up        | ACTIVE    | SHORT     | IN_SYNC/T | 00:00:00: | 101       | 38:4F:49: | 1         | 1         | 1         |
   | 1/12         |           |           |           | rue/True/ | 00:00:11  |           | A5:F5:80  |           |           |           |
   |              |           |           |           | True      |           |           |           |           |           |           |
   +--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
   A:lsw3-a8-codfw# show lag lag1 lacp-state
   -------------------------------------------------------------------------------------------------------------------------------------
   LACP State for lag1
   -------------------------------------------------------------------------------------------------------------------------------------
   Lag Id         : lag1
   Interval       : FAST
   Mode           : ACTIVE
   System Id      : 00:00:00:00:00:11
   System Priority: 101
   +--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
   |   Members    |   Oper    | Activity  |  Timeout  |   State   | System Id | Oper key  |  Partner  |  Partner  |  Port No  |  Partner  |
   |              |   state   |           |           |           |           |           |    Id     |    Key    |           |  Port No  |
   +==============+===========+===========+===========+===========+===========+===========+===========+===========+===========+===========+
   | ethernet-    | up        | ACTIVE    | SHORT     | IN_SYNC/T | 00:00:00: | 101       | 38:4F:49: | 1         | 1         | 2         |
   | 1/24         |           |           |           | rue/True/ | 00:00:11  |           | A5:F5:80  |           |           |           |
   |              |           |           |           | True      |           |           |           |           |           |           |
   +--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+

The Ethernet-segment exists and looks ok on both also:

   A:lsw2-a8-codfw# show system network-instance ethernet-segments LAG1
   --------------------------------------------------------------------------------------------
   LAG1 is up, all-active
     ESI      : 00:01:00:00:00:00:00:00:00:01
     Alg      : default
     Peers    : 10.192.252.37
     Interface: lag1
     Next-hop : N/A
     evi      : N/A
     Network-instances:
        vlan-2055
         Candidates : 10.192.252.36, 10.192.252.37 (DF)
         Interface : lag1.2055
   A:lsw3-a8-codfw# show system network-instance ethernet-segments LAG1
   --------------------------------------------------------------------------------------------
   LAG1 is up, all-active
     ESI      : 00:01:00:00:00:00:00:00:00:01
     Alg      : default
     Peers    : 10.192.252.36
     Interface: lag1
     Next-hop : N/A
     evi      : N/A
     Network-instances:
        vlan-2055
         Candidates : 10.192.252.36, 10.192.252.37 (DF)
         Interface : lag1.2055
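
Both switches agree that 10.192.252.37 is the designated forwarder (DF). That is consistent with the default modulo-based election described in RFC 7432, sketched below. Two things here are assumptions rather than facts from the output: that "Alg: default" means this modulo procedure, and that the service identifier fed into it is 2055 (the EVI is not displayed, the value is only suggested by the route-distinguisher suffix seen in the EVPN routes).

```python
import ipaddress


def elect_df(candidates, service_id):
    """Pick the DF with the RFC 7432 default procedure.

    Candidates are ordered by ascending IP address and numbered from 0;
    the one whose ordinal equals service_id mod N is elected.
    """
    ordered = sorted(candidates, key=ipaddress.IPv4Address)
    return ordered[service_id % len(ordered)]


if __name__ == "__main__":
    # Candidate list from the ethernet-segment output; 2055 is an assumed EVI.
    print(elect_df(["10.192.252.36", "10.192.252.37"], 2055))
```

With two candidates an odd identifier selects the higher address, which matches the "(DF)" marker in the output above.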

The EVPN routes for the ES originated from lsw3 can be seen on lsw2:

   A:lsw2-a8-codfw# show network-instance default protocols bgp routes evpn route-type 1 summary
   ----------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   ----------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   ----------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.36      AS: 64812      Local AS: 64812
   ----------------------------------------------------------------------------------------------------------------------------------
   ----------------------------------------------------------------------------------------------------------------------------------
   Type 1 Ethernet Auto-Discovery Routes
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   | Status |       Route-       |              ESI               |   Tag-ID   |      neighbor      |      Next-hop      |       Label        |
   |        |   distinguisher    |                                |            |                    |                    |                    |
   +========+====================+================================+============+====================+====================+====================+
   | u*>    | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34      | 10.192.252.37      | 2002055            |
   | *      | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.35      | 10.192.252.37      | 2002055            |
   | u*>    | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.34      | 10.192.252.37      | -                  |
   | *      | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.35      | 10.192.252.37      | -                  |
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   A:lsw2-a8-codfw# show network-instance default protocols bgp routes evpn route-type 4 summary
   ----------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   ----------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   ----------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.36      AS: 64812      Local AS: 64812
   ----------------------------------------------------------------------------------------------------------------------------------
   Type 4 Ethernet Segment Routes
   +--------+-----------------------+--------------------------------+-----------------------+-----------------------+-----------------------+
   | Status |  Route-distinguisher  |              ESI               |  originating-router   |       neighbor        |       Next-Hop        |
   +========+=======================+================================+=======================+=======================+=======================+
   | u*>    | 10.192.252.37:0       | 00:01:00:00:00:00:00:00:00:01  | 10.192.252.37         | 10.192.252.34         | 10.192.252.37         |
   | *      | 10.192.252.37:0       | 00:01:00:00:00:00:00:00:00:01  | 10.192.252.37         | 10.192.252.35         | 10.192.252.37         |
   +--------+-----------------------+--------------------------------+-----------------------+-----------------------+-----------------------+

And we can see the routes from lsw2 on lsw3:

   A:lsw3-a8-codfw# show network-instance default protocols bgp routes evpn route-type 1 summary
   ----------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   ----------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   ----------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.37      AS: 64812      Local AS: 64812
   ----------------------------------------------------------------------------------------------------------------------------------
   ----------------------------------------------------------------------------------------------------------------------------------
   Type 1 Ethernet Auto-Discovery Routes
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   | Status |       Route-       |              ESI               |   Tag-ID   |      neighbor      |      Next-hop      |       Label        |
   |        |   distinguisher    |                                |            |                    |                    |                    |
   +========+====================+================================+============+====================+====================+====================+
   | u*>    | 10.192.252.36:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34      | 10.192.252.36      | 2002055            |
   | *      | 10.192.252.36:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.35      | 10.192.252.36      | 2002055            |
   | u*>    | 10.192.252.36:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.34      | 10.192.252.36      | -                  |
   | *      | 10.192.252.36:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.35      | 10.192.252.36      | -                  |
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   A:lsw3-a8-codfw# show network-instance default protocols bgp routes evpn route-type 4 summary
   ----------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   ----------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   ----------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.37      AS: 64812      Local AS: 64812
   ----------------------------------------------------------------------------------------------------------------------------------
   Type 4 Ethernet Segment Routes
   +--------+-----------------------+--------------------------------+-----------------------+-----------------------+-----------------------+
   | Status |  Route-distinguisher  |              ESI               |  originating-router   |       neighbor        |       Next-Hop        |
   +========+=======================+================================+=======================+=======================+=======================+
   | u*>    | 10.192.252.36:0       | 00:01:00:00:00:00:00:00:00:01  | 10.192.252.36         | 10.192.252.34         | 10.192.252.36         |
   | *      | 10.192.252.36:0       | 00:01:00:00:00:00:00:00:00:01  | 10.192.252.36         | 10.192.252.35         | 10.192.252.36         |
   +--------+-----------------------+--------------------------------+-----------------------+-----------------------+-----------------------+

The Juniper switch's IRB interface MAC is visible on both switches participating in the LAG:

   root@qfxtest> show interfaces irb | match "Hardware address" 
     Current address: 38:4f:49:a5:f5:80, Hardware address: 38:4f:49:a5:f5:80
   A:lsw2-a8-codfw# show network-instance vlan-2055 bridge-table mac-table mac 38:4f:49:a5:f5:80
   -------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2055
   -------------------------------------------------------------------------------------------------
   Mac                     : 38:4F:49:A5:F5:80
   Destination             : lag1.2055
   Dest Index              : 15
   Type                    : learnt
   Programming Status      : Success
   Aging                   : 1194
   Last Update             : 2025-04-12T14:16:59.000Z
   Duplicate Detect time   : N/A
   Hold down time remaining: N/A
   -------------------------------------------------------------------------------------------------
   A:lsw3-a8-codfw# show network-instance vlan-2055 bridge-table mac-table mac 38:4f:49:a5:f5:80
   -------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2055
   -------------------------------------------------------------------------------------------------
   Mac                     : 38:4F:49:A5:F5:80
   Destination             : lag1.2055
   Dest Index              : 15
   Type                    : learnt
   Programming Status      : Success
   Aging                   : 1170
   Last Update             : 2025-04-12T14:16:12.000Z
   Duplicate Detect time   : N/A
   Hold down time remaining: N/A
   -------------------------------------------------------------------------------------------------

We can also see it learnt in EVPN on other switches participating in vlan 2055, with the next-hop listed as the ESI ID:

   A:lsw4-a8-codfw# show network-instance vlan-2055 bridge-table mac-table mac 38:4f:49:a5:f5:80
   -------------------------------------------------------------------------------------------------
   Mac-table of network instance vlan-2055
   -------------------------------------------------------------------------------------------------
   Mac                     : 38:4F:49:A5:F5:80
   Destination             : vxlan-interface:vxlan0.2055 esi:00:01:00:00:00:00:00:00:00:01
   Dest Index              : 7235590
   Type                    : evpn
   Programming Status      : Success
   Aging                   : N/A
   Last Update             : 2025-04-12T14:58:09.000Z
   Duplicate Detect time   : N/A
   Hold down time remaining: N/A
   -------------------------------------------------------------------------------------------------

Routes for the ESI from the two devices participating in the LAG look correct (each is learnt twice, once from each RR):

   A:lsw4-a8-codfw# show network-instance default protocols bgp routes evpn route-type 1 esi 00:01:00:00:00:00:00:00:00:01 summary
   -------------------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   -------------------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   -------------------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.38      AS: 64812      Local AS: 64812
   -------------------------------------------------------------------------------------------------------------------------------------------
   -------------------------------------------------------------------------------------------------------------------------------------------
   Type 1 Ethernet Auto-Discovery Routes
   +--------+----------------------+--------------------------------+------------+----------------------+----------------------+----------------------+
   | Status | Route-distinguisher  |              ESI               |   Tag-ID   |       neighbor       |       Next-hop       |        Label         |
   +========+======================+================================+============+======================+======================+======================+
   | u*>    | 10.192.252.36:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34        | 10.192.252.36        | 2002055              |
   | *      | 10.192.252.36:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.35        | 10.192.252.36        | 2002055              |
   | u*>    | 10.192.252.36:2055   | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.34        | 10.192.252.36        | -                    |
   | *      | 10.192.252.36:2055   | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.35        | 10.192.252.36        | -                    |
   | u*>    | 10.192.252.37:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34        | 10.192.252.37        | 2002055              |
   | *      | 10.192.252.37:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.35        | 10.192.252.37        | 2002055              |
   | u*>    | 10.192.252.37:2055   | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.34        | 10.192.252.37        | -                    |
   | *      | 10.192.252.37:2055   | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.35        | 10.192.252.37        | -                    |
   +--------+----------------------+--------------------------------+------------+----------------------+----------------------+----------------------+
   8 Ethernet Auto-Discovery routes 4 used, 8 valid

If we trace to the QFX's IRB interface IP it looks good, both from a device connected to a switch participating in the LAG and from elsewhere on the network:

   root@nokiatest2001:~# mtr -b -w -c 2 10.192.52.187 
   Start: 2025-04-12T16:12:08+0100
   HOST: nokiatest2001                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2052.lsw3-a8-codfw.codfw.wmnet (10.192.45.1)  0.0%     2    0.1   0.1   0.1   0.1   0.0
     2.|-- 10.192.52.187                                      0.0%     2   11.9  15.1  11.9  18.3   4.6
   root@nokiatest2002:~# mtr -b -w -c 2 10.192.52.187 
   Start: 2025-04-12T16:09:57+0100
   HOST: nokiatest2002                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2057.lsw5-a8-codfw.codfw.wmnet (10.192.54.3)  0.0%     2    0.2   0.2   0.2   0.2   0.0
     2.|-- ???                                               100.0     2    0.0   0.0   0.0   0.0   0.0
     3.|-- 10.192.52.187                                      0.0%     2   19.6  19.8  19.6  20.0   0.3

If we disable one of the Juniper interfaces while pinging we see that we lose 2 pings, after which traffic is re-routed to use the link via the other switch:

   root@qfxtest# set interfaces xe-0/0/12 disable    
   root@nokiatest2002:~# ping 10.192.52.187 
   PING 10.192.52.187 (10.192.52.187) 56(84) bytes of data.
   64 bytes from 10.192.52.187: icmp_seq=1 ttl=62 time=117 ms
   64 bytes from 10.192.52.187: icmp_seq=2 ttl=62 time=21.9 ms
   64 bytes from 10.192.52.187: icmp_seq=3 ttl=62 time=12.8 ms
   64 bytes from 10.192.52.187: icmp_seq=6 ttl=62 time=33.3 ms
   64 bytes from 10.192.52.187: icmp_seq=7 ttl=62 time=11.8 ms
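
The lost packets can be picked out of the ping output by looking for gaps in the `icmp_seq` values. This is a small sketch with a made-up helper name; it only detects losses between the first and last reply received.

```python
import re

# Reply lines copied from the ping output above.
PING_OUTPUT = """\
64 bytes from 10.192.52.187: icmp_seq=1 ttl=62 time=117 ms
64 bytes from 10.192.52.187: icmp_seq=2 ttl=62 time=21.9 ms
64 bytes from 10.192.52.187: icmp_seq=3 ttl=62 time=12.8 ms
64 bytes from 10.192.52.187: icmp_seq=6 ttl=62 time=33.3 ms
64 bytes from 10.192.52.187: icmp_seq=7 ttl=62 time=11.8 ms
"""


def missing_sequences(output):
    """Return the icmp_seq numbers absent between the first and last reply."""
    seen = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", output)}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


if __name__ == "__main__":
    print(missing_sequences(PING_OUTPUT))
```

For the capture above this reports sequence numbers 4 and 5 as the two lost pings.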

We can see on the Nokia switch *not* participating in the LAG that the ESI routes have updated. We now only have routes with one of the next-hops; the ones from the other switch were withdrawn when its link went down:

   A:lsw4-a8-codfw# show network-instance default protocols bgp routes evpn route-type 1 esi 00:01:00:00:00:00:00:00:00:01 summary
   -------------------------------------------------------------------------------------------------------------------------------------
   Show report for the BGP route table of network-instance "default"
   -------------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   -------------------------------------------------------------------------------------------------------------------------------------
   BGP Router ID: 10.192.252.38      AS: 64812      Local AS: 64812
   -------------------------------------------------------------------------------------------------------------------------------------
   -------------------------------------------------------------------------------------------------------------------------------------
   Type 1 Ethernet Auto-Discovery Routes
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   | Status |       Route-       |              ESI               |   Tag-ID   |      neighbor      |      Next-hop      |       Label        |
   |        |   distinguisher    |                                |            |                    |                    |                    |
   +========+====================+================================+============+====================+====================+====================+
   | u*>    | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34      | 10.192.252.37      | 2002055            |
   | *      | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.35      | 10.192.252.37      | 2002055            |
   | u*>    | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.34      | 10.192.252.37      | -                  |
   | *      | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 4294967295 | 10.192.252.35      | 10.192.252.37      | -                  |
   +--------+--------------------+--------------------------------+------------+--------------------+--------------------+--------------------+
   4 Ethernet Auto-Discovery routes 2 used, 4 valid
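
To compare the two lsw4 outputs more easily, the next-hops advertised for the ESI can be extracted from the table rows. The sketch below is only illustrative: the function name is made up, the sample rows are an abridged copy of the used (u*>) rows from the outputs above, and the column positions are assumed to match the summary layout shown.

```python
ESI = "00:01:00:00:00:00:00:00:00:01"

# Abridged rows from the output before the link was disabled.
BEFORE = """\
| u*>    | 10.192.252.36:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34        | 10.192.252.36        | 2002055              |
| u*>    | 10.192.252.37:2055   | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34        | 10.192.252.37        | 2002055              |
"""

# Abridged row from the output after xe-0/0/12 was disabled.
AFTER = """\
| u*>    | 10.192.252.37:2055 | 00:01:00:00:00:00:00:00:00:01  | 0          | 10.192.252.34      | 10.192.252.37      | 2002055            |
"""


def es_next_hops(table_text, esi):
    """Return the sorted set of next-hops advertising the given ESI."""
    next_hops = set()
    for line in table_text.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # skip borders and free text
        cells = [cell.strip() for cell in line.strip().split("|")[1:-1]]
        # Columns: status, RD, ESI, tag, neighbor, next-hop, label
        if len(cells) == 7 and cells[2] == esi:
            next_hops.add(cells[5])
    return sorted(next_hops)


if __name__ == "__main__":
    print("before:", es_next_hops(BEFORE, ESI))
    print("after: ", es_next_hops(AFTER, ESI))
```

Before the failure both 10.192.252.36 and 10.192.252.37 appear as next-hops; afterwards only 10.192.252.37 remains.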

BGP Peering on Vlan segment to end device

These tests validate the kind of BGP sessions we have to servers and VMs.

IPv4 only

Switch config:

   set / network-instance PRODUCTION protocols bgp afi-safi ipv4-unicast admin-state enable
   set / network-instance PRODUCTION protocols bgp group k8s peer-as 64602
   set / network-instance PRODUCTION protocols bgp group k8s local-as as-number 14907
   set / network-instance PRODUCTION protocols bgp group k8s local-as prepend-global-as false
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.46.21 description nokiatest2001
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.46.21 peer-group k8s

Device config (FRR - used as it is easy to configure on the fly in vtysh):

   router bgp 64602
    neighbor 10.192.46.1 remote-as 14907
    neighbor 10.192.46.1 description lsw4-a8-codfw
    !
    address-family ipv4 unicast
     redistribute connected
     neighbor 10.192.46.1 route-map ALL-IN in
     neighbor 10.192.46.1 route-map BGP:OUT out
    exit-address-family
   exit
   !
   ip prefix-list PFX:TEST seq 5 permit 5.6.7.8/32
   !
   route-map BGP:OUT permit 100
    match ip address prefix-list PFX:TEST
   exit
   !
   route-map ALL-IN permit 100
   exit
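The effect of the outbound policy on the host can be sketched in Python. Although `redistribute connected` places every connected prefix into BGP, the `BGP:OUT` route-map only permits what matches `PFX:TEST`, and anything else hits the route-map's implicit deny. This is a simplified model rather than FRR's implementation, and the `connected` list is a hypothetical example (the 10.192.46.0/24 subnet is only inferred from the peering addresses).

```python
import ipaddress

# ip prefix-list PFX:TEST seq 5 permit 5.6.7.8/32
# Without ge/le, a prefix-list entry matches only that exact prefix.
PFX_TEST = [ipaddress.ip_network("5.6.7.8/32")]


def bgp_out_permits(prefix: str) -> bool:
    """Model route-map BGP:OUT: permit if in PFX:TEST, otherwise implicit deny."""
    return ipaddress.ip_network(prefix) in PFX_TEST


# Hypothetical connected prefixes on the test host (an assumption for illustration).
connected = ["10.192.46.0/24", "5.6.7.8/32"]
advertised = [p for p in connected if bgp_out_permits(p)]
print(advertised)
```

Only the test /32 survives the filter, which matches the single prefix seen in the advertised-routes output for this peer.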

The BGP session establishes:

   nokiatest2001# show bgp summary  
   
   IPv4 Unicast Summary (VRF default):
   BGP router identifier 10.192.46.21, local AS number 64602 vrf-id 0
   BGP table version 911
   RIB entries 1758, using 330 KiB of memory
   Peers 1, using 724 KiB of memory
   
   Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
   10.192.46.1     4      14907       138        28        0    0    0 00:12:16          903        0 lsw4-a8-codfw
   
   Total number of neighbors 1

We receive routes as expected (though we tend not to use this functionality):

   nokiatest2001# show bgp ipv4 unicast neighbors 10.192.46.1 routes 
   BGP table version is 4525, local router ID is 5.6.7.8, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 0.0.0.0/0        10.192.46.1                            0 14907 14907 i
   *> 4.15.72.112/30   10.192.46.1                            0 14907 14907 i
   *> 4.16.71.244/30   10.192.46.1                            0 14907 14907 i
   *> 4.53.96.36/30    10.192.46.1                            0 14907 14907 i
   *> 10.2.1.6/32      10.192.46.1                            0 14907 14907 64811 64600 i
   *> 10.2.1.7/32      10.192.46.1                            0 14907 14907 64811 64600 i
   *> 10.2.1.8/32      10.192.46.1                            0 14907 14907 64811 64600 i
   *> 10.2.1.9/32      10.192.46.1                            0 14907 14907 64811 64600 i

We are announcing a test route to the switch:

   nokiatest2001# show bgp ipv4 unicast neighbors 10.192.46.1 advertised-routes 
   BGP table version is 4527, local router ID is 5.6.7.8, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 5.6.7.8/32       0.0.0.0                  0         32768 ?

This is received by the Nokia switch:

   A:lsw4-a8-codfw# show network-instance PRODUCTION protocols bgp neighbor 10.192.46.21 received-routes ipv4
   ---------------------------------------------------------------------------------------------------------------
   Peer        : 10.192.46.21, remote AS: 64602, local AS: 14907
   Type        : static
   Description : nokiatest2001
   Group       : k8s
   ---------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   +-----------------------------------------------------------------------------------------------------------------------+
   |    Status        Network        Path-id        Next Hop         MED          LocPref         AsPath         Origin    |
   +=======================================================================================================================+
   |     u*>        5.6.7.8/32     0              10.192.46.21        -                        [64602]              ?      |
   +-----------------------------------------------------------------------------------------------------------------------+

On another switch in the fabric we can see the route:

   A:lsw2-a8-codfw# show network-instance PRODUCTION route-table ipv4-unicast route 5.6.7.8
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   IPv4 unicast route table of network instance PRODUCTION
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
   |          Prefix          |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |    Next-hop    |    Next-hop    |  Backup Next-  |   Backup Next-hop   |
   |                          |       |            |                      |          | Network  |         |            |     (Type)     |   Interface    |   hop (Type)   |      Interface      |
   |                          |       |            |                      |          | Instance |         |            |                |                |                |                     |
   +==========================+=======+============+======================+==========+==========+=========+============+================+================+================+=====================+
   | 5.6.7.8/32               | 0     | bgp-evpn   | bgp_evpn_mgr         | True     | PRODUCTI | 28      | 170        | 10.192.252.38/ |                |                |                     |
   |                          |       |            |                      |          | ON       |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+

And from another machine on the fabric we can ping it:

   root@nokiatest2002:~# mtr -b -w -c 2 5.6.7.8
   Start: 2025-04-17T16:12:06+0100
   HOST: nokiatest2002                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2051.lsw2-a8-codfw.codfw.wmnet (10.192.44.1)         0.0%     2    0.2   0.2   0.2   0.3   0.0
     2.|-- lo50.lsw4-a8-codfw.codfw.wmnet (10.192.255.38)            0.0%     2    0.2   0.2   0.2   0.2   0.0
     3.|-- 5.6.7.8                                                   0.0%     2    0.3   0.3   0.3   0.3   0.0

Lastly the route is propagated correctly to external peers (such as CR routers) from other switches (e.g. spines) in the fabric:

   cmooney@re0.cr2-codfw> show route table inet.0 5.6.7.8/32 exact terse 
   
   inet.0: 972817 destinations, 3388680 routes (972590 active, 4 holddown, 539 hidden)
   Restart Complete
   + = Active Route, - = Last Active, * = Both
   
   A V Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
   * ? 5.6.7.8/32         B 170        100                             64812 64602 ?
     unverified                                       >10.192.254.17

(In hindsight I should not have used this IP, which is in Telefonica DE address space. They should also be using RPKI, so that a bogus announcement like this one looks invalid to our CRs!)

IPv4 carrying IPv4 & IPv6 address families

For this test we simply enable the IPv6 unicast address family on the existing peer:

   set / network-instance PRODUCTION protocols bgp neighbor 10.192.46.21 afi-safi ipv6-unicast admin-state enable

On the host side we add this to the FRR config:

   ipv6 prefix-list PFX:TEST6 seq 5 permit 2001:db8::187/128
   route-map BGP:OUT6 permit 100
    match ipv6 address prefix-list PFX:TEST6
   router bgp 64602
    address-family ipv6 unicast
     redistribute connected
     neighbor 10.192.46.1 activate
     neighbor 10.192.46.1 route-map ALL-IN in
     neighbor 10.192.46.1 route-map BGP:OUT6 out

The BGP session establishes:

   nokiatest2001# show bgp ipv6 unicast summary 
   BGP router identifier 198.18.187.187, local AS number 64602 vrf-id 0
   BGP table version 697
   RIB entries 1340, using 251 KiB of memory
   Peers 1, using 724 KiB of memory
   
   Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
   10.192.46.1     4      14907       490       117        0    0    0 00:05:33          687        1 lsw4-a8-codfw

We receive routes ok:

   nokiatest2001# show bgp ipv6 unicast neighbors 10.192.46.1 routes 
   BGP table version is 697, local router ID is 198.18.187.187, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> ::/0             ::ffff:ac0:2e01                        0 14907 14907 i
   *> 2001:218:4000:5000::148/126
                       ::ffff:ac0:2e01                        0 14907 14907 i
   *> 2001:418:0:5000::3b2/127
                       ::ffff:ac0:2e01                        0 14907 14907 i
   *> 2001:418:0:5000::6fa/127
                       ::ffff:ac0:2e01                        0 14907 14907 i
   *> 2001:418:16::110/127
                       ::ffff:ac0:2e01                        0 14907 14907 i
   *> 2001:418:16::118/127
                       ::ffff:ac0:2e01                        0 14907 14907 i
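The next hop `::ffff:ac0:2e01` in this output is the IPv4-mapped IPv6 form of the switch's IPv4 peering address, which is what we would expect when IPv6 routes are carried over an IPv4 session. A quick way to confirm the mapping with Python's standard `ipaddress` module:

```python
import ipaddress

# Next hop as displayed by FRR for IPv6 routes learnt over the IPv4 session.
next_hop = ipaddress.IPv6Address("::ffff:ac0:2e01")

# ipv4_mapped returns the embedded IPv4 address for ::ffff:0:0/96 addresses.
embedded = next_hop.ipv4_mapped
print(embedded)
```

The embedded address is the switch's address on the peering VLAN (10.192.46.1), i.e. the same neighbor address used for the BGP session.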

And we announce our test route:

   nokiatest2001# show bgp ipv6 unicast neighbors 10.192.46.1 advertised-routes 
   BGP table version is 698, local router ID is 198.18.187.187, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2001:db8::187/128
                       ::                       0         32768 ?

Which is received fine by the switch:

   A:lsw4-a8-codfw# show network-instance PRODUCTION protocols bgp neighbor 10.192.46.21 received-routes ipv6
   -------------------------------------------------------------------------------------------------------------------------------------------------
   Peer        : 10.192.46.21, remote AS: 64602, local AS: 14907
   Type        : static
   Description : nokiatest2001
   Group       : k8s
   -------------------------------------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale
   Origin codes: i=IGP, e=EGP, ?=incomplete
   +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   |       Status              Network              Path-id              Next Hop               MED                LocPref               AsPath               Origin       |
   +=======================================================================================================================================================================+
   |        u*>           2001:db8::187/128    0                    2620:0:860:125::21           -                                 [64602]                       ?         |
   +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Once again this is present on other switches in the fabric learnt over IBGP:

   A:lsw2-a8-codfw# show network-instance PRODUCTION route-table ipv6-unicast route 2001:db8::187
   ---------------------------------------------------------------------------------------------------
   IPv6 unicast route table of network instance PRODUCTION
   ---------------------------------------------------------------------------------------------------
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
   |          Prefix          |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |    Next-hop    |    Next-hop    |  Backup Next-  |   Backup Next-hop   |
   |                          |       |            |                      |          | Network  |         |            |     (Type)     |   Interface    |   hop (Type)   |      Interface      |
   |                          |       |            |                      |          | Instance |         |            |                |                |                |                     |
   +==========================+=======+============+======================+==========+==========+=========+============+================+================+================+=====================+
   | 2001:db8::187/128        | 0     | bgp-evpn   | bgp_evpn_mgr         | True     | PRODUCTI | 28      | 170        | 10.192.252.38/ |                |                |                     |
   |                          |       |            |                      |          | ON       |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+

And we can route to it fine from elsewhere on the fabric:

   root@nokiatest2002:~# mtr -b -w -c 2 2001:db8::187
   Start: 2025-04-17T16:36:32+0100
   HOST: nokiatest2002                                           Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2051.lsw2-a8-codfw.codfw.wmnet (2620:0:860:123::1)  0.0%     2    0.3   0.3   0.3   0.3   0.0
     2.|-- lo50.lsw4-a8-codfw.codfw.wmnet (2620:0:860:13f::27)      0.0%     2    0.3   0.3   0.3   0.3   0.0
     3.|-- 2001:db8::187                                            0.0%     2    0.3   0.3   0.3   0.3   0.0

Lastly it's properly exported to EBGP peers (i.e. to CR from Spine):

   cmooney@re0.cr2-codfw> show route table inet6.0 2001:db8::187/128 exact terse 
   inet6.0: 214220 destinations, 1020296 routes (214174 active, 0 holddown, 167 hidden)
   Restart Complete
   + = Active Route, - = Last Active, * = Both
   
   A V Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
   * ? 2001:db8::187/128  B 170        100                             64812 64602 ?
     unverified                                       >2620:0:860:138::2

Separate peering over IPv6 carrying IPv6 SAFI

To validate this we will revert to the configuration in the IPv4 only test, and then add a second neighbor relationship over the IPv6 global unicast IPs either side.

We add the new neighbor on the switch side, only enabling it for IPv6:

   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:125::21 description nokiatest2001
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:125::21 peer-group k8s
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:125::21 afi-safi ipv4-unicast admin-state disable
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:125::21 afi-safi ipv6-unicast admin-state enable

And on the host side in FRR:

   router bgp 64602
    neighbor 2620:0:860:125::1 remote-as 14907
    neighbor 2620:0:860:125::1 description lsw4-a8-codfw
    !
    address-family ipv4 unicast
     no neighbor 2620:0:860:125::1 activate
    exit-address-family
    !
    address-family ipv6 unicast
     redistribute connected
     neighbor 2620:0:860:125::1 activate
     neighbor 2620:0:860:125::1 route-map ALL-IN in
     neighbor 2620:0:860:125::1 route-map BGP:OUT6 out

The session establishes:

   nokiatest2001# show bgp ipv6 unicast summary 
   BGP router identifier 198.18.187.187, local AS number 64602 vrf-id 0
   BGP table version 2095
   RIB entries 1340, using 251 KiB of memory
   Peers 1, using 724 KiB of memory
   
   Neighbor          V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
   2620:0:860:125::1 4      14907       105        16        0    0    0 00:01:45          687        1 lsw4-a8-codfw

And we both receive and send IPv6 routes:

   nokiatest2001# show bgp ipv6 unicast neighbors 2620:0:860:125::1 routes 
   BGP table version is 2098, local router ID is 198.18.187.187, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> ::/0             2620:0:860:125::1
                                                              0 14907 14907 i
   *> 2001:218:4000:5000::148/126
                       2620:0:860:125::1
                                                              0 14907 14907 i
   *> 2001:418:0:5000::3b2/127
                       2620:0:860:125::1
                                                              0 14907 14907 i
   *> 2001:418:0:5000::6fa/127
                       2620:0:860:125::1
                                                              0 14907 14907 i
   nokiatest2001# show bgp ipv6 unicast neighbors 2620:0:860:125::1 advertised-routes 
   BGP table version is 2100, local router ID is 198.18.187.187, vrf id 0
   Default local pref 100, local AS 64602
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2001:db8::187/128
                       ::                       0         32768 ?

The route remains pingable from elsewhere on the fabric:

   root@nokiatest2002:~# mtr -b -w -c 2 2001:db8::187
   Start: 2025-04-17T16:52:18+0100
   HOST: nokiatest2002                                           Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb0-2051.lsw2-a8-codfw.codfw.wmnet (2620:0:860:123::1)  0.0%     2    0.2   0.2   0.2   0.3   0.0
     2.|-- lo50.lsw4-a8-codfw.codfw.wmnet (2620:0:860:13f::27)      0.0%     2    0.3   0.3   0.3   0.3   0.0
     3.|-- 2001:db8::187                                            0.0%     2    0.3   0.3   0.3   0.3   0.0

And it is still announced outside:

   cmooney@re0.cr2-codfw> show route table inet6.0 2001:db8::187/128 exact terse    
   
   inet6.0: 214418 destinations, 1020361 routes (213969 active, 408 holddown, 201 hidden)
   Restart Complete
   + = Active Route, - = Last Active, * = Both
   
   A V Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
   * ? 2001:db8::187/128  B 170        100                             64812 64602 ?
     unverified                                       >2620:0:860:138::2

BFD on BGP session from switch to server

We enable BFD on the switch side both for the BGP group and for the relevant irb0 sub-interface:

   set / bfd subinterface irb0.2053 admin-state enable
   set / network-instance PRODUCTION protocols bgp group k8s failure-detection enable-bfd true

On the host (FRR) side we enable BFD for the two neighbors:

   router bgp 64602
    neighbor 10.192.46.1 bfd
    neighbor 2620:0:860:125::1 bfd

Once complete, both BFD sessions come up:

   nokiatest2001# show bfd peers brief 
   Session count: 2
   SessionId  LocalAddress                             PeerAddress                             Status         
   =========  ============                             ===========                             ======         
   2028478858 2620:0:860:125::21                       2620:0:860:125::1                       up             
   3954664370 10.192.46.21                             10.192.46.1                             up   

It also looks good on the switch side:

   A:lsw4-a8-codfw# info from state bfd network-instance PRODUCTION
       bfd {
           network-instance PRODUCTION {
               peer 16388 {
                   oper-state up
                   local-address 10.192.46.1
                   remote-address 10.192.46.21
                   remote-discriminator 3954664370
                   subscribed-protocols BGP
                   session-state UP
                   remote-session-state UP
                   last-state-transition "2025-04-17T16:09:52.869Z (2 minutes ago)"
                   failure-transitions 0
                   local-diagnostic-code NO_DIAGNOSTIC
                   remote-diagnostic-code NO_DIAGNOSTIC
                   remote-minimum-receive-interval 300000
                   remote-control-plane-independent false
                   active-transmit-interval 1000000
                   active-receive-interval 1000000
                   remote-multiplier 3
                   async {
                       last-packet-transmitted "2025-04-17T16:12:19.769Z (4 seconds ago)"
                       last-packet-received "2025-04-17T16:12:19.229Z (4 seconds ago)"
                       transmitted-packets 195
                       received-packets 468
                       up-transitions 1
                   }
               }
               peer 16389 {
                   oper-state up
                   local-address 2620:0:860:125::1
                   remote-address 2620:0:860:125::21
                   remote-discriminator 2028478858
                   subscribed-protocols BGP
                   session-state UP
                   remote-session-state UP
                   last-state-transition "2025-04-17T16:09:52.351Z (2 minutes ago)"
                   failure-transitions 0
                   local-diagnostic-code NO_DIAGNOSTIC
                   remote-diagnostic-code NO_DIAGNOSTIC
                   remote-minimum-receive-interval 300000
                   remote-control-plane-independent false
                   active-transmit-interval 1000000
                   active-receive-interval 1000000
                   remote-multiplier 3
                   async {
                       last-packet-transmitted "2025-04-17T16:12:19.868Z (4 seconds ago)"
                       last-packet-received "2025-04-17T16:12:19.779Z (4 seconds ago)"
                       transmitted-packets 195
                       received-packets 477
                       up-transitions 1
                   }
               }
           }
       }
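As a rough guide to what these timers mean for failure detection, the sketch below computes the detection time on the switch side. It assumes the interval values in the state output are in microseconds, and that `active-receive-interval` is the negotiated interval at which the switch expects packets from the host; the formula is the standard asynchronous-mode one (remote multiplier times the agreed receive interval).

```python
def detection_time_seconds(receive_interval_us: int, remote_multiplier: int) -> float:
    """Time without received BFD packets before the session is declared down."""
    return remote_multiplier * receive_interval_us / 1_000_000


# Values taken from the switch state output: 1000000 us interval, multiplier 3.
switch_detection = detection_time_seconds(1_000_000, 3)
print(switch_detection)
```

Under those assumptions the switch would tear down the BGP session roughly 3 seconds after the host stops responding, rather than waiting for the BGP hold timer.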

eBGP Peering in VRF to external device

The key thing we want to test here is the connectivity to our CR routers or other Spine switches, which will be using EBGP. We want to ensure that all local routes known from IBGP get exported correctly (esp. in the EVPN context).

We add the following configuration to match what we do on our Junipers, with a separate peering over each address family to distribute routes for that SAFI:

   set / routing-policy prefix-set no_host_routes4 prefix 0.0.0.0/0 mask-length-range 32..32
   set / routing-policy prefix-set no_host_routes6 prefix ::/0 mask-length-range 128..128
   
   set / routing-policy prefix-set overlay_loopback4 prefix 10.192.255.0/24 mask-length-range 32..32
   set / routing-policy prefix-set overlay_loopback6 prefix 2620:0:860:13f::/64 mask-length-range 128..128
   
   set / routing-policy policy core_evpn_out statement loopback4 match prefix-set overlay_loopback4
   set / routing-policy policy core_evpn_out statement loopback6 match prefix-set overlay_loopback6
   
   set / routing-policy policy core_evpn_out statement no_host_routes4 match prefix-set no_host_routes4
   set / routing-policy policy core_evpn_out statement no_host_routes4 match bgp as-path-length value 0
   set / routing-policy policy core_evpn_out statement no_host_routes4 action policy-result reject
   
   set / routing-policy policy core_evpn_out statement no_host_routes6 match prefix-set no_host_routes6
   set / routing-policy policy core_evpn_out statement no_host_routes6 match bgp as-path-length value 0
   set / routing-policy policy core_evpn_out statement no_host_routes6 action policy-result reject
   
   set / network-instance PRODUCTION protocols bgp autonomous-system 64812
   set / network-instance PRODUCTION protocols bgp router-id 10.192.252.34
   set / network-instance PRODUCTION protocols bgp afi-safi ipv4-unicast admin-state enable
   
   set / network-instance PRODUCTION protocols bgp group core export-policy [ core_evpn_out ]
   set / network-instance PRODUCTION protocols bgp group core import-policy [ ALL ]
   set / network-instance PRODUCTION protocols bgp group core timers minimum-advertisement-interval 1
   
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.254.14 description cr1-codfw
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.254.14 peer-as 14907
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.254.14 peer-group core
   set / network-instance PRODUCTION protocols bgp neighbor 10.192.254.14 afi-safi ipv6-unicast admin-state disable
   
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:137::1 description cr1-codfw
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:137::1 peer-as 14907
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:137::1 peer-group core
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:137::1 afi-safi ipv4-unicast admin-state disable
   set / network-instance PRODUCTION protocols bgp neighbor 2620:0:860:137::1 afi-safi ipv6-unicast admin-state enable
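The intent of the `core_evpn_out` policy can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the SR Linux policy engine: it assumes the `loopback4`/`loopback6` statements accept what they match, that routes matching no statement are accepted by default, and that an AS path length of 0 identifies routes originated inside the fabric. The host address 10.192.44.21/32 is a hypothetical example.

```python
import ipaddress


def in_prefix_set(prefix, base, min_len, max_len):
    """True if prefix lies within base and its length is in min_len..max_len."""
    net = ipaddress.ip_network(prefix)
    base_net = ipaddress.ip_network(base)
    return (net.version == base_net.version
            and min_len <= net.prefixlen <= max_len
            and net.subnet_of(base_net))


def core_evpn_out(prefix, as_path):
    """Simplified model of the export policy towards the CRs."""
    # Overlay loopbacks are matched first (assumed to be accepted).
    if (in_prefix_set(prefix, "10.192.255.0/24", 32, 32)
            or in_prefix_set(prefix, "2620:0:860:13f::/64", 128, 128)):
        return "accept"
    # Host routes with an empty AS path (originated in the fabric) are rejected.
    is_host_route = (in_prefix_set(prefix, "0.0.0.0/0", 32, 32)
                     or in_prefix_set(prefix, "::/0", 128, 128))
    if is_host_route and len(as_path) == 0:
        return "reject"
    return "accept"  # assumed default for everything else


print(core_evpn_out("10.192.255.34/32", []))     # switch loopback
print(core_evpn_out("10.192.44.21/32", []))      # hypothetical fabric host route
print(core_evpn_out("5.6.7.8/32", [64602]))      # /32 announced by a server over BGP
print(core_evpn_out("10.192.44.0/24", []))       # server subnet
```

The point of matching on AS path length is that host /32s and /128s generated inside the fabric stay inside it, while prefixes of the same length announced by servers over BGP are still exported.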

On the CR side we add the new peers like any other peer in our "Switch" BGP group:

   set protocols bgp group Switch neighbor 10.192.254.15 description ssw2-a8-codfw
   set protocols bgp group Switch neighbor 10.192.254.15 family inet unicast
   set protocols bgp group Switch neighbor 10.192.254.15 peer-as 64812
   set protocols bgp group Switch neighbor 2620:0:860:137::2 description ssw2-a8-codfw
   set protocols bgp group Switch neighbor 2620:0:860:137::2 family inet6 unicast
   set protocols bgp group Switch neighbor 2620:0:860:137::2 peer-as 64812

With this done BGP establishes:

   cmooney@re0.cr1-codfw> show bgp summary group Switch | match 64812 
   10.192.254.15         64812      22648      34755       0       4 6d 17:05:22 Establ
   2620:0:860:137::2       64812      22661      34261       0       4 6d 17:05:22 Establ

The core routers learn the networks they should:

   cmooney@re0.cr1-codfw> show route table inet.0 receive-protocol bgp 10.192.254.15 
   
   inet.0: 972634 destinations, 2100876 routes (972633 active, 1 holddown, 2 hidden)
   Restart Complete
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 10.192.44.0/24          10.192.254.15                           64812 I
   * 10.192.45.0/24          10.192.254.15                           64812 I
   * 10.192.46.0/24          10.192.254.15                           64812 I
   * 10.192.47.0/24          10.192.254.15                           64812 I
   * 10.192.52.0/24          10.192.254.15                           64812 I
   * 10.192.54.0/24          10.192.254.15                           64812 I
   * 10.192.55.0/24          10.192.254.15                           64812 I
     10.192.254.14/31        10.192.254.15                           64812 I
     10.192.254.16/31        10.192.254.15                           64812 I
   * 10.192.255.34/32        10.192.254.15                           64812 I
   * 10.192.255.35/32        10.192.254.15                           64812 I
   * 10.192.255.36/32        10.192.254.15                           64812 I
   * 10.192.255.37/32        10.192.254.15                           64812 I
   * 10.192.255.38/32        10.192.254.15                           64812 I
   * 10.192.255.39/32        10.192.254.15                           64812 I
   cmooney@re0.cr1-codfw> show route table inet6.0 receive-protocol bgp 2620:0:860:137::2 
   
   inet6.0: 214224 destinations, 567162 routes (214223 active, 1 holddown, 0 hidden)
   Restart Complete
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 2620:0:860:123::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:124::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:125::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:126::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:127::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:129::/64     2620:0:860:137::2                       64812 I
   * 2620:0:860:12a::/64     2620:0:860:137::2                       64812 I
     2620:0:860:137::/64     2620:0:860:137::2                       64812 I
     2620:0:860:138::/64     2620:0:860:137::2                       64812 I
     2620:0:860:13f::23/128
   *                         2620:0:860:137::2                       64812 I
     2620:0:860:13f::24/128
   *                         2620:0:860:137::2                       64812 I
     2620:0:860:13f::25/128
   *                         2620:0:860:137::2                       64812 I
     2620:0:860:13f::26/128
   *                         2620:0:860:137::2                       64812 I
     2620:0:860:13f::27/128
   *                         2620:0:860:137::2                       64812 I
     2620:0:860:13f::28/128
   *                         2620:0:860:137::2                       64812 I


Note that there does not appear to be any way to get the Nokia devices to export IBGP routes with a MED equal to the OSPF cost to the next-hop / originating switch loopback. We have raised this with Nokia. It is not a day-to-day concern, but it does mean we may have sub-optimal routing in certain failure scenarios (for example when a Spine is disconnected from a Leaf but the CR passes traffic for that Leaf to it anyway).
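To illustrate why the MED matters here, the sketch below compares how a CR could choose between two Spines advertising the same Leaf prefix, with and without a MED derived from the IGP cost. The cost values are hypothetical, a missing MED is treated as 0 (an assumption), and all other BGP tie-breakers are ignored.

```python
def lowest_med(paths):
    """Return the Spines whose path has the lowest MED (missing MED counts as 0)."""
    best = min((p["med"] or 0) for p in paths)
    return [p["via"] for p in paths if (p["med"] or 0) == best]


# What we get today: no MED from either Spine, so the CR cannot tell them apart.
without_med = [{"via": "ssw2-a8-codfw", "med": None},
               {"via": "ssw3-a8-codfw", "med": None}]

# What we would like: MED equal to the Spine's IGP cost to the Leaf.
# Hypothetical costs: ssw3 has lost its direct link and has a longer path.
with_med = [{"via": "ssw2-a8-codfw", "med": 10},
            {"via": "ssw3-a8-codfw", "med": 30}]

print(lowest_med(without_med))
print(lowest_med(with_med))
```

Without a MED both Spines look equally good to the CR, so traffic may still be sent to the one that has lost its direct link to the Leaf.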

The routes from the CR are received correctly on the Spine (ssw2-a8-codfw):

   A:ssw2-a8-codfw# show network-instance PRODUCTION protocols bgp neighbor 10.192.254.14 received-routes ipv4 | head --lines 21
   ----------------------------------------------------------------------------------------------------------------------
   Peer        : 10.192.254.14, remote AS: 14907, local AS: 64812
   Type        : static
   Description : cr1-codfw
   Group       : core
   ----------------------------------------------------------------------------------------------------------------------
   Status codes: u=used, *=valid, >=best, x=stale, b=backup
   Origin codes: i=IGP, e=EGP, ?=incomplete
   +-------------------------------------------------------------------------------------------------------------------------------+
   |    Status          Network         Path-id        Next Hop           MED           LocPref         AsPath          Origin     |
   +===============================================================================================================================+
   |      u*>        0.0.0.0/0       0               10.192.254.14         -                         [14907]               i       |
   |      u*>        4.15.72.112/3   0               10.192.254.14        400                        [14907]               i       |
   |                 0                                                                                                             |
   |      u*>        4.16.71.244/3   0               10.192.254.14        310                        [14907]               i       |
   |                 0                                                                                                             |
   |      u*>        4.53.96.36/30   0               10.192.254.14        245                        [14907]               i       |
   |       *         10.2.1.6/32     0               10.192.254.14         5                         [14907,               i       |
   |                                                                                                 64811, 64600]                 |
   |       *         10.2.1.7/32     0               10.192.254.14         5                         [14907,               i       |
   |                                                                                                 64811, 64600]                 |
   --{ running }--[  ]--
   A:ssw2-a8-codfw#


As can be seen, the CRs export the routes with a MED. If we look at one of these routes from a leaf device (learning them over the EVPN SAFI), we can see that these MEDs are preserved:

  A:lsw2-a8-codfw# show network-instance default protocols bgp routes evpn route-type 5 prefix 10.67.134.192/26 detail
  -------------------------------------------------------------------------------------------------------------
  Show report for the EVPN routes in network-instance  "default"
  -------------------------------------------------------------------------------------------------------------
  Route Distinguisher: 10.192.252.34:5000
  Tag-ID             : 0
  ip-prefix-len      : 26
  ip-prefix          : 10.67.134.192/26
  neighbor           : 10.192.252.34
  Gateway IP         : 0.0.0.0
  Received paths     : 1
    Path 1: <Best,Valid,Used,>
      ESI               : 00:00:00:00:00:00:00:00:00:00
      Label             : 3005000
      Route source      : neighbor 10.192.252.34 (last modified 16h3m41s ago)
      Route preference  : MED is 312, LocalPref is 100
      Atomic Aggr       : false
      BGP next-hop      : 10.192.252.34
      AS Path           :  i [14907, 64601]
      Communities       : [14907:14, target:64812:5000, mac-nh:78:1f:7c:5e:12:6a, bgp-tunnel-encap:VXLAN]
      RR Attributes     : No Originator-ID, Cluster-List is []
      Aggregation       : None
      Unknown Attr      : None
      Invalid Reason    : None
      Tie Break Reason  : none
  ---------------------------------------------------------------------------------------------------------------
  Route Distinguisher: 10.192.252.35:5000
  Tag-ID             : 0
  ip-prefix-len      : 26
  ip-prefix          : 10.67.134.192/26
  neighbor           : 10.192.252.35
  Gateway IP         : 0.0.0.0
  Received paths     : 1
    Path 1: <Best,Valid,Used,>
      ESI               : 00:00:00:00:00:00:00:00:00:00
      Label             : 3005000
      Route source      : neighbor 10.192.252.35 (last modified 16h3m41s ago)
      Route preference  : MED is 317, LocalPref is 100
      Atomic Aggr       : false
      BGP next-hop      : 10.192.252.35
      AS Path           :  i [14907, 64601]
      Communities       : [14907:14, target:64812:5000, mac-nh:78:1f:7c:5e:0e:6a, bgp-tunnel-encap:VXLAN]
      RR Attributes     : No Originator-ID, Cluster-List is []
      Aggregation       : None
      Unknown Attr      : None
      Invalid Reason    : None
      Tie Break Reason  : none
  ---------------------------------------------------------------------------------------------------------------

This is good and will lead to more optimal selection of which Spine (and thus which CR) to send traffic to for a given destination, matching whichever CR has the shorter path to that destination. For unicast IBGP this happens as a matter of course, as the MED is carried across IBGP sessions, but preserving it over EVPN IBGP was not something Juniper did (see https://phabricator.wikimedia.org/T344547#9301201).
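To illustrate why the preserved MED matters, here is a minimal Python sketch of the comparison, using the two paths for 10.67.134.192/26 shown above. The `EvpnPath` class and `prefer` function are hypothetical names used only for illustration. The ordering (higher LocalPref, then shorter AS path, then lower MED) is a simplified subset of standard BGP best-path selection and is not a description of Nokia's exact implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnPath:
    """Illustrative subset of the attributes shown in the EVPN route detail."""
    next_hop: str
    local_pref: int
    as_path: tuple
    med: int


def prefer(paths):
    """Pick one path: highest LocalPref, then shortest AS path, then lowest MED."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path), p.med))


# Values copied from the two paths in the output above.
paths = [
    EvpnPath("10.192.252.34", 100, (14907, 64601), 312),
    EvpnPath("10.192.252.35", 100, (14907, 64601), 317),
]

best = prefer(paths)
print(best.next_hop, best.med)
```

With LocalPref and AS path equal, the MED is the deciding attribute, so the path via 10.192.252.34 (MED 312) is preferred over the one via 10.192.252.35 (MED 317). Without the MED the two paths would be indistinguishable at this step.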

ECMP Routing in underlay network / OSPF

Our IBGP routes are announced with a next-hop of the system0 loopback for a device. We need to enable ECMP for next-hop resolution to these destinations so that all available paths are used to go between A and B. As OSPF is our underlay protocol we thus configure:

   set / network-instance default protocols ospf instance ospfv2 max-ecmp-paths 16

With this in place we can see that there are two next-hops shown on lsw2-a8 when we look at the route for lsw3-a8's loopback:

   A:lsw2-a8-codfw# show network-instance default route-table ipv4-unicast route 10.192.252.37
   -------------------------------------------------------------------------------------------
   IPv4 unicast route table of network instance default
   -------------------------------------------------------------------------------------------
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
   |          Prefix          |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |    Next-hop    |    Next-hop    |  Backup Next-  |   Backup Next-hop   |
   |                          |       |            |                      |          | Network  |         |            |     (Type)     |   Interface    |   hop (Type)   |      Interface      |
   |                          |       |            |                      |          | Instance |         |            |                |                |                |                     |
   +==========================+=======+============+======================+==========+==========+=========+============+================+================+================+=====================+
   | 10.192.252.37/32         | 0     | ospfv2     | ospf_mgr             | True     | default  | 28      | 10         | 10.192.253.124 | ethernet-      |                |                     |
   |                          |       |            |                      |          |          |         |            | (direct)       | 1/56.0         |                |                     |
   |                          |       |            |                      |          |          |         |            | 10.192.253.126 | ethernet-      |                |                     |
   |                          |       |            |                      |          |          |         |            | (direct)       | 1/55.0         |                |                     |
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
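As a rough illustration of what having two equal-cost next-hops means for forwarding, the sketch below maps a flow's 5-tuple onto one of the two next-hops from the table above. This is an assumption-laden model: the switch ASIC uses its own hash function and field selection, SHA-256 is only a stand-in here, and the `pick_next_hop` function and the flow addresses are hypothetical.

```python
import hashlib

# The two OSPF next-hops shown for 10.192.252.37/32 in the table above.
NEXT_HOPS = ["10.192.253.124", "10.192.253.126"]


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Hash the flow 5-tuple and map it onto one of the equal-cost next-hops."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical flow (documentation addresses); the same flow always maps
# to the same next-hop, so packets within a flow are not reordered.
flow = ("192.0.2.1", "198.51.100.7", "tcp", 40000, 443)
first = pick_next_hop(*flow)
second = pick_next_hop(*flow)
print(first, first == second)
```

The point of hashing per flow rather than per packet is that each flow sticks to one path while different flows spread across both uplinks.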

We can also examine this more closely in terms of the "state" of the next-hop grouping:

   A:lsw2-a8-codfw# info from state network-instance default route-table ipv4-unicast route 10.192.252.37/32 id 0 route-type ospfv2 route-owner ospf_mgr origin-network-instance default
       network-instance default {
           route-table {
               ipv4-unicast {
                   route 10.192.252.37/32 id 0 route-type ospfv2 route-owner ospf_mgr origin-network-instance default {
                       leakable false
                       metric 28
                       preference 10
                       active true
                       last-app-update "2025-04-16T20:00:33.782Z (a day ago)"
                       next-hop-group 6429212
                       next-hop-group-network-instance default
                       resilient-hash false
                       fib-programming {
                           suppressed false
                           last-successful-operation-type modify
                           last-successful-operation-timestamp "2025-04-16T20:00:33.783Z (a day ago)"
                           pending-operation-type none
                           last-failed-operation-type none
                       }
                   }
               }
           }
       }
   A:lsw2-a8-codfw# info from state network-instance default route-table next-hop-group 6429212
       network-instance default {
           route-table {
               next-hop-group 6429212 {
                   backup-next-hop-group 0
                   fib-programming {
                       last-successful-operation-type add
                       last-successful-operation-timestamp "2025-04-16T19:54:31.333Z (a day ago)"
                       pending-operation-type none
                       last-failed-operation-type none
                   }
                   next-hop 0 {
                       next-hop 6429195
                       resolved not-applicable
                   }
                   next-hop 1 {
                       next-hop 6429208
                       resolved not-applicable
                   }
               }
           }
       }
   A:lsw2-a8-codfw# info from state network-instance default route-table next-hop 6429195
       network-instance default {
           route-table {
               next-hop 6429195 {
                   type direct
                   ip-address 10.192.253.124
                   subinterface ethernet-1/56.0
               }
           }
       }
   A:lsw2-a8-codfw# info from state network-instance default route-table next-hop 6429208
       network-instance default {
           route-table {
               next-hop 6429208 {
                   type direct
                   ip-address 10.192.253.126
                   subinterface ethernet-1/55.0
               }
           }
       }
   --{ + running }--[  ]--
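The chain of lookups above (route, then next-hop-group, then individual next-hops) can be sketched in Python as follows. The dictionaries are hand-transcribed from the "info from state" output and the `resolve` function is a hypothetical helper; this is not how the data would be fetched from a device.

```python
# State transcribed by hand from the "info from state" output above.
ROUTES = {"10.192.252.37/32": {"next-hop-group": 6429212}}
NEXT_HOP_GROUPS = {6429212: [6429195, 6429208]}
NEXT_HOPS = {
    6429195: {"type": "direct", "ip-address": "10.192.253.124",
              "subinterface": "ethernet-1/56.0"},
    6429208: {"type": "direct", "ip-address": "10.192.253.126",
              "subinterface": "ethernet-1/55.0"},
}


def resolve(prefix):
    """Follow route -> next-hop-group -> next-hop; return (ip, subinterface) pairs."""
    group_id = ROUTES[prefix]["next-hop-group"]
    return [
        (NEXT_HOPS[nh_id]["ip-address"], NEXT_HOPS[nh_id]["subinterface"])
        for nh_id in NEXT_HOP_GROUPS[group_id]
    ]


for ip, subif in resolve("10.192.252.37/32"):
    print(ip, "via", subif)
```

This mirrors the three separate queries made on the CLI: the route only holds a group ID, the group holds next-hop IDs, and each next-hop holds the address and subinterface actually used for forwarding.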

ECMP Routing in overlay network / BGP

The OSPF ECMP ensures that all links between switch A and switch B are used when following a route announced with a next-hop of switch B's loopback.

But we also have the case where multiple switches announce the same route in IBGP, for instance when two Spines propagate the default route they learn from our core routers. In this case we want the BGP RIB to consider the two routes as equal and select both loopbacks as valid destinations for it.

IBGP does this by default on Nokia, but we need to enable it for EVPN as follows:

   set / network-instance default protocols ospf instance ospfv2 max-ecmp-paths 32

IPv4

Looking at the default route in the PRODUCTION route table on lsw2-a8-codfw, we can see that traffic is not sent only to 10.192.252.34 (ssw2-a8-codfw); 10.192.252.35 is installed as a second next-hop:

   A:lsw2-a8-codfw# show network-instance PRODUCTION route-table ipv4-unicast prefix 0.0.0.0/0
   ------------------------------------------------------------------------------------------
   IPv4 unicast route table of network instance PRODUCTION
   ------------------------------------------------------------------------------------------
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
   |          Prefix          |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |    Next-hop    |    Next-hop    |  Backup Next-  |   Backup Next-hop   |
   |                          |       |            |                      |          | Network  |         |            |     (Type)     |   Interface    |   hop (Type)   |      Interface      |
   |                          |       |            |                      |          | Instance |         |            |                |                |                |                     |
   +==========================+=======+============+======================+==========+==========+=========+============+================+================+================+=====================+
   | 0.0.0.0/0                | 0     | bgp-evpn   | bgp_evpn_mgr         | True     | PRODUCTI | 8       | 170        | 10.192.252.34/ |                |                |                     |
   |                          |       |            |                      |          | ON       |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   |                          |       |            |                      |          |          |         |            | 10.192.252.35/ |                |                |                     |
   |                          |       |            |                      |          |          |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+

We can also verify with "info from state" and see the two next-hops are part of the next-hop-group, one going via the uplink to one spine, one via the other.

IPv6

We see the same with IPv6:

   A:lsw2-a8-codfw# show network-instance PRODUCTION route-table ipv6-unicast prefix ::/0
   --------------------------------------------------------------------------------------
   IPv6 unicast route table of network instance PRODUCTION
   --------------------------------------------------------------------------------------
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+
   |          Prefix          |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |    Next-hop    |    Next-hop    |  Backup Next-  |   Backup Next-hop   |
   |                          |       |            |                      |          | Network  |         |            |     (Type)     |   Interface    |   hop (Type)   |      Interface      |
   |                          |       |            |                      |          | Instance |         |            |                |                |                |                     |
   +==========================+=======+============+======================+==========+==========+=========+============+================+================+================+=====================+
   | ::/0                     | 0     | bgp-evpn   | bgp_evpn_mgr         | True     | PRODUCTI | 8       | 170        | 10.192.252.34/ |                |                |                     |
   |                          |       |            |                      |          | ON       |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   |                          |       |            |                      |          |          |         |            | 10.192.252.35/ |                |                |                     |
   |                          |       |            |                      |          |          |         |            | 32 (indirect/v |                |                |                     |
   |                          |       |            |                      |          |          |         |            | xlan)          |                |                |                     |
   +--------------------------+-------+------------+----------------------+----------+----------+---------+------------+----------------+----------------+----------------+---------------------+

ARP Suppression

ND Suppression

DHCP Relay & Option 82 insertion

DHCP Relay

Option 82

IP Filters on Routed interface

IPv4

IPv6

IP Filters on IRB interface

IPv4

IPv6

Filter access to RE/CPU/Device Services

Failover Tests

Spine Switch Failure

Management Tests

User Account Creation

SSH Access to Management

SSH Key Auth

Management VRF

SNMP RO Access

NTP

LLDP

Server Side

Switch Side

Per-interface LLDP control

sFlow Export

Prometheus Export

Puppet Agent

Automation Tests

RESTCONF

Partial config replace