VXLAN-EVPN Network Testing - Eqiad Expansion

This page documents various tests done to validate the correct operation of networking devices in the new cage (rows E/F) in Eqiad (Equinix, Ashburn). The switches in these rows have been set up differently to those in other rows, following an updated design.

The principal change in the new design is the use of per-rack Vlans/subnets where possible, with end systems connected to an overlay network built with VXLAN/EVPN. The use of VXLAN/EVPN ensures that any edge-cases requiring layer-2 adjacency between racks can be addressed, if needed. The preference is to minimize where and when Vlans need to be extended between racks, however.

Test Topology

Tests were carried out with the above physical topology. Purple/blue links to the CR routers use 100GBase-CWDM4 and terminate into the overlay network (vrf) at layer-3 on the switches. The green links connect devices in the VXLAN fabric together, again using 100GBase-CWDM4, and provide the underlay network transport. Finally the red links are 10G DAC connections within the racks from server to switch, and terminate into overlay Vlans on the switches.

Four Dell servers, which will ultimately be among the first hosts provisioned in the new cage, were commandeered temporarily to perform the testing. A vanilla Debian install (from the Debian ISO image) was used for most of the testing.

As can be seen, the full Clos topology described in the design was not ready before launch. The Juniper QFX5120-32C spine-layer switches were unavailable due to issues with the supplier and the global chip shortage. As a result, two of the leaf-layer devices (lsw1-e1/f1, QFX5120-48Y) were configured to temporarily operate as both access and aggregation devices. These two devices have enough ports to support the 6 remaining leaf switches installed in phase 1 of the expansion. Those leaf switches, E2-E4 and F2-F4, are shown at the bottom and are also QFX5120-48Y switches.

For most tests the temporary topology does not make a difference. It does create a situation where servers connected in separate racks can have a different number of hops to reach each other, and potentially different bandwidth available, but the logical function of the network can still be assessed. Certain tests, mainly relating to failover, need to be initiated from the switches at the bottom, so elastic1093-test was used for those.

Port configurations were varied occasionally to test different features. Where relevant, device configuration snippets are included to show the setup for a test. If not explicitly detailed it can be assumed that configuration for each test has not diverged from the previous.

Functional Tests

L2 Access port

This is a very basic test to validate the normal configuration for an access port works as expected and MAC addresses are learnt by the top-of-rack switch when unicast frames are transmitted by an end host.

Relevant config

ms-fe1012-test was placed into vlan 1031 and the link brought up:

   cmooney@lsw1-e1-eqiad> show configuration vlans private1-e1-eqiad 
   vlan-id 1031;
   l3-interface irb.1031;
   vxlan {
       vni 2001031;
   }
   cmooney@lsw1-e1-eqiad> show configuration interfaces interface-range vlan-private1-e1-eqiad 
   member xe-0/0/7;
   mtu 9192;
   unit 0 {
       family ethernet-switching {
           interface-mode access;
           vlan {
               members private1-e1-eqiad;
           }
       }
   }
   cmooney@lsw1-e1-eqiad> show configuration interfaces xe-0/0/7                                  
   description "ms-fe1012 {#5004}";
   root@ms-fe1012-test:~# ip -br link show enp101s0f0np0
   enp101s0f0np0    UP             e4:3d:1a:12:b1:00 <BROADCAST,MULTICAST,UP,LOWER_UP> 
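
The VNI in the configuration above looks like the Vlan ID plus 2000000 (1031 maps to 2001031). The Python sketch below captures that observation. The offset and the function name `vlan_to_vni` are assumptions inferred from the outputs on this page, not a documented rule.

```python
# Assumed offset, inferred from the vlan-id / vni pairs shown on this page.
VNI_OFFSET = 2_000_000


def vlan_to_vni(vlan_id: int) -> int:
    """Return the VNI that the observed numbering pattern gives for a Vlan ID."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid Vlan ID: {vlan_id}")
    return VNI_OFFSET + vlan_id


print(vlan_to_vni(1031))
```

A helper like this could be used to sanity-check generated switch configuration before it is pushed.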

Results

MAC address learnt on port as expected:

   cmooney@lsw1-e1-eqiad> show ethernet-switching table vlan-name private1-e1-eqiad 
   
   MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
              SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
   
   
   Ethernet switching table : 1 entries, 1 learned
   Routing instance : default-switch
      Vlan                MAC                 MAC      Logical                SVLBNH/      Active
      name                address             flags    interface              VENH Index   source
      private1-e1-eqiad   e4:3d:1a:12:b1:00   D        xe-0/0/7.0
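
When this check is repeated across many ports, the table rows can be extracted programmatically. The following is a minimal Python sketch that assumes the column layout shown above; `parse_mac_table` is a hypothetical helper, not part of any Juniper tooling.

```python
import re

# Matches a data row such as:
#   private1-e1-eqiad   e4:3d:1a:12:b1:00   D        xe-0/0/7.0
ROW = re.compile(
    r"^\s*(?P<vlan>\S+)\s+"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<interface>\S+)",
    re.IGNORECASE,
)


def parse_mac_table(output: str) -> list[dict[str, str]]:
    """Extract Vlan, MAC, flags and logical interface from each data row."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:  # header and legend lines do not contain a MAC, so they are skipped
            rows.append(match.groupdict())
    return rows


sample = """
   Vlan                MAC                 MAC      Logical                SVLBNH/      Active
   name                address             flags    interface              VENH Index   source
   private1-e1-eqiad   e4:3d:1a:12:b1:00   D        xe-0/0/7.0
"""
for row in parse_mac_table(sample):
    print(row)
```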

L2 Trunk port

This is a similar basic test to validate the normal configuration for a trunk port works as expected.

Relevant config

an-worker1147 was connected to vlans 1031 (private1-e1-eqiad) and 1039 (analytics1-e1-eqiad) with an 802.1Q trunk as follows:

   cmooney@lsw1-e1-eqiad> show configuration vlans analytics1-e1-eqiad         
   vlan-id 1039;
   l3-interface irb.1039;
   vxlan {
       vni 2001039;
   }
   cmooney@lsw1-e1-eqiad> show configuration vlans private1-e1-eqiad     
   vlan-id 1031;
   l3-interface irb.1031;
   vxlan {
       vni 2001031;
   }
   cmooney@lsw1-e1-eqiad> show configuration interfaces xe-0/0/6              
   description an-worker1147;
   native-vlan-id 1039;
   mtu 9192;
   unit 0 {
       family ethernet-switching {
           interface-mode trunk;
           vlan {
               members [ private1-e1-eqiad analytics1-e1-eqiad ];
           }
       }
   }
   root@an-worker1147-test:~# ip -d link show eno2np1
   5: eno2np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UP mode DEFAULT group default qlen 1000
       link/ether e4:3d:1a:54:14:45 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 9600 addrgenmode eui64 numtxqueues 74 numrxqueues 74 gso_max_size 65536 gso_max_segs 65535 portname p1 switchid 441454feff1a3de4 
       altname enp24s0f1np1
   root@an-worker1147-test:~# ip -d link show eno2np1.1031
   6: eno2np1.1031@eno2np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue state UP mode DEFAULT group default qlen 1000
       link/ether e4:3d:1a:54:14:45 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 
       vlan protocol 802.1Q id 1031 <REORDER_HDR> addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 

Results

MAC addresses learnt on both Vlans as expected:

   cmooney@lsw1-e1-eqiad> show ethernet-switching table interface xe-0/0/6.0 | match xe-0/0/6 
   
   MAC database for interface xe-0/0/6.0
      analytics1-e1-eqiad e4:3d:1a:54:14:45   D        xe-0/0/6.0           
      private1-e1-eqiad   e4:3d:1a:54:14:45   D        xe-0/0/6.0      


Traffic on non-native vlan received by end host with tags:

   root@an-worker1147-test:~# tcpdump -i eno2np1 -l -p -nn -e port not 22 and net 10.64.0.0/10
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   13:08:13.276275 a4:e1:1a:81:3a:80 > e4:3d:1a:54:14:45, ethertype 802.1Q (0x8100), length 102: vlan 1031, p 0, ethertype IPv4 (0x0800), 10.64.130.1 > 10.64.130.11: ICMP echo request, id 6677, seq 0, length 64
   13:08:13.276315 e4:3d:1a:54:14:45 > a4:e1:1a:81:3a:80, ethertype 802.1Q (0x8100), length 102: vlan 1031, p 0, ethertype IPv4 (0x0800), 10.64.130.11 > 10.64.130.1: ICMP echo reply, id 6677, seq 0, length 64
   13:08:14.276074 a4:e1:1a:81:3a:80 > e4:3d:1a:54:14:45, ethertype 802.1Q (0x8100), length 102: vlan 1031, p 0, ethertype IPv4 (0x0800), 10.64.130.1 > 10.64.130.11: ICMP echo request, id 6677, seq 1, length 64
   13:08:14.276107 e4:3d:1a:54:14:45 > a4:e1:1a:81:3a:80, ethertype 802.1Q (0x8100), length 102: vlan 1031, p 0, ethertype IPv4 (0x0800), 10.64.130.11 > 10.64.130.1: ICMP echo reply, id 6677, seq 1, length 64
   13:08:36.097539 e4:3d:1a:54:14:45 > a4:e1:1a:81:3a:80, ethertype IPv4 (0x0800), length 81: 10.64.138.11.51007 > 10.3.0.1.53: 59784+ A? 1.debian.pool.ntp.org. (39)
   13:08:36.097559 e4:3d:1a:54:14:45 > a4:e1:1a:81:3a:80, ethertype IPv4 (0x0800), length 81: 10.64.138.11.51007 > 10.3.0.1.53: 40077+ AAAA? 1.debian.pool.ntp.org. (39)
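
The frame lengths in the capture are consistent with the tagging: the ICMP frames on vlan 1031 carry a 4-byte 802.1Q tag, while the DNS queries on the native vlan do not. A small Python sketch of the arithmetic follows, assuming IPv4 headers without options and lengths reported without the FCS; the function name is illustrative only.

```python
ETHERNET_HEADER = 14   # destination MAC + source MAC + EtherType
DOT1Q_TAG = 4          # 802.1Q tag inserted after the source MAC
IPV4_HEADER = 20       # IPv4 header without options


def frame_length(ip_payload: int, tagged: bool) -> int:
    """Length of an Ethernet frame (without FCS) carrying an IPv4 packet."""
    length = ETHERNET_HEADER + IPV4_HEADER + ip_payload
    if tagged:
        length += DOT1Q_TAG
    return length


# ICMP echo: 8-byte ICMP header + 56 data bytes = 64 bytes ("length 64").
print(frame_length(64, tagged=True))
# DNS query: 8-byte UDP header + 39 bytes of DNS, untagged on the native vlan.
print(frame_length(8 + 39, tagged=False))
```

The two results correspond to the `length 102` and `length 81` values reported by tcpdump.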

Routed port in VRF

This is a very simple test to confirm IP connectivity across the directly connected link of a routed port terminated into an overlay VRF.

Relevant config

   cmooney@lsw1-e1-eqiad> show configuration interfaces et-0/0/48 unit 100                
   description "cr1-eqiad et-1/0/2.100 - Production VRF";
   vlan-id 100;
   family inet {
       address 10.66.0.9/31;
   }
   family inet6 {
       address 2620:0:861:fe07::2/64;
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION | display set | match et-0/0/48
   set routing-instances PRODUCTION interface et-0/0/48.100
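
Because the IPv4 address on et-0/0/48.100 is a /31, the only other address in the subnet belongs to the far end of the link (cr1-eqiad, per the interface description). A short Python sketch using the standard `ipaddress` module derives it; `p2p_peer` is a hypothetical helper name.

```python
import ipaddress


def p2p_peer(interface_address: str) -> ipaddress.IPv4Address:
    """Return the other address on a /31 point-to-point subnet."""
    iface = ipaddress.ip_interface(interface_address)
    if iface.network.prefixlen != 31:
        raise ValueError("expected a /31 point-to-point subnet")
    # Iterating a network yields every address in it; a /31 has exactly two.
    (peer,) = [addr for addr in iface.network if addr != iface.ip]
    return peer


print(p2p_peer("10.66.0.9/31"))
```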

Results

IPv4

   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 10.66.0.8 size 9000 do-not-fragment count 2            
   PING 10.66.0.8 (10.66.0.8): 9000 data bytes
   9008 bytes from 10.66.0.8: icmp_seq=0 ttl=64 time=2.722 ms
   9008 bytes from 10.66.0.8: icmp_seq=1 ttl=64 time=2.896 ms
   
   --- 10.66.0.8 ping statistics ---
   2 packets transmitted, 2 packets received, 0% packet loss
   round-trip min/avg/max/stddev = 2.722/2.809/2.896/0.087 ms
   

IPv6

   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 2620:0:861:fe07::1 size 9000 do-not-fragment count 2 source 2620:0:861:fe07::2 
   PING6(9048=40+8+9000 bytes) 2620:0:861:fe07::2 --> 2620:0:861:fe07::1
   9008 bytes from 2620:0:861:fe07::1, icmp_seq=0 hlim=64 time=2.915 ms
   9008 bytes from 2620:0:861:fe07::1, icmp_seq=1 hlim=64 time=2.719 ms
   
   --- 2620:0:861:fe07::1 ping6 statistics ---
   2 packets transmitted, 2 packets received, 0% packet loss
   round-trip min/avg/max/std-dev = 2.719/2.817/2.915/0.098 ms

Standard IP gateway / IRB in VRF

This test confirms routing via an IRB interface within an overlay routing instance works as expected.

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1031 
   description private1-e1-eqiad;
   family inet {
       address 10.64.130.1/24;
   }
   family inet6 {
       address 2620:0:861:109::1/64;
   }
   root@ms-fe1012-test:~# ip -br addr show enp101s0f0np0
   enp101s0f0np0    UP             10.64.130.10/24 2620:0:861:109::10/64 fe80::e63d:1aff:fe12:b100/64 
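
The link-local address in the output above is what the modified EUI-64 procedure produces from the interface MAC (the host reports `addrgenmode eui64`). The Python sketch below reproduces the derivation; the function name is illustrative.

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    octets[0] ^= 0x02                           # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)   # fe80::/64
    return ipaddress.IPv6Address(prefix + interface_id)


print(eui64_link_local("e4:3d:1a:12:b1:00"))
```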

Results

IPv4

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 10.64.130.0    
   
   PRODUCTION.inet.0: 34 destinations, 36 routes (34 active, 0 holddown, 0 hidden)
   @ = Routing Use Only, # = Forwarding Use Only
   + = Active Route, - = Last Active, * = Both
   
   10.64.130.0/24     *[Direct/0] 5d 20:50:06
                       >  via irb.1031
   cmooney@lsw1-e1-eqiad> show arp interface irb.1031 no-resolve 
   MAC Address       Address         Interface                Flags
   e4:3d:1a:12:b1:00 10.64.130.10    irb.1031 [xe-0/0/7.0]    permanent remote
   root@ms-fe1012-test:~# ping -c 2 10.64.130.1 
   PING 10.64.130.1 (10.64.130.1) 56(84) bytes of data.
   64 bytes from 10.64.130.1: icmp_seq=1 ttl=64 time=5.55 ms
   64 bytes from 10.64.130.1: icmp_seq=2 ttl=64 time=6.90 ms
   root@ms-fe1012-test:~# ip -4 neigh show
   10.64.130.1 dev enp101s0f0np0 lladdr a4:e1:1a:81:3a:80 REACHABLE


IPv6

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 2620:0:861:fe07:: 
   
   PRODUCTION.inet6.0: 37 destinations, 39 routes (37 active, 0 holddown, 0 hidden)
   + = Active Route, - = Last Active, * = Both
   
   2620:0:861:fe07::/64
                      *[Direct/0] 4d 14:45:08
                       >  via et-0/0/48.100
   cmooney@lsw1-e1-eqiad> show ipv6 neighbors interface irb.1031 
   IPv6 Address                            Linklayer Address  State       Exp   Rtr  Secure  Interface               
   2620:0:861:109::10                       e4:3d:1a:12:b1:00  reachable   0     no   no      irb.1031 [xe-0/0/7.0]   
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 2620:0:861:109::10 source 2620:0:861:109::1    
   PING6(56=40+8+8 bytes) 2620:0:861:109::1 --> 2620:0:861:109::10
   16 bytes from 2620:0:861:109::10, icmp_seq=0 hlim=64 time=0.633 ms
   16 bytes from 2620:0:861:109::10, icmp_seq=1 hlim=64 time=0.511 ms


   root@ms-fe1012-test:~# ping -c 2 2620:0:861:109::1
   PING 2620:0:861:109::1(2620:0:861:109::1) 56 data bytes
   64 bytes from 2620:0:861:109::1: icmp_seq=1 ttl=64 time=0.475 ms
   64 bytes from 2620:0:861:109::1: icmp_seq=2 ttl=64 time=0.470 ms
   root@ms-fe1012-test:~# ip -6 neigh show
   2620:0:861:109::1 dev enp101s0f0np0 lladdr a4:e1:1a:81:3a:80 STALE

Remote MAC Learning on Vlan/VNI

This test validates MAC address distribution across multiple switches, using BGP EVPN. To perform the test, Vlan private1-f1-eqiad was added temporarily to switch lsw1-e1, and trunked out to an-worker1147-test.

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration interfaces xe-0/0/6     
   description an-worker1147;
   native-vlan-id 1039;
   mtu 9192;
   unit 0 {
       family ethernet-switching {
           interface-mode trunk;
           vlan {
               members [ private1-e1-eqiad private1-f1-eqiad analytics1-e1-eqiad ];
           }
       }
   }

Results

The MAC address is learnt locally on lsw1-e1 on port xe-0/0/6.0:

   cmooney@lsw1-e1-eqiad> show ethernet-switching table vlan-name private1-f1-eqiad            
   
   MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
              SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
   
   
   Ethernet switching table : 3 entries, 3 learned
   Routing instance : default-switch
      Vlan                MAC                 MAC      Logical                SVLBNH/      Active
      name                address             flags    interface              VENH Index   source
      private1-f1-eqiad   00:00:5e:11:fa:ce   DRP      esi.1797               1796         05:00:00:fd:2a:00:1e:88:8b:00 
      private1-f1-eqiad   a4:e1:1a:81:9e:80   DR       vtep.32770                          10.64.128.7                   
      private1-f1-eqiad   e4:3d:1a:54:14:45   D        xe-0/0/6.0


A BGP EVPN type 2 route containing this MAC address is received on lsw1-f1 as a result. The MAC is learnt on VNI 2001035, with a protocol next-hop of the lsw1-e1 loopback/VTEP IP.

   cmooney@lsw1-f1-eqiad> show route protocol bgp table bgp.evpn.0 evpn-mac-address e4:3d:1a:54:14:45 extensive    
   
   2:10.64.128.3:64810::2001035::e4:3d:1a:54:14:45/304 MAC/IP (1 entry, 1 announced)
   TSI:
   Page 0 idx 0, (group EVPN_RR_CLIENTS type Internal) Type 1 val 0xe3c20a4 (adv_entry)
      Advertised metrics:
        Nexthop: 10.64.128.3
        Localpref: 100
        AS path: [64810] I
        Communities: target:64810:1035 encapsulation:vxlan(0x8)
        Cluster ID: 10.64.128.0
        Originator ID: 10.64.128.3
       Advertise: 0000003f
   Path 2:10.64.128.3:64810::2001035::e4:3d:1a:54:14:45
   from 10.64.128.3
   Vector len 4.  Val: 0
           *BGP    Preference: 170/-101
                   Route Distinguisher: 10.64.128.3:64810
                   Next hop type: Indirect, Next hop index: 0
                   Address: 0xc6864c8
                   Next-hop reference count: 109
                   Source: 10.64.128.3
                   Protocol next hop: 10.64.128.3
                   Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                   State: <Active Int Ext>
                   Local AS: 64810 Peer AS: 64810
                   Age: 1:14:23 	Metric2: 8 
                   Validation State: unverified 
                   Task: BGP_64810.10.64.128.3
                   Announcement bits (1): 1-BGP_RT_Background 
                   AS path: I 
                   Communities: target:64810:1035 encapsulation:vxlan(0x8)
                   Import Accepted
                   Route Label: 2001035
                   ESI: 00:00:00:00:00:00:00:00:00:00
                   Localpref: 100
                   Router ID: 10.64.128.3
                   Secondary Tables: default-switch.evpn.0
                   Thread: junos-main 
                   Indirect next hops: 1
                           Protocol next hop: 10.64.128.3 Metric: 8
                           Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                           Indirect path forwarding next hops: 1
                                   Next hop type: Router
                                   Next hop: 10.64.129.6 via et-0/0/52.0
                                   Session Id: 0x0
                                   10.64.128.3/32 Originating RIB: inet.0
                                     Metric: 8 Node path count: 1
                                     Forwarding nexthops: 1
                                           Next hop type: Router
                                           Next hop: 10.64.129.6 via et-0/0/52.0
                                           Session Id: 0x0
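
The route prefix printed at the top of this output packs several fields into one string. The Python sketch below splits it, assuming the layout `type:RD::tag::MAC/length` seen here, where the middle field matches the `Route Label` (VNI) in the same output. Routes that also carry an IP address have an extra field and are not handled; the class and function names are hypothetical.

```python
from typing import NamedTuple


class EvpnMacRoute(NamedTuple):
    route_type: int
    route_distinguisher: str
    vni: int
    mac: str
    prefix_length: int


def parse_evpn_type2(prefix: str) -> EvpnMacRoute:
    """Split a MAC-only type 2 prefix, as printed above, into its fields."""
    key, _, length = prefix.partition("/")
    head, vni, mac = key.split("::")
    route_type, _, rd = head.partition(":")
    if route_type != "2":
        raise ValueError("not a type 2 (MAC/IP advertisement) route")
    return EvpnMacRoute(int(route_type), rd, int(vni), mac, int(length))


route = parse_evpn_type2("2:10.64.128.3:64810::2001035::e4:3d:1a:54:14:45/304")
print(route)
```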

The MAC is placed into the Vlan forwarding table as expected:

   cmooney@lsw1-f1-eqiad> show ethernet-switching table vlan-name private1-f1-eqiad                                
   
   MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
              SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
   
   
   Ethernet switching table : 4 entries, 4 learned
   Routing instance : default-switch
      Vlan                MAC                 MAC      Logical                SVLBNH/      Active
      name                address             flags    interface              VENH Index   source
      private1-f1-eqiad   00:00:5e:11:fa:ce   DRP      esi.1780               1779         05:00:00:fd:2a:00:1e:88:8b:00 
      private1-f1-eqiad   a4:e1:1a:81:3a:80   DR       vtep.32769                          10.64.128.3                   
      private1-f1-eqiad   e4:3d:1a:54:14:45   DR       vtep.32769                          10.64.128.3
      private1-f1-eqiad   e4:3d:1a:54:ab:a7   D        xe-0/0/6.0

Client to Client L2 Unicast forwarding

These tests verify that end hosts can communicate directly amongst themselves within a particular Vlan.

Same Switch (Pure L2)

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration interfaces xe-0/0/6         
   description an-worker1147;
   native-vlan-id 1039;
   mtu 9192;
   unit 0 {
       family ethernet-switching {
           interface-mode trunk;
           vlan {
               members [ private1-e1-eqiad private1-f1-eqiad analytics1-e1-eqiad ];
           }
       }
   }
   cmooney@lsw1-e1-eqiad> show configuration interfaces interface-range vlan-private1-e1-eqiad 
   member xe-0/0/7;
   mtu 9192;
   unit 0 {
       family ethernet-switching {
           interface-mode access;
           vlan {
               members private1-e1-eqiad;
           }
       }
   }
   cmooney@lsw1-e1-eqiad> show configuration interfaces xe-0/0/7    
   description "ms-fe1012 {#5004}";
   root@ms-fe1012-test:~# ip -d link show enp101s0f0np0
   2: enp101s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UP mode DEFAULT group default qlen 1000
   link/ether e4:3d:1a:12:b1:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 9600 addrgenmode eui64 numtxqueues 74 numrxqueues 74 gso_max_size 65536 gso_max_segs 65535 portname p0 switchid 00b112feff1a3de4 
   root@an-worker1147-test:~# ip -d link show eno2np1.1031
   6: eno2np1.1031@eno2np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue state UP mode DEFAULT group default qlen 1000
   link/ether e4:3d:1a:54:14:45 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 
   vlan protocol 802.1Q id 1031 <REORDER_HDR> addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 


Results

   cmooney@lsw1-e1-eqiad> show ethernet-switching table vlan-name private1-e1-eqiad 
   
   MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
              SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
   
   
   Ethernet switching table : 2 entries, 2 learned
   Routing instance : default-switch
      Vlan                MAC                 MAC      Logical                SVLBNH/      Active
      name                address             flags    interface              VENH Index   source
      private1-e1-eqiad   e4:3d:1a:12:b1:00   D        xe-0/0/7.0           
      private1-e1-eqiad   e4:3d:1a:54:14:45   D        xe-0/0/6.0   


   root@an-worker1147-test:~# ping -c 2 10.64.130.10
   PING 10.64.130.10 (10.64.130.10) 56(84) bytes of data.
   64 bytes from 10.64.130.10: icmp_seq=1 ttl=64 time=0.172 ms
   64 bytes from 10.64.130.10: icmp_seq=2 ttl=64 time=0.209 ms
   root@ms-fe1012-test:~# tcpdump -i enp101s0f0np0 -l -p -nn -e icmp
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on enp101s0f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   20:41:55.359338 e4:3d:1a:54:14:45 > e4:3d:1a:12:b1:00, ethertype IPv4 (0x0800), length 98: 10.64.130.11 > 10.64.130.10: ICMP echo request, id 54201, seq 20, length 64
   20:41:55.359373 e4:3d:1a:12:b1:00 > e4:3d:1a:54:14:45, ethertype IPv4 (0x0800), length 98: 10.64.130.10 > 10.64.130.11: ICMP echo reply, id 54201, seq 20, length 64


Remote Switch (VXLAN tunneled)

Relevant Config

Same as for the earlier test, Remote MAC Learning on Vlan/VNI.

Results

The MAC tables are the same as for the earlier test, Remote MAC Learning on Vlan/VNI.

   root@an-worker1148-test:~# ping 10.64.134.72 
   PING 10.64.134.72 (10.64.134.72) 56(84) bytes of data.
   64 bytes from 10.64.134.72: icmp_seq=1 ttl=64 time=19.3 ms
   64 bytes from 10.64.134.72: icmp_seq=2 ttl=64 time=0.180 ms
   root@an-worker1147-test:~# tcpdump -i eno2np1.1035 -e -l -p -nn icmp
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1035, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   20:48:52.448479 e4:3d:1a:54:ab:a7 > e4:3d:1a:54:14:45, ethertype IPv4 (0x0800), length 98: 10.64.134.11 > 10.64.134.72: ICMP echo request, id 62059, seq 28, length 64
   20:48:52.448505 e4:3d:1a:54:14:45 > e4:3d:1a:54:ab:a7, ethertype IPv4 (0x0800), length 98: 10.64.134.72 > 10.64.134.11: ICMP echo reply, id 62059, seq 28, length 64

Client to Client broadcast forwarding / ingress replication

These tests verify that broadcasts are correctly flooded to all ports in the Vlan, both local and remote.

Same Switch (Pure L2)

The test involves generating a broadcast on ms-fe1012-test into Vlan1031 / private1-e1-eqiad, and confirming it is received on an-worker1147-test. Both of these hosts are connected to lsw1-e1-eqiad, so the frame will be processed as a regular L2 frame between local ports in the same Vlan.

Transmit side:

   root@ms-fe1012-test:~# echo "test broadcast in vlan 1031 private-e1-eqiad" | nc -u -b 255.255.255.255 12101 

Receive side:

   root@an-worker1147-test:~# tcpdump -X -i eno2np1.1031 -l -p -nn -e
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1031, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   21:22:29.263680 e4:3d:1a:12:b1:00 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 86: 208.80.154.226.46774 > 255.255.255.255.12101: UDP, length 44
   	0x0000:  4500 0048 903c 4000 4011 3f36 d050 9ae2  E..H.<@.@.?6.P..
   	0x0010:  ffff ffff b6b6 2f45 0034 0d9a 7465 7374  ....../E.4..test
   	0x0020:  2062 726f 6164 6361 7374 2069 6e20 766c  .broadcast.in.vl
   	0x0030:  616e 2031 3033 3120 7072 6976 6174 652d  an.1031.private-
   	0x0040:  6531 2d65 7169 6164                      e1-eqiad.
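
The hex dump can be decoded to confirm that the fields tcpdump summarises match the bytes on the wire. The Python sketch below unpacks the IPv4 and UDP headers from the first 28 bytes, copied from the dump above; it assumes the packet is IPv4 carrying UDP, and the function name is illustrative.

```python
import ipaddress
import struct

# First 28 bytes of the captured packet, copied from the tcpdump -X output.
packet = bytes.fromhex(
    "4500 0048 903c 4000 4011 3f36 d050 9ae2"
    "ffff ffff b6b6 2f45 0034 0d9a"
)


def decode_ipv4_udp(data: bytes) -> dict:
    """Decode the fixed IPv4 header and the UDP header that follows it."""
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    header_len = (ver_ihl & 0x0F) * 4   # IHL is counted in 32-bit words
    sport, dport, udp_len, _udp_csum = struct.unpack(
        "!HHHH", data[header_len:header_len + 8])
    return {
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "total_length": total_len,
        "protocol": proto,
        "src_port": sport,
        "dst_port": dport,
        "udp_length": udp_len,
    }


print(decode_ipv4_udp(packet))
```

The decoded destination port and broadcast destination address match the `nc` command used on the transmit side.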


Remote Switch (VXLAN tunneled)

The test involves generating a broadcast on an-worker1148-test, connected to lsw1-f1-eqiad, and confirming receipt on an-worker1147-test, connected to lsw1-e1-eqiad. The test frame is transmitted in Vlan1035 / private1-f1-eqiad, which is configured on both switches for the purposes of testing.

Transmit side:

   root@an-worker1148-test:~# echo "test broadcast in vlan 1035 private-f1-eqiad" | nc -u -b 255.255.255.255 12345

Receive side:

   root@an-worker1147-test:~# tcpdump -X -i eno2np1.1035 -l -p -nn -e
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1035, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   21:36:49.189166 e4:3d:1a:54:ab:a7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 87: 10.64.134.11.48282 > 255.255.255.255.12345: UDP, length 45
   	0x0000:  4500 0049 0a29 4000 4011 a030 0a40 860b  E..I.)@.@..0.@..
   	0x0010:  ffff ffff bc9a 3039 0035 d2a7 7465 7374  ......09.5..test
   	0x0020:  2062 726f 6164 6361 7374 2069 6e20 766c  .broadcast.in.vl
   	0x0030:  616e 2031 3033 3520 7072 6976 6174 652d  an.1035.private-
   	0x0040:  6631 2d65 7169 6164 0a                   f1-eqiad.

Client to Client L2 multicast forwarding / ingress replication

Same Switch (Pure L2)

Config is the same as for the Same Switch (Pure L2) test under Client to Client broadcast forwarding / ingress replication.

Results

   root@ms-fe1012-test:~# echo "test multicast" | nc -q 1 -u -b 224.0.0.26 12345 
   root@an-worker1147-test:~# tcpdump -e -i eno2np1.1031 -l -nn net 224.0.0.0/8
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1031, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   14:43:32.138562 e4:3d:1a:12:b1:00 > 01:00:5e:00:00:1a, ethertype IPv4 (0x0800), length 60: 10.64.130.10.59844 > 224.0.0.26.12345: UDP, length 15
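
The destination MAC in the capture follows the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. A brief Python sketch of that mapping (the function name is illustrative):

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(address) & 0x7FFFFF    # keep the low 23 bits of the group
    mac = (0x01005E << 24) | low23     # prepend the 01:00:5e prefix
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


print(multicast_mac("224.0.0.26"))
```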


Remote Switch (VXLAN tunneled)

Config is the same as for the Remote Switch (VXLAN tunneled) test under Client to Client broadcast forwarding / ingress replication.

Results

   root@an-worker1148-test:~# echo "test multicast evpn type 3" | nc -q 1 -u -b 224.0.0.26 12345
   root@an-worker1147-test:~# tcpdump -e -i eno2np1.1035 -l -nn net 224.0.0.0/8   
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1035, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   17:20:22.025485 e4:3d:1a:54:ab:a7 > 01:00:5e:00:00:1a, ethertype IPv4 (0x0800), length 69: 10.64.134.11.46904 > 224.0.0.26.12345: UDP, length 27

Anycast IP gateway / IRB across multiple switches

When Vlans are stretched across multiple devices, anycast gateways are configured on each top-of-rack switch, with each one acting as the gateway for its directly connected devices on the Vlan. This test confirms that this works and that the different switches all respond for the shared gateway IP.

Config

Tests were done on Vlan1035 / private1-f1-eqiad, which is connected to both an-worker1148-test (10.64.134.11) on lsw1-f1-eqiad, and an-worker1147-test (10.64.134.72) on lsw1-e1-eqiad. The layer-2 config is the same as in the earlier test, Remote MAC Learning on Vlan/VNI.

The IRB interface was configured as follows to provide Anycast GW support:

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb unit 1035 
   virtual-gateway-accept-data;
   description private1-f1-eqiad;
   family inet {
       address 10.64.134.253/24 {
           preferred;
           virtual-gateway-address 10.64.134.1;
       }
   }
   family inet6 {
       address 2620:0:861:10d::253/64 {
           preferred;
           virtual-gateway-address 2620:0:861:10d::1;
       }
   }
   virtual-gateway-v4-mac 00:00:5e:11:fa:ce;
   virtual-gateway-v6-mac 00:00:5e:11:fa:ce;
   cmooney@lsw1-f1-eqiad> show configuration interfaces irb unit 1035 
   virtual-gateway-accept-data;
   description private1-f1-eqiad;
   family inet {
       address 10.64.134.254/24 {
           preferred;
           virtual-gateway-address 10.64.134.1;
       }
   }
   family inet6 {
       address 2620:0:861:10d::254/64 {
           preferred;
           virtual-gateway-address 2620:0:861:10d::1;
       }
   }
   virtual-gateway-v4-mac 00:00:5e:11:fa:ce;
   virtual-gateway-v6-mac 00:00:5e:11:fa:ce;


IPv4

Results

The 3 IPs (one VIP, and 2 per-switch IPs) were pinged from an-worker1147-test, and each one responded:

   root@an-worker1147-test:~# ping -c 1 10.64.134.1
   PING 10.64.134.1 (10.64.134.1) 56(84) bytes of data.
   64 bytes from 10.64.134.1: icmp_seq=1 ttl=64 time=0.677 ms
   root@an-worker1147-test:~# ping -c 1 10.64.134.253
   PING 10.64.134.253 (10.64.134.253) 56(84) bytes of data.
   64 bytes from 10.64.134.253: icmp_seq=1 ttl=64 time=7.93 ms
   root@an-worker1147-test:~# ping -c 1 10.64.134.254
   PING 10.64.134.254 (10.64.134.254) 56(84) bytes of data.
   64 bytes from 10.64.134.254: icmp_seq=1 ttl=64 time=6.06 ms


Monitoring the packets processed by lsw1-e1-eqiad shows that it handled the ping to the VIP, as well as the ping to its own unicast IP:

   cmooney@lsw1-e1-eqiad> monitor traffic interface irb.1035 matching "icmp" no-resolve    
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:25:51.348223  In IP 10.64.134.72 > 10.64.134.1: ICMP echo request, id 46233, seq 1, length 64
   12:25:51.348236 Out IP truncated-ip - 52 bytes missing! 10.64.134.1 > 10.64.134.72: ICMP echo reply, id 46233, seq 1, length 64
   12:25:57.886469  In IP 10.64.134.72 > 10.64.134.253: ICMP echo request, id 2678, seq 1, length 64
   12:25:57.886482 Out IP truncated-ip - 52 bytes missing! 10.64.134.253 > 10.64.134.72: ICMP echo reply, id 2678, seq 1, length 64


The ping to .254, unicast IP on lsw1-f1-eqiad, did not show up in that capture, but instead appears in an equivalent capture on lsw1-f1-eqiad:

   cmooney@lsw1-f1-eqiad> monitor traffic interface irb.1035 matching "icmp" no-resolve 
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:26:00.938116  In IP 10.64.134.72 > 10.64.134.254: ICMP echo request, id 16304, seq 1, length 64


The reverse is the case when the pings are done from an-worker1148-test:

   root@an-worker1148-test:~# ping -c 1 10.64.134.1
   PING 10.64.134.1 (10.64.134.1) 56(84) bytes of data.
   64 bytes from 10.64.134.1: icmp_seq=1 ttl=64 time=14.5 ms
   root@an-worker1148-test:~# ping -c 1 10.64.134.253
   PING 10.64.134.253 (10.64.134.253) 56(84) bytes of data.
   64 bytes from 10.64.134.253: icmp_seq=1 ttl=64 time=14.9 ms
   root@an-worker1148-test:~# ping -c 1 10.64.134.254
   PING 10.64.134.254 (10.64.134.254) 56(84) bytes of data.
   64 bytes from 10.64.134.254: icmp_seq=1 ttl=64 time=6.25 ms
   cmooney@lsw1-e1-eqiad> monitor traffic interface irb.1035 matching "icmp" no-resolve    
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:29:47.721948  In IP 10.64.134.11 > 10.64.134.253: ICMP echo request, id 52386, seq 1, length 64
   cmooney@lsw1-f1-eqiad> monitor traffic interface irb.1035 matching "icmp" no-resolve    
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:29:42.998551  In IP 10.64.134.11 > 10.64.134.1: ICMP echo request, id 375, seq 1, length 64
   12:29:42.998564 Out IP truncated-ip - 52 bytes missing! 10.64.134.1 > 10.64.134.11: ICMP echo reply, id 375, seq 1, length 64
   12:29:50.194240  In IP 10.64.134.11 > 10.64.134.254: ICMP echo request, id 8113, seq 1, length 64
   12:29:50.194252 Out IP truncated-ip - 52 bytes missing! 10.64.134.254 > 10.64.134.11: ICMP echo reply, id 8113, seq 1, length 64


ARP entries appear as expected on both hosts; the VIP shows with the same MAC on both devices:

   root@an-worker1147-test:~# ip neigh show dev eno2np1.1035
   10.64.134.1 lladdr 00:00:5e:11:fa:ce STALE
   10.64.134.11 lladdr e4:3d:1a:54:ab:a7 STALE
   10.64.134.253 lladdr a4:e1:1a:81:3a:80 STALE
   10.64.134.254 lladdr a4:e1:1a:81:9e:80 STALE


   root@an-worker1148-test:~# ip neigh show 
   10.64.134.1 dev eno2np1 lladdr 00:00:5e:11:fa:ce STALE
   10.64.134.10 dev eno2np1 lladdr e4:3d:1a:54:14:45 STALE
   10.64.134.72 dev eno2np1 lladdr e4:3d:1a:54:14:45 STALE
   10.64.134.253 dev eno2np1 lladdr a4:e1:1a:81:3a:80 STALE
   10.64.134.254 dev eno2np1 lladdr a4:e1:1a:81:9e:80 DELAY
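
The comparison of the two neighbour tables can be automated. The Python sketch below parses `ip neigh show` output (with or without the `dev` field) and checks that the VIP resolves to the same MAC on both hosts. The sample text is abridged from the outputs above, and `neighbor_macs` is a hypothetical helper.

```python
def neighbor_macs(output: str) -> dict[str, str]:
    """Map each IP in `ip neigh show` output to its link-layer address."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        if "lladdr" in fields:
            table[fields[0]] = fields[fields.index("lladdr") + 1]
    return table


host_1147 = """
10.64.134.1 lladdr 00:00:5e:11:fa:ce STALE
10.64.134.253 lladdr a4:e1:1a:81:3a:80 STALE
10.64.134.254 lladdr a4:e1:1a:81:9e:80 STALE
"""
host_1148 = """
10.64.134.1 dev eno2np1 lladdr 00:00:5e:11:fa:ce STALE
10.64.134.253 dev eno2np1 lladdr a4:e1:1a:81:3a:80 STALE
10.64.134.254 dev eno2np1 lladdr a4:e1:1a:81:9e:80 DELAY
"""

vip = "10.64.134.1"
same = neighbor_macs(host_1147)[vip] == neighbor_macs(host_1148)[vip]
print(f"VIP {vip} resolves to the same MAC on both hosts: {same}")
```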


IPv6

Results

   root@an-worker1147-test:~# ping -c 1 2620:0:861:10d::1
   PING 2620:0:861:10d::1(2620:0:861:10d::1) 56 data bytes
   64 bytes from 2620:0:861:10d::1: icmp_seq=1 ttl=64 time=0.454 ms
   root@an-worker1147-test:~# ping -c 1 2620:0:861:10d::253
   PING 2620:0:861:10d::253(2620:0:861:10d::253) 56 data bytes
   64 bytes from 2620:0:861:10d::253: icmp_seq=1 ttl=64 time=11.1 ms
   root@an-worker1147-test:~# ping -c 1 2620:0:861:10d::254
   PING 2620:0:861:10d::254(2620:0:861:10d::254) 56 data bytes
   64 bytes from 2620:0:861:10d::254: icmp_seq=1 ttl=64 time=11.2 ms
   cmooney@lsw1-e1-eqiad> monitor traffic interface irb.1035 matching "icmp6" no-resolve     
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:35:29.299922  In IP6 2620:0:861:10d::72 > 2620:0:861:10d::1: ICMP6, echo request, seq 1, length 64
   12:35:29.299943 Out [|ip6]
   12:35:58.340742  In IP6 2620:0:861:10d::72 > 2620:0:861:10d::253: ICMP6, echo request, seq 1, length 64
   12:35:58.340766 Out [|ip6]
   cmooney@lsw1-f1-eqiad> monitor traffic interface irb.1035 matching "icmp6" no-resolve   
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:36:03.678690  In IP6 2620:0:861:10d::72 > 2620:0:861:10d::254: ICMP6, echo request, seq 1, length 64

As expected, the situation is reversed when the same pings are run from the other host:

   root@an-worker1148-test:~# ping -c 1 2620:0:861:10d::1
   PING 2620:0:861:10d::1(2620:0:861:10d::1) 56 data bytes
   64 bytes from 2620:0:861:10d::1: icmp_seq=1 ttl=64 time=110 ms
   root@an-worker1148-test:~# ping -c 1 2620:0:861:10d::253
   PING 2620:0:861:10d::253(2620:0:861:10d::253) 56 data bytes
   64 bytes from 2620:0:861:10d::253: icmp_seq=1 ttl=64 time=11.3 ms
   root@an-worker1148-test:~# ping -c 1 2620:0:861:10d::254
   PING 2620:0:861:10d::254(2620:0:861:10d::254) 56 data bytes
   64 bytes from 2620:0:861:10d::254: icmp_seq=1 ttl=64 time=11.2 ms
   cmooney@lsw1-e1-eqiad> monitor traffic interface irb.1035 matching "icmp6" no-resolve    
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:40:55.072630  In IP6 2620:0:861:10d::11 > 2620:0:861:10d::253: ICMP6, echo request, seq 1, length 64
   cmooney@lsw1-f1-eqiad> monitor traffic interface irb.1035 matching "icmp6" no-resolve    
   verbose output suppressed, use <detail> or <extensive> for full protocol decode
   Address resolution is OFF.
   Listening on irb.1035, capture size 96 bytes
   
   12:40:49.343230  In IP6 2620:0:861:10d::11 > 2620:0:861:10d::1: ICMP6, echo request, seq 1, length 64
   12:40:49.433829 Out [|ip6]
   12:40:58.669189  In IP6 2620:0:861:10d::11 > 2620:0:861:10d::254: ICMP6, echo request, seq 1, length 64
   12:40:58.669212 Out [|ip6]

IPv6 Neighbors are as expected on both hosts:

   root@an-worker1147-test:~# ip -6 neigh show dev eno2np1.1035
   2620:0:861:10d::1 lladdr 00:00:5e:11:fa:ce router STALE
   2620:0:861:10d::253 lladdr a4:e1:1a:81:3a:80 router STALE
   2620:0:861:10d::254 lladdr a4:e1:1a:81:9e:80 router STALE
   root@an-worker1148-test:~# ip -6 neigh show 
   2620:0:861:10d::1 dev eno2np1 lladdr 00:00:5e:11:fa:ce router STALE
   2620:0:861:10d::253 dev eno2np1 lladdr a4:e1:1a:81:3a:80 router STALE
   2620:0:861:10d::254 dev eno2np1 lladdr a4:e1:1a:81:9e:80 router STALE

Inter-Vlan/subnet routing via IRB interfaces on same switch

Tests were carried out on lsw1-e1-eqiad, between the IRB interfaces associated with private1-e1-eqiad and analytics1-e1-eqiad (no filter in place for the test).

Relevant Config

private1-e1-eqiad / ms-fe1012-test:

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1031 
   description private1-e1-eqiad;
   family inet {
       address 10.64.130.1/24;
   }
   family inet6 {
       address 2620:0:861:109::1/64;
   }
   root@ms-fe1012-test:~# ip -br addr show dev enp101s0f0np0
   enp101s0f0np0    UP             10.64.130.10/24 2620:0:861:109::10/64 fe80::e63d:1aff:fe12:b100/64 
   root@ms-fe1012-test:~# ip route show | grep default
   default via 10.64.130.1 dev enp101s0f0np0 
   root@ms-fe1012-test:~# ip -6 route show | grep default
   default via 2620:0:861:109::1 dev enp101s0f0np0 metric 1024 pref medium


analytics1-e1-eqiad / an-worker1147-test:

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1039 
   description analytics1-e1-eqiad;
   family inet {
       address 10.64.138.1/24;
   }
   family inet6 {
       address 2620:0:861:100::1/64;
   }
   root@an-worker1147-test:/etc# ip -br addr show dev eno2np1
   eno2np1          UP             10.64.138.11/24 2620:0:861:100::11/64 fe80::e63d:1aff:fe54:1445/64 
   root@an-worker1147-test:/etc# ip route show | grep default
   default via 10.64.138.1 dev eno2np1 
   root@an-worker1147-test:/etc# ip -6 route show | grep default
   default via 2620:0:861:100::1 dev eno2np1 metric 1024 pref medium

IPv4

Results

   root@ms-fe1012-test:~# mtr --address 10.64.130.10 -b -w -c 5 10.64.138.11
   Start: 2022-02-10T17:07:22+0000
   HOST: ms-fe1012-test                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)  0.0%     5    6.2   6.0   2.3   8.2   2.2
     2.|-- 10.64.138.11                                      0.0%     5    0.3   0.2   0.2   0.3   0.0
   root@an-worker1147-test:~# mtr --address 10.64.138.11 -b -w -c 5 10.64.130.10
   Start: 2022-02-10T17:06:33+0000
   HOST: an-worker1147-test                               Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1039.lsw1-e1-eqiad.eqiad.wmnet (10.64.138.1)  0.0%     5    5.7   4.5   0.8   9.9   3.6
     2.|-- 10.64.130.10                                      0.0%     5    0.2   0.2   0.2   0.2   0.0


IPv6

Results

   root@ms-fe1012-test:~# mtr --address 2620:0:861:109::10 -b -w -c 5 2620:0:861:100::11
   Start: 2022-02-10T17:08:11+0000
   HOST: ms-fe1012-test                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:109::1)  0.0%     5    0.5   0.5   0.5   0.6   0.0
     2.|-- 2620:0:861:100::11                                      0.0%     5    0.2   0.2   0.2   0.2   0.0
   root@an-worker1147-test:~# mtr --address 2620:0:861:100::11 -b -w -c 5 2620:0:861:109::10
   Start: 2022-02-10T17:08:21+0000
   HOST: an-worker1147-test                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1039.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:100::1)  0.0%     5    0.6   0.5   0.5   0.6   0.0
     2.|-- 2620:0:861:109::10                                      0.0%     5    0.2   0.2   0.2   0.2   0.0
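
With both hosts on the same switch, each `mtr` report above shows exactly two hops: the local IRB interface, then the destination. A check of that shape could look roughly like the following sketch. The helper names `parse_hops` and `path_ok` are hypothetical, and the parsing assumes the `mtr -b -w` report layout seen in the outputs above.

```python
import re

# One hop line of an "mtr -b -w" report: hop index, host label, loss percentage.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(.+?)\s+(\d+(?:\.\d+)?)%")


def parse_hops(report):
    """Return a list of (hop_number, host_label, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def path_ok(report, expected_gateways, destination):
    """True when the hops are the expected gateways, then the destination, with no loss."""
    hops = parse_hops(report)
    if len(hops) != len(expected_gateways) + 1:
        return False
    if any(loss > 0.0 for _, _, loss in hops):
        return False
    for (_, label, _), gateway in zip(hops, expected_gateways):
        # With -b the label is "name (ip)"; match on the bracketed IP.
        if not label.endswith("(" + gateway + ")"):
            return False
    return hops[-1][1] == destination


# IPv4 report from ms-fe1012-test, copied from the results above.
SAMPLE = """\
Start: 2022-02-10T17:07:22+0000
HOST: ms-fe1012-test                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)  0.0%     5    6.2   6.0   2.3   8.2   2.2
  2.|-- 10.64.138.11                                      0.0%     5    0.3   0.2   0.2   0.3   0.0
"""

if __name__ == "__main__":
    print(path_ok(SAMPLE, ["10.64.130.1"], "10.64.138.11"))
```

For hosts attached to different switches, the list of expected gateways would simply contain two entries, one per IRB interface.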

Inter-Vlan/subnet routing via IRB interfaces on separate switches

Tests carried out between ms-fe1012-test, connected to private1-e1-eqiad on lsw1-e1-eqiad, and an-worker1148-test, connected to analytics1-f1-eqiad on lsw1-f1-eqiad.

Relevant Config

private1-e1-eqiad / ms-fe1012-test:

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1031 
   description private1-e1-eqiad;
   family inet {
       address 10.64.130.1/24;
   }
   family inet6 {
       address 2620:0:861:109::1/64;
   }
   root@ms-fe1012-test:~# ip -br addr show dev enp101s0f0np0
   enp101s0f0np0    UP             10.64.130.10/24 2620:0:861:109::10/64 fe80::e63d:1aff:fe12:b100/64 
   root@ms-fe1012-test:~# ip route show | grep default
   default via 10.64.130.1 dev enp101s0f0np0 
   root@ms-fe1012-test:~# ip -6 route show | grep default
   default via 2620:0:861:109::1 dev enp101s0f0np0 metric 1024 pref medium


analytics1-f1-eqiad / an-worker1148-test:

   cmooney@lsw1-f1-eqiad> show configuration interfaces irb.1043  
   description analytics1-f1-eqiad;
   family inet {
       address 10.64.142.1/24;
   }
   family inet6 {
       address 2620:0:861:114::1/64;
   }
   root@an-worker1148-test:~# ip -br addr show dev eno2np1.1043
   eno2np1.1043@eno2np1 UP             10.64.142.10/24 2620:0:861:114::10/64 fe80::e63d:1aff:fe54:aba7/64 
   root@an-worker1148-test:~# ip route show | grep 10.0.0.0
   10.0.0.0/8 via 10.64.142.1 dev eno2np1.1043 
   root@an-worker1148-test:~# ip -6 route show | grep default
   default via 2620:0:861:114::1 dev eno2np1.1043 metric 1024 pref medium

IPv4

Results

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 10.64.142.0 detail 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
   10.64.142.0/24 (1 entry, 1 announced)
           *EVPN   Preference: 170/-101
                   Next hop type: Indirect, Next hop index: 0
                   Address: 0xc686b6c
                   Next-hop reference count: 15
                   Next hop type: Router, Next hop index: 1765
                   Next hop: 10.64.129.7 via et-0/0/52.0, selected
                   Session Id: 0x0
                   Protocol next hop: 10.64.128.7
                   Composite next hop: 0xbd31530 1760 INH Session ID: 0x0
                     VXLAN tunnel rewrite:
                       MTU: 0, Flags: 0x0
                       Encap table ID: 0, Decap table ID: 10
                       Encap VNI: 3005000, Decap VNI: 3005000
                       Source VTEP: 10.64.128.3, Destination VTEP: 10.64.128.7
                       SMAC: a4:e1:1a:81:3a:80, DMAC: a4:e1:1a:81:9e:80
                   Indirect next hop: 0xc816e84 524299 INH Session ID: 0x0
                   State: <Active Int Ext>
                   Age: 2d 0:43:19 	Metric2: 8 
                   Validation State: unverified 
                   Task: PRODUCTION-EVPN-L3-context
                   Announcement bits (2): 2-KRT 4-BGP_RT_Background 
                   AS path: I 
                   Thread: junos-main 
   cmooney@lsw1-f1-eqiad> show route table PRODUCTION.inet.0 10.64.130.0 detail 
   
   PRODUCTION.inet.0: 34 destinations, 37 routes (34 active, 0 holddown, 0 hidden)
   10.64.130.0/24 (1 entry, 1 announced)
           *EVPN   Preference: 170/-101
                   Next hop type: Indirect, Next hop index: 0
                   Address: 0xc68733c
                   Next-hop reference count: 21
                   Next hop type: Router, Next hop index: 1748
                   Next hop: 10.64.129.6 via et-0/0/52.0, selected
                   Session Id: 0x0
                   Protocol next hop: 10.64.128.3
                   Composite next hop: 0xbd309a0 1751 INH Session ID: 0x0
                     VXLAN tunnel rewrite:
                       MTU: 0, Flags: 0x0
                       Encap table ID: 0, Decap table ID: 8
                       Encap VNI: 3005000, Decap VNI: 3005000
                       Source VTEP: 10.64.128.7, Destination VTEP: 10.64.128.3
                       SMAC: a4:e1:1a:81:9e:80, DMAC: a4:e1:1a:81:3a:80
                   Indirect next hop: 0xc815b04 524294 INH Session ID: 0x0
                   State: <Active Int Ext>
                   Age: 6d 22:20:29 	Metric2: 8 
                   Validation State: unverified 
                   Task: PRODUCTION-EVPN-L3-context
                   Announcement bits (2): 2-KRT 3-BGP_RT_Background 
                   AS path: I 
                   Thread: junos-main 
   root@ms-fe1012-test:~# mtr --address 10.64.130.10 -c 5 -b -w 10.64.142.10
   Start: 2022-02-10T17:18:41+0000
   HOST: ms-fe1012-test                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)  0.0%     5    3.1   4.7   1.4   7.6   2.6
     2.|-- irb-1043.lsw1-f1-eqiad.eqiad.wmnet (10.64.142.1)  0.0%     5    2.2   3.6   0.9   6.9   2.3
     3.|-- 10.64.142.10                                      0.0%     5    0.2   0.2   0.2   0.2   0.0
   root@an-worker1148-test:~# mtr --address 10.64.142.10 -c 5 -b -w 10.64.130.10 
   Start: 2022-02-10T16:58:30+0000
   HOST: an-worker1148-test                               Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1043.lsw1-f1-eqiad.eqiad.wmnet (10.64.142.1)  0.0%     5    3.7   6.8   3.7   8.4   1.9
     2.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)  0.0%     5    7.2   4.5   1.2   7.2   2.5
     3.|-- 10.64.130.10                                      0.0%     5    0.2   0.2   0.2   0.3   0.0
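
The two route lookups above describe the same VXLAN tunnel from opposite ends: the same VNI in both directions, with the VTEP addresses and router MACs swapped. A rough way to verify that symmetry from the CLI text is sketched below. The helper names `parse_rewrite` and `is_mirror` are hypothetical, and the field labels are taken from the "VXLAN tunnel rewrite" section of the outputs above.

```python
import re

# Field labels as they appear under "VXLAN tunnel rewrite:" in the route detail.
FIELD_RES = {
    "encap_vni": re.compile(r"Encap VNI:\s*(\d+)"),
    "decap_vni": re.compile(r"Decap VNI:\s*(\d+)"),
    "src_vtep": re.compile(r"Source VTEP:\s*([\d.]+)"),
    "dst_vtep": re.compile(r"Destination VTEP:\s*([\d.]+)"),
    "smac": re.compile(r"SMAC:\s*([0-9a-f:]{17})"),
    "dmac": re.compile(r"DMAC:\s*([0-9a-f:]{17})"),
}


def parse_rewrite(detail):
    """Extract the tunnel rewrite fields from `show route ... detail` text."""
    fields = {}
    for name, pattern in FIELD_RES.items():
        match = pattern.search(detail)
        if match is None:
            raise ValueError("field not found: " + name)
        fields[name] = match.group(1)
    return fields


def is_mirror(a, b):
    """True when tunnel `b` is the reverse direction of tunnel `a`."""
    return (
        a["encap_vni"] == b["decap_vni"]
        and a["decap_vni"] == b["encap_vni"]
        and a["src_vtep"] == b["dst_vtep"]
        and a["dst_vtep"] == b["src_vtep"]
        and a["smac"] == b["dmac"]
        and a["dmac"] == b["smac"]
    )


# Excerpts of the lsw1-e1-eqiad and lsw1-f1-eqiad outputs above.
E1_DETAIL = """\
Encap VNI: 3005000, Decap VNI: 3005000
Source VTEP: 10.64.128.3, Destination VTEP: 10.64.128.7
SMAC: a4:e1:1a:81:3a:80, DMAC: a4:e1:1a:81:9e:80
"""

F1_DETAIL = """\
Encap VNI: 3005000, Decap VNI: 3005000
Source VTEP: 10.64.128.7, Destination VTEP: 10.64.128.3
SMAC: a4:e1:1a:81:9e:80, DMAC: a4:e1:1a:81:3a:80
"""

if __name__ == "__main__":
    print(is_mirror(parse_rewrite(E1_DETAIL), parse_rewrite(F1_DETAIL)))
```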

IPv6

Results

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 2620:0:861:114:: detail 
   
   PRODUCTION.inet6.0: 40 destinations, 44 routes (40 active, 0 holddown, 0 hidden)
   2620:0:861:114::/64 (1 entry, 1 announced)
           *EVPN   Preference: 170/-101
                   Next hop type: Indirect, Next hop index: 0
                   Address: 0xc6862d4
                   Next-hop reference count: 13
                   Next hop type: Router, Next hop index: 1765
                   Next hop: 10.64.129.7 via et-0/0/52.0, selected
                   Session Id: 0x0
                   Protocol next hop: 10.64.128.7
                   Composite next hop: 0xbd314e0 1759 INH Session ID: 0x0
                     VXLAN tunnel rewrite:
                       MTU: 0, Flags: 0x0
                       Encap table ID: 0, Decap table ID: 10
                       Encap VNI: 3005000, Decap VNI: 3005000
                       Source VTEP: 10.64.128.3, Destination VTEP: 10.64.128.7
                       SMAC: a4:e1:1a:81:3a:80, DMAC: a4:e1:1a:81:9e:80
                   Indirect next hop: 0xc816d04 524298 INH Session ID: 0x0
                   State: <Active Int Ext>
                   Age: 2d 0:46:31 	Metric2: 8 
                   Validation State: unverified 
                   Task: PRODUCTION-EVPN-L3-context
                   Announcement bits (2): 2-KRT 4-BGP_RT_Background 
                   AS path: I 
                   Thread: junos-main 
   cmooney@lsw1-f1-eqiad> show route table PRODUCTION.inet6.0 2620:0:861:109:: detail 
   
   PRODUCTION.inet6.0: 37 destinations, 41 routes (37 active, 0 holddown, 0 hidden)
   2620:0:861:109::/64 (1 entry, 1 announced)
           *EVPN   Preference: 170/-101
                   Next hop type: Indirect, Next hop index: 0
                   Address: 0xc6865f4
                   Next-hop reference count: 19
                   Next hop type: Router, Next hop index: 1748
                   Next hop: 10.64.129.6 via et-0/0/52.0, selected
                   Session Id: 0x0
                   Protocol next hop: 10.64.128.3
                   Composite next hop: 0xbd30950 1750 INH Session ID: 0x0
                     VXLAN tunnel rewrite:
                       MTU: 0, Flags: 0x0
                       Encap table ID: 0, Decap table ID: 8
                       Encap VNI: 3005000, Decap VNI: 3005000
                       Source VTEP: 10.64.128.7, Destination VTEP: 10.64.128.3
                       SMAC: a4:e1:1a:81:9e:80, DMAC: a4:e1:1a:81:3a:80
                   Indirect next hop: 0xc815984 524293 INH Session ID: 0x0
                   State: <Active Int Ext>
                   Age: 6d 22:23:45 	Metric2: 8 
                   Validation State: unverified 
                   Task: PRODUCTION-EVPN-L3-context
                   Announcement bits (2): 2-KRT 3-BGP_RT_Background 
                   AS path: I 
                   Thread: junos-main 
   root@ms-fe1012-test:~# mtr --address 2620:0:861:109::10 -b -w -c 5 2620:0:861:114::10
   Start: 2022-02-10T17:18:04+0000
   HOST: ms-fe1012-test                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:109::1)  0.0%     5    0.5   0.5   0.5   0.5   0.0
     2.|-- irb-1043.lsw1-f1-eqiad.eqiad.wmnet (2620:0:861:114::1)  0.0%     5    0.5   0.5   0.4   0.6   0.1
     3.|-- 2620:0:861:114::10                                      0.0%     5    0.3   0.3   0.3   0.3   0.0
   root@an-worker1148-test:~# mtr --address 2620:0:861:114::10 -c 5 -b -w 2620:0:861:109::10
   Start: 2022-02-10T17:00:30+0000
   HOST: an-worker1148-test                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1043.lsw1-f1-eqiad.eqiad.wmnet (2620:0:861:114::1)  0.0%     5    0.7   9.0   0.5  42.6  18.8
     2.|-- irb-1031.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:109::1)  0.0%     5    0.6   0.5   0.5   0.6   0.0
     3.|-- 2620:0:861:109::10                                      0.0%     5    0.2   0.2   0.2   0.3   0.0

BGP Peering on Vlan segment to end device

These tests verify that end hosts can successfully create a BGP peering to the switch IRB interface of their connected subnet. This first set of tests validates the normal scenario where there is a per-rack subnet, and thus an IRB interface with only a single unicast IP.

Tests were done with ms-fe1012-test, connected to lsw1-e1-eqiad on Vlan1031 / private1-e1-eqiad. Public IP addresses (208.80.154.226/32 and 2620:0:861:ed1a::3/128) were allocated from the LVS public range, and the peering was defined as a standard LVS one in automation.

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group PyBal 
   type external;
   hold-time 30;
   import LVS_IMPORT;
   family inet {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   family inet6 {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   export NONE;
   peer-as 64600;
   neighbor 10.64.130.10 {
       description ms-fe1012-test;
   }

IPv4 only

   cmooney@lsw1-e1-eqiad> show bgp neighbor 10.64.130.10           
   
   Peer: 10.64.130.10+41482 AS 64600 Local: 10.64.130.1+179 AS 64810
     Description: ms-fe1012-test
     Group: PyBal                 Routing-Instance: PRODUCTION
     Forwarding routing-instance: PRODUCTION  
     Type: External    State: Established    Flags: <Sync>
     Last State: OpenConfirm   Last Event: RecvKeepAlive
     Last Error: None
     Export: [ NONE ] Import: [ LVS_IMPORT ]
     Options: <Preference HoldTime AddressFamily PeerAS PrefixLimit LocalAS Refresh>
     Options: <GracefulShutdownRcv>
     Address families configured: inet-unicast inet6-unicast
     Holdtime: 30 Preference: 170
     Graceful Shutdown Receiver local-preference: 0
     Local AS: 64810 Local System AS: 64810
     Number of flaps: 5
     Last flap event: RecvNotify
     Error: 'Cease' Sent: 0 Recv: 5
     Peer ID: 208.80.154.226  Local ID: 10.64.130.1       Active Holdtime: 30
     Keepalive Interval: 10         Group index: 10   Peer index: 0    SNMP index: 28    
     I/O Session Thread: bgpio-0 State: Enabled
     BFD: disabled, down
     Local Interface: irb.1031                         
     NLRI for restart configured on peer: inet-unicast inet6-unicast
     NLRI advertised by peer: inet-unicast
     NLRI for this session: inet-unicast
     Peer supports Refresh capability (2)
     Stale routes from peer are kept for: 300
     Peer does not support Restarter functionality
     NLRI that restart is negotiated for: inet-unicast
     NLRI of received end-of-rib markers: inet-unicast
     NLRI of all end-of-rib markers sent: inet-unicast
     Peer does not support LLGR Restarter or Receiver functionality
     Peer supports 4 byte AS extension (peer-as 64600)
     NLRI's for which peer can receive multiple paths: inet-unicast
     Table PRODUCTION.inet.0 Bit: 90001
       RIB State: BGP restart is complete
       RIB State: VPN restart is complete
       Send state: in sync
       Active prefixes:              1
       Received prefixes:            1
       Accepted prefixes:            1
       Suppressed due to damping:    0
       Advertised prefixes:          0
     Last traffic (seconds): Received 2    Sent 2    Checked 252 
     Input messages:  Total 29	Updates 2	Refreshes 0 	Octets 668
     Output messages: Total 28	Updates 0	Refreshes 0 	Octets 536
     Output Queue[8]: 0            (PRODUCTION.inet.0, inet-unicast)
   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.130.10 table PRODUCTION.inet.0 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.226/32       10.64.130.10         0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 10.64.130.1 count 100 rapid 208.80.154.226    
   PING 208.80.154.226 (208.80.154.226): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   --- 208.80.154.226 ping statistics ---
   100 packets transmitted, 100 packets received, 0% packet loss
   round-trip min/avg/max/stddev = 0.298/7.792/26.546/2.866 ms
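
The fields of interest in the `show bgp neighbor` output above are the session state and the per-table prefix counters. A minimal sketch of pulling those out of the text is shown below. The helper name `parse_neighbor` is hypothetical, and the parsing assumes the field layout shown in the output above rather than any documented, stable format.

```python
import re

# The peer's session state sits on the "Type: ... State: ..." line. Other lines
# also contain "State:" (Last State, RIB State), so anchor on "Type:".
PEER_STATE_RE = re.compile(r"Type:\s*\w+\s+State:\s*(\w+)")
TABLE_RE = re.compile(r"^\s*Table\s+(\S+)\s+Bit:")
COUNTER_RE = re.compile(r"^\s*(Active|Received|Accepted|Advertised) prefixes:\s*(\d+)")


def parse_neighbor(output):
    """Return {"state": str or None, "tables": {table_name: {counter: int}}}."""
    state_match = PEER_STATE_RE.search(output)
    summary = {
        "state": state_match.group(1) if state_match else None,
        "tables": {},
    }
    current = None
    for line in output.splitlines():
        table = TABLE_RE.match(line)
        if table:
            current = summary["tables"].setdefault(table.group(1), {})
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            current[counter.group(1).lower()] = int(counter.group(2))
    return summary


# Abbreviated copy of the neighbor output above.
SAMPLE = """\
Peer: 10.64.130.10+41482 AS 64600 Local: 10.64.130.1+179 AS 64810
  Type: External    State: Established    Flags: <Sync>
  Last State: OpenConfirm   Last Event: RecvKeepAlive
  Table PRODUCTION.inet.0 Bit: 90001
    RIB State: BGP restart is complete
    Send state: in sync
    Active prefixes:              1
    Received prefixes:            1
    Accepted prefixes:            1
    Suppressed due to damping:    0
    Advertised prefixes:          0
"""

if __name__ == "__main__":
    print(parse_neighbor(SAMPLE))
```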


IPv4 carrying IPv4 & IPv6 address families

   cmooney@lsw1-e1-eqiad> show bgp neighbor 10.64.130.10                                                                                  
   
   Peer: 10.64.130.10+41484 AS 64600 Local: 10.64.130.1+179 AS 64810
     Description: ms-fe1012-test
     Group: PyBal                 Routing-Instance: PRODUCTION
     Forwarding routing-instance: PRODUCTION  
     Type: External    State: Established    Flags: <Sync>
     Last State: OpenConfirm   Last Event: RecvKeepAlive
     Last Error: None
     Export: [ NONE ] Import: [ LVS_IMPORT ]
     Options: <Preference HoldTime AddressFamily PeerAS PrefixLimit LocalAS Refresh>
     Options: <GracefulShutdownRcv>
     Address families configured: inet-unicast inet6-unicast
     Holdtime: 30 Preference: 170
     Graceful Shutdown Receiver local-preference: 0
     Local AS: 64810 Local System AS: 64810
     Number of flaps: 6
     Last flap event: RecvNotify
     Error: 'Cease' Sent: 0 Recv: 6
     Peer ID: 208.80.154.226  Local ID: 10.64.130.1       Active Holdtime: 30
     Keepalive Interval: 10         Group index: 10   Peer index: 0    SNMP index: 28    
     I/O Session Thread: bgpio-0 State: Enabled
     BFD: disabled, down
     Local Interface: irb.1031                         
     NLRI for restart configured on peer: inet-unicast inet6-unicast
     NLRI advertised by peer: inet-unicast inet6-unicast
     NLRI for this session: inet-unicast inet6-unicast
     Peer supports Refresh capability (2)
     Stale routes from peer are kept for: 300
     Peer does not support Restarter functionality
     NLRI that restart is negotiated for: inet-unicast inet6-unicast
     NLRI of received end-of-rib markers: inet-unicast inet6-unicast
     NLRI of all end-of-rib markers sent: inet-unicast inet6-unicast
     Peer does not support LLGR Restarter or Receiver functionality
     Peer supports 4 byte AS extension (peer-as 64600)
     NLRI's for which peer can receive multiple paths: inet-unicast inet6-unicast
     Table PRODUCTION.inet.0 Bit: 90001
       RIB State: BGP restart is complete
       RIB State: VPN restart is complete
       Send state: in sync
       Active prefixes:              1
       Received prefixes:            1
       Accepted prefixes:            1
       Suppressed due to damping:    0
       Advertised prefixes:          0
     Table PRODUCTION.inet6.0 Bit: a0001
       RIB State: BGP restart is complete
       RIB State: VPN restart is complete
       Send state: in sync
       Active prefixes:              0
       Received prefixes:            0
       Accepted prefixes:            0
       Suppressed due to damping:    0     
       Advertised prefixes:          0
     Last traffic (seconds): Received 3    Sent 4    Checked 13  
     Input messages:  Total 6	Updates 3 	Refreshes 0 	Octets 253
     Output messages: Total 4	Updates 0 	Refreshes 0 	Octets 91
     Output Queue[8]: 0            (PRODUCTION.inet.0, inet-unicast)
     Output Queue[9]: 0            (PRODUCTION.inet6.0, inet6-unicast)


   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.130.10 table PRODUCTION.inet.0            
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.226/32       10.64.130.10         0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 10.64.130.1 count 100 rapid 208.80.154.226    
   PING 208.80.154.226 (208.80.154.226): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.130.10 table PRODUCTION.inet6.0           
   
   PRODUCTION.inet6.0: 40 destinations, 44 routes (40 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::3/128
   *                         2620:0:861:109::10   0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 2620:0:861:109::1 count 100 rapid 2620:0:861:ed1a::3 
   PING6(56=40+8+8 bytes) 2620:0:861:109::1 --> 2620:0:861:ed1a::3
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


IPv4 & IPv6 each carrying their own address family

These tests were instead carried out from an-worker1147-test to lsw1-e1-eqiad. Two BGP peerings are defined towards the host, one in the 'Kubernetes4' group and the other in 'Kubernetes6'. Peering was within Vlan1039 / analytics1-e1-eqiad for both address families.

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1039 
   description analytics1-e1-eqiad;
   family inet {
       address 10.64.138.1/24;
   }
   family inet6 {
       address 2620:0:861:100::1/64;
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group Kubernetes4
   type external;
   multihop {
       ttl 2;
   }
   hold-time 30;
   import kubernetes_import;
   family inet {
       unicast {
           prefix-limit {
               maximum 2000;
               teardown 80;
           }
       }
   }
   export NONE;
   peer-as 64601;
   multipath;
   neighbor 10.64.138.11 {
       description an-worker1147-test;
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group Kubernetes6   
   type external;
   multihop {
       ttl 2;
   }
   hold-time 30;
   import kubernetes_import;
   family inet6 {
       unicast {
           prefix-limit {
               maximum 2000;
               teardown 80;
           }
       }
   }
   export NONE;
   peer-as 64601;
   multipath;
   neighbor 2620:0:861:100::11 {
       description an-worker1147-test;
   }

Results

   cmooney@lsw1-e1-eqiad> show bgp neighbor 10.64.138.11  
   
   Peer: 10.64.138.11+60646 AS 64601 Local: 10.64.138.1+179 AS 64810
     Description: an-worker1147-test
     Group: Kubernetes4           Routing-Instance: PRODUCTION
     Forwarding routing-instance: PRODUCTION  
     Type: External    State: Established    Flags: <Sync>
     Last State: OpenConfirm   Last Event: Refresh
     Last Error: None
     Export: [ NONE ] Import: [ kubernetes_import ]
     Options: <Multihop Preference HoldTime Ttl AddressFamily PeerAS Multipath PrefixLimit LocalAS Refresh>
     Options: <GracefulShutdownRcv>
     Address families configured: inet-unicast
     Holdtime: 30 Preference: 170
     Graceful Shutdown Receiver local-preference: 0
     Local AS: 64810 Local System AS: 64810
     Number of flaps: 0
     Peer ID: 208.80.154.228  Local ID: 10.64.130.1       Active Holdtime: 30
     Keepalive Interval: 10         Group index: 12   Peer index: 0    SNMP index: 29    
     I/O Session Thread: bgpio-0 State: Enabled
     BFD: disabled, down
     NLRI for restart configured on peer: inet-unicast
     NLRI advertised by peer: inet-unicast
     NLRI for this session: inet-unicast
     Peer supports Refresh capability (2)
     Stale routes from peer are kept for: 300
     Peer does not support Restarter functionality
     NLRI that restart is negotiated for: inet-unicast
     NLRI of received end-of-rib markers: inet-unicast
     NLRI of all end-of-rib markers sent: inet-unicast
     Peer does not support LLGR Restarter or Receiver functionality
     Peer supports 4 byte AS extension (peer-as 64601)
     NLRI's for which peer can receive multiple paths: inet-unicast
     Table PRODUCTION.inet.0 Bit: 90002
       RIB State: BGP restart is complete
       RIB State: VPN restart is complete
       Send state: in sync
       Active prefixes:              0
       Received prefixes:            1
       Accepted prefixes:            0
       Suppressed due to damping:    0
       Advertised prefixes:          0
     Last traffic (seconds): Received 5    Sent 6    Checked 615 
     Input messages:  Total 69	Updates 5	Refreshes 1 	Octets 1520
     Output messages: Total 69	Updates 0	Refreshes 0 	Octets 1315
     Output Queue[8]: 0            (PRODUCTION.inet.0, inet-unicast)
   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.138.11 table PRODUCTION.inet.0                  
   
   PRODUCTION.inet.0: 35 destinations, 39 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     208.80.154.228/32       10.64.138.11         0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 10.64.138.1 count 100 rapid 208.80.154.228     
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


   cmooney@lsw1-e1-eqiad> show bgp neighbor 2620:0:861:100::11 
   
   Peer: 2620:0:861:100::11+41120 AS 64601 Local: 2620:0:861:100::1+179 AS 64810
     Description: an-worker1147-test
     Group: Kubernetes6           Routing-Instance: PRODUCTION
     Forwarding routing-instance: PRODUCTION  
     Type: External    State: Established    Flags: <Sync>
     Last State: OpenConfirm   Last Event: Refresh
     Last Error: Open Message Error
     Export: [ NONE ] Import: [ kubernetes_import ]
     Options: <Multihop Preference HoldTime Ttl AddressFamily PeerAS Multipath PrefixLimit LocalAS Refresh>
     Options: <GracefulShutdownRcv>
     Address families configured: inet6-unicast
     Holdtime: 30 Preference: 170
     Graceful Shutdown Receiver local-preference: 0
     Local AS: 64810 Local System AS: 64810
     Number of flaps: 0
     Error: 'Open Message Error' Sent: 4 Recv: 0
     Peer ID: 208.80.154.228  Local ID: 10.64.130.1       Active Holdtime: 30
     Keepalive Interval: 10         Group index: 13   Peer index: 0    SNMP index: 30    
     I/O Session Thread: bgpio-0 State: Enabled
     BFD: disabled, down
     NLRI for restart configured on peer: inet6-unicast
     NLRI advertised by peer: inet6-unicast
     NLRI for this session: inet6-unicast
     Peer supports Refresh capability (2)
     Stale routes from peer are kept for: 300
     Peer does not support Restarter functionality
     NLRI that restart is negotiated for: inet6-unicast
     NLRI of received end-of-rib markers: inet6-unicast
     NLRI of all end-of-rib markers sent: inet6-unicast
     Peer does not support LLGR Restarter or Receiver functionality
     Peer supports 4 byte AS extension (peer-as 64601)
     NLRI's for which peer can receive multiple paths: inet6-unicast
     Table PRODUCTION.inet6.0 Bit: a0002
       RIB State: BGP restart is complete
       RIB State: VPN restart is complete
       Send state: in sync
       Active prefixes:              0
       Received prefixes:            1
       Accepted prefixes:            0
       Suppressed due to damping:    0
       Advertised prefixes:          0
     Last traffic (seconds): Received 8    Sent 1    Checked 348 
     Input messages:  Total 43	Updates 6	Refreshes 1 	Octets 1256
     Output messages: Total 39	Updates 0	Refreshes 0 	Octets 752
     Output Queue[9]: 0            (PRODUCTION.inet6.0, inet6-unicast)
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:100::11                                                                                                               
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::4/128
   *                         2620:0:861:100::11   0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 2620:0:861:100::1 count 100 rapid 2620:0:861:ed1a::4   
   PING6(56=40+8+8 bytes) 2620:0:861:100::1 --> 2620:0:861:ed1a::4
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

BGP Peering on Vlan segment from Anycast GW IP

BGP peering for these tests was carried out from an-worker1147-test, connected to lsw1-e1-eqiad. Vlan private1-f1-eqiad was extended to this switch, and an Anycast GW configuration was added to the IRB interface to support this.

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration interfaces irb.1035    
   virtual-gateway-accept-data;
   description private1-f1-eqiad;
   family inet {
       address 10.64.134.253/24 {
           preferred;
           virtual-gateway-address 10.64.134.1;
       }
   }
   family inet6 {
       address 2620:0:861:10d::253/64 {
           preferred;
           virtual-gateway-address 2620:0:861:10d::1;
       }
   }
   virtual-gateway-v4-mac 00:00:5e:11:fa:ce;
   virtual-gateway-v6-mac 00:00:5e:11:fa:ce;
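
The address plan in this configuration can be sanity-checked offline. The following is a minimal Python sketch (not part of the original test procedure) that uses the standard ipaddress module to confirm each virtual gateway address lies in the same subnet as the unique IRB address and is a different address. The function name check_anycast_gw is hypothetical; the addresses are copied from the irb.1035 configuration above.

```python
import ipaddress


def check_anycast_gw(unique_addr: str, virtual_gw: str) -> bool:
    """Return True if virtual_gw is inside unique_addr's subnet and differs from it."""
    iface = ipaddress.ip_interface(unique_addr)
    gw = ipaddress.ip_address(virtual_gw)
    return gw.version == iface.version and gw in iface.network and gw != iface.ip


# (unique IRB address, virtual gateway address) as configured on irb.1035.
pairs = [
    ("10.64.134.253/24", "10.64.134.1"),
    ("2620:0:861:10d::253/64", "2620:0:861:10d::1"),
]
results = [check_anycast_gw(unique, gw) for unique, gw in pairs]
print(results)
```

A check like this only validates the addressing. It says nothing about whether the switch answers for the virtual address, which is what the tests below establish.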


IPv4 only

Peering was done by adding the IP address of an-worker1147-test on Vlan1035 to the "Anycast4" BGP group:

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group Anycast4 
   type external;
   multihop {
       ttl 193;
   }
   damping;
   import ANYCAST_IMPORT;
   family inet {
       unicast {
           prefix-limit {
               maximum 50;
               teardown 80;
           }
       }
   }
   export NONE;
   peer-as 64605;
   multipath;
   bfd-liveness-detection {
       minimum-interval 300;
   }
   neighbor 10.64.134.72 {
       description an-worker1147-test;
   }

A connection was established from the an-worker1147-test IP on Vlan1035 to the Anycast IP 10.64.134.1 (using the FRR BGP stack), and the host's public IP was announced:

   an-worker1147-test# show bgp ipv4 unicast neighbors 10.64.134.1 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64605
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?


Route is accepted by the top of rack and public IP is pingable:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet.0                  
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64605 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION rapid count 100 source 10.64.134.1 208.80.154.228
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-14T17:18:04+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   14.5  15.0  11.9  20.5   2.8
     2. AS6830   109.255.255.254                                                   0.0%    10   11.3  16.4   7.6  55.0  13.8
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   10.1  15.9  10.1  30.4   6.0
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   10.6  15.4   9.0  45.5  10.8
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   19.1  13.3   7.7  20.4   4.2
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   20.3  21.5  15.7  26.3   3.0
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10   94.7  97.3  94.4 100.2   2.4
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10   99.2 101.2  97.1 105.5   3.1
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10  101.7 103.3 100.6 108.5   2.5
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10  101.4  99.7  95.6 103.2   2.4
    11. AS14907  ae0.cr1-eqiad.wikimedia.org (208.80.154.193)                      0.0%    10  105.5 103.1  97.0 110.5   5.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10  101.4  99.4  94.8 104.6   3.0
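
The rapid ping results in these tests are read by eye: in rapid mode Junos prints "!" for each reply and "." for each probe that timed out. As a minimal sketch, assuming the result line has been captured as a string, the same check could be done in Python. The function name rapid_ping_summary is hypothetical and is not part of the original test procedure.

```python
def rapid_ping_summary(result_line: str) -> dict:
    """Summarise one Junos 'ping ... rapid' result line.

    In rapid mode Junos prints '!' for each reply received and '.' for each
    probe that timed out.
    """
    received = result_line.count("!")
    lost = result_line.count(".")
    sent = received + lost
    if sent == 0:
        raise ValueError("no probe results found in line")
    return {"sent": sent, "received": received, "loss_pct": 100.0 * lost / sent}


# Result line as seen in the test above: 100 probes, all answered.
summary = rapid_ping_summary("!" * 100)
print(summary)
```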


IPv4 carrying IPv4 & IPv6 address families

For this test, the IP of an-worker1147-test on Vlan1035 was placed into the 'PyBal' group in the overlay BGP config, so the switch announces the capability for both the IPv4 and IPv6 address families over the single IPv4 peering.

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group PyBal 
   type external;
   hold-time 30;
   import LVS_IMPORT;
   family inet {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   family inet6 {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   export NONE;
   peer-as 64600;
   neighbor 10.64.134.72 {
       description an-worker1147-test;
   }


Host was configured to send both types of route over the session:

   an-worker1147-test# show bgp ipv4 neighbors 10.64.134.1 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64600
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?
   an-worker1147-test# show bgp ipv6 neighbors 10.64.134.1 advertised-routes 
   BGP table version is 3, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64600
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2620:0:861:ed1a::4/128
                       ::                       0         32768 ?


IPv4 received and works:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet.0    
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 208.80.154.228 rapid count 100 source 10.64.134.253              
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-14T17:42:02+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   13.5  16.0  10.3  26.0   4.8
     2. AS6830   109.255.255.254                                                   0.0%    10   17.2  13.3   9.8  17.2   2.4
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   11.6  12.3   8.9  17.1   2.9
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   28.3  13.4   9.0  28.3   5.7
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   13.2  11.8   9.1  14.3   1.8
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   22.4  23.3  19.0  33.0   3.7
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10   96.4  97.9  92.7 105.9   4.0
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10  100.2 102.6  93.4 125.7   9.2
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10   98.5 102.6  97.1 106.7   3.1
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10  100.0 101.5  98.3 105.5   2.8
    11. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10   98.0  98.5  94.9 106.3   3.5


IPv6 also received and works:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet6.0   
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::4/128
   *                         2620:0:861:10d::72   0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 2620:0:861:ed1a::4 rapid count 100 source 2620:0:861:10d::253 
   PING6(56=40+8+8 bytes) 2620:0:861:10d::253 --> 2620:0:861:ed1a::4
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -b -w -z -c 10 2620:0:861:ed1a::4
   Start: 2022-02-14T17:39:50+0000
   HOST: nbgw                                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6939   tunnel650354.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:32c::1)  10.0%    10   23.9  28.5  22.1  43.7   7.8
     2. AS6939   e0-19.core2.lon2.he.net (2001:470:0:67::1)                           0.0%    10   35.8  41.3  23.8 110.6  28.7
     3. AS6939   100ge11-2.core1.lon2.he.net (2001:470:0:541::1)                      0.0%    10   24.7  29.0  20.1  61.5  14.2
     4. AS6939   100ge4-1.core1.nyc4.he.net (2001:470:0:2cf::2)                       0.0%    10  103.7  88.7  81.4 103.7   6.1
     5. AS6939   100ge11-1.core1.nyc5.he.net (2001:470:0:20a::2)                      0.0%    10   90.4  93.2  84.8 138.5  16.2
     6. AS???    ???                                                                 100.0    10    0.0   0.0   0.0   0.0   0.0
     7. AS6939   100ge1-2.core1.ash1.he.net (2001:470:0:277::1)                       0.0%    10  108.1 114.2  94.7 188.9  26.9
     8. AS6939   xe-5-3-3-500.cr1-eqiad.wikimedia.org (2001:470:0:1c0::2)             0.0%    10   96.3 101.0  90.3 130.5  14.0
     9. AS14907  et-0-0-48-100.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:fe07::2)        10.0%    10   94.6 101.7  92.6 155.8  20.3
    10. AS14907  2620:0:861:ed1a::4                                                   0.0%    10  110.5 103.3  89.6 140.9  16.9
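
Because both address families arrive over the one IPv4 session, confirming the result means checking two routing tables for the same neighbor. The following is a minimal Python sketch of that check, assuming the CLI output has been captured as text. The function name received_prefixes is hypothetical; the sample output is copied from the IPv4 and IPv6 checks above.

```python
import ipaddress


def received_prefixes(cli_output: str) -> list:
    """Extract every IP prefix found in 'show route receive-protocol bgp' output."""
    prefixes = []
    for token in cli_output.split():
        if "/" not in token:
            continue
        try:
            prefixes.append(ipaddress.ip_network(token))
        except ValueError:
            # Token contains '/' but is not a valid prefix; ignore it.
            continue
    return prefixes


# Same BGP neighbor (10.64.134.72), two tables.
inet_output = """
PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 208.80.154.228/32       10.64.134.72         0                  64600 ?
"""
inet6_output = """
PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  2620:0:861:ed1a::4/128
*                         2620:0:861:10d::72   0                  64600 ?
"""

expected = {
    ipaddress.ip_network("208.80.154.228/32"),
    ipaddress.ip_network("2620:0:861:ed1a::4/128"),
}
found = set(received_prefixes(inet_output)) | set(received_prefixes(inet6_output))
print(found == expected)
```

The parser deliberately ignores next hops and AS paths; it only answers whether the expected prefixes were received from the peer.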


IPv4 & IPv6 each carrying their own address family

For the final test, separate BGP peering sessions to an-worker1147-test were established from lsw1-e1-eqiad over both IPv4 and IPv6, with each session exchanging routes for a single address family only.

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group Kubernetes4  
   type external;
   multihop {
       ttl 2;
   }
   hold-time 30;
   import kubernetes_import;
   family inet {
       unicast {
           prefix-limit {
               maximum 2000;
               teardown 80;
           }
       }
   }
   export NONE;
   peer-as 64601;
   multipath;
   neighbor 10.64.134.72 {
       description an-worker1147-test;
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group Kubernetes6    
   type external;
   multihop {
       ttl 2;
   }
   hold-time 30;
   import kubernetes_import;
   family inet6 {
       unicast {
           prefix-limit {
               maximum 2000;
               teardown 80;
           }
       }
   }
   export NONE;
   peer-as 64601;
   multipath;
   neighbor 2620:0:861:10d::72 {
       description an-worker1147-test;
   }

IPv4 results:

   an-worker1147-test# show bgp ipv4 neighbors 10.64.134.1 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64601
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.64.134.72 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 208.80.154.228 rapid count 100 source 10.64.134.253    
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-14T18:10:25+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   18.3  15.4  10.1  20.6   3.8
     2. AS6830   109.255.255.254                                                   0.0%    10   16.0  18.2   8.5  50.2  12.2
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   15.2  19.1  10.4  51.6  12.3
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   12.9  17.8   8.5  53.9  13.0
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   15.9  14.6   9.5  19.5   3.0
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   23.3  28.3  20.1  53.7  10.0
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10  106.9  98.9  94.2 106.9   4.2
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10  112.7 105.3  98.4 120.2   6.9
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10  104.5 106.9 101.0 125.8   7.6
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10  111.7 106.2  96.1 112.7   6.0
    11. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10  104.1 106.2  93.8 125.3  11.0


IPv6 routes:

   an-worker1147-test# show bgp ipv6 neighbors 2620:0:861:10d::1 advertised-routes 
   BGP table version is 3, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64601
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2620:0:861:ed1a::4/128
                       ::                       0         32768 ?
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:10d::72   
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::4/128
   *                         2620:0:861:10d::72   0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 2620:0:861:ed1a::4 rapid count 100 source 2620:0:861:10d::253    
   PING6(56=40+8+8 bytes) 2620:0:861:10d::253 --> 2620:0:861:ed1a::4
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -b -w -z -c 10 2620:0:861:ed1a::4
   Start: 2022-02-14T18:11:17+0000
   HOST: nbgw                                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6939   tunnel650354.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:32c::1)  10.0%    10   25.2  30.0  22.3  41.5   7.8
     2. AS6939   e0-19.core2.lon2.he.net (2001:470:0:67::1)                          10.0%    10  124.9  43.1  23.1 124.9  32.7
     3. AS6939   100ge11-2.core1.lon2.he.net (2001:470:0:541::1)                      0.0%    10   36.2  31.4  20.5  59.4  11.4
     4. AS6939   100ge4-1.core1.nyc4.he.net (2001:470:0:2cf::2)                       0.0%    10   92.6  97.4  85.0 147.7  18.6
     5. AS6939   100ge11-1.core1.nyc5.he.net (2001:470:0:20a::2)                      0.0%    10   93.9  98.6  85.1 142.3  16.6
     6. AS???    ???                                                                 100.0    10    0.0   0.0   0.0   0.0   0.0
     7. AS6939   100ge1-2.core1.ash1.he.net (2001:470:0:277::1)                       0.0%    10   94.1 109.7  93.2 193.9  30.8
     8. AS6939   xe-5-3-3-500.cr1-eqiad.wikimedia.org (2001:470:0:1c0::2)             0.0%    10  116.2 100.1  90.1 130.0  12.8
     9. AS14907  et-0-0-48-100.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:fe07::2)         0.0%    10  109.6 101.6  92.6 126.9  10.4
    10. AS14907  2620:0:861:ed1a::4                                                   0.0%    10   93.6 100.2  84.7 134.9  15.9
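
When reading the mtr reports, the figure that matters for reachability is the loss at the final hop. Loss shown only at intermediate hops is commonly caused by routers rate-limiting their ICMP replies rather than by dropped transit traffic. The following is a minimal Python sketch, assuming report lines in the "mtr -z -b -w" layout used here; the function name parse_mtr_hop is hypothetical.

```python
def parse_mtr_hop(line: str) -> dict:
    """Parse one hop line from an 'mtr -z -b -w' report.

    The last seven whitespace-separated fields are
    Loss%, Snt, Last, Avg, Best, Wrst and StDev.
    """
    fields = line.split()
    loss, snt, _last, avg, _best, _wrst, _stdev = fields[-7:]
    return {
        "hop": int(fields[0].rstrip(".")),
        "asn": fields[1],
        "host": " ".join(fields[2:-7]),
        "loss_pct": float(loss.rstrip("%")),
        "sent": int(snt),
        "avg_ms": float(avg),
    }


# Three hop lines copied from the IPv6 report above.
report = [
    "  1. AS6939   tunnel650354.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:32c::1)  10.0%    10   25.2  30.0  22.3  41.5   7.8",
    "  6. AS???    ???                                                                 100.0    10    0.0   0.0   0.0   0.0   0.0",
    " 10. AS14907  2620:0:861:ed1a::4                                                   0.0%    10   93.6 100.2  84.7 134.9  15.9",
]
hops = [parse_mtr_hop(line) for line in report]
final_hop = hops[-1]
print(final_hop["host"], final_hop["loss_pct"])
```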
   

NOTE: This proves we can peer from the Anycast IP, but the switches will normally try to form the adjacency from their individual IP on the subnet. We can control this with "update source" on the switch if we ever need to (on Junos the equivalent BGP statement is local-address). For the purpose of this test, the only requirement was to confirm that it will work if we need to do it.

BGP Peering on Vlan segment with Anycast GW, from Unicast IP

The purpose of this test is to validate that the QFX will allow a device connected via an access or trunk port to form a BGP adjacency to the switch's unique IRB interface IP, on an interface that also has an anycast GW configured.

Typically we want to peer directly with the GW. But if we ever run MC-LAG between two devices this would be problematic, as the two switches would share this IP. An adjacency would form, but only to a single device, and which one would be effectively random. So if we ever have that scenario we instead want to peer to each switch separately, using their unique IPs on that subnet rather than the shared VIP.
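
The peering rule described here can be written down as a small decision function. The following is a minimal Python sketch, not an existing tool. The function name bgp_peer_targets, the second switch "switch-b" and its address 10.64.134.254 are all hypothetical and used only for illustration; only lsw1-e1-eqiad, 10.64.134.253 and the VIP 10.64.134.1 come from the tested setup.

```python
from typing import Dict, List


def bgp_peer_targets(vip: str, unique_ips: Dict[str, str]) -> List[str]:
    """Choose which gateway addresses a host should peer with.

    unique_ips maps switch name -> that switch's own IRB address on the subnet.
    With a single switch the shared VIP is unambiguous. When several switches
    share the VIP, a session to it would land on only one of them, so peer
    with every unique address instead.
    """
    if len(unique_ips) <= 1:
        return [vip]
    return [unique_ips[name] for name in sorted(unique_ips)]


# Tested setup: one switch answers for the gateway VIP.
single = bgp_peer_targets("10.64.134.1", {"lsw1-e1-eqiad": "10.64.134.253"})

# HYPOTHETICAL MC-LAG pair: "switch-b" and 10.64.134.254 are invented for
# illustration and do not exist in the tested setup.
pair = bgp_peer_targets(
    "10.64.134.1",
    {"lsw1-e1-eqiad": "10.64.134.253", "switch-b": "10.64.134.254"},
)
print(single, pair)
```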

Relevant Config

The 3 scenarios are identical to those in the previous set of tests (peering to the anycast VIP). No changes were made on the QFX side; only the IP address configured as the BGP peer was adjusted on the test hosts in each case.

IPv4 Only

   an-worker1147-test# show bgp ipv4 unicast neighbors 10.64.134.253 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64605
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?


Route is accepted by the top of rack and public IP is pingable:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet.0                  
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64605 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION rapid count 100 source 10.64.134.1 208.80.154.228
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-15T17:31:07+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   14.2  15.0  11.2  20.5   2.8
     2. AS6830   109.255.255.254                                                   0.0%    10   11.1  16.4   7.6  55.0  13.8
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   10.1  15.9  10.1  30.4   6.0
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   10.6  15.4   9.0  45.5  10.8
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   19.0  13.3   7.7  20.4   4.2
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   20.3  21.5  15.7  26.3   3.0
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10   94.1  97.3  94.4 100.2   2.4
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10   99.2 101.2  97.1 105.5   3.1
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10  101.7 103.3 100.6 108.5   2.5
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10  101.4  99.7  95.6 103.2   2.4
    11. AS14907  ae0.cr1-eqiad.wikimedia.org (208.80.154.193)                      0.0%    10  105.5 103.1  97.0 110.5   5.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10  101.4  99.4  94.8 104.6   3.0


IPv4 carrying IPv4 & IPv6 address families

Host was configured to send both types of route over the session:

   an-worker1147-test# show bgp ipv4 neighbors 10.64.134.253 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64600
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?
   an-worker1147-test# show bgp ipv6 neighbors 10.64.134.253 advertised-routes 
   BGP table version is 3, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64600
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2620:0:861:ed1a::4/128
                       ::                       0         32768 ?


IPv4 received and works:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet.0    
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 208.80.154.228 rapid count 100 source 10.64.134.253              
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-14T18:05:01+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   12.5  15.0  10.3  26.0   4.8
     2. AS6830   109.255.255.254                                                   0.0%    10   17.2  13.3   9.8  17.2   2.4
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   11.6  12.3   8.9  17.1   2.9
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   28.3  13.4   9.0  28.3   5.7
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   13.2  11.8   9.1  14.3   1.8
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   22.4  23.3  19.0  33.0   3.7
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10   96.4  97.8  92.7 105.9   4.0
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10  100.2 102.6  93.4 125.7   9.2
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10   98.5 102.6  97.1 106.7   3.1
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10  100.0 101.5  98.3 105.5   2.8
    11. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10   98.0  98.5  94.9 106.3   3.5


IPv6 also received and works:

   cmooney@lsw1-e1-eqiad> show route receive-protocol bgp 10.64.134.72 table PRODUCTION.inet6.0   
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::4/128
   *                         2620:0:861:10d::72   0                  64600 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION 2620:0:861:ed1a::4 rapid count 100 source 2620:0:861:10d::253 
   PING6(56=40+8+8 bytes) 2620:0:861:10d::253 --> 2620:0:861:ed1a::4
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -b -w -z -c 10 2620:0:861:ed1a::4
   Start: 2022-02-14T17:39:50+0000
   HOST: nbgw                                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6939   tunnel650354.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:32c::1)  10.0%    10   23.9  28.5  22.1  43.7   7.8
     2. AS6939   e0-19.core2.lon2.he.net (2001:470:0:67::1)                           0.0%    10   35.8  41.3  23.8 110.6  28.7
     3. AS6939   100ge11-2.core1.lon2.he.net (2001:470:0:541::1)                      0.0%    10   24.7  29.0  20.1  61.5  14.2
     4. AS6939   100ge4-1.core1.nyc4.he.net (2001:470:0:2cf::2)                       0.0%    10  103.7  88.7  81.4 103.7   6.1
     5. AS6939   100ge11-1.core1.nyc5.he.net (2001:470:0:20a::2)                      0.0%    10   90.4  93.2  84.8 138.5  16.2
     6. AS???    ???                                                                 100.0    10    0.0   0.0   0.0   0.0   0.0
     7. AS6939   100ge1-2.core1.ash1.he.net (2001:470:0:277::1)                       0.0%    10  108.1 114.2  94.7 188.9  26.9
     8. AS6939   xe-5-3-3-500.cr1-eqiad.wikimedia.org (2001:470:0:1c0::2)             0.0%    10   96.3 101.0  90.3 130.5  14.0
     9. AS14907  et-0-0-48-100.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:fe07::2)        10.0%    10   94.6 101.7  92.6 155.8  20.3
    10. AS14907  2620:0:861:ed1a::4                                                   0.0%    10  110.5 103.3  89.6 140.9  16.9


IPv4 & IPv6 each carrying their own address family

IPv4 session up and route working:

   an-worker1147-test# show bgp ipv4 neighbors 10.64.134.253 advertised-routes 
   BGP table version is 4, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64601
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 208.80.154.228/32
                       0.0.0.0                  0         32768 ?
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.64.134.72                                                                                                                      
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.228/32       10.64.134.72         0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 10.64.134.1 rapid count 100 208.80.154.228     
   PING 208.80.154.228 (208.80.154.228): 56 data bytes
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -z -b -w -c 10 208.80.154.228
   Start: 2022-02-15T17:19:44+0000
   HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6830   176.61.34.1                                                       0.0%    10   12.7  12.5   8.4  19.3   3.5
     2. AS6830   109.255.255.254                                                   0.0%    10   11.5  15.2   9.5  32.9   6.9
     3. AS6830   ie-dub01a-rc1-ae-31-0.aorta.net (84.116.238.42)                   0.0%    10   13.5  14.6  10.6  20.4   3.1
     4. AS6830   ie-dub02a-ri1-ae-73-0.aorta.net (84.116.134.110)                  0.0%    10   13.6  13.8   8.8  18.2   3.1
     5. AS1299   dln-b2-link.ip.twelve99.net (62.115.172.136)                      0.0%    10   11.3  13.7  10.3  17.7   2.6
     6. AS1299   ldn-bb1-link.ip.twelve99.net (62.115.120.100)                     0.0%    10   21.3  24.5  19.2  31.8   4.0
     7. AS1299   nyk-bb2-link.ip.twelve99.net (62.115.113.20)                      0.0%    10   99.5  96.3  92.8 101.2   2.8
     8. AS1299   ash-bb2-link.ip.twelve99.net (62.115.136.201)                     0.0%    10  100.0 100.6  97.6 106.0   2.7
     9. AS1299   ash-b1-link.ip.twelve99.net (62.115.143.121)                      0.0%    10  100.8 104.7 100.8 119.1   5.5
    10. AS1299   wikimedia-ic308845-ash-b1.ip.twelve99-cust.net (80.239.132.226)   0.0%    10   99.8 102.0  96.5 109.1   3.9
    11. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    12. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
    13. AS14907  208.80.154.228                                                    0.0%    10  104.1  98.3  94.5 104.1   3.1


The IPv6 route is also announced and reachable:

   an-worker1147-test# show bgp ipv6 neighbors 2620:0:861:10d::253 advertised-routes 
   BGP table version is 3, local router ID is 208.80.154.228, vrf id 0
   Default local pref 100, local AS 64601
   Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
   Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
   Origin codes:  i - IGP, e - EGP, ? - incomplete
   RPKI validation codes: V valid, I invalid, N Not found
   
      Network          Next Hop            Metric LocPrf Weight Path
   *> 2620:0:861:ed1a::4/128
                       ::                       0         32768 ?
   
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:10d::72 
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
     2620:0:861:ed1a::4/128
   *                         2620:0:861:10d::72   0                  64601 ?
   cmooney@lsw1-e1-eqiad> ping routing-instance PRODUCTION source 2620:0:861:10d::1 rapid count 100 2620:0:861:10d::72 
   PING6(56=40+8+8 bytes) 2620:0:861:10d::1 --> 2620:0:861:10d::72
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   cathal@nbgw:~$ mtr -b -w -z -c 10 2620:0:861:ed1a::4
   Start: 2022-02-15T17:22:42+0000
   HOST: nbgw                                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS6939   tunnel650354.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:32c::1)   0.0%    10   32.2  31.5  22.6  61.3  11.3
     2. AS6939   e0-19.core2.lon2.he.net (2001:470:0:67::1)                           0.0%    10   28.8  29.7  23.7  45.3   6.7
     3. AS6939   100ge11-2.core1.lon2.he.net (2001:470:0:541::1)                      0.0%    10   25.8  39.4  19.9  93.0  25.3
     4. AS6939   100ge4-1.core1.nyc4.he.net (2001:470:0:2cf::2)                      10.0%    10   86.2  94.6  86.2 140.8  17.5
     5. AS6939   100ge11-1.core1.nyc5.he.net (2001:470:0:20a::2)                      0.0%    10   92.0  97.1  86.9 127.6  15.7
     6. AS???    ???                                                                 100.0    10    0.0   0.0   0.0   0.0   0.0
     7. AS6939   100ge1-2.core1.ash1.he.net (2001:470:0:277::1)                      10.0%    10   95.5  99.1  90.0 116.7   9.4
     8. AS6939   xe-5-3-3-500.cr1-eqiad.wikimedia.org (2001:470:0:1c0::2)             0.0%    10   94.9  99.0  89.4 135.3  13.5
     9. AS14907  et-0-0-48-100.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:fe07::2)         0.0%    10   93.3 104.5  92.2 145.2  19.3
    10. AS14907  2620:0:861:ed1a::4                                                   0.0%    10   98.0  98.6  89.9 126.4  12.3

BGP Re-establishment if device peered to Anycast GW moved to new switch (i.e. VM live motion)

This will need to be re-visited when we have a Ganeti cluster in the new cage. Without that it could not be tested.

The idea of this test is to confirm that a VM, with a BGP peering to its default gateway, will restore the BGP session in reasonable time if it's migrated to a new hypervisor host. As the new hypervisor will be connected to a different switch, the VM will get a TCP RST back from the new top-of-rack when it sends TCP packets relating to its BGP connection to the old switch. At this point it should re-initiate the session with the new top-of-rack and traffic should begin to flow again.

This was tested with vQFX lab devices and interruptions were less than 10 seconds, which gives us reasonable confidence that there won't be an issue in production.

eBGP Peering in VRF to external device

This test simulates an eBGP peering from the switch fabric overlay over a normal routed link. In production an example would be the connections from Spine switches to core routers. The assumption is that it works similarly to peering with a device on a connected Vlan from an IRB, but this needs to be validated.

Relevant Config

The test peering was configured towards our CR routers in Eqiad. Separate peerings were established over IPv4 and IPv6 to exchange routes of each type:

   cmooney@lsw1-e1-eqiad> show configuration interfaces et-0/0/48 
   description "Core: cr1-eqiad:et-1/0/2";
   vlan-tagging;
   mtu 9192;
   encapsulation flexible-ethernet-services;
   unit 100 {
       description "cr1-eqiad et-1/0/2.100 - Production VRF";
       vlan-id 100;
       family inet {
           address 10.66.0.9/31;
       }
       family inet6 {
           address 2620:0:861:fe07::2/64;
       }
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group EXTERNAL4  
   type external;
   hold-time 30;
   import DEFAULT4;
   family inet {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown;
           }
       }
   }
   export EXT_OUT4;
   peer-as 14907;
   neighbor 10.66.0.8 {
       description cr1-eqiad;
   }
   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group EXTERNAL6    
   type external;
   hold-time 30;
   import DEFAULT6;
   family inet6 {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown;
           }
       }
   }
   export EXT_OUT6;
   peer-as 14907;
   neighbor 2620:0:861:fe07::1 {
       description cr1-eqiad;
   }
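
As a quick sanity check on the addressing above, each configured neighbor should sit inside the subnet of et-0/0/48.100 and differ from the switch's own address. A minimal Python sketch using the standard `ipaddress` module; the helper name `neighbor_on_link` is hypothetical and not part of any tooling used in these tests:

```python
import ipaddress


def neighbor_on_link(interface_addr: str, neighbor: str) -> bool:
    """Return True if the neighbor is inside the interface subnet and is not our own IP."""
    iface = ipaddress.ip_interface(interface_addr)
    peer = ipaddress.ip_address(neighbor)
    return peer in iface.network and peer != iface.ip


# Addresses taken from the interface and BGP group configuration shown above.
print(neighbor_on_link("10.66.0.9/31", "10.66.0.8"))
print(neighbor_on_link("2620:0:861:fe07::2/64", "2620:0:861:fe07::1"))
```

Both checks should pass for the configuration shown; an address from the neighbouring /31 (for example 10.66.0.10, which belongs to the lsw1-f1 link) would not.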

IPv4

IPv4 peering works and routes are exchanged:

   cmooney@lsw1-e1-eqiad> show bgp summary | match 14907 
   10.66.0.8             14907     106565     112060       0       6 1w4d 18:48:37 Establ
   2620:0:861:fe07::1       14907     106564     112039       0       4 1w4d 18:48:26 Establ
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.66.0.8 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 0.0.0.0/0               10.66.0.8                               14907 I
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 advertising-protocol bgp 10.66.0.8 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 10.64.130.0/24          Self                                    I
   * 10.64.131.0/24          Self                                    I
   * 10.64.132.0/24          Self                                    I
   * 10.64.134.0/24          Self                                    I
   * 10.64.138.0/24          Self                                    I
   * 10.64.142.0/24          Self                                    I
   * 208.80.154.226/32       Self                                    64600 ?
   * 208.80.154.228/32       Self                                    64601 ?
   

IPv6

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:fe07::1    
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * ::/0                    2620:0:861:fe07::1                      14907 I
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 advertising-protocol bgp 2620:0:861:fe07::1 
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 2620:0:861:100::/64     Self                                    I
   * 2620:0:861:109::/64     Self                                    I
   * 2620:0:861:10a::/64     Self                                    I
   * 2620:0:861:10b::/64     Self                                    I
   * 2620:0:861:10d::/64     Self                                    I
   * 2620:0:861:114::/64     Self                                    I
     2620:0:861:ed1a::3/128
   *                         Self                                    64600 ?
     2620:0:861:ed1a::4/128
   *                         Self                                    64601 ?
     2620:0:861:fe07::/64
   *                         Self                                    I
     2620:0:861:fe08::/64
   *                         Self                                    I

BGP Route propagation from unicast peer into EVPN and into remote VRF table

This test validates that routes learnt from unicast BGP peers in the overlay are properly propagated as EVPN type 5 routes between devices in the switch fabric, and pushed to the correct routing table/VRF on remote devices based on the parameters in the EVPN route.

Routes learnt on Spine1:

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.66.0.8 
   
   PRODUCTION.inet.0: 35 destinations, 38 routes (35 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 0.0.0.0/0               10.66.0.8                               14907 I
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:fe07::1    
   
   PRODUCTION.inet6.0: 41 destinations, 45 routes (41 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * ::/0                    2620:0:861:fe07::1                      14907 I

Routes learnt on Spine2:

   cmooney@lsw1-f1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.66.0.10     
   
   PRODUCTION.inet.0: 32 destinations, 35 routes (32 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 0.0.0.0/0               10.66.0.10                              14907 I
   cmooney@lsw1-f1-eqiad> show route table PRODUCTION.inet6.0 receive-protocol bgp 2620:0:861:fe08::1 
   
   PRODUCTION.inet6.0: 38 destinations, 42 routes (38 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * ::/0                    2620:0:861:fe08::1                      14907 I


These are exported as EVPN type 5 routes, announced and received by downstream LEAF switch lsw1-e3-eqiad:

   cmooney@lsw1-e3-eqiad> show route table bgp.evpn.0 aspath-regex ".* 14907$"    
   
   bgp.evpn.0: 79 destinations, 147 routes (79 active, 0 holddown, 0 hidden)
   + = Active Route, - = Last Active, * = Both
   
   5:10.64.128.3:5000::0::0.0.0.0::0/248               
                      *[BGP/170] 1w4d 19:04:01, localpref 100, from 10.64.128.3
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.2 via et-0/0/54.0
                       [BGP/170] 1w4d 19:04:00, localpref 100, from 10.64.128.7
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.2 via et-0/0/54.0
   5:10.64.128.7:5000::0::0.0.0.0::0/248               
                      *[BGP/170] 1w4d 23:37:43, localpref 100, from 10.64.128.7
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.16 via et-0/0/55.0
                       [BGP/170] 1w4d 23:37:43, localpref 100, from 10.64.128.3
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.16 via et-0/0/55.0
   5:10.64.128.3:5000::0::::::0/248               
                      *[BGP/170] 1w4d 19:03:49, localpref 100, from 10.64.128.3
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.2 via et-0/0/54.0
                       [BGP/170] 1w4d 19:03:49, localpref 100, from 10.64.128.7
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.2 via et-0/0/54.0
   5:10.64.128.7:5000::0::::::0/248               
                      *[BGP/170] 1w4d 23:37:39, localpref 100, from 10.64.128.7
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.16 via et-0/0/55.0
                       [BGP/170] 1w4d 23:37:39, localpref 100, from 10.64.128.3
                         AS path: 14907 I, validation-state: unverified
                       >  to 10.64.129.16 via et-0/0/55.0
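
To make the route keys in this output easier to read, the sketch below splits a type 5 key into its parts. It assumes the layout `5:<route distinguisher>::<ethernet tag>::<IP>::<prefix length>/<key length>`, which is inferred from the output above; `Type5Route` and `parse_type5` are hypothetical names for illustration only.

```python
import ipaddress
from typing import NamedTuple, Union


class Type5Route(NamedTuple):
    route_distinguisher: str
    ethernet_tag: int
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_type5(key: str) -> Type5Route:
    """Parse a type 5 key as printed in the bgp.evpn.0 table (assumed layout, see text)."""
    # The first two '::' separators delimit the RD and the ethernet tag.
    head, etag, rest = key.split("::", 2)
    route_type, rd = head.split(":", 1)
    if route_type != "5":
        raise ValueError(f"not a type 5 route: {key}")
    # The IP itself may contain '::' (IPv6), so split the length off from the right.
    ip, tail = rest.rsplit("::", 1)
    prefix_len = tail.split("/", 1)[0]
    return Type5Route(rd, int(etag), ipaddress.ip_network(f"{ip}/{prefix_len}"))


# Keys copied from the lsw1-e3-eqiad output above.
print(parse_type5("5:10.64.128.3:5000::0::0.0.0.0::0/248"))
print(parse_type5("5:10.64.128.7:5000::0::::::0/248"))
```

The route distinguisher recovered this way (for example 10.64.128.3:5000) is the same value that appears in the `show evpn ip-prefix-database` output.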


These are imported into the local EVPN table as shown here:

   cmooney@lsw1-e3-eqiad> show evpn ip-prefix-database prefix 0.0.0.0/0         
   L3 context: PRODUCTION
   
   EVPN->IPv4 Imported Prefixes
   Prefix                                       Etag
   0.0.0.0/0                                    0       
     Route distinguisher    VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
     10.64.128.3:5000       3005000    a4:e1:1a:81:3a:80  10.64.128.3
     10.64.128.7:5000       3005000    a4:e1:1a:81:9e:80  10.64.128.7
   cmooney@lsw1-e3-eqiad> show evpn ip-prefix-database prefix ::/0              
   L3 context: PRODUCTION
   
   EVPN->IPv6 Imported Prefixes
   Prefix                                       Etag
   ::/0                                         0       
     Route distinguisher    VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
     10.64.128.3:5000       3005000    a4:e1:1a:81:3a:80  10.64.128.3
     10.64.128.7:5000       3005000    a4:e1:1a:81:9e:80  10.64.128.7


The IPv4 route is imported into the routing table for the 'PRODUCTION' routing-instance, which is what we expect based on the configuration:

   cmooney@lsw1-e3-eqiad> show route table PRODUCTION.inet.0 0.0.0.0/0 exact                
   
   PRODUCTION.inet.0: 30 destinations, 36 routes (30 active, 0 holddown, 0 hidden)
   @ = Routing Use Only, # = Forwarding Use Only
   + = Active Route, - = Last Active, * = Both
   
   0.0.0.0/0          @[EVPN/170] 1w4d 22:56:36
                       >  to 10.64.129.2 via et-0/0/54.0
                       [EVPN/170] 1w5d 03:30:18
                       >  to 10.64.129.16 via et-0/0/55.0
                      #[Multipath/255] 1w4d 22:56:36, metric2 8
                       >  to 10.64.129.2 via et-0/0/54.0
                          to 10.64.129.16 via et-0/0/55.0


This is also true for IPv6:

   cmooney@lsw1-f2-eqiad> show route table PRODUCTION.inet6.0 ::/0 exact 
   
   PRODUCTION.inet6.0: 37 destinations, 44 routes (37 active, 0 holddown, 0 hidden)
   @ = Routing Use Only, # = Forwarding Use Only
   + = Active Route, - = Last Active, * = Both
   
   ::/0               @[EVPN/170] 1w5d 22:23:30
                       >  to 10.64.129.8 via et-0/0/54.0
                       [EVPN/170] 1w5d 19:39:18
                       >  to 10.64.129.22 via et-0/0/55.0
                      #[Multipath/255] 1w5d 19:39:18, metric2 8
                       >  to 10.64.129.8 via et-0/0/54.0
                          to 10.64.129.22 via et-0/0/55.0

BGP Route propagation to external hosts from VRF [IPv4]

The purpose of this test is to show that routes originating in the EVPN overlay / VRF are properly advertised to external peers.

The check is mainly that those routes are announced from the Spine switches to our CRs correctly.

IPv4

Relevant Config

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group EXTERNAL4  
   type external;
   hold-time 30;
   import DEFAULT4;
   family inet {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown;
           }
       }
   }
   export EXT_OUT4;
   peer-as 14907;
   neighbor 10.66.0.8 {
       description cr1-eqiad;
   }
   cmooney@lsw1-e1-eqiad> show configuration policy-options policy-statement EXT_OUT4        
   /* Controls export of routes from VRF to external peers */
   term LVS {
       from {
           protocol [ bgp evpn ];
           as-path LOCAL_LVS;
       }
       then accept;
   }
   term K8_ANYCAST {
       from {
           protocol [ bgp evpn ];
           as-path LOCAL_K8_ANYCAST;
       }
       then accept;
   }
   term NETWORKS {
       from {
           protocol [ direct evpn ];
           route-filter 0.0.0.0/0 prefix-length-range /0-/29;
       }
       then accept;
   }
   then reject;


Results:

Top-of-rack switch subnets are propagated as expected (and, somewhat importantly, /32 host routes from EVPN type 2 sources / ARP snooping are not announced):

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 advertising-protocol bgp 10.66.0.8    
   
   PRODUCTION.inet.0: 30 destinations, 31 routes (30 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 10.64.130.0/24          Self                                    I
   * 10.64.131.0/24          Self                                    I
   * 10.64.132.0/24          Self                                    I
   * 10.64.134.0/24          Self                                    I
   * 10.64.138.0/24          Self                                    I
   * 10.64.142.0/24          Self                                    I
   * 208.80.154.226/32       Self                                    64600 ?
   * 208.80.154.229/32       Self                                    64600 ?
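
The effect of the `prefix-length-range` in the NETWORKS term can be illustrated with a short Python sketch. It models only the length check (the `protocol` condition and the earlier as-path terms are not modelled), and `matches_networks_term` is a hypothetical helper, not part of JunOS:

```python
import ipaddress


def matches_networks_term(prefix: str, min_len: int, max_len: int) -> bool:
    """Model 'route-filter <default route> prefix-length-range /min-/max'.

    With 0.0.0.0/0 or ::/0 as the filter prefix every route of that family is
    covered, so only the prefix length decides whether the term matches.
    """
    net = ipaddress.ip_network(prefix)
    return min_len <= net.prefixlen <= max_len


# IPv4 uses /0-/29 (EXT_OUT4); IPv6 uses /0-/125 (EXT_OUT6).
print(matches_networks_term("10.64.134.0/24", 0, 29))           # rack subnet
print(matches_networks_term("10.64.134.72/32", 0, 29))          # host route
print(matches_networks_term("2620:0:861:10d::/64", 0, 125))     # rack subnet
print(matches_networks_term("2620:0:861:10d::72/128", 0, 125))  # host route
```

The /32 routes that do appear in the advertised list carry AS 64600 in their path, so they are accepted by the earlier as-path based terms rather than by NETWORKS.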

IPv6

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group EXTERNAL6    
   type external;
   hold-time 30;
   import DEFAULT6;
   family inet6 {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown;
           }
       }
   }
   export EXT_OUT6;
   peer-as 14907;
   neighbor 2620:0:861:fe07::1 {
       description cr1-eqiad;
   }
   cmooney@lsw1-e1-eqiad> show configuration policy-options policy-statement EXT_OUT6                      
   /* Controls export of routes from VRF to external peers. */
   term LVS {
       from {
           protocol [ bgp evpn ];
           as-path LOCAL_LVS;
       }
       then accept;
   }
   term EVPN_K8_ANYCAST {
       from {
           protocol [ bgp evpn ];
           as-path LOCAL_K8_ANYCAST;
       }
       then accept;
   }
   term NETWORKS {
       from {
           protocol [ direct evpn ];
           route-filter ::/0 prefix-length-range /0-/125;
       }
       then accept;
   }
   then reject;
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet6.0 advertising-protocol bgp 2620:0:861:fe07::1  
   
   PRODUCTION.inet6.0: 36 destinations, 37 routes (36 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 2620:0:861:100::/64     Self                                    I
   * 2620:0:861:109::/64     Self                                    I
   * 2620:0:861:10a::/64     Self                                    I
   * 2620:0:861:10b::/64     Self                                    I
   * 2620:0:861:114::/64     Self                                    I
     2620:0:861:ed1a::3/128
   *                         Self                                    64600 ?
     2620:0:861:ed1a::6/128
   *                         Self                                    64600 ?
     2620:0:861:fe07::/64
   *                         Self                                    I
     2620:0:861:fe08::/64
   *                         Self                                    I

BGP Peering to external host from VRF using local-as 14907

If possible we want to avoid having to configure end-hosts with the ASN of the top-of-rack switch they are connected to. Instead our preference is to "fake" the AS used for peering from the switch side, always using the WMF public AS 14907, so that this piece of the configuration can be kept constant on the server side.

In order to do this the switch needs to support some form of "local-as" override. We need to be careful that:

  • BGP session establishes.
  • AS14907 is not placed in the AS-path when prefixes are propagated onwards (i.e. sent to CRs).
  • The device using fake 'local-as' 14907 can still establish and learn routes from an eBGP peer using that ASN (i.e. learnt from CRs).
  • If announcing routes over the BGP session, only the configured 'local-as' is included in the AS-path; the switch's default ASN is not present.

Relevant Config

The previous configuration for the PyBal group was modified with the 'local-as' configuration as shown:

   cmooney@lsw1-e1-eqiad> show configuration routing-instances PRODUCTION protocols bgp group PyBal             
   type external;
   hold-time 30;
   import LVS_IMPORT;
   family inet {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   family inet6 {
       unicast {
           prefix-limit {
               maximum 1000;
               teardown 20;
           }
       }
   }
   export NONE;
   peer-as 64600;
   local-as 14907 loops 2 private no-prepend-global-as;
   neighbor 10.64.130.10 {
       description ms-fe1012-test;
   }


To break down the meaning of these commands:

   local-as:  Sets the AS number the switch presents on this session, in place of its global ASN.
   loops 2:   With 'local-as' set to 14907 the switch would otherwise reject any received path containing that AS. "loops 2" permits paths that contain one instance of it.
   private:   Tells JunOS that the 'local-as' is only relevant to this peering, so it uses its normal ASN when propagating routes learnt over it.
   no-prepend-global-as:   Tells JunOS not to put its own globally-configured ASN into the AS-path when announcing routes to this peer.
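
The combined effect of these options on AS paths can be sketched in Python. This is a simplified model of the behaviour described above, not of the JunOS implementation; the function names are hypothetical, and 64810 is the switch's own ASN as it appears in the results in this section.

```python
GLOBAL_AS = 64810  # the switch's own (global) ASN
LOCAL_AS = 14907   # AS presented to the server via 'local-as'


def accepts(as_path, local_as=LOCAL_AS, loops=2):
    """'loops N': reject a received path once local_as appears N or more times."""
    return as_path.count(local_as) < loops


def path_to_other_peer(received_path, private=True):
    """AS path sent to another eBGP peer for a route learnt on the local-as session."""
    path = list(received_path)
    if not private:
        path.insert(0, LOCAL_AS)  # without 'private' the local-as stays in the path
    return [GLOBAL_AS] + path


def path_to_local_as_peer(path, no_prepend_global_as=True):
    """AS path sent to the peer that the 'local-as' session is configured for."""
    prefix = [LOCAL_AS] if no_prepend_global_as else [LOCAL_AS, GLOBAL_AS]
    return prefix + list(path)


print(accepts([14907]))              # default route from the CR
print(path_to_other_peer([64600]))   # PyBal route as sent on to the CR
print(path_to_local_as_peer([]))     # locally originated route sent to the server
```

Under this model a route learnt from the server with path 64600 is sent to the CR as 64810 64600, and the default route from the CR (path 14907) is still accepted because it contains only one instance of the local-as.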

Results

Adjacency forms from test-host with peer-as set to 14907:

   ms-fe1012-test# show bgp ipv4 unicast summary 
   BGP router identifier 208.80.154.226, local AS number 64600 vrf-id 0
   BGP table version 19
   RIB entries 3, using 552 bytes of memory
   Peers 1, using 723 KiB of memory
   
   Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
   10.64.130.1     4      14907      3256      2968        0    0    0 06:46:07            0        1 N/A

The route being announced is accepted and learnt with as-path of the "PyBal" peer (64600):

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.64.130.10 
   
   PRODUCTION.inet.0: 30 destinations, 31 routes (30 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 208.80.154.226/32       10.64.130.10         0                  64600 ?
  

This is correctly being advertised externally to the CR, which we are peering to on its AS 14907:

   cmooney@lsw1-e1-eqiad> show bgp summary | match 14907 
   10.66.0.8             14907       2147       2266       0       0     5:41:29 Establ
   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 advertising-protocol bgp 10.66.0.8 | match 208.80.154.226     
   * 208.80.154.226/32       Self                                    64600 ?

On the CR the route is learnt with the switch's ASN followed by the PyBal one, as desired:

   cmooney@re0.cr1-eqiad> show route receive-protocol bgp 10.66.0.9 | match 208.80.154.226 
   * 208.80.154.226/32       10.66.0.9                               64810 64600 ?

On the switch we still learn and accept the route sent by the CR (with AS14907 in the path):

   cmooney@lsw1-e1-eqiad> show route table PRODUCTION.inet.0 receive-protocol bgp 10.66.0.8                               
   
   PRODUCTION.inet.0: 30 destinations, 31 routes (30 active, 0 holddown, 0 hidden)
     Prefix		  Nexthop	       MED     Lclpref    AS path
   * 0.0.0.0/0               10.66.0.8                               14907 I

Internal routing from device connected to access vlan

This test validates connectivity between hosts in the new racks and devices connected to VC-based switches in other rows.

IPv4

   root@elastic1093-test:~# mtr --address 10.64.132.2 -b -w -c 10 10.64.16.8
   Start: 2022-02-17T21:13:05+0000
   HOST: elastic1093-test                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1033.lsw1-e3-eqiad.eqiad.wmnet (10.64.132.1)  0.0%    10    4.4   4.7   2.0   8.4   2.4
     2.|-- irb-1035.lsw1-f1-eqiad.eqiad.wmnet (10.64.134.1)  0.0%    10    1.3   3.4   0.9   6.2   1.9
     3.|-- et-1-0-2-100.cr2-eqiad.eqiad.wmnet (10.66.0.10)   0.0%    10    0.3   1.0   0.3   6.1   1.8
     4.|-- phab1001.eqiad.wmnet (10.64.16.8)                 0.0%    10    0.2   0.3   0.2   0.3   0.0

IPv6

   root@elastic1093-test:~# mtr --address 2620:0:861:10b:10:64:132:2 -b -w -c 10  2620:0:861:102:10:64:16:8
   Start: 2022-02-17T21:12:33+0000
   HOST: elastic1093-test                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
     1.|-- irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)   0.0%    10    0.6   0.5   0.5   0.6   0.0
     2.|-- irb-1039.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:100::1)   0.0%    10    0.6   0.6   0.5   0.7   0.1
     3.|-- et-1-0-2-100.cr1-eqiad.eqiad.wmnet (2620:0:861:fe07::1)  0.0%    10    0.4   0.6   0.4   1.8   0.4
     4.|-- phab1001.eqiad.wmnet (2620:0:861:102:10:64:16:8)         0.0%    10    0.3   0.3   0.3   0.3   0.0

External routing from device connected to access vlan

This test validates connectivity between hosts in the new racks and destinations on the external internet.

IPv4

   root@elastic1093-test:~# mtr -4 --address 208.80.154.229 -z -b -w -c 10 www.ietf.org
   Start: 2022-02-17T21:16:38+0000
   HOST: elastic1093-test                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS???    irb-1033.lsw1-e3-eqiad.eqiad.wmnet (10.64.132.1)   0.0%    10    7.1   5.2   0.7  13.3   3.9
     2. AS???    irb-1035.lsw1-f1-eqiad.eqiad.wmnet (10.64.134.1)   0.0%    10    1.8   4.5   1.4   7.8   2.3
     3. AS???    et-1-0-2-100.cr2-eqiad.eqiad.wmnet (10.66.0.10)    0.0%    10    0.4   0.5   0.4   0.6   0.1
     4. AS???    13335.ash.equinix.com (206.126.237.30)            60.0%    10    1.8   7.7   1.8  13.3   6.4
     5. AS13335  172.70.172.2                                       0.0%    10    1.7   3.0   1.2  11.5   3.2
     6. AS13335  104.16.45.99                                       0.0%    10    1.2   1.2   1.1   1.3   0.0

IPv6

   root@elastic1093-test:~# mtr -6 --address 2620:0:861:ed1a::6 -z -b -w -c 10 www.ietf.org
   Start: 2022-02-17T21:17:12+0000
   HOST: elastic1093-test                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS14907  irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)    0.0%    10    0.6   0.5   0.4   0.6   0.1
     2. AS14907  irb-1039.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:100::1)    0.0%    10    7.9   5.2   0.5  40.2  12.5
     3. AS14907  et-1-0-2-100.cr1-eqiad.eqiad.wmnet (2620:0:861:fe07::1)   0.0%    10    0.6   2.4   0.4  18.9   5.8
     4. AS14907  ae0.cr2-eqiad.wikimedia.org (2620:0:861:fe00::2)          0.0%    10    0.3   0.4   0.3   0.5   0.0
     5. AS???    13335.ash.equinix.com (2001:504:0:2:0:1:3335:1)          60.0%    10    1.3   5.4   1.3  17.0   7.8
     6. AS13335  2400:cb00:350:3::                                         0.0%    10    1.3   4.2   1.1  15.2   5.3
     7. AS13335  2606:4700::6810:2c63                                      0.0%    10    0.4   0.5   0.4   0.8   0.1


It is worth observing that hop 2 is different in each case, as lsw1-e3 has two upstream "spines" (lsw1-e1 and lsw1-f1) and load-balances traffic across them.
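
The per-flow load-balancing noted here can be pictured with a small Python sketch. The hash used by the switch hardware is not documented on this page, so SHA-256 over a few flow fields stands in purely for illustration, and `pick_next_hop` is a hypothetical name:

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Choose one next hop per flow by hashing the flow's fields (illustrative hash only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


spines = ["lsw1-e1", "lsw1-f1"]
flow = ("10.64.132.2", "10.64.16.8", 1)  # source, destination, protocol (ICMP)
print(pick_next_hop(flow, spines))
```

The same flow always maps to the same spine, while different flows (for example the IPv4 and IPv6 traces above) may land on different ones.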

ARP Suppression

ARP suppression is a feature in EVPN layer-2 networks which aims to minimise the amount of layer-2 broadcast traffic being sent, and thus support larger layer-2 segments. It is made possible through the inclusion of IP as well as MAC information in EVPN type 2 routes. The IP data is inserted by switches based either on local ARP tables or on snooping traffic. As this is distributed in BGP/EVPN, devices become aware of remote MAC/IP bindings, and can answer ARP requests from their own hosts if they have the answer.
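
The mechanism can be reduced to a lookup table, as in the toy Python model below. The class and method names are hypothetical, and the model ignores timers, Vlans and MAC mobility; the binding used is the one that appears in the checks in this section.

```python
from typing import Dict, List, Optional


class ArpSuppressingSwitch:
    """Toy model: answer ARP locally when the binding is known from EVPN type 2 routes."""

    def __init__(self) -> None:
        self.bindings: Dict[str, str] = {}  # IP -> MAC, learnt from type 2 routes
        self.flooded: List[str] = []        # targets that had to be flooded

    def learn_type2(self, ip: str, mac: str) -> None:
        """Record a MAC/IP binding carried in a received EVPN type 2 route."""
        self.bindings[ip] = mac

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Return the MAC to reply with, or None if the request must be flooded."""
        mac = self.bindings.get(target_ip)
        if mac is None:
            self.flooded.append(target_ip)
        return mac


switch = ArpSuppressingSwitch()
switch.learn_type2("10.64.134.72", "e4:3d:1a:54:14:45")
print(switch.handle_arp_request("10.64.134.72"))  # answered by the switch itself
print(switch.flooded)                             # nothing was flooded
```

In this model the remote host never sees the ARP request, which is the behaviour the test in this section checks for with packet captures.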

Checks

Ping Anycast GW 10.64.134.1 of lsw1-e1 from directly connected 10.64.134.72:

   root@an-worker1147-test:~# ping -I 10.64.134.72 -c 2 10.64.134.1
   PING 10.64.134.1 (10.64.134.1) from 10.64.134.72 : 56(84) bytes of data.
   64 bytes from 10.64.134.1: icmp_seq=1 ttl=64 time=5.28 ms
   64 bytes from 10.64.134.1: icmp_seq=2 ttl=64 time=2.71 ms

ARP entry is populated on lsw1-e1 as desired:

   cmooney@lsw1-e1-eqiad> show arp no-resolve expiration-time | match 10.64.134.72 
   e4:3d:1a:54:14:45 10.64.134.72    irb.1035 [xe-0/0/6.0]           permanent remote

In turn an EVPN route is learnt on lsw1-f1 containing this information:

   cmooney@lsw1-f1-eqiad> show route protocol bgp table bgp.evpn.0 terse | match 10.64.134.72         
       2:10.64.128.3:64810::2001035::e4:3d:1a:54:14:45::10.64.134.72/304 MAC/IP 

EVPN database shows it in more detail:

   cmooney@lsw1-e1-eqiad> show evpn database extensive mac-address e4:3d:1a:54:14:45  
   Instance: default-switch
   
   VN Identifier: 2001035, MAC address: e4:3d:1a:54:14:45
     State: 0x0
     Source: xe-0/0/6.0, Rank: 1, Status: Active
       Mobility sequence number: 0 (minimum origin address 10.64.128.3)
       Timestamp: Feb 17 21:38:12.700143 (0x620ec044)
       State: <Local-MAC-Only Local-To-Remote-Adv-Allowed>
       MAC advertisement route status: Created
       IP address: 10.64.134.72
       Flags: <Local-Adv>
         L3 route: 10.64.134.72/32, L3 context: PRODUCTION (irb.1035)
       IP address: 2620:0:861:10d::72
       Flags: <Local-Adv>
         L3 route: 2620:0:861:10d::72/128, L3 context: PRODUCTION (irb.1035)
       IP address: fe80::e63d:1aff:fe54:1445
       Flags: <Local-Adv>
         L3 route: fe80::e63d:1aff:fe54:1445/128, L3 context: PRODUCTION (irb.1035)
       History db: 
         Time                       Event
         Feb 17 12:36:30.139 2022   xe-0/0/6.0 : Created
         Feb 17 12:36:30.139 2022   Updating output state (change flags 0x1 <ESI-Added>)
         Feb 17 12:36:30.139 2022   Active ESI changing (not assigned -> xe-0/0/6.0)
         Feb 17 12:36:30.139 2022   xe-0/0/6.0 : Updating output state (change flags 0x200 <IP-Added>)
         Feb 17 12:37:40.161 2022   xe-0/0/6.0 : Updating output state (change flags 0x200 <IP-Added>)
         Feb 17 21:35:36.597 2022   xe-0/0/6.0 : 10.64.134.72 Selected IRB interface nexthop
         Feb 17 21:35:36.597 2022   xe-0/0/6.0 : 2620:0:861:10d::72 Selected IRB interface nexthop
         Feb 17 21:38:12.700 2022   xe-0/0/6.0 : Updating output state (change flags 0x200 <IP-Added>)
         Feb 17 21:38:12.700 2022   xe-0/0/6.0 : fe80::e63d:1aff:fe54:1445 Selected IRB interface nexthop

Ultimately this causes an ARP entry to be added on lsw1-f1, binding the same MAC/IP:

   cmooney@lsw1-f1-eqiad> show arp no-resolve expiration-time | match 10.64.134.72   
   e4:3d:1a:54:14:45 10.64.134.72    irb.1035 [vtep.32769]           permanent remote


Test

Now that the IP/MAC info for 10.64.134.72 is populated across the network we need to check what happens when we send an ARP request for this IP from a host on the same Vlan, but connected to a different switch.

To begin, I connected a host port with no IP address configured to a port on lsw1-f1. The switch port was configured as an access port in Vlan 1035 / private1-f1-eqiad. Running 'date' frequently so the timing is clearer, I first added an IP on the host's interface, making this the first moment it could participate in ARP:

   root@an-worker1148-test:/etc/systemd# date && ip addr add 10.64.134.11/24 dev eno2np1
   Thu 17 Feb 22:54:59 GMT 2022

I then pinged the Anycast GW IP (which will be lsw1-f1 as that's what we're connected to):

   root@an-worker1148-test:/etc/systemd# date && ping 10.64.134.1
   Thu 17 Feb 22:55:29 GMT 2022
   PING 10.64.134.1 (10.64.134.1) 56(84) bytes of data.
   64 bytes from 10.64.134.1: icmp_seq=1 ttl=64 time=98.0 ms
   64 bytes from 10.64.134.1: icmp_seq=2 ttl=64 time=1.14 ms

And now ping for the other host, connected to remote switch lsw1-e1:

   root@an-worker1148-test:/etc/systemd# date && ping 10.64.134.72
   Thu 17 Feb 22:55:33 GMT 2022
   PING 10.64.134.72 (10.64.134.72) 56(84) bytes of data.
   64 bytes from 10.64.134.72: icmp_seq=1 ttl=64 time=14.5 ms
   64 bytes from 10.64.134.72: icmp_seq=2 ttl=64 time=0.230 ms

Looking over packet captures running on this device, it sent ARPs as expected for both IPs pinged, and got answers for both:

   root@an-worker1148-test:~# date && tcpdump -i eno2np1 -l -nn arp  && date
   Thu 17 Feb 22:54:58 GMT 2022
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   22:55:29.065596 ARP, Request who-has 10.64.134.1 tell 10.64.134.11, length 28
   22:55:29.075964 ARP, Reply 10.64.134.1 is-at 00:00:5e:11:fa:ce, length 46
   22:55:33.324530 ARP, Request who-has 10.64.134.72 tell 10.64.134.11, length 28
   22:55:33.338783 ARP, Reply 10.64.134.72 is-at e4:3d:1a:54:14:45, length 46

A tcpdump on 10.64.134.72 reveals, however, that no ARP request hit the server from 10.64.134.11. The first packets we see from the other IP are the ICMP packets:

   root@an-worker1147-test:/etc/systemd# date && tcpdump -i eno2np1.1035 -l -nn host 10.64.134.11 or arp && date 
   Thu 17 Feb 22:54:54 GMT 2022
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1035, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   22:55:33.338005 IP 10.64.134.11 > 10.64.134.72: ICMP echo request, id 16418, seq 1, length 64
   22:55:33.338049 IP 10.64.134.72 > 10.64.134.11: ICMP echo reply, id 16418, seq 1, length 64
   22:55:34.325397 IP 10.64.134.11 > 10.64.134.72: ICMP echo request, id 16418, seq 2, length 64
   22:55:34.325424 IP 10.64.134.72 > 10.64.134.11: ICMP echo reply, id 16418, seq 2, length 64

We can thus conclude that the ARP request an-worker1148 sent from 10.64.134.11 was not broadcast to all devices across the network; instead lsw1-f1 generated the appropriate response when it received the request, using the information from EVPN. This is an adequate, if not very sophisticated, way to verify ARP suppression. Unfortunately the QFX doesn't provide any stats or counters for it, and Juniper's own test instructions are similarly based on packet captures.
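
The comparison of the two captures can also be done mechanically. The following is a minimal sketch, assuming tcpdump's one-line ARP format as shown above; the helper names `arp_requests` and `request_was_suppressed` are hypothetical and not part of any tool used in the test.

```python
import re

# Matches tcpdump's one-line ARP request format, e.g.
# "22:55:33.324530 ARP, Request who-has 10.64.134.72 tell 10.64.134.11, length 28"
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+?),")


def arp_requests(capture_lines):
    """Return (target_ip, sender_ip) pairs for every ARP request in a capture."""
    pairs = []
    for line in capture_lines:
        match = ARP_REQUEST.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def request_was_suppressed(sender_capture, target_capture, target_ip, sender_ip):
    """True if the sender emitted the request but the target host never saw it."""
    wanted = (target_ip, sender_ip)
    return (wanted in arp_requests(sender_capture)
            and wanted not in arp_requests(target_capture))


# Lines copied from the two captures above (an-worker1148 sends, an-worker1147 receives).
sender_capture = [
    "22:55:29.065596 ARP, Request who-has 10.64.134.1 tell 10.64.134.11, length 28",
    "22:55:29.075964 ARP, Reply 10.64.134.1 is-at 00:00:5e:11:fa:ce, length 46",
    "22:55:33.324530 ARP, Request who-has 10.64.134.72 tell 10.64.134.11, length 28",
    "22:55:33.338783 ARP, Reply 10.64.134.72 is-at e4:3d:1a:54:14:45, length 46",
]
target_capture = [
    "22:55:33.338005 IP 10.64.134.11 > 10.64.134.72: ICMP echo request, id 16418, seq 1, length 64",
    "22:55:33.338049 IP 10.64.134.72 > 10.64.134.11: ICMP echo reply, id 16418, seq 1, length 64",
]

suppressed = request_was_suppressed(
    sender_capture, target_capture, "10.64.134.72", "10.64.134.11"
)
print("ARP request suppressed:", suppressed)
```

The check only shows that the request did not reach the target host; it cannot tell which switch generated the reply.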

ND Suppression

ND suppression in IPv6 is very similar to ARP suppression in IPv4. Once one switch has the IP/MAC binding, it is included in EVPN type 2 routes, which means the whole network should have this info. If other hosts perform neighbor discovery for a given IP, the packets are not multicast in the Vlan; instead the switches should generate a response on the target's behalf, with the correct info learnt from EVPN, reducing the level of BUM traffic.

Configuration / vlan is the same as before.

Results

Again 'date' is used to make the sequence clearer. First the IP address is added to the interface:

   root@an-worker1148-test:~# date && ip addr add 2620:0:861:10d::11/64 dev eno2np1
   Thu 17 Feb 23:34:12 GMT 2022

Then we ping the Anycast GW (will hit lsw1-f1 where we're connected):

   root@an-worker1148-test:~# date && ping 2620:0:861:10d::1
   Thu 17 Feb 23:34:22 GMT 2022
   PING 2620:0:861:10d::1(2620:0:861:10d::1) 56 data bytes
   64 bytes from 2620:0:861:10d::1: icmp_seq=1 ttl=64 time=12.2 ms
   64 bytes from 2620:0:861:10d::1: icmp_seq=2 ttl=64 time=0.491 ms

Lastly we ping an-worker1147, connected in the same Vlan on a remote switch:

   root@an-worker1148-test:~# date && ping 2620:0:861:10d::72
   Thu 17 Feb 23:34:29 GMT 2022
   PING 2620:0:861:10d::72(2620:0:861:10d::72) 56 data bytes
   64 bytes from 2620:0:861:10d::72: icmp_seq=1 ttl=64 time=21.2 ms
   64 bytes from 2620:0:861:10d::72: icmp_seq=2 ttl=64 time=0.232 ms

A packet capture on the device doing the pings (an-worker1148) shows several ND requests, including one for an-worker1147:

   root@an-worker1148-test:~# date && tcpdump -i eno2np1 -l -nn icmp6
   Thu 17 Feb 23:33:40 GMT 2022
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   23:34:12.918598 IP6 :: > ff02::1:ff00:11: ICMP6, neighbor solicitation, who has 2620:0:861:10d::11, length 32
   23:34:22.706295 IP6 2620:0:861:10d::11 > ff02::1:ff00:1: ICMP6, neighbor solicitation, who has 2620:0:861:10d::1, length 32
   23:34:22.710987 IP6 2620:0:861:10d::1 > 2620:0:861:10d::11: ICMP6, neighbor advertisement, tgt is 2620:0:861:10d::1, length 32
   23:34:22.711020 IP6 2620:0:861:10d::11 > 2620:0:861:10d::1: ICMP6, echo request, id 10401, seq 1, length 64
   23:34:22.718416 IP6 2620:0:861:10d::1 > 2620:0:861:10d::11: ICMP6, echo reply, id 10401, seq 1, length 64
   23:34:23.707671 IP6 2620:0:861:10d::11 > 2620:0:861:10d::1: ICMP6, echo request, id 10401, seq 2, length 64
   23:34:23.708132 IP6 2620:0:861:10d::1 > 2620:0:861:10d::11: ICMP6, echo reply, id 10401, seq 2, length 64
   23:34:29.649454 IP6 2620:0:861:10d::11 > ff02::1:ff00:72: ICMP6, neighbor solicitation, who has 2620:0:861:10d::72, length 32
   23:34:29.659961 IP6 2620:0:861:10d::72 > 2620:0:861:10d::11: ICMP6, neighbor advertisement, tgt is 2620:0:861:10d::72, length 32
   23:34:29.659993 IP6 2620:0:861:10d::11 > 2620:0:861:10d::72: ICMP6, echo request, id 38872, seq 1, length 64
   23:34:29.670614 IP6 2620:0:861:10d::72 > 2620:0:861:10d::11: ICMP6, echo reply, id 38872, seq 1, length 64
   23:34:30.650792 IP6 2620:0:861:10d::11 > 2620:0:861:10d::72: ICMP6, echo request, id 38872, seq 2, length 64
   23:34:30.650994 IP6 2620:0:861:10d::72 > 2620:0:861:10d::11: ICMP6, echo reply, id 38872, seq 2, length 64

A capture on an-worker1147 shows, however, that the ND message never hit it:

   root@an-worker1147-test:~# tcpdump -i eno2np1.1035 -l -nn icmp6
   tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
   listening on eno2np1.1035, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   23:34:29.656794 IP6 2620:0:861:10d::11 > 2620:0:861:10d::72: ICMP6, echo request, id 38872, seq 1, length 64
   23:34:29.667235 IP6 2620:0:861:10d::72 > 2620:0:861:10d::11: ICMP6, echo reply, id 38872, seq 1, length 64
   23:34:30.647585 IP6 2620:0:861:10d::11 > 2620:0:861:10d::72: ICMP6, echo request, id 38872, seq 2, length 64
   23:34:30.647613 IP6 2620:0:861:10d::72 > 2620:0:861:10d::11: ICMP6, echo reply, id 38872, seq 2, length 64

Similar to the ARP test this shows that the switches are generating responses to ND requests locally, reducing BUM traffic in the Vlan.
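
The neighbor solicitations in the capture above are sent to solicited-node multicast groups such as ff02::1:ff00:72 rather than to a broadcast address. As a short worked example, the sketch below derives that group from the target address following RFC 4291, and the matching Ethernet multicast MAC; the function names are hypothetical.

```python
import ipaddress


def solicited_node_multicast(address):
    """Return the solicited-node multicast group (RFC 4291) for an IPv6 address."""
    # The group is ff02::1:ff00:0/104 with the low 24 bits of the target appended.
    low_24_bits = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low_24_bits)


def multicast_mac(group):
    """Return the Ethernet MAC for an IPv6 multicast group: 33:33 + low 32 bits."""
    low_32_bits = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low_32_bits.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


# Addresses taken from the ND test above.
for target in ("2620:0:861:10d::1", "2620:0:861:10d::11", "2620:0:861:10d::72"):
    group = solicited_node_multicast(target)
    print(target, "->", group, multicast_mac(group))
```

Because each group is shared only by addresses with the same low 24 bits, a solicitation that is flooded still counts as BUM traffic from the fabric's point of view, which is what suppression avoids.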

DHCP Relay & Option 82 insertion

We forward DHCP requests from end servers over unicast IPv4 to our 'install' servers in each datacenter. Those machines run ISC's dhcpd software and return an IP address to an end host only if configured to do so for that specific host. The requesting host is identified from the "option 82" information present in the DHCP DISCOVER messages. This information is inserted by the top-of-rack switch as it relays the packet, which is what makes the identification possible.

Relevant Config

In our case the DHCP requests will arrive on an IRB interface in an overlay VRF/routing-instance, and be relayed to the install server via the same instance. So the configuration is done under that routing-instance:

   cmooney@lsw1-e3-eqiad> show configuration routing-instances PRODUCTION forwarding-options               
   dhcp-relay {
       relay-option-82 {
           circuit-id {
               prefix {
                   host-name;
               }
           }
       }
       server-group {
           INSTALL-SERVER {
               208.80.154.32;
           }
       }
       group DHCP-RELAY {
           active-server-group INSTALL-SERVER;
           interface irb.1033;
           interface irb.1041;
       }
   }


Results

We can then send a DHCP request from a connected host, for instance elastic1093-test connected to lsw1-e3 xe-0/0/21.


DHCP DISCOVER is received on install1003 shortly after, with option 82 information populated as expected:

   cmooney@install1003:~$ sudo tcpdump -i ens5 -l -p -vvv -nn port 67 or port 68 and not src host 208.80.154.199
   tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
   23:56:56.244662 IP (tos 0x0, ttl 62, id 6901, offset 0, flags [none], proto UDP (17), length 408)
       10.64.132.1.67 > 208.80.154.32.67: [udp sum ok] BOOTP/DHCP, Request from e4:3d:1a:cb:5c:21, length 380, hops 1, xid 0xc24c6629, Flags [none] (0x0000)
   	  Gateway-IP 10.64.132.1
   	  Client-Ethernet-Address e4:3d:1a:cb:5c:21
   	  Vendor-rfc1048 Extensions
   	    Magic Cookie 0x63825363
   	    DHCP-Message Option 53, length 1: Discover
   	    Hostname Option 12, length 16: "elastic1093-test"
   	    Parameter-Request Option 55, length 13: 
   	      Subnet-Mask, BR, Time-Zone, Default-Gateway
   	      Domain-Name, Domain-Name-Server, Option 119, Hostname
   	      Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
   	      NTP
   	    Agent-Information Option 82, length 78: 
   	      Circuit-ID SubOption 1, length 43: lsw1-e3-eqiad:xe-0/0/21.0:private1-e3-eqiad
   	      Unknown SubOption 12, length 31: 
   		0x0000:  0002 0000 0000 0583 0100 0000 6134 3a65
   		0x0010:  313a 3161 3a38 313a 6435 3a38 3000 00
   	    END Option 255, length 0
   	    PAD Option 0, length 0, occurs 23
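
The Circuit-ID above has the form hostname:interface:vlan-name, which follows from the `prefix host-name` setting in the relay config. A minimal sketch of splitting such a string is shown below; the `CircuitId` type and `parse_circuit_id` helper are hypothetical and do not reflect how dhcpd itself matches hosts.

```python
from typing import NamedTuple


class CircuitId(NamedTuple):
    switch: str
    interface: str
    vlan: str


def parse_circuit_id(raw):
    """Split '<hostname>:<interface>:<vlan-name>' into its three fields."""
    # Split once from each end, so an interface name that itself contains ':'
    # would be kept intact (assumes hostname and vlan name never contain ':').
    try:
        switch, rest = raw.split(":", 1)
        interface, vlan = rest.rsplit(":", 1)
    except ValueError:
        raise ValueError(f"unexpected circuit-id format: {raw!r}") from None
    return CircuitId(switch, interface, vlan)


# Value copied from the DISCOVER captured on install1003 above.
raw = "lsw1-e3-eqiad:xe-0/0/21.0:private1-e3-eqiad"
circuit = parse_circuit_id(raw)
print(len(raw), circuit)
```

The string is 43 characters long, which agrees with the sub-option length tcpdump reports.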

IPv6 Router Advertisement Generation

Although we statically configure end-host IPs in /etc/network/interfaces, the IPv6 address that is added there needs to be derived on a given host from a SLAAC-assigned address (giving us the subnet) and the IPv4 address (which we use as host portion of v6 address).

IPv6 router advertisements must be generated on all IRB/Vlan interfaces and sent to end hosts so they can configure an IPv6 global unicast address via SLAAC.
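
As a worked example of the derivation described above, the sketch below combines a /64 prefix learnt from an RA with a host's IPv4 address. The exact mapping (each decimal octet written as one 16-bit group) is an assumption inferred from addresses that appear on this page, such as 10.64.132.2 and 2620:0:861:10b:10:64:132:2; the function name is hypothetical.

```python
import ipaddress


def derive_host_ipv6(slaac_prefix, ipv4_address):
    """Build an IPv6 address from a /64 prefix and an IPv4 address."""
    network = ipaddress.IPv6Network(slaac_prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    host_part = 0
    for octet in ipaddress.IPv4Address(ipv4_address).packed:
        # Write the decimal octet as-is into a hex group: 132 becomes group "132".
        group = int(str(octet), 16)
        host_part = (host_part << 16) | group
    return ipaddress.IPv6Address(int(network.network_address) | host_part)


# Prefix advertised on irb.1033 and the IPv4 address of elastic1093-test.
print(derive_host_ipv6("2620:0:861:10b::/64", "10.64.132.2"))
```

Without an RA the host has no way to learn the prefix half of this address, which is why the advertisements are needed even though addressing is otherwise static.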

Relevant Config

   cmooney@lsw1-e3-eqiad> show configuration interfaces irb.1033 
   description private1-e3-eqiad;
   family inet {
       address 10.64.132.1/24;
   }
   family inet6 {
       address 2620:0:861:10b::1/64;
   }
   cmooney@lsw1-e3-eqiad> show configuration protocols router-advertisement     
   interface irb.1033 {
       max-advertisement-interval 30;
       default-lifetime 600;
       prefix 2620:0:861:10b::1/64;
   }

Results

RAs are received on end hosts connected to Vlan1033 as expected:

   cmooney@elastic1093:~$ sudo tcpdump -e -v -l -nn -i enp59s0f0np0 icmp6
   tcpdump: listening on enp59s0f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
   13:28:55.906194 a4:e1:1a:81:d5:80 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::a6e1:1a04:981:d580 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56
   	hop limit 64, Flags [none], pref medium, router lifetime 600s, reachable time 0ms, retrans timer 0ms
   	  source link-address option (1), length 8 (1): a4:e1:1a:81:d5:80
   	  prefix info option (3), length 32 (4): 2620:0:861:10b::/64, Flags [onlink, auto], valid time 2592000s, pref. time 604800s

DHCP Relay On Stretched Vlan / IRB with Anycast GW

Similar to the previous case, we send a DHCP request from an-worker1148, connected to lsw1-f1 on port xe-0/0/6.

IRB is configured with Anycast GW config like this:

   cmooney@lsw1-f1-eqiad> show configuration interfaces irb.1035   
   virtual-gateway-accept-data;
   description private1-f1-eqiad;
   family inet {
       address 10.64.134.254/24 {
           preferred;
           virtual-gateway-address 10.64.134.1;
       }
   }
   family inet6 {
       address 2620:0:861:10d::254/64 {
           preferred;
           virtual-gateway-address 2620:0:861:10d::1;
       }
   }
   virtual-gateway-v4-mac 00:00:5e:11:fa:ce;
   virtual-gateway-v6-mac 00:00:5e:11:fa:ce;


When we run the DHCP client on an-worker1148 we see this on the install server. It is worth noting that the source IP is lsw1-f1-eqiad's unique IP on the subnet, rather than the VIP shared by all devices (this is important to ensure replies go back to the correct device):

   cmooney@install1003:~$ sudo tcpdump -i ens5 -l -p -vvv -nn net 10.64.134.0/24
   tcpdump: listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
   00:09:47.764115 IP (tos 0x0, ttl 63, id 23228, offset 0, flags [none], proto UDP (17), length 407)
       10.64.134.254.67 > 208.80.154.32.67: [udp sum ok] BOOTP/DHCP, Request from e4:3d:1a:54:ab:a7, length 379, hops 1, xid 0x8bee627f, Flags [none] (0x0000)
   	  Gateway-IP 10.64.134.254
   	  Client-Ethernet-Address e4:3d:1a:54:ab:a7
   	  Vendor-rfc1048 Extensions
   	    Magic Cookie 0x63825363
   	    DHCP-Message Option 53, length 1: Discover
   	    Hostname Option 12, length 18: "an-worker1148-test"
   	    Parameter-Request Option 55, length 13: 
   	      Subnet-Mask, BR, Time-Zone, Default-Gateway
   	      Domain-Name, Domain-Name-Server, Option 119, Hostname
   	      Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
   	      NTP
   	    Agent-Information Option 82, length 77: 
   	      Circuit-ID SubOption 1, length 42: lsw1-f1-eqiad:xe-0/0/6.0:private1-f1-eqiad
   	      Unknown SubOption 12, length 31: 
   		0x0000:  0002 0000 0000 0583 0100 0000 6134 3a65
   		0x0010:  313a 3161 3a38 313a 3965 3a38 3000 00
   	    END Option 255, length 0
   	    PAD Option 0, length 0, occurs 21


NOTE: In addition to the above packet, the install server also received a relayed DHCP request from lsw1-e1-eqiad, the only other switch in the fabric with Vlan1035 (private1-f1-eqiad) configured, also with an Anycast GW.

What happens is that the DHCP discover, as well as being relayed to the install server by the receiving switch, is also processed as a regular broadcast within the Vlan. When it is demuxed by other switches it ultimately hits their equivalent IRB interface, triggering the same DHCP relay config. So the remote switches also end up relaying the packet, causing multiple copies to hit the install server. They insert different Option 82 info, however, which won't ever match a config block on the install server:

   Circuit-ID SubOption 1, length 42: lsw1-e1-eqiad:'vtep.32770':private1-f1-eqiad

So despite this traffic being unwanted, it won't cause any problems. Having looked briefly, it does not seem to be possible to prevent this, but we will have limited, if any, stretched Vlans with Anycast GWs, so it's not a big deal.

IP Filters on Routed interface

This test is to validate that IP packets are correctly filtered on direct routed links (i.e. non-IRB interfaces) in a routing-instance/VRF.

IPv4

Relevant Config

New firewall filter was added to both Spine layer devices as follows:

   cmooney@lsw1-e1-eqiad> show configuration firewall family inet filter BLOCK_MS_FE1012 
   term BLOCK {
       from {
           destination-address {
               208.80.154.226/32;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }
   cmooney@lsw1-f1-eqiad> show configuration firewall family inet filter BLOCK_MS_FE1012 
   term BLOCK {
       from {
           destination-address {
               208.80.154.226/32;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }


Test

The above filter was applied on each Spine as an input filter on the link from the CR routers. The filter is applied on both Spines because traffic may arrive via either CR router, so we want to filter in both places to be sure to catch the packets.

   cmooney@lsw1-e1-eqiad# show | compare 
   [edit interfaces et-0/0/48 unit 100 family inet]
   +       filter {
   +           input BLOCK_MS_FE1012;
   +       }
   
   cmooney@lsw1-f1-eqiad# show | compare 
   [edit interfaces et-0/0/48 unit 100 family inet]
   +       filter {
   +           input BLOCK_MS_FE1012;
   +       }


We then kick off a ping from the filtered IP (208.80.154.226), and once it is running we commit the above changes:

   root@ms-fe1012-test:~# ping -I 208.80.154.226 1.1.1.1
   PING 1.1.1.1 (1.1.1.1) from 208.80.154.226 : 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=0.504 ms
   64 bytes from 1.1.1.1: icmp_seq=2 ttl=59 time=0.562 ms
   64 bytes from 1.1.1.1: icmp_seq=3 ttl=59 time=0.564 ms
   64 bytes from 1.1.1.1: icmp_seq=4 ttl=59 time=0.530 ms
   64 bytes from 1.1.1.1: icmp_seq=5 ttl=59 time=0.602 ms
   ^C
   --- 1.1.1.1 ping statistics ---
   14 packets transmitted, 5 received, 64.2857% packet loss, time 13289ms
   rtt min/avg/max/mdev = 0.504/0.552/0.602/0.033 ms

Looking at device F1 we can see the packets were dropped as expected:

   cmooney@lsw1-f1-eqiad> show firewall log | match 1.1.1.1 
   13:22:28  pfe       R      et-0/0/48.100       ICMP            1.1.1.1                          208.80.154.226
   13:22:27  pfe       R      et-0/0/48.100       ICMP            1.1.1.1                          208.80.154.226
   13:22:26  pfe       R      et-0/0/48.100       ICMP            1.1.1.1                          208.80.154.226
   13:22:25  pfe       R      et-0/0/48.100       ICMP            1.1.1.1                          208.80.154.226
   13:22:24  pfe       R      et-0/0/48.100       ICMP            1.1.1.1                          208.80.154.226
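
The filter works by first-match evaluation: the BLOCK term catches return traffic destined to 208.80.154.226 and everything else falls through to ALLOW. The sketch below models that ordering in a few lines. It is an illustration only, with a hypothetical `evaluate` helper and a simplified term structure that covers just the destination-address match used here.

```python
import ipaddress

# Simplified model of the BLOCK_MS_FE1012 filter above: terms are checked in
# order and the first matching term decides the action.
BLOCK_MS_FE1012 = [
    {"name": "BLOCK", "destination": "208.80.154.226/32", "action": "reject"},
    {"name": "ALLOW", "destination": None, "action": "accept"},  # no 'from': matches all
]


def evaluate(filter_terms, destination_ip):
    """Return (term name, action) for a packet with the given destination."""
    dst = ipaddress.ip_address(destination_ip)
    for term in filter_terms:
        prefix = term["destination"]
        if prefix is None or dst in ipaddress.ip_network(prefix):
            return term["name"], term["action"]
    # Junos discards packets that match no term in a filter.
    return None, "discard"


# Echo replies from 1.1.1.1 arrive from the CR with the test host as destination.
print(evaluate(BLOCK_MS_FE1012, "208.80.154.226"))
print(evaluate(BLOCK_MS_FE1012, "208.80.154.227"))
```

Since the filter is applied inbound on the CR-facing link, it is the echo replies that are rejected, which matches the source and destination columns in the firewall log.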

IPv6

Relevant Config

Filter applied on both spine switches:

   cmooney@lsw1-e1-eqiad> show configuration firewall family inet6 filter BLOCK_MS_FE1012_6 
   term BLOCK {
       from {
           destination-address {
               2620:0:861:ed1a::3/128;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }
   cmooney@lsw1-f1-eqiad> show configuration firewall family inet6 filter BLOCK_MS_FE1012_6 
   term BLOCK {
       from {
           destination-address {
               2620:0:861:ed1a::3/128;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }

Tests

Similar to the v4 test the filter is applied on the Spine device links towards the Eqiad CRs (to block return internet traffic):

   cmooney@lsw1-e1-eqiad# show | compare   
   [edit interfaces et-0/0/48 unit 100 family inet6]
   +       filter {
   +           input BLOCK_MS_FE1012_6;
   +       }
   
   
   cmooney@lsw1-f1-eqiad# show | compare 
   [edit interfaces et-0/0/48 unit 100 family inet6]
   +       filter {
   +           input BLOCK_MS_FE1012_6;
   +       }


Firstly we kick off a ping, and then commit the above config on both devices:

   root@ms-fe1012-test:~# ping -6 -I 2620:0:861:ed1a::3 one.one.one.one 
   PING one.one.one.one(one.one.one.one (2606:4700:4700::1111)) from 2620:0:861:ed1a::3 : 56 data bytes
   64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=1 ttl=59 time=0.390 ms
   64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=2 ttl=59 time=0.457 ms
   64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=3 ttl=59 time=0.471 ms
   64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=4 ttl=59 time=0.374 ms
   64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=5 ttl=59 time=0.422 ms
   ^C
   --- one.one.one.one ping statistics ---
   10 packets transmitted, 5 received, 50% packet loss, time 9154ms
   rtt min/avg/max/mdev = 0.374/0.422/0.471/0.037 ms


The logs on the Spine device show the packets dropped as expected:

   cmooney@lsw1-f1-eqiad> show firewall log | match 2620:0:861:ed1a::3 
   13:38:24  pfe       R      et-0/0/48.100       ICMPv6          2606:4700:4700::1111             2620:0:861:ed1a::3
   13:38:23  pfe       R      et-0/0/48.100       ICMPv6          2606:4700:4700::1111             2620:0:861:ed1a::3
   13:38:22  pfe       R      et-0/0/48.100       ICMPv6          2606:4700:4700::1111             2620:0:861:ed1a::3
   13:38:21  pfe       R      et-0/0/48.100       ICMPv6          2606:4700:4700::1111             2620:0:861:ed1a::3
   13:38:20  pfe       R      et-0/0/48.100       ICMPv6          2606:4700:4700::1111             2620:0:861:ed1a::3

IP Filters on IRB interface

This test is to validate that IP packets sourced from end-hosts connected to a Vlan by access or trunk links, and sent outside their subnet via an IRB gateway on the top-of-rack switch, are properly filtered by ACLs/firewall filters applied to the IRB interface.

IPv4

Relevant Configuration

New firewall filter added as follows:

   cmooney@lsw1-e3-eqiad> show configuration firewall family inet filter BLOCK_ELASTIC1093 
   term BLOCK {
       from {
           source-address {
               10.64.132.2/32;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }

Tests

The firewall filter was applied to irb.1033 after a ping was initiated from elastic1093-test:

   cmooney@lsw1-e3-eqiad# show | compare 
   [edit interfaces irb unit 1033 family inet]
   +       filter {
   +           input BLOCK_ELASTIC1093;
   +       }
   
   {master:0}[edit]
   cmooney@lsw1-e3-eqiad# commit 
   configuration check succeeds
   commit complete

When the filter was applied the ping stopped and the host began to receive ICMP "packet filtered" messages (type 3, code 13) from the switch instead:

   root@elastic1093-test:~# ping -I 10.64.132.2 10.3.0.1
   PING 10.3.0.1 (10.3.0.1) from 10.64.132.2 : 56(84) bytes of data.
   64 bytes from 10.3.0.1: icmp_seq=1 ttl=61 time=0.234 ms
   64 bytes from 10.3.0.1: icmp_seq=2 ttl=61 time=0.246 ms
   64 bytes from 10.3.0.1: icmp_seq=3 ttl=61 time=0.264 ms
   64 bytes from 10.3.0.1: icmp_seq=4 ttl=61 time=0.189 ms
   64 bytes from 10.3.0.1: icmp_seq=5 ttl=61 time=0.257 ms
   From 10.64.132.1 icmp_seq=6 Packet filtered
   From 10.64.132.1 icmp_seq=7 Packet filtered
   From 10.64.132.1 icmp_seq=8 Packet filtered
   From 10.64.132.1 icmp_seq=9 Packet filtered

Switch firewall logs show the dropped packets too:

   cmooney@lsw1-e3-eqiad> show firewall log 
   Log :
   Time      Filter    Action Interface           Protocol        Src Addr                         Dest Addr
   10:48:26  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:22  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:21  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:20  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:20  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:19  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1
   10:48:18  pfe       R      xe-0/0/21.0         ICMP            10.64.132.2                      10.3.0.1
   10:48:17  pfe       R      xe-0/0/21.0         ICMP            10.64.132.2                      10.3.0.1
   10:48:17  pfe       R      xe-0/0/21.0         TCP             10.64.132.2                      10.64.132.1

IPv6

Same sort of test is repeated with IPv6. Test filter:

   cmooney@lsw1-e3-eqiad# run show configuration firewall family inet6 filter BLOCK-ELASTIC1093_6 
   term BLOCK {
       from {
           source-address {
               2620:0:861:10b:10:64:132:2/128;
           }
       }
       then {
           log;
           reject;
       }
   }
   term ALLOW {
       then accept;
   }

Results

Change committed as follows after the ping was started:

   cmooney@lsw1-e3-eqiad# show | compare 
   [edit interfaces irb unit 1033 family inet6]
   +    filter {
   +        input BLOCK-ELASTIC1093_6;
   +    }
   
   {master:0}[edit interfaces irb unit 1033]
   cmooney@lsw1-e3-eqiad# commit
   configuration check succeeds
   commit complete

Pings stopped on commit as expected, and the host started getting ICMPv6 "admin prohibited" (type 1, code 1) back from the switch:

   root@elastic1093-test:~# ping -I 2620:0:861:10b:10:64:132:2 authdns1001.wikimedia.org
   PING authdns1001.wikimedia.org(authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134)) from 2620:0:861:10b:10:64:132:2 : 56 data bytes
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=1 ttl=61 time=0.161 ms
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=2 ttl=61 time=0.168 ms
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=3 ttl=61 time=0.176 ms
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=4 ttl=61 time=0.211 ms
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=5 ttl=61 time=0.234 ms
   64 bytes from authdns1001.wikimedia.org (2620:0:861:2:208:80:154:134): icmp_seq=6 ttl=61 time=0.180 ms
   From irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1) icmp_seq=7 Destination unreachable: Administratively prohibited
   From irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1) icmp_seq=8 Destination unreachable: Administratively prohibited
   From irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1) icmp_seq=9 Destination unreachable: Administratively prohibited

Firewall logs also show the drops:

   cmooney@lsw1-e3-eqiad> show firewall log               
   Log :
   Time      Filter    Action Interface           Protocol        Src Addr                         Dest Addr
   11:54:14  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       fe80::a6e1:1a04:981:d580
   11:52:44  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       2620:0:861:2:208:80:154:134
   11:52:43  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       2620:0:861:2:208:80:154:134
   11:52:42  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       2620:0:861:2:208:80:154:134
   11:52:41  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       2620:0:861:2:208:80:154:134
   11:52:40  pfe       R      xe-0/0/21.0         ICMPv6          2620:0:861:10b:10:64:132:2       2620:0:861:2:208:80:154:134


Failover Tests

The previous set of tests validated the configuration and operation of the network in normal circumstances. We also need to verify that failover works as expected with the given configuration.

Spine Switch Failure

The first case we need to consider is what happens if a Spine switch fails. In our topology the Spines provide all access to the other datacenter rows, as well as the outside internet. So we need to know that if one of them fails unexpectedly, traffic will continue to flow via the other.

Test Setup

A ping to an internet host was run from elastic1093-test, which is connected to Leaf switch LSW1-E3. The destination IP for the ping was one which caused LSW1-E3 to ECMP it via LSW1-E1, and take the path out to the internet via CR1-EQIAD. While this ping was ongoing, the power cables were removed from LSW1-E1, invalidating the path the ping traffic was taking.
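
Choosing a destination that is load-balanced onto a particular Spine can be pictured with a toy model. This is only a sketch: the hash function and input fields the QFX actually uses are not covered on this page, so the SHA-256 hash, the `pick_next_hop` and `find_destination_via` helpers and the candidate list below are all hypothetical.

```python
import hashlib


def pick_next_hop(src_ip, dst_ip, next_hops):
    """Pick one next hop for a flow by hashing its addresses (illustrative only)."""
    if not next_hops:
        raise ValueError("no next hops available")
    digest = hashlib.sha256(f"{src_ip}>{dst_ip}".encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


def find_destination_via(src_ip, candidates, next_hops, wanted):
    """Return the first candidate destination that hashes onto the wanted hop."""
    for dst_ip in candidates:
        if pick_next_hop(src_ip, dst_ip, next_hops) == wanted:
            return dst_ip
    return None


spines = ["lsw1-e1", "lsw1-f1"]
candidates = ["1.1.1.1", "1.0.0.1", "8.8.8.8", "8.8.4.4", "9.9.9.9"]

# Look for a destination whose flow is (in this toy model) carried via lsw1-e1.
print(find_destination_via("208.80.154.229", candidates, spines, "lsw1-e1"))

# With lsw1-e1 gone, every flow is forced onto the remaining Spine.
print(pick_next_hop("208.80.154.229", "1.1.1.1", ["lsw1-f1"]))
```

The point of the model is that a given source and destination pair always maps to the same next hop while the set of next hops is unchanged, so the path has to be confirmed per flow before the failure is triggered.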

Firstly we verify the traffic is routing out via E1 for the given src + dst IP:

   root@elastic1093-test:~# mtr --address 208.80.154.229 -z -b -w -c 5 1.1.1.1
   Start: 2022-02-18T07:59:07+0000
   HOST: elastic1093-test                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS???    irb-1033.lsw1-e3-eqiad.eqiad.wmnet (10.64.132.1)   0.0%     5    1.4   4.8   1.4   8.4   2.7
     2. AS???    irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)   0.0%     5    6.1   8.8   3.3  24.7   9.0
     3. AS???    et-1-0-2-100.cr1-eqiad.eqiad.wmnet (10.66.0.8)     0.0%     5    0.5   0.5   0.4   0.7   0.1
     4. AS14907  ae0.cr2-eqiad.wikimedia.org (208.80.154.194)       0.0%     5    0.3   0.8   0.2   2.7   1.1
     5. AS???    13335.ash.equinix.com (206.126.237.30)            60.0%     5    1.3   8.1   1.3  14.8   9.6
     6. AS13335  172.70.172.2                                       0.0%     5    7.5   3.0   1.6   7.5   2.6
     7. AS13335  one.one.one.one (1.1.1.1)                          0.0%     5    0.7   0.7   0.7   0.7   0.0

Then we kick off our ping, asking on-site staff to power down E1 once it's running:

   root@elastic1093-test:~# ping 1.1.1.1
   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=0.574 ms
   64 bytes from 1.1.1.1: icmp_seq=2 ttl=59 time=0.605 ms
   64 bytes from 1.1.1.1: icmp_seq=3 ttl=59 time=0.625 ms
   64 bytes from 1.1.1.1: icmp_seq=4 ttl=59 time=0.629 ms
   64 bytes from 1.1.1.1: icmp_seq=5 ttl=59 time=0.610 ms
   64 bytes from 1.1.1.1: icmp_seq=6 ttl=59 time=0.630 ms
   64 bytes from 1.1.1.1: icmp_seq=7 ttl=59 time=0.523 ms
   64 bytes from 1.1.1.1: icmp_seq=8 ttl=59 time=0.564 ms
   64 bytes from 1.1.1.1: icmp_seq=9 ttl=59 time=0.642 ms
   64 bytes from 1.1.1.1: icmp_seq=10 ttl=59 time=0.635 ms
   64 bytes from 1.1.1.1: icmp_seq=11 ttl=59 time=0.581 ms
   64 bytes from 1.1.1.1: icmp_seq=12 ttl=59 time=0.544 ms
   64 bytes from 1.1.1.1: icmp_seq=13 ttl=59 time=0.634 ms
   64 bytes from 1.1.1.1: icmp_seq=14 ttl=59 time=0.581 ms
   64 bytes from 1.1.1.1: icmp_seq=15 ttl=59 time=0.586 ms
   64 bytes from 1.1.1.1: icmp_seq=16 ttl=59 time=0.558 ms
   64 bytes from 1.1.1.1: icmp_seq=17 ttl=59 time=0.609 ms
   64 bytes from 1.1.1.1: icmp_seq=18 ttl=59 time=0.557 ms
   64 bytes from 1.1.1.1: icmp_seq=19 ttl=59 time=0.624 ms
   64 bytes from 1.1.1.1: icmp_seq=20 ttl=59 time=0.565 ms
   64 bytes from 1.1.1.1: icmp_seq=21 ttl=59 time=0.536 ms
   64 bytes from 1.1.1.1: icmp_seq=22 ttl=59 time=0.596 ms
   64 bytes from 1.1.1.1: icmp_seq=23 ttl=59 time=0.589 ms
   64 bytes from 1.1.1.1: icmp_seq=24 ttl=59 time=0.609 ms
   64 bytes from 1.1.1.1: icmp_seq=25 ttl=59 time=0.525 ms
   64 bytes from 1.1.1.1: icmp_seq=26 ttl=59 time=0.626 ms
   64 bytes from 1.1.1.1: icmp_seq=27 ttl=59 time=0.584 ms
   64 bytes from 1.1.1.1: icmp_seq=28 ttl=59 time=0.541 ms
   64 bytes from 1.1.1.1: icmp_seq=29 ttl=59 time=0.722 ms
   64 bytes from 1.1.1.1: icmp_seq=30 ttl=59 time=0.628 ms
   64 bytes from 1.1.1.1: icmp_seq=31 ttl=59 time=0.572 ms
   64 bytes from 1.1.1.1: icmp_seq=32 ttl=59 time=0.512 ms
   64 bytes from 1.1.1.1: icmp_seq=33 ttl=59 time=0.633 ms
   64 bytes from 1.1.1.1: icmp_seq=34 ttl=59 time=0.611 ms
   64 bytes from 1.1.1.1: icmp_seq=35 ttl=59 time=0.618 ms
   64 bytes from 1.1.1.1: icmp_seq=36 ttl=59 time=0.579 ms
   64 bytes from 1.1.1.1: icmp_seq=37 ttl=59 time=0.716 ms
   64 bytes from 1.1.1.1: icmp_seq=38 ttl=59 time=0.579 ms
   64 bytes from 1.1.1.1: icmp_seq=39 ttl=59 time=0.561 ms
   64 bytes from 1.1.1.1: icmp_seq=40 ttl=59 time=0.537 ms
   64 bytes from 1.1.1.1: icmp_seq=41 ttl=59 time=0.645 ms
   64 bytes from 1.1.1.1: icmp_seq=42 ttl=59 time=0.521 ms
   64 bytes from 1.1.1.1: icmp_seq=43 ttl=59 time=0.559 ms
   64 bytes from 1.1.1.1: icmp_seq=44 ttl=59 time=0.602 ms
   64 bytes from 1.1.1.1: icmp_seq=45 ttl=59 time=0.526 ms
   64 bytes from 1.1.1.1: icmp_seq=46 ttl=59 time=0.578 ms
   64 bytes from 1.1.1.1: icmp_seq=47 ttl=59 time=0.977 ms
   64 bytes from 1.1.1.1: icmp_seq=48 ttl=59 time=0.481 ms
   64 bytes from 1.1.1.1: icmp_seq=49 ttl=59 time=0.602 ms
   64 bytes from 1.1.1.1: icmp_seq=50 ttl=59 time=0.553 ms
   64 bytes from 1.1.1.1: icmp_seq=51 ttl=59 time=0.708 ms
   64 bytes from 1.1.1.1: icmp_seq=52 ttl=59 time=0.494 ms
   64 bytes from 1.1.1.1: icmp_seq=53 ttl=59 time=0.556 ms
   64 bytes from 1.1.1.1: icmp_seq=54 ttl=59 time=0.513 ms
   64 bytes from 1.1.1.1: icmp_seq=55 ttl=59 time=0.544 ms
   64 bytes from 1.1.1.1: icmp_seq=56 ttl=59 time=0.487 ms
   64 bytes from 1.1.1.1: icmp_seq=57 ttl=59 time=0.481 ms
   64 bytes from 1.1.1.1: icmp_seq=58 ttl=59 time=0.467 ms
   64 bytes from 1.1.1.1: icmp_seq=59 ttl=59 time=0.497 ms
   64 bytes from 1.1.1.1: icmp_seq=60 ttl=59 time=0.490 ms
   64 bytes from 1.1.1.1: icmp_seq=61 ttl=59 time=0.564 ms
   64 bytes from 1.1.1.1: icmp_seq=62 ttl=59 time=0.505 ms
   64 bytes from 1.1.1.1: icmp_seq=63 ttl=59 time=0.529 ms
   64 bytes from 1.1.1.1: icmp_seq=64 ttl=59 time=0.527 ms
   64 bytes from 1.1.1.1: icmp_seq=65 ttl=59 time=0.556 ms
   64 bytes from 1.1.1.1: icmp_seq=66 ttl=59 time=0.592 ms
   64 bytes from 1.1.1.1: icmp_seq=67 ttl=59 time=0.541 ms
   64 bytes from 1.1.1.1: icmp_seq=68 ttl=59 time=0.561 ms
   64 bytes from 1.1.1.1: icmp_seq=69 ttl=59 time=0.475 ms
   64 bytes from 1.1.1.1: icmp_seq=70 ttl=59 time=0.586 ms
   64 bytes from 1.1.1.1: icmp_seq=71 ttl=59 time=0.539 ms
   64 bytes from 1.1.1.1: icmp_seq=72 ttl=59 time=0.525 ms
   64 bytes from 1.1.1.1: icmp_seq=73 ttl=59 time=0.635 ms
   64 bytes from 1.1.1.1: icmp_seq=74 ttl=59 time=0.541 ms
   64 bytes from 1.1.1.1: icmp_seq=75 ttl=59 time=0.538 ms
   64 bytes from 1.1.1.1: icmp_seq=76 ttl=59 time=0.568 ms
   64 bytes from 1.1.1.1: icmp_seq=77 ttl=59 time=0.553 ms
   ^C
   --- 1.1.1.1 ping statistics ---
   77 packets transmitted, 77 received, 0% packet loss, time 77791ms
   rtt min/avg/max/mdev = 0.467/0.576/0.977/0.070 ms
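
A capture like the one above can also be checked for loss mechanically rather than by eye. The following is a minimal sketch, not part of the original test procedure: it assumes the Linux iputils `ping` output format shown here, and the function names `missing_sequences` and `parse_summary` are hypothetical helpers introduced for illustration. It looks for gaps in the `icmp_seq` values and reads the counters from the summary line.

```python
import re
from typing import List, Optional, Tuple

# Patterns assume the iputils ping output format seen in the captures.
SEQ_RE = re.compile(r"icmp_seq=(\d+)")
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received.*?([\d.]+)% packet loss"
)


def missing_sequences(ping_output: str) -> List[int]:
    """Return icmp_seq values absent between the first and last reply seen."""
    seen = {int(s) for s in SEQ_RE.findall(ping_output)}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def parse_summary(ping_output: str) -> Optional[Tuple[int, int, float]]:
    """Return (transmitted, received, loss_percent), or None if no summary."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


# Tail of the capture above, copied verbatim.
CAPTURE_TAIL = """\
64 bytes from 1.1.1.1: icmp_seq=75 ttl=59 time=0.538 ms
64 bytes from 1.1.1.1: icmp_seq=76 ttl=59 time=0.568 ms
64 bytes from 1.1.1.1: icmp_seq=77 ttl=59 time=0.553 ms
--- 1.1.1.1 ping statistics ---
77 packets transmitted, 77 received, 0% packet loss, time 77791ms
"""

# Made-up capture with one reply missing, to show what a loss looks like.
LOSSY_EXAMPLE = """\
64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=0.5 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=59 time=0.5 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=59 time=0.5 ms
"""

if __name__ == "__main__":
    print(missing_sequences(CAPTURE_TAIL))   # gaps in the real capture tail
    print(parse_summary(CAPTURE_TAIL))       # counters from the summary line
    print(missing_sequences(LOSSY_EXAMPLE))  # gaps in the made-up capture
```

The gap check only sees losses between the first and last reply in the text it is given, so the summary line remains the authoritative count for the whole run.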


Stopping the ping shortly after confirmation that LSW1-E1 had been powered off, we see that 0 pings were lost, a good sign. We can also see in a traceroute that the traffic is now flowing out via LSW1-F1 to CR2-EQIAD:

   root@elastic1093-test:~# mtr --address 208.80.154.229 -z -b -w -c 5 1.1.1.1
   Start: 2022-02-18T08:08:11+0000
   HOST: elastic1093-test                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS???    irb-1033.lsw1-e3-eqiad.eqiad.wmnet (10.64.132.1)   0.0%     5    7.4   3.8   1.4   7.4   2.4
     2. AS???    irb-1035.lsw1-f1-eqiad.eqiad.wmnet (10.64.134.1)   0.0%     5    8.3   5.3   3.2   8.3   2.0
     3. AS???    et-1-0-2-100.cr2-eqiad.eqiad.wmnet (10.66.0.10)    0.0%     5    0.6   0.8   0.4   1.5   0.5
     4. AS???    13335.ash.equinix.com (206.126.237.30)            40.0%     5    1.6   7.7   1.6  19.5  10.2
     5. AS13335  172.70.172.2                                       0.0%     5    2.7  11.2   1.2  45.3  19.1
     6. AS13335  one.one.one.one (1.1.1.1)                          0.0%     5    0.7   1.0   0.7   2.4   0.8

So all looks good. The other thing we want to do is verify things return to normal after power is restored, so we kick off another ping and ask on-site to restore power to LSW1-E1:

   root@elastic1093-test:~# ping 1.1.1.1
   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=0.504 ms
   64 bytes from 1.1.1.1: icmp_seq=2 ttl=59 time=0.660 ms
   64 bytes from 1.1.1.1: icmp_seq=3 ttl=59 time=0.524 ms
   64 bytes from 1.1.1.1: icmp_seq=4 ttl=59 time=0.607 ms
   64 bytes from 1.1.1.1: icmp_seq=5 ttl=59 time=0.553 ms
   64 bytes from 1.1.1.1: icmp_seq=6 ttl=59 time=0.543 ms
   64 bytes from 1.1.1.1: icmp_seq=7 ttl=59 time=0.566 ms
   64 bytes from 1.1.1.1: icmp_seq=8 ttl=59 time=0.565 ms
   64 bytes from 1.1.1.1: icmp_seq=9 ttl=59 time=0.559 ms
   64 bytes from 1.1.1.1: icmp_seq=10 ttl=59 time=0.593 ms
   64 bytes from 1.1.1.1: icmp_seq=11 ttl=59 time=0.654 ms
   64 bytes from 1.1.1.1: icmp_seq=12 ttl=59 time=0.565 ms
   64 bytes from 1.1.1.1: icmp_seq=13 ttl=59 time=0.671 ms
   64 bytes from 1.1.1.1: icmp_seq=14 ttl=59 time=0.576 ms
   64 bytes from 1.1.1.1: icmp_seq=15 ttl=59 time=0.605 ms
   64 bytes from 1.1.1.1: icmp_seq=16 ttl=59 time=0.565 ms
   64 bytes from 1.1.1.1: icmp_seq=17 ttl=59 time=0.575 ms
   64 bytes from 1.1.1.1: icmp_seq=18 ttl=59 time=0.700 ms
   64 bytes from 1.1.1.1: icmp_seq=19 ttl=59 time=0.573 ms
   64 bytes from 1.1.1.1: icmp_seq=20 ttl=59 time=0.632 ms
   64 bytes from 1.1.1.1: icmp_seq=21 ttl=59 time=0.580 ms
   64 bytes from 1.1.1.1: icmp_seq=22 ttl=59 time=0.609 ms
   64 bytes from 1.1.1.1: icmp_seq=23 ttl=59 time=0.649 ms
   64 bytes from 1.1.1.1: icmp_seq=24 ttl=59 time=0.639 ms
   64 bytes from 1.1.1.1: icmp_seq=25 ttl=59 time=0.579 ms
   64 bytes from 1.1.1.1: icmp_seq=26 ttl=59 time=0.618 ms
   64 bytes from 1.1.1.1: icmp_seq=27 ttl=59 time=0.597 ms
   64 bytes from 1.1.1.1: icmp_seq=28 ttl=59 time=0.639 ms
   64 bytes from 1.1.1.1: icmp_seq=29 ttl=59 time=0.706 ms
   64 bytes from 1.1.1.1: icmp_seq=30 ttl=59 time=0.557 ms
   64 bytes from 1.1.1.1: icmp_seq=31 ttl=59 time=0.547 ms
   64 bytes from 1.1.1.1: icmp_seq=32 ttl=59 time=0.595 ms
   64 bytes from 1.1.1.1: icmp_seq=33 ttl=59 time=0.583 ms
   64 bytes from 1.1.1.1: icmp_seq=34 ttl=59 time=0.611 ms
   64 bytes from 1.1.1.1: icmp_seq=35 ttl=59 time=0.596 ms
   64 bytes from 1.1.1.1: icmp_seq=36 ttl=59 time=0.591 ms
   64 bytes from 1.1.1.1: icmp_seq=37 ttl=59 time=0.624 ms
   64 bytes from 1.1.1.1: icmp_seq=38 ttl=59 time=0.554 ms
   64 bytes from 1.1.1.1: icmp_seq=39 ttl=59 time=0.606 ms
   64 bytes from 1.1.1.1: icmp_seq=40 ttl=59 time=0.588 ms
   64 bytes from 1.1.1.1: icmp_seq=41 ttl=59 time=0.620 ms
   64 bytes from 1.1.1.1: icmp_seq=42 ttl=59 time=0.562 ms
   64 bytes from 1.1.1.1: icmp_seq=43 ttl=59 time=0.602 ms
   64 bytes from 1.1.1.1: icmp_seq=44 ttl=59 time=0.592 ms
   64 bytes from 1.1.1.1: icmp_seq=45 ttl=59 time=0.615 ms
   64 bytes from 1.1.1.1: icmp_seq=46 ttl=59 time=0.606 ms
   64 bytes from 1.1.1.1: icmp_seq=47 ttl=59 time=0.550 ms
   64 bytes from 1.1.1.1: icmp_seq=48 ttl=59 time=0.593 ms
   64 bytes from 1.1.1.1: icmp_seq=49 ttl=59 time=0.621 ms
   64 bytes from 1.1.1.1: icmp_seq=50 ttl=59 time=0.560 ms
   64 bytes from 1.1.1.1: icmp_seq=51 ttl=59 time=0.597 ms
   64 bytes from 1.1.1.1: icmp_seq=52 ttl=59 time=0.626 ms
   64 bytes from 1.1.1.1: icmp_seq=53 ttl=59 time=0.577 ms
   64 bytes from 1.1.1.1: icmp_seq=54 ttl=59 time=0.558 ms
   64 bytes from 1.1.1.1: icmp_seq=55 ttl=59 time=0.588 ms
   64 bytes from 1.1.1.1: icmp_seq=56 ttl=59 time=0.544 ms
   64 bytes from 1.1.1.1: icmp_seq=57 ttl=59 time=0.564 ms
   64 bytes from 1.1.1.1: icmp_seq=58 ttl=59 time=0.643 ms
   64 bytes from 1.1.1.1: icmp_seq=59 ttl=59 time=0.569 ms
   64 bytes from 1.1.1.1: icmp_seq=60 ttl=59 time=0.573 ms
   64 bytes from 1.1.1.1: icmp_seq=61 ttl=59 time=0.572 ms
   64 bytes from 1.1.1.1: icmp_seq=62 ttl=59 time=0.599 ms
   64 bytes from 1.1.1.1: icmp_seq=63 ttl=59 time=0.596 ms
   64 bytes from 1.1.1.1: icmp_seq=64 ttl=59 time=0.580 ms
   64 bytes from 1.1.1.1: icmp_seq=65 ttl=59 time=0.643 ms
   64 bytes from 1.1.1.1: icmp_seq=66 ttl=59 time=0.601 ms
   64 bytes from 1.1.1.1: icmp_seq=67 ttl=59 time=0.549 ms
   64 bytes from 1.1.1.1: icmp_seq=68 ttl=59 time=0.701 ms
   64 bytes from 1.1.1.1: icmp_seq=69 ttl=59 time=0.653 ms
   64 bytes from 1.1.1.1: icmp_seq=70 ttl=59 time=0.631 ms
   64 bytes from 1.1.1.1: icmp_seq=71 ttl=59 time=0.632 ms
   64 bytes from 1.1.1.1: icmp_seq=72 ttl=59 time=0.617 ms
   64 bytes from 1.1.1.1: icmp_seq=73 ttl=59 time=0.601 ms
   64 bytes from 1.1.1.1: icmp_seq=74 ttl=59 time=0.597 ms
   64 bytes from 1.1.1.1: icmp_seq=75 ttl=59 time=0.668 ms
   64 bytes from 1.1.1.1: icmp_seq=76 ttl=59 time=0.572 ms
   64 bytes from 1.1.1.1: icmp_seq=77 ttl=59 time=0.601 ms
   64 bytes from 1.1.1.1: icmp_seq=78 ttl=59 time=0.637 ms
   64 bytes from 1.1.1.1: icmp_seq=79 ttl=59 time=0.589 ms
   64 bytes from 1.1.1.1: icmp_seq=80 ttl=59 time=0.571 ms
   64 bytes from 1.1.1.1: icmp_seq=81 ttl=59 time=0.608 ms
   64 bytes from 1.1.1.1: icmp_seq=82 ttl=59 time=0.674 ms
   64 bytes from 1.1.1.1: icmp_seq=83 ttl=59 time=0.583 ms
   64 bytes from 1.1.1.1: icmp_seq=84 ttl=59 time=0.629 ms
   64 bytes from 1.1.1.1: icmp_seq=85 ttl=59 time=0.629 ms
   64 bytes from 1.1.1.1: icmp_seq=86 ttl=59 time=0.592 ms
   64 bytes from 1.1.1.1: icmp_seq=87 ttl=59 time=0.610 ms
   64 bytes from 1.1.1.1: icmp_seq=88 ttl=59 time=0.542 ms
   ^C
   --- 1.1.1.1 ping statistics ---
   88 packets transmitted, 88 received, 0% packet loss, time 89071ms
   rtt min/avg/max/mdev = 0.504/0.598/0.706/0.039 ms

Again no packets lost, and when we do a trace out we can see the traffic is flowing via lsw1-e1 again:

   root@elastic1093-test:~# mtr --address 208.80.154.229 -z -b -w -c 5 1.1.1.1
   Start: 2022-02-18T08:15:14+0000
   HOST: elastic1093-test                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS???    irb-1033.lsw1-e3-eqiad.eqiad.wmnet (10.64.132.1)   0.0%     5    1.6   5.1   1.6   8.1   3.2
     2. AS???    irb-1031.lsw1-e1-eqiad.eqiad.wmnet (10.64.130.1)   0.0%     5    6.1  13.9   0.8  51.9  21.6
     3. AS???    et-1-0-2-100.cr1-eqiad.eqiad.wmnet (10.66.0.8)     0.0%     5    0.5   0.5   0.4   0.6   0.1
     4. AS14907  ae0.cr2-eqiad.wikimedia.org (208.80.154.194)       0.0%     5    0.3   0.6   0.3   1.6   0.6
     5. AS???    ???                                               100.0     5    0.0   0.0   0.0   0.0   0.0
     6. AS13335  172.70.172.2                                       0.0%     5    2.2   1.8   1.2   2.2   0.4
     7. AS13335  one.one.one.one (1.1.1.1)                          0.0%     5    0.7   0.7   0.7   0.7   0.0

Leaf to Spine Link Failure

This was not tested separately. From the Leaf switch's point of view, a failure of the link to the Spine has the same effect as the Spine device itself failing, which was covered by the power-off test above. The Leaf has no way to tell whether the device or just the link has failed, so it behaves the same way in both cases.

Spine to CR Link Failure

In this test we want to pull the cable running from lsw1-e1 to cr1-eqiad and observe what happens. Testing is again run from elastic1093-test, connected to lsw1-e3. LSW1-E3 will not notice any physical change or difference in IGP topology; however, it should get an EVPN BGP WITHDRAW for the default route it was receiving from LSW1-E1 with LSW1-E1's own VTEP IP as next-hop. LSW1-E3 will still receive a default BGP route from LSW1-E1, but this will be the route LSW1-E1 knows from LSW1-F1, and the next-hop VTEP will be that of LSW1-F1.

This process should happen very quickly, and all traffic should flow from LSW1-E3 to LSW1-F1 shortly after the cable pull and resulting BGP WITHDRAW.
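
The expected behaviour can be pictured with a toy model. This is only an illustrative sketch and does not reflect how Junos selects paths: the class `DefaultRouteNextHops` is hypothetical, and the CRC32-based per-flow choice is an assumed stand-in for whatever best-path and load-balancing logic the switches actually apply. It shows a Leaf holding two next-hop VTEPs for the default route, losing one on WITHDRAW, and getting it back when the route is re-advertised.

```python
import zlib
from typing import List, Optional, Tuple

Flow = Tuple[str, str]  # (source address, destination address)


class DefaultRouteNextHops:
    """Toy stand-in for the default-route next-hop VTEPs held by a Leaf."""

    def __init__(self) -> None:
        self._vteps: List[str] = []

    def advertise(self, vtep: str) -> None:
        """Add a next-hop VTEP, as if a default route had been received."""
        if vtep not in self._vteps:
            self._vteps.append(vtep)
            # Keep a stable order so a flow maps back to the same VTEP
            # once the original set of next-hops is restored.
            self._vteps.sort()

    def withdraw(self, vtep: str) -> None:
        """Remove a next-hop VTEP, as if a BGP WITHDRAW had been received."""
        if vtep in self._vteps:
            self._vteps.remove(vtep)

    def next_hop(self, flow: Flow) -> Optional[str]:
        """Pick a next-hop for a flow (assumed hash, illustration only)."""
        if not self._vteps:
            return None
        digest = zlib.crc32("|".join(flow).encode())
        return self._vteps[digest % len(self._vteps)]


def simulate_cable_pull() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return the next-hop before, during and after the simulated failure."""
    table = DefaultRouteNextHops()
    table.advertise("lsw1-e1")
    table.advertise("lsw1-f1")
    flow = ("2620:0:861:ed1a::6", "2606:4700::6810:2d63")

    before = table.next_hop(flow)
    table.withdraw("lsw1-e1")   # cable to cr1-eqiad pulled
    during = table.next_hop(flow)
    table.advertise("lsw1-e1")  # cable restored
    after = table.next_hop(flow)
    return before, during, after


if __name__ == "__main__":
    print(simulate_cable_pull())
```

Whichever switch the assumed hash picks initially, the flow can only use lsw1-f1 while the lsw1-e1 route is withdrawn, and it returns to its original next-hop once both routes are present again.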

Results

Firstly we establish that our ping routes out from LSW1-E3 via LSW1-E1 for our chosen src and destination:

   root@elastic1093-test:~# mtr --address 2620:0:861:ed1a::6 -z -b -w -c 5 www.ietf.org
   Start: 2022-02-18T08:15:56+0000
   HOST: elastic1093-test                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS14907  irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)    0.0%     5    0.5   0.5   0.5   0.6   0.0
     2. AS14907  irb-1039.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:100::1)    0.0%     5    0.5   0.5   0.5   0.6   0.0
     3. AS14907  et-1-0-2-100.cr1-eqiad.eqiad.wmnet (2620:0:861:fe07::1)   0.0%     5    0.8   0.6   0.4   0.8   0.2
     4. AS14907  ae0.cr2-eqiad.wikimedia.org (2620:0:861:fe00::2)          0.0%     5    2.1   0.9   0.4   2.1   0.7
     5. AS???    13335.ash.equinix.com (2001:504:0:2:0:1:3335:1)          80.0%     5    2.1   2.1   2.1   2.1   0.0
     6. AS13335  2400:cb00:354:3::                                         0.0%     5    0.4   3.9   0.4   8.2   3.5
     7. AS13335  2606:4700::6810:2d63                                      0.0%     5    0.5   1.3   0.4   4.6   1.8

Next we kick off the ping, and ask DC-Ops to pull the cable in the meantime:

   root@elastic1093-test:~# ping6 -I 2620:0:861:ed1a::6 www.ietf.org
   PING www.ietf.org(2606:4700::6810:2d63 (2606:4700::6810:2d63)) from 2620:0:861:ed1a::6 : 56 data bytes
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=1 ttl=59 time=4.69 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=2 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=3 ttl=59 time=0.364 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=4 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=5 ttl=59 time=0.374 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=6 ttl=59 time=0.470 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=7 ttl=59 time=0.454 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=8 ttl=59 time=0.489 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=9 ttl=59 time=0.376 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=10 ttl=59 time=0.709 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=11 ttl=59 time=4.29 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=12 ttl=59 time=1.28 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=13 ttl=59 time=0.458 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=14 ttl=59 time=0.414 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=15 ttl=59 time=14.4 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=16 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=17 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=18 ttl=59 time=0.463 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=19 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=20 ttl=59 time=0.457 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=21 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=22 ttl=59 time=0.432 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=23 ttl=59 time=0.458 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=24 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=25 ttl=59 time=0.426 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=26 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=27 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=28 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=29 ttl=59 time=0.340 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=30 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=31 ttl=59 time=0.461 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=32 ttl=59 time=0.432 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=33 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=34 ttl=59 time=0.427 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=35 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=36 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=37 ttl=59 time=0.346 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=38 ttl=59 time=0.435 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=39 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=40 ttl=59 time=0.439 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=41 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=42 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=43 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=44 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=45 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=46 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=47 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=48 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=49 ttl=59 time=0.427 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=50 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=51 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=52 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=53 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=54 ttl=59 time=0.351 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=55 ttl=59 time=0.437 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=56 ttl=59 time=0.429 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=57 ttl=59 time=0.452 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=58 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=59 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=60 ttl=59 time=0.457 ms
   ^C
   --- www.ietf.org ping statistics ---
   60 packets transmitted, 60 received, 0% packet loss, time 59427ms
   rtt min/avg/max/mdev = 0.340/0.814/14.417/1.917 ms


Zero packets lost again, which is good. Let's verify the traffic is taking the new path:

   root@elastic1093-test:~# mtr --address 2620:0:861:ed1a::6 -z -b -w -c 5 www.ietf.org
   Start: 2022-02-18T08:21:40+0000
   HOST: elastic1093-test                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS14907  irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)    0.0%     5    0.5   0.5   0.5   0.6   0.0
     2. AS14907  irb-1035.lsw1-f1-eqiad.eqiad.wmnet (2620:0:861:10d::1)    0.0%     5    0.7  10.2   0.6  48.3  21.3
     3. AS14907  et-1-0-2-100.cr2-eqiad.eqiad.wmnet (2620:0:861:fe08::1)   0.0%     5    0.7   0.5   0.4   0.7   0.1
     4. AS???    13335.ash.equinix.com (2001:504:0:2:0:1:3335:1)          40.0%     5   12.0   7.5   1.5  12.0   5.4
     5. AS13335  2400:cb00:350:3::                                         0.0%     5    4.7   2.0   1.1   4.7   1.5
     6. AS13335  2606:4700::6810:2c63                                      0.0%     5    0.5   0.4   0.4   0.5   0.0
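
The path check above can be automated in the same spirit. The sketch below is an assumption-laden example, not part of the original testing: it expects the report layout produced by the `mtr -z -b -w` invocation shown above, and `parse_hops` and `path_uses` are hypothetical helper names. It extracts the hostname of each hop and tests whether a given switch appears in the path.

```python
import re
from typing import List, Tuple

# Matches lines such as "  2. AS14907  host.example (addr)  0.0% ...".
HOP_RE = re.compile(r"^[ \t]*(\d+)\.[ \t]+(AS\S+)[ \t]+(\S+)", re.MULTILINE)


def parse_hops(report: str) -> List[Tuple[int, str, str]]:
    """Return (hop number, AS field, hostname) for each hop line."""
    return [(int(n), asn, host) for n, asn, host in HOP_RE.findall(report)]


def path_uses(report: str, device: str) -> bool:
    """True if any hop hostname contains the given device name."""
    return any(device in host for _, _, host in parse_hops(report))


# First three hops of the report above, copied verbatim.
REPORT = """\
HOST: elastic1093-test                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS14907  irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)    0.0%     5    0.5   0.5   0.5   0.6   0.0
  2. AS14907  irb-1035.lsw1-f1-eqiad.eqiad.wmnet (2620:0:861:10d::1)    0.0%     5    0.7  10.2   0.6  48.3  21.3
  3. AS14907  et-1-0-2-100.cr2-eqiad.eqiad.wmnet (2620:0:861:fe08::1)   0.0%     5    0.7   0.5   0.4   0.7   0.1
"""

if __name__ == "__main__":
    for hop in parse_hops(REPORT):
        print(hop)
    print("via lsw1-f1:", path_uses(REPORT, "lsw1-f1"))
    print("via lsw1-e1:", path_uses(REPORT, "lsw1-e1"))
```

The substring match is deliberately simple; a device name that is a prefix of another (for example a hypothetical `lsw1-e10`) would need a stricter comparison.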

All looking good, so we ask on-site to restore the link and monitor to make sure there are no issues:

   root@elastic1093-test:~# ping6 -I 2620:0:861:ed1a::6 www.ietf.org
   PING www.ietf.org(2606:4700::6810:2d63 (2606:4700::6810:2d63)) from 2620:0:861:ed1a::6 : 56 data bytes
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=1 ttl=59 time=0.370 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=2 ttl=59 time=0.353 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=3 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=4 ttl=59 time=0.442 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=5 ttl=59 time=0.428 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=6 ttl=59 time=0.438 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=7 ttl=59 time=0.341 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=8 ttl=59 time=0.335 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=9 ttl=59 time=0.361 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=10 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=11 ttl=59 time=5.31 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=12 ttl=59 time=0.353 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=13 ttl=59 time=0.448 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=14 ttl=59 time=0.420 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=15 ttl=59 time=0.447 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=16 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=17 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=18 ttl=59 time=0.335 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=19 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=20 ttl=59 time=0.427 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=21 ttl=59 time=0.606 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=22 ttl=59 time=0.475 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=23 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=24 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=25 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=26 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=27 ttl=59 time=0.338 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=28 ttl=59 time=0.342 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=29 ttl=59 time=0.426 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=30 ttl=59 time=0.331 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=31 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=32 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=33 ttl=59 time=0.463 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=34 ttl=59 time=0.496 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=35 ttl=59 time=0.538 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=36 ttl=59 time=0.420 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=37 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=38 ttl=59 time=0.794 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=39 ttl=59 time=0.608 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=40 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=41 ttl=59 time=0.438 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=42 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=43 ttl=59 time=0.545 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=44 ttl=59 time=0.542 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=45 ttl=59 time=0.426 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=46 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=47 ttl=59 time=0.483 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=48 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=49 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=50 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=51 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=52 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=53 ttl=59 time=0.448 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=54 ttl=59 time=0.638 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=55 ttl=59 time=0.452 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=56 ttl=59 time=0.475 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=57 ttl=59 time=0.466 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=58 ttl=59 time=0.401 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=59 ttl=59 time=0.522 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=60 ttl=59 time=1.22 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=61 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=62 ttl=59 time=0.448 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=63 ttl=59 time=0.426 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=64 ttl=59 time=0.340 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=65 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=66 ttl=59 time=0.547 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=67 ttl=59 time=0.362 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=68 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=69 ttl=59 time=0.373 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=70 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=71 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=72 ttl=59 time=0.520 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=73 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=74 ttl=59 time=0.641 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=75 ttl=59 time=0.432 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=76 ttl=59 time=1.78 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=77 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=78 ttl=59 time=0.428 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=79 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=80 ttl=59 time=0.430 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=81 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=82 ttl=59 time=0.380 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=83 ttl=59 time=0.451 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=84 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=85 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=86 ttl=59 time=0.387 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=87 ttl=59 time=2.15 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=88 ttl=59 time=0.473 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=89 ttl=59 time=0.523 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=90 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=91 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=92 ttl=59 time=0.452 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=93 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=94 ttl=59 time=0.457 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=95 ttl=59 time=0.351 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=96 ttl=59 time=0.518 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=97 ttl=59 time=0.551 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=98 ttl=59 time=0.439 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=99 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=100 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=101 ttl=59 time=0.445 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=102 ttl=59 time=0.501 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=103 ttl=59 time=0.455 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=104 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=105 ttl=59 time=0.443 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=106 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=107 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=108 ttl=59 time=0.457 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=109 ttl=59 time=0.443 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=110 ttl=59 time=0.451 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=111 ttl=59 time=0.442 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=112 ttl=59 time=0.335 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=113 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=114 ttl=59 time=0.452 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=115 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=116 ttl=59 time=0.718 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=117 ttl=59 time=0.377 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=118 ttl=59 time=0.416 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=119 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=120 ttl=59 time=3.95 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=121 ttl=59 time=0.529 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=122 ttl=59 time=0.447 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=123 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=124 ttl=59 time=0.444 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=125 ttl=59 time=0.435 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=126 ttl=59 time=0.466 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=127 ttl=59 time=0.451 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=128 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=129 ttl=59 time=0.435 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=130 ttl=59 time=0.457 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=131 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=132 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=133 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=134 ttl=59 time=0.460 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=135 ttl=59 time=0.336 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=136 ttl=59 time=0.460 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=137 ttl=59 time=0.449 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=138 ttl=59 time=0.420 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=139 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=140 ttl=59 time=0.465 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=141 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=142 ttl=59 time=0.435 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=143 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=144 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=145 ttl=59 time=0.462 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=146 ttl=59 time=0.443 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=147 ttl=59 time=0.449 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=148 ttl=59 time=0.442 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=149 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=150 ttl=59 time=0.353 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=151 ttl=59 time=0.463 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=152 ttl=59 time=0.428 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=153 ttl=59 time=0.466 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=154 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=155 ttl=59 time=0.455 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=156 ttl=59 time=0.453 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=157 ttl=59 time=0.422 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=158 ttl=59 time=0.461 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=159 ttl=59 time=0.387 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=160 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=161 ttl=59 time=0.449 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=162 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=163 ttl=59 time=0.364 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=164 ttl=59 time=0.431 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=165 ttl=59 time=0.428 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=166 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=167 ttl=59 time=0.433 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=168 ttl=59 time=0.465 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=169 ttl=59 time=0.439 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=170 ttl=59 time=0.456 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=171 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=172 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=173 ttl=59 time=0.449 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=174 ttl=59 time=0.499 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=175 ttl=59 time=0.449 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=176 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=177 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=178 ttl=59 time=4.20 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=179 ttl=59 time=0.495 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=180 ttl=59 time=0.357 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=181 ttl=59 time=0.433 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=182 ttl=59 time=0.441 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=183 ttl=59 time=0.446 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=184 ttl=59 time=0.354 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=185 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=186 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=187 ttl=59 time=0.434 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=188 ttl=59 time=0.424 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=189 ttl=59 time=0.425 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=190 ttl=59 time=0.481 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=191 ttl=59 time=0.459 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=192 ttl=59 time=0.445 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=193 ttl=59 time=0.423 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=194 ttl=59 time=0.535 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=195 ttl=59 time=0.440 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=196 ttl=59 time=0.456 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=197 ttl=59 time=0.363 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=198 ttl=59 time=0.421 ms
   64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=199 ttl=59 time=0.426 ms
   ^C
   --- www.ietf.org ping statistics ---
   199 packets transmitted, 199 received, 0% packet loss, time 199094ms
   rtt min/avg/max/mdev = 0.331/0.521/5.307/0.525 ms
   root@elastic1093-test:~# mtr --address 2620:0:861:ed1a::6 -z -b -w -c 5 www.ietf.org
   Start: 2022-02-18T08:25:48+0000
   HOST: elastic1093-test                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. AS14907  irb-1033.lsw1-e3-eqiad.eqiad.wmnet (2620:0:861:10b::1)    0.0%     5    0.6   1.7   0.5   6.2   2.5
     2. AS14907  irb-1039.lsw1-e1-eqiad.eqiad.wmnet (2620:0:861:100::1)    0.0%     5    0.7   1.0   0.5   2.8   1.0
     3. AS14907  et-1-0-2-100.cr1-eqiad.eqiad.wmnet (2620:0:861:fe07::1)   0.0%     5    0.4   0.5   0.4   0.6   0.1
     4. AS14907  ae0.cr2-eqiad.wikimedia.org (2620:0:861:fe00::2)          0.0%     5    0.5   0.6   0.3   1.2   0.4
     5. AS???    13335.ash.equinix.com (2001:504:0:2:0:1:3335:1)          80.0%     5   29.6  29.6  29.6  29.6   0.0
     6. AS13335  2400:cb00:354:3::                                         0.0%     5    1.1   1.8   0.5   5.0   1.8
     7. AS13335  2606:4700::6810:2d63                                      0.0%     5    0.5   0.6   0.5   0.8   0.1

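Again, no packets were lost, and the mtr output shows that by the end of the test traffic is back on the original path. The 80% loss reported at hop 5 applies only to replies from that hop itself; the hops beyond it show 0% loss.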
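For long captures like the one above, checking for gaps by eye is error-prone. The following is a minimal sketch, not part of the original testing, of how saved ping output could be scanned for missing icmp_seq values and latency spikes. The function name summarize_ping and the 2.0 ms spike threshold are assumptions chosen for illustration, and the pattern assumes the Linux iputils ping reply format shown above.

```python
import re

# Matches the per-reply lines printed by Linux iputils ping, e.g.
# "... icmp_seq=120 ttl=59 time=3.95 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_ping(output, spike_ms=2.0):
    """Return (missing_seqs, spikes) for a block of ping output.

    spike_ms is an arbitrary illustrative threshold (hypothetical),
    not a value taken from the tests on this page.
    """
    replies = {}
    for line in output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            replies[int(match.group(1))] = float(match.group(2))
    if not replies:
        return [], []
    first, last = min(replies), max(replies)
    # Any sequence number between the first and last reply with no
    # matching line indicates a lost (or unlogged) packet.
    missing = [seq for seq in range(first, last + 1) if seq not in replies]
    # Replies at or above the threshold are reported as latency spikes.
    spikes = [(seq, rtt) for seq, rtt in sorted(replies.items()) if rtt >= spike_ms]
    return missing, spikes


# Three reply lines copied from the capture above.
sample = """\
64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=119 ttl=59 time=0.444 ms
64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=120 ttl=59 time=3.95 ms
64 bytes from 2606:4700::6810:2d63 (2606:4700::6810:2d63): icmp_seq=121 ttl=59 time=0.529 ms
"""

missing, spikes = summarize_ping(sample)
print("missing:", missing)
print("spikes:", spikes)
```

An empty "missing" list corresponds to the 0% packet loss in the ping summary, while the "spikes" list highlights replies, such as icmp_seq=120, whose round-trip time was briefly elevated.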