Peering management
Peering management still involves a largely manual process, mostly because communication (new requests, changes, or issues) happens over email.
Finding peering candidates
peering@ email alias
The easy one, as we usually accept all peering requests.
Equinix peering opportunities portal
https://ix.equinix.com/portal/peering/peering-opportunities
Using their own flow data, Equinix brings to light networks we peer with at some of the Equinix IXPs, but where sessions are missing from other IXPs.
E.g. we peer with AS X at Equinix Ashburn, and both Wikimedia and X are present at Equinix Chicago, but we don't have a BGP session there yet.
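The kind of gap the portal reports can be sketched as a set comparison. This is only an illustration: the function name, the input shapes, and the sample ASN (64500, from the documentation range) are assumptions, and the portal itself works from Equinix's own flow data rather than from lists like these.

```python
def missing_sessions(our_ixps, peer_ixps, sessions):
    """Return, per ASN, the shared IXPs where no BGP session exists yet.

    our_ixps:  set of IXP names where we are present
    peer_ixps: dict mapping ASN -> set of IXP names where that ASN is present
    sessions:  set of (asn, ixp) pairs for sessions we already have
    """
    gaps = {}
    for asn, ixps in peer_ixps.items():
        common = our_ixps & ixps
        peered = {ixp for ixp in common if (asn, ixp) in sessions}
        missing = common - peered
        # Only report networks we already peer with at some other IXP.
        if peered and missing:
            gaps[asn] = sorted(missing)
    return gaps


example = missing_sessions(
    our_ixps={"Equinix Ashburn", "Equinix Chicago"},
    peer_ixps={64500: {"Equinix Ashburn", "Equinix Chicago"}},
    sessions={(64500, "Equinix Ashburn")},
)
print(example)
```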
Netflow
See also Netflow
https://turnilo.wikimedia.org/#wmf_netflow lets us filter and sort all our external traffic on BGP criteria.
For example, https://w.wiki/5mj6 sorts outbound drmrs traffic by AS_PATH (showing the first 2 ASes the traffic transits through) and by destination AS (the final AS of the path).
It's also possible to filter for BGP communities matching peering or transit traffic.
This can help reveal networks (ASNs) that we currently reach through our transit providers. It needs to be combined with PeeringDB to identify the IXPs where those networks are present.
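A rough sketch of combining the two data sources, assuming the flow data has already been exported as (AS path, bytes) pairs and the PeeringDB presence as a dictionary. Both shapes and all names here are hypothetical, not the Turnilo or PeeringDB schemas.

```python
from collections import Counter


def transit_candidates(flows, transit_asns, presence, our_ixps):
    """Rank destination ASNs reached via transit that share an IXP with us.

    flows:        iterable of (as_path, nbytes), as_path being a tuple of ASNs
    transit_asns: set of ASNs of our transit providers
    presence:     dict mapping ASN -> set of IXP names (e.g. from PeeringDB)
    our_ixps:     set of IXP names where we are present
    """
    bytes_by_dst = Counter()
    for as_path, nbytes in flows:
        # Keep flows whose first hop is a transit provider and that
        # terminate in another network.
        if len(as_path) >= 2 and as_path[0] in transit_asns:
            bytes_by_dst[as_path[-1]] += nbytes

    candidates = []
    for asn, total in bytes_by_dst.most_common():
        shared = sorted(presence.get(asn, set()) & our_ixps)
        if shared:
            candidates.append((asn, total, shared))
    return candidates
```

Sorting by traffic volume puts the networks where a direct session would move the most traffic off transit at the top of the list.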
PeeringDB
https://www.peeringdb.com/asn/14907
This is the most comprehensive list of networks present at a given IXP, and thus of candidate networks.
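The same data is available from the PeeringDB REST API. The sketch below assumes the netixlan endpoint and its `name`, `ipaddr4`, and `ipaddr6` fields; check them against the PeeringDB API documentation before relying on this. The helper names are illustrative.

```python
import json
import urllib.request

# Assumed endpoint: netixlan objects list a network's ports at each IXP.
API = "https://www.peeringdb.com/api/netixlan?asn={asn}"


def parse_netixlan(payload):
    """Map IXP name -> list of (IPv4, IPv6) addresses from an API response."""
    presence = {}
    for entry in payload.get("data", []):
        addrs = (entry.get("ipaddr4"), entry.get("ipaddr6"))
        presence.setdefault(entry["name"], []).append(addrs)
    return presence


def fetch_presence(asn):
    """Query PeeringDB for the IXP presence of one ASN (needs network access)."""
    with urllib.request.urlopen(API.format(asn=asn), timeout=10) as resp:
        return parse_netixlan(json.load(resp))
```

For example, `fetch_presence(14907)` would return the IXPs listed on the PeeringDB page linked above.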
Peering News
A Python script that checks for any new routers at the IXPs where we're present. It currently runs weekly on diffscan02 (cloud VPS) via a systemd timer (Puppet profile) and sends its output by email to peering@.
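The core idea, comparing this week's IXP membership with last week's, can be sketched as follows. This is not the actual Peering News script; the snapshot format (IXP name mapped to a set of (ASN, IP) pairs) is an assumption made for illustration.

```python
def new_routers(previous, current):
    """Return routers present in the current snapshot but not the previous one.

    previous, current: dict mapping IXP name -> set of (asn, ip) tuples
    """
    news = {}
    for ixp, members in current.items():
        added = members - previous.get(ixp, set())
        if added:
            news[ixp] = sorted(added)
    return news


last_week = {"IX-A": {(64500, "192.0.2.1")}}
this_week = {"IX-A": {(64500, "192.0.2.1"), (64501, "192.0.2.9")}}
print(new_routers(last_week, this_week))
```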
IX mailing lists
New IXP members are announced on the IXP's mailing list.
Peering workflow
This workflow is quite flexible, as peers' behavior varies greatly.
Setting up new sessions
Blue: via cookbooks, yellow: manual.
Notes
- Even though we prefer not to use any MD5 key, some peers require one. It needs to be added manually after running the configure cookbook.
- If the peer's ETA is far in the future, wait until closer to that date before configuring our side, to limit the risk of alerting/log noise.
Managing down sessions
Notes
- Icinga BGP status alerts (this only applies to alerts in WARNING state): https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=bgp%20status
- It's also possible to check the peer's PeeringDB page directly to see whether the peer is still present at the IXP.
- The configured prefix limit can be seen by running the following command on the alerting router:
show bgp neighbor <peerIP> | match Prefixlimit
(e.g. "inet-unicast Limit: 10000")
- Compare it with the IPv4/IPv6 Prefixes fields of the ASN's PeeringDB page.
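The comparison between the router's limit and PeeringDB could be scripted along these lines. The parsing relies only on the "Limit: 10000" example shown above, so the regular expression is an assumption about the router output, and the function names are hypothetical.

```python
import re


def configured_limit(cli_output):
    """Extract the prefix limit from 'show bgp neighbor' output, or None."""
    match = re.search(r"Limit:\s*(\d+)", cli_output)
    return int(match.group(1)) if match else None


def limit_too_low(cli_output, peeringdb_prefixes):
    """True if PeeringDB advertises more prefixes than the configured limit."""
    limit = configured_limit(cli_output)
    return limit is not None and peeringdb_prefixes > limit
```

Here `peeringdb_prefixes` stands for the IPv4 or IPv6 Prefixes value from the ASN's PeeringDB page, matching the address family of the session.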
Manually increase the prefix limit
If the current limit is too low, set a new custom limit for that peer, generally the PeeringDB limit + 20%:
1. Commands:
configure
set protocols bgp group IX4 neighbor <IP> family inet unicast prefix-limit maximum <new_limit>
set protocols bgp group IX4 neighbor <IP> family inet unicast prefix-limit teardown 80
set protocols bgp group IX4 neighbor <IP> family inet unicast prefix-limit teardown idle-timeout forever
commit
exit
NOTE: For an IPv6 peer replace 'IX4' with 'IX6' and 'inet' with 'inet6'
2. Once the new limit is set, clear the BGP session to the peer:
clear bgp neighbor <IP>
3. After a minute or so, the BGP peer should show as status 'established' if things go OK:
cmooney@cr3-eqsin> show bgp summary | match 27.111.228.33
27.111.228.33 4800 201 8 0 6 1:03 Establ
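The limit calculation and the set commands from step 1 can be generated rather than typed by hand. A minimal sketch, assuming the IX4/IX6 group names from the note above; the function name and the choice to round the 20% margin up are assumptions.

```python
def prefix_limit_commands(neighbor, peeringdb_prefixes, ipv6=False, margin_pct=20):
    """Build the Junos set commands for a custom per-peer prefix limit."""
    group, family = ("IX6", "inet6") if ipv6 else ("IX4", "inet")
    # PeeringDB value plus margin, rounded up using integer arithmetic.
    new_limit = (peeringdb_prefixes * (100 + margin_pct) + 99) // 100
    base = (
        f"set protocols bgp group {group} neighbor {neighbor} "
        f"family {family} unicast prefix-limit"
    )
    return [
        f"{base} maximum {new_limit}",
        f"{base} teardown 80",
        f"{base} teardown idle-timeout forever",
    ]


# 192.0.2.1 is a documentation address used as a placeholder peer IP.
for line in prefix_limit_commands("192.0.2.1", 10000):
    print(line)
```

The output still has to be pasted into a configure session and committed, then followed by the session clear in step 2.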
Providers not using emails to manage peering
Google - https://isp.google.com/
Cloudflare - https://peering.cloudflare.com/
Microsoft - https://learn.microsoft.com/en-us/azure/internet-peering/howto-exchange-portal
Netflix - https://openconnect.zendesk.com/hc/en-us/requests/new?ticket_form_id=360001023311
Possible improvements
- Extensively automate the peering candidate search by joining PeeringDB data, Hive/netflow, the list of current peers (from the routers), and a manual list of exceptions.
- An even more advanced version could show the benefits of extending our peering to new IXPs, based on the list of peers.
- Automate the "BGP sessions down" workflow
- Add a more formal data store/model of all our sessions which is used by Homer to build the complete configuration, and adjust the cookbook to modify this data rather than the device configs.