SONiC/Dell Enterprise SONiC Evaluation

From Wikitech

Background

For many years Wikimedia have used Juniper equipment for all networking requirements (currently edge/WAN routers, datacenter switches, management firewalls). While we are broadly happy with Juniper, it is also imperative to assess alternatives, ensuring the foundation gets value for money and the best performance possible.

Foundation Costs

Recent years have seen the cost of datacenter switches in particular increasing. This has partially been driven by a gradual move from 1G to faster connections to end-hosts, with the newer equipment supporting 10G+ speeds being pricier. But there have also been increased costs for software licenses, which in the past were part of the 'base' system, pushing up the overall price. The supply-chain / chip shortage problems that emerged from 2020 onwards have only accelerated this trend.

Open Source

JunOS, Juniper's operating system, stands out in the foundation as one of the largest closed-source / proprietary software systems in use. The foundation's values, on the other hand, strongly express a preference for open source tools. This is qualified by stating that it is acceptable to use "closed tools (such as software, operating systems, etc.) where there is currently no open-source tool that will effectively meet our needs."

Proprietary operating systems have long been the norm for network devices. These typically use custom ASICs for packet forwarding, and are not based on the largely open x86/amd64 architecture which server operating systems target. The specialized and proprietary nature of such hardware has seen vendors offering "vertically integrated" software/hardware stacks since the dawn of the industry.

White Box

In more recent years there has been some movement away from this. Driven initially by the large web-scalers, disaggregated or white box switching has risen to prominence. In this model the switching hardware is provided by one company, and the operating system is sourced elsewhere (much like one buys a Dell server and runs Debian or Windows on it, without consulting Dell). Such an approach offers many advantages, like being able to change hardware vendors but keep the same operating system, or change the OS in use on existing hardware. "White box" switch hardware (from the likes of EdgeCore and Quanta) is typically available for a substantially lower cost than brand-name alternatives. There can be drawbacks, however, such as not having a "one stop shop" for support.

Ecosystem

Another caveat is that a small number of ASIC vendors, notably Broadcom, have the switching market carved up. These vendors often gate access to their designs and SDKs, limiting the scope for independent parties to create software for them. In one famous case Broadcom ceased licensing its SDK to Cumulus Networks after Cumulus was acquired by rival hardware manufacturer Nvidia. This left customers forced to choose another hardware supplier, or move to another OS, when it came time to upgrade. The reality right now is that it is not possible to produce an operating system for switching hardware without permission from the ASIC vendor.

Nevertheless the space has opened up and there are several "white box" NOSes available, even if things won't ever be as open as for server hardware. Options include commercial offerings such as PicOS, ArcOS and OcNOS, as well as open-source projects such as DANOS and OpenSwitch.

SONiC

Of the various open-source options SONiC has become one of the most popular, with significant industry support. Initially developed by Microsoft to power their Azure cloud service, it has since been open-sourced and become part of the OCP Networking Project, with software development stewarded by the Linux Foundation.

It leverages the Switch Abstraction Interface (SAI), also defined by OCP, to communicate with switching silicon. Significantly, Broadcom has contributed a lot to its development, providing an SAI implementation for their ASICs and also committing to continued support for future silicon they develop.

Architecture

SONiC is based on Debian Linux, with the SAI added to provide an interface to the switch hardware. This makes it very easy to get to grips with for SREs who are already familiar with Debian. It is a modular distro in which networking applications (e.g. FRR, LLDP, LACP, NAT etc.) run independently in dedicated Docker containers, each of which uses Redis as an information source to share configuration and state info.

The modular Linux-based nature makes it easy for new applications to be developed or added to the platform, as well as for common Linux automation tooling to be leveraged. It can, for instance, run a standard puppet agent installed from upstream Debian repos, or a Prometheus Node Exporter. It ships with various containerized daemons to provide functionality, most notably employing FRRouting for routing protocols such as OSPF/BGP. While each of these sub-components has its own configuration files and syntax, and various YANG models are defined for specific configuration elements, there is inconsistent coverage between the various ways to configure devices. More recently the Management Framework has been introduced to provide a unified way to configure all these elements. It offers an "industry standard" (i.e. Cisco-like) CLI, as well as REST and gRPC endpoints for the current set of supported YANG models.
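As a rough illustration of what addressing a YANG-modelled element over REST can look like, the sketch below builds a RESTCONF-style data URL following the generic RFC 8040 path rules. The hostname, the interface name and the assumption that the switch exposes the OpenConfig interfaces model at this exact path are all hypothetical; the Dell documentation should be checked for the real endpoints.

```python
from urllib.parse import quote


def restconf_data_url(host, module, container, list_name, key):
    """Build an RFC 8040 style data-resource URL for one YANG list entry."""
    # RFC 8040 requires list key values to be percent-encoded in the path,
    # so a "/" inside an interface name must not be sent literally.
    encoded_key = quote(key, safe="")
    return f"https://{host}/restconf/data/{module}:{container}/{list_name}={encoded_key}"


# Hypothetical host and interface name, used only to show the encoding.
url = restconf_data_url(
    "switch1.example.org",
    "openconfig-interfaces",
    "interfaces",
    "interface",
    "Ethernet1/1",
)
print(url)
```

The function only constructs the URL and sends nothing, so it can be tried without access to a device.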

It supports a dedicated management VRF for connecting a device's management-only network interface. SSH is supported as one would expect, and it supports the standard SNMP MIBs any other Debian system would. Redis is the ultimate store of the full configuration for all elements, and the DB is written to /etc/sonic/config_db.json for persistence. Network state is synced to the Linux kernel, so standard Linux command line interfaces such as iproute2 can be used to view state. Using such tools to modify state is highly discouraged.
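Because the persisted configuration is plain JSON, it is easy to inspect with ordinary tooling. The sketch below parses a small inline sample rather than the real file; the table names ("VLAN", "VLAN_MEMBER"), the "Vlan|Port" key format and the field names are assumptions about the schema and should be checked against an actual config_db.json before relying on them.

```python
import json

# Illustrative sample only; not taken from a real device.
SAMPLE_CONFIG_DB = """
{
  "VLAN": {
    "Vlan100": {"vlanid": "100"}
  },
  "VLAN_MEMBER": {
    "Vlan100|Ethernet0": {"tagging_mode": "untagged"},
    "Vlan100|Ethernet4": {"tagging_mode": "tagged"}
  }
}
"""


def vlan_members(config_db):
    """Group (port, tagging_mode) pairs by VLAN name."""
    members = {}
    for key, attrs in config_db.get("VLAN_MEMBER", {}).items():
        # Keys are assumed to be "<vlan>|<port>".
        vlan, port = key.split("|", 1)
        members.setdefault(vlan, []).append((port, attrs.get("tagging_mode")))
    return members


config = json.loads(SAMPLE_CONFIG_DB)
members = vlan_members(config)
print(members)
```

On a switch the same function could be fed json.load() output from the file itself, for read-only inspection.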

SONiC Support

SONiC's open source nature is in stark contrast with the more traditional network operating systems, which are provided with hardware and software support from the vendor. If you are running SONiC there is no TAC to contact for assistance if something does not work as expected. Certainly if there is a hardware fault with a device you can go back to the HW vendor for replacement, but outside that users are on their own.

Unlike perhaps the situation with server/x86 based platforms, there is a fairly small install-base of SONiC users. This means community support is limited. Many SONiC users, like Microsoft, LinkedIn or Alibaba, operate at massive scale, contribute to SONiC themselves, and have staff internally who can provide support, bug fixes, diagnostics etc. For smaller enterprises, however, the lack of any support or sufficient internal resources to deal with problems can be challenging. Smaller outfits often also require more or different features than the web-scalers, which SONiC lacked in the early days.

This situation has made some smaller enterprises wary of moving to SONiC, forgoing the support they're accustomed to from their existing vendors. While Juniper support has not been stellar in recent years, SRE netops are broadly of the opinion that moving to a completely new platform without any OS support would represent an unacceptable risk.

Dell Enterprise SONiC

Dell have been producing network devices for several years now. Anecdotally it is common to hear less than favorable opinions from network engineers about the Force-10 OS they ship with these. So perhaps it is not surprising that Dell have decided to offer SONiC as an OS option for some of their switches, and bridge the support and feature gap to make it more attractive to small and medium sized enterprises.

Dell Enterprise SONiC is the result. This initiative has seen them become one of the largest contributors to the SONiC project over the past few years. They offer two variants of the OS, standard and premium. Standard is their build of the upstream open-source project, built and released on a regular schedule. It may contain Dell contributions not yet merged into the upstream project, but does not contain any closed source elements. The premium variant offers richer analytics and features, such as Mirror on Drop and Inband Flow Analysis. It may also contain closed source features that won't be upstreamed to the open source release.

In terms of WMF requirements and longer-term direction the standard build covers our needs. Each version is available in either a "cloud bundle" or "enterprise bundle". The enterprise bundle is required by WMF, supporting VXLAN/EVPN which is not available in the cloud offering.

Dell Network Switches

Dell Enterprise SONiC runs on only a small subset of the network devices they produce, namely those based on the Broadcom Trident 3 ASIC (similar to Juniper QFX5120 series).

Initially spurred by a desire to explore more open networking platforms, and then by concerns about cost and lead-time for Juniper equipment, SRE Netops arranged with Dell to get some network devices on test. Specifically they delivered 2 of each of these models:

Model | Description | Juniper Equivalent
Dell S5248F-ON | 48xSFP28 + 6xQSFP28 Top-of-Rack / Leaf Switch | QFX5120-48Y
Dell S5232F-ON | 32xQSFP28 Aggregation / Spine Switch | QFX5120-32C

Lab Setup

These were set up in codfw in a basic Spine/Leaf topology. All links were enabled for IP and run BGP to exchange routes (the underlay), with iBGP running over the top for the EVPN SAFI (the overlay). More details at SONiC.
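To make the two layers of BGP concrete, the sketch below enumerates the sessions for a small leaf/spine fabric. The hostnames are hypothetical, and the overlay design (leaves holding iBGP EVPN sessions only with the spines, for example with the spines acting as route reflectors) is an assumption for illustration; the lab's actual overlay peering layout is not described here.

```python
from itertools import product

# Hypothetical hostnames: two spines and two leaves.
SPINES = ["spine1", "spine2"]
LEAVES = ["leaf1", "leaf2"]


def bgp_sessions(spines, leaves):
    """List one underlay and one overlay session per leaf/spine pair."""
    sessions = []
    for leaf, spine in product(leaves, spines):
        # Underlay: session on the directly connected link, exchanging IP routes.
        sessions.append({"leaf": leaf, "spine": spine, "layer": "underlay"})
        # Overlay: iBGP session carrying the EVPN address family.
        # Assumption: leaves peer only with spines, not with each other.
        sessions.append({"leaf": leaf, "spine": spine, "layer": "overlay"})
    return sessions


sessions = bgp_sessions(SPINES, LEAVES)
underlay = [s for s in sessions if s["layer"] == "underlay"]
overlay = [s for s in sessions if s["layer"] == "overlay"]
print(len(underlay), "underlay sessions,", len(overlay), "overlay sessions")
```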

EVPN was used to allow for stretched layer-2 segments to be created, similar to the design for the 2022 Eqiad expansion. This is the only supported mechanism for stretched layer-2 segments using SONiC, other than a basic Spanning-Tree/trunking configuration, which is not suitable for a variety of reasons. As in the Eqiad EVPN setup, running EVPN/VXLAN for L2 extension necessitates the use of overlay VRFs for L3 networking. A single VRF was created for the testing, to which all external L3 connections were terminated.

Test Criteria

Various tests were carried out to validate that the data-plane functions required of our top-of-rack and aggregation switches worked as expected on the Dell platform. Tests were done to validate that the devices could support both our legacy "row-wide/L2" topology and the newer "per-rack/L3 ToR" designs (as seen in Drmrs and the Eqiad Expansion).

The main elements that were tested are shown in the table below. All functions were validated locally, i.e. between ports on a single device, as well as across the "fabric" between ports connected to different switches.

Title | Description
Transceiver Support | Test that the fs.com optic modules we commonly use are supported and work as expected.
L2 Segmentation | Ability to define Vlans and place ports into them to create virtual L2 segments.
L2 Switching | Ability for end hosts to exchange traffic directly at layer 2, when connected to the same Vlan. Correct learning of MAC addresses and distribution to remote devices / addition to remote device MAC forwarding tables. Correct forwarding of frames for broadcast, unknown or multicast destinations. Failover works as expected if links go down. Jumbo frame support.
L3 Routing | Basic routing in the overlay VRF, i.e. reachability to directly connected networks works ok, routes are correctly propagated to all devices, failover works as expected. All tests validated for both IPv4 and IPv6.
eBGP | BGP peering as required to external elements (i.e. CR routers, end-hosts running BGP for load-balancing, anycast etc.). Correct propagation of externally learnt routes to all devices in EVPN fabric. BFD support in VRF.
Anycast Gateway | Use of a distributed anycast-gateway to provide a local IP first-hop on every edge device in a stretched L2 Vlan.
Required Services | Validate various functions work as needed: DHCP Option 82 insertion, DHCP relay, IPv6 RA generation, SNMP, SSH, user account creation.
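Jumbo frame support across the fabric is worth a quick calculation, since VXLAN encapsulation adds headers to every frame. The sketch below works out the IP MTU the fabric links need in order to carry a host packet without fragmentation. The 9000-byte host MTU is only an example value, and the calculation assumes the inner frame carries no 802.1Q tag.

```python
# Header sizes in bytes.
INNER_ETHERNET = 14  # original host frame header, carried inside the tunnel
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4 = 20
OUTER_IPV6 = 40


def vxlan_overhead(ipv6_underlay=False):
    """Bytes added on top of the host's IP packet, measured at the outer IP layer."""
    outer_ip = OUTER_IPV6 if ipv6_underlay else OUTER_IPV4
    return INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + outer_ip


def required_underlay_ip_mtu(host_ip_mtu, ipv6_underlay=False):
    """Smallest IP MTU on fabric links that avoids fragmenting encapsulated packets."""
    return host_ip_mtu + vxlan_overhead(ipv6_underlay)


print(required_underlay_ip_mtu(9000))        # IPv4 underlay
print(required_underlay_ip_mtu(9000, True))  # IPv6 underlay
```

With an IPv4 underlay the overhead is the familiar 50 bytes, so fabric links need an IP MTU at least 50 bytes larger than the one configured on hosts.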

Test Results

Detailed test results and documentation can be viewed.

In general all required functionality was supported and the tests were successful. A few elements didn't function exactly as we'd expect, but all are very minor and certainly not "show stoppers".

Item | Description
DHCP Option 82 | The system supports the insertion of DHCP Option 82 information into DHCP requests sourced by end hosts, and will include the source port and switch hostname, which is the info we require. The format is slightly different from what the Juniper QFX sends, but we can change our DHCPd config on install hosts to accommodate it. Medium-term we will likely move away from dependency on top-of-rack switches inserting this information for reimaging hosts, and work towards using DHCP Option 97 information which the hosts themselves include in requests.
IPv6 Router Advertisements | This functionality is supported in FRRouting, included in SONiC, however SONiC's data model for YANG or CLI configuration does not include it. So it can only be configured from the FRR "vtysh" shell, outside the normal configuration framework. Dell have committed to adding this to the regular command line / YANG model in an upcoming release.
IPv6 Link-Local used for ICMP Messages | The devices default to using an interface's link-local IPv6 address when sourcing ICMPv6 messages. This didn't cause any actual issue other than traceroutes showing the link-local IPs, which isn't much use.
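For reference, Option 82 carries its data as a series of type/length/value sub-options, with sub-option 1 being the circuit ID and sub-option 2 the remote ID (RFC 3046). The sketch below parses that generic layout. The sample values are invented, and which sub-option the Dell or Juniper devices use for the port and hostname, and in what string format, is not something this example claims to know.

```python
SUBOPTION_NAMES = {1: "circuit_id", 2: "remote_id"}


def parse_option82(payload):
    """Split the Option 82 payload into its sub-options (RFC 3046 TLV layout)."""
    result = {}
    i = 0
    while i + 2 <= len(payload):
        code, length = payload[i], payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option")
        result[SUBOPTION_NAMES.get(code, f"suboption_{code}")] = value
        i += 2 + length
    if i != len(payload):
        raise ValueError("trailing data after last sub-option")
    return result


# Invented example values, purely to exercise the parser.
circuit = b"Ethernet12"
remote = b"lsw1-test"
sample = bytes([1, len(circuit)]) + circuit + bytes([2, len(remote)]) + remote
parsed = parse_option82(sample)
print(parsed)
```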

Conclusions

Overall Dell Enterprise SONiC worked very well, and we did not encounter any significant problems that would cause us to rule out the platform. In general the Linux base made it easy to navigate and get to grips with, and the CLI and configuration were straightforward and easy to use.

Pros

  • System works well and is easy to get working.
  • Dell are keen to make the product a success, and seem keen to provide hands-on assistance / support.
    • That could of course change if it does get more traction, and support is passed to more junior staff.
  • Experience / confidence gained by using the Dell-supported version might allow us to eventually transition to using the purely open-source release, eliminating software and support costs.
  • Debian base makes it easy to integrate into our overall stack, opens up new possibilities.
  • Dell are the only vendor who have given us short lead-times for datacenter switches in 2022.
  • Broadcom Trident 3 is the same hardware as in the current-gen QFX series, so we can expect similar performance.
  • Familiar with Dell procurement, RMA process etc. with server hardware, could leverage that experience for switches too.

Cons

  • OS lacks "frills" and "nerd knobs" to configure the same variety of features as Juniper.
    • This can be viewed as a positive in that the code base is smaller, and thus potentially has fewer bugs.
    • What we need right now is covered, so no massive problem.
  • Relatively small installed base and newness of the OS may mean there are lots of unfound bugs.
    • JunOS by contrast is a lot more mature and has a massive install base.
    • There is much more documentation and community resources available for JunOS.
  • Introducing a new vendor for datacenter switches, different to that used for edge routers, firewalls etc., fragments our infrastructure resulting in higher management overhead for SRE.
  • While Dell and others have done a good job creating a single management interface for the platform, the reality of the multiple underlying components operating independently is still not completely masked.
  • There appear to be no barriers to automating the platform, but re-writing Homer to support Netconf/YANG style data models, and adding a new transport module to support their API, will take a considerable amount of effort.
  • Lack of familiarity compared to JunOS
  • No SNMP MIB support for device-specific / environmental data. So cannot easily integrate with LibreNMS for that.
    • Can export to Prometheus relatively easily however.
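As a sketch of what exporting environmental data to Prometheus could involve, the function below renders sensor readings in the Prometheus text exposition format. One possible route, given that a Node Exporter can run on the device, is writing such output to a file picked up by its textfile collector. The metric name, label names and readings are hypothetical, and how the values would be read from the switch is not covered.

```python
def render_gauge(name, help_text, samples):
    """Render (labels, value) samples as a gauge in Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between runs.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and readings.
output = render_gauge(
    "switch_temperature_celsius",
    "Temperature reported by a switch sensor.",
    [({"sensor": "cpu"}, 41.5), ({"sensor": "asic"}, 55.0)],
)
print(output, end="")
```

Label values are inserted as-is, so this minimal version assumes they contain no quotes, backslashes or newlines.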

Costs

Given the different sales models used by different vendors for hardware, software and support, it is not always possible to do a direct comparison between vendors. The sheet below does give a break-down based on Juniper list prices, recent Juniper quotes, and quotes from Dell (including discount) for equipment.

https://docs.google.com/spreadsheets/d/1OWPPyrpXvfTSqavaw0_uxafwNRyiMxpvy_1GUQUPE1w

The TL;DR is that the costs are quite close on most comparisons. It may work out slightly cheaper in some cases to go with Dell, but either way there are not massive savings for the foundation in switching.

Note on Netconf/YANG

While the work required to uplift Homer to use a Netconf-based approach is not inconsiderable, in the longer term this is probably advantageous, whether we stick with Juniper or not.

It would allow us to more easily move between vendors. Even if we define configuration based on proprietary YANG models (for instance for Juniper or SONiC), having the same mechanism to build configurations and interact with devices makes migration a lot simpler. Further there may be scope to use vendor-neutral configuration models, such as those defined by the IETF or OpenConfig initiative, if they provide sufficient coverage in terms of features.
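The idea can be sketched as one set of intent data rendered two ways: as JunOS-style "set" commands, and as a structure shaped like the OpenConfig interfaces model. This is not how Homer works today; the function names, interface names and values are hypothetical, and real YANG coverage on either platform would need to be verified.

```python
import json


def render_junos_set(name, description, mtu):
    """Render the interface intent as JunOS 'set' commands."""
    return [
        f'set interfaces {name} description "{description}"',
        f"set interfaces {name} mtu {mtu}",
    ]


def render_openconfig(name, description, mtu):
    """Render the same intent in the shape of the OpenConfig interfaces model."""
    return {
        "openconfig-interfaces:interfaces": {
            "interface": [
                {
                    "name": name,
                    "config": {"name": name, "description": description, "mtu": mtu},
                }
            ]
        }
    }


# Hypothetical intent; interface naming differs per platform.
junos_lines = render_junos_set("xe-0/0/1", "test-host1", 9192)
openconfig_doc = render_openconfig("Ethernet0", "test-host1", 9192)
print("\n".join(junos_lines))
print(json.dumps(openconfig_doc, indent=2))
```

The point of the second renderer is that its output is structured data, which can be validated against a model and pushed over Netconf or REST rather than templated as CLI text.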

So it's worthwhile to uplift Homer to support Netconf anyway, and we should aim to carry out that work regardless. Of course there is less urgency to move away from CLI templates if we stick with Juniper in the short term.

Verdict

Overall, despite the good experience with Dell Enterprise SONiC, netops' preference is to stick with Juniper QFX series. Reasons include:

  • Familiarity and real-world production experience gives higher confidence than any lab-testing could.
  • JunOS overall seems a more mature and feature-rich system.
  • Both options come in at a similar price.
  • Having the same OS/config across edge routers and datacenter switches results in lower management overhead / SRE time.
  • Homer is already built to support JunOS and we have templates for the platform already; adapting for SONiC/Netconf is a big project.
  • Confidence that Juniper will remain in the datacenter market and supply JunOS into the future.
    • While Dell seem committed to SONiC, how long that continues is less certain, and may depend on its success.
  • Given new-ness of Dell SONiC there is some fear adopting it would make us beta-testers for them.
  • DC-Ops estimate that we have enough existing capacity in Eqiad/Codfw to wait for delivery of Juniper QFX devices in 2023.
    • Short-term capacity concerns are not forcing us to select a vendor with faster lead times.
      • It should be noted that this is something of an estimate and may change.

That said the exercise has given us reasonable confidence in Dell's offering, and should we need to move to another platform for any reason it can be considered a viable option. Long-term it may provide a stepping-stone to a more open-source model in line with the foundation's ethos.

The recommendation to stick with Juniper is largely down to being risk-averse, and to JunOS being a solid platform which we are relatively happy with, rather than to any glaring deficiencies in the Dell product.