Quality of Service (Network)
Network QoS
Quality of Service (QoS) is a networking term covering a range of techniques used to profile and, in some circumstances, prioritize particular traffic flows over others.
As communications systems moved to packet-switching, and ultimately to all-IP networks carrying multiple types of traffic (voice, video, storage, control) that at one stage would each have had their own dedicated physical channels, QoS has become an important consideration for network operators.
QoS configuration on network devices defines how packets are queued in internal buffers and scheduled for transmission on the wire. It is important to note that for the most part QoS rules have no significant effect: they only come into play when a device or link is under strain, and lacks sufficient bandwidth for all the data that wants to use it. A good way to think about QoS configuration is as telling routers what traffic to drop when they *have* to drop something. Viewed in this way it's clear that the better solution is to build and operate networks that don't drop any packets. The simplest (and usually cheapest) way to deal with congestion is to provision more bandwidth.
Configuring QoS still makes sense, however, to deal with exceptional circumstances that might arise from time to time. That could be due to irregular traffic flows (sudden changes in application behavior, fault scenarios that reduce capacity, or potentially even denial-of-service attacks). In such scenarios where packet loss cannot be avoided, it helps if the network can make intelligent decisions about what traffic is least important.
QoS in Wikimedia
Traditionally, WMF network devices have no specific QoS configuration applied. All traffic is considered "best effort", and during congestion all traffic flows are equally liable to suffer from drops. By and large this has worked well (TCP and other higher layer protocols help to balance flows). One element that has helped is the relatively low speed that servers are connected at (typically 1G), which acts as a natural limit to how much traffic a single server can send. As we are now connecting more servers at 10G and even 25G, there is increased potential for a handful of servers to generate traffic flows that swamp the core network.
As discussed, the best way to accommodate such flows is to make sure we have sufficient bandwidth throughout the network. But there is something of a chicken-and-egg element here. It doesn't make sense to deploy a lot of additional bandwidth in case applications come along that require it. Likewise it represents a risk to deploy a lot of high-bandwidth servers, with potential to generate significant traffic, knowing that there are bottlenecks and high contention at certain points in the network.
To address this gap SRE is rolling out QoS configuration to our network devices. The goal is to allow us to connect servers at higher speeds, supporting continued growth and consolidation of compute and storage, while remaining confident that mission-critical services won't be starved of bandwidth.
QoS Classes
The first requirement when implementing a QoS framework is deciding how many traffic classes should be created. Obviously the more classes one has, the more finely-grained the policies can be, but this comes at the cost of complexity. Netops are of the opinion that a relatively simple approach, with just a few classes, is the best option. A smaller number of classes also ensures better hardware compatibility, as the number of available queues varies by platform.
The following classes will be defined on network devices:
| Class name | DSCP Marking | DSCP Decimal | DSCP Bits | Scheduling BW% | Description |
|---|---|---|---|---|---|
| Control | CS6 | 48 | 110000 | 5% | Network control traffic (e.g. routing protocols) and critical management services. |
| High | AF21 | 18 | 010010 | 35% | High priority traffic |
| Normal | DE | 0 | 000000 | 50% | Default priority - same as existing single class |
| Low | AF41 | 34 | 100010 | 10% | Low priority "scavenger" class |
- Code-point AF11 is reserved for possible future use if a 'higher than high' traffic priority is deemed necessary.
- Above code-point names are from the DiffServ standard. DE is also commonly referred to as CS0, BE (Best Effort) and DF (Default). This table is a good reference to the various possible markings.
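The decimal and bit columns in the table follow mechanically from the DiffServ bit layout. As a quick sanity check, a few lines of Python reproduce them using the standard formulas (CSx = 8x from RFC 2474, AFxy = 8x + 2y from RFC 2597):

```python
# Derive the DSCP decimal and bit values used in the table above.

def cs(x: int) -> int:
    """Class Selector code-point CSx (RFC 2474): top 3 bits carry x."""
    return 8 * x

def af(klass: int, drop: int) -> int:
    """Assured Forwarding code-point AF<class><drop> (RFC 2597)."""
    return 8 * klass + 2 * drop

classes = {
    "Control (CS6)": cs(6),
    "High (AF21)":   af(2, 1),
    "Normal (DE)":   0,
    "Low (AF41)":    af(4, 1),
}

for name, dscp in classes.items():
    # DSCP is a 6-bit field, hence the 6-digit bit strings in the table.
    print(f"{name}: decimal {dscp}, bits {dscp:06b}")
```

Running this prints decimal 48 / bits 110000 for CS6, 18 / 010010 for AF21, and 34 / 100010 for AF41, matching the table.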
Scheduling Bandwidth
"Scheduling bandwidth" represents the minimum percentage of available link bandwidth that will be dedicated to a class when a link is under saturation. In our setup the "high priority" queue will get 35% of available bandwidth in such a scenario, despite the fact that only a small minority of all application flows will be mapped to it. The majority of our traffic will remain classified as "normal", and contend for the 50% of bandwidth available to it. Finally the "low" priority class gets the remaining 10%, to keep some data flowing within it, but it will suffer most due to the congestion.
All classes will be served by a weighted round-robin scheduler based on their defined scheduling bandwidth. No "expedited" (priority/strict) class is defined, meaning no queue is configured to be served immediately whenever packets arrive on it. Such priority queuing is commonly used for real-time voice and video applications, where the absolute lowest latency and jitter (variation in latency) is required. While we may have high-priority flows, they are data flows rather than real-time communications, so standard, non-expedited queuing is preferable.
It is also worth noting that the percentages simply reflect the scheduling priority. When links are not saturated, any class, including 'low', can use 100% of the available bandwidth.
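The interaction of the two points above can be sketched in a few lines of Python. This is purely illustrative (it is not how the hardware scheduler is implemented): each class gets at least its weighted share under saturation, while unused share is redistributed to classes that still have demand:

```python
# Illustrative sketch: approximate bandwidth shares under weighted
# scheduling. Weights mirror the scheduling percentages above; demand
# values are each class's offered load as a fraction of link rate
# (the numbers are made up for the example).

WEIGHTS = {"control": 5, "high": 35, "normal": 50, "low": 10}

def shares(demand: dict[str, float]) -> dict[str, float]:
    """Grant each class min(demand, weighted share); redistribute leftovers."""
    alloc = {c: 0.0 for c in WEIGHTS}
    remaining = 1.0
    active = {c for c in WEIGHTS if demand[c] > 0}
    while remaining > 1e-9 and active:
        total_w = sum(WEIGHTS[c] for c in active)
        leftover = 0.0
        for c in sorted(active):
            grant = remaining * WEIGHTS[c] / total_w
            take = min(grant, demand[c] - alloc[c])
            alloc[c] += take
            leftover += grant - take
        remaining = leftover
        active = {c for c in active if alloc[c] < demand[c] - 1e-9}
    return alloc

# Uncongested: only 'low' has traffic, so it can use the whole link.
print(shares({"control": 0, "high": 0, "normal": 0, "low": 2.0}))
# Saturated: every class wants the full link; shares fall back to weights.
print(shares({c: 2.0 for c in WEIGHTS}))
```

In the first case 'low' receives 100% of the link; in the second each class falls back to its configured minimum (5/35/50/10).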
DSCP Marking
Trusted vs Untrusted Interfaces
Any QoS design is a network-wide undertaking. A key concept involved is the idea of "trusted" and "untrusted" interfaces.
The basic idea is that where traffic arrives from an external source you can't "trust" the TOS/DSCP marking in the IP header. On these interfaces you need to:
- Map traffic to forwarding classes based on some criteria other than the DSCP bits in the header.
- Rewrite the DSCP bits in the header to the values used elsewhere on the network to represent that traffic class.
In the Wikimedia setup, external internet-facing interfaces are clearly "untrusted" on that basis. Server-facing interfaces on our switches are, on the other hand, considered "trusted", as we control and set DSCP bits on egress from our servers using netfilter. Extending the concept slightly, we don't "trust" any DSCP bits third-party software might set "out of the box" on our servers. All traffic will therefore be marked as DE, classifying it as normal priority, unless specific rules are added in nftables/iptables to classify the flow differently.
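The idea of defaulting everything to DE on the hosts can be sketched in raw nftables syntax. This is a simplified illustration with hypothetical table and chain names; in practice the rules are generated from the Puppet resources described later in this document:

```
table inet qos {
    chain postrouting {
        type filter hook postrouting priority mangle; policy accept;

        # Flow-specific re-marking rules would sit here, each ending
        # in 'return' so marked packets skip the default below.

        # Default: anything not explicitly classified leaves the host
        # marked DE (DSCP 0), i.e. normal priority.
        ip dscp set 0
        ip6 dscp set 0
    }
}
```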
DSCP Marking on Servers
The plan in Wikimedia is to set DSCP values on end servers, leveraging our existing iptables (Ferm) / nftables configuration frameworks. The network devices will be configured to trust the incoming values set on servers, and queue packets accordingly. Puppet will be used to drive the end-host configuration for packet marking.
Various schemes have been proposed for the use of the TOS/DSCP fields over the years, but ultimately there is no universal standard, and these markings are generally ignored or rewritten across the internet. This means they only have significance internally within an organization, and merely serve to identify traffic classes based on local policy. As such any marking works as well as any other, as long as all devices are configured the same. That said, while strict adherence to a marking scheme is not a technical requirement, we try to conform to standards as much as possible. We therefore use the code points / markings defined in RFC 2597, but map traffic into the classes according to our own needs, as we have our own unique mix of traffic types and applications, which don't necessarily correspond to the examples in the RFC or in vendor documentation.
Traffic Classes
As shown in the table, four classes of traffic are defined, which are detailed further below.
Management & Control
This class is used for management and control-plane traffic. It is vital, in the presence of congestion, that such traffic is prioritized to ensure that devices remain reachable via SSH, monitoring continues to work, and router-to-router control-plane traffic (e.g. OSPF, BGP) is served. This ensures that the basic connectivity needed to keep the network running, and to let engineers connect to end systems, remains reliable even in fault scenarios.
High Priority
This class will be used for high-priority application flows as required. It has less scheduling bandwidth than the 'normal' class, but far fewer traffic flows are expected to be mapped into it, giving them a relatively higher weighting. Exactly what traffic should be mapped into it needs to be carefully considered, and discussed with the SRE teams responsible for the relevant applications. Typically only low-throughput, sensitive traffic flows should be mapped to this class. High-throughput bulk data transfers should not be.
While it might look attractive for any given flow to be declared 'high priority', it is easy to negate the usefulness of the category if too many things are mapped to it (if everything is important, nothing is).
Normal Priority
This is the standard class into which all normal application flows are mapped. It can be thought of as the equivalent of our existing, single traffic class across the network. With the possible exception of some management/control traffic, the base server configuration will map all traffic into this class.
Low Priority
This is a "scavenger" class used for flows with below-normal priority. As with the 'high priority' class, we need to carefully consider what should be mapped into it. Unlike the 'high priority' class, however, there is no real danger (on the network side) of marking too much traffic as low priority, so we can be a little less careful about what goes into this class.
Teams are unlikely to deem their applications low priority, but the class can nevertheless work in their favour. A good example of what to place in it is bulk data transfer, such as backup or storage-replication traffic. Such traffic will often use as much bandwidth as it can get. Mapping such flows to the low priority class allows them to use the entire network bandwidth when available, while avoiding any need to place a hard upper limit on what they can use.
This class might also be useful to manage bandwidth at our internet edge, if we are in a situation where external links are being saturated due to a surge in requests. If we deem that certain requests are less important (scraping for instance), and can identify and mark the responses to them at the application layer, the network can be left to decide whether to transmit them or not based on available edge bandwidth. This may be more flexible than imposing fixed rate limits or totally blocking such requests.
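As the OpenSSH example later in this document shows, applications can also set DSCP values on their own sockets, which the host firewall rules can then match on or rewrite. Below is a minimal sketch of socket-level marking, assuming a Linux host; the kernel's IP_TOS option carries the DSCP in the top six bits of the TOS byte:

```python
import socket

# Sketch: mark a socket's traffic as AF41 (our 'low' class) from the
# application itself. DSCP occupies the top 6 bits of the IP TOS byte,
# so the value passed to IP_TOS is the DSCP shifted left by 2.
AF41 = 34
TOS_AF41 = AF41 << 2  # 136, i.e. 0x88 on the wire

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_AF41)

# Read the value back to confirm the kernel accepted it (136 on Linux).
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
sock.close()
```

Note that under the scheme described above such application-set markings are not trusted directly: the host netfilter rules either rewrite them or use them as match criteria (as with OpenSSH's 0x02 marking).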
Puppet
To enable us to place traffic into QoS classes, the existing puppet resources firewall::service and firewall::client have been extended to allow a QoS classification to be added. In both cases a new optional parameter, 'qos', is available. This can be set to low/normal/high as needed to mark matching packets with the correct DSCP values so the network will place them in the right forwarding class.
For example if we wished to place redis replication traffic in the low priority / scavenger class we could add the qos parameter to the existing firewall::service definition:
firewall::service { 'redis_replication':
    proto  => 'tcp',
    port   => $redis_port,
    srange => $redis_replicas,
    qos    => 'low',
}
We need to be mindful, however, that the service definition only applies to the machine running the service, so the above addition would only mark reply traffic that the server sends from $redis_port to clients. For correct operation both sides of a conversation should map to the same class, so we would also need a firewall::client resource on hosts which make outbound connections to $redis_port. For instance:
firewall::client { 'redis_replication':
    proto             => 'tcp',
    port              => $redis_port,
    drange            => $redis_replicas,
    skip_output_chain => true,
    qos               => 'low',
}
It should be noted that many firewall::service definitions already exist in puppet. These are needed as our default firewall policy for packets in the INPUT chain is DROP. So to allow traffic in for any service (redis, ssh, http etc) we need to explicitly permit it in a firewall::service definition. Because our default policy for the OUTPUT chain is ACCEPT, the same is not typically true for clients making requests.
This means that while it should be very easy to add the 'qos' parameter to existing firewall::service definitions, we also need to add completely new firewall::client resource definitions in any roles that need them, to ensure we mark traffic on both sides of the connection.
More Complex Configurations
In some cases the above simple model won't be sufficient to classify a particular type of traffic. For instance if we need to match on more criteria than just the UDP/TCP ports of a service. In these cases we can use the generic `nftables::rules` resource to add additional rules as needed. For instance:
nftables::rules { 'scp-sftp':
    desc  => 'SSH based file transfers in low-prio scavenger class',
    chain => 'postrouting',
    rules => ['tcp sport 22 ip dscp 0x02 ip dscp set af41 return',
              'tcp dport 22 ip dscp 0x02 ip dscp set af41 return',
              'tcp sport 22 ip6 dscp 0x02 ip6 dscp set af41 return',
              'tcp dport 22 ip6 dscp 0x02 ip6 dscp set af41 return'],
}
The above four rules match TCP traffic, for both IPv4 and IPv6, with either source or destination port 22, which already has a DSCP marking of 0x02 (OpenSSH uses this marking for 'high throughput' sub-systems like sftp and scp). They then set the DSCP bits to 'af41', which maps the traffic into our 'low' priority class.
If a given system is not yet migrated to nftables we can add similar rules using the `ferm::rule` resource. Things are a little more complex here as we need to include the match criteria for the 'return' statement also:
ferm::rule { 'dscp-icmp-mon':
    table => 'mangle',
    chain => 'POSTROUTING',
    rule  => 'proto tcp sport ssh mod dscp dscp 0x02 DSCP set-dscp-class AF41; proto tcp sport ssh mod dscp dscp-class AF41 RETURN;',
}
Netops are available to work with teams on creating the most appropriate rules for a given service.
Guidelines for SREs
In general the 'high' priority class should be used for low-bandwidth, latency sensitive, important traffic.
High bandwidth, bulk traffic, such as file transfers, backups, bulk data sync etc. is not suitable for being marked as 'high priority'. This traffic is indeed important, but it is too voluminous to have in the high priority queue if there is a network issue and we end up with congestion.
Such flows are in fact good candidates for the low/scavenger class, ensuring they can use as much bandwidth as available but not cause problems in exceptional circumstances. The fundamental role of QoS is to "keep the lights on" at those times, ensuring our own control traffic and high priority application flows are least affected.
Juniper Class-of-Service
QoS is implemented using the "class of service" configuration in JunOS ("class of service" being a slightly older industry term for the same thing). Most network vendors have similar functionality available, and as we adopt other platforms we will need to create configuration for them that produces the same results. The relatively simple design should help here.
The Juniper CoS framework, like most implementations, can be complex, involving multiple related configuration elements. Within that context we have done as much as possible to keep the configuration simple and understandable.
The overall Juniper framework is outlined below.
Queues and Forwarding-Classes
At the base level the hardware places packets in memory (buffer) structures as they arrive into a system. These structures are dynamically partitioned into multiple numbered queues, and packets are picked from these queues to be transmitted, or dropped, based on a scheduling configuration.
Juniper introduces an abstraction called "forwarding classes", and gives us control over which packets are placed into which forwarding class through the use of classifiers. In turn we define which forwarding classes map to which actual system queues. In theory this allows multiple forwarding classes to map to a single queue, but in our configuration we maintain a 1:1 mapping of forwarding class to queue: one queue is used for all the traffic in a given forwarding class, and no queue serves more than one forwarding class.
We define 4 forwarding classes in the Juniper configuration and map them to queues as follows:
| Forwarding-Class | Queue (MX Routers) | Queue (QFX/EX Switches) |
|---|---|---|
| LOW | 1 | 3 |
| NORMAL | 0 | 0 |
| HIGH | 2 | 4 |
| CONTROL | 3 | 7 |
The queue numbers used match the defaults that are pre-created on the different platforms, even though we rename them. For instance, these are the default classes and queues on a QFX5120:
cmooney@ssw1-d1-codfw> show class-of-service forwarding-class
Forwarding class ID Queue Policing priority No-Loss PFC priority
best-effort 0 0 normal Disabled
fcoe 1 3 normal Enabled
no-loss 2 4 normal Enabled
network-control 3 7 normal Disabled
mcast 8 8 normal Disabled
Our configuration changes these as follows:
cmooney@ssw1-d8-codfw> show class-of-service forwarding-class
Forwarding class ID Queue Policing priority No-Loss PFC priority
normal 0 0 normal Disabled
low 1 3 normal Disabled
high 2 4 normal Disabled
control 3 7 normal Disabled
mcast 8 8 normal Disabled
The system continues to place traffic it generates into queue 0 (normal) and queue 7 (control) as before, so re-using those queue numbers maintains the same priority for that traffic on egress.
Classifiers
Packets arriving into a device need to be mapped to a forwarding class for transmission. This function is known as classification. In general JunOS provides 3 ways to do this:
| Method | Description |
|---|---|
| Default Classifier | A default classifier maps all incoming traffic on a given interface to a single forwarding class. |
| DSCP Classifier | A DSCP classifier can be used to map packets to forwarding classes based on the value of the DSCP bits in each packet on arrival. This is used in the Wikimedia design on our "trusted" interfaces where we know the DSCP markings have already been set by our policy. |
| Firewall Filter | For the maximum flexibility the firewall filter / ACL functions on a device can be used to map traffic to forwarding classes. This can be used to map packets based on the typical criteria (src/dst networks, port numbers, protocol) in any ACL. JunOS also allows for the re-writing of DSCP bits in a firewall action. This is used in the Wikimedia design on our external, internet-facing interfaces. |
DSCP Classifiers
DSCP classifiers can be configured for either IPv4 or IPv6. Rather confusingly, on some platforms both can be defined and applied to an interface, whereas on others only one or the other type can be applied. Where it is not possible to add both an IPv4 and IPv6 classifier to an interface, an IPv4 classifier should be used, and the rules it defines will be applied to both address families.
In either case we classify traffic into the same forwarding class based on the same DSCP bits regardless of address family. We nevertheless define classifiers of both kinds on all devices, as in some cases we need to configure both:
classifiers {
dscp v4_classifier {
forwarding-class control {
loss-priority low code-points 110000;
}
forwarding-class high {
loss-priority low code-points 010010;
}
forwarding-class low {
loss-priority high code-points 100010;
}
forwarding-class normal {
loss-priority high code-points [ 000000 000001 000010 000011 000100 000101 000110 000111 001000 001001 001010 001011 001100 001101 001110 001111 010000 010001 010011 010100 010101 010110 010111 011000 011001 011010 011011 011100 011101 011110 011111 100000 100001 100011 100100 100101 100110 100111 101000 101001 101010 101011 101100 101101 101110 101111 110001 110010 110011 110100 110101 110110 110111 111000 111001 111010 111011 111100 111101 111110 111111 ];
}
}
dscp-ipv6 v6_classifier {
forwarding-class control {
loss-priority low code-points 110000;
}
forwarding-class high {
loss-priority low code-points 010010;
}
forwarding-class low {
loss-priority high code-points 100010;
}
forwarding-class normal {
loss-priority high code-points [ 000000 000001 000010 000011 000100 000101 000110 000111 001000 001001 001010 001011 001100 001101 001110 001111 010000 010001 010011 010100 010101 010110 010111 011000 011001 011010 011011 011100 011101 011110 011111 100000 100001 100011 100100 100101 100110 100111 101000 101001 101010 101011 101100 101101 101110 101111 110001 110010 110011 110100 110101 110110 110111 111000 111001 111010 111011 111100 111101 111110 111111 ];
}
}
}
Interface Classifier Configuration
As discussed, we need to apply one or both of these classifiers to interfaces in slightly different ways depending on the interface type and platform. All are applied in the 'class-of-service interfaces' context. The table below lists how they need to be defined.
| Platform | Interface Type | Classifier Config | Classifier location |
|---|---|---|---|
| QFX5100 and EX4300 Series Switches | L2 port facing server | Only IPv4 DSCP classifier | Unit 0 of L2 port |
| QFX5100 and EX4300 Series Switches | Routed or Routed Sub-int Parent | Only IPv4 DSCP classifier | Physical int itself rather than any particular unit |
| QFX5120 Series Switches | L2 port facing server | IPv4 and IPv6 DSCP classifiers | Unit 0 of L2 port |
| QFX5120 Series Switches | Routed or Routed Sub-int Parent | Only IPv4 DSCP classifier | Physical int itself rather than any particular unit |
| MX Series Routers | Routed L3 interface with no vlan encap / sub-interfaces | IPv4 and IPv6 DSCP classifiers | Unit 0 of routed port |
| MX Series Routers | Routed sub-interface | IPv4 and IPv6 DSCP classifiers | Configured for every unit under the parent interface |
- For LAG/AE interfaces, the config is added for the AE interface ONLY, not the member ports.
Schedulers
A scheduler is a configuration element defining the scheduling parameters applied to certain traffic on egress from the system. In our design we configure a single scheduler for each of our 4 defined forwarding-classes, using a simple 1:1 mapping. Each scheduler gets a "transmit-rate" configured, as well as a buffer-size (both defined as percentages). These schedulers ultimately implement the weighting described above under Scheduling Bandwidth.
The default schedulers configured on all our Juniper devices are shown below; 4 are defined, matching the 4 forwarding classes we use:
schedulers {
sched_control {
transmit-rate percent 5;
buffer-size percent 5;
}
sched_high {
transmit-rate percent 35;
buffer-size percent 35;
}
sched_low {
transmit-rate percent 10;
buffer-size percent 10;
}
sched_normal {
transmit-rate percent 50;
buffer-size percent 50;
}
}
It should be noted that these represent the minimum allocation for bandwidth and buffer space each forwarding class gets allocated under congestion. Any class can utilize up to 100% of the available bandwidth or buffer if there are no packets from other classes that also need it.
Scheduler Maps
Each defined scheduler is an independent element which does nothing on its own. Scheduler-maps are used to bring everything together. A scheduler map references one or more forwarding-classes, and associates each with a previously defined scheduler. Scheduler maps are then associated with interfaces under the 'class-of-service' config to define the egress QoS behaviour for a particular port.
In the Wikimedia design only one scheduler map is defined, which is applied to all interfaces. Scheduler maps only relate to outbound traffic, so they get applied equally to trusted and untrusted interfaces. The default scheduler-map in the config is as follows:
scheduler-maps {
wmf_map {
forwarding-class control scheduler sched_control;
forwarding-class high scheduler sched_high;
forwarding-class low scheduler sched_low;
forwarding-class normal scheduler sched_normal;
}
}
Traffic Control Profile
On certain platforms, specifically Trident2 based like QFX5100 and EX4300, we need to use a "traffic control profile" to attach a scheduler-map to an interface (see T373594). This is defined as follows under 'class-of-service':
traffic-control-profiles {
wmf_tc_profile {
/* Trident 2 devices T373594 */
scheduler-map wmf_map;
guaranteed-rate percent 100;
}
}
Outbound Shapers
As seen in the previous section our schedulers use percentages to define minimum bandwidth for forwarding classes. On a given interface JunOS will calculate those rates as a percentage of its line rate (i.e. 1/10/25/40/100G etc).
For the most part that is desirable; however, occasionally we utilize sub-rated Ethernet services as transport links between our sites. In these cases, where the actual bandwidth available to us over a link is lower than the line rate, we need to configure an outbound shaper to control the maximum rate. This is configured using the 'shaping-rate' command under the interface in the 'class-of-service' config.
Interface Config
Bringing it all together below you can see the class-of-service configuration for various types of interface.
Routed L3 port on an MX router connecting a sub-rated transport circuit:
class-of-service {
interfaces {
xe-0/1/0 {
scheduler-map wmf_map;
shaping-rate 3920000;
unit 0 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
}
}
}
Routed ae port on an MX router facing L2 switch stack:
class-of-service {
interfaces {
ae1 {
scheduler-map wmf_map;
unit 402 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
unit 510 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
unit 520 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
unit 530 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
}
}
}
L2 port connecting to a server on a QFX5100 or EX switch:
class-of-service {
interfaces {
ge-1/0/13 {
/* Scheduler has to be applied via 'traffic-control-profile' on these models */
forwarding-class-set {
wmf_classes {
output-traffic-control-profile wmf_tc_profile;
}
}
unit 0 {
classifiers {
dscp v4_classifier;
}
}
}
}
}
L2 port connecting to a server on a QFX5120 switch:
class-of-service {
interfaces {
xe-0/0/1 {
scheduler-map wmf_map;
unit 0 {
classifiers {
dscp v4_classifier;
dscp-ipv6 v6_classifier;
}
}
}
}
}
There are other variants as described in the table previously but the above should give a taste of the configuration.
Troubleshooting
Classification
For inbound traffic on a network device interface the main thing that should be confirmed is that packets are being correctly classified (usually based on DSCP) and placed into the correct forwarding queues.
Unfortunately this behaviour is difficult to see from either the Juniper CLI or the exposed statistics. We can confirm the configuration is in place, and if it is there is no reason traffic should not be correctly mapped based on DSCP, but we don't get counters of ingress packets mapped to the different forwarding classes.
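What we can do, as a complement to checking the device config, is verify on the servers themselves that packets are leaving with the expected DSCP marking, for example with tcpdump. A sketch (the interface name is an example, and the filter here matches our 'low' class, AF41, which sits in the top six bits of the TOS byte, so 34 << 2 = 0x88):

```
sudo tcpdump -ni eth0 -v 'ip and ip[1] & 0xfc == 0x88'
```

With '-v', tcpdump also prints the tos field of each captured packet, which can be cross-checked against the DSCP table above.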
To check the config is applied ok:
cmooney@ssw1-d8-codfw> show class-of-service interface et-0/0/31
Physical interface: et-0/0/31, Index: 653
Maximum usable queues: 10, Queues in use: 5
Exclude aggregate overhead bytes: disabled
Logical interface aggregate statistics: disabled
Scheduler map: wmf_map, Index: 9219
Congestion-notification: Disabled
Logical interface: et-0/0/31.0, Index: 809
Object Name Type Index
Classifier v4_classifier dscp 65033
Classifier v6_classifier dscp-ipv6 32393
Classifier ieee8021p-default ieee8021p 11
You will see either both a "v4_classifier" and a "v6_classifier" configured, or just a "v4_classifier". The config for a given port should be set as described in #Interface_Classifier_Configuration.
Outbound Drops
The purpose of the QoS configuration is to tell the device which packets to drop when congestion occurs. This outbound scheduling is controlled by the 'scheduler-map' we apply to physical interfaces. In general we have the same config everywhere:
set class-of-service interfaces et-0/0/31 scheduler-map wmf_map
With this in place we should see the 4 forwarding-classes it maps, and the queues they use, when we run 'show interfaces <int> detail':
cmooney@ssw1-d8-codfw> show interfaces et-0/0/8 detail
Physical interface: et-0/0/8, Enabled, Physical link is Up
Description: Core: lsw1-d1-codfw:et-0/0/54 {#230403800017}
<--- other output cut --->
Egress queues: 10 supported, 5 in use
Queue counters: Queued packets Transmitted packets Dropped packets
0 20 20 0
3 0 0 0
4 0 0 0
7 985 985 0
8 0 0 0
Queue number: Mapped forwarding classes
0 normal
3 low
4 high
7 control
8 mcast
Ideally we would see no "dropped packets", but if drops do occur we will see them there. The outbound packets transmitted and dropped in each forwarding class are also exported to Prometheus for devices enabled for gNMI telemetry. Per-interface graphs of these are available here:
https://grafana.wikimedia.org/d/5p97dAASz/cathal-network-queue-stats
We also have a dashboard giving an overview of all drops: