Netflow/sflow

Goal

Gather network-level (Layer 4) traffic flow metadata to assist with traffic engineering and DoS mitigation.

We first started with the "netflow" pipeline.

The sflow pipeline was added later, after SREs requested better visibility into internal flows. It was decided to keep it separate:

to avoid impacting the primary (netflow) pipeline, which is real-time sensitive and critical to SREs' day-to-day work
because functionally different devices (switches and routers) have historically supported different protocols (the gap narrowed only very recently)

On the routers side:

1 out of 1000 flows crossing the routers' external interfaces (both inbound and outbound) has its metadata sent to a configured collector once the flow timeout is reached (here 10s)
- Example metadata: source/destination IP, port and AS#, IP protocol, TCP flags...
The routers share their full BGP view with the collector

On the collectors side:

Samplicator duplicates the IPFIX (netflow) packets to Fastnetmon and nfacct, while spoofing the source IP (so they still seem to come from the routers)
Nfacct extrapolates the flow size and packet count based on the sampling rate (e.g. multiplies by 1000)
Nfacct uses a prefix list (exported from Puppet) to enrich the collected flows with traffic direction
Nfacct uses the BGP data provided by the routers to enrich the collected flows metadata (adds peer src/dst AS#, AS path, src/dst AS#)
Nfacct uses an IP to location database to enrich the collected flows metadata (adds source and destination country)
Nfacct exports the enriched flow data to Druid via Kafka
Fastnetmon monitors inbound traffic for both known attack patterns and traffic-level thresholds and, if any condition is met:
- Sends a notification email, including a traffic signature if able
- Triggers our monitoring system
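
The extrapolation and direction-tagging steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how nfacct is implemented or configured: the record fields (`src`, `dst`, `bytes`, `packets`), the `OWN_PREFIXES` list (documentation ranges standing in for the Puppet-exported prefix list), and the helper names are all hypothetical.

```python
import ipaddress

SAMPLING_RATE = 1000  # 1 out of 1000, as configured on the routers

# Hypothetical stand-in for the prefix list exported from Puppet
# (documentation ranges, not real prefixes).
OWN_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_own(address: str) -> bool:
    """Return True if the address falls inside one of our prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in OWN_PREFIXES)


def enrich(flow: dict) -> dict:
    """Scale the sampled counters and tag the traffic direction."""
    enriched = dict(flow)
    enriched["bytes"] = flow["bytes"] * SAMPLING_RATE
    enriched["packets"] = flow["packets"] * SAMPLING_RATE
    if is_own(flow["dst"]) and not is_own(flow["src"]):
        enriched["direction"] = "inbound"
    elif is_own(flow["src"]) and not is_own(flow["dst"]):
        enriched["direction"] = "outbound"
    else:
        enriched["direction"] = "unknown"
    return enriched


# One sampled 1500-byte packet from an outside host to one of our hosts.
sample = {"src": "203.0.113.7", "dst": "198.51.100.10", "bytes": 1500, "packets": 1}
print(enrich(sample))
```

The sampled record stands for roughly 1000 similar packets, so the counters are multiplied by the sampling rate before export.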

On the switches side:

On L3 switches only, as older switches don't support sending sflow data over their management interface
1:1000 sampling is configured on all server-facing ports in the server->switch direction (ingress) to prevent double accounting (inbound on one port, outbound on the other)
Packets exit through the data plane so as not to risk overwhelming the management plane (or management network)
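
The double-accounting concern can be illustrated with a small Python sketch. It is a simplified model with hypothetical port names: every packet is treated as sampled (the 1:1000 rate is ignored for clarity), and both ends of each packet are server-facing ports on the same switch.

```python
# Each packet: (ingress port, egress port, size in bytes).
# Port names are hypothetical; both ports are server-facing.
PACKETS = [
    ("server-a", "server-b", 1000),
    ("server-b", "server-a", 400),
]


def accounted_bytes(packets, sample_ingress=True, sample_egress=False):
    """Sum bytes as a collector would see them for a given sampling setup."""
    total = 0
    for _ingress_port, _egress_port, size in packets:
        if sample_ingress:
            total += size  # seen once, entering on the sender's port
        if sample_egress:
            total += size  # seen again, leaving on the receiver's port
    return total


print(accounted_bytes(PACKETS))                      # ingress only: 1400
print(accounted_bytes(PACKETS, sample_egress=True))  # both directions: 2800
```

Sampling in the ingress direction only means each server-to-server packet is counted once instead of twice.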

On the collectors side:

Sfacct extrapolates the flow size and packet count based on the sampling rate (e.g. multiplies by 1000)
Sfacct uses a prefix list (exported from Puppet) to enrich the collected flows with their scope (e.g. if both the source and destination IPs are within Wikimedia's ranges, it's an internal flow)
Sfacct exports the flow data tagged as internal to Druid via Kafka
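
The scope tagging and the internal-only export can be sketched in Python as follows. This is a minimal illustration, not sfacct's actual logic: the `PREFIXES` list uses documentation ranges as placeholders for the Puppet-exported prefix list, and the field and function names are hypothetical.

```python
import ipaddress

# Placeholder for the Puppet-exported prefix list (documentation ranges).
PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_prefixes(address: str) -> bool:
    """Return True if the address belongs to one of the listed prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PREFIXES)


def scope(src: str, dst: str) -> str:
    """A flow is internal only when both endpoints are in our ranges."""
    return "internal" if in_prefixes(src) and in_prefixes(dst) else "external"


def export_internal(flows: list) -> list:
    """Keep only the flows that would be tagged as internal."""
    return [f for f in flows if scope(f["src"], f["dst"]) == "internal"]


flows = [
    {"src": "198.51.100.10", "dst": "198.51.100.20"},  # both ours
    {"src": "198.51.100.10", "dst": "203.0.113.7"},    # leaves our ranges
]
print(export_internal(flows))
```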

Collectors:

The network device side is provisioned automatically with Homer, with one exception:

On switches, add the following route if the sampling needs to use a different routing-instance: set routing-options static route [collector]/32 next-table PRODUCTION.inet.0
- https://kb.juniper.net/InfoCenter/index?page=content&id=KB35372&actp=RSS

$ kcat -b kafka-jumbo1001.eqiad.wmnet -t netflow -C -o end

$ kcat -C -u -b kafka-jumbo1001.eqiad.wmnet:9092 -t network_flows_internal -o end | grep --line-buffered XXX | jq .
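
For filtering that goes beyond a plain grep, the same idea can be sketched in Python. This assumes each Kafka message is a single JSON object per line (as the use of jq above suggests). The field names and sample records below are hypothetical placeholders, not taken from the real topic.

```python
import json


def matching_flows(lines, needle):
    """Yield decoded flow records whose raw line contains `needle`."""
    for line in lines:
        if needle not in line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines


# Hypothetical sample input; with kcat, iterate over sys.stdin instead.
SAMPLE_LINES = [
    '{"ip_src": "198.51.100.10", "ip_dst": "198.51.100.20", "bytes": 1500000}',
    '{"ip_src": "198.51.100.30", "ip_dst": "198.51.100.40", "bytes": 64000}',
    '198.51.100.10 truncated {',
]

for flow in matching_flows(SAMPLE_LINES, "198.51.100.10"):
    print(json.dumps(flow, indent=2))
```

To use it on live data, the kcat output would be piped into a script that passes `sys.stdin` as `lines`.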

$ fastnetmon_client

Both Pmacct and Fastnetmon log to syslog; grep for nfacctd, sfacctd, or fastnetmon.

Detected attack details are logged in /var/log/fastnetmon_attacks/

roll out sensible flow-table-sizes to Juniper core routers with sampling enabled - T248394

Collect netflow data for internal traffic - T263277