Netflow/sflow
High-level descriptions: https://en.wikipedia.org/wiki/NetFlow and https://en.wikipedia.org/wiki/SFlow
Goal
Gather network-level (Layer 4) traffic flow metadata to assist with traffic engineering and DoS mitigation.
How does it work?
We first started with the "netflow" pipeline.
The sflow pipeline was added later, after SREs requested better visibility into internal flows. It was kept separate:
- to avoid impacting the primary (netflow) pipeline, which is real-time sensitive and critical to SREs' day-to-day work
- because functionally different devices (switches vs. routers) historically supported different protocols (the gap has narrowed only very recently)
Netflow, on routers, for external traffic
On the routers side:
- 1 out of 1000 flows crossing the routers' external interfaces (both inbound and outbound) gets its metadata sent to a configured collector once the flow timeout is reached (here 10s)
- Example metadata: source/destination IP, port and AS number, IP protocol, TCP flags, etc.
- The routers share their full BGP view with the collector
On the collectors side:
- Samplicator duplicates the IPFIX (netflow) packets to Fastnetmon and nfacct, while spoofing the source IP (so they still seem to come from the routers)
- Nfacct extrapolates the flow size and packet count based on the sampling rate (e.g. multiplies them by 1000)
- Nfacct uses a prefix list (exported from Puppet) to enrich the collected flows with traffic direction
- Nfacct uses the BGP data provided by the routers to enrich the collected flows metadata (adds peer src/dst AS#, AS path, src/dst AS#)
- Nfacct uses an IP to location database to enrich the collected flows metadata (adds source and destination country)
- Nfacct exports the enriched flow data to Druid via Kafka
- Fastnetmon monitors inbound traffic for both known attack patterns and traffic-level thresholds. If any condition is met, it:
  - sends a notification email, including a traffic signature when available
  - triggers our monitoring system
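The extrapolation and direction-tagging steps above can be sketched in a few lines of Python. This only illustrates the idea and is not how pmacct implements it. The prefixes are documentation ranges standing in for the Puppet-exported list, and the field names (`ip_src`, `ip_dst`, `bytes`, `packets`) and the `enrich` helper are assumptions made for the example.

```python
import ipaddress

SAMPLING_RATE = 1000  # 1-in-1000 sampling, as configured on the routers

# Hypothetical prefix list; the real one is exported from Puppet.
OUR_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_ours(addr: str) -> bool:
    """Return True if addr falls inside one of our prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in OUR_PREFIXES)


def enrich(flow: dict) -> dict:
    """Scale sampled counters and tag the flow with a traffic direction."""
    out = dict(flow)
    # Each sampled flow stands for roughly SAMPLING_RATE real ones.
    out["bytes"] = flow["bytes"] * SAMPLING_RATE
    out["packets"] = flow["packets"] * SAMPLING_RATE

    src_ours = is_ours(flow["ip_src"])
    dst_ours = is_ours(flow["ip_dst"])
    if dst_ours and not src_ours:
        out["direction"] = "inbound"
    elif src_ours and not dst_ours:
        out["direction"] = "outbound"
    else:
        out["direction"] = "unknown"
    return out


sample = {"ip_src": "203.0.113.7", "ip_dst": "198.51.100.10",
          "bytes": 1500, "packets": 1}
enriched = enrich(sample)
print(enriched["bytes"], enriched["packets"], enriched["direction"])
```

The scaled counters are estimates: with 1:1000 sampling, small or short-lived flows may not be sampled at all, so totals are only meaningful in aggregate.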
Sflow, on switches, for internal traffic
On the switches side:
- On L3 switches only, as older switches don't support sending sflow data over their management interface
- 1:1000 sampling is configured on all server-facing ports in the server->switch direction (ingress) to prevent double accounting (inbound on one port, outbound on the other)
- Packets exit through the data plane to avoid overwhelming the management plane (or management network)
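To see why ingress-only sampling avoids double accounting, consider a flow between two servers attached to sampled ports. The toy model below is a sketch under simplifying assumptions: it ignores the randomness of sampling and only counts at how many server-facing ports the same flow could be observed. The port names and the `observations` helper are hypothetical.

```python
# Hypothetical flow between two servers on server-facing ports.
flow = {"src_port": "server-a", "dst_port": "server-b", "bytes": 1000}


def observations(sample_ingress: bool, sample_egress: bool) -> int:
    """Count the server-facing ports at which one flow can be observed."""
    seen = 0
    if sample_ingress:
        seen += 1  # entering the switch on the source server's port
    if sample_egress:
        seen += 1  # leaving the switch on the destination server's port
    return seen


# Ingress-only sampling (the deployed setup): the flow is accounted once.
ingress_only_bytes = observations(True, False) * flow["bytes"]

# Sampling both directions: the same bytes are accounted twice.
both_directions_bytes = observations(True, True) * flow["bytes"]

print(ingress_only_bytes, both_directions_bytes)
```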
On the collectors side:
- Sfacct extrapolates the flow size and packet count based on the sampling rate (e.g. multiplies them by 1000)
- Sfacct uses a prefix list (exported from Puppet) to enrich the collected flows with their scope (e.g. if both the source and destination IPs are within Wikimedia's ranges, it's an internal flow)
- Sfacct exports the flow data tagged as internal to Druid via Kafka
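The scope rule above (a flow is internal only when both endpoints are in our ranges) can be illustrated as follows. This is a sketch, not sfacctd's actual mechanism: the ranges are placeholders for the Puppet-exported prefix list, and the `scope` function and field names are assumptions for the example.

```python
import ipaddress

# Placeholder ranges; the real prefix list is exported from Puppet.
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_ranges(addr: str) -> bool:
    """Return True if addr belongs to one of the internal ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_RANGES)


def scope(flow: dict) -> str:
    """Tag a flow as internal only if both endpoints are in our ranges."""
    if in_ranges(flow["ip_src"]) and in_ranges(flow["ip_dst"]):
        return "internal"
    return "external"


flows = [
    {"ip_src": "10.64.0.5", "ip_dst": "10.192.16.9"},    # both ours
    {"ip_src": "10.64.0.5", "ip_dst": "203.0.113.20"},   # leaves our network
]

# Only flows tagged as internal are kept for export.
internal_flows = [f for f in flows if scope(f) == "internal"]
print(len(internal_flows))
```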
How to deploy?
Collectors:
- Apply role::netinsights to a server (see existing servers for specs)
The network device side is provisioned automatically with Homer, with one exception:
- On switches, add the following route if the sampling needs to use a different routing-instance:
set routing-options static route [collector]/32 next-table PRODUCTION.inet.0
Troubleshooting
Check if pmacct is sending data to Kafka
$ kcat -b kafka-jumbo1001.eqiad.wmnet -t netflow -C -o end
$ kcat -C -u -b kafka-jumbo1001.eqiad.wmnet:9092 -t network_flows_internal -o end | grep --line-buffered XXX | jq .
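When eyeballing the raw JSON is not enough, the messages can be summarized with a small script. The sketch below assumes one JSON object per line with `ip_src` and `bytes` keys; those key names and the `top_talkers` helper are assumptions, so check them against a real message from the topic before relying on the output.

```python
import json
from collections import Counter


def top_talkers(lines, key="ip_src", n=5):
    """Sum the 'bytes' field per value of `key` and return the top n."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON message
        if not isinstance(record, dict):
            continue
        totals[record.get(key, "unknown")] += int(record.get("bytes", 0))
    return totals.most_common(n)


# Made-up sample messages, for illustration only.
sample = [
    '{"ip_src": "192.0.2.1", "bytes": 3000}',
    '{"ip_src": "192.0.2.2", "bytes": 1000}',
    'not json',
    '{"ip_src": "192.0.2.1", "bytes": 500}',
]
result = top_talkers(sample)
for value, total in result:
    print(f"{total:>12} {value}")
```

To run it on live data, pass any iterable of lines to `top_talkers`, for example `sys.stdin` with the kcat output piped into the script.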
Real time Fastnetmon dashboard
$ fastnetmon_client
Check the logs
Both Pmacct and Fastnetmon log to syslog; grep for nfacctd, sfacctd, or fastnetmon
Detected attack details are logged in /var/log/fastnetmon_attacks/
Visualization
- Turnilo is the easiest way to drill down through the data.
- Netflow (external) is real time: https://turnilo.wikimedia.org/#wmf_netflow
  - Example dashboard: https://w.wiki/8oU
- Sflow (internal) has a ~6h delay: https://turnilo.wikimedia.org/#network_flows_internal/
  - Example dashboard: https://w.wiki/52B3
- Dashboards can also be made with Superset.
- Spark POC: https://gist.github.com/ottomata/58b3712a1d247a9575772b942e3d5ff3
- Fastnetmon health data: https://grafana.wikimedia.org/d/jjn9MC_Vk/fastnetmon
Limitations
- Fastnetmon misreports attack type and protocol - T241374
Resources
https://github.com/pavel-odintsov/fastnetmon/
roll out sensible flow-table-sizes to Juniper core routers with sampling enabled - T248394
Collect netflow data for internal traffic - T263277