Ping offload


Service status: Work in progress

Documentation status: Ready

Goal: Lower the high ICMP load on LVS/CP servers by offloading echo requests to a dedicated server.

Linux has internal ICMP rate limiters that can cause the kernel to drop valuable ICMP packets. Offloading ICMP echo ensures that the "important" ICMP traffic (e.g. PMTU discovery) does not get dropped.
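For context, the kernel exposes its ICMP limiters as sysctls under /proc/sys/net/ipv4 (icmp_ratelimit, icmp_ratemask, icmp_msgs_per_sec, icmp_msgs_burst). The sketch below is illustrative only and not part of the deployment: it decodes an icmp_ratemask value into the ICMP types it covers and prints the current settings. The function names are hypothetical, and which of these limiters is the one affecting the LVS/CP servers is an assumption this page does not confirm.

```python
from pathlib import Path
from typing import Dict, List

# ICMP type numbers as listed in the kernel's ip-sysctl documentation for icmp_ratemask.
ICMP_TYPES = {
    0: "echo-reply",
    3: "destination-unreachable",
    4: "source-quench",
    5: "redirect",
    8: "echo-request",
    11: "time-exceeded",
    12: "parameter-problem",
    13: "timestamp-request",
    14: "timestamp-reply",
    15: "info-request",
    16: "info-reply",
    17: "address-mask-request",
    18: "address-mask-reply",
}

SYSCTLS = ("icmp_ratelimit", "icmp_ratemask", "icmp_msgs_per_sec", "icmp_msgs_burst")


def decode_ratemask(mask: int) -> List[str]:
    """Return the names of the ICMP types whose bit is set in the mask."""
    return [ICMP_TYPES.get(bit, "type-%d" % bit) for bit in range(32) if mask & (1 << bit)]


def read_icmp_sysctls(base: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Read the ICMP limiter sysctls that exist on this host."""
    values = {}
    for name in SYSCTLS:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().split()[0])
    return values


if __name__ == "__main__":
    settings = read_icmp_sysctls()
    for name, value in settings.items():
        print(name, "=", value)
    if "icmp_ratemask" in settings:
        print("rate-limited types:", ", ".join(decode_ratemask(settings["icmp_ratemask"])))
```

With the kernel's documented default mask of 6168, the decoded list includes destination-unreachable, which is the type PMTU discovery relies on ("fragmentation needed").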

Deployment

Deployment task: https://phabricator.wikimedia.org/T190090

eqiad

cr1-eqiad/cr2-eqiad redirect inbound ICMP echo requests to ping1001.eqiad.wmnet

codfw

cr1-codfw/cr2-codfw redirect inbound ICMP echo requests to ping2001.codfw.wmnet

POPs

The plan is to wait for the Ganeti clusters in the POPs before duplicating the work there (T96852).

Monitoring

Icinga: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=ping1001&style=hostservicedetail (and ping2001)

Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/ping-offload

External monitoring: Ping to VIPs configured in Watchmouse

InAddrErrors alert

This alert is raised by the Grafana dashboard.

It means the server is receiving packets addressed to an IP that is not configured on the server.
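On Linux, the InAddrErrors counter is exposed in the Ip section of /proc/net/snmp, so it can also be checked directly on the ping host. The following is a minimal sketch, assuming that reading the counter locally is useful during troubleshooting; the function name is hypothetical and this is not how the dashboard itself collects the metric.

```python
from typing import Dict


def parse_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/snmp into {protocol: {counter: value}}.

    The file consists of line pairs with the same prefix (for example "Ip:"):
    the first line holds the counter names, the second holds the values.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    for header, values in zip(rows[0::2], rows[1::2]):
        protocol = header[0].rstrip(":")
        stats[protocol] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats


if __name__ == "__main__":
    with open("/proc/net/snmp") as handle:
        counters = parse_snmp(handle.read())
    print("InAddrErrors:", counters["Ip"]["InAddrErrors"])
```

The counter is cumulative, so run it twice a few seconds apart to see whether the value is still increasing.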

  1. Run ip addr to check that all the redirected IPs are present on the loopback interface
    1. If not, they can be added manually as a temporary fix with ip addr add <ip>/32 dev lo:ping_offload
  2. If the IPs are present, use tcpdump to find the IP in question (e.g. by filtering out all the present IPs)
  3. In any case, or if troubleshooting takes too long, disable the redirect (see below)
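Steps 1 and 2 above can be sketched as a small helper that lists the IPv4 addresses on lo, reports any expected address that is missing, and builds a tcpdump filter that hides traffic to the addresses already present. This is only a sketch: the EXPECTED list uses placeholder documentation addresses (the real list of redirected IPs is not given on this page), and the function names are hypothetical.

```python
import ipaddress
import subprocess

# Placeholder values from the documentation range; replace with the real redirected IPs.
EXPECTED = ["192.0.2.10", "192.0.2.11"]


def loopback_addresses(ip_output):
    """Extract IPv4 addresses from the output of `ip -o -4 addr show dev lo`."""
    addresses = set()
    for line in ip_output.splitlines():
        fields = line.split()
        if "inet" in fields:
            cidr = fields[fields.index("inet") + 1]
            addresses.add(str(ipaddress.ip_interface(cidr).ip))
    return addresses


def missing_addresses(expected, present):
    """Return the expected addresses that are not configured, in numeric order."""
    return sorted(set(expected) - set(present), key=ipaddress.ip_address)


def tcpdump_filter(present):
    """Build a pcap filter matching ICMP packets sent to any address NOT in `present`."""
    parts = ["icmp"]
    parts += ["not dst host " + a for a in sorted(present, key=ipaddress.ip_address)]
    return " and ".join(parts)


if __name__ == "__main__":
    output = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", "lo"],
        capture_output=True, text=True, check=True,
    ).stdout
    present = loopback_addresses(output)
    print("missing:", missing_addresses(EXPECTED, present) or "none")
    print("filter :", tcpdump_filter(present))
```

The printed filter can be passed to tcpdump as a single quoted argument, so only packets for addresses that are not configured on the host are shown.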

How-to

Temporarily stop the ICMP echo redirect

Use this procedure if the system is showing signs of issues or needs to go down for maintenance.

On both cr1 and cr2 routers of the target site, enter the following commands:

# deactivate firewall family inet filter border-in4 term offload-ping4

# deactivate firewall family inet filter transport-in4 term offload-ping4

Then verify that the pending changes are correct. The output should be similar to:

# show | compare
[edit firewall family inet filter border-in4]
!       inactive: term offload-ping4 { ... }
[edit firewall family inet filter transport-in4]
!       inactive: term offload-ping4 { ... }

Finish by committing the changes (replace <TASK #> with a Phabricator task ID or a relevant comment):

# commit comment "<TASK #>"

To confirm that the change has taken effect, watch tcpdump on the ping host (for example sudo tcpdump -i ens5 icmp -nn) or check the dashboard.

To re-activate the redirect, repeat the steps above, replacing deactivate with activate.
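When checking with tcpdump, it can be easier to summarize the capture than to read it line by line. The following is a minimal sketch, assuming the default text output of tcpdump -nn; the script name count_echo.py and the function name are hypothetical. It counts ICMP echo requests per destination address from lines read on standard input.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   12:34:56.789012 IP 198.51.100.7 > 192.0.2.10: ICMP echo request, id 1, seq 1, length 64
ECHO_REQUEST = re.compile(r" IP (\S+) > (\S+): ICMP echo request,")


def count_echo_requests(lines):
    """Count ICMP echo requests per destination address in `tcpdump -nn` output."""
    counts = Counter()
    for line in lines:
        match = ECHO_REQUEST.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    totals = count_echo_requests(sys.stdin)
    if not totals:
        print("no ICMP echo requests seen")
    for destination, count in totals.most_common():
        print(destination, count)
```

For example, sudo timeout 30 tcpdump -i ens5 icmp -nn -l | python3 count_echo.py captures for 30 seconds and prints the totals. After a deactivate, the redirected echo requests should no longer appear; after an activate, they should show up again.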

Possible improvements

  • Use BGP flowspec to automatically advertise/remove the redirect
  • Add IPv6 support
  • Have multiple ping servers per site for redundancy