Monitoring/check eth

From Wikitech
Jump to navigation Jump to search

This Icinga alert says checking the health of a network interface failed.

It uses the command NRPE command /usr/local/lib/nagios/plugins/check_eth on the affected host which you can run yourself.

Since that is a shell script you can also look inside to see what it does exactly.

  • It checks for the existence of these interfaces: "eno1 eno2 enp5s0f0 enp5s0f1 lo" and bails out with a WARN if one of them is not found.
  • It checks if all of them are reporting a carrier (a cable is plugged in) and exits with a CRIT if one is missing one.
  • It checks if one of the interfaces is reported as administratively configued as DOWN.
  • It checks if all interfaces have speed negotiated to 1000 Mb/s using /sbin/ethtool.

In the cases of "missing cable" (no carrier), missing interface or wrong speed you should create a ticket for the dcops team to check hardware.

If an interface is configured as DOWN try to find out who did it and why, ping on IRC or on the ops list.