Check ganglia

From Wikitech
This page contains historical information. The check_ganglia script has been archived.
See task T343707 (2023).

check_ganglia is a Nagios/Icinga plugin script that can be used to generate alerts based on metric values in Ganglia. It queries either gmond instances or the main gmetad through the XML query interface.
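To illustrate what querying the XML interface involves, here is a minimal sketch in Python. It is not the check_ganglia implementation. It assumes that gmond writes its full XML dump to any TCP client that connects (port 8649 is the usual gmond default, but treat it as an assumption for any given host) and that metrics appear as METRIC elements with NAME and VAL attributes nested under HOST elements. The function names fetch_ganglia_xml and metric_value are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host, port=8649, timeout=10.0):
    """Read the XML dump that gmond sends to a TCP client on connect.

    Port 8649 is an assumed default; pass the real port for your setup.
    Returns raw bytes so the XML parser can honour the declared encoding.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the server closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_value(xml_data, host, metric):
    """Return the VAL attribute of `metric` on `host`, or None if absent."""
    root = ET.fromstring(xml_data)
    for host_el in root.iter("HOST"):
        if host_el.get("NAME") != host:
            continue
        for metric_el in host_el.iter("METRIC"):
            if metric_el.get("NAME") == metric:
                return metric_el.get("VAL")
    return None
```

With a reachable aggregator, metric_value(fetch_ganglia_xml(aggregator), node, metric_name) would return the metric's value as a string, or None when the node or metric is missing from the dump.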

check_ganglia has been imported into our gerrit and debianized: https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/check_ganglia. It was originally imported from https://github.com/larsks/check_ganglia.

See also: https://rt.wikimedia.org/Ticket/Display.html?id=6602

Puppet Usage

To install a new icinga alert based on a Ganglia metric, use the monitor_ganglia define.

    # Set up Icinga monitoring of Kafka broker messages in per second.
    # If this drops too low, trigger an alert.
    # These thresholds have to be set manually;
    # adjust them if you add or remove data from Kafka topics.
    monitor_ganglia { 'kafka-broker-MessagesIn':
        description => 'Kafka Broker Messages In',
        metric      => 'kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate',
        warning     => ':1500.0',
        critical    => ':1000.0',
        require     => Class['::kafka::server::jmxtrans'],
    }

This is just a wrapper for our monitor_service define. See the monitor_ganglia documentation for up-to-date usage.
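The comments in the Kafka example describe a low-watermark alert: the rate is healthy while it stays high, and the alert fires as it drops. The sketch below shows that logic in Python using the conventional Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). It does not reproduce how check_ganglia parses threshold strings such as ':1500.0'. The function name classify, its parameters, and the choice to treat a value exactly on a threshold as healthy are all assumptions made for illustration.

```python
def classify(value, warning_floor, critical_floor):
    """Map a metric value to a Nagios-style (exit_code, label) pair.

    Lower is worse here: a value below critical_floor is CRITICAL, and a
    value below warning_floor (but not below critical_floor) is WARNING.
    A value exactly on a floor counts as healthy (an assumption).
    """
    if value < critical_floor:
        return 2, "CRITICAL"
    if value < warning_floor:
        return 1, "WARNING"
    return 0, "OK"
```

With the thresholds from the example, classify(1200.0, 1500.0, 1000.0) falls between the two floors and so yields a WARNING.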

CLI usage

In addition to driving Nagios/Icinga alerts, check_ganglia is useful as a CLI tool for querying Ganglia directly. It can query either gmond aggregators or gmetad itself.

Querying gmond

Querying gmond aggregator instances is slightly faster than querying gmetad, but requires knowledge of which aggregator instance has data for which nodes.

    # analytics1010.eqiad.wmnet is a gmond aggregator.
    check_ganglia -g analytics1010.eqiad.wmnet -H analytics1010.eqiad.wmnet -m Hadoop.NameNode.FSNamesystem.CapacityRemainingGB

    # or you can even list all ganglia metrics for a given node
    check_ganglia -g analytics1010.eqiad.wmnet -H analytics1010.eqiad.wmnet --list

Querying gmetad

Querying gmetad is slightly easier, because all of the metrics for all nodes should be available here.

    # The -q flag tells check_ganglia to query gmetad (which is on port 8654 for us).
    check_ganglia -q -g nickel.wikimedia.org -p 8654 -H analytics1010.eqiad.wmnet --list
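When these queries are scripted rather than typed by hand, the gmond and gmetad forms differ only in a couple of flags. The following Python sketch builds the argument list using only the flags shown in the examples on this page (-g, -H, -m, --list, -q, -p). The helper names build_command and run_check are hypothetical, and combining -m with -q is an assumption, since the page only shows --list against gmetad.

```python
import subprocess


def build_command(aggregator, host, metric=None, gmetad_port=None):
    """Build a check_ganglia argv list from the flags used on this page.

    If gmetad_port is given, query gmetad (-q with -p); otherwise query
    the gmond aggregator directly. Without a metric, list all metrics.
    """
    cmd = ["check_ganglia"]
    if gmetad_port is not None:
        cmd += ["-q", "-g", aggregator, "-p", str(gmetad_port)]
    else:
        cmd += ["-g", aggregator]
    cmd += ["-H", host]
    cmd += ["-m", metric] if metric else ["--list"]
    return cmd


def run_check(cmd):
    """Run the command and return (exit_code, stdout) without raising."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout
```

For example, build_command("nickel.wikimedia.org", "analytics1010.eqiad.wmnet", gmetad_port=8654) produces the same argument list as the gmetad command above, and run_check can then execute it on a machine where check_ganglia is installed.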