LibreNMS

From Wikitech

LibreNMS is an autodiscovering PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, Juniper, Foundry, and many more.

LibreNMS is a community-based fork of the last GPL-licensed version of Observium.

Service

Currently hosted on netmon1003 and netmon2002.

Replaces Observium, which ran on Streber.

  • Software is not installed via a Debian package
  • Software installed in: /srv/deployment/librenms/
  • RRD data stored in: /srv/librenms/
  • User credentials are stored in MySQL: # grep auth_mechanism /srv/deployment/librenms/librenms/config.php
  • Authentication is done via LDAP

How to

Add a device to LibreNMS

Configure the read-only SNMP v2c community on the device.

Via the web UI:

https://librenms.wikimedia.org/addhost/

Use the device FQDN and keep all the other fields as they are (do not force-add the device). Note: because of a bug, set the port to "161".

The device should be discovered and polled within the next 10 minutes.

Via CLI:

$ ssh librenms.wikimedia.org
$ cd /srv/deployment/librenms/librenms
$ sudo -u librenms ./lnms device:add --v2c -c <snmp_community> <device_fqdn>
Added device <fqdn> (XXX)
$ sudo -u librenms php discovery.php -h <fqdn> && sudo -u librenms php poller.php -h <fqdn>
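The CLI steps above can be wrapped in a small helper. This is a minimal sketch, assuming it is sourced in a bash shell on the LibreNMS host and that your user can sudo to librenms; the function names run and add_and_poll_device and the DRY_RUN switch are hypothetical, not part of LibreNMS.

```shell
# Source this file on the LibreNMS host, then call add_and_poll_device.
LIBRENMS_DIR=/srv/deployment/librenms/librenms

# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Add a device with a read-only SNMP v2c community, then discover and poll
# it right away instead of waiting for the next scheduled run.
# The body is a subshell so the cd does not leak into the caller's shell.
add_and_poll_device() (
  community="$1"
  fqdn="$2"
  if [[ -z "$community" || -z "$fqdn" ]]; then
    echo "usage: add_and_poll_device <snmp_community> <device_fqdn>" >&2
    exit 2
  fi
  run cd "$LIBRENMS_DIR" || exit 1
  run sudo -u librenms ./lnms device:add --v2c -c "$community" "$fqdn" || exit 1
  run sudo -u librenms php discovery.php -h "$fqdn" || exit 1
  run sudo -u librenms php poller.php -h "$fqdn"
)
```

For example, DRY_RUN=1 add_and_poll_device <snmp_community> <device_fqdn> prints the commands for review; the same call without DRY_RUN runs them. As with the manual steps, the community string is visible in the process list while the commands run.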

Upgrade LibreNMS

Updating LibreNMS in our repositories

The following assumes your remotes are configured as shown below (output of git remote -v) and that each upstream version is tracked in its own upstream-<version> branch.

origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (fetch)
origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (push)
upstream	https://github.com/librenms/librenms.git (fetch)
upstream	https://github.com/librenms/librenms.git (push)
new=<new version>
old=<old version>

git fetch origin
git checkout -b upstream-$old origin/upstream-$old
git fetch upstream
git checkout -b upstream-$new $new

# If you are missing composer: apt install -y composer php-gd
composer install --no-dev --ignore-platform-reqs # (you will be prompted for any missing PHP requirements)
git add -f vendor
git commit -m "Add composer requirements for LibreNMS $new"

mkdir scap
git checkout upstream-$old -- scap/scap.cfg
git add scap
git commit -m "Add Scap config"

git push origin upstream-$new
WARNING: At this point, make sure none of "our" changes to the old version are left behind. Check whether any patches were applied on top of upstream-$old and cherry-pick them onto upstream-$new. For an example of a LibreNMS upgrade that left patches behind, see: https://phabricator.wikimedia.org/T273716#7430992
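The branch preparation steps above can be scripted as follows. This is a sketch under assumptions: it is sourced in bash from a clone with the origin and upstream remotes shown above, upstream-$old does not yet exist as a local branch, and the helper names and DRY_RUN switch are hypothetical. It does not replace the manual check for patches left behind on upstream-$old.

```shell
# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create upstream-<new> from the upstream tag <new>, vendor the composer
# dependencies and copy the Scap config from upstream-<old>.
prepare_upstream_branch() {
  local old="$1" new="$2"
  if [[ -z "$old" || -z "$new" ]]; then
    echo "usage: prepare_upstream_branch <old_version> <new_version>" >&2
    return 2
  fi
  run git fetch origin || return 1
  run git checkout -b "upstream-$old" "origin/upstream-$old" || return 1
  run git fetch upstream || return 1
  run git checkout -b "upstream-$new" "$new" || return 1

  # Needs composer and php-gd (apt install -y composer php-gd).
  run composer install --no-dev --ignore-platform-reqs || return 1
  run git add -f vendor || return 1
  run git commit -m "Add composer requirements for LibreNMS $new" || return 1

  # git checkout <branch> -- <path> recreates scap/ if it is missing.
  run git checkout "upstream-$old" -- scap/scap.cfg || return 1
  run git add scap || return 1
  run git commit -m "Add Scap config" || return 1

  run git push origin "upstream-$new"
}
```

Usage: DRY_RUN=1 prepare_upstream_branch "$old" "$new" to review, then without DRY_RUN to execute. Each step stops the function on the first failure, so a failed composer install does not lead to an empty vendor commit being pushed.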

Cherry picking commits from upstream-$old into upstream-$new

  1. Check out the new branch. Make sure you have the latest version of the LibreNMS repository, then check out the upstream-$new branch onto which you want to cherry-pick commits:
     git fetch origin
     git checkout upstream-$new
  2. List commits exclusive to the old branch. The following git log command lists commits that are on upstream-$old but not on upstream-$new, i.e. the patches that may need to be cherry-picked:
     git log upstream-$new..upstream-$old
     Review these commits to decide which ones to cherry-pick. The "Add composer requirements for LibreNMS $old" and "Add Scap config" commits will typically also show up; they have already been recreated on upstream-$new and should not be cherry-picked.
  3. Cherry-pick the commits. For each commit you want to apply, use git cherry-pick. For example, to apply commits abc123 and def456 from upstream-$old onto upstream-$new:
     git cherry-pick abc123 def456
     If there are conflicts during the cherry-pick, Git stops and asks you to resolve them. Edit the conflicting files, stage them with git add, then continue with:
     git cherry-pick --continue
  4. Push the changes. After cherry-picking the necessary commits, push them to the remote repository:
     git push origin upstream-$new
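To help with step 2, here is a sketch that lists the candidate commits oldest first (the order in which they should be cherry-picked) and hides the two bookkeeping commits that the branch preparation procedure creates on every upstream-<version> branch. It assumes those commits kept exactly the subjects used in that procedure; filter_wmf_patches and list_patches_to_pick are hypothetical helper names.

```shell
# Read "<hash> <subject>" lines on stdin and drop the vendor and Scap
# bookkeeping commits, printing only real patches.
filter_wmf_patches() {
  grep -v -E '^[0-9a-f]+ (Add composer requirements for LibreNMS .*|Add Scap config)$' || true
}

# List commits on upstream-<old> that are not on upstream-<new>, oldest first.
list_patches_to_pick() {
  local old="$1" new="$2"
  git log --reverse --format='%H %s' "upstream-$new..upstream-$old" | filter_wmf_patches
}
```

Review the output first, then pass the hashes to git cherry-pick, e.g. git cherry-pick $(list_patches_to_pick "$old" "$new" | cut -d' ' -f1). The || true keeps an empty result (nothing to pick) from being reported as an error.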

Updating LibreNMS in production

Backing up the LibreNMS database

On dbprov1002:

cd /etc/wmfbackups
cp backups.cnf librenms-backup.cnf
sed -i '/^sections:/,$c\
sections:\
  librenms:\
    regex: librenms[.]\
    host: '\''db1217.eqiad.wmnet'\''\
    port: 3321
' librenms-backup.cnf
chown dump:dump librenms-backup.cnf
sudo -u dump backup-mariadb --config-file librenms-backup.cnf
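An alternative to the sed range command, sketched under the assumption that sections: is a top-level key at the start of a line and is the last top-level key in backups.cnf (the sed command makes the same assumption). It writes the sections: key and the librenms section explicitly, which is easier to review. make_librenms_backup_cnf is a hypothetical helper; host and port default to the values above.

```shell
# Build a backup config that keeps everything before "sections:" in <src>
# and replaces all sections with a single librenms section.
make_librenms_backup_cnf() {
  local src="$1" dst="$2"
  local host="${3:-db1217.eqiad.wmnet}" port="${4:-3321}"
  if ! grep -q '^sections:' "$src"; then
    echo "no top-level 'sections:' key in $src" >&2
    return 1
  fi
  # Copy everything up to, but not including, the sections: line.
  sed '/^sections:/,$d' "$src" > "$dst" || return 1
  # Append the sections key with only the librenms section.
  cat >> "$dst" <<EOF
sections:
  librenms:
    regex: librenms[.]
    host: '$host'
    port: $port
EOF
}
```

Usage, from /etc/wmfbackups: make_librenms_backup_cnf backups.cnf librenms-backup.cnf && chown dump:dump librenms-backup.cnf, then run backup-mariadb as above.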

On deploy1002:

cd /srv/deployment/librenms/librenms/
git fetch origin
git branch # note the current branch
git checkout upstream-<version>
scap deploy "Upgrade LibreNMS to <version> - <task>"
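The deploy1002 steps above as one function, sketched under the assumption that it is sourced in bash on the deployment host; deploy_librenms and the DRY_RUN switch are hypothetical. It prints the branch that was checked out before the upgrade, which is the one to use for a rollback. git branch --show-current needs Git 2.22 or later.

```shell
# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Check out upstream-<version> and deploy it with Scap.
# The body is a subshell so the cd does not leak into the caller's shell.
deploy_librenms() (
  version="$1"
  task="$2"
  if [[ -z "$version" || -z "$task" ]]; then
    echo "usage: deploy_librenms <version> <task>" >&2
    exit 2
  fi
  run cd /srv/deployment/librenms/librenms || exit 1
  run git fetch origin || exit 1
  echo "Current branch (needed for a rollback):"
  run git branch --show-current || exit 1
  run git checkout "upstream-$version" || exit 1
  run scap deploy "Upgrade LibreNMS to $version - $task"
)
```

Puppet and daily.sh still need to be run afterwards as described below.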

Run Puppet on the netmon* hosts from a cumin host (cumin1002.eqiad.wmnet or cumin2002.codfw.wmnet):

cumin O:netmon run-puppet-agent

On the netmon_server host (find it with git grep -h netmon_server: hieradata/ in the Puppet repository):

cd /srv/deployment/librenms/librenms
sudo -u librenms ./daily.sh

Rollback

On deploy1002:

cd /srv/deployment/librenms/librenms/
git fetch origin
git checkout <previous branch>
scap deploy "Rollback LibreNMS to <version> - <task>"

Then run Puppet again from a cumin host:

cumin O:netmon run-puppet-agent
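A matching sketch for the rollback, assuming the previous branch follows the upstream-<version> naming used above, so the version in the Scap message can be derived from the branch name. rollback_librenms and DRY_RUN are hypothetical; Puppet still has to be run afterwards as described.

```shell
# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Redeploy the previous upstream-<version> branch with Scap.
# The body is a subshell so the cd does not leak into the caller's shell.
rollback_librenms() (
  branch="$1"
  task="$2"
  # Refuse anything that does not look like upstream-<version>.
  if [[ "$branch" != upstream-?* || -z "$task" ]]; then
    echo "usage: rollback_librenms upstream-<previous version> <task>" >&2
    exit 2
  fi
  version="${branch#upstream-}"   # strip the prefix to get the version
  run cd /srv/deployment/librenms/librenms || exit 1
  run git fetch origin || exit 1
  run git checkout "$branch" || exit 1
  run scap deploy "Rollback LibreNMS to $version - $task"
)
```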

Check the logs

LibreNMS writes logs to four different locations:

  • /srv/deployment/librenms/librenms/logs/librenms.log
  • /var/log/librenms.log
  • /var/log/librenms/daily.log
  • /var/log/apache2/librenms.wikimedia.org.error.log

Ideally, the first three would be consolidated into a single location.
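Until the logs are consolidated, here is a sketch for following all of them at once from a single shell. It assumes bash 4 or later (for mapfile) and that you are root or otherwise allowed to read the files; the helper names are hypothetical. Only files that exist when it starts are followed, and tail -F keeps following them across log rotation.

```shell
# The four log locations listed above.
LIBRENMS_LOGS=(
  /srv/deployment/librenms/librenms/logs/librenms.log
  /var/log/librenms.log
  /var/log/librenms/daily.log
  /var/log/apache2/librenms.wikimedia.org.error.log
)

# Print the log files that exist and are readable on this host.
existing_librenms_logs() {
  local f
  for f in "${LIBRENMS_LOGS[@]}"; do
    [[ -r "$f" ]] && printf '%s\n' "$f"
  done
  return 0
}

# Follow every readable log; tail prints a "==> file <==" header per chunk.
follow_librenms_logs() {
  local logs
  mapfile -t logs < <(existing_librenms_logs)
  if (( ${#logs[@]} == 0 )); then
    echo "no readable LibreNMS logs found (are you root?)" >&2
    return 1
  fi
  tail -F -n 20 "${logs[@]}"
}
```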

Mass update PDU alerting thresholds

PDUs get automatically generated thresholds. The queries below set sane defaults for eqiad/codfw PDUs and need to be run whenever new PDUs are provisioned.
https://phabricator.wikimedia.org/T247358
https://phabricator.wikimedia.org/T245655

UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'power'
AND sensor_descr LIKE 'Phase%'
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 1400;

UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'current'
AND (sensor_descr LIKE '%Phase%' OR sensor_descr LIKE '%Line%')
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 12;

UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_descr LIKE 'Cord%'
AND sensor_class = 'power'
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 3440;
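Before (and after) running the three UPDATE statements, it can help to list the sensors they match and their current limits. This is a sketch, assuming a mysql client that can reach the LibreNMS database, for example through a ~/.my.cnf (connection details are not documented here). The WHERE clause combines the three UPDATE filters, and the helper names are hypothetical.

```shell
# Print a SELECT matching the same sensors as the three UPDATE statements.
pdu_threshold_preview_sql() {
  cat <<'SQL'
SELECT devices.hostname, sensors.sensor_class, sensors.sensor_descr,
       sensors.sensor_limit, sensors.sensor_custom
FROM librenms.sensors
JOIN librenms.devices ON sensors.device_id = devices.device_id
WHERE (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%')
  AND ((sensors.sensor_class = 'power' AND sensors.sensor_descr LIKE 'Phase%')
    OR (sensors.sensor_class = 'current'
        AND (sensors.sensor_descr LIKE '%Phase%' OR sensors.sensor_descr LIKE '%Line%'))
    OR (sensors.sensor_class = 'power' AND sensors.sensor_descr LIKE 'Cord%'))
ORDER BY devices.hostname, sensors.sensor_class, sensors.sensor_descr;
SQL
}

# Run the preview and print the result as a table.
preview_pdu_thresholds() {
  mysql --table -e "$(pdu_threshold_preview_sql)"
}
```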

Reduce polling time

See more details in https://phabricator.wikimedia.org/T346759

In some cases it's possible to reduce the polling time by increasing "Max Repeaters" (for items like "bgp-peers") or "max OIDs" (for items like sensors). This should only be done on a case-by-case basis; in our experience it helps mainly for routers with high latency (i.e. far from the LibreNMS hosts).

Features

Interface grouping

LibreNMS can group interfaces based on the prefix of their description, for example "Transit:" or "Peering:". These groups are listed under the "Ports" dropdown.

Prefixes not shown in the dropdown are still reachable by editing the URL, for example:

https://librenms.wikimedia.org/iftype/type=transport-tun/

https://librenms.wikimedia.org/iftype/type=transport/

Prometheus push-gateway

Alertmanager integration

Known limitations

  • When failed over to the codfw (backup) instance (see https://phabricator.wikimedia.org/T247967):
    • Polling time for eqiad devices increased significantly due to the added latency. For the most populated rows (eqiad B and D), poll times occasionally exceed 5 minutes, resulting in alerts and potentially missed data.
    • The LibreNMS web UI became significantly slower (at least from Europe), partly because of the added latency to reach codfw and partly because the database is still in eqiad.

External links