LibreNMS
LibreNMS is an autodiscovering PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, Juniper, Foundry, and many more.
LibreNMS is a community-based fork of the last GPL-licensed version of Observium.
Service
Currently hosted on netmon1003 and netmon2002.
Replaces Observium, which ran on Streber.
- Software is not installed via Debian package
- Software installed in:
/srv/deployment/librenms/
- RRD data stored in:
/srv/librenms/
- User creds are stored in MySQL:
# grep auth_mechanism /srv/deployment/librenms/librenms/config.php
- Authentication is done via LDAP
How to
Add a device to LibreNMS
Configure the read-only v2c SNMP community on the device.
Via webUI:
https://librenms.wikimedia.org/addhost/
Use the device FQDN and keep all the other fields as is (do not force add it). Note: because of a bug, set port to "161".
The device should be discovered and polled within the next 10 minutes.
Via CLI:
$ ssh librenms.wikimedia.org
$ cd /srv/deployment/librenms/librenms
$ sudo -u librenms ./lnms device:add --v2c -c <snmp_community> <device_fqdn>
Added device <fqdn> (XXX)
$ sudo -u librenms php discovery.php -h <fqdn> && sudo -u librenms php poller.php -h <fqdn>
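When several devices need to be added at once, the CLI steps above can be generated in bulk. The following is a minimal dry-run sketch, not an existing tool: the function name print_add_commands and the input format (one FQDN per line on stdin) are assumptions made for illustration. It only prints the commands shown above, so they can be reviewed before being run from /srv/deployment/librenms/librenms.

```shell
# Dry-run sketch (hypothetical helper): read device FQDNs on stdin, one per
# line, and print the add/discover/poll commands for each. Nothing is executed.
# $1 is the SNMP v2c community to use.
print_add_commands() {
    community="$1"
    while IFS= read -r fqdn; do
        case "$fqdn" in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
        esac
        printf 'sudo -u librenms ./lnms device:add --v2c -c %s %s\n' "$community" "$fqdn"
        printf 'sudo -u librenms php discovery.php -h %s && sudo -u librenms php poller.php -h %s\n' "$fqdn" "$fqdn"
    done
}
```

For example, print_add_commands '<snmp_community>' < devices.txt prints two lines per device, where devices.txt is a hypothetical file of FQDNs.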
Upgrade LibreNMS
Updating LibreNMS in our repositories
Let's assume your remotes are configured as follows, and that new versions are tracked in separate branches.
origin ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (fetch)
origin ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (push)
upstream https://github.com/librenms/librenms.git (fetch)
upstream https://github.com/librenms/librenms.git (push)
new=<new version>
old=<old version>
git fetch origin
git checkout -b upstream-$old origin/upstream-$old
git fetch upstream
git checkout -b upstream-$new $new
# If you are missing composer: apt install -y composer php-gd
composer install --no-dev --ignore-platform-reqs
# (you will be prompted for any missing php requirements)
git add -f vendor
git commit -m "Add composer requirements for LibreNMS $new"
mkdir scap
git checkout upstream-$old -- scap/scap.cfg
git add scap
git commit -m "Add Scap config"
git push origin upstream-$new
Check for local patches on upstream-$old and cherry-pick them onto upstream-$new. See for example an occurrence where a LibreNMS upgrade left behind patches: https://phabricator.wikimedia.org/T273716#7430992
Cherry picking commits from upstream-$old into upstream-$new
- Check out the new branch: first, make sure you have the latest version of the LibreNMS repository, then check out the upstream-$new branch where you want to cherry-pick commits.
git fetch origin
git checkout upstream-$new
- List commits exclusive to the old branch: use git log to list the commits that are in upstream-$old but not in upstream-$new. This helps identify the patches that need to be cherry-picked.
git log upstream-$new..upstream-$old
This command shows commits from upstream-$old that are not in upstream-$new. Review these commits to decide which ones should be cherry-picked.
- Cherry-pick commits: for each commit you want to cherry-pick, use the git cherry-pick command. Suppose you have the commit hashes abc123 and def456 from upstream-$old that you want to apply to upstream-$new.
git cherry-pick abc123 def456
If there are conflicts during the cherry-pick, Git will prompt you to resolve them. Open the conflicting files and make the necessary changes. After resolving the conflicts and staging the files with git add, continue the cherry-picking process with:
git cherry-pick --continue
- Push changes: after successfully cherry-picking the necessary commits, push your changes to the remote repository.
git push origin upstream-$new
Updating LibreNMS in production
Backing up the LibreNMS database
On dbprov1002:
cd /etc/wmfbackups
cp backups.cnf librenms-backup.cnf
sed -i '/sections:/,$c\
  librenms:\
    regex: librenms[.]\
    host: '\''db1217.eqiad.wmnet'\''\
    port: 3321
' librenms-backup.cnf
chown dump:dump librenms-backup.cnf
sudo -u dump backup-mariadb --config-file librenms-backup.cnf
On deploy1002:
cd /srv/deployment/librenms/librenms/
git fetch origin
git branch # note the current branch
git checkout upstream-<version>
scap deploy "Upgrade LibreNMS to <version> - <task>"
Run puppet on the netmon* hosts from a cumin host (cumin1002.eqiad.wmnet or cumin2002.codfw.wmnet):
cumin O:netmon run-puppet-agent
On the netmon_server (git grep -h netmon_server: hieradata/)
cd /srv/deployment/librenms/librenms
sudo -u librenms ./daily.sh
Rollback
On deploy1002:
cd /srv/deployment/librenms/librenms/
git fetch origin
git checkout <previous branch>
scap deploy "Rollback LibreNMS to <version> - <task>"
Then run puppet again from a cumin host:
cumin O:netmon run-puppet-agent
Check the logs
LibreNMS logs to four different locations:
- /srv/deployment/librenms/librenms/logs/librenms.log
- /var/log/librenms.log
- /var/log/librenms/daily.log
- /var/log/apache2/librenms.wikimedia.org.error.log
It would be great to have the first 3 in a single location.
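Until the logs live in one place, a small wrapper can show the tail of all four at once. This is only a sketch: the function names librenms_log_paths and tail_librenms_logs are hypothetical, the paths are the ones listed above, and reading them on the netmon hosts may require elevated privileges.

```shell
# Hypothetical helper: print the four log locations listed above, one per line.
librenms_log_paths() {
    printf '%s\n' \
        /srv/deployment/librenms/librenms/logs/librenms.log \
        /var/log/librenms.log \
        /var/log/librenms/daily.log \
        /var/log/apache2/librenms.wikimedia.org.error.log
}

# Hypothetical helper: show the last N lines (default 20) of each readable log,
# with a header per file; missing or unreadable files are reported, not fatal.
tail_librenms_logs() {
    lines="${1:-20}"
    librenms_log_paths | while IFS= read -r path; do
        if [ -r "$path" ]; then
            printf '==> %s <==\n' "$path"
            tail -n "$lines" "$path"
        else
            printf '==> %s (missing or unreadable) <==\n' "$path"
        fi
    done
}
```

Calling tail_librenms_logs 50 prints up to 50 lines per log, which is usually enough to spot a failing poller or daily.sh run.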
Mass update PDU alerting thresholds
PDUs have automatically generated thresholds. The queries below set sane defaults for eqiad/codfw PDUs and need to be run whenever new PDUs are provisioned.
https://phabricator.wikimedia.org/T247358
https://phabricator.wikimedia.org/T245655
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'power'
AND sensor_descr like "Phase%"
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 1400;
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'current'
AND (sensor_descr like "%Phase%" or sensor_descr like "%Line%" )
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 12;
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_descr like "Cord%"
AND sensor_class = 'power'
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 3440;
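Before running the updates, it can help to look at which sensors they would touch. The sketch below prints a read-only SELECT that uses the same join and the same eqiad/codfw hostname filter as the queries above. The function name preview_pdu_sensors_sql is hypothetical, the sensor class argument is interpolated as is and must come from a trusted operator, and how the output is fed to the MySQL client is left to the usual local practice.

```shell
# Hypothetical helper: print a read-only preview query for one sensor class
# (for example power or current). It only prints SQL; nothing is executed.
preview_pdu_sensors_sql() {
    sensor_class="$1"
    cat <<EOF
SELECT devices.hostname, sensors.sensor_descr, sensors.sensor_limit, sensors.sensor_custom
FROM librenms.sensors JOIN librenms.devices
  ON sensors.device_id = devices.device_id
WHERE sensors.sensor_class = '${sensor_class}'
  AND (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%')
ORDER BY devices.hostname, sensors.sensor_descr;
EOF
}
```

Comparing the sensor_descr values in the result with the "Phase%", "%Line%" and "Cord%" patterns gives a quick check that newly provisioned PDUs will be matched.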
Reduce polling time
See more details in https://phabricator.wikimedia.org/T346759
In some cases it is possible to reduce the polling time by increasing "Max Repeaters" (for items like "bgp-peers") or "max OIDs" (for items like sensors). This should only be done on a case-by-case basis; in our experience it applies to routers with high latency (far from the LibreNMS hosts).
Features
Interface grouping
LibreNMS can group interfaces based on their description's prefix, for example "Transit:" or "Peering:". The groups are shown under the "ports" dropdown.
Prefixes not shown in the dropdown are still reachable by editing the URL, for example:
https://librenms.wikimedia.org/iftype/type=transport-tun/
https://librenms.wikimedia.org/iftype/type=transport/
Prometheus push-gateway
Alertmanager integration
Known limitations
- When failed over to the codfw (backup) instance (see https://phabricator.wikimedia.org/T247967):
  - Polling time for eqiad devices increases significantly due to the added latency. For the most populated rows (eqiad B and D) this means that poll times occasionally exceed 5 minutes, resulting in alerts and potentially missed data.
  - The LibreNMS web UI becomes significantly slower (from Europe at least), partly because of the added latency to reach codfw and partly because the database is still in eqiad.