- 08:18 andrewbogott: rebooted nagios-dev
- 15:58 mutante: restarting nagios3 on nagios-main (which is icinga.wmflabs, was down per bug 52560)
- 18:23 wm-bot: petrb: restarting nagios
- 14:02 wm-bot: petrb: fixed nlogin o
- 15:05 wm-bot: petrb: restarted nagios irc bot
- 15:05 labs-logs-bottie: petrb: restarting nagios bot
- 22:33 Damianz: Made adding role checks simple - see https://gerrit.wikimedia.org/r/#/c/55424/ for basics
- 15:03 labs-logs-bottie: petrb: restarted ircecho
- 19:56 Ryan_Lane: restarted ircecho on nagios-main
- 09:11 petan: moved bot to nagios channel
- 11:45 labs-logs-bottie: petrb: restarting feed
- 21:21 Ryan_Lane: scratch that. it doesn't need to reboot
- 21:21 Ryan_Lane: rebooting nagios-main
- 09:21 petan: ignoring all swift-be* instances - no one cares about them and they are spamming channel
- 08:34 petan: rebooting nagios
- 15:58 labs-logs-bottie: petrb: restarting nagios server because it really needs it
- 06:55 Damianz: Fixed permissions on rw dir so snmptt can submit trap results.
- 17:57 Damianz: Implemented ignoring hosts in the rebuild script + restarted puppet/ircecho
- 08:40 Damianz: Stopped puppet/ircecho again, commented out in crontab this time
- 15:53 Damianz: fixed parser fetch so it rebuilds when the old file is missing
- 15:42 Damianz: fixed snmp trap config
- 15:15 Damianz: reset rw ownership to snmptt so the puppet check can be entered - probably should switch this to a setuid'd binary or set up group memberships properly.
- 08:39 Damianz: changing puppet-FAIL command to `echo "Puppet has not run in the last 10 hours" && exit 2` from `/usr/share/nagios3/puppet_check.sh $HOSTADDRESS$`
- 01:12 Damianz: Free ram plugin seems to be working, 27 hosts it's not working on - probably puppetmaster::self/broken puppet
- 23:24 Damianz: free ram check merged in, un-commenting service. Not reloading, should reload on the next parser run, giving time for puppet to run on the instances.
- 22:45 Damianz: Commented out free_ram check for now.
- 22:26 Damianz: Pending change to fix the Free ram checks (21822)
- 22:25 Damianz: Changed snmp host for trap in base::puppet - that should fix Puppet freshness checks on anything not running puppetmaster::self
- 20:58 Damianz: re-setup snmpd/snmptt for puppet freshness checks. Used config from puppet, plugin in /usr/lib/nagios/plugins/eventhandlers
- 18:38 Damianz: parser re-breaks configs, removing -a $ARG2$ from check_nrpe now as nothing gets an arg passed anyway. This should be fixed in a better way so we /can/ use args later.
- 18:20 Damianz: Copied /etc/nagios3/conf.d to /etc/nagios3/conf.d.backup and sed -i 's/check_nrpe/check_nrpe_1arg/g' /etc/nagios3/conf.d/* to fix nrpe checks, need to check the parser.
- 18:19 Damianz: chmod 644 /etc/nagios3/resource.cfg so nagios can read it on reload.
- 13:15 labs-logs-bottie: root: disabled user check for bastion
- 13:24 mutante: starting snmptrapd
- 13:54 labs-logs-bottie: petrb: fixed nagios
- 12:15 mutante: - started snmptrapd
- 09:23 mutante: starting snmptrapd
- 07:43 labs-logs-bottie: root: aptitude upgrade
- 07:01 petan|wk: rebooting
- 05:18 mutante: - put all the hosts currently down into scheduled downtimes for the next 3 days with manual bash commands
- 04:18 mutante: - temp. changed permissions on external command file per Nagios FAQ, added group "nagiocmd" to see if that allows me to schedule downtimes, it does (independently from the host command perms), but took permissions back due to security concerns
- 03:36 mutante: even though listed in all authorized_for_* commands in cgi.cfg i get denied to execute any by web ui. guess related to the Apache LDAP auth / auto-login
- 03:27 mutante: puppet broken due to "Could not find class misc::apache2"
- 08:50 petan|wk: fixed irc
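
The 05:18 mutante entry above mentions putting down hosts into scheduled downtime for 3 days with manual bash commands; the log does not record the commands themselves. As a rough sketch only, the following Python builds lines in the Nagios external command format for `SCHEDULE_HOST_DOWNTIME`. The function name, host names, author, and comment are hypothetical. It only formats the lines; writing them to the external command file (whose path and permissions depend on the installation, as the 04:18 entry shows) is left out.

```python
import time


def schedule_host_downtime(host, duration_s, author, comment, now=None):
    """Return one SCHEDULE_HOST_DOWNTIME external-command line (hypothetical helper).

    Field order follows the Nagios external command format:
    [timestamp] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
    """
    now = int(time.time()) if now is None else int(now)
    start = now
    end = now + duration_s
    # fixed=1 means the downtime runs exactly from start to end; trigger_id=0 means untriggered.
    return (
        f"[{now}] SCHEDULE_HOST_DOWNTIME;"
        f"{host};{start};{end};1;0;{duration_s};{author};{comment}"
    )


THREE_DAYS = 3 * 24 * 3600  # 259200 seconds

if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host in ["example-host-1", "example-host-2"]:
        print(schedule_host_downtime(host, THREE_DAYS, "admin", "host is down"))
```

Each printed line would then be appended to the Nagios external command file by a user that is allowed to write to it.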