Nova Resource:Nagios/SAL

January 15

  • 08:18 andrewbogott: rebooted nagios-dev

September 23

  • 15:58 mutante: restarting nagios3 on nagios-main (which is icinga.wmflabs, was down per bug 52560)

June 11

  • 18:23 wm-bot: petrb: restarting nagios

May 30

  • 14:02 wm-bot: petrb: fixed nlogin o

May 22

  • 15:05 wm-bot: petrb: restarted nagios irc bot

May 10

  • 15:05 labs-logs-bottie: petrb: restarting nagios bot

March 22

March 19

  • 15:03 labs-logs-bottie: petrb: restarted ircecho

March 5

  • 19:56 Ryan_Lane: restarted ircecho on nagios-main

February 21

  • 09:11 petan: moved bot to nagios channel

February 18

  • 11:45 labs-logs-bottie: petrb: restarting feed

February 6

  • 21:21 Ryan_Lane: scratch that. it doesn't need to reboot
  • 21:21 Ryan_Lane: rebooting nagios-main

January 30

  • 09:21 petan: ignoring all swift-be* instances - no one cares about them and they are spamming channel

December 19

  • 08:34 petan: rebooting nagios

October 8

  • 15:58 labs-logs-bottie: petrb: restarting nagios server because it really needs it

October 6

  • 06:55 Damianz: Fixed permissions on rw dir so snmptt can submit trap results.

October 3

  • 17:57 Damianz: Implemented ignoring hosts in the rebuild script + restarted puppet/ircecho

October 1

  • 08:40 Damianz: Stopped puppet/ircecho again, commented out in crontab this time

September 13

  • 15:53 Damianz: fixed parser fetch so it rebuilds when the old file is missing
  • 15:42 Damianz: fixed snmp trap config
  • 15:15 Damianz: reset rw ownership to snmptt so the puppet check can be entered - probably should switch this to a setuid'd binary or set up group memberships properly.

August 30

  • 08:39 Damianz: changing puppet-FAIL command to `echo "Puppet has not run in the last 10 hours" && exit 2` from `/usr/share/nagios3/puppet_check.sh $HOSTADDRESS$`
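
The replacement command relies on the Nagios plugin convention that exit status 2 is reported as CRITICAL. A minimal shell sketch of that behaviour follows; the function name `puppet_fail` is hypothetical and only stands in for the command line quoted in the entry above.

```shell
# Hypothetical stand-in for the puppet-FAIL command line from the log entry.
# Nagios plugin convention: exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
puppet_fail() {
    echo "Puppet has not run in the last 10 hours"
    return 2   # reported by Nagios as CRITICAL
}

# Capture the output and the status the way a caller would see them.
status=0
output=$(puppet_fail) || status=$?
echo "status=$status output=$output"
```

The first line of output becomes the status text shown for the service, so a fixed message plus a fixed exit status is enough for a check that only fires when the passive result goes stale.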

August 29

  • 01:12 Damianz: Free ram plugin seems to be working, 27 hosts it's not working on - probably puppetmaster::self/broken puppet

August 28

  • 23:24 Damianz: free ram check merged in, un-commenting service. Not reloading; it should reload on the next parser run, giving time for puppet to run on the instances.
  • 22:45 Damianz: Commented out free_ram check for now.
  • 22:26 Damianz: Pending change to fix the Free ram checks (21822)
  • 22:25 Damianz: Changed snmp host for trap in base::puppet - that should fix Puppet freshness checks on anything not running puppetmaster::self
  • 20:58 Damianz: re-setup snmpd/snmptt for puppet freshness checks. Used config from puppet, plugin in /usr/lib/nagios/plugins/eventhandlers
  • 18:38 Damianz: parser re-breaks configs, removing -a $ARG2$ from check_nrpe now as nothing gets an arg passed anyway. This should be fixed in a better way so we /can/ use args later.
  • 18:20 Damianz: Copied /etc/nagios3/conf.d to /etc/nagios3/conf.d.backup and sed -i 's/check_nrpe/check_nrpe_1arg/g' /etc/nagios3/conf.d/* to fix nrpe checks, need to check the parser.
  • 18:19 Damianz: chmod 644 /etc/nagios3/resource.cfg so nagios can read it on reload.
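
The 18:20 backup-and-substitute step can be sketched as a small shell function. This is an illustration under stated assumptions, not the script that was run: the name `rewrite_nrpe_checks` is hypothetical, and the directory is passed as an argument so the sketch can be tried on a scratch copy. Unlike the plain `s/check_nrpe/check_nrpe_1arg/g`, the pattern here skips names that are already `check_nrpe_1arg`, so running it a second time does not produce `check_nrpe_1arg_1arg`.

```shell
# rewrite_nrpe_checks DIR  (hypothetical helper, for illustration only)
# Back up DIR to DIR.backup once, then rewrite check_nrpe to check_nrpe_1arg
# in every regular file directly under DIR.
rewrite_nrpe_checks() {
    conf_dir=$1

    # Keep a copy first, as the log entry did with conf.d.backup.
    # Skip the copy if a backup already exists, to avoid nesting directories.
    [ -e "$conf_dir.backup" ] || cp -R "$conf_dir" "$conf_dir.backup" || return 1

    for f in "$conf_dir"/*; do
        [ -f "$f" ] || continue
        # Match check_nrpe only when the next character is not a letter,
        # digit, or underscore (or when it ends the line), so an existing
        # check_nrpe_1arg is left alone.
        sed -E 's/check_nrpe([^_[:alnum:]]|$)/check_nrpe_1arg\1/g' "$f" > "$f.tmp" || return 1
        # Write back through the original file to keep its permissions.
        cat "$f.tmp" > "$f" && rm -f "$f.tmp"
    done
}

# Example (path taken from the log entry; run against a scratch copy first):
#   rewrite_nrpe_checks /etc/nagios3/conf.d
```

Because the parser regenerates these files, a substitution like this only holds until the next parser run, which matches the 18:38 note about the parser re-breaking the configs.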

June 20

  • 13:15 labs-logs-bottie: root: disabled user check for bastion

June 3

  • 13:24 mutante: starting snmptrapd

June 1

  • 13:54 labs-logs-bottie: petrb: fixed nagios

May 17

  • 12:15 mutante: - started snmptrapd

May 3

  • 09:23 mutante: starting snmptrapd
  • 07:43 labs-logs-bottie: root: aptitude upgrade

May 2

  • 07:01 petan|wk: rebooting

March 20

  • 05:18 mutante: - put all the hosts currently down into scheduled downtimes for the next 3 days with manual bash commands
  • 04:18 mutante: - temp. changed permissions on external command file per Nagios FAQ, added group "nagiocmd" to see if that allows me to schedule downtimes, it does (independently from the host command perms), but took permissions back due to security concerns
  • 03:36 mutante: even though listed in all authorized_for_* commands in cgi.cfg i get denied to execute any by web ui. guess related to the Apache LDAP auth / auto-login
  • 03:27 mutante: puppet broken due to "Could not find class misc::apache2"
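
The log does not record the bash commands used at 05:18. One way such a downtime could be scheduled by hand is to write a `SCHEDULE_HOST_DOWNTIME` line to the Nagios external command file. The sketch below only formats that line; the function name `downtime_cmd`, the example host name, and the command file path in the comment are assumptions, not details taken from the log.

```shell
# downtime_cmd HOST NOW DURATION_SECONDS AUTHOR COMMENT  (hypothetical helper)
# Prints one external command line in the form
#   [time] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
# using a fixed downtime (fixed=1) with no trigger (trigger_id=0).
downtime_cmd() {
    host=$1
    now=$2
    duration=$3
    author=$4
    comment=$5
    end=$((now + duration))
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$now" "$host" "$now" "$end" "$duration" "$author" "$comment"
}

# Three days is 3 * 86400 = 259200 seconds.
# Assumed usage; the command file path is a guess for a Debian nagios3 install
# and writing to it needs the permissions discussed in the 04:18 entry:
#   downtime_cmd some-host "$(date +%s)" 259200 mutante "host down" \
#       > /var/lib/nagios3/rw/nagios.cmd
```

Keeping the formatting separate from the write makes it easy to inspect the generated lines for a list of hosts before anything is sent to Nagios.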

February 24

  • 08:50 petan|wk: fixed irc