This page contains historical information. It is probably no longer true.
Proposed monitoring procedure
- Check Nagios for new alerts.
- Fix simple issues such as daemons that need restarting or servers that can be rebooted remotely.
- Record any issues which need on-site attention under datacentre tasks.
- Pass responsibility for any more complex software issues to a competent staff member.
- Capacity check. Make sure key metrics such as application CPU utilisation and disk space usage are not approaching dangerous limits.
- Publish a report detailing the times at which Nagios was checked, the issues noted, and any people notified. Alternatively, make this information available continuously so that it can be reviewed weekly.
- Another team member should check the report and make sure that the monitoring done was of an appropriate standard.
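The capacity check step above could be partly scripted. The following is a minimal sketch in Python, not part of the original procedure: the threshold values, the metric names and the helper names `disk_percent` and `over_limit` are all assumptions made for illustration, since this page does not state what counts as a dangerous limit.

```python
import shutil

# Hypothetical warning thresholds in percent; this page names no specific limits.
LIMITS = {"disk_percent": 85.0, "cpu_percent": 80.0}


def disk_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_limit(metrics, limits=LIMITS):
    """Return the sorted names of metrics that meet or exceed their limit."""
    return sorted(
        name
        for name, value in metrics.items()
        if name in limits and value >= limits[name]
    )


if __name__ == "__main__":
    # Only disk usage is measured here; CPU figures would have to come
    # from whatever monitoring data is already being collected.
    current = {"disk_percent": disk_percent("/")}
    for name in over_limit(current):
        print(f"WARNING: {name} is at {current[name]:.1f}%")
```

Anything the script flags would still need to be noted in the report and passed on as described in the steps above.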
Every one to two months:
- Capacity review. Analyse capacity metrics and report your findings. Notify the team of upcoming performance bottlenecks which might require hardware purchases.
- Report any long-term issues which have been left unfixed.
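One way to approach the capacity review is to fit a straight line to recent measurements and estimate when a metric will reach its limit. The sketch below assumes that growth is roughly linear and that samples are available as (day, value) pairs. The function name `days_until_limit` and the sample figures are hypothetical, and a real review might call for a different model.

```python
def days_until_limit(samples, limit):
    """Estimate the days remaining until a metric reaches `limit`.

    `samples` is a list of (day, value) pairs. A least-squares line is
    fitted through them. Returns None if the trend is flat or falling,
    otherwise the number of days after the latest sample at which the
    fitted line reaches `limit` (0.0 if it already has).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    latest_day = max(x for x, _ in samples)
    return max(0.0, day_at_limit - latest_day)


# Illustrative figures only: disk usage in percent on days 0, 10 and 20.
usage = [(0, 50.0), (10, 60.0), (20, 70.0)]
print(days_until_limit(usage, limit=90.0))
```

A short estimate from a calculation like this is the kind of finding worth raising with the team early, since hardware purchases take time to arrange.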