Monitoring procedure

This page contains historical information. It may be outdated or unreliable.

Proposed monitoring procedure

Daily:

Check Nagios for new alerts.
Fix simple issues such as daemons that need restarting or servers that can be rebooted remotely.
Record any issues that need on-site attention under datacenter tasks.
Pass responsibility for any more complex software issues to a competent staff member.

Weekly:

Capacity check. Make sure key metrics such as application CPU utilisation and disk space usage are not approaching dangerous limits.
Publish a report detailing the times at which Nagios was checked, the issues noted, and any people notified. Alternatively, make this information available continuously so that it can be reviewed weekly.
Another team member should check the report and make sure that the monitoring done was of an appropriate standard.
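
The weekly capacity check could be partly scripted. The following is a minimal sketch in Python, not an existing tool: the names classify_usage and disk_used_fraction and the 80% and 90% thresholds are assumptions made for illustration, and the "dangerous limits" for a real service need to be chosen by the team.

```python
import shutil

# Hypothetical thresholds, expressed as fractions of total capacity.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90


def classify_usage(used_fraction, warn=WARN_FRACTION, crit=CRIT_FRACTION):
    """Return 'ok', 'warning' or 'critical' for a utilisation fraction in [0, 1]."""
    if used_fraction >= crit:
        return "critical"
    if used_fraction >= warn:
        return "warning"
    return "ok"


def disk_used_fraction(path="/"):
    """Return the fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


if __name__ == "__main__":
    fraction = disk_used_fraction("/")
    print(f"/ is {fraction:.0%} full: {classify_usage(fraction)}")
```

The same classify_usage function can be applied to any metric expressed as a fraction of capacity, such as CPU utilisation, so that the weekly report uses one consistent scale.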

Every one to two months:

Capacity review. Analyse capacity metrics and report your findings. Notify the team of upcoming performance bottlenecks which might require hardware purchases.
Report any long-term issues which have been left unfixed.
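
One simple way to spot an upcoming bottleneck during the capacity review is to fit a straight line to recent measurements and estimate when the metric will reach its limit. The sketch below assumes roughly linear growth, which will not hold for every metric. The function name days_until_limit and the sample figures are hypothetical.

```python
def days_until_limit(samples, limit):
    """Estimate the days remaining until a metric reaches `limit`.

    `samples` is a list of (day, value) pairs. A least-squares line is
    fitted to them and extrapolated forward. Returns None when the fitted
    trend is flat or falling, and 0.0 when the limit is already reached.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None  # not growing, so no projected date
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, day_at_limit - last_day)


if __name__ == "__main__":
    # Made-up weekly disk usage readings, in percent of capacity.
    readings = [(0, 50.0), (7, 57.0), (14, 64.0)]
    print(days_until_limit(readings, limit=90.0))
```

A projection that falls within the lead time for buying and installing hardware is the kind of finding worth raising with the team.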