This page contains historical information. It is probably no longer true.
This page documents the existing monitoring infrastructure for fundraising and keeps track of desired monitoring functionality.
Existing monitoring infrastructure
We currently monitor the 'minfraud' log (which gets aggregated from the payments cluster to Loudon).
See RT ticket #405
- Nagios check for alive-ness
- Nagios check for failed builds
- Note: some scripts run by Hudson need to be modified to return a non-zero exit status when they don't complete properly (e.g. the send/receive mail scripts for civimail)
- Nagios check for too many files in build folders (if the limit of 63999 gets hit, builds will fail)
- Nagios check for queues filling up too fast
- Service communication times
- Nagios checks for timeouts/unacceptably high communication times
- 3rd party service accessibility from payments cluster
- Nagios check for communications access to MaxMind/PayPal
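As an illustration of the build-folder check listed above, here is a minimal Python sketch of a Nagios-style plugin. It is not the check actually deployed. The 63999 limit comes from this page, and the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention. The warning and critical ratios, and the names classify, check_build_dir and main, are assumptions made for the example.

```python
"""Sketch of a Nagios-style check that counts entries in a build folder."""
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

HARD_LIMIT = 63999  # limit quoted on this page; builds fail once it is hit

# Assumed thresholds (not from the original page): alert well before the limit.
WARN_RATIO = 0.80
CRIT_RATIO = 0.95


def classify(count, limit=HARD_LIMIT, warn_ratio=WARN_RATIO, crit_ratio=CRIT_RATIO):
    """Map an entry count to a Nagios status code."""
    if count >= limit * crit_ratio:
        return CRITICAL
    if count >= limit * warn_ratio:
        return WARNING
    return OK


def check_build_dir(path):
    """Return (status, message) for the number of entries directly under path."""
    try:
        count = sum(1 for _ in os.scandir(path))
    except OSError as exc:
        # An unreadable or missing folder is reported as UNKNOWN, not OK.
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    status = classify(count)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"{label} - {count} entries in {path} (limit {HARD_LIMIT})"


def main(argv):
    """Print the one-line status message and return the exit code."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_build_files <build_dir>")
        return UNKNOWN
    status, message = check_build_dir(argv[0])
    print(message)
    return status
```

To use this as a plugin, a small wrapper script would call sys.exit(main(sys.argv[1:])) so that Nagios receives the status as the process exit code. The same pattern applies to the Hudson scripts noted above: a script that fails should exit with a non-zero status so the failure is visible to the build and to monitoring.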