Data Platform/Systems/Hive/Alerts
This page provides troubleshooting guidelines and runbooks to use in the event of alerts from our monitoring systems.
Hive Metastore Heap Usage
A critical alert is triggered when average heap usage over the past hour reaches 90% of the available heap.
A warning alert is triggered when average heap usage over the past hour reaches 80% of the available heap.
Examine the trends via the Hive dashboard in Grafana.
An alert might indicate an issue with a currently running process, or it might indicate that we need to increase the resources available to the metastore.
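The two thresholds can be illustrated with a minimal sketch. This is not the actual alert rule, which is evaluated by the monitoring system; the function name `heap_alert_level`, its parameters, and the choice to treat a value exactly at a threshold as triggering are assumptions made for illustration only.

```python
from statistics import mean

# Thresholds taken from the alert descriptions above.
CRITICAL_FRACTION = 0.90
WARNING_FRACTION = 0.80


def heap_alert_level(used_bytes_samples, max_heap_bytes):
    """Classify one hour of heap samples as 'critical', 'warning', or 'ok'.

    Hypothetical helper: used_bytes_samples is a list of heap-used readings
    collected over the past hour, max_heap_bytes is the available heap.
    """
    if not used_bytes_samples:
        raise ValueError("need at least one heap sample")
    if max_heap_bytes <= 0:
        raise ValueError("max_heap_bytes must be positive")

    # The alerts are based on the hourly average, not on individual spikes.
    average_fraction = mean(used_bytes_samples) / max_heap_bytes

    # Assumption: a value exactly at the threshold triggers the alert.
    if average_fraction >= CRITICAL_FRACTION:
        return "critical"
    if average_fraction >= WARNING_FRACTION:
        return "warning"
    return "ok"
```

Because the check uses the hourly average, a short spike above 90% does not trigger an alert on its own, while sustained high usage does. This is why the Grafana trend is the first thing to examine.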
High DB Query Rate
A critical alert is triggered when the MariaDB servers backing Hive receive more than 1000 requests per second.
In some cases, Spark jobs generate an unusually large number of queries and overload the MariaDB instance behind Hive. The jobs responsible need to be identified and rewritten. Start by looking at jobs that have been modified recently.
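As a rough illustration of the threshold, the sketch below derives a per-second rate from two readings of a cumulative request counter and compares it with the 1000 requests per second limit. The function names and parameters are hypothetical, and the source of the counter is an assumption: this page does not say which metric the alert uses, though a cumulative counter such as MariaDB's `Questions` status variable would be one possible input.

```python
# Threshold taken from the alert description above.
CRITICAL_REQUESTS_PER_SECOND = 1000


def request_rate(count_start, count_end, interval_seconds):
    """Return requests per second between two cumulative counter readings.

    Hypothetical helper: count_start and count_end are readings of a
    monotonically increasing request counter, taken interval_seconds apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if count_end < count_start:
        # A cumulative counter only decreases if it was reset.
        raise ValueError("counter decreased; the server may have restarted")
    return (count_end - count_start) / interval_seconds


def is_rate_critical(rate):
    """True when the rate exceeds the critical threshold (strictly more than)."""
    return rate > CRITICAL_REQUESTS_PER_SECOND
```

For example, a counter that grows by 90,000 over 60 seconds corresponds to 1500 requests per second, which is above the threshold. Comparing the rate before and after a recently modified job starts can help narrow down which job is responsible.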