
I think it makes sense to work on the Monitoring/Alerting/Paging aspect of Maps.

Monitoring: which parameters do we want to capture in order to decide whether the system is working?
- User-facing latency is often a good parameter.
- Error rate is also very useful.
- Freshness (how up to date the map is) is most likely interesting as well.

Alerting: how would the system alert? For example, if the error rate is above a certain percentage, alert via IRC or Phab; if it is above twice that percentage, page.
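
The two-level rule above can be sketched in a few lines of Python. This is only an illustration: the function name `alert_action` and the 1% threshold in the usage lines are assumptions made for the example, not values agreed for Maps.

```python
def alert_action(error_rate: float, threshold: float) -> str:
    """Map an observed error rate to an alerting action.

    Hypothetical helper: `threshold` is the "certain percentage"
    expressed as a fraction (0.01 means 1%).
    """
    if error_rate > 2 * threshold:
        return "page"    # above twice the threshold: page someone
    if error_rate > threshold:
        return "notify"  # above the threshold: alert via IRC or Phab
    return "none"        # within the threshold: no alert


# Usage with an assumed 1% threshold (placeholder value)
print(alert_action(0.005, 0.01))
print(alert_action(0.015, 0.01))
print(alert_action(0.030, 0.01))
```

A rate exactly at a boundary is treated as not exceeding it, since the rule says "above"; whether that is the desired behaviour is a choice to make when the real thresholds are set.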

In SRE we call the parameters Service Level Indicators (SLIs), i.e. what we want to measure. We call the combination of an SLI and its alerting thresholds a Service Level Objective (SLO).
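
To make the SLI/SLO pairing concrete, here is a minimal Python sketch that stores each indicator together with its thresholds. Everything in it is an assumption for illustration: the class name `SLO`, the SLI names, and all numeric thresholds are placeholders, not proposed values for Maps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """One indicator (SLI) paired with its alerting thresholds."""
    sli: str             # what we measure
    notify_above: float  # alert via IRC or Phab above this value
    page_above: float    # page above this value


# Hypothetical objectives for the three indicators discussed on this page.
# All numbers are placeholders.
MAPS_SLOS = [
    SLO(sli="latency_ms", notify_above=500, page_above=1000),
    SLO(sli="error_rate", notify_above=0.01, page_above=0.02),
    SLO(sli="freshness_hours", notify_above=24, page_above=48),
]


def violated(measurements: dict[str, float]) -> list[str]:
    """Return the names of SLIs whose measured value exceeds the notify threshold."""
    return [
        slo.sli
        for slo in MAPS_SLOS
        if measurements.get(slo.sli, 0.0) > slo.notify_above
    ]


# Usage with made-up measurements
sample = {"latency_ms": 300, "error_rate": 0.03, "freshness_hours": 30}
print(violated(sample))
```

Keeping the thresholds next to the indicator they apply to means the list of SLOs doubles as documentation of what is monitored and when someone gets alerted.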