SLO/FAQ
What use cases lead to a negative error budget?
A sustained rate of errors (i.e., the SLO target not being met) can exhaust the error budget and push it to a negative value, which persists until the errors are resolved. In a rolling window, the negative error budget will remain visible for a while; see the "How to read a Pyrra dashboard?" section for more information.
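As a minimal sketch of the arithmetic (not Pyrra's actual code, and with made-up numbers), the remaining error budget is the fraction of the allowed errors that has not been consumed yet, so it goes negative as soon as the actual error ratio exceeds what the target allows:

```python
def remaining_error_budget(target: float, errors: int, total: int) -> float:
    """Remaining error budget as a fraction (1.0 == 100%, negative == exhausted).

    `target` is the SLO target, e.g. 0.9 for "90% of responses served in time".
    """
    allowed_error_ratio = 1.0 - target      # a 90% target allows 10% of requests to fail
    actual_error_ratio = errors / total
    return 1.0 - actual_error_ratio / allowed_error_ratio

# Hypothetical numbers: with a 90% target, 10% of 1000 requests may be slow.
print(remaining_error_budget(0.9, errors=50, total=1000))   # 50 slow -> half the budget left
print(remaining_error_budget(0.9, errors=150, total=1000))  # 150 slow -> budget is negative
```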
How to read a Pyrra dashboard?
The tool that SRE uses to create SLO dashboards is Pyrra. It focuses mostly on rolling windows, which brings some peculiarities to keep in mind while reviewing a dashboard.
Let's review a real use case that happened to the Machine Learning team. They run a service called Tone Check, which is essentially an ML model wrapped in an HTTP-based API. They have set the following SLO: "90% of all Tone Check's HTTP responses should be served within one second". This was their dashboard at some point in time:

For a lot of people this is very confusing: how is it possible that the error budget starts at -100% and then rises up to 20%? Shouldn't it always start at 100% and then decrease over time if the SLO target is not met? There is no single answer, because it depends on how the remaining error budget is calculated behind the scenes and how it is displayed.

Pyrra uses the Prometheus increase() function to calculate the ratio between errors (or successes) and the total, over an interval equal to the width of the rolling window. For example, the ML team (like most of our users) chose a four-week rolling window, so for every datapoint displayed, increase() computes the difference between the counter value at that time and its value four weeks earlier.

In the use case above, the "Too Slow" section shows that, up to a certain day of the window, the service was periodically failing the SLO target and burning error budget. The ML team fixed it by adding a GPU, but it takes some time for the old datapoints to fall out of the four-week window. Every datapoint in the graph above can be read as the remaining error budget percentage at the end of a four-week window that started a month earlier. The graph therefore shows the trend of the remaining error budget, which is not always what one would expect.
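The rolling-window behaviour can be sketched in a few lines of Python (again, not Pyrra's actual code, and with hypothetical per-day numbers; the window is four "days" instead of four weeks to keep the example small). Once the fix lands, the budget climbs back only as the bad days age out of the window:

```python
# Sketch of a rolling-window error budget (not Pyrra's actual code).
TARGET = 0.9
WINDOW = 4  # datapoints per rolling window

# Hypothetical (total_requests, slow_requests) per day: the service was
# failing badly, then a fix landed on day 4 and errors dropped sharply.
daily = [(1000, 200), (1000, 200), (1000, 200), (1000, 200),
         (1000, 40), (1000, 40), (1000, 40), (1000, 40)]

budgets = []
for i in range(WINDOW - 1, len(daily)):
    # Summing over the trailing window mimics increase() over the window width.
    window = daily[i - WINDOW + 1 : i + 1]
    total = sum(t for t, _ in window)
    errors = sum(e for _, e in window)
    remaining = 1.0 - (errors / total) / (1.0 - TARGET)
    budgets.append(round(remaining, 2))

print(budgets)  # [-1.0, -0.6, -0.2, 0.2, 0.6]
```

The first datapoint is -100% even though the fix is already in place, and each subsequent window recovers some budget as another bad day drops out, just like the Tone Check graph.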
What about the Calendar view in Grafana?

In this case the window is fixed: it goes from one specific date to three months later. We would expect the error budget percentage to always start at 100%, but due to the Pyrra architecture described above, this is not the case. The metrics displayed here are subject to the same limitation as the rolling-window ones, since they are in fact the same metrics, so the "calendar" view is nothing more than the rolling-window view displayed over a wider time range. At the beginning of a quarter you may therefore find that the remaining error budget for your service is not at 100%, and in that case it is probably due to failures to meet the SLO target in the month before.
Another use case is the Edit Check SLO, where the graph looks closer to what we would expect:

In the graph above it is clear that the service started failing the SLO target when the remaining error budget began decreasing from 100% (until it reached a negative value). The graph starts at 100% because, before this event, the SLO target had always been met. Once the underlying issue is resolved, the error budget will start climbing back toward 100%, as shown in the Tone Check use case.