Mean time to recovery
"In modern software products and services, which are rapidly changing complex systems, failure is inevitable, so the key question becomes: How quickly can service be restored?" (Accelerate: The Science of Lean Software and DevOps, IT Revolution Press)
Mean time to recovery, or MTTR, is calculated by adding up the total time spent resolving incidents during a given period and dividing that total by the number of incidents in the period. The metric measures how quickly your organization resolves incidents, which gives you a sense of how resilient your system is.
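The calculation described above can be sketched in a few lines of Python. This is only an illustration, not how Vitals computes the metric internally: the function name `mean_time_to_recovery`, the `(started_at, resolved_at)` tuple layout, and the sample incidents are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Return the average resolution time for a list of incidents.

    `incidents` is a hypothetical list of (started_at, resolved_at)
    datetime pairs. Returns None when there are no incidents, because
    the average is undefined in that case.
    """
    if not incidents:
        return None
    # Add up the time spent resolving every incident in the period...
    total = sum(
        (resolved_at - started_at for started_at, resolved_at in incidents),
        timedelta(),
    )
    # ...then divide by the number of incidents.
    return total / len(incidents)


# Made-up sample data: three incidents lasting 30, 90 and 60 minutes.
incidents = [
    (datetime(2023, 6, 1, 10, 0), datetime(2023, 6, 1, 10, 30)),
    (datetime(2023, 6, 8, 14, 0), datetime(2023, 6, 8, 15, 30)),
    (datetime(2023, 6, 20, 9, 0), datetime(2023, 6, 20, 10, 0)),
]
mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

With these sample values the total resolution time is 180 minutes across 3 incidents, so the MTTR is 60 minutes.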
The MTTR section in Vitals also shows the number of incidents in the given period and a breakdown by service that suffered an incident.
The Evolution graph shows a comparison between 3 main buckets:
- Previous 3 months: the MTTR computed over the 3 months preceding the beginning of the selected period. For example, if the selected period starts in the middle of June, the months considered are March, April, and May.
- Previous month: the MTTR computed over the month preceding the beginning of the selected period. For example, if the selected period starts in the middle of June, the month considered is May.
- Last week: the MTTR computed over the last week from the end of the selected period.
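
To make the three buckets concrete, here is a minimal Python sketch that derives their date ranges from a selected period. It assumes that the "previous" buckets cover whole calendar months before the month in which the period starts (as in the June example), and that "last week" means the 7 days ending at the period's end date. The names `shift_months` and `evolution_buckets`, and the use of half-open `(start, end)` ranges, are hypothetical choices for this example and are not taken from Vitals.

```python
from datetime import date, timedelta


def shift_months(first_of_month, months):
    """Return the first day of the month `months` away (negative = earlier)."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def evolution_buckets(period_start, period_end):
    """Return (start, end) date ranges per bucket; `end` is exclusive."""
    # First day of the month in which the selected period starts.
    month_start = period_start.replace(day=1)
    return {
        # Three full calendar months before the period's starting month.
        "previous_3_months": (shift_months(month_start, -3), month_start),
        # The single full calendar month before the period's starting month.
        "previous_month": (shift_months(month_start, -1), month_start),
        # Assumption: the 7 days leading up to the end of the period.
        "last_week": (period_end - timedelta(days=7), period_end),
    }


# Selected period starting mid-June, as in the example above.
buckets = evolution_buckets(date(2023, 6, 15), date(2023, 7, 15))
for name, (start, end) in buckets.items():
    print(name, start, end)
```

For a period starting on 15 June 2023, the previous 3 months run from 1 March up to (but not including) 1 June, and the previous month runs from 1 May up to 1 June, matching the March, April, May and May examples in the list.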