It’s true that “you can’t improve what you don’t measure”, but in the rapidly changing world of DevOps, how do you know you are measuring the right thing? The publication of Accelerate and the annual State of DevOps report have gone a long way towards answering that question, setting out 5 key metrics which are fundamental to the process of building, releasing and supporting software, regardless of the industry. The authors based these on what they see as the factors that differentiate the highest and lowest performers.
Simple from here, right? Not so fast.
CTO Labs makes complex technology clear. We work with digital businesses on aligning their technology stack to their growth strategy: with established enterprises as a technology delivery partner, and with leading startups as a technology diligence advisor. This gives us a rare perspective on what works and what doesn’t.
It’s also why we have a slightly different take on what and how to measure. Read on, and we’d love to hear your thoughts.
“You can’t improve what you don’t measure.”
State of DevOps Report
The State of DevOps report is an annual report which crowdsources details of how hundreds of businesses are operating with respect to DevOps. In 2018 the core team distilled years of those annual results into a book: Accelerate. It’s a great read for technical and non-technical roles alike. The authors listed 4 key metrics which correlate strongly with successful business outcomes. These 4 key metrics relate directly to DevOps, the foundational process of how to build, release and support software, regardless of the industry.
The 4 key metrics are:
Deployment frequency - How often changes are deployed to production.
Mean time to recovery - How long it takes to recover from a major incident.
Lead time for changes - How long it takes to go from writing code to deploying to production; also called cycle time.
Change failure rate - The percentage of deployed changes which result in a failure/bug.
In the 2019 State of DevOps report a 5th metric was added:
Availability - The percentage of time your platform/solution is available.
What do these metrics really mean and how do you know you’re measuring the right thing?
One of our core services is to go into digital businesses and evaluate the technology stack as part of an M&A process: looking at how it’s used and how it aligns to business growth targets. This insight gives us a distinctive perspective.
Rather than stick to the same metrics for all businesses, we believe the answer lies in interpreting these metrics in a way that is unique to each business. There is no single solution that fits all, just as there is no single metrics system that works for everybody.
Here’s why.
Deployment Frequency - How often changes are deployed to production.
This is likely the most straightforward metric to measure. The goal is to count the software changes deployed to production and measure that count over time (per day, per week, per month).
But when you have a complex platform, what counts as a deployment? Here I go back to Domain-Driven Design: it’s all about bounded contexts and deployment isolation. Depending on the coupling within your platform, you will be able to separate it into discrete parts. If you can deploy your frontend service without needing changes to the backend, then you have deployment isolation. Each deployment of the frontend is +1 to the deployment count, and a separate deployment of the backend is another +1.
On the flip side, if your frontend and backend always need to be deployed together, then you have a deployment package of two services. Because the two are tightly coupled, a change deployed to either the frontend or the backend is a single +1 to the count, even if you deploy changes to both at the same time.
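As a rough sketch, here’s how that counting rule might look in code. The event records and package groupings below are hypothetical; in practice you’d pull deployment events from your CI/CD history.

```python
from collections import Counter
from datetime import date

# Hypothetical deployment events pulled from CI/CD history: (day, service).
events = [
    (date(2024, 5, 6), "frontend"),
    (date(2024, 5, 6), "backend"),   # shipped together with the frontend
    (date(2024, 5, 8), "frontend"),
    (date(2024, 5, 9), "reports"),   # an independently deployable service
]

# Tightly coupled services belong to one deployment package; services with
# deployment isolation are their own package and count separately.
package_of = {"frontend": "web", "backend": "web", "reports": "reports"}

# One package deployed on one day is a single +1, even if several of its
# services went out in the same release.
deployments = {(day, package_of[svc]) for day, svc in events}

per_week = Counter(day.isocalendar()[:2] for day, _ in deployments)
print(per_week)  # Counter({(2024, 19): 3}) -> 3 deployments in ISO week 19
```

Note the deduplication step: the coupled frontend/backend release on 6 May counts once, exactly as described above.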
Mean Time to Recovery - How long to recover from a major incident.
This metric has a little more ambiguity about it. For me, recovery indicates that something major has happened, so I relate this metric to major incidents rather than bugs. If this metric measures the time to return the platform to its normal operational state, then when do you start the clock? If you deploy a broken change at 6pm on Friday and no one finds it until 9am on Monday, is that 2.5 days or is it 2 hours?
I believe that when users find the issue is irrelevant. If you want to improve, track from the moment the issue was in play in production and switched on. This starting point will also help you perform a complete root cause analysis and identify any direct or indirect impacts.
The clock stops when the incident is resolved, and that specifically means not only when the fix reaches production but also when all impacted systems and data have been repaired. If you’ve deployed a change that resolves a major issue but need to spend the next 2 days manually updating corrupted data, then you should track the extra 2 days. This keeps the entire impact transparent, and in larger businesses that need to track support time it provides a truer picture of the actual cost to the business.
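A minimal sketch of that clock, assuming hypothetical incident records with the three timestamps described above (when the issue went live, when the fix shipped, and when the cleanup finished):

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records. The clock starts when the broken change was
# live in production (not when a user first noticed it) and stops only when
# the fix AND any data cleanup are done.
incidents = [
    {
        "live_in_production": datetime(2024, 5, 3, 18, 0),  # Friday 6pm deploy
        "fix_deployed":       datetime(2024, 5, 6, 11, 0),  # fix out on Monday
        "cleanup_complete":   datetime(2024, 5, 8, 11, 0),  # +2 days of data repair
    },
]

def time_to_recovery(incident) -> timedelta:
    # Resolution includes the cleanup, not just the code fix.
    end = max(incident["fix_deployed"], incident["cleanup_complete"])
    return end - incident["live_in_production"]

mttr_hours = mean(time_to_recovery(i).total_seconds() for i in incidents) / 3600
print(f"MTTR: {mttr_hours:.0f} hours")  # 113 hours, cleanup included
```

Starting the clock at discovery on Monday would hide the weekend exposure entirely; starting it when the change went live keeps the full impact visible.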
Lead Time for Changes - How long it takes to go from writing code to deploying to production; also called cycle time.
The time it takes to go from starting a change to verifying it running in production will depend a little on your own pipelines.
But I like to start the clock on a change from the moment the user story is moved from the sprint backlog to the ‘doing’ column. By that point all the necessary design and detail should be agreed, directing the engineering team to what is being built.
You stop the clock when the change is in production and has been verified.
To be specific, don’t just wait for a CI/CD pipeline to deploy the change. You need a verification step: that might be a manual check, or it might be an automated step in your pipeline that tests the new feature. In our opinion, releasing behind a feature switch doesn’t exempt you from verifying the change. You still need to turn the feature on for your verification step and confirm all is working as expected.
Measure not only the time it takes to go from build to deployed, but also the times when the engineers needed further clarity, when the review process wasn’t available, or when the regression test was grouped with 5 other stories and slowed this story’s lead time down. These are all invaluable data points which will help you know exactly where the bottlenecks are in the process.
Any queue in the process will slow things down. Queues often occur when waiting for a few stories to complete before doing regression testing, for example, or when a tech lead who needs to review the work is snowed under. Spotting them will help to identify where to invest your effort in continuous improvement experiments.
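Here’s a minimal sketch of that measurement, using a made-up stage timeline for a single story (the stage names and timestamps are illustrative; yours will come from your board and pipeline):

```python
from datetime import datetime

# Hypothetical timeline for one story: each entry is (stage, entered_at).
# The clock starts at "doing" and stops at "verified in production".
story = [
    ("doing",                  datetime(2024, 5, 6, 9, 0)),
    ("waiting for review",     datetime(2024, 5, 7, 15, 0)),  # queue: tech lead snowed under
    ("waiting for regression", datetime(2024, 5, 8, 10, 0)),  # queue: batched with other stories
    ("deploying",              datetime(2024, 5, 10, 9, 0)),
    ("verified in production", datetime(2024, 5, 10, 11, 0)),
]

lead_time = story[-1][1] - story[0][1]
print(f"Lead time: {lead_time}")  # 4 days, 2:00:00

# Per-stage durations reveal where the waiting happened: the queues dominate.
for (stage, entered), (_, left) in zip(story, story[1:]):
    print(f"{stage:<24} {left - entered}")
```

The per-stage breakdown is what turns a single lead-time number into a pointer at the actual bottleneck.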
Change Failure Rate - What is the percentage of deployed changes which result in a failure/bug.
What is the ratio between changes being deployed and failures? And what counts as a failure? That is the biggest question when measuring this metric.
There are really two options: either a failure is a major incident which is significantly impacting users or it’s a new bug being added to the backlog.
As this is an opinionated piece, our guidance is ‘it depends’ (yes, we are consultants).
But I can answer in more detail.
If your platform is on fire and you have major or minor incidents on a fairly regular basis, then count a failure as a major incident.
The goal of measuring these metrics is to get better over time. Once your monthly stats report a zero change failure rate over a few months, you have successfully cut off the fuel of the burning platform and can focus on more specific improvements. At this point, reclassify a failure as an unexpected bug being created in the backlog. This can be raised by any role: a user, QA, the PO, anyone.
Once you’re tracking new bugs, you are no longer putting out fires; you are on a path to improving the quality of the platform. Look for opportunities to build automation into the pipeline that could have caught the bug you’ve just found. Consider the lessons of Acceptance Test-Driven Development: surround your change with a failing test first, make that test pass, then refactor. This will improve your platform for good, and if done correctly that exact bug will never happen again.
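As a sketch of the metric itself, with an invented record format: the failure definition is the knob described above, tightening from major incidents to new bugs as the platform stabilises.

```python
# Hypothetical monthly deployment log: each change records what, if
# anything, went wrong after it shipped.
deployments = [
    {"id": 1, "caused_major_incident": False, "bugs_raised": 0},
    {"id": 2, "caused_major_incident": True,  "bugs_raised": 2},
    {"id": 3, "caused_major_incident": False, "bugs_raised": 1},
    {"id": 4, "caused_major_incident": False, "bugs_raised": 0},
]

def change_failure_rate(deployments, failure_means_incident: bool) -> float:
    """Percentage of deployed changes that count as failures.

    While the platform is on fire, count only major incidents; once
    incidents have been at zero for a few months, tighten the definition
    to any new bug raised against a change.
    """
    def failed(d):
        if failure_means_incident:
            return d["caused_major_incident"]
        return d["caused_major_incident"] or d["bugs_raised"] > 0

    return 100 * sum(failed(d) for d in deployments) / len(deployments)

print(change_failure_rate(deployments, failure_means_incident=True))   # 25.0
print(change_failure_rate(deployments, failure_means_incident=False))  # 50.0
```

The same data yields two very different numbers; being explicit about which definition you’re using is what keeps the metric honest.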
One other aspect to mention: how big is a change?
The size of a change is whatever is being released to production right now. However, you may want to run experiments on how the size of a change impacts the change failure rate. It’s always best to make data-led decisions and not to rely on too many assumptions, but you’re likely to find that the larger the change, the higher the change failure rate. Investigate ways you can reduce the size of changes without negatively impacting the other metrics you’re tracking.
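One way to run that experiment, sketched with made-up numbers (the size buckets and fields are illustrative, not a prescription):

```python
from collections import defaultdict

# Hypothetical history: lines changed per deployment and whether it failed.
history = [
    (40, False), (15, False), (600, True), (90, False),
    (850, True), (25, False), (300, False), (700, True),
]

def bucket(lines: int) -> str:
    if lines < 100:
        return "small (<100 lines)"
    if lines < 500:
        return "medium (<500 lines)"
    return "large (500+ lines)"

stats = defaultdict(lambda: [0, 0])  # bucket -> [deployments, failures]
for lines, failed in history:
    stats[bucket(lines)][0] += 1
    stats[bucket(lines)][1] += failed

for name, (n, failures) in sorted(stats.items()):
    print(f"{name:<20} {100 * failures / n:.0f}% failure rate over {n} changes")
```

If your real data shows the same slope as this toy set, that’s your cue to invest in slicing work smaller.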
The key to success here will be automation. If every change must go through a manual stage, then that won’t scale when you go from 3 large changes a week to 300 tiny ones.
Availability - What is the percentage of availability of your platform/solution.
The final and newest addition to the DevOps key metrics to track is availability.
The question here is: what needs to be available? If you have a website and the frontend is faulty so that no user can use the site, or the networking is broken, then you have a total failure of your platform, and that is a clear and definite impact to availability.
But what if you have a microservices architecture and your email service is down for 5 minutes, yet you have an email queue and not one user was impacted? This is the complexity of this metric.
CTO Labs tends to guide clients to adopt a user-facing perspective on this metric, at least to start with, and then introduce a feature hierarchy model. If users are impacted, it counts towards the metric; whether a delay of service counts as an impact is your decision, and it depends on your industry and domain. The interpretation is up to you.
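A bare-bones sketch of the user-facing version; the outage records and the monthly reporting window are invented for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical outage log for one month. Only user-facing impact counts:
# the 5-minute email-service blip was absorbed by the queue, so it is
# recorded but excluded from the availability figure.
outages = [
    {"start": datetime(2024, 5, 2, 14, 0), "minutes": 45, "user_facing": True},
    {"start": datetime(2024, 5, 9, 3, 0),  "minutes": 5,  "user_facing": False},
    {"start": datetime(2024, 5, 20, 9, 0), "minutes": 30, "user_facing": True},
]

window = timedelta(days=31)  # the May reporting window
downtime_minutes = sum(o["minutes"] for o in outages if o["user_facing"])

availability = 100 * (1 - downtime_minutes / (window.total_seconds() / 60))
print(f"Availability: {availability:.3f}%")  # 99.832%
```

A feature hierarchy model would extend this by weighting outages by the importance of the affected feature, rather than the all-or-nothing flag used here.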
A final word
DevOps is a dynamic, fast-moving field, and while metrics are essential, so is taking into account the operational circumstances that are unique to you. If you want to track these metrics differently, then we say go for it. As long as you’re being transparent and tracking them in a way that makes sense for you, then we are in agreement.
But do let us know if you track anything differently based on technology, business or process requirements. We are interested in seeing, in depth, what the industry is really doing in this space.
Want help with your DevOps measurement and evaluation? Set up a callback below and let’s talk.