Tony was a Scrum Master. He started measuring the development trends of his team. However, whenever he selected a metric, he got the impression that developers were manipulating the results to meet the targets he set. For example, when he measured the number of closed bugs, he noticed that it grew from 5 to 15 from one iteration to the next. The funny thing was that the number of opened bugs grew as well, from 10 to 24. When he tried to measure velocity, it doubled over 3 iterations.
Is this story familiar to you? Let’s see why we should measure. Then we can look at metrics that encourage desired behaviors rather than undesired ones.
So why should we measure R&D metrics? We measure to amplify learning, to help us improve continuously. That’s what makes us professionals.
Some managers, team leaders, and sometimes Scrum Masters miss the point and use metrics as a review tool. They use metrics to evaluate developers, to grade them for the bi-yearly review, to decide on the bonus they will get, etc. They focus on output metrics such as lines of code, number of commits, number of pull requests, number of fixed bugs, number of features delivered, etc. Some leaders mistake velocity for a performance indicator rather than using it as a planning tool. Such metrics may promote negative behaviors and a negative culture, such as metric manipulation and unhealthy competition among team members.
Selecting the right metrics to measure is the key to improvement. Select metrics that amplify learning and drive quality. Before selecting a metric, ask yourselves what you would like to learn and whether the metric you are considering serves that purpose.
At the “R&D execution – Using Significant Metrics” meetup, Ori Keren shared with us research by DevOps Research and Assessment (DORA), led by Nicole Forsgren (PhD). The research concludes that there are 4 key metrics that complement one another, and that elite R&D groups score highly on all 4 of them. They are:
- Lead time for changes – the time from deciding to implement a feature until it is in production (DORA itself measures this from code commit to production).
- Deployment frequency – how often we deploy working software to production.
- Change failure rate – how often a change causes a failure in production that we have to recover from.
- Mean time to restore – how long it takes us to restore service in production after a failure, for example by rolling back to a previous version.
The research suggests that the organizations that score highest on all 4 metrics (“Elite Performers”) perform significantly better than the lowest-scoring ones (“Low Performers”).
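To make the four metrics concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deployment` record, its field names, and the sample data are assumptions made for illustration, not something DORA or the meetup prescribes. The sketch also assumes that a failure starts at the moment of deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; the field names are assumptions for this sketch.
    started_at: datetime                    # when work on the change began
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # did the change cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments, period_days):
    """Compute the four metrics over a period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.started_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the failure begins at deployment time.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time_hours": mean(lt.total_seconds() for lt in lead_times) / 3600,
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": (
            mean(rt.total_seconds() for rt in restore_times) / 3600
            if restore_times
            else None
        ),
    }


# Illustrative data only: four deployments in a 14-day iteration.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=24)),
    Deployment(start, start + timedelta(hours=48), failed=True,
               restored_at=start + timedelta(hours=50)),
    Deployment(start, start + timedelta(hours=36)),
    Deployment(start, start + timedelta(hours=12)),
]
metrics = four_key_metrics(sample, period_days=14)
print(metrics)
```

Looking at the four numbers side by side is the point: a team that deploys more often but sees its change failure rate climb has not really improved.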
Ori also suggests focusing on 3 categories of metrics: delivery, quality, and investment.
Delivery metrics may include those measured by DORA, such as “lead time”, and others such as “cycle time”, “time to merge”, and “time to release” (or deploy). These metrics can help us learn about our process, our test automation, and our pipeline. For example, if it takes us 2 days to release to production in a 2-week iteration, we can look at where to shorten the E2E and regression testing cycle.
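As a rough sketch, these delivery metrics can be derived from four timestamps per change. The stage boundaries used here are assumptions for illustration, since teams and tools define them differently: time to merge runs from pull request opened to merged, time to release from merged to deployed, and cycle time from first commit to deployed.

```python
from datetime import datetime, timedelta


def delivery_durations(first_commit, pr_opened, merged, deployed):
    """Split one change's journey into stages (the boundaries are assumptions)."""
    return {
        "time_to_merge": merged - pr_opened,
        "time_to_release": deployed - merged,
        "cycle_time": deployed - first_commit,
    }


def share_of_iteration(duration, iteration=timedelta(weeks=2)):
    """Fraction of the iteration (calendar time) that a stage consumes."""
    return duration / iteration


# Illustrative timestamps only, chosen to mirror the 2-days-to-release example.
durations = delivery_durations(
    first_commit=datetime(2024, 1, 8, 9, 0),
    pr_opened=datetime(2024, 1, 9, 9, 0),
    merged=datetime(2024, 1, 10, 9, 0),
    deployed=datetime(2024, 1, 12, 9, 0),
)
release_share = share_of_iteration(durations["time_to_release"])
print(durations["time_to_release"], f"{release_share:.0%} of the iteration")
```

With these numbers, releasing takes 2 of the 14 calendar days in the iteration, about 14%, which is the kind of figure that can start a conversation about the testing cycle.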
Quality metrics may include code churn, bug count in production, time to restore (after a production incident), and refactoring rate (% of released code being refactored).
The third category that Ori suggests is “Investment”. “Functional backlog to bug ratio” is the number of bugs worked on in every iteration compared with the number of functional requirements worked on. It teaches us about where we invest our time, and may also teach us about the quality of our coding standards and engineering practices. “Functional to sub-tasks ratio” represents the number of sub-tasks needed to complete one user story. It may teach us about the size of our backlog items. For example, if it takes 20 sub-tasks to fulfill one user story, maybe our user stories are too big or our sub-tasks are too small.
To summarize, use metrics to amplify your continuous improvement. Select metrics whose results can lead you to take action to improve your engineering practices, your process, and your professionalism.
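Both ratios are simple counts over the items worked on in an iteration. The sketch below assumes each backlog item is a dictionary with a `type` of `"story"`, `"bug"`, or `"sub-task"`. That shape is hypothetical, and a real tracker export would need to be mapped onto it.

```python
from collections import Counter


def investment_ratios(items):
    """Return bugs and sub-tasks per user story for one iteration's items."""
    counts = Counter(item["type"] for item in items)
    stories = counts["story"]
    if stories == 0:
        # No functional work in the iteration, so the ratios are undefined.
        return {"bugs_per_story": None, "subtasks_per_story": None}
    return {
        "bugs_per_story": counts["bug"] / stories,
        "subtasks_per_story": counts["sub-task"] / stories,
    }


# Illustrative iteration: 2 user stories, 1 bug, 40 sub-tasks.
iteration_items = (
    [{"type": "story"}] * 2
    + [{"type": "bug"}] * 1
    + [{"type": "sub-task"}] * 40
)
ratios = investment_ratios(iteration_items)
print(ratios)
```

Here the team works on half a bug for every user story and needs 20 sub-tasks per story, which matches the case where it is worth asking whether the stories are too big.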