The methodical way of measuring software development

Oct 19, 2019

In the world of Software, there is a constant debate about data and numbers.

Some people (and I must admit that I’m one of them) want to see numbers all the time, to measure everything and to make data-driven decisions.

On the other hand, especially in Software, it’s not easy to measure. History has shown that measuring the wrong things can work against the goals you are trying to achieve. Once you start measuring, people start to work toward those numbers.

If, for example, you measure developer productivity by the number of lines of code a developer writes, you might find yourself with many junk lines of code that people added to beef up the numbers.

Another example of a measurement that goes wrong is velocity.

If you use velocity as a measure of productivity, developers start to overestimate their development efforts so that the graph keeps climbing.

Moreover, while the goal of velocity is to make backlog completion predictable, gaming the numbers causes the exact opposite.

So what should you measure?

Two key characteristics we should include in any measurement:

Focus on global outcomes
Make sure you measure outcome and not output

What does it mean?

When we discuss global outcomes, we want to make sure that measurements won’t create a hidden incentive for one team to prioritize itself over another. If the measured outcome is not global, you might end up with several well-optimized teams while others lag behind: a local optimization that is a global drawback.

Let us take an example.

We might have several scrum teams. If we measure each team on on-time delivery, we might end up in a scenario where each team does its best to deliver its own features, ignoring the other teams and sometimes delaying their delivery work.

In that case, the numbers for the specific team might be right, but our overall delivery time to the customer has worsened since all the teams must deliver on time for us to bring value.

Measuring outcomes rather than output adds another critical layer. Let us continue from the last example.

Now let’s say that we measure all of our teams’ ability to deliver features on time, and they all do that, on time!

Did we measure the right thing? Does getting those numbers high correlate with more value for the company, value that we can translate into money?

What if our customers aren’t using all the features we delivered? In that case, did we deliver value?

Another good example is measuring developers on delivery (throughput) and operations on stability. This all but guarantees that the development group generates a lot of low-quality features that are thrown over the wall to operations. Operations, on the other hand, has an incentive to build processes and gatekeepers that slow deployment, since fewer changes mean a lower risk of instability.

So what should we be measuring?

There are four measurements that I came across while reading about this subject, measurements that show a correlation with successful business results:

Lead Time
Deployment Frequency
Mean Time to Restore (MTTR)
Change Fail Percentage

Lead time is the time it takes to go from the customer request until we’re able to deliver on that request.

One might say that this is an extensive measurement, one that includes other departments in the organization: account managers, product managers, software developers, qualification engineers, and operations staff.

And that is right!

Measure things that bring value to the company.

To bring value, all of those stakeholders should work together in sync.

If we want to use this measure internally, as we did, we should consider measuring the areas that we control, so that we as a software organization can optimize the time it takes us to develop, test, and install a feature. (In our case we called it Feature Cycle Time, and we measure it from Epic Start until Epic Done.)
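To make this concrete, here is a minimal sketch of how a Feature Cycle Time could be computed from Epic Start to Epic Done. The epic names, the dates, and the tuple layout are all made up for illustration; in practice the data would come from whatever issue tracker you use.

```python
from datetime import datetime
from statistics import median

# Hypothetical epic records: (name, epic_start, epic_done).
# The names and dates are invented for this example.
EPICS = [
    ("checkout-redesign", datetime(2019, 9, 2), datetime(2019, 9, 20)),
    ("sso-login", datetime(2019, 9, 9), datetime(2019, 10, 7)),
    ("audit-log", datetime(2019, 9, 16), datetime(2019, 9, 26)),
]


def feature_cycle_times_days(epics):
    """Return the number of whole days from Epic Start to Epic Done, per epic."""
    return [(done - start).days for _, start, done in epics]


def median_cycle_time_days(epics):
    """Median is used here so one unusually long epic does not dominate."""
    return median(feature_cycle_times_days(epics))


if __name__ == "__main__":
    print(feature_cycle_times_days(EPICS))
    print(median_cycle_time_days(EPICS))
```

I chose the median rather than the mean in this sketch, since a single very long epic can skew an average; either could be reasonable depending on what you want to track.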

We should not forget, however, that the global measurement is our goal and we should work with the organization to put those measures in place.

Deployment Frequency is how often you deploy code to production.

Some might say that this measurement is easy to manipulate, since you can cut your delivery into small chunks and raise the number. But small chunks of work are exactly what we want to achieve!

Develop small increments and get feedback fast. Not to mention that when a bug needs fixing, the engineer who wrote the code is likely to still remember it, so MTTR (next) is short as well…
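As a rough illustration, deployment frequency can be counted per calendar week from a list of production deploy dates. The dates below are invented, and grouping by ISO week is just one possible choice of bucket.

```python
from collections import Counter
from datetime import date

# Hypothetical production deploy dates (invented for this example).
DEPLOYS = [
    date(2019, 10, 7),
    date(2019, 10, 8),
    date(2019, 10, 8),
    date(2019, 10, 10),
    date(2019, 10, 15),
]


def deployments_per_week(deploy_dates):
    """Count deployments per (ISO year, ISO week) bucket."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


if __name__ == "__main__":
    for week, count in sorted(deployments_per_week(DEPLOYS).items()):
        print(week, count)
```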

Mean Time to Restore (MTTR): once there is a problem in production, how long does it take to get the systems back up and running?

It’s important to note that I don’t mix this with SLA (Service Level Agreement), since an SLA is a business term. Here we want to make sure that once we have downtime, we can recover fast. Combining this with the second measurement (Deployment Frequency) makes sure we won’t fall into delivery stagnation; as the old saying goes, “If it’s not broken, don’t touch it!”
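A minimal sketch of the MTTR calculation could look like the following. It assumes each incident is recorded as a pair of timestamps, when the problem was detected and when service was restored; the incident data is invented.

```python
from datetime import datetime

# Hypothetical production incidents: (detected_at, restored_at).
INCIDENTS = [
    (datetime(2019, 10, 1, 9, 0), datetime(2019, 10, 1, 9, 30)),
    (datetime(2019, 10, 5, 14, 0), datetime(2019, 10, 5, 15, 30)),
    (datetime(2019, 10, 12, 22, 15), datetime(2019, 10, 12, 22, 45)),
]


def mean_time_to_restore_minutes(incidents):
    """Average time from detection to restoration, in minutes.

    Returns None when there are no incidents, since a mean is undefined.
    """
    if not incidents:
        return None
    total_seconds = sum(
        (restored - detected).total_seconds() for detected, restored in incidents
    )
    return total_seconds / len(incidents) / 60


if __name__ == "__main__":
    print(mean_time_to_restore_minutes(INCIDENTS))
```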

Change Fail Percentage is an important measurement as well. In our organization we measure rollbacks only as a number, but it is much better to measure them as a percentage. If you measure a raw number, it is easy to optimize it by building big, bulky versions with a lot of moving parts. Measuring a percentage makes sure that we value small batches and frequent delivery while maintaining our commitment to stability.
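The difference between counting rollbacks and measuring a percentage can be sketched with two made-up release styles. The numbers are purely illustrative, and treating a rollback as the only kind of failed change is an assumption of this example.

```python
def change_fail_percentage(deployments, failed_changes):
    """Share of deployments that failed (e.g. were rolled back), in percent."""
    if deployments == 0:
        return 0.0
    return 100.0 * failed_changes / deployments


# Hypothetical quarter for two release styles (invented numbers).
big_batch = {"deployments": 4, "rollbacks": 2}
small_batch = {"deployments": 50, "rollbacks": 5}

if __name__ == "__main__":
    for name, stats in (("big batch", big_batch), ("small batch", small_batch)):
        pct = change_fail_percentage(stats["deployments"], stats["rollbacks"])
        print(name, stats["rollbacks"], "rollbacks,", pct, "% of changes failed")
```

Judged by the raw count, the big-batch style looks better with 2 rollbacks against 5. Judged by percentage, half of its changes failed, while the small-batch style failed on one change in ten.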

If you want to read more about the science behind lean software development and the DevOps way of thinking, I would recommend reading Accelerate, by Nicole Forsgren, Ph.D., Jez Humble, and Gene Kim.

The book is based on several years of research: data collected for the “State of DevOps Report” at Puppet, alongside great analysis of that data.

The book connects the dots between good technical and management behavior and business success.

As always, I listened to it on Audible, but you should find the way of consuming that information that works best for you.

Written by Nir Sagiv
