How can we measure the performance of our software engineers?

Posted by Ted Johansson on Nov 8, 2018 10:01:25 AM

That question can send a chill down the spine of many young and aspiring engineering managers, and prompt a jaded sigh from old and weathered ones.

“Here we go again.” “Well, we can’t” is the most common reaction. “At least not in any meaningful way,” another interjects. “Or in ways that aren’t counterproductive,” someone adds. “It’s complicated.”

So what is it about measurement in software development that is so complicated? Why is it that we can’t, or shouldn’t, measure the performance of our engineers? And what can we do instead? We faced these same questions when building our engineering team at EngageRocket, and in this article we will outline some of the things we have found to work, and some that don’t.

We will start by having a look at the most common pitfalls when setting out to measure software development in general, and individual performance in particular.

What’s So Different About Software Development?

Our fixation with what’s easily quantifiable has its roots in the school of Scientific Management, which emphasises decomposition (breaking complex tasks down into smaller ones) and standardisation. Historically, this served industrial applications well, where work could be reduced to mundane, repeatable tasks with little need for worker skill or autonomy.

Building software is a different beast altogether: a cerebral game involving highly conceptual work. This is where Scientific Management falls flat. Tasks often present engineers with novel problems, or with familiar ones that differ in subtle ways. Problems, more often than not, call for bespoke solutions, and most require close collaboration with other team members and business users. There is simply no such thing as a standard task.

Despite this, a few attempts have still been made. One of the most naïve approaches involves measuring the lines of code produced by each engineer. As it turns out, though, the ability to hit the keyboard really, really fast does not correlate with the ability to build software that is valuable to our customers. (Often it is the other way around.) The lowest common denominator is simply too low to be at all meaningful.
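
To make the pitfall concrete, here is a minimal sketch, not anything we actually use, of how such a metric is typically computed from a git history. The function name lines_added_per_author is invented for illustration.

import subprocess
from collections import Counter

def lines_added_per_author(repo_path="."):
    # Each commit prints its author name (%aN) followed by numstat lines
    # of the form "added<TAB>deleted<TAB>path".
    log = subprocess.run(
        ["git", "log", "--numstat", "--format=%aN"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    totals, author = Counter(), None
    for line in log.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():
            totals[author] += int(parts[0])  # "-" (binary files) is skipped
        elif line.strip():
            author = line.strip()
    return totals

Note what this number rewards: a pasted block of boilerplate scores highly, while the one-line change that fixes a critical bug barely registers.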

But surely measuring something must be better than nothing. Right?

The truth is, in software development there are always more things to do, and always more considerations to weigh. This presents us with a hard trade-off. If we spend too much time and energy on things that are ultimately unimportant, we end up bikeshedding, and progress stagnates. Conversely, if we don’t spend enough on the truly important decisions, we are on a death march, and will soon have a code base that resists any future change. One defining trait of great software engineers is the ability to assign importance to the right things.

By deciding what to measure ahead of time, we directly counteract this, and risk undermining one of the most important functions of great software developers: combining skill and experience with deep contextual knowledge to decide what is ultimately important. Too many potentially great developers end up being a detriment to their team because they get caught up in a delivery mindset and let this skill atrophy.

Everything is Integrated

One thing that sets software developers apart from other knowledge workers is that the whole team works in and around the same code base, and code bases tend to stick around for a very long time. This presents yet another challenge in measuring the impact of individual engineers. A decision by one engineer to create a useful abstraction around common functionality can save the team ten or even a hundred times the effort that cutting the corner would have saved.
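
As a hypothetical illustration (none of this is from our code base), consider one engineer extracting a shared retry helper instead of letting every feature re-implement its own ad-hoc loop. A single improvement to it, say adding backoff, is inherited by every caller at once.

import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Call fn(), retrying transient failures with exponential backoff.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller decide
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical call site: data = with_retries(lambda: fetch_report(user_id))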

This interconnectedness undermines the idea that we should measure performance at the individual level. Empathy is an often overlooked skill in software development: it keeps us focused on our team instead of ourselves, and lets us relate better to our users. We want a team where everyone is focused on the real, combined value being delivered, not one where people step on each other’s toes to gain the upper hand in some perceived competition.

Okay. So what do we do instead?

If you have come this far, you might be at least a little convinced that measuring software development is not as straightforward as we’d like it to be. Perhaps you have even started thinking about what to do instead. If so, we won’t keep you on the edge of your seat any longer. Here are some of the things we’re doing at EngageRocket to make performance measurement productive.

We like to think about the performance of knowledge workers by asking two questions. First, is the person doing their best? Second, are the results good by any appreciable standard? We loosely refer to these as attitude and aptitude.

Is Everyone Doing Their Best?

People who are happy at work will generally put in their best effort. This gives us a good proxy for whether someone is doing their best (something we could never know directly), and we can use frequent engagement surveys to gauge the happiness of our engineering team.

Interestingly, when it comes to software development, there is a correlation between self-reported happiness and code quality. It seems to be a two-way street: low code quality tends to make engineers unhappy, and unhappy engineers tend to produce low-quality code. Engineers often report not having enough time to fix bugs and other defects as a major source of workplace dissatisfaction.

Establishing that someone is doing their best is necessary for creating a safe work environment. And if there is reason to think that someone is not, we treat it as a systemic problem, not a personal one. By moving responsibility for employee engagement to the organisation, we avoid the blame game and put the ball back in our court.

How Do We Know If That Is Any Good?

It is entirely possible for a highly motivated engineer to still be a net negative to the team if they fail to assign importance to the right things or, more commonly, assign too much of it to the wrong ones. The good news is that we now know they are trying their hardest, so we can treat low performance as a learning opportunity instead of a personal shortcoming.

But first, we need some indication of how a person is doing. Because of the inherently complex nature of software development, we combine many sources of structured and unstructured feedback to get a nuanced view of how valuable someone is to the team and to the company. One thing we look at is patterns of consistent behaviour: is the person acting in ways that are beneficial to the team?

We already have access to a lot of information about which activities are valuable. (Most teams will have a good idea.) After sitting down and discussing which consistent behaviours lead to our desired outcomes, we decided to create a matrix of four rows, representing different competencies: individual contribution, collaboration, teamwork, and system design. For each row, there are seven columns, representing increasing levels of sophistication in that area. Within the resulting grid, we write out the behaviours, for example:

“System Design III: Independently makes well reasoned design decisions for small, constrained problems”

to create a competency matrix that we can use for evaluation. The matrix has the added benefit of making the expectations we have of each other explicit: it clearly outlines a progression for each engineer’s professional development and helps identify areas for improvement.

[A snapshot of an early version of our competency matrix.]
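
To give a sense of the matrix’s shape, here is a minimal sketch of how it could be represented as data, assuming the four competencies and seven levels described above. Apart from the quoted System Design III entry, every name and entry here is a placeholder.

competency_matrix = {
    "individual contribution": {},  # levels I-VII elided
    "collaboration": {},            # levels I-VII elided
    "teamwork": {},                 # levels I-VII elided
    "system design": {
        3: "Independently makes well reasoned design decisions "
           "for small, constrained problems",
        # remaining levels elided
    },
}

def behaviour(competency: str, level: int) -> str:
    # Levels run from I (1) to VII (7); unwritten cells fall through.
    assert 1 <= level <= 7
    return competency_matrix.get(competency, {}).get(level, "(not yet written)")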


To allow for the required autonomy, each item is intentionally broad. To aid discussion, we include some concrete examples of how these behaviours might inform our work. It is the shared responsibility of each engineer and their engineering manager to highlight when the engineer has exemplified these behaviours. This normally happens in one-to-ones, or as part of day-to-day work.

The challenge of this approach is the need for continuous follow-up. If we only looked at the matrix once a quarter, we would be subject to recency bias, and the decision would be, or at least appear to be, made arbitrarily by the manager.

To complement this assessment, which focuses on the more tangible hard skills, we also run 360-degree feedback reviews within the team to get a contrasting view of a person’s soft skills.

These are the feedback sources we use today, but we are constantly reviewing and revising our approach in the quest to find something that is both fair to everyone and valuable to the company, and we may add more sources down the line.

Breaking New Ground

Measuring software development is hard. Many of the intuitive approaches fail to be effective, and risk becoming harmful to both employees and the company. In the face of ever-changing challenges, and when solving human problems that are often unconstrained by reason or logic, we need highly skilled, highly autonomous people. In trying to measure the value they produce, we must respect the complex nature of their work, and not reduce it to a factory line.


At EngageRocket we’re committed to finding ways of measuring performance that are useful in the modern workforce. Ways that are fair to everyone, and valuable to the organisations who are willing to live at the bleeding edge of leadership.
