Predicting Code Quality Issues Before They Happen: A Minority Report for Code
CodeScene’s sweet spot is to identify and prioritize technical debt based on the likely business impact. Over the past year we have worked to shorten the feedback loop so that a decline in code health is caught as early as possible, for example via a CI/CD pipeline. But could we have even shorter feedback loops? Could we predict future code quality issues while the code is still a spark in the eye of a developer – that is, before the code is even written? Follow along as we look to predict the future.
Case Study: Predicting future Code Health decline in Docker
Empear recently published a short video interview where I talk about the forensic psychology roots of behavioral code analysis. Towards the end, I’m asked where we will go from here. I reply that we want to be a (benevolent) minority report for code; we want to predict the future.
The reason I said that was that I was aware of the prototypes in our lab. And now it’s time to reveal the first of CodeScene’s new predictive capabilities. Let’s look at how we can predict future hotspots in Docker.
A Docker Hotspot that declines in Code Health
I regularly analyze well-known open source codebases, and as I did a behavioral code analysis of Docker, I noticed that its daemon.go module has been a development hotspot for years. This is unsurprising given that the daemon is a central part of Docker. However, its complexity trend made me think:
The previous figure shows that the hotspot was refactored in 2016, but over the past few years some code complexity has crept back in. Specifically, there’s a steep increase starting in 2017. Would it have been possible to predict this complexity increase? Yes – let’s see how.
Predicting Future Complexity Increase
First, a word about the Code Health metric. Code Health ranges from 10 (healthy code that’s relatively easy to understand and evolve) down to 1, which indicates code with severe quality issues. CodeScene calculates code health from a combination of properties of the code itself and organizational factors. The factors are chosen based on research and are known to correlate with increased maintenance costs and a higher risk of defects. We then weight, normalize, and score the findings against our baseline data to arrive at the code health metric. With that covered, let’s return to Docker.
The nice thing with version-control data is that it’s easy to travel in time. So armed with CodeScene’s new predictive analyses, I rolled back the Docker Git repository to 2017 before the rise in code complexity and ran some analyses. Here’s what the analysis results looked like in 2017:
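This kind of time travel is plain git rather than anything CodeScene-specific. A minimal sketch (the repo path, date, and branch in the usage comment are placeholders for a local Docker clone) looks like this:

```python
# Roll a repository back to its state at a given date using plain git.
# The repo path and branch in the example call are placeholders.
import subprocess

def checkout_as_of(repo: str, date: str, rev: str = "master") -> str:
    """Detach HEAD at the last commit on `rev` made before `date`."""
    sha = subprocess.run(
        ["git", "-C", repo, "rev-list", "-n", "1", f"--before={date}", rev],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    subprocess.run(["git", "-C", repo, "checkout", "--detach", sha],
                   check=True, capture_output=True)
    return sha

# e.g. checkout_as_of("docker", "2017-01-01") before running an analysis
```

Afterwards, checking out the original branch again restores the present-day state of the repository.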
Wow! So the future complexity growth in daemon.go was flagged already back in 2017, which means it could have been prevented. I’ll soon cover how that’s possible, but first let’s talk about why this is interesting and useful, starting with how we eat our own dog food at Empear.
For the Love of Dog Food
At Empear, we developers use CodeScene ourselves. That means we often spot opportunities for new features. You know, like: hmm, if I had this data point too, then I could answer that specific question quicker. Those ideas are then fed back and implemented in CodeScene.
The code health decline prediction came to life through that process. Larger code quality issues and significant technical debt are hard and expensive to act upon. As a consequence, we frequently noticed that once a hotspot has declined in code health, it tends to stay that way; the cost of restoring its health will always compete with more pressing immediate concerns, quite often the drive for new features.
This means that anything we can do to provide an early detection mechanism would be valuable. That way, an organization can do pro-active refactorings while they are still affordable and avoid preventable future maintenance headaches.
Predicting declining Code Health: How it Works
So how does CodeScene pull off its predictions? Black magic? Unfortunately, reality is slightly more mundane. But not by much. In essence, we have accumulated a lot of historical data from real-world codebases, which makes it possible to apply algorithms and machine learning to pick up patterns. We guide the pattern selection process with our domain expertise; we have analyzed hundreds of codebases over the past years and built a solid understanding of how code evolves.
For example, a module with low cohesion and too many responsibilities might still stabilize. But combine those design smells with heavy developer congestion, and the potential problems can quickly grow into a real maintenance nightmare. Similarly, a complex method might be something we can live with; but if that code is a knowledge island, and the only developer who understands it leaves at the same time as new features get implemented in that area, then things can go south quickly.
A common theme across our predictors is that it’s rarely some properties of just the current code that causes a decline in quality. Very often it’s a combination of properties of the code with organizational factors like overlapping team responsibilities or long-term trends that start to accelerate. We couldn’t have done these predictions without CodeScene’s knowledge about the social aspects of code and its history.
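To make that theme concrete, here is a deliberately simplified decision rule in the spirit of the patterns described above. The signal names and the two rules are invented for illustration; they are not CodeScene’s actual model:

```python
# Hypothetical sketch: illustrates the *kind* of combined rule described
# in the text, not CodeScene's actual predictor.
from dataclasses import dataclass

@dataclass
class ModuleSignals:
    low_cohesion: bool           # code property: too many responsibilities
    developer_congestion: bool   # organizational: many devs working in parallel
    knowledge_island: bool       # organizational: a single expert owns the code
    author_departed: bool        # organizational: that expert has left
    activity_trend_rising: bool  # history: feature work is accelerating here

def predicts_decline(m: ModuleSignals) -> bool:
    # A design smell on its own may stabilize and remain livable...
    smell_plus_congestion = m.low_cohesion and m.developer_congestion
    # ...and a knowledge island only turns risky when its expert leaves
    # just as new features land in the same area.
    abandoned_island = (m.knowledge_island and m.author_departed
                        and m.activity_trend_rising)
    return smell_plus_congestion or abandoned_island

# A low-cohesion module alone is not flagged...
print(predicts_decline(ModuleSignals(True, False, False, False, False)))  # False
# ...but combined with developer congestion it is.
print(predicts_decline(ModuleSignals(True, True, False, False, False)))   # True
```

The real analysis learns such patterns from historical data rather than hard-coding them, but the structure – code properties gated by organizational and trend signals – is the point.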
Explore More and try CodeScene
I hope you are as excited about this new CodeScene feature as I am. We have used the code health decline predictions internally as part of our services, and seen several examples of CodeScene finding real, growing problems very early. The effect has been spectacular. We don’t catch all future issues – that would require a true precog – but the ones we catch are relevant.
These predictive capabilities arm development organizations with the superpower of being pro-active by acting on quality issues at a stage where it’s still affordable and relatively easy.
You can also check out the product reviews to see who else is using CodeScene and the value they get out of it.