How Healthy is Your Codebase? Introducing Biomarkers for Code
We at Empear make heavy use of CodeScene ourselves. We use the tool as part of our services. Over the past years we have analyzed hundreds of different codebases, and there are some patterns that we have seen repeated over and over again. Thus, we have started to implement support in CodeScene for auto-detecting those patterns, and we have called the feature code biomarkers. We chose that name because we wanted to avoid terms like “quality” or “maintenance effort” since they suggest an absolute truth; instead, we wanted a concept that doesn’t judge, but acts like a friendly, unbiased, and skilled team member.
Detect Your Code’s Biomarkers
In medicine, a biomarker is a measure that might indicate a particular disease or physiological state of an organism. CodeScene’s biomarkers do the same for code. Combined with our biomarker trend measures, you get a high-level summary of the state of your hotspots and the direction your code is moving in. Code biomarkers act like a virtual code reviewer that looks for patterns that might indicate problems.
Code biomarkers are scored from A to E, where A is the best and E indicates code with severe potential problems. CodeScene also aggregates those scores into a total score for the whole project, which lets you keep track of the overall status. As an example, the next figure shows a codebase that has improved over the past month, indicated by the move up from a D score.
I spend a lot of time reviewing code, and over the years I’ve learned to look for certain high-level patterns that are likely to indicate problematic designs. Our goal with the biomarkers concept is to automate that pattern detection. Hence, you can click on a hotspot and inspect the biomarkers in detail:
The detailed information is intended to help developers select appropriate refactoring steps. For example, in the previous figure, the hotspot contains a large Brain Method, GenerateInput. A brain method is a large function with high complexity that seems to do too many things. Splitting it into smaller, well-named methods with clear responsibilities is likely to improve the design by making the overall algorithm easier to read and understand.
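To make that refactoring advice concrete, here is a minimal sketch in Python. The body of GenerateInput isn't shown in this article, so everything below is an invented stand-in: `generate_input_before` is a small, hypothetical function that mixes validation, normalization, and formatting, and the helper names (`is_valid`, `normalize`, `format_line`) are assumptions chosen for illustration only.

```python
def generate_input_before(records):
    """Hypothetical 'brain method': validates, normalizes, and formats in one body."""
    lines = []
    for record in records:
        # Validation is tangled up with the rest of the logic.
        if not record.get("name") or record.get("value") is None:
            continue
        name = record["name"].strip().lower()
        value = round(float(record["value"]), 2)
        lines.append(f"{name}={value}")
    return "\n".join(lines)


def is_valid(record):
    """A record needs a non-empty name and a value."""
    return bool(record.get("name")) and record.get("value") is not None


def normalize(record):
    """Return a cleaned-up (name, value) pair."""
    return record["name"].strip().lower(), round(float(record["value"]), 2)


def format_line(name, value):
    """Render one output line."""
    return f"{name}={value}"


def generate_input(records):
    """Same behavior as before, but the algorithm now reads as three named steps."""
    return "\n".join(
        format_line(*normalize(record)) for record in records if is_valid(record)
    )
```

The behavior is unchanged; what improves is that each responsibility has a name and can be read, tested, and changed on its own. A real brain method is of course far larger than this toy, which is exactly why the split pays off.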
Biomarkers Introduce Short Feedback Loops
In large-scale systems, social factors tend to be at least as important as any technical issues you might have. In fact, as I wrote in my book, we often mistake organizational problems for technical issues. Hence, we have developed biomarkers that detect organizational issues that are known to correlate with unwanted properties like defects and low organizational system mastery. The next figure shows an example:
Taken together, code biomarkers fill a number of important gaps by providing feedback loops in an organization:
- Bridge the gap between developers and non-technical stakeholders: The biomarkers help you decide when it’s time to take a step back and invest in technical improvements, versus when it’s OK to continue to add features at a high pace.
- Get immediate feedback on improvements: Biomarker trends give you immediate and visual feedback on the investments you make in refactorings. Not only is it motivating – it also helps ensure that you’re on track.
- Share an objective picture of your codebase: A successful project is one where everyone has a shared understanding of what the code looks like and how it evolves. CodeScene provides an additional monitor view where the biomarkers are continuously updated with the status of your ongoing work. Present the view on a TV in the office, as shown in the next figure, to create awareness of your technical debt.
Integrate Code Biomarkers in your Continuous Integration Pipeline
CodeScene offers integration points that let you incorporate the analysis results into your build pipeline. We have expanded that integration to also auto-detect files that seem to degrade in quality through issues introduced in the current commit or pull request. This is done by calculating code biomarkers and then monitoring their trends. The next figure shows an example using CodeScene’s Jenkins plugin.
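The idea behind such a check can be sketched in a few lines of Python. This is not CodeScene's API or the Jenkins plugin's implementation. It simply assumes that you already have a biomarker grade (A to E) per file for the baseline and for the current commit, stored in plain dictionaries. The function names `degraded_files` and `quality_gate` are hypothetical.

```python
GRADE_ORDER = "ABCDE"  # A is the best grade, E the worst


def degraded_files(baseline, current):
    """Return the files whose grade is worse in `current` than in `baseline`."""
    worse = []
    for path, new_grade in current.items():
        old_grade = baseline.get(path)
        if old_grade is None:
            continue  # New file: there is no earlier grade to compare against.
        if GRADE_ORDER.index(new_grade) > GRADE_ORDER.index(old_grade):
            worse.append(path)
    return sorted(worse)


def quality_gate(baseline, current):
    """Return a process exit code: 0 if nothing degraded, 1 otherwise."""
    failures = degraded_files(baseline, current)
    for path in failures:
        print(f"Biomarker degraded: {path} {baseline[path]} -> {current[path]}")
    return 1 if failures else 0
```

A build step could call `quality_gate` and pass its return value to `sys.exit`, so that a pull request that makes a file worse is flagged before it is merged. Whether a degradation should fail the build or only warn is a team decision; the sketch treats any drop in grade as a failure.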
Metrics Must Be Actionable
Biomarker scores use baseline data from thousands of codebases, and your code is scored against an industry average of similar codebases. The biomarkers concept is built on top of CodeScene’s other metrics and behavioral data. That means we only score the prioritized parts of the codebase, the parts that are most likely to impact development and maintenance costs.
Hence, the biomarkers provide insights into the parts of your code that are most likely to benefit from improvements.