Better Than Silver Bullets: A Milestone for Behavioral Code Analysis
My work on code analytics started 10 years ago, and the CodeScene analysis tool has been my main focus for the past 4 years. CodeScene is the first real product built around the concept of behavioral code analysis, which is a radical departure from traditional static analysis techniques.
Over the past years, I have spoken at a ton of conferences, written two books, and published several articles and the occasional research paper on behavioral code analysis. I’ve done my best to popularize the field.
At times, this has been a lonely journey, so it’s great that more people and more companies are joining this community.
The most recent addition is GitLab, which is now entering the behavioral code analysis space. This marks a milestone for the field, and a validation for me personally that what I have been claiming for years makes sense to outsiders as well: behavioral code analysis is as close as we get to a silver bullet for making sense of large-scale codebases. The level of insight and the speed with which we get it continue to fascinate me. Behavioral code analysis also has a clear advantage over silver bullets: it’s real.
GitLab’s entry also provides validation for the CodeScene tool. We never took any venture capital, but decided to build a great product first to prove the value and business model (read the full startup story here). Since then, CodeScene has grown into a tool suite that’s used by organizations around the world; thousands of people are using CodeScene in their daily work on large-scale codebases.
However, to serve a growing community, we need to converge on a common vocabulary that clarifies the concepts. Let me explain.
Growing a Community
We often joke that naming is one of the hardest problems in software. Those jokes are funny because they are true. Naming is hard. The naming problem applies to a product in an evolving field too. I know, since I have made my fair share of mistakes.
I’m responsible for most of the names and concepts that you find in CodeScene. Some names are new to CodeScene, others are lifted from my books or academic research papers. What follows are some examples of how ill-chosen names cause unnecessary confusion:
- Temporal Coupling: The purpose of this analysis is to detect co-evolving modules that are modified together as part of the same logical change. The coupling analysis is my personal favorite, and I use it for a myriad of purposes, for example to reason about change impact. But the name isn’t well-chosen, and we now prefer to talk about Change Coupling. I explain why in Software Design X-Rays.
- Abandoned Code: This analysis uncovers any knowledge gaps that we might have in our codebase due to code written by former contributors. You know, the kind of code no one else has worked on and, hence, is more expensive to modify since that requires learning unfamiliar code. In my books, I call this knowledge loss, but we now prefer to measure the inverse: How high is the System Mastery of a particular module?
- Inter-Team Coordination: I’m fortunate to have a team of true world-class experts on CodeScene’s advisory board. I’m also fortunate in that all of them – as opposed to me – happen to be native English speakers. That means they can call out some of our naming issues, and “Inter-Team Coordination” is one of those. We now prefer Team Coupling since the term more accurately describes the situation where multiple teams need to work in the same parts of the code, which often indicates organizational or architectural issues.
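To make the first of these concepts a bit more concrete, here is a minimal Python sketch of what a Change Coupling calculation could look like. It is an illustration under stated assumptions, not CodeScene's actual algorithm: the function name `change_coupling`, the `min_shared` threshold, the sample history, and the formula (shared commits divided by the average number of commits touching each file) are all chosen for this example.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Return {(file_a, file_b): degree} for files that change together.

    Assumed, illustrative formula:
    degree = shared commits / average number of commits touching each file.
    """
    revisions = Counter()  # number of commits touching each file
    shared = Counter()     # number of commits touching both files of a pair
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)
        shared.update(combinations(unique, 2))

    result = {}
    for (a, b), count in shared.items():
        if count < min_shared:
            continue  # ignore pairs that only rarely change together
        average = (revisions[a] + revisions[b]) / 2
        result[(a, b)] = count / average
    return result


# Hypothetical commit history: each entry lists the files changed in one commit.
history = [
    ["billing.py", "invoice.py"],
    ["billing.py", "invoice.py", "README.md"],
    ["billing.py", "invoice.py"],
    ["README.md"],
    ["billing.py"],
]

coupling = change_coupling(history)
for (a, b), degree in coupling.items():
    print(f"{a} <-> {b}: {degree:.0%}")
```

In this made-up history, `billing.py` and `invoice.py` share three commits, so they surface as a coupled pair, while `README.md` falls below the threshold. In practice the commit data would come from version control rather than a hard-coded list.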
My initial thinking is continuously evolving as I learn more by working with others in the community. I will continue to share those learnings. After all, behavioral code analysis is a young discipline, a new generation of code analysis, and leading the way means educating new users and encouraging them to explore the space. When it comes to exploration and learning, a clear and consistent vocabulary is paramount. Join in and welcome to the CodeScene!