Understanding source code is a crucial aspect of software development, but it can often be challenging and intimidating. Whether it's examining someone else's code or revisiting our own past work, deciphering code can leave us feeling perplexed and inadequate. However, there are various approaches and techniques that can help developers gain a deeper understanding of source code and overcome these obstacles.
In this blog post, we will explore the pain points of understanding source code and delve into less common yet effective ways to comprehend code. We will discuss the importance of team collaboration, alternative representations of source code, leveraging version control history, the significance of tests, and the benefits of scratch refactoring. By embracing these approaches, developers can enhance their ability to understand code and navigate complex software projects more efficiently. So, let's dive in and discover how we can unravel the mysteries of source code!
1. Pain points
1.1 It can be hard and scary
Understanding source code written by someone else, or even one's own past work, can be hard and frustrating, often leaving us feeling challenged, puzzled, and occasionally inadequate. Delving into unfamiliar source code, particularly under time pressure, can be daunting. Few developers get past this feeling, and those who do usually manage it through repeated exposure, deliberate practice, and accumulated experience. It is crucial to recognize that understanding source code requires a set of skills distinct from those needed to write it.
1.2 Writing code is easier
Developers primarily acquire programming skills through writing code. However, there is an imbalance in the emphasis and attention given to writing software versus understanding it. There is an implicit expectation placed upon developers, both by themselves and others, that they should be able to understand the code they work with. Unfortunately, we often fail to distinguish between the skills involved in writing code and those necessary for understanding it. Consequently, we lack comprehensive information and resources on the subject.
1.3 Where to begin?
When developers work with a code base, they may find it confusing to decide where to start when trying to understand it. However, the task at hand usually helps narrow down the focus area. For example, fixing a bug, removing a feature, modifying a feature, or starting to work with a new team all require different starting points. Figuring out the right starting point depends on experience and familiarity with the architecture, tools, and platform/framework being used.
1.4 Dialects and scale
Developers, much like speakers of natural languages, have their own dialects when coding. Their past experiences and the existing patterns in the codebase influence their coding style. Many general-purpose programming languages now embrace multiple paradigms; a single language can, for instance, support both functional and object-oriented styles, and developers lean towards one or the other based on their preferences. When many people collaborate on source code, the interplay of paradigms, patterns, libraries, and personal choices adds diversity to the codebase. This diversity can be significant in large projects with numerous contributors, and it adds cognitive overhead for anyone trying to understand the code.
1.5 Fewer resources
While resources for understanding source code are not as abundant as those for writing new software, some notable individuals have emphasized the importance of reading code. Figures such as Felienne Hermans, Marit van Dijk, and Trisha Gee have created content specifically dedicated to this topic. Additionally, Tudor Girba advocates making software systems more explainable using Glamorous Toolkit. Their contributions have helped shed light on the significance of code comprehension in the software development process.
1.6 Tacit knowledge
Understanding code requires a variety of skills that are not easy to explain or acquire. Developers fall on a spectrum, with some struggling to comprehend code while others excel at it. Interestingly, the ability to read and understand code may have little bearing on a person's skill in writing new code. These skills are largely tacit, which makes them difficult to transfer through writing or talking. Experience and deliberate practice in understanding code are what make a difference for developers.
1.7 Lack of systematic approaches
Understanding source code is a highly individual and implicit process that varies from person to person. Despite efforts to educate individuals on approaching code bases, the vast amount of information involved makes comprehensive communication challenging. Some of this information can be abstract or dependent on an individual's own abilities. As far as I am aware, there are only a few resources available that can assist others in consistently reproducing results across different code bases.
1.8 Source code is non-linear
Unlike literature, source code lacks a linear structure. It cannot be read from top to bottom for a complete understanding. Instead, source code represents a graph, where understanding the relationships between interconnected elements is more meaningful than reading it like a piece of prose. IDEs excel at assisting developers in navigating this non-linear structure. However, they often struggle when it comes to filtering information and providing a broader perspective on the codebase as a whole.
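To make the graph idea concrete, here is a minimal sketch that treats a codebase as a call graph and asks which functions you would have to visit to understand one entry point. The function names and the CALLS mapping are hypothetical and written by hand for illustration; in a real project this data would come from a tool or the IDE's call hierarchy.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "handle_request": ["authenticate", "load_order"],
    "authenticate": ["read_session"],
    "load_order": ["query_db", "apply_discounts"],
    "apply_discounts": ["query_db"],
    "read_session": [],
    "query_db": [],
    "send_newsletter": ["query_db"],
}


def reachable_from(start, calls):
    """Return everything `start` calls, directly or indirectly, breadth-first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


reading_list = reachable_from("handle_request", CALLS)
print(reading_list)
```

The resulting list follows the relationships between functions, not the order in which they appear in any file, and it leaves out code such as send_newsletter that is irrelevant to the entry point.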
1.9 Limitations of tools
We often confine ourselves to the tools we are comfortable with, such as IDEs, tests, and GUI-based version control clients. These tools are typically optimized to excel at specific tasks and do an excellent job in those areas. However, they may fall short when it comes to understanding source code. Fortunately, there are tools designed explicitly for comprehending source code. Tools like jQAssistant, Eureka, and Glamorous Toolkit can be immensely helpful in the JVM ecosystem. While these tools may have a learning curve, investing time in mastering them can yield outstanding results.
1.10 Documentation
"No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find."
—Jeff Atwood
In custom software development, source code documentation often takes a backseat. It is seen as more effort for less benefit, which results in nonexistent or outdated documentation. Documentation also burdens developers with keeping it in sync with the code's behavior, so a pragmatic alternative is to rely on tests to document behavior. It's important to note that not all documentation is equal; architectural decision records and system design documents are valuable for understanding software systems, whereas documenting the source code itself tends to provide a lower return on investment. Additionally, writing documentation that humans can understand is harder than writing code that computers can execute.
1.11 Mindset
To become skilled at understanding code, it's important to keep an open mind, be non-judgmental, and stay persistently curious. Developers often have a habit of wanting immediate answers. However, when working with someone else's code, it's crucial to be patient and ask questions instead of making hasty assumptions. It's okay to say, "I don't know" until you have enough information to form ideas or conclusions. Also, remember that our idea of good code is often based on what we're familiar with. When assessing someone else's code, try to consider the challenges and constraints they faced instead of solely focusing on your own approach. Showing respect for code maintainers (both past and present) and having empathy can make the already difficult journey of understanding code much smoother.
1.12 Mental model
"Note that the quality of a later programmer's work is related to the match between their theories and the previous programmer's theories."
—Peter Naur
Building a mental model of source code involves grasping both the problem domain and the corresponding solution that the source code seeks to capture. It's important to recognize that either of these elements may be incomplete or unclear. When working on extending or modifying existing software, the task becomes more complex, as it requires understanding not only the past problem and solution but also the current problem and your proposed solution. Additionally, you must navigate the constraints imposed by the existing codebase. Developing a comprehensive mental model takes time and effort, as it involves integrating insights from various viewpoints and combining them to form an understanding.
2. Common ways to understand source code
2.1 Reading source code from the IDE
Most developers are familiar with reading source code to understand it. However, it's important to note that reading code is not always the sole or primary method for comprehension. While it can be effective when you have a specific goal in mind, it can also be limiting, especially when dealing with large files or when you're unsure where to begin.
2.2 Code walkthrough
Having someone who is familiar with the code walk you through it can be a less stressful approach. This person can provide valuable insights about the business domain, important aspects of the source code, design decisions, architectural choices, and tribal knowledge specific to the project that may not be apparent just by examining the code on your own. These walkthrough sessions also allow you to interact directly with someone who has contextual understanding, enabling you to save time by asking questions and clarifying any uncertainties you may have.
2.3 Running the application
One of the most common and practical ways to understand what an application does is to run it and interact with it. This method is most beneficial for GUI and CLI applications, where you can observe functionality and behavior directly.
2.4 Debugger
When running an application, debuggers can be invaluable in helping you narrow down specific code paths that you're interested in. Certain debuggers offer features that enable you to interact with the code by modifying variables and even manipulating the execution stack. These capabilities empower you to gain deeper insights and better understand the inner workings of the code during runtime.
3. Less common ways to understand source code
3.1 Version control history
One underrated approach to understanding code is leveraging the version control history. We often use it only through blame annotations that show who last changed a line, but version control systems offer a wealth of additional information. This includes the code's evolution over time, its relationships with other files (including configuration files) that tend to change alongside it, and churn, meaning how often a file changes, as an indicator of the code's importance in the past and present. Examining the history can also reveal ownership and highlight the risk of knowledge loss if maintainers leave or move away from the project. Plugins like GitLens bring this history into your editor, letting you use it creatively as an additional dimension for comprehending the source code.
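As a rough illustration of mining the history yourself, the sketch below counts churn per file and how often pairs of files change in the same commit. It assumes the input is the text printed by git log --name-only --pretty=format: (file paths, with a blank line between commits). The sample log and its file names are made up, and the helper names are my own; this is not how GitLens works internally.

```python
from collections import Counter
from itertools import combinations


def parse_commits(git_log_output):
    """Split the log text into one list of file paths per commit."""
    commits, current = [], []
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:
            current.append(path)
        elif current:
            # A blank line closes the commit collected so far.
            commits.append(current)
            current = []
    if current:
        commits.append(current)
    return commits


def churn_by_file(commits):
    """Count the number of commits that touched each file."""
    return Counter(path for files in commits for path in set(files))


def co_change_pairs(commits):
    """Count how often two files were changed in the same commit."""
    pairs = Counter()
    for files in commits:
        pairs.update(combinations(sorted(set(files)), 2))
    return pairs


# Hypothetical sample of the git log output described above.
SAMPLE_LOG = """\
src/billing/Invoice.java
src/billing/TaxRules.java

src/billing/Invoice.java
README.md

src/billing/Invoice.java
src/billing/TaxRules.java
"""

commits = parse_commits(SAMPLE_LOG)
for path, count in churn_by_file(commits).most_common():
    print(count, path)
for pair, count in co_change_pairs(commits).most_common(1):
    print(count, pair)
```

Files at the top of the churn list are good candidates for where to start reading, and pairs that frequently change together hint at a coupling that may not be visible in the code itself.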
3.2 Alternative representations of source code
Often, we approach source code solely as text, overlooking its potential as valuable information about the problem domain it addresses. Source code embodies relational information both between files and within files. As this information forms a graph, alternative visualizations such as dependency structure matrices and edge bundling graphs can significantly aid developers in quickly grasping the problem domain. Eureka, for instance, generates edge bundling graphs for Java and Kotlin classes, surpassing the limitations of traditional source code displays. The text of a large class does not fit on a single screen, so reading it means scrolling and losing sight of the rest. Eureka, however, can visualize even lengthy classes spanning several thousand lines on a single screen. It is important to note that alternative source code representations do not replace the act of reading code; instead, they serve as complementary tools, allowing developers to zoom out, filter unnecessary information, and gain a broader perspective.
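To show what a dependency structure matrix looks like, here is a small sketch that builds one from a hand-written map of class dependencies and prints it as text. The class names and the USES mapping are hypothetical, and the example works between classes, whereas Eureka visualizes relationships within a class; it is only meant to illustrate the idea of viewing code as relations instead of text.

```python
def dependency_matrix(uses):
    """Build a DSM: cell [row][col] is 1 when names[row] depends on names[col]."""
    names = sorted(set(uses) | {dep for deps in uses.values() for dep in deps})
    matrix = [[1 if col in uses.get(row, ()) else 0 for col in names]
              for row in names]
    return names, matrix


def render(names, matrix):
    """Format the matrix as text, marking each dependency with an 'x'."""
    width = max(len(name) for name in names)
    lines = []
    for index, (name, row) in enumerate(zip(names, matrix)):
        cells = " ".join("x" if cell else "." for cell in row)
        lines.append(f"{index} {name:<{width}} {cells}")
    return "\n".join(lines)


# Hypothetical classes and the classes each one uses.
USES = {
    "Checkout": {"Cart", "Pricing"},
    "Cart": {"Product"},
    "Pricing": {"Product", "TaxRules"},
}

names, matrix = dependency_matrix(USES)
print(render(names, matrix))
```

Columns appear in the same order as the numbered rows. A column with many marks points to a class that much of the system depends on, and an empty row points to a class that depends on nothing else, which often makes it an easy place to start.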
3.3 Scratch refactoring
Scratch refactoring is a technique aimed at comprehending source code without the risk of introducing breaking changes. This involves creating a separate branch in the repository and making modifications such as renaming variables or extracting functions that you deem necessary to gain a better understanding of the code. After you have gained insight into what the code does, the branch can be safely deleted. Although some may perceive this approach as wasteful without experiencing its benefits firsthand, scratch refactoring proves to be an effective technique for understanding source code.
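The following sketch shows what a scratch refactoring might produce on a throwaway branch. Both functions are hypothetical: calc stands in for the code as found, and the second version is the same logic after renaming variables and extracting a helper purely to aid understanding. The names chosen in the second version are guesses about intent, which is exactly what the technique lets you try out.

```python
# As found in the (hypothetical) codebase: correct, but hard to read.
def calc(d, t):
    r = 0
    for x in d:
        if x[1] > t:
            r += x[0] * x[1]
    return r


# Scratch version: same behavior, with names that record what we learned.
def line_total(unit_price, quantity):
    return unit_price * quantity


def total_of_lines_above_quantity(order_lines, quantity_threshold):
    total = 0
    for unit_price, quantity in order_lines:
        if quantity > quantity_threshold:
            total += line_total(unit_price, quantity)
    return total


# Quick check that the scratch version still behaves like the original.
sample_lines = [(10, 1), (5, 3), (2, 10)]
print(calc(sample_lines, 2), total_of_lines_above_quantity(sample_lines, 2))
```

Since the branch is going to be deleted, there is no need to polish the result or to get every name right; the value lies in the understanding you take away from it.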
3.4 Reading existing tests
If you're fortunate enough that the code already includes tests, you can examine them before delving into the production code to comprehend its behavior. Well-written tests often provide multiple examples that aid in understanding the code and prompt you to consider scenarios that the original authors may have overlooked when implementing the functionality. By analyzing these tests, you can gain valuable insights into how the code is intended to work and uncover potential edge cases.
3.5 Writing tests
If the code lacks tests, you have the option to gradually uncover and understand it by writing characterization tests. While writing tests may not be the most enticing choice for developers, it offers several benefits when dealing with complex and valuable code.
- Tests serve as documentation, providing clear examples of how the code should behave.
- Tests provide a safety net when modifying the source code, offering regression protection against unintended changes.
It's worth noting that depending on the type of tests you write, you can achieve regression protection even without explicitly documenting the code. For instance, combination tests, which run the code against every combination of a set of inputs and record the results, can be utilized in certain scenarios. Sometimes, developers cite a lack of time as an excuse for not writing tests. However, libraries like Approvals can help developers bring existing code under test quickly, making the process more efficient.
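Here is a minimal sketch of a characterization test written in the combination style. It uses only the Python standard library and does not use the Approvals library; shipping_cost is a hypothetical piece of legacy code, and the APPROVED snapshot is what you would paste in after running the code once and reviewing its output.

```python
from itertools import product


def shipping_cost(weight_kg, express):
    # Hypothetical legacy code whose rules we do not fully understand yet.
    cost = 5 if weight_kg <= 1 else 5 + (weight_kg - 1) * 2
    if express:
        cost = cost * 2 + 3
    return cost


def characterize(func, *input_lists):
    """Record func's current output for every combination of the inputs."""
    lines = []
    for args in product(*input_lists):
        lines.append(f"{args} => {func(*args)}")
    return "\n".join(lines)


# Snapshot of the current behavior, captured from a first run.
APPROVED = """\
(0, False) => 5
(0, True) => 13
(1, False) => 5
(1, True) => 13
(5, False) => 13
(5, True) => 29"""


def test_shipping_cost_is_unchanged():
    actual = characterize(shipping_cost, [0, 1, 5], [False, True])
    assert actual == APPROVED


test_shipping_cost_is_unchanged()
```

The test asserts what the code does today, not what it ought to do. That is enough to catch unintended changes while you work on it, and reading the snapshot is often a quick way to spot rules such as the express surcharge.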
3.6 Pair/ensemble programming
Programming is not always a task done alone. When dealing with unfamiliar code or technology, it helps to team up with at least one other person. Trying to tackle a complex, unfamiliar codebase on your own can be slow and counterproductive. Having someone else to help you understand the code, share ideas, confirm assumptions, challenge your understanding, and provide different viewpoints can make your work more productive. Collaborating with a teammate allows you to leverage teamwork to overcome challenges and gain a better grasp of the codebase.
Conclusion
Understanding source code is an essential skill for developers, and exploring different techniques and approaches can significantly enhance this ability. We have discussed various pain points associated with comprehending code and have explored alternative ways to understand source code to overcome these challenges.
In particular, one tool that stands out is Eureka, which offers a comprehensive visualization of even lengthy and complex Java and Kotlin classes. By using Eureka, developers can gain a deeper understanding of their source code, unravel its intricacies, and uncover insights that may have otherwise gone unnoticed. If you already have Homebrew, installing it takes a single command:
brew install legacycodehq/tap/eureka
As a next step, we encourage you to try Eureka in your Java and Kotlin projects. Embrace the power of alternative code representations and witness how they can revolutionize your understanding of complex codebases. Integrating Eureka into your development workflow can bring a fresh perspective to your source code and accelerate your journey toward code comprehension.