Managing the Uncertainty of Legacy Code: Q&A Part #1 with J.B. Rainsberger

In this first part of our three-part Q&A blog series, J.B. Rainsberger addresses questions that came up during his session.

On June 3, 2020, J.B. Rainsberger spoke in our remote Intro Talk about managing the various kinds of uncertainty that we routinely encounter on projects that involve legacy code. He presented a handful of ideas for how we might improve our practices related to testing, design, planning, and collaboration. These ideas and practices help us with general software project work, but they help us even more when working with legacy code, since legacy code tends to add significant uncertainty and pressure to every bit of our work. Fortunately, we can build our skill while doing everyday work away from legacy code, then exploit that extra skill when we work with legacy code.

Our next remote course, Surviving Legacy Code, takes place from 14 to 17 September 2020.

J. B. Rainsberger helps software companies better satisfy their customers and the business that they support.

Here are some questions that came up during this session and some answers to those questions.

One of the issues is that the legacy code base consists of useful code and dead code and it’s hard to know which is which.

Indeed so. Working with legacy code tends to increase the likelihood of wasting time working with dead code before we feel confident enough to delete it. I don’t know how to avoid this risk, so I combine monitoring, testing, and microcommitting to mitigate it.

Microcommits make it easier to remove code safely because we can recover it more easily. Committing frequently helps, but so does committing surgically (the smallest portion of code that we know is dead) and cohesively (portions of code that seem logically related to each other). If our commits are more independent, then it’s easier to move them backward and forward in time, which makes it easier to recover code that we mistakenly deleted earlier while disturbing the live code less. We will probably never do this perfectly, but smaller and more-cohesive commits make success more likely. This seems like a special case of the general principle that as I trust my ability to recover from mistakes more, I feel less worried about making mistakes, so I change things more aggressively. When I learned test-driven development in the early years of my career, I noticed that I became much more confident about changing things, because I could change them back more safely. Practising test-driven development in general and microcommitting when working with legacy code combine to help the programmer feel more confident about deleting code, and not only code that seems dead.

Even with all this, you might still feel afraid to delete that code. In that case, you could add “Someone executed this code” logging statements, then monitor the system for those logging statements. You could track the length of time since you last saw each of these “heartbeat” logging messages, then estimate when it becomes safe to delete that code. You might decide that if nothing has executed that code in 6 months, then you will judge it dead and plan to remove it. This could never give us perfect confidence, but at least it goes beyond guessing to gathering some evidence to support our guesses.
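To make the heartbeat idea concrete, here is a minimal sketch in Python. The `HeartbeatMonitor` class, its method names, and the `SIX_MONTHS_SECONDS` threshold are hypothetical illustrations, not part of any library. The monitor logs a heartbeat each time a suspect code path runs, remembers when it last ran, and lists the paths that have stayed silent for at least a chosen period. Code that never fires at all counts as silent since monitoring began.

```python
import logging
import time
from typing import Dict, Iterable, List, Optional

# Roughly six months, matching the example threshold in the text.
SIX_MONTHS_SECONDS = 182 * 24 * 60 * 60


class HeartbeatMonitor:
    """Records when each suspect code path last ran (a hypothetical helper)."""

    def __init__(self, started_at: Optional[float] = None) -> None:
        # When monitoring began; code that never fires counts as silent since then.
        self.started_at = time.time() if started_at is None else started_at
        self.last_seen: Dict[str, float] = {}
        self.log = logging.getLogger("heartbeat")

    def beat(self, tag: str, at: Optional[float] = None) -> None:
        """Call this from inside code you suspect is dead."""
        self.last_seen[tag] = time.time() if at is None else at
        self.log.info("HEARTBEAT: someone executed %s", tag)

    def silent_for_at_least(self, tags: Iterable[str], seconds: float,
                            now: Optional[float] = None) -> List[str]:
        """Return the tags with no heartbeat during the last `seconds`."""
        now = time.time() if now is None else now
        return [tag for tag in tags
                if now - self.last_seen.get(tag, self.started_at) >= seconds]
```

In a real system the in-memory dictionary would disappear on every restart, so you would more likely search your persistent logs for the `HEARTBEAT` messages; the class only illustrates the bookkeeping behind the “how long since we last saw it?” question.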

More testing, especially microtesting, puts more positive pressure on the design to become simpler: less duplication, better names, healthier dependencies, more referential transparency. I have noticed a pattern: as I simplify the design, I find it easier to notice parts that look irrelevant and I find it clearer that those parts are indeed dead code. Moreover, sometimes obviously dead code simply appears before my eyes without trying! This makes it safer to delete that code, using the microcommitting and monitoring as a recovery strategy in case I get it wrong.

So not all legacy code adds value to the business… but it is hard to know which part does.

Indeed so. We have to spend time, energy, and money to figure this out. I accept responsibility as a programmer to give the business more options to decide when to keep the more-profitable parts running and to retire the less-profitable parts. As I improve the design of the system, I create more options by making it less expensive to separate and isolate parts of the system from each other, which reduces the cost of replacing or removing various parts. Remember: we refactor in order to reduce volatility in the marginal cost of features, but more generally in the marginal cost of any changes, which might include strangling a troublesome subsystem or a less-profitable feature area.

The Strangler approach describes incrementally replacing something in place: adding the new thing alongside the old thing, then gradually sending traffic to the new thing until the old thing becomes dead. Refactoring the system to improve the health of the dependencies makes this strangling strategy more effective, which gives the business more options to replace parts of the legacy system as they determine that a replacement would likely generate more profit. As we improve the dependencies within the system, we give the business more options by reducing the size of the smallest part that we’d need to replace. If we make every part of the system easier to replace, then we increase the chances of investing less to replace less-profitable code with more-profitable code.
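One way to picture “gradually sending traffic to the new thing” is a router that sits in front of both implementations. The sketch below assumes a simple percentage-based split per call; `StranglerRouter` and `share_to_new` are hypothetical names, and real systems often route by feature flag, customer, or URL instead.

```python
import random
from typing import Any, Callable, Optional


class StranglerRouter:
    """Routes each call to either the old or the new implementation.

    `share_to_new` is the fraction of calls (0.0 to 1.0) sent to the new code.
    Raising it gradually moves traffic; at 1.0 the old code no longer runs and
    becomes a candidate for deletion.
    """

    def __init__(self, old: Callable[..., Any], new: Callable[..., Any],
                 share_to_new: float = 0.0,
                 rng: Optional[random.Random] = None) -> None:
        self.old = old
        self.new = new
        self.share_to_new = share_to_new
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # random() returns a float in [0.0, 1.0), so 0.0 never picks `new`
        # and 1.0 always does.
        if self.rng.random() < self.share_to_new:
            return self.new(*args, **kwargs)
        return self.old(*args, **kwargs)
```

For example, `StranglerRouter(legacy_quote, new_quote, share_to_new=0.05)` (both functions hypothetical) would send about 5% of calls to the replacement. A random split per call suits stateless operations; flows that keep state across calls usually need “sticky” routing so that one customer consistently reaches the same side.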

This illustrates a general principle of risk management: if we don’t know how to reduce the probability of failure, then we try reducing the cost of failure. If we can’t clearly see which parts of the legacy code generate more profit and which ones generate less, then we could instead work to reduce the cost of replacing anything, so that we waste less money trying to replace things. This follows the strategy, outlined in Nassim Nicholas Taleb’s The Black Swan, of accepting small losses more often in order to create the possibility of unplanned large wins.

What do you think about exploratory refactoring? Do you use this technique sometimes?

Yes, I absolutely do! I believe that programmers can benefit from both exploratory refactoring and feature-oriented refactoring, but they need to remain aware of which they are doing at any time, because they might need to work differently with each strategy to achieve those benefits.

When I’m refactoring in order to add a feature or change a specific part of the code, I remind myself to focus on that part of the code and to treat any other issues I find as distractions. I write down other design problems or testing tasks in my Inbox as I work. I relentlessly resist the urge to do those things “while I’m in this part of the code”. I don’t even follow the Two-Minute Rule here: I insist on refactoring only the code that right now stands between me and finishing the task. Once I have added my feature, I release the changes, then spend perhaps 30 minutes cleaning up before moving on, which might include finishing a few of those Two-Minute tasks.

The rest of the time, I’m exploring. I’m removing duplication, improving names, trying to add microtests, and hoping that those activities lead somewhere helpful. This reminds me of the part of The Goal where the manufacturing floor workers engineered a sale by creating an efficiency that nobody in the sales department had previously thought possible. When I do this, I take great care to timebox the activity. I use timers to monitor how much time I’m investing and I stop when my time runs out. I take frequent breaks (I use programming episodes of about 40 minutes) in order to give my mind a chance to rise out of the details and notice higher-level patterns. I don’t worry about making progress, because I don’t yet know what progress would look like; instead, I know it when I see it. By putting all these safeguards in place, I feel confident in letting myself focus deeply on exploring by refactoring. I avoid distracting feelings of guilt or pressure while I do this work. I also feel comfortable throwing it all away in case it leads nowhere good or somewhere bad. This combination of enabling focus and limiting investment leads me over time to increasingly better results. As I learn more about the code, exploratory refactoring turns into feature-oriented refactoring, which provides more slack for more exploratory refactoring, creating a virtuous cycle.

What is your experience with Approval Tests in cases where writing conventional unit tests might be too expensive?

I like the Golden Master technique (and particularly using the Approval Tests library), especially when text is already a convenient format for describing the output of the system. I use it freely and teach it as part of my Surviving Legacy Code course. It provides a way to create tests from whatever text output the system might already produce.
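The core of the Golden Master technique fits in a few lines. The sketch below is hand-rolled in the spirit of Approval Tests’ received/approved workflow; it is not that library’s API, and the function name `verify_against_approved` is hypothetical. It compares fresh text output with a previously approved copy and, on any difference, writes the new output next to it and fails so that a person can review it.

```python
from pathlib import Path


def verify_against_approved(actual: str, directory: Path, name: str) -> None:
    """Hand-rolled Golden Master check (not the Approval Tests API).

    Compares `actual` with `<name>.approved.txt`. On a mismatch, or when
    nothing has been approved yet, writes `<name>.received.txt` and fails
    so that a human can review the output.
    """
    approved = directory / f"{name}.approved.txt"
    received = directory / f"{name}.received.txt"
    if approved.exists() and approved.read_text(encoding="utf-8") == actual:
        if received.exists():
            received.unlink()  # tidy up output left behind by an earlier failure
        return
    received.write_text(actual, encoding="utf-8")
    raise AssertionError(
        f"{received} does not match {approved}; review it and, if it is "
        "correct, rename it to the approved file."
    )
```

A test would then pass in whatever text the system already produces, for example `verify_against_approved(render_report(sample_input), Path("tests/masters"), "monthly_report")`, where `render_report` and `sample_input` are hypothetical stand-ins. The first run always fails, because approving the output is a human decision.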

I get nervous when programmers start going out of their way to add text-based interfaces to code that doesn’t otherwise need them, only for the purpose of writing Golden Master tests. In this case, checking objects in memory with equals() tends to work well enough and costs less. I often notice that programmers discover a helpful technique, then try to use it everywhere, then run into difficulties, then invest more in overcoming those difficulties than they would invest in merely doing things another way. Golden Master/Approval Tests represents merely another situation in which this risk comes to the surface.
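For contrast, here is what “checking objects in memory” can look like when the code has no natural text output. In Python, a `@dataclass` generates `__eq__` from its fields, which plays the role that equals() plays in Java. The `Invoice` type and `build_invoice` function are hypothetical examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Invoice:
    """Hypothetical value object; @dataclass generates __eq__ from the fields."""
    customer: str
    total_cents: int


def build_invoice(customer: str, line_items_cents: List[int]) -> Invoice:
    return Invoice(customer, sum(line_items_cents))


# The check compares objects in memory, Python's counterpart to equals():
# no text format exists only to serve the test.
assert build_invoice("ACME", [1000, 250]) == Invoice("ACME", 1250)
```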

I get nervous when programmers start choosing to write integrated tests for code where microtests would work equally well. When programmers think about adding Golden Master tests, they tend to think of these as end-to-end tests, because they often judge that as the wisest place to start. Just as in the previous paragraph, they sometimes fall into the trap of believing that “since it has helped so far, we must always do it this way”. No law prevents you from writing unit tests using Golden Master/Approval Tests! Indeed, some of the participants of my Surviving Legacy Code training independently discover this idea and use it to great effect. Imagine a single function that tangles together complicated calculations and JSON integration: it might help a lot to use Approval Tests to write Golden Master tests for this function while you slowly isolate the calculations from the JSON parsing and formatting. The Golden Master tests work very well with multiline text, such as values expressed in JSON format, but probably make the calculation tests awkward, compared with merely checking numeric values in memory using assertEquals().
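A minimal sketch of where that refactoring might end up, assuming a hypothetical order-pricing function with a made-up rule (10% off, rounded down to whole cents, for subtotals over 100.00). The calculation now lives in `price_order`, which microtests check with plain numeric comparisons, while `price_order_json` remains a thin shell whose multiline JSON output still suits a Golden Master test. In practice you would typically write that Golden Master test first, against the original tangled function, to protect the extraction.

```python
import json
from typing import Iterable, Tuple


def price_order(items: Iterable[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Pure calculation extracted from the JSON-handling code.

    Each item is (unit_price_cents, quantity); returns (subtotal, discount, total).
    Hypothetical rule: 10% off, rounded down, when the subtotal exceeds 10,000 cents.
    """
    subtotal = sum(unit_price * quantity for unit_price, quantity in items)
    discount = subtotal // 10 if subtotal > 10_000 else 0
    return subtotal, discount, subtotal - discount


def price_order_json(request_json: str) -> str:
    """Thin JSON shell: parse the request, delegate to the calculation, format."""
    order = json.loads(request_json)
    subtotal, discount, total = price_order(
        (item["unit_price_cents"], item["quantity"]) for item in order["items"])
    return json.dumps({"subtotal_cents": subtotal, "discount_cents": discount,
                       "total_cents": total}, indent=2, sort_keys=True)


# Microtests for the calculation: plain numeric comparisons, no text files.
assert price_order([(2_500, 2)]) == (5_000, 0, 5_000)
assert price_order([(6_000, 2)]) == (12_000, 1_200, 10_800)
```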

When programmers use Golden Master/Approval Tests, they need to treat it as just one tool in their toolbox. This is the same as with any technique! I tend to treat Golden Master as a temporary and complementary technique. I use it when I focus on writing tests for their value as tests, even though I generally prefer to write tests for design feedback. Not everyone does this! If you find yourself in the stage where you’re drowning in defects and need to focus on fixing them, then Golden Master can be a great tool to get many tests running early. Once you’ve stopped drowning, it becomes easier to look at replacing Golden Master with simpler and more-powerful unit tests, and eventually with microtests.


➡️ Also read our two Q&A blog posts with J.B. Rainsberger: Part #2 “The Risks Related to Refactoring Without Tests” and Part #3 “Questions About Test Frameworks”! Follow us on Twitter or LinkedIn to get new posts.