How I Start Working with Legacy Code

While I was cleaning up one of my old folders, I found an old application jar file from 2004. I remember that I wrote this app for my friend who used it for calculating some kind of gas consumption:

As you can see, the design is quite simple: you can set certain gas compounds and the application calculates the emission based on the compounds using my friend’s secret formulas. This application is eight years old now, and I don’t remember much about the details, but implementing the algorithm which interprets those formulas took me a lot of time, that’s for sure. I tend to revisit my old projects and see how differently I solve problems now than before, so I decided to revisit this evaluation algorithm and share it. Moreover, I can use this project for educational purposes like coding dojos: how to work with legacy code, one doesn’t really know how to work with unknown legacy code. So, here is my plan:

  1. I need a new project and a repository for the source files of the evaluation algorithm

  2. I must know what I’m dealing with and how much time it will take

  3. I must discover the available functionalities using test cases - black box testing

  4. I must fully understand (100% coverage) what the code does using more test cases - white box testing

  5. I can start refactoring following the previously generated code metrics

Getting the Source Code and Creating a Project

Unfortunately, neither my friend nor I have the source code anymore, so I’m using jad to decompile the class files and recreate the sources. These are the directories after the decompilation:

~/Desktop/emcalc % ls -1
~/Desktop/emcalc %

I’m not interested in the gui, html and xml folders, because I’m petty sure that they do something else, so let’s collect the files from the calculation and exception folders and create a new eclipse project. At the moment, I don’t care about the package structure, so I’m putting everything into one single package:

I want to share this code so that others can practice on it, so I’m creating a github repository and pushing the current structure to the master branch, and I’m creating a different branch for myself. All my changes will go to the zsolt_refactoring branch, because if I commit my changes into the master branch, others who may want to use the code, won’t be able to use the HEAD as a starting point.

What I’m Dealing with

Before I start working with legacy code I like to know what I’m dealing with. In this case I ran different code analyzer tools. The detailed reports from** findbugs, pmd, and crap4j **can be found in the target/reports folder. Here’s a quick summary:

  • Findbugs: 2 bugs: malicious code vulnerability

  • PMD: 1 priority 1 and 18 priority 2 issues

  • Crap4j: 5 classes whose complexity is above 5, and one with a complexity of 17

It is not that bad, I expected worse. Based on this information I may need 2-3 hours to finish the work, which fits into my weekend schedule. If I have to deal with a large code base - larger than this - static code analysis can help me find out what I’m dealing with. If those numbers had been greater I wouldn’t have done anything, because it’s not worth the effort. You may wonder why I haven’t run checkstyle analysis. The reason is quite simple: I’m working with decompiled code and that analysis would show how good jad is, which would be useless in this case. However, when I have source I wrote myself, I definitely run checkstyle, too.

The First Test Cases

It is not wise to start working on legacy code with test cases, which can help discover what the code actually does. Of course, I can review the code, which in this case may be a wise decision, because there aren’t too many classes and their dependency tree is simple, but when I’m dealing with a lot of classes, I most probably won’t have time for a thorough code review.

If I have the “entry point” to the application, I can start writing test cases which call this entry point and I can check the returned results. In most of the cases this entry point is the main method, but in this case there is no main method at all - it is somewhere in the previously deleted gui folder. Hint: eclipse can help find the main methods: right click on the project -> Run as… -> Java Application.

The classes which hold the entry points have a unique attribute: no other class depends on them. If I’m able to determine the dependency graph in this project, I will find the entry points I’m looking for. I tried different tools like jDependCDA and classycle, but the result was a bit disappointing: it was hard to use these tools and their results weren’t satisfactory enough. Nevertheless, classycle could tell me this:

class evaluator.NotInDomainException (578 bytes)
class evaluator.Expressions (1836 bytes)
    class evaluator.Operator (1716 bytes)
    class evaluator.Calculate (2839 bytes)
class evaluator.Function (1844 bytes)
    class evaluator.Interval (1419 bytes)
    class evaluator.NotInDomainException (578 bytes)
    class evaluator.Calculate (2839 bytes)
class evaluator.Operator (1716 bytes)
class evaluator.Calculate (2839 bytes)
    class evaluator.Operator (1716 bytes)
class evaluator.InvalidConfigFormatException (477 bytes)
class evaluator.Interval (1419 bytes)
class evaluator.InvalidDataException (453 bytes)

None of the classes depends on the Expressions or on the Function classes, so they are the entry points I’m looking for. Additionally, the InvalidDataException and InvalidConfigFormatException classes are not referenced at all, so I don’t need them in this project.

Let’s start writing test cases for the Expressions and Function classes. At this point, I’d rather not look at the code if possible - black box testing - and try to write a green test case based on the available javaDoc and method/class signatures. This is the best approach for me to learn as much as possible about the code in a short period of time. Reading code takes a lot of time, and I’d like to make some progress, so I’m “kicking” the project until there is a green test case. After three minutes, I have the green test cases:

public class ExpressionsTest {
    public void testSomething() {
        Hashtable table = new Hashtable();
        table.put("expression", "6 * 2");
        Expressions expressions = new Expressions(table, null, 0);
        assertEquals(12.0, expressions.getResult("expression"), 0.0);
public class FunctionTest {
    public void testSomething() throws NotInDomainException {
        Function function = new Function();
        function.addInterval(new Interval(-5.0, 5.0), "6 * 2");
        assertEquals(12.0, function.getValue(2.0), 0.0);

Here is what I’ve learnt: the arithmetic expressions are given through wrappers like a Hashtable or an internal class Interval. I have no idea why, but it would be fun to find out. And, the Expressions and Function classes seem to do the same thing.

In the first example I only used the ‘*’ operator, but let’s see what happens if I use other operators like ‘+’, ‘-‘, ‘/’, ‘^’, ‘(‘ and ‘)’? Moreover, it may be possible to use variables, too. So I’m writing more test cases using different operators and variables.

Cover Everything

Although I have several working test cases, I’m unable to use variables in them. I’m stopping here for the moment, because I’m doing two things at the same time: covering the evaluation part and writing test cases for the variable replacement part. Let’s see the test coverage of the evaluation part first. After running cobertura, it seems that I’ve managed to cover every possible scenario except those which are related to the variable replacement part. It’s time to change from the black box testing which is very fast to the white box testing, which is more precise, but slower.

It seems that the classes are a bit strict about how the variables should be named. For example:

  expression2.append((new Double(getResult(element))).toString());
  expression2.append(new Double(factor));

This will be the first thing I’m going to change after the code is 100% covered and refactored. Nevertheless, I’m adding some test cases where the variables are named according to the code.

Refactoring and Closing Thoughts

I won’t repeat the refactored code here - unless you explicitly ask for it in a comment -, you can have a look and comment it on github. The purpose of the post was to share how I start working with legacy code: how to use different tools and testing methods to have the best result in a reasonable timeframe. I hope you liked it, have a good time refactoring and don’t forget that every code which was written more than 3 weeks ago is legacy :-)

comments powered by Disqus