Tuesday, November 20, 2007

Stop the madness

I've been extending a legacy codebase lately to make it a bit more testable, and a few small, bad decisions have slowed my progress immensely.  One decision isn't bad in and of itself, but a small bad decision multiplied a hundred times leads to significant pain.  It's death by a thousand cuts, and it absolutely kills productivity.  Some of the small decisions I'm seeing are:

  • Protected/public instance fields
  • Inconsistent naming conventions
  • Try-Catch-Publish-Swallow
  • Downcasting

The pains of these bad decisions can wreak havoc when trying to add testability to legacy codebases.

Public/protected instance fields

One of the pillars of OOP is encapsulation, which allows clients to use an object's functionality without knowing the details behind it.  From FDG:

The principle states that data stored inside an object should be accessible only to that object.

Followed immediately by their guideline:

DO NOT provide instance fields that are public or protected.

It goes on to say that access to simple public properties are optimized by the JIT compiler, so there's no performance penalty in using better alternatives.  Here's an example of a protected field:

public class Address
{
    protected string zip;

    public string Zip
    {
        get { return zip; }
    }
}

public class FullAddress : Address
{
    private string zip4;

    public string Zip4 
    {
        get
        {
            if (Zip.Contains("-"))
            {
                zip4 = zip.Substring(zip.IndexOf("-") + 1);
                zip = zip.Substring(0, zip.IndexOf("-"));
            }
            return zip4;
        }
    }
}

There was an originally good reason to provide the derived FullAddress write access to the data in Address, but there are better ways to approach it.  Here's a better approach:

public class Address
{
    private string zip;

    public string Zip
    {
        get { return zip; }
        protected set { zip = value; }
    }
}

I've done two things here:

  • Added a protected setter to the Zip property
  • Changed the access level of the field to private

Functionally it's exactly the same for derived classes, but the design has greatly improved.  We should only declare private instance fields because:

  • Public/protected/internal violates encapsulation
  • When encapsulation is violated, refactoring becomes difficult as we're exposing the inner details
  • Adding a property later with the same name breaks backwards binary compatibility (i.e. clients are forced to recompile)
  • Interfaces don't allow you to declare fields, only properties and methods
  • C# 2.0 added the ability to declare separate visibility for individual getters and setters

There's no reason to have public/protected instance fields, so make all instance fields private.

Inconsistent naming conventions

Names of classes, interfaces, and members can convey a great deal of information to clients if used properly.  Here's a good example of inconsistent naming conventions:

public class Order
{
    public Address address { get; set; }
}

public class Quote : Order
{
    public void Process()
    {
        if (address == null)
            throw new InvalidOperationException("Address is null");
    }
}

When I'm down in the "Process" method, what is the "address" variable?  Is it a local variable?  Is it a private field?  Nope, it's a property.  Since it's declared camelCase instead of PascalCase, it led to confusion on the developer's part about what we were dealing with.  If it's local variable, which the name suggests, I might treat the value much differently than if it were a public property.

Deviations from FDG's naming conventions cause confusion.  When I'm using an .NET API that uses Java's camelCase conventions, it's just one more hoop I have to jump through.  In places where my team had public API's we were publishing, it wasn't even up for discussion whether or not we would follow the naming conventions Microsoft used in the .NET Framework.  It just happened, as any deviation from accepted convention leads to an inconsistent and negative user experience.

It's not worth the time to argue whether interfaces should be prefixed with an "I".  That was the accepted convention, so we followed it.  Consistent user experience is far more important than petty arguments on naming conventions.  If I developed in Java, I'd happily use camelCase, as it's the accepted convention.

Another item you may notice as there are no naming guidelines for instance fields.  This reinforces the notion that they should be declared private, and the only people who should care about the names are the developers of that class and the class itself.  In that case, just pick a convention, stick to it, and keep it consistent across your codebase so it becomes one less decision for developers.

Try Catch Publish Swallow

Exception handling can really wreck a system if not done properly.  I think developers might be scared of exceptions, given the number of useless try...catch blocks I've seen around.  Anders Heljsberg notes:

It is funny how people think that the important thing about exceptions is handling them. That is not the important thing about exceptions. In a well-written application there's a ratio of ten to one, in my opinion, of try finally to try catch. Or in C#, using statements, which are like try finally.

Here's an example of a useless try-catch:

public class OrderProcessor
{
    public void Process(Order order)
    {
        try
        {
            ((Quote)order).Process();
        }
        catch (Exception ex)
        {
            ExceptionManager.Publish(ex);
        }
    }
}

In here we have Try Catch Publish Swallow.  We put the try block around an area of code that might fail, and catch exceptions in case it does.  To handle the exception, we publish it through some means, and then nothing.  That's exception swallowing.

Here's a short list of problems with TCPS:

  • Exceptions shouldn't be used to make decisions
  • If there is an alternative to making decisions based on exceptions, use it (such as the "as" operator in the above code)
  • Exceptions are exceptional, and logging exceptions should be done at the highest layer of the application
  • Not re-throwing leads to bad user experience and bad maintainability, as we're now relying on exception logs to tell us our code is wrong

Another approach to the example might be:

public class OrderProcessor
{
    public void Process(Order order)
    {
        Quote quote = order as Quote;
        
        if (quote != null)
            quote.Process();
    }
}

The only problem this code will have is if "quote.Process()" throws an exception, and in that case, we'll let the appropriate layer deal with those issues.  Since I don't any resources to clean up, there's no need for a "try..finally".

Downcasting

I already wrote about this recently, but it's worth mentioning again.  I spent a great deal of time recently removing a boatload of downcasts from a codebase, and it made it even worse that the downcasts were pointless.  Nothing in client code was using any additional members in the derived class.  It turned out to be a large pointless hoop I needed to jump through to enable testing.

Regaining sanity

The problem with shortcuts and knowingly bad design decisions is that this behavior can become habitual, and the many indiscretions can add up to disaster.  I had a high school band director who taught me "Practice doesn't make perfect - perfect practice makes perfect."

By making good decisions every day, it becomes a habit.  Good habits practiced over time eventually become etched into your muscle memory so that it doesn't require any thought.  Until you run into a legacy codebase that is, and you realize how bad your old habits were.

No comments: