Tuesday, June 19, 2007

The problem with code comments

Let me first state that I'm not proposing eliminating the use of code comments.  Code comments can be very helpful in pointing a developer in the right direction when trying to change a complex or non-intuitive block of code.

I'm also not referring to API documentation (like C# XML code comments) when I say "code comments".  I'm talking about the little snippets of commentary that some helpful coder left you as a gift to explain why the code is the way it is.  Code comments are easy to introduce and can be very helpful, so what's the big deal?

Code comments lie

Code comments cannot be tested to determine their accuracy.  I can't ask a code comment, "Are you still correct?  Are you lying to me?"  The comment may be correct or it may not be; I don't really know unless I visually (and tediously) inspect the code it describes.

I can trust the comment and just assume that whatever it tells me is still correct.  But everyone knows the colloquialism about assuming, so chances are I'll be wrong to assume.  Who's to blame then, me or the author of the original comment?  It can be dangerous to assume that a piece of text that is neither executable nor testable is inherently correct.

Code comments are another form of duplication

This duplication is difficult to see unless I need to change the code the comment pertains to.  Now I have to make the change in two places, and one of those places is a comment.  If every complexity in code required comments, how much time would I need to spend keeping the original comments up to date?  I would assert that it takes as much time to update a code comment as it does to change the code being commented.

Since the cost to maintain comments is high, they're simply not maintained.  They then fall into another pernicious category of duplication where the duplicate is stale and invalid.  When code comments are invalid, they actually hurt the next developer looking at the code because the comment may lie to the developer and cause them to introduce bugs, make the wrong changes, form invalid assumptions, etc.
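Here is a small, hypothetical sketch of how a comment goes stale (Python is used only for brevity; the function name, the amounts, and the discount rule are all invented for illustration).  Assume the comment was accurate when written, then the rule changed and only the code was updated:

```python
def order_discount(total_cents: int) -> int:
    """Return the discount in cents for an order total given in cents."""
    # Orders over $100 get 10% off.
    # (The line above is the stale comment: the code below now applies
    # 15% off, and only to orders over $250.)
    if total_cents > 25_000:
        return total_cents * 15 // 100
    return 0
```

Nothing fails when the comment and the code disagree.  A developer who trusts the first comment would expect a $150 order to earn a $15 discount, but `order_discount(15_000)` returns 0.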

Code comments are an opportunity cost

I think the real reason code comments aren't maintained is that most developers instinctively view them as an opportunity cost.  That is, spending time (and therefore money) to maintain code comments costs me in terms of not taking the opportunity to improve the code such that I wouldn't need the comments in the first place.  The benefits of making the code more soluble, testable, and consequently more maintainable are much more valuable than having up-to-date comments.

A worse side effect is when developers skip updating the comments and do nothing with the time instead.  Call it apathy, ignorance, or just plain laziness, but more often than not the developer would rather leave the incorrect comments as-is than worry about eliminating the need for them.

Code comments are not testable

If I can't test a code comment, I can't verify it.  Untested or untestable code is by definition legacy code and not maintainable.  But code can be refactored and modified to make it testable and verifiable.  Code comments can't, so they will always remain unmaintainable.  Putting processes in place to ensure code comments are up to date is not the answer, since the fundamental problem with comments is that I can't test whether they are correct in any kind of automated or repeatable fashion.

Alternatives

So if code comments are bad (there are exceptions of course), what should I do instead?

  • Refactor code so that it is soluble
  • Refactor code so that it is testable
  • Use intention-revealing names for classes and members
  • Use intention-revealing names for tests
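As a rough illustration of the last two items in that list, here is a sketch in Python (chosen only for brevity).  The `Customer` type, the loyalty rule, and every name below are hypothetical, made up for this example:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    years_active: int
    has_overdue_invoice: bool


# Before: the name says nothing, so a comment has to carry the meaning.
def check(c: Customer) -> bool:
    # qualifies for loyalty pricing: active 3+ years, no overdue invoice
    return c.years_active >= 3 and not c.has_overdue_invoice


# After: the names carry the meaning, and there is no comment to go stale.
MINIMUM_LOYALTY_YEARS = 3


def qualifies_for_loyalty_pricing(customer: Customer) -> bool:
    has_enough_tenure = customer.years_active >= MINIMUM_LOYALTY_YEARS
    return has_enough_tenure and not customer.has_overdue_invoice


# Test names that read like the rule they enforce.
def test_customer_with_overdue_invoice_does_not_qualify_for_loyalty_pricing():
    customer = Customer(years_active=5, has_overdue_invoice=True)
    assert not qualifies_for_loyalty_pricing(customer)


def test_customer_in_good_standing_for_three_years_qualifies_for_loyalty_pricing():
    customer = Customer(years_active=3, has_overdue_invoice=False)
    assert qualifies_for_loyalty_pricing(customer)
```

If the rule changes, the tests fail until someone updates them, which is exactly what the comment in `check` can never do.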

There are always exceptions to the rule, and some scenarios where code comments are appropriate could be:

  • Explaining third-party libraries, like MSMQ, etc. (that should be hidden behind interfaces anyway)
  • Explaining test results (rare)
  • Explaining usage of a third-party framework like ASP.NET where your code is intimate with their framework

I'd say 99 times out of 100, when I encounter a code comment, I just use Extract Method on the block being commented, with a name that might include some of the comment.  Tools like ReSharper will actually examine your code comments and suggest a good name.  Once the commented block is extracted into a method, it's testable, and now I can enforce the behavior through a test, eliminating the need for the comment.
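A minimal sketch of that Extract Method move, again in Python for brevity.  The shipping rule, the constants, and the function names are assumptions invented for the example, not taken from any real codebase:

```python
# Before: a comment explains a block buried inside a larger function.
def invoice_total_before(line_prices_cents, country_code):
    subtotal = sum(line_prices_cents)
    # add shipping: free for domestic orders of $50 or more, otherwise $7.99
    if country_code == "US" and subtotal >= 5_000:
        shipping = 0
    else:
        shipping = 799
    return subtotal + shipping


# After: the commented block becomes a method named after the comment.
DOMESTIC_COUNTRY = "US"
FREE_SHIPPING_THRESHOLD_CENTS = 5_000
FLAT_SHIPPING_CENTS = 799


def shipping_cost(subtotal_cents, country_code):
    is_domestic = country_code == DOMESTIC_COUNTRY
    if is_domestic and subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_SHIPPING_CENTS


def invoice_total(line_prices_cents, country_code):
    subtotal = sum(line_prices_cents)
    return subtotal + shipping_cost(subtotal, country_code)


# The rule the comment used to describe is now enforced by tests.
def test_domestic_order_at_threshold_ships_free():
    assert shipping_cost(5_000, "US") == 0


def test_domestic_order_below_threshold_pays_flat_rate():
    assert shipping_cost(4_999, "US") == 799


def test_international_order_always_pays_flat_rate():
    assert shipping_cost(10_000, "CA") == 799
```

The behavior is unchanged; the difference is that the shipping rule can now be called and checked on its own instead of being described in prose.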
