Thursday, December 20, 2007

Upgrading to Windows XP SP2

After months of soul-searching, I made the gut-wrenching decision today to upgrade my home PC to Windows XP SP2.

Upgrade from Vista, that is.

I'm completely convinced that Vista is not designed to run on single-core/processor machines.  I've run Vista on work machines without any hiccups, with Aero Glass going full on.  I thought I had a semi-decent home PC:

  • AMD Athlon XP 2800+
  • 2 GB RAM

Alas, it was not enough to net me more than about 2.9 on the Windows Experience Index.  UAC annoys the hell out of me, most file operations take forever, and I'm denied access to simple operations like creating a folder on my D: drive.  At work, I'll turn all of these safety features off, as I'm okay running with scissors in a development environment.  I have no idea how a home user deals with all of it; I sure couldn't.  Hopefully Vista's SP1 will fix these issues.

Tuesday, December 18, 2007

Extension methods and primitive obsession

In another water-cooler argument today, a couple of coworkers didn't like my extension method example.  One main problem is that it violates instance semantics, where you expect that a method call off an instance won't work if the instance is null.  However, extension methods break that convention, leading the developer to question every method call and wonder if it's an extension method or not.  For example, you can run into these types of scenarios:

string nullString = null;

bool isNull = nullString.IsNullOrEmpty();

In normal circumstances, the call to IsNullOrEmpty would throw a NullReferenceException.  Since we're using an extension method, we leave it up to the developer of the extension method to determine what to do with null references.

Since there's no way to describe to the user of the API whether or not the extension method handles nulls, or how it handles null references, this can lead to quite a bit of confusion to clients of that API, or later, those maintaining code using extension methods.
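
To make the mechanics concrete, here is a minimal sketch of how such an extension method might be written and why the call on a null reference succeeds.  StringExtensions and NullSemanticsDemo are names I made up for illustration; the only framework call used is string.IsNullOrEmpty.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical extension method: "value" is an ordinary parameter,
    // so it may legitimately be null when the method body runs.
    public static bool IsNullOrEmpty(this string value)
    {
        return string.IsNullOrEmpty(value);
    }
}

public static class NullSemanticsDemo
{
    public static bool ExtensionCallOnNull()
    {
        string nullString = null;

        // The compiler rewrites this as StringExtensions.IsNullOrEmpty(nullString),
        // a static call, so no NullReferenceException is thrown here.
        return nullString.IsNullOrEmpty();
    }

    public static bool InstanceCallOnNullThrows()
    {
        string nullString = null;

        try
        {
            // A true instance member dereferences the null reference.
            int length = nullString.Length;
            return length < 0; // never reached
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Both calls look identical at the call site, which is exactly the confusion my coworkers were complaining about.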

In addition to the problems with null references (which, as Elton pointed out, could be better handled with design-by-contract), some extension method examples online show more than a whiff of the "Primitive Obsession" code smell:

Dealing with primitive obsession

In both of the examples above (Scott cites David's example), an extension method is used to determine if a string is an email:

string email = txtEmailAddress.Text;

if (! email.IsValidEmailAddress())
{
    // oh noes!
}

It's something I've done a hundred times, taking raw text from user input and performing some validation to make sure it's the "right" kind of string I want.  But where do you stop with validation?  Do you assume all throughout the application that this string is the correct kind of string, or do you duplicate the validation?

An alternative approach is to accept that classes are your friend, and create a small class to represent your "special" primitive.  Convert back and forth at the boundaries between your system and customer-facing layers.  Here's the new Email class:

public class Email
{
    private readonly string _value;
    private static readonly Regex _regex = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");

    public Email(string value)
    {
        if (!_regex.IsMatch(value))
            throw new ArgumentException("Invalid email format.", "value");

        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

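    public static implicit operator string(Email email)
    {
        // Implicit casts should not throw, so a null Email converts to a null string.
        return email == null ? null : email.Value;
    }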

    public static explicit operator Email(string value)
    {
        return new Email(value);
    }

    public static Email Parse(string email)
    {
        if (email == null)
            throw new ArgumentNullException("email");

        Email result = null;

        if (!TryParse(email, out result))
            throw new FormatException("Invalid email format.");

        return result;
    }

    public static bool TryParse(string email, out Email result)
    {
        // Regex.IsMatch throws on null input, and TryParse should never throw.
        if (email == null || !_regex.IsMatch(email))
        {
            result = null;
            return false;
        }

        result = new Email(email);
        return true;
    }
}

I do a few things to make it easy on developers to use an email class that can play well with strings as well as other use cases:

  • Made Email immutable
  • Defined conversion operators to and from string
  • Added the Try-Parse pattern

The usage of the Email class closely resembles usage for other string-friendly types, such as DateTime:

string inputEmail = txtEmailAddress.Text;

Email email;

if (! Email.TryParse(inputEmail, out email))
{
    // oh noes!
}

txtEmailAddress.Text = email;

Now I can go back and forth from strings and my Email class, plus I provided a way to convert without throwing exceptions.  This looks very similar to code dealing with textual date representations.

Yes, but

The final Email class takes more code to write than the original extension method.  However, now that we have a single class that plays nice with primitives, additional Email behavior has a nice home.  With a class in place, I can now model more expressive emails, such as ones that include names like "Ricky Bobby <ricky.bobby@rb.com>".  Once the home is created, behavior can start moving in.  Otherwise, validation would be sprinkled throughout the system at each user boundary, such as importing data, GUIs, etc.

If you find yourself adding logic to primitives to the point of obsession, it's a strong indicator you're suffering from primitive obsession and a nice, small, specialized class can help eliminate a lot of the duplication primitive obsession tends to create.

Dead Google Calendar gadget

This morning I received an interesting yet disturbing message on the Google Calendar gadget on my iGoogle home page:

Great gadget that it was, I think I might be a little more discerning about what gadgets I put on the home page.  Word of warning: you probably don't want to google "donkey-punching", as it's definitely NSFW.  It looks like Google changed something, broke the gadget, and the gadget author decided to let everyone know through an... interesting means.

ALT.NET summary blog

If the ALT.NET mailing list is too much to keep up with, as it is the Mother of All Firehoses (MOAF), several folks have pointed out a nice summary blog:

Alt.Net Pursefight!

It keeps a nice daily ego check and blow-by-blow commentary of some of the more interesting comment wars going on there.  Pretty funny.

Friday, December 14, 2007

Ruby-style Array methods in C# 3.0

A while back I played with Ruby-style loops in C# 3.0.  This sparked my jealousy of other fun Ruby constructs that I couldn't find in C#, and a couple of them are the "each" and "each_with_index" methods for arrays.  Here's an example, from thinkvitamin.com:

my_vitamins = ['b-12', 'c', 'riboflavin']

my_vitamins.each do |vitamin|
  puts "#{vitamin} is tasty!"
end
=> b-12 is tasty!
=> c is tasty!
=> riboflavin is tasty!

With both Arrays and List<T> in .NET, this is already possible: 

string[] myVitamins = {"b-12", "c", "riboflavin"};

Array.ForEach(myVitamins,
    (vitamin) =>
    {
        Console.WriteLine("{0} is tasty", vitamin);
    }
);

var myOtherVitamins = new List<string>() { "b-12", "c", "riboflavin" };

myOtherVitamins.ForEach(
    (vitamin) =>
    {
        Console.WriteLine("{0} is very tasty", vitamin);
    }
);

There are a few problems with these implementations, however:

  • Inconsistent between types
  • IEnumerable<T> left out
  • Array has a static method, whereas List<T> is instance
  • Index is unknown

Since T[] implicitly implements IEnumerable<T>, we can create a simple extension method to handle any case.

Without index

I still like the "do" keyword in Ruby to signify the start of a block, and I'm not a fan of the readability (or "solubility", whatever) of the "ForEach" method.  Instead, I'll borrow from the loop-style syntax I created in the previous post that uses a "Do" method:

myVitamins.Each().Do(
    (vitamin) =>
    {
        Console.WriteLine("{0} is tasty", vitamin);
    }
);

To accomplish this, I'll need something to add the "Each" method, and something to provide the "Do" method.  Here's what I came up with:

public static class RubyArrayExtensions
{
    public class EachIterator<T>
    {
        private readonly IEnumerable<T> values;

        internal EachIterator(IEnumerable<T> values)
        {
            this.values = values;
        }

        public void Do(Action<T> action)
        {
            foreach (var item in values)
            {
                action(item);
            }
        }
    }

    public static EachIterator<T> Each<T>(this IEnumerable<T> values)
    {
        return new EachIterator<T>(values);
    }
}

The "Each" generic method is an extension method that extends anything that implements IEnumerable<T>, which includes arrays, List<T>, and many others.  IEnumerable<T> is ripe for extension, as .NET 3.5 introduced dozens of extension methods for it in the System.Linq.Enumerable class.  With these changes, I now have a consistent mechanism to perform an action against an array or list of items:

string[] myVitamins = { "b-12", "c", "riboflavin" };

myVitamins.Each().Do(
    (vitamin) =>
    {
        Console.WriteLine("{0} is tasty", vitamin);
    }
);

var myOtherVitamins = new List<string>() { "b-12", "c", "riboflavin" };

myOtherVitamins.Each().Do(
    (vitamin) =>
    {
        Console.WriteLine("{0} is very tasty", vitamin);
    }
);

With index

Ruby also has an "each_with_index" method for arrays, and in this case, there aren't any existing methods on System.Array or List<T> to accomplish this.  With extension methods, this is still trivial to accomplish.  I now just include the index whenever executing the callback to the Action<T, int> passed in.  Here's the extension method with the index:

public static class RubyArrayExtensions
{
    public class EachWithIndexIterator<T>
    {
        private readonly IEnumerable<T> values;

        internal EachWithIndexIterator(IEnumerable<T> values)
        {
            this.values = values;
        }

        public void Do(Action<T, int> action)
        {
            int i = 0;
            foreach (var item in values)
            {
                action(item, i++);
            }
        }
    }

    public static EachWithIndexIterator<T> EachWithIndex<T>(this IEnumerable<T> values)
    {
        return new EachWithIndexIterator<T>(values);
    }
}

The only difference here is I keep track of an index to send back to the delegate passed in from the client side, which now looks like this:

string[] myVitamins = { "b-12", "c", "riboflavin" };

myVitamins.EachWithIndex().Do(
    (vitamin, index) =>
    {
        Console.WriteLine("{0} cheers for {1}!", index, vitamin);
    }
);

var myOtherVitamins = new List<string>() { "b-12", "c", "riboflavin" };

myOtherVitamins.EachWithIndex().Do(
    (vitamin, index) =>
    {
        Console.WriteLine("{0} cheers for {1}!", index, vitamin);
    }
);

This now outputs:

0 cheers for b-12!
1 cheers for c!
2 cheers for riboflavin!
0 cheers for b-12!
1 cheers for c!
2 cheers for riboflavin!
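
For comparison, LINQ's Enumerable.Select has an overload whose delegate also receives the element's zero-based index, so a similar result can be had without a custom iterator when you want to project values rather than just perform an action.  This is only a rough sketch; IndexedSelectDemo and Cheers are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedSelectDemo
{
    // Builds one line per item, using the Select overload that passes
    // (item, index) to the delegate.
    public static List<string> Cheers(IEnumerable<string> vitamins)
    {
        return vitamins
            .Select((vitamin, index) => string.Format("{0} cheers for {1}!", index, vitamin))
            .ToList();
    }
}
```

Calling Cheers on the myVitamins array from above should produce the same three lines, just as strings in a list instead of console output.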

Pointless but fun

I don't think I'd ever introduce these into production code, as it's never fun to drop new ways to loop on others' laps.  If anything, it shows how even parentheses can hinder readability, even if the method names themselves read better.

In any case, I now have a simple, unified mechanism to perform an action against any type that implements IEnumerable<T>, which includes arrays and List<T>.

Wednesday, December 12, 2007

Decomposing a book club

Book clubs can be a great way to foster learning and encourage growth on a team.  They aren't always the best avenue for training, though, and other options include:

  • Formal training
  • Industry events
  • Presentations
  • Brown bag lunches
  • etc.

I always enjoyed book clubs because it gave our team a chance to discuss technical topics on a regular basis, sometimes outside of domains we were working on.

Planning a book club

If you're starting a book club for the first time, you might have to just pick a book that everyone might be interested in without specifically asking anyone.  Having the desire to start a book club probably means you already know what deficiencies exist in your team, so you're better equipped to pick a book.

In that case, pick a deficiency and find a great book to study.  It helps if you've read it beforehand so you know what you're getting yourself into.  Book clubs need guidance and leadership during meetings to make sure learning and growth take place, and it's hard to do if the material is new to everyone. 

Software books usually come in two flavors:

  • Design
  • Technology

Examples of design books might be GoF, anything in the Fowler signature series, Pragmatic Programmer, and other books that are design/principle specific but language/technology agnostic.  Technology books are filled with reference material on specific subjects like ASP.NET, but don't offer as much guidance.

Some books I've done book clubs on are:

  • Programming ASP.NET 2.0: Core Reference, Dino Esposito
  • The Pragmatic Programmer, Andrew Hunt and David Thomas
  • Agile Software Development: Principles, Practices and Patterns, Robert C. Martin
  • Framework Design Guidelines, Cwalina and Abrams
  • Working Effectively with Legacy Code, Michael Feathers
  • Refactoring to Patterns, Joshua Kerievsky

As you can see, I'm heavily weighted to "Design" type books, mostly because values, principles, and practices translate well to any new technology.

Book club agenda

After we select a book, we decide on an agenda and schedule.  For internal book clubs, we try to meet once a week at lunch, covering about 30-40 pages a session.  We try to get each book club to end after 10-12 weeks, or about 3 months.  Everyone reads all the material and discusses it at each meeting.  Sometimes we skip chapters if the material isn't relevant or particularly insightful.

Here in the Austin area, we're forming an Austin-wide DDD book club to go over Evans' Domain-Driven Design book.  The first thing I do in determining schedule is to break down the parts and chapters into the number of pages they take up:

I laid out each of the chapters and noted what page they started on, then calculated the part and chapter lengths in separate columns.  Since a single discussion can never cover more than 40 pages in an hour, I used conditional formatting to add bars signifying lengths of the chapter for easier planning.
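
I did all of this in a spreadsheet, but the same arithmetic is easy to sketch in code.  The snippet below is only an illustration: BookClubPlanner and its methods are made-up names, and the greedy grouping of consecutive chapters is one possible planning rule, not the one I actually followed.

```csharp
using System;
using System.Collections.Generic;

public static class BookClubPlanner
{
    // Chapter length = next chapter's start page minus this chapter's start page.
    // The final entry in startPages is the page just past the last chapter.
    public static int[] ChapterLengths(int[] startPages)
    {
        if (startPages == null || startPages.Length < 2)
            throw new ArgumentException("Need at least two start pages.", "startPages");

        int[] lengths = new int[startPages.Length - 1];
        for (int i = 0; i < lengths.Length; i++)
        {
            lengths[i] = startPages[i + 1] - startPages[i];
        }
        return lengths;
    }

    // Greedily packs consecutive chapters into sessions of at most maxPages.
    // A chapter longer than maxPages still gets a session of its own and
    // would have to be split by hand.
    public static List<List<int>> PlanSessions(int[] lengths, int maxPages)
    {
        var sessions = new List<List<int>>();
        var current = new List<int>();
        int pagesInCurrent = 0;

        for (int chapter = 0; chapter < lengths.Length; chapter++)
        {
            if (current.Count > 0 && pagesInCurrent + lengths[chapter] > maxPages)
            {
                sessions.Add(current);
                current = new List<int>();
                pagesInCurrent = 0;
            }

            current.Add(chapter + 1); // chapters are numbered from 1
            pagesInCurrent += lengths[chapter];
        }

        if (current.Count > 0)
        {
            sessions.Add(current);
        }
        return sessions;
    }
}
```

With invented start pages of 1, 15, 35, 60 and an end page of 110, the chapter lengths come out to 14, 20, 25 and 50 pages.  A 40-page cap then groups chapters 1 and 2 into one session and gives chapters 3 and 4 a session each, with chapter 4 flagged as too long for a single hour.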

Chapter length distribution seemed to be all over the place, so I created a distribution chart to see it a little more clearly:

This chart is interesting, as it shows a fairly random distribution of chapter lengths.  Not Gaussian, not evenly distributed, but a whole bunch of short chapters, a few medium chapters, a few more big chapters, and one off-the-chart chapter.  Ideally, I'd have one 40-page chapter per session, but it doesn't always work out that way.

Suggested agendas

Sometimes, especially with the Fowler Signature Series books, the introductory chapters call out both the important chapters and a suggested reading sequence.  This can guide the book club agenda.

In the case of the Evans book, he calls out the "most important" chapters, and says that all others can be read out of order, but are meant to be read in their entirety.  We might flag some chapters as "nice to have" and put them aside, to go back to later if there's time.

The meeting

Book club meetings are meant to be fun, open, inviting, and intellectually appealing.  If someone isn't engaged, they're probably on their laptop or Crack-berry, so keep an eye out for wandering minds.  Asking questions tends to get everyone involved, rather than a chorus of "yeah that makes sense, I agree, echo echo echo".

If you're leading the book club, be prepared to read ahead and have a list of talking points before you go in.  If anything, the book club leader is charged with creating discussion, but not leading discussion.  Talking points tend to enforce focus, so discussions don't wind down tangents or social topics for too long.  "OMG did you see last night's episode of 'Heroes'?" should be kept to a minimum.

Finally, be aware of your audience.  If you're covering a topic new to a lot of folks, you might have to do some additional prodding to make sure everyone feels like they contributed.  Nobody likes to listen to someone else's conversation for an hour (well, almost nobody).

Encouraging self-improvement

Probably the biggest benefit of book clubs and other organic training like brown bags is that they create a culture of self-improvement.  Having the team engaged in book clubs, brown bags, user groups, etc. can set the bar higher when it comes to quality and pride in individual workmanship.  Plus, sometimes companies pay for the lunch, so that's always good, right?

Thursday, December 6, 2007

Don't hide the ugly

I wanted to take some time to highlight the difference between encapsulation and subterfuge.  Just so we're on the same page:

  • Encapsulation: The ability to provide users with a well-defined interface to a set of functions in a way which hides their internal workings.
  • Subterfuge: An artifice or expedient used to evade a rule, escape a consequence, hide something, etc.

When related to code, both of these techniques hide internal details of the system.  The key difference is who we're hiding the details from.  With encapsulation, we're hiding details from end-users or consumers of our API, which is a good thing.  With subterfuge, we're hiding ugly from developers needing to change the API, which can be disastrous.

Subterfuge hides the ugly, and for ugly to get fixed, I want it front and center.  Subterfuge comes in many varieties, but all achieve the same end result of hiding the ugly instead of fixing the ugly.

Region directives

Region directives in C# and VB.NET let you declare a region of code that can be collapsed or hidden.  I've used them in the past to partition a class based on visibility or structure, so I'll have a ".ctor" region or maybe a "Public Members" region.  Other regions can be collapsed so I don't need to look at them.

For example, our DataGateway class seems nice and concise, and it looks like it has only about 10-20 lines of code:

One small problem: note the region "Legacy db access code".  By applying a region to some nasty code, I've hidden the ugliness away from the developer who otherwise might think it was a problem.  The developer doesn't know about the problem, as I've collapsed over 5000 lines of code into one small block.
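
In source form, the collapsed block is nothing more than a pair of directives wrapped around the code.  Here's a stripped-down sketch of the idea; the method names are invented, and the one-line private method stands in for the thousands of lines the real region held.

```csharp
public class DataGateway
{
    public int[] GetCustomerIDs()
    {
        return LegacyLoadCustomerIDs();
    }

    #region Legacy db access code

    // Hypothetical stand-in: in the editor, everything between #region and
    // #endregion collapses to a single line showing only the region's name.
    private int[] LegacyLoadCustomerIDs()
    {
        return new int[] { };
    }

    #endregion
}
```

The compiler ignores the directives entirely, so the only thing a region changes is what the next developer sees when the file opens.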

If a class is big and nasty, then just let it all hang out.  If it's really bothersome, extract the bad code into a different class.  At least you'll have separation between ugly and nice.  Hiding ugly code in region directives doesn't encourage anyone to fix it, and it tends to deceive those glancing at a class about how crazy it might be underneath.

One type per file, please

Nothing irks me more than looking at a solution in the solution explorer, seeing a relatively small number of files, and then realizing that each file has a zillion types.  A file called "DataGateway.cs" hides all sorts of nuttiness:

public enum DataGatewayType
{

}

public abstract class DataGateway
{
    public abstract int[] GetCustomerIDs();
}

public class SqlDataGateway : DataGateway
{
    public override int[] GetCustomerIDs()
    {
        // sproc's for all!!!

        return new int[] { };
    }
}

public class OracleDataGateway : DataGateway
{
    public override int[] GetCustomerIDs()
    {
        // now we're getting enterprise-y 

        return new int[] { };
    }
}

public class MySqlGateway : DataGateway
{
    public override int[] GetCustomerIDs()
    {
        // we don't support hippies 

        return null;
    }
}

Java, as I understand it, has a strict convention of one type per file.  I really like that convention (if I'm correct), as it forces the developer to match their file structure to their package (assembly) structure.  No such luck in .NET, although there may be some FxCop or ReSharper rules to generate warnings.

The problem with combining multiple types to one file is that it makes it very difficult for the developer to gain any understanding of the code they're working with.  Incorrect assumptions start to arise when I see a file name that doesn't match the internal structure.  Some of the rules I use are:

  • One type per file (enum, delegate, class, interface, struct)
  • Project name matches assembly name (or root namespace name)
  • File name matches type name
  • Folder structure matches namespaces
    • i.e. <root>\Security == RootNamespace.Security

With these straightforward rules, a developer can look at the folder structure or the solution explorer and know exactly what types are in the project, how they are organized, and maybe even understand responsibilities and architecture.  Even slight deviations from the above rules can cause unnecessary confusion for developers.  You're not doing anyone any favors by cramming 10 classes into one file, so don't do it.

Letting it all hang out

When I was young, my mom would insist on me cleaning my room once a week (the horror!).  Being the resourceful chap that I was, I found I could get the room resembling clean by cramming everything just out of sight, into the closet, under the bed, even in the bed.  This strategy worked great until I actually needed something I stashed away, and couldn't find it.  I had created the illusion of organization, but the monster was lurking just out of sight.

So instead of hiding the problem, I forced myself to deal with the mess until I cared enough to clean it up.  Eventually this led me to fix problems early and often, so they don't snowball into a disastrous mess.  This wasn't out of self-satisfaction or anything like that, but laziness, as I found it was more work to deal with hidden messes than it was to clean it up in the first place.

If my code is a mess, I'll just let it be ugly.  Ugly gets annoying and ugly gets fixed, but ugly swept under the rug gets overlooked, until it pounces on the next unsuspecting developer.

Monday, December 3, 2007

Dealing with primitive obsession

One code smell I tend to miss a lot is primitive obsession.  Primitives are the building blocks of data in any programming language, such as strings, numbers, booleans, and so on.

Many times, primitives have special meaning, such as phone numbers, zip codes, money, etc.  Nearly every time I encounter these values, they're exposed as simple primitives:

public class Address
{
    public string ZipCode { get; set; }
}

But there are special rules for zip codes; in the US, for example, they can only be in a couple of formats: "12345" or "12345-3467".  This logic is typically captured somewhere away from the "ZipCode" value, and typically duplicated throughout the application.  For some reason, I was averse to creating small objects to hold these values and their simple logic.  I don't really know why, as data objects tend to be highly cohesive and can cut down a lot of duplication.
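
The two US formats can be captured in a single pattern.  This is a minimal sketch, assuming only US five-digit and ZIP+4 codes; ZipCodeFormat and IsValid are names I made up, and the pattern is my own rather than anything taken from Fowler.

```csharp
using System.Text.RegularExpressions;

public static class ZipCodeFormat
{
    // Five digits, optionally followed by a hyphen and four more digits.
    // [0-9] keeps the match to ASCII digits, and \z anchors at the very end
    // of the input (unlike $, which also matches before a trailing newline).
    private static readonly Regex _usZip = new Regex(@"^[0-9]{5}(-[0-9]{4})?\z");

    public static bool IsValid(string value)
    {
        return value != null && _usZip.IsMatch(value);
    }
}
```

This is exactly the kind of rule that ends up copied into every form and import routine when the zip code stays a bare string.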

Beyond what Fowler walks through, I need to add a couple more features to my data object to make it really useful.

Creating the data object

First I'll need to create the data object by following the steps in Fowler's book.  I'll make the ZipCode class a DDD Value Object, and this is what I end up with:

public class Address
{
    public ZipCode ZipCode { get; set; }
}

public class ZipCode
{
    private readonly string _value;

    public ZipCode(string value)
    {
        // perform regex matching to verify XXXXX or XXXXX-XXXX format
        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }
}

This is pretty much where Fowler's walkthrough stops.  But there are some issues with this implementation:

  • Now more difficult to deal with Zip in its native format, strings
  • Zip codes used to be easier to display

Both of these issues can be easy to fix with the .NET Framework's casting operators and available overrides.

Cleaning it up

First, I'll override the ToString() method and just output the internal value:

public override string ToString()
{
    return _value;
}

Lots of classes, tools, and frameworks use the ToString method to display the value of an object, and now it will use the internal value of the zip code instead of just outputting the name of the type (which is the default).

Next, I can create some casting operators to go to and from System.String.  Since zip codes are still dealt with mostly as strings in this system, I stuck with string instead of int or some other primitive.  Also, many other countries have different zip code formats, so I stayed with strings.  Here are the cast operators, both implicit and explicit:

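public static implicit operator string(ZipCode zipCode)
{
    // Implicit casts should not throw, so a null ZipCode converts to a null string.
    return zipCode == null ? null : zipCode.Value;
}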

public static explicit operator ZipCode(string value)
{
    return new ZipCode(value);
}

I prefer explicit operators when converting from primitives, and implicit operators when converting to primitives.  FDG guidelines for conversion operators are:

DO NOT provide a conversion operator if such conversion is not clearly expected by the end users.

DO NOT define conversion operators outside of a type's domain.

DO NOT provide an implicit conversion if the conversion is potentially lossy.

DO NOT throw exceptions from implicit casts.

DO throw System.InvalidCastException if a call to a cast operator results in lossy conversion and the contract of the operator does not allow lossy conversions.

I meet all of these guidelines, so I think this implementation will work.

End result

Usability with the changed ZipCode class is much improved now:

Address address = new Address();

address.ZipCode = new ZipCode("12345"); // constructor
address.ZipCode = (ZipCode) "12345"; // explicit operator

string zip = address.ZipCode; // implicit operator

Console.WriteLine("ZipCode: {0}", address.ZipCode); // ToString method

Basically, my ZipCode class now "plays nice" with strings and code that expects strings.

With any hurdles out of the way for using simple data objects, I can eliminate a lot of duplication and scattered logic by creating small, specialized classes for all of the special primitives in my app.
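
One thing this walkthrough doesn't cover is equality.  Value objects are usually compared by their contents rather than by reference, so two ZipCode instances holding "12345" should be equal.  Here's a rough sketch of how that might look on a trimmed-down copy of the class; this is my assumption about a sensible next step, not something taken from Fowler's steps.

```csharp
using System;

public class ZipCode : IEquatable<ZipCode>
{
    private readonly string _value;

    public ZipCode(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");

        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    // Two zip codes are equal when their underlying strings are equal.
    public bool Equals(ZipCode other)
    {
        return !ReferenceEquals(other, null) && _value == other._value;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ZipCode);
    }

    // Equal objects must produce equal hash codes, so hash the same field
    // that Equals compares.
    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }
}
```

With this in place, zip codes behave sensibly as dictionary keys and in collection lookups such as List<T>.Contains.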

Time is running out

I popped open Windows Live Writer today and got a fun message:

I thought this product was free, and I never paid for anything, so I'm a little confused about how a free product can expire.  Live Writer isn't supported on Server 2003, which is what I use, so I had to jump through 80 or so hoops to get it installed.  Everything works perfectly fine, but now it seems I will be compelled to jump through the same hoops to upgrade to a version I don't need.  Fun times.