GrabBag<T>: May 2007

Wednesday, May 30, 2007

Preventing new Remote Desktop sessions in Server 2003

I have both a laptop and a desktop, and I fairly often remote into my desktop to do development. Although my laptop is no slouch, you really can't beat a desktop dev experience. However, my dev machine runs Server 2003, which allows multiple remote desktop sessions at once. Windows XP allows only one user session at a time, whether it's a console session (I'm physically at the machine) or a remote session. That was nice, because I could have a bunch of applications open, lock the machine, then remote into it and find all of my applications still up and running.

When I remote into my Server 2003 dev box, Remote Desktop starts a new session by default. Poof, all of my applications are gone (well, they're in another session at least). I'd like to mimic the behavior of Windows XP and continue the console session I already had going. Luckily, this is pretty easy to accomplish, and I actually have two options:

Use the /console command-line switch with mstsc.exe (not very user-friendly)
Edit a saved remote desktop connection file (.RDP file)

I like the second option, since I often save connections to known machines. I can never remember any of the machine names anyway. Just edit a saved remote desktop connection (RDP file) in Notepad or another text editor and add the following line at the end of the file:

connect to console:i:1

Save the file, and close Notepad. When you run the RDP file, you will connect to the console session, and you'll have all of the programs you had when you were logged in to the console session.
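If you keep several saved connections, a few lines of C# can make this edit for you. This is a minimal sketch, assuming the .RDP file is plain text with one "name:type:value" setting per line, as the line above suggests, and assuming saved files are UTF-16 encoded. RdpFileEditor and EnableConsoleSession are hypothetical names, not part of any Microsoft API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RdpFileEditor
{
    // The setting that makes mstsc.exe attach to the console session.
    private const string ConsoleSetting = "connect to console:i:1";
    private const string ConsolePrefix = "connect to console:";

    // Ensures a saved connection file asks for the console session.
    // Any existing "connect to console" line is dropped first so the file
    // never contains two conflicting values.
    public static void EnableConsoleSession(string rdpPath)
    {
        var lines = new List<string>();
        foreach (string line in File.ReadAllLines(rdpPath))
        {
            if (!line.StartsWith(ConsolePrefix, StringComparison.OrdinalIgnoreCase))
            {
                lines.Add(line);
            }
        }

        // Same as adding the line at the end of the file by hand.
        lines.Add(ConsoleSetting);

        // Assumption: saved .RDP files are UTF-16 ("Unicode"), so write them back that way.
        File.WriteAllLines(rdpPath, lines.ToArray(), Encoding.Unicode);
    }
}
```

Usage would be something like RdpFileEditor.EnableConsoleSession(@"C:\Connections\devbox.rdp"), where the path is only an example.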

Tuesday, May 29, 2007

Refactoring NAnt and MSBuild build scripts

A while back, I talked about the harmful effects of "Copy Paste". While editing some NAnt and MSBuild build scripts, I forgot about the evil twin of "Copy Paste", which is "Find and Replace" (I guess both twins are evil). I needed to update an MSBuild script to have the correct version numbers of an application we're starting on. Here's what the abridged MSBuild script looked like before any modifications:

<Project>
  <PropertyGroup>
    <LocalPath1>E:\builds\V2.1\US\ecomm\CoreBusinessObjectsDistribution</LocalPath1>
    <LocalPath2>E:\builds\V2.1\US\ecomm\OrderWorkflowDistribution</LocalPath2>
    <LocalPath3>E:\builds\V2.1\US\ecomm\Store.BusinessObjects.Ecommerce</LocalPath3>
    <LocalPath4>E:\builds\V2.1\US\ecomm\Store.UI</LocalPath4>
    <LocalPath5>E:\builds\V2.1\US\ecomm\Store.Utilities.Ecommerce</LocalPath5>
    <LocalPath6>E:\builds\V2.1\US\ecomm\MyCompany.Store.UI.Ecommerce</LocalPath6>
    <LocalPath7>E:\builds\V2.1\US\ecomm\MyCompany.Store.UI</LocalPath7>
    <LocalPath8>E:\builds\V2.1\US\ecomm\store</LocalPath8>
    <LocalPath9>E:\builds\V2.1\US\ecomm</LocalPath9>
    <LocalPath10>E:\builds\V2.1\US\ecomm\Deploy</LocalPath10>
  </PropertyGroup>
</Project>

This file was targeting our "V2.1" release, but I needed to update it to "V2.1.5", so all of the directory names had to be changed. I started to whip out the ever-faithful "Ctrl-H" to perform a "Find and Replace", but I stopped myself. This was a great opportunity for a refactoring.

Eliminating duplication

One of the major code smells is duplicated code. But duplication doesn't occur only in code, as the previous MSBuild script shows. I needed to change every reference to "V2.1" into "V2.1.5", and there were two dozen of them that I would need to change through "Find and Replace".

The problem with "Find and Replace" is that it can be error-prone. The search might be case-sensitive, I might pick "Match whole word", and so on. With so many options, I would need to try several combinations to make sure I found every instance I wanted to replace. Instead of wallowing through the "Find and Replace" mud, why not just eliminate the duplication so I only need to make one change? Let's take a look at our catalog of refactorings to see if one fits MSBuild.

Refactoring the script

Martin Fowler's refactoring book and website catalog the common code smells, so I can look up a specific smell and find an appropriate refactoring. Some websites also list out "smells to refactorings" mappings. The one that looks most promising here is Extract Method. MSBuild scripts don't exactly have methods, but they do have the concepts of properties and tasks.

I can introduce a property that encapsulates what all of the "LocalPathXxx" properties have in common, namely the root directory. I'll give the extracted property a good name, and then make every property and task that uses the root directory refer to my new property instead of hard-coding the path. Here's the final script:

<Project>
  <PropertyGroup>
    <LocalPathRoot>E:\builds\V2.1.5\US\ecomm</LocalPathRoot>
    <LocalPath1>$(LocalPathRoot)\CoreBusinessObjectsDistribution</LocalPath1>
    <LocalPath2>$(LocalPathRoot)\OrderWorkflowDistribution</LocalPath2>
    <LocalPath3>$(LocalPathRoot)\Store.BusinessObjects.Ecommerce</LocalPath3>
    <LocalPath4>$(LocalPathRoot)\Store.UI</LocalPath4>
    <LocalPath5>$(LocalPathRoot)\Store.Utilities.Ecommerce</LocalPath5>
    <LocalPath6>$(LocalPathRoot)\MyCompany.Store.UI.Ecommerce</LocalPath6>
    <LocalPath7>$(LocalPathRoot)\MyCompany.Store.UI</LocalPath7>
    <LocalPath8>$(LocalPathRoot)\store</LocalPath8>
    <LocalPath9>$(LocalPathRoot)</LocalPath9>
    <LocalPath10>$(LocalPathRoot)\Deploy</LocalPath10>
  </PropertyGroup>
</Project>

Now in future versions (V2.2 maybe?) we'll only need to make one change, instead of two dozen. Any time I eliminate duplication, I greatly reduce the chances for error.
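The same move looks familiar in ordinary C#. As a rough analogy only (BuildPaths and its member names are hypothetical, mirroring a few of the script's properties), pulling the shared root into one constant means a version bump touches a single line:

```csharp
// Hypothetical C# counterpart of the refactored PropertyGroup:
// every path is composed from one root, just as $(LocalPathRoot) is reused above.
public static class BuildPaths
{
    // The only line that changes when the version changes.
    public const string LocalPathRoot = @"E:\builds\V2.1.5\US\ecomm";

    public const string CoreBusinessObjects = LocalPathRoot + @"\CoreBusinessObjectsDistribution";
    public const string OrderWorkflow = LocalPathRoot + @"\OrderWorkflowDistribution";
    public const string StoreUI = LocalPathRoot + @"\Store.UI";
    public const string Deploy = LocalPathRoot + @"\Deploy";
}
```

Backslashes are concatenated explicitly rather than through Path.Combine so the result matches the Windows paths in the script on any platform.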

So where are we?

The code smells laid out in Martin Fowler's book don't apply only to code. As we've seen with this MSBuild script, they can apply to all sorts of other domains where duplication causes problems. All we have to do is map the refactorings listed for a particular smell onto the new domain. Of course, if you don't know about code smells and how to recognize them, the duplication will probably live on and wreak havoc on your productivity.

My next step is to replace these horrible "LocalPathXxx" property names with intention-revealing names. Originally, this script had comments around each property explaining what it meant. There's nothing like using intention-revealing names to eliminate the need for comments.

Thursday, May 24, 2007

TFS Guide now available

One thing I always thought was missing from MSDN regarding TFS was any kind of guidance or best practices. Beta 1 of a TFS best practices guide was just released, and from a quick look at the contents (over 300 pages), it has a ton of great information. The section on Team Build is worth its weight in gold, as it not only covers good build practices but relates them to Team Build.

TFS Guide homepage

Latest Release

This document was put out by the Microsoft Patterns and Practices team, and although it's still a beta, it's easy to go straight to the guidance you need. It even has a whole chapter dedicated to Continuous Integration, though with the caveat that "Team Foundation Server 2005 does not provide a CI solution out of box". TFS does provide the framework to support CI, which is where tools like TeamCI, TFS Integrator, and Automaton come in. Also nice is a chapter on Large Project Considerations, and many other chapters have their own sections on large projects.

The guide is divided into separate parts:

Fundamentals
Source Control
Builds
Large Project Considerations
Project Management
Process Templates
Reporting
Setting Up and Maintaining

It finishes with sections on:

Guidelines
Practices
Questions and Answers
How Tos

All in all, a pretty nice reference, and something I really wish I had a year ago.

Team Foundation Build, Part 3: Creating a Build

In parts 1 and 2 of this series, I gave an overview of Team Foundation Build and discussed installation and configuration options. One thing I should note is that if a team needs to add custom tasks to the build that live in separate assemblies, those assemblies need to be copied to the build machine. That implies the dev team probably needs administrator access to the build machine.

In VSTS, build definitions are called Build Types, and they are created through Team Explorer with a wizard that walks you through the steps of defining the build. So what does Team Build provide out of the box? In other words, which build steps do we get for free? Team Build will:

Synchronize with source control
Compile the application
Run unit tests
Perform code analysis
Release builds on a file server
Publish build reports

So out of the box, we don't have to worry about configuring source control, compiling, and other common tasks that we would otherwise need to define ourselves. When I launch the New Team Build Type Creation Wizard from Team Explorer, the wizard walks me through the following steps:

Create a new build type
Select the solutions to build
Select a configuration and platforms for build
Select a build machine and drop location
Select build options

I'll walk through each of these steps one by one.

Step 1: Create a new build type

In the first screen, you need to specify the name of your Build Type. Unfortunately, all Build Types are grouped in one folder in source control, so we have to use names instead of folders to distinguish different builds. Naming conventions can help that situation, so something like <Application>_<Version>_<Region>_<BuildType> would work. In the past, I've defined several builds for the same application, like "Deploy", "Nightly", "CI", etc. Build Type names can be a pain to change, so choose your Build Type names carefully.

Step 2: Select the solutions to build

Build Type definitions allow you to select one or more Visual Studio solutions to build. In most cases you would have only one solution, but if there is more than one, you can select several and specify the order in which they are compiled. If SolutionA depends on SolutionB, just have SolutionB build before SolutionA.
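With more than a couple of solutions, working out that order by hand gets error-prone. Below is a minimal sketch of deriving a valid order with a depth-first topological sort, assuming you can list each solution's dependencies yourself. The wizard does not read anything like this; SolutionOrder.Resolve and the dictionary shape are hypothetical, and you would still enter the resulting order in the wizard.

```csharp
using System;
using System.Collections.Generic;

public static class SolutionOrder
{
    // Returns the solutions in an order where every dependency builds first.
    // dependsOn maps a solution name to the names of the solutions it depends on.
    public static List<string> Resolve(IDictionary<string, string[]> dependsOn)
    {
        var order = new List<string>();
        var done = new HashSet<string>();
        var inProgress = new HashSet<string>();

        foreach (string solution in dependsOn.Keys)
        {
            Visit(solution, dependsOn, order, done, inProgress);
        }
        return order;
    }

    private static void Visit(string solution, IDictionary<string, string[]> dependsOn,
        List<string> order, HashSet<string> done, HashSet<string> inProgress)
    {
        if (done.Contains(solution)) return;

        // Seeing a solution again while still visiting it means a cycle.
        if (!inProgress.Add(solution))
        {
            throw new InvalidOperationException("Circular dependency at " + solution);
        }

        string[] deps;
        if (dependsOn.TryGetValue(solution, out deps))
        {
            foreach (string dep in deps)
            {
                Visit(dep, dependsOn, order, done, inProgress);
            }
        }

        inProgress.Remove(solution);
        done.Add(solution);
        order.Add(solution); // all of its dependencies are already in the list
    }
}
```

For the example in the text, mapping "SolutionA" to { "SolutionB" } and "SolutionB" to no dependencies yields SolutionB followed by SolutionA.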

Step 3: Select a configuration and platforms for build

In this screen, you can specify the project configuration you would like to build with. Typically this is "DEBUG", "RELEASE", or any custom project configuration you might have; for example, I might have a separate configuration like "AUTOMATEDDEBUG" that adds code analysis. I usually leave the platform set to "Any CPU", but if you have specific platform requirements, this is where you would specify them.

Step 4: Select a build machine and drop location

When specifying the build machine, Team Build needs two pieces of information:

What is the name of the build machine?
What directory on the build machine should I build in?

The build machine is the machine that has the Team Build Service installed on it. The directory can be anything, but keep in mind that you don't necessarily want all builds running in the same directory. Team Build is good about separating builds in the file system, but I've had hard drives fill up when I had too many builds going on the same machine.

The other piece of information in this step is the drop location. When Team Build finishes compiling and testing, it copies the files to the UNC share you specify here. Don't worry: if you need additional files dropped, you can customize this later in the Build Type definition.

Step 5: Select build options

This step is entirely optional (but strongly recommended). You can specify that you would like this build to run tests and perform code analysis. If you select "Run test", you will need to specify the test metadata file (*.vsmdi) and the test list to run. On my last project, we had over 1,300 unit tests when I left, which was absolutely impossible to manage in a test list. Instead, we used a custom task to specify the tests to run, which used reflection to load the tests dynamically.
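The custom task itself isn't shown in this post, but the reflection part might look roughly like the sketch below: scan an assembly and collect every public method marked with a test attribute. To keep it self-contained, TestMethodAttribute here is a hypothetical stand-in declared in the same file; against real MSTest assemblies you would match on Microsoft.VisualStudio.TestTools.UnitTesting.TestMethodAttribute instead. TestDiscovery and SampleFixture are also hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for MSTest's [TestMethod] so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Method)]
public sealed class TestMethodAttribute : Attribute { }

public static class TestDiscovery
{
    // Returns "TypeName.MethodName" for every public instance method carrying [TestMethod].
    public static List<string> FindTests(Assembly assembly)
    {
        var tests = new List<string>();
        foreach (Type type in assembly.GetTypes())
        {
            foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Instance))
            {
                if (method.IsDefined(typeof(TestMethodAttribute), false))
                {
                    tests.Add(type.FullName + "." + method.Name);
                }
            }
        }
        tests.Sort(StringComparer.Ordinal); // stable order for logs and reports
        return tests;
    }
}

// A tiny sample fixture so there is something to discover.
public class SampleFixture
{
    [TestMethod] public void First() { }
    [TestMethod] public void Second() { }
    public void Helper() { }
}
```

The payoff is that new tests are picked up automatically as soon as they carry the attribute, with no test list to keep in sync.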

The other option available is code analysis, which is important for enforcing coding guidelines and standards. Without code analysis turned on, you'll probably end up with a different coding standard for every developer who touches the code.

Step 6: Finish

When you complete the wizard, two new files are created in source control:

TfsBuild.proj - this is the Build Type definition, where you'd put any customization
WorkspaceMapping.xml - definition of the source control workspace, where you can change the build directory

You can find these files in Source Control Explorer under $/[Team Project]/TeamBuildTypes/[Build type name]. Manually going through source control is a bit of a pain when I want to edit the TfsBuild.proj file, so I use Attrice's Team Foundation Sidekick add-in, which lets me right-click to check out and check in directly from Team Explorer.

So that's it! To start a new build, just right-click the Build Type in Team Explorer and select "Build". Double-clicking the Build Type brings up a list of all of the builds with their statuses. This is also where you can view the details of an individual build.

In the next posts, I'll detail some values, principles, and practices when it comes to automated builds, as well as some discussion on customizing and extending a Build Type definition.

Wednesday, May 23, 2007

Unit testing with stubs and Rhino Mocks

I've been using Rhino Mocks for about a year now, and Oren never fails to impress me with the features he keeps adding on a regular basis. I needed to test a particular method that accepted an IProfile interface as an argument. I didn't want to use an existing IProfile implementation; I was really interested in just sending the method a stub. If I used Rhino Mocks to create a mock, I'd have to set a bunch of expectations to get everything set up, when all I really want is a stub. Setting up a stub by hand is a huge pain, since it means writing your own class that implements IProfile with a basic implementation, and so on. For more information about mocks, stubs, dummy objects, and fake objects, check out Fowler's paper on the subject. Here's the test I created:

[TestMethod]
public void SetPaymentType_WithValidPayment_AddsPaymentFieldToPaymentFields()
{
    MockRepository repo = new MockRepository();
    IProfile profile = repo.Stub<IProfile>();
    IPayment payment = repo.Stub<IPayment>();

    using (repo.Record())
    {
        profile.Payments = new IPayment[] {payment};
        payment.PaymentCode = "CC";
    }

    using (repo.Playback())
    {
        bool result = ProfileHelper.SetPaymentType("TestValue", profile);

        Assert.AreEqual(true, result);
        Assert.AreEqual(1, payment.PaymentFields.Length);

        IField paymentField = payment.PaymentFields[0];

        Assert.AreEqual("PaymentType", paymentField.FieldKey);
        Assert.AreEqual("TestValue", paymentField.FieldValue);
    }
}

The MockRepository object is from Rhino Mocks. I call the Stub method to generate stub objects for the interfaces I'm interested in, specifically the IProfile and IPayment types. I put the MockRepository in Record mode to set the initial values for my stubs. Note that Rhino Mocks creates the implementations of these interfaces; nowhere in my code do I write an implementation of IProfile or IPayment. I then switch the MockRepository to Playback mode and call the method I want to test (ProfileHelper.SetPaymentType). Notice that the SetPaymentType method modifies the PaymentFields property on the IPayment object reachable through the IProfile, and does it correctly. I finish the test by making assertions about the values that should be set on that IPayment object.

What's clear from looking at this test is that I'm only concerned about testing the interaction between the ProfileHelper.SetPaymentType method and the IProfile object, but I don't care about the specific implementation of the IProfile object. If I passed in a specific implementation of an IProfile object, there may be some unwanted side effects that might cause some false positives or false negatives. Using stubs makes sure I limit the scope of what's being tested only to the method I'm calling.
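For contrast, here is roughly the hand-rolled alternative that Stub<T>() saves you from writing. The post doesn't show the real IProfile, IPayment, IField, or ProfileHelper, so everything below is a hypothetical reconstruction inferred only from the members the test touches; in particular, the body of SetPaymentType is a guess included just so the sketch runs.

```csharp
// Hypothetical interfaces, inferred from the members used in the test.
public interface IField { string FieldKey { get; set; } string FieldValue { get; set; } }
public interface IPayment { string PaymentCode { get; set; } IField[] PaymentFields { get; set; } }
public interface IProfile { IPayment[] Payments { get; set; } }

// Hypothetical production field type.
public class PaymentField : IField
{
    public string FieldKey { get; set; }
    public string FieldValue { get; set; }
}

// Hand-written stubs: plain property bags with no behavior of their own.
public class StubPayment : IPayment
{
    public string PaymentCode { get; set; }
    public IField[] PaymentFields { get; set; }
}

public class StubProfile : IProfile
{
    public IPayment[] Payments { get; set; }
}

public static class ProfileHelper
{
    // Guessed behavior: append a "PaymentType" field to the first payment.
    public static bool SetPaymentType(string paymentType, IProfile profile)
    {
        if (profile.Payments == null || profile.Payments.Length == 0) return false;

        IPayment payment = profile.Payments[0];
        IField[] existing = payment.PaymentFields ?? new IField[0];
        IField[] fields = new IField[existing.Length + 1];
        existing.CopyTo(fields, 0);
        fields[existing.Length] = new PaymentField { FieldKey = "PaymentType", FieldValue = paymentType };
        payment.PaymentFields = fields;
        return true;
    }
}
```

Every member added to IProfile or IPayment means another edit to these stub classes, which is exactly the upkeep that generating stubs avoids.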

Team Foundation Build, Part 2: Installation and Configuration

So now that we have some understanding of what the components of Team Build are from Part 1, where should these components be installed? Luckily, there's some pretty good documentation on Team Foundation Server components and topologies on MSDN.

Lots of arrows and boxes, but the main point of this diagram is that Team Build is installed on a separate box from the Application Tier (Team Foundation Server or TFS Proxy) and from any client machines. A build machine should only have software installed to support the execution of a build. You shouldn't install:

Third-party control packages
Database client tools (Toad, SQL Server Client Tools, etc.)
Anything that would push assemblies into the GAC

Ideally, all you would have installed would be:

Team Build
Team Edition for Developers (for static analysis)
Team Edition for Testers (for running tests during a build)

Anything else installed could cause build errors, because the build might use incorrect versions of third-party libraries when compiling. That's why it's always best to check all third-party libraries into source control, instead of relying on installers to put them in place. For a detailed installation guide, check out the Team Foundation Installation Guide.

Another thing to note on the diagram above is the upper-right corner, labeled "Build Drop site". This could be a file server or a share on the build server, where the compiled assemblies, log files, etc. are dropped. In the next post, I'll discuss creating a Team Build definition and give an introduction to extending the build.

Tuesday, May 22, 2007

Team Foundation Build, Part 1: Introduction

There's been some interest recently in our team using more features of Team System, including Team Foundation Build. Rather than send out a blanket email, I'm following Jon Udell's advice and maximizing the value of my keystrokes by posting a series of blog entries on this topic.

Visual Studio Team System introduced quite a few productivity enhancements for development teams, including work items, process templates, reporting, source control, and builds. Team Foundation Build is the build server component of VSTS. In VSTS, build definitions:

Are managed in Team Explorer
Are represented by MSBuild scripts
Are stored in Team Foundation Source Control
Are executed on a build machine by the Team Build Service
Can be initiated through Team Explorer
Report results to Team System

So why should we use Team Build over a homegrown solution like batch files, NAnt scripts, etc.?

Centralized management

All builds are defined, managed, and viewed through Team Explorer. Since builds are stored in source control, we get all of the benefits source control provides, such as versioning, security, etc. We also have one central repository to view and edit builds. I can double-click a build definition to view all of the executed builds with status (success/failure), and drill down into a single build to view more details. If I'm using ReSharper, I get IntelliSense and refactoring tools for MSBuild.

Defined with MSBuild

MSBuild is the new build platform for Visual Studio. Project files (.csproj, .vbproj, etc.) are now defined as MSBuild scripts. MSBuild tasks are customizable and extensible, so I can define new tasks and use community-built tasks. Team Build definitions also provide extensibility points, similar to the ASP.NET page event model, through targets such as "BeforeGet", "AfterTest", and "AfterDropBuild" that you can override.
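As I understand it, overriding one of those targets works much like overriding a virtual method: the Team Build targets file that TfsBuild.proj imports declares empty targets such as AfterDropBuild, and a later definition with the same name in TfsBuild.proj replaces the empty one. Purely as an analogy (BuildProcess and NightlyBuild are hypothetical, and this is not how MSBuild is implemented), here is that template-method shape in C#:

```csharp
using System.Collections.Generic;

// The base class fixes the order of the build steps and exposes empty hooks,
// much like the empty extensibility targets in the Team Build targets file.
public abstract class BuildProcess
{
    public readonly List<string> Log = new List<string>();

    public void Run()
    {
        BeforeGet();
        Log.Add("Get");
        Log.Add("Compile");
        Log.Add("Test");
        AfterTest();
        Log.Add("DropBuild");
        AfterDropBuild();
    }

    // Empty by default; a derived build overrides only the hooks it needs.
    protected virtual void BeforeGet() { }
    protected virtual void AfterTest() { }
    protected virtual void AfterDropBuild() { }
}

// Roughly what redefining AfterDropBuild in TfsBuild.proj amounts to.
public class NightlyBuild : BuildProcess
{
    protected override void AfterDropBuild()
    {
        Log.Add("CopyExtraFilesToDrop");
    }
}
```

Running a NightlyBuild logs the standard steps in order, followed by the extra step contributed by the override.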

Status and reporting

There are usually two pieces of information I'm curious about when looking at builds:

What is the status of the current build? (In progress, successful, failed)
Is there a trend in the build statuses?

All of this information can be seen through Team Explorer. Additionally, I've seen tray icon applications that will display a red, yellow, or green light indicating the status of a certain build definition.

Where do we go from here?

In coming posts, I'll discuss installation and configuration, defining builds, and a set of values, principles, and practices that Team Build can help encourage and enforce. I'll also outline some ideas on what kinds of build definitions are good to have, and what kinds of activities we might want to accomplish as part of our builds.

Friday, May 18, 2007

Fun with recursive Lambda functions

I saw a couple of posts on recursive lambda expressions, and I thought it would be fun to write a class to encapsulate those two approaches. BTW, I'm running this on Orcas Beta 1, so don't try this at home (VS 2005) kids. Let's say I wanted to write a single Func variable that computed the factorial of a number:

Func<int, int> fac = x => x == 0 ? 1 : x * fac(x-1);

When I try to compile this, I get a compiler error:

Use of unassigned local variable 'fac'

That's no good. Under C#'s definite assignment rules, 'fac' is not yet assigned inside its own initializer, so the lambda on the right-hand side can't use it.
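The usual quick workaround, which is not one of the approaches from the posts I mentioned, is to split the declaration from the assignment. Initializing the variable to null first makes it definitely assigned, so the lambda assigned afterwards is allowed to capture it. NullFirstRecursion is just a hypothetical wrapper to make the sketch testable.

```csharp
using System;

public static class NullFirstRecursion
{
    public static Func<int, int> MakeFactorial()
    {
        // Step 1: fac is definitely assigned (to null) before any lambda uses it.
        Func<int, int> fac = null;

        // Step 2: the lambda captures the variable fac, which now holds this delegate.
        fac = x => x == 0 ? 1 : x * fac(x - 1);
        return fac;
    }
}
```

The catch is that the lambda captures the variable, not its current value: if fac were reassigned later, the recursive call would follow the new delegate. That fragility is part of the appeal of wrapping the recursion in a fixed-point helper instead.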

Something of a solution

Well, the C# compiler couldn't automagically figure out my recursion, but I can see why. So I have a couple of different solutions, one where I create an instance of a class that encapsulates my recursion, and another where a static factory method gives me a delegate to call. I combined both approaches into one class:

public class RecursiveFunc<T>
{
    private delegate Func<A, R> Recursive<A, R>(Recursive<A, R> r);
    private readonly Func<Func<T, T>, Func<T, T>> f;

    public RecursiveFunc(Func<Func<T, T>, Func<T, T>> higherOrderFunction)
    {
        f = higherOrderFunction;
    }

    private Func<T, T> Fix(Func<Func<T, T>, Func<T, T>> f)
    {
        return t => f(Fix(f))(t);
    }

    public T Execute(T value)
    {
        return Fix(f)(value);
    }

    public static Func<T, T> CreateFixedPointCombinator(Func<Func<T, T>, Func<T, T>> f)
    {
        Recursive<T, T> rec = r => a => f(r(r))(a);
        return rec(rec);
    }
}
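Why does rec(rec) produce a working recursive function? Writing the lambda from CreateFixedPointCombinator in lambda-calculus notation (my notation, not taken from the posts I mentioned), self-application unfolds into exactly the equation a recursive definition needs:

```latex
\begin{aligned}
\mathit{rec} &= \lambda r.\ \lambda a.\ f\,(r\,r)\,a \\
g = \mathit{rec}\ \mathit{rec} &= \lambda a.\ f\,(\mathit{rec}\ \mathit{rec})\,a = \lambda a.\ f\,g\,a \\
\Rightarrow\quad g\,a &= f\,g\,a \ \text{ for every } a, \text{ i.e. } g = f\,g
\end{aligned}
```

So g is a fixed point of f. Because f(r(r)) sits inside the lambda over a, C# only evaluates r(r) when the returned function is actually called, which keeps eager evaluation from unfolding the recursion forever; this delayed form is a close relative of the Z combinator, the strict-evaluation variant of the Y combinator. The Fix method encodes the same equation directly: Fix(f) is t => f(Fix(f))(t).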

Using an instance of a class

The idea behind using a class is that it might be clearer to the user to have an instance of a concrete type and call methods on it, instead of calling a delegate directly. Let's look at an example of this usage, with the recursive Fibonacci and factorial functions:

[TestMethod]
public void RecursiveFunc_WithFactorial_ComputesCorrectly()
{
    var factorial = new RecursiveFunc<int>(fac => x => x == 0 ? 1 : x * fac(x - 1));

    Assert.AreEqual(1, factorial.Execute(1));
    Assert.AreEqual(6, factorial.Execute(3));
    Assert.AreEqual(120, factorial.Execute(5));
}

[TestMethod]
public void RecursiveFunc_WithFibonacci_ComputesCorrectly()
{
    var fibonacci = new RecursiveFunc<int>(fib => x =>
        (x == 0) || (x == 1) ? x : fib(x - 1) + fib(x - 2)
    );

    Assert.AreEqual(0, fibonacci.Execute(0));
    Assert.AreEqual(1, fibonacci.Execute(1));
    Assert.AreEqual(1, fibonacci.Execute(2));
    Assert.AreEqual(2, fibonacci.Execute(3));
    Assert.AreEqual(5, fibonacci.Execute(5));
    Assert.AreEqual(55, fibonacci.Execute(10));
}

So in each case I can pass in the Func delegate I was trying to create (without success) in the compiler error example at the top of the post. I instantiate the class with my recursive function, and call Execute to execute that function recursively. Not too shabby.

Using a static factory method

With a static factory method, calling the recursive function looks a little prettier. Again, here are two examples that use the Fibonacci sequence and factorials for recursive algorithms:

[TestMethod]
public void FixedPointCombinator_WithFactorial_ComputesCorrectly()
{
    var factorial = RecursiveFunc<int>.CreateFixedPointCombinator(fac => x => x == 0 ? 1 : x * fac(x - 1));

    Assert.AreEqual(1, factorial(1));
    Assert.AreEqual(6, factorial(3));
    Assert.AreEqual(120, factorial(5));
}

[TestMethod]
public void FixedPointCombinator_WithFibonacci_ComputesCorrectly()
{
    var fibonacci = RecursiveFunc<int>.CreateFixedPointCombinator(fib => x =>
        (x == 0) || (x == 1) ? x : fib(x - 1) + fib(x - 2)
    );

    Assert.AreEqual(0, fibonacci(0));
    Assert.AreEqual(1, fibonacci(1));
    Assert.AreEqual(1, fibonacci(2));
    Assert.AreEqual(2, fibonacci(3));
    Assert.AreEqual(5, fibonacci(5));
    Assert.AreEqual(55, fibonacci(10));
}

After some thought on both, I think I like the second way better. Calling the Func delegate directly looks a little nicer, and it saves me from writing some random Fibonacci or factorial helper class. Of course, I could still have those helper methods somewhere, but where's the fun in that? Now if only I had taken a lambda calculus class in college...

Thursday, May 17, 2007

Project management with Microsoft Office SharePoint Server

Having used Team System in the past, I've been trying to wrap my head around using Team System for work items in our group. We're using Team System exclusively for source control, but there's a lot more functionality available to use. At the heart of a Team System project is the process template, which creates the work item templates, reports, and the team portal page. The problem I'm seeing is that our Team Project, scoped for source control, spans many groups, many internal projects, many versions, and many global teams. We have Core, Back End, Personalization, Front End, B2B, etc. applications. We have 2.1, 2.1.5, and 2.2 versions. We have Global, Asia Pacific, US/CA, Latin America, and Europe regions. All of these different groups, concerns, and project requirements are under a single Team Project. How are we supposed to create a single process template that could possibly work?

What Team System can give us

In my last project, we used Scrum as our development process, and Scrum for Team System as our process template. For those unfamiliar with Scrum, it is a lightweight, incremental and iterative development process that breaks the development cycle into iterations called "sprints". Each sprint is timeboxed, such that no extra work can be assigned nor can the length of the iteration be changed during the sprint. Time and requirements are fixed during the sprint.

All of our development was driven by requirements that were defined and managed in Team System. When we checked in code, we associated the check-in with a work item. When we ran builds, we could see which check-ins were part of that build, what comments were available, and which work items were worked on for that build. Additionally, we no longer needed any status meetings, since individual team members updated the work remaining on their work items every day. Burndown charts told us (and management) whether we were on track. Reports told us at any given time:

Progress against the work committed for the current sprint
Progress against the work committed for the current release
Status of individual features or user stories (not started, in progress, ready for test, complete)
Hierarchical composition of features and tasks, with effort and work remaining

Call me crazy, but I think it's perfectly reasonable to let the team members be responsible for keeping the status of their tasks up to date and not the project manager. All of this information was available at any time, in real time, and always represented the "truth" of the project status.

Current issues

The problem with our current layout for our Team Project is that it spans so many teams, so many projects, and so many geographical groups. For a process template to be effective for this topology, it would need to be:

Generalized so we don't pigeonhole all teams into a monolithic process
Flexible to handle different process needs and schedules
Extensible to allow modifications and additions

I'm a big fan of self-organizing teams. We have a lot of intelligent people on our team, and we should be able to decide how best to work. Process templates are pretty much set in stone once the Team Project is created, so I don't see a whole lot of value in applying a process template to the topology we have now in our source control. With Scrum, we had a Sprint Retrospective after each sprint to look at improving our process. That regular feedback would be tough, if not impossible, to act on if we had to approve changes across a global team.

The SharePoint solution

I recently ran across another solution to this problem that uses SharePoint. Instead of using Team System to manage the Product Backlog and Sprint Backlog, you can use SharePoint lists to house these artifacts. You can still use Excel for reports, and SharePoint includes a powerful search feature that Team System doesn't have. What you would lose is the ability to link to work items as you can in Team System. But without completely changing the topology of our Team Projects to be project-based, I just can't see us being able to take advantage of the process templates in Team System. SharePoint also gives you custom views on top of your data, and those look a little easier to use than the custom queries and reports in Team System.

The cool thing about a SharePoint solution is that it wouldn't be tied to Team System, so each team could manage its own project however it wishes. You give up some of the integration that Team System provides, but you gain something by allowing each team to take responsibility for its own process. If some teams have well-defined and mature development processes, some meta-elements could eventually be developed into a framework for a process template (I'm a big proponent of harvested frameworks). Since the reality is that we can't all work together as one whole team, SharePoint is a great solution to enable collaborative, communicative teams.

Tuesday, May 15, 2007

Parsing strings with the TryParse method

I recently posted on the out and ref keywords in C#, and mentioned the only time I'd see the "out" keyword was in the Tester-Doer pattern. Well, I was really looking for the Try-Parse pattern (near the end of the post). The Try-Parse pattern is ideal for situations where exceptions might be thrown in common scenarios, like parsing strings for numeric or date-time data.

A simple example

Let's say I've read some text in from an outside source into a string. The outside source could be a querystring, XML, a database row, user input, etc. The problem is that I need the value as an integer, date, or some other primitive type. So how would we do this in .NET 1.0 or 1.1?

string rawCustomerNumber = GetCustomerNumber();

try
{
    int customerNumber = int.Parse(rawCustomerNumber);
    DoSomethingWithCustomerNumber(customerNumber);
}
catch
{
}

So what's so bad about this code? The real issue is that exceptions are very expensive to handle in .NET. If "rawCustomerNumber" often has bad values, this code snippet could kill the performance of our application. Whenever I profile application performance, the number of exceptions thrown and caught is one of the first things I look at, since they're so expensive. Besides, exceptions are supposed to be exceptional, but in the snippet above, exceptions could happen quite often when parsing text.

A new way

So how should we parse text going forward? The .NET Framework 2.0 introduced a new method on most primitive types, "TryParse". Here's what Int32.TryParse looks like:

public static bool TryParse (
    string s,
    out int result
)

Before, the Parse method returned the parsed integer value. Now, the return value is a bool specifying whether parsing succeeded. An invalid string no longer causes an exception to be thrown, and I use the "out" param to get the parsed value back from the method. Here's the modified code:

string rawCustomerNumber = GetCustomerNumber();

int customerNumber;
if (Int32.TryParse(rawCustomerNumber, out customerNumber))
{
    DoSomethingWithCustomerNumber(customerNumber);
}

Although "out" params should generally be avoided, here one is perfectly reasonable because it improves readability. I don't like relying on exceptions for flow control, which can kill readability. Nothing is more confusing than trying to follow a bunch of nested try-catch blocks to see what the real behavior is supposed to be. Now I have a very clear flow control path, "If parsing was successful, do something with the result" instead of "Try to parse, and if I don't get an exception, do something with the result".
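To see how TryParse reports bad input without exceptions, here's a minimal sketch that runs a few kinds of strings through Int32.TryParse. The TryParseDemo class and its Describe method are hypothetical names for illustration; only int.TryParse itself is the framework API.

```csharp
using System;

public static class TryParseDemo
{
    // Describes what int.TryParse did with the input (hypothetical helper).
    public static string Describe(string input)
    {
        int value;
        bool ok = int.TryParse(input, out value);

        // On failure TryParse returns false and sets the out value to 0;
        // null, non-numeric text, and out-of-range numbers all fail this way.
        return ok ? "parsed " + value : "failed (result = " + value + ")";
    }
}
```

With the default number styles, leading and trailing whitespace is allowed, so " 42 " parses, while "12abc", null, and "99999999999" (too large for an Int32) all come back as failures with a result of 0.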

A look at the numbers

I timed the two approaches, calling each 10,000 times with bad values. The original example took nearly 4 seconds to execute, while the TryParse version took less than 100 milliseconds, roughly a 40x difference! If this code were deep in a large call stack, the difference would be even greater. That's some good incentive to pick TryParse over the original Parse method.
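The original timing code isn't shown, so here's a sketch of one way such a comparison could be set up with System.Diagnostics.Stopwatch. ParseBenchmark, CountWithParse, CountWithTryParse, and TimeMs are hypothetical names, and any numbers you get will vary by machine and runtime version.

```csharp
using System;
using System.Diagnostics;

public static class ParseBenchmark
{
    // Old approach: int.Parse inside try/catch, so every bad value costs a thrown exception.
    public static int CountWithParse(string[] inputs)
    {
        int parsed = 0;
        foreach (string input in inputs)
        {
            try
            {
                int.Parse(input);
                parsed++;
            }
            catch (FormatException)
            {
                // Not a number: swallow and move on.
            }
            catch (OverflowException)
            {
                // Outside the Int32 range: swallow and move on.
            }
        }
        return parsed;
    }

    // New approach: TryParse reports failure through its return value.
    public static int CountWithTryParse(string[] inputs)
    {
        int parsed = 0;
        int value;
        foreach (string input in inputs)
        {
            if (int.TryParse(input, out value))
                parsed++;
        }
        return parsed;
    }

    // Runs one of the counting methods over the inputs and returns elapsed milliseconds.
    public static long TimeMs(Func<string[], int> counter, string[] inputs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        counter(inputs);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Filling an array with 10,000 bad strings and passing it to TimeMs once with CountWithParse and once with CountWithTryParse reproduces the shape of the experiment; both methods should agree on the count, so only the timing differs.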

Closing thoughts

The Try-Parse pattern is fairly common in the .NET Framework: you can find it on numeric types, dates, and even the Dictionary class (TryGetValue). Since it's a pattern, you can implement it yourself by following the Framework Design Guidelines (FDG) recommendations for it. I've used it in the past for search methods and other situations where I want a result and also a boolean telling me whether the operation was successful. The pattern isn't for every situation, but it's another tool in your repertoire.
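As an illustration of applying the pattern to a search method, here's a minimal sketch. The Customer class, CustomerSearch, TryFindByName, and GetOrderCount are hypothetical; Dictionary&lt;TKey, TValue&gt;.TryGetValue is the real framework method following the same shape.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public Customer(string name) { Name = name; }
    public string Name { get; private set; }
}

public static class CustomerSearch
{
    // Try-Parse style search: returns true and sets 'result' when a match is found;
    // otherwise returns false and sets 'result' to null. "Not found" is not an exception.
    public static bool TryFindByName(IEnumerable<Customer> customers, string name, out Customer result)
    {
        foreach (Customer customer in customers)
        {
            if (customer.Name == name)
            {
                result = customer;
                return true;
            }
        }
        result = null;
        return false;
    }

    // Same pattern on Dictionary: TryGetValue instead of the indexer,
    // which throws KeyNotFoundException for a missing key.
    public static int GetOrderCount(Dictionary<string, int> ordersByCustomer, string name)
    {
        int count;
        return ordersByCustomer.TryGetValue(name, out count) ? count : 0;
    }
}
```

The design choice mirrors TryParse: the bool return drives the if statement, and the out parameter carries the result only when it's meaningful.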

Friday, May 11, 2007

Pop quiz on ref and out parameters in C#

If you line up 100 C# developers, I would be willing to bet that the number of developers that could explain the difference between out and ref parameter keywords could be counted on one hand. When I first started .NET development coming over from pointer-centric languages such as C and C++, I used the ref keyword with reckless abandon. I started in VB.NET, which only exacerbated the problem with its ByVal and ByRef keywords. While working on a defect today, I spotted an interesting use of the ref keyword that took me back to my nascent days as a .NET developer:

SqlCommand cmd = new SqlCommand("SELECT * FROM customers WHERE customer_name = @Name");

AddInputParam(ref cmd, "@Name", SqlDbType.NVarChar, customer.Name);

The code went on to execute the query. But that last line really had me confused. Under what circumstances would I be getting a different SqlCommand object out of "AddInputParam"? After some investigation, it turned out that this was just an incorrect use of the ref parameter.

So what are the ref and out keywords?

To understand what the ref and out keywords are, you have to know a little about pointers and reference types in .NET. In the snippet above, the variable "cmd" holds a reference to a SqlCommand object. When you specify the "ref" keyword on a method parameter, you are telling the caller that the reference they passed in can change. What this told me in the above snippet is that "cmd" could be pointing to a completely different SqlCommand object when the method returned. I'm pretty sure that's not the intention of this code. I don't want to execute a different SqlCommand object, I want to execute the one I created.

The "out" keyword is akin to an extra return value. It signifies that something extra is passed out of the method; the caller doesn't need to initialize the variable it passes in, and the method can't read the parameter until it has assigned it.

Out params don't need to be initialized by the caller, and the method must assign a value before returning
Ref params must be initialized by the caller (null is allowed), and the method may reassign the value before returning

The problem with the snippet above was that the "ref" keyword was completely unnecessary. When you pass a reference type by value to a method (the default), the method can't change which object the caller's variable refers to, but it can change the object itself. I could remove the "ref" keyword and change the SqlCommand object all I wanted, and those changes would be visible in that object when the method returned. But if I set the "cmd" variable inside the method to a new SqlCommand object, the caller's "cmd" variable would still point to the original instance.
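To make the out and ref initialization rules concrete, here's a minimal sketch using value types. RefOutRules, GetAnswer, Increment, and Demo are hypothetical names; the commented-out line shows the kind of code the compiler rejects.

```csharp
using System;

public static class RefOutRules
{
    // The method must assign an out parameter before returning.
    public static void GetAnswer(out int answer)
    {
        answer = 42;
    }

    // A ref parameter arrives already assigned; the method may read and reassign it.
    public static void Increment(ref int counter)
    {
        counter++;
    }

    public static int[] Demo()
    {
        int answer;              // unassigned is fine for an out argument
        GetAnswer(out answer);

        int counter = 1;         // must be assigned before being passed by ref
        Increment(ref counter);

        // int unassigned;
        // Increment(ref unassigned); // compile error CS0165: use of unassigned local variable

        return new int[] { answer, counter };
    }
}
```

Calling Demo returns { 42, 2 }: GetAnswer filled in a variable that was never initialized, and Increment read and updated one that was.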

Example using ref and out

Let's look at a trivial case highlighting the differences between ref, out, and value parameters. I have a simple Customer class that looks like this:

public class Customer
{
    private string _name;

    public Customer(string name)
    {
        _name = name;
    }

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}

Pretty simple, just a customer with a name. Now, some code with methods with out, ref, and value parameters:

// Debug.WriteLine requires "using System.Diagnostics;" and only writes in DEBUG builds
public void RefAndOutParamExample()
{
    Customer customer = new Customer("Bob");
    Debug.WriteLine("Original:    " + customer.Name);

    Test1(customer);
    Debug.WriteLine("Value param: " + customer.Name);

    Test2(ref customer);
    Debug.WriteLine("ref param:   " + customer.Name);

    Test3(out customer);
    Debug.WriteLine("out param:   " + customer.Name);
}

private void Test1(Customer customer)
{
    customer.Name = "Billy";            // changes the caller's object
    customer = new Customer("Sally");   // only changes the local copy of the reference
}

private void Test2(ref Customer customer)
{
    customer.Name = "Larry";            // changes the caller's object...
    customer = new Customer("Joe");     // ...then points the caller's variable at a new one
}

private void Test3(out Customer customer)
{
    // customer.Name = "Suzie"; // Compile error, I can't reference an
                                // out param without assigning it first
    customer = new Customer("Chris");
}

The output of RefAndOutParamExample would be:

Original:    Bob
Value param: Billy
ref param:   Joe
out param:   Chris

So what happened?

In Test1 and Test2, I set the customer.Name property, then reassign the customer parameter to a new instance of a Customer object. Both successfully change the Name property of the original customer instance, but only the methods with ref and out parameters can change which Customer object the caller's variable references. That's why "Larry" never shows up: Test2 renamed the original object, then pointed the caller's variable at Joe. The final Test3 method can't set the Name property first, since accessing an out parameter before assigning it is a compile error.

When to use ref and out parameters

From Framework Design Guidelines, pages 156 and 157, I see two guidelines:

AVOID using out or ref parameters
DO NOT pass reference types by reference

Framework Design Guidelines uses four kinds of recommendations:

DO - should always be followed
CONSIDER - should generally be followed, unless you really know what's going on and have a good reason to break the rule
DO NOT - should almost never be done
AVOID - generally not a good idea, but there might be a few known cases where it makes sense

So the FDG tells me that in general I should avoid out and ref parameters, and should almost never pass reference types with the "ref" keyword. The problem with these keywords is that they require some knowledge of pointers and reference types, which are easily confused. They also force the caller to declare temporary variables, and they hurt readability because they break the common pattern of assigning a variable the result of a method call.

If you feel the need to add a ref param, I'd suggest reading FDG's recommendations for these parameters in depth. You could also consider refactoring your code to return the entire result as a single object, instead of splitting it between a return value and a ref parameter. The only time I've ever justified a ref parameter was in the Tester-Doer pattern, which is for a very specific scenario. Ref and out params remind me of Mr. Miyagi's advice about karate in The Karate Kid, "You learn karate so that you never need to use it".
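Here's a sketch of the "return a single object" refactoring, assuming a hypothetical method that needs to report both the minimum and maximum of an array. MinMax, MinMaxCalculator, GetMin, and GetRange are illustrative names, not from any real codebase.

```csharp
using System;

// Hypothetical result type holding both values that used to be split
// between a return value and an out parameter.
public struct MinMax
{
    public readonly int Min;
    public readonly int Max;

    public MinMax(int min, int max)
    {
        Min = min;
        Max = max;
    }
}

public static class MinMaxCalculator
{
    // Before: one result comes back as the return value, the other through an out param.
    public static int GetMin(int[] values, out int max)
    {
        MinMax range = GetRange(values);
        max = range.Max;
        return range.Min;
    }

    // After: the whole result comes back as one object, so callers can simply
    // write "MinMax range = GetRange(values);".
    public static MinMax GetRange(int[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", "values");

        int min = values[0];
        int max = values[0];
        foreach (int value in values)
        {
            if (value < min) min = value;
            if (value > max) max = value;
        }
        return new MinMax(min, max);
    }
}
```

The GetRange version keeps the familiar "assign the result of a method call" shape and needs no temporary variable declared ahead of the call.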

Proper string comparison

It seems developers have a knack for collecting technical books. I keep mine closer or farther away depending on how often I need them. Books I crack open once every few months stay at home. If I start needing a book more than once a month, it goes on the bookshelf at my desk. There are a few books I need almost daily, and these stay within arm's reach. The two books I've always kept close to me are Framework Design Guidelines and CLR via C#. I would almost consider these books required reading for a .NET team, though the latter can be fairly low-level at times. One situation where I've needed these books close at hand is string comparison.

An exciting bug

I was looking at a defect that came down to data not being saved into the database properly. All of the fields in a record had a value, but one important field was conspicuously blank. A colleague hunted down the persistence code and noticed the following line (scrubbed somewhat):

if (address.Other1.Type.Equals("taxcode"))
    AddInputParam(sqlCmd, "@tax_code", SqlDbType.Char, address.Other1.Value);
else
    AddInputParam(sqlCmd, "@tax_code", SqlDbType.Char, String.Empty);

This code seems innocuous. If the address's Other1 property's Type equals "taxcode", set the "@tax_code" parameter to its Value; otherwise, set "@tax_code" to an empty string. The stored procedure requires a "@tax_code" parameter, but the address may or may not have a "taxcode" value. All very straightforward, and fairly explicit.

Capital letters are fun

Oh snap. It turns out the value of "address.Other1.Type" isn't "taxcode", but rather "TAXCODE". The intent of this snippet was to ignore the case of the Type, but by default string.Equals does a case-sensitive, character-by-character comparison to determine equality. Since chars are 16-bit UTF-16 values, two characters are equal only if their 16-bit values are identical, so 'T' and 't' don't match. So we need a good way to determine equality while ignoring case (hint: address.Other1.Type.ToUpper().Equals("TAXCODE") is NOT the right answer).

So how should we compare strings then? This is where the CLR via C# book comes in extra-handy. The answer depends on the context of comparison you want to do. There are three types of comparisons we can perform on a string:

Current culture
Invariant culture
Ordinal

Additionally, each comparison type has a case-insensitive option available.

So what should I have used instead?

Most string comparisons should use the ordinal comparison types. Ordinal comparisons compare the UTF-16 values of the strings, character by character, while ordinal case-insensitive comparisons use the invariant culture's casing rules. Ordinal comparisons are also generally faster than culture-aware comparisons. By default, many string operations (String.Compare, StartsWith, EndsWith, ToUpper, and others) use the CultureInfo.CurrentCulture settings, which can change during the course of execution without the comparing code knowing. That's why we never want to use ToUpper() when we're not dealing with user input: the underlying culture can change without us knowing. A comparison that succeeds on a machine in the US can fail on one in Turkey, where the uppercase of "i" is "İ", not "I". This can cause major headaches when dealing with global code. Ordinal comparisons are good for use in:

Path and file names
Database object names (database, table, field, etc.)
XML tags and attribute names
Local constants, identifiers and "magic values"

Notice that the defect was an example of the last entry. Since I wanted a case-insensitive comparison, I should have used StringComparison.OrdinalIgnoreCase. For simplicity and ease of use, the .NET Framework has static or instance overloads of the following methods that take the StringComparison enumeration:

Equals
Compare
StartsWith
EndsWith

So to fix the defect, I would have changed the code snippet to:

if (address.Other1.Type.Equals("taxcode", StringComparison.OrdinalIgnoreCase))
    AddInputParam(sqlCmd, "@tax_code", SqlDbType.Char, address.Other1.Value);
else
    AddInputParam(sqlCmd, "@tax_code", SqlDbType.Char, String.Empty);
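For reference, here's a small sketch exercising the StringComparison overloads of Equals, Compare, StartsWith, and EndsWith against the "TAXCODE" value from the defect. The StringComparisonOverloads class and Examples method are hypothetical; the string methods are the standard framework overloads.

```csharp
using System;

public static class StringComparisonOverloads
{
    public static bool[] Examples()
    {
        string type = "TAXCODE";
        return new bool[]
        {
            // Instance Equals with an explicit comparison type: true.
            type.Equals("taxcode", StringComparison.OrdinalIgnoreCase),
            // Static Equals is safe when either argument may be null: false here.
            string.Equals(null, "taxcode", StringComparison.OrdinalIgnoreCase),
            // Static Compare returns 0 when the strings are equal under the comparison: true.
            string.Compare(type, "taxcode", StringComparison.OrdinalIgnoreCase) == 0,
            // StartsWith / EndsWith with an explicit comparison type: true, true.
            type.StartsWith("tax", StringComparison.OrdinalIgnoreCase),
            type.EndsWith("CODE", StringComparison.Ordinal)
        };
    }
}
```

The static string.Equals overload is handy when the value might be null (for example, if Other1.Type were missing), since calling the instance method on a null reference would throw a NullReferenceException.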

So how do I compare strings in the future?

When I'm performing string comparisons, I really only need to ask myself two questions:

Where did each string come from? (Hard-coded value, user input, etc.)
Do I care about the case?

The answers to these questions point me to the correct StringComparison value. To be explicit, I never use the default overloads of ToUpper, Equals, etc. When you leave out the StringComparison or other culture information, you're more likely to introduce bugs simply because you haven't been explicit about what kind of comparison you want, regardless of whether the default implementation happens to do what you want. If you're explicit, the next developer to look at that code will immediately understand the intent of the comparison instead of trying to figure out what kind of comparison it should have been.
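One way to encode those two questions is a tiny helper like the sketch below, which also shows why culture-sensitive ToUpper is risky. ComparisonChooser, Choose, and MatchesViaToUpper are hypothetical names; the mapping (user-facing text uses the current culture, identifiers and magic values use ordinal) reflects common .NET guidance rather than anything from the original code.

```csharp
using System;
using System.Globalization;

public static class ComparisonChooser
{
    // Hypothetical helper mirroring the two questions:
    // 1) Is this linguistic text from or for a user?  2) Does case matter?
    public static StringComparison Choose(bool isUserText, bool ignoreCase)
    {
        if (isUserText)
            return ignoreCase ? StringComparison.CurrentCultureIgnoreCase
                              : StringComparison.CurrentCulture;

        // Identifiers, file paths, XML names, and magic values compare ordinally.
        return ignoreCase ? StringComparison.OrdinalIgnoreCase
                          : StringComparison.Ordinal;
    }

    // Culture-sensitive uppercasing: the result depends on the culture passed in.
    // Under Turkish casing rules, lowercase 'i' uppercases to 'İ' (U+0130), not 'I'.
    public static bool MatchesViaToUpper(string value, string upperConstant, CultureInfo culture)
    {
        return value.ToUpper(culture) == upperConstant; // == on strings is ordinal
    }
}
```

On a runtime with full culture data, MatchesViaToUpper("file", "FILE", new CultureInfo("tr-TR")) would be expected to return false, while string.Equals("file", "FILE", StringComparison.OrdinalIgnoreCase) returns true regardless of the current culture.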

For further information

Consult CLR via C#
Encoding and Localization on MSDN
StringComparison documentation
String documentation

Monday, May 7, 2007

Consistency in user interface behavior

I know I can't be the only person who gets annoyed by this, but the developers of Windows Messenger and Office Communicator must have been on crack when they decided the behavior of the "Close" and "Minimize" buttons. Every application I have ever used closes when I hit "X" and minimizes when I hit "_" in the title bar. Tray icon applications are even smart enough to minimize to the tray when I hit "_". But they still close when I hit "X".

For some reason MS wants to be above this. When you hit "X", it doesn't close Communicator. No, you didn't REALLY mean to close (exit) it, you just wanted to minimize it to the tray icon. For some funny reason, 99.9% of all tray icon applications actually CLOSE and EXIT when I hit "X". Even other MS tray icon applications follow this rule. I use Virtual PC 2007, and when I hit "X", it exits the whole application. When I hit "_", it minimizes to the tray. But now I have to think twice when I have VPC 2007 open. I have to wonder, is this one of those MS applications where close means minimize? Oops, I clicked "X", and that was really "Close" for this one. Time to start over. So now with MS tray icon applications I always click "_" first to try and minimize to tray, and if that doesn't work, I'll click "X" next. Forever a two-step process, thanks a bunch. Real intuitive.

A lack of consistency in the behavior of common tasks such as clicking the "X" button just kills me. You may want a New and Improved Way of doing things, but if you violate the consistency and expected behavior of an operation, you'll likely infuriate your end users no matter how great the new behavior may be.

Thursday, May 3, 2007

Another reason to love ReSharper

Whenever I look at a class in code for the first time, I try to figure out what its responsibilities are. Ideally, I would look at unit tests exercising the public interface. These unit tests will give me a description of functionality, code showing usage, and most likely some assertions describing and verifying behavior. If there aren't any unit tests, I have to resort to figuring out what the code is doing by examining the code (which is about 10 times slower than looking at unit tests).

First off, I'll look at the using directives at the top of the file. Using directives can provide a key insight into the behaviors and responsibilities of a class, since the using directives show me what the dependencies are. For highly cohesive, loosely coupled code, you probably won't see a ton of using directives, as a general rule of thumb. I was a little disheartened when I saw these using directives:

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Web;
using System.Web.Services;
using System.DirectoryServices;
using System.Data.SqlClient;
using System.Drawing;
using System.Drawing.Design;
using System.ComponentModel.Design;
using System.Configuration;
using System.Xml.Serialization;
using System.Xml;
using System.IO;

using GL = System.Globalization;
using CMP = System.IO.Compression;
using MSG = System.Messaging;

using System.Web.SessionState;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using System.Security;
using MyCompany.Commerce.BusinessObjects.Shipping;
using MyCompany.Commerce.Core;
using SCF = MyCompany.Commerce.StoreConfiguration;
using MyCompany.Commerce.Core.BusinessObjectManager;
using SH = MyCompany.Commerce.BusinessObjects.Ecomm.Security.SecurityHelper;
using ctx = MyCompany.Commerce.BusinessObjects.StoreContext;
using MyCompany.Sales.Security;
using MyCompany.Sales.Profile;
using Microsoft.ApplicationBlocks.ExceptionManagement;

Wow. That's a lot of using directives. Boooo. It looks like this class cares about web services, ADO.NET, XML serialization, IO, session state, and the list goes on and on. Waaaay too much for one class to care about. If there were such a thing as code referees, this class would get red-carded for a flagrant foul of the Single Responsibility Principle. Luckily, I have ReSharper.

ReSharper to the rescue

ReSharper is pretty much crack for developers. Once you use it for a week or even a day, you'll find it nearly impossible to develop without it. So how did ReSharper help me in this situation? With a nifty "Optimize using directives" refactoring of course. The "Optimize using directives" refactoring will remove unused using directives, and can also add new ones where you've qualified type names with namespaces. Here are the using directives after the refactoring:

using System;
using System.IO;
using System.Web.Services;
using System.Xml.Serialization;
using System.Web.SessionState;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using BO = MyCompany.Commerce.BusinessObjects;
using Microsoft.ApplicationBlocks.ExceptionManagement;
using MyCompany.Commerce.BusinessObjects.Shipping;
using MyCompany.Commerce.Core;
using SCF = MyCompany.Commerce.StoreConfiguration;
using MyCompany.Commerce.Core.BusinessObjectManager;
using SH = MyCompany.Commerce.BusinessObjects.Ecomm.Security.SecurityHelper;
using ctx = MyCompany.Commerce.BusinessObjects.StoreContext;
using MyCompany.Sales.Security;
using MyCompany.Sales.Profile;

HUGE improvement in readability. Now I can be fairly certain what the external dependencies of this class are, as we reduced the number of using directives from 33 (!) to 18, around half. There are still way too many dependencies; I'd like to see this number reduced to around 5 or so. ReSharper will also sort the using directives to group common ones together (it doesn't work for using aliases). Now it's not as big a leap as it was before to figure out what this class does. That's one less ibuprofen I'll need today :). Of course, if this class had been properly unit tested, I wouldn't need the ibuprofen in the first place, but that's another story.