Tuesday, May 15, 2007

Parsing strings with the TryParse method

I recently posted on the out and ref keywords in C#, and mentioned that the only place I'd expect to see the "out" keyword was in the Tester-Doer pattern.  Well, the pattern I was really looking for was Try-Parse (near the end of the post).  The Try-Parse pattern is ideal for situations where exceptions might be thrown in common scenarios, like parsing strings for numeric or date-time data.

A simple example

Let's say I've read some text from an outside source into a string.  The outside source could be a querystring, XML, a database row, user input, etc.  The problem is that I need the value as an integer, date, or some other primitive type.  So how would we do this in .NET 1.0 or 1.1?

string rawCustomerNumber = GetCustomerNumber();

try
{
    int customerNumber = int.Parse(rawCustomerNumber);
    DoSomethingWithCustomerNumber(customerNumber);
}
catch
{
    // Parsing failed; the bad value is silently ignored
}

So what's so bad about this code?  The real issue is that exceptions are very expensive to handle in .NET.  If "rawCustomerNumber" often has bad values, this code snippet could kill the performance of our application.  Whenever I profile application performance, the number of exceptions thrown and caught is one of the first things I look at, since they're so expensive.  Besides, exceptions are supposed to be exceptional, but in the snippet above, exceptions could happen quite often when parsing text.

A new way

So how should we parse text going forward?  Version 2.0 of the .NET Framework introduced a new method on most primitive types, "TryParse".  Here's what Int32.TryParse looks like:

public static bool TryParse (
    string s,
    out int result
)

Before, the Parse method would return the parsed integer value.  Now, the return value is a bool specifying whether or not parsing was successful.  Exceptions no longer get thrown if the string isn't a valid value, and I now use the "out" param to get the parsed value back from the function.  Here's the modified code:

string rawCustomerNumber = GetCustomerNumber();

int customerNumber;
if (Int32.TryParse(rawCustomerNumber, out customerNumber))
{
    DoSomethingWithCustomerNumber(customerNumber);
}

Although "out" params should generally be avoided, in this situation they are perfectly reasonable because readability improves.  I don't like relying on exceptions for flow control logic, which can kill readability.  Nothing is more confusing than trying to follow a bunch of nested try-catch blocks to see what the real behavior is supposed to be.  Now I have a very clear flow control path: "If parsing was successful, do something with the result" instead of "Try to parse, and if I don't get an exception, do something with the result".
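
To make that flow concrete, here's a minimal sketch that handles the failure branch explicitly instead of silently skipping it.  The class and method names (CustomerNumberParser, ParseOrDefault) are made up for illustration and aren't part of the framework.

```csharp
using System;

public static class CustomerNumberParser
{
    // Hypothetical helper: returns the parsed number, or the caller's
    // fallback when the text is not a valid integer.
    public static int ParseOrDefault(string rawCustomerNumber, int fallback)
    {
        int customerNumber;
        if (int.TryParse(rawCustomerNumber, out customerNumber))
        {
            return customerNumber;
        }

        // On failure TryParse leaves the out value at 0, which could be
        // mistaken for real data, so return the explicit fallback instead.
        return fallback;
    }
}
```

Calling CustomerNumberParser.ParseOrDefault(rawCustomerNumber, -1) gives one obvious path for good input and one for bad input, with no try-catch in sight.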

A look at the numbers

I timed the two approaches by calling each one 10,000 times with bad values.  The original example took nearly 4 seconds to execute, while the TryParse version took less than 100 milliseconds to complete.  That's roughly a 40x difference!  If this code were deep down in a large call stack, the difference would be even greater.  That's some good incentive to pick TryParse over the original Parse method.
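
If you want to run a similar comparison yourself, a rough sketch might look like the following.  It's an illustrative reconstruction rather than the original timing harness, and the names (ParseTiming, CountFailuresWithParse, CountFailuresWithTryParse, Run) are invented for the example.  Expect the numbers to vary a lot with build configuration and whether a debugger is attached.

```csharp
using System;
using System.Diagnostics;

public static class ParseTiming
{
    // Old approach: int.Parse throws a FormatException for non-numeric text.
    public static int CountFailuresWithParse(string input, int iterations)
    {
        int failures = 0;
        for (int i = 0; i < iterations; i++)
        {
            try
            {
                int.Parse(input);
            }
            catch (FormatException)
            {
                failures++;
            }
        }
        return failures;
    }

    // New approach: int.TryParse reports failure through its return value.
    public static int CountFailuresWithTryParse(string input, int iterations)
    {
        int failures = 0;
        int ignored;
        for (int i = 0; i < iterations; i++)
        {
            if (!int.TryParse(input, out ignored))
            {
                failures++;
            }
        }
        return failures;
    }

    // Times both approaches against the same bad value and prints the results.
    public static void Run()
    {
        const string badValue = "not a number";
        const int iterations = 10000;

        Stopwatch watch = Stopwatch.StartNew();
        int parseFailures = CountFailuresWithParse(badValue, iterations);
        watch.Stop();
        Console.WriteLine("Parse:    {0} failures in {1} ms",
            parseFailures, watch.ElapsedMilliseconds);

        watch = Stopwatch.StartNew();
        int tryParseFailures = CountFailuresWithTryParse(badValue, iterations);
        watch.Stop();
        Console.WriteLine("TryParse: {0} failures in {1} ms",
            tryParseFailures, watch.ElapsedMilliseconds);
    }
}
```

Call ParseTiming.Run() from a console application's Main method to print both timings.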

Closing thoughts

The Try-Parse pattern is fairly common in the .NET Framework, and you can find it on numeric types, dates, even the Dictionary class (TryGetValue).  Since it's a pattern, you can implement it yourself by following the Framework Design Guidelines (FDG) recommendations detailed here.  I've used it in the past for search methods and other situations where I want a result and also a boolean telling me if the operation was successful.  The pattern isn't for every situation, but it's another tool in your repertoire.
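
As a sketch of what rolling your own might look like, here's a small search method written in the same shape.  Everything in it (CustomerDirectory, TryFindCustomerName, the sample data) is hypothetical; the only framework calls are int.TryParse and Dictionary.TryGetValue.

```csharp
using System;
using System.Collections.Generic;

public static class CustomerDirectory
{
    // Hypothetical sample data standing in for a real data source.
    private static readonly Dictionary<int, string> namesByNumber = CreateSampleData();

    private static Dictionary<int, string> CreateSampleData()
    {
        Dictionary<int, string> data = new Dictionary<int, string>();
        data.Add(1001, "Contoso");
        data.Add(1002, "Fabrikam");
        return data;
    }

    // Try pattern: returns true and sets customerName when a match is found.
    // Returns false and sets customerName to null when the text isn't a
    // valid number or no customer has that number.
    public static bool TryFindCustomerName(string rawCustomerNumber, out string customerName)
    {
        customerName = null;

        int customerNumber;
        if (!int.TryParse(rawCustomerNumber, out customerNumber))
        {
            return false;
        }

        // TryGetValue follows the same pattern: a bool result plus an out value.
        return namesByNumber.TryGetValue(customerNumber, out customerName);
    }
}
```

The caller ends up with the same if-block as the Int32.TryParse example: declare the variable, call the Try method, and only touch the result inside the success branch.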

2 comments:

Anonymous said...

i found your blog on google. i just want to note that the performance of exceptions is not as bad as you claim once you turn on release optimizations and don't run it in a debugger.

Jimmy Bogard said...

My performance numbers are exactly what I claimed given the exact environment I ran it in, but I see your point, there were ways for it to get better. I don't pay too much attention to exact numbers in any case, as orders of magnitude and growth trends are better measures.

In any case, using exceptions as flow control is a bad idea. My main point wasn't that exceptions are expensive (which they are), but exceptions are, well, exceptional and shouldn't be relied on to make decisions if there are alternative means.