Mark Needham

Thoughts on Software Development

C#: Overcomplicating with LINQ

with 11 comments

I recently came across an interesting bit of code which was going through a collection of strings and then only taking the first ‘x’ number of characters and discarding the rest.

The code looked roughly like this:

var words = new[] {"hello", "to", "the", "world"};
var newWords = new List<string>();
foreach (string word in words)  
{
    if (word.Length > 3)
    {
        newWords.Add(word.Substring(0, 3));
        continue;
    }
    newWords.Add(word);
}

For this initial collection of words we would expect ‘newWords’ to contain [“hel”, “to”, “the”, “wor”]

In a way it’s quite annoying that the API for ‘Substring’ throws an exception if you try and get just the first 3 characters of a string which contains less than 3 characters. If it didn’t do that then we would have an easy ‘Select’ call on the collection.

Instead we have an annoying if statement which stops us from treating the collection as a whole – we do two different things depending on whether or not the string contains more than 3 characters.

In the spirit of the transformational mindset I tried to write some code using functional collection parameters which didn’t make use of an if statement.

Following this idea we pretty much have to split the collection into two resulting in this initial attempt:

var newWords = words
    .Where(w => w.Length > 3)
    .Select(w => w.Substring(0, 3))
    .Union(words.Where(w => w.Length <= 3).Select(w => w));

This resulted in a collection containing [“hel”, “wor”, “to”, “the”] which is now in a different order to the original!

To keep the original order I figured that we needed to keep track of the original index position of the words, resulting in this massively overcomplicated version:

var wordsWithIndex = words.Select((w, index) => new { w, index });
 
var newWords = wordsWithIndex
               .Where(a => a.w.Length >= 3)
               .Select((a, index) => new {w = a.w.Substring(0, 3), a.index})
               .Union(wordsWithIndex.Where(a => a.w.Length < 3).Select(a => new { a.w, a.index }))
               .OrderBy(a => a.index);

We end up with a collection of anonymous types from which we can get the transformed words but it’s a far worse solution than any of the others because it takes way longer to understand what’s going on.

I couldn’t see a good way to make use of functional collection parameters to solve this problem but luckily at this stage Chris Owen came over and pointed out that we could just do this:

var newWords = words.Select(w => w.Length > 3 ? w.Substring(0, 3) : w);

I’d been trying to avoid doing what is effectively an if statement inside a ‘Select’ but I think in this case it makes a lot of sense and results in a simple and easy to read solution.

Be Sociable, Share!

Written by Mark Needham

February 21st, 2010 at 12:01 pm

Posted in .NET

Tagged with ,

  • anon

    words.Select(w => w.Substring(0, Math.Min(3, w.Length)));

  • Tim Rossiter

    How about this:

    words.Select(w => w.Substring(0, Math.Min(3, w.Length)));

  • Ah neat that’s even better!

  • Juanma

    Another option would be create a method (either an extension method or a standard one) to encapsulate the “if”, and the write something like words.Select(x => MySubString(x));

  • Paul Cox

    I agree with having as little logic as possible in the lambda method. Maybe you could use an extension method to reproduce the old VB Left function?

    public static class StringExtensions
    {
    public static string Left(this string s, int length)
    {
    return s.Substring(0, Math.Min(length, s.Length));
    }
    }

    var newWords = words.Select(w => w.Left(3));

  • rob

    Whilst the code you gave was interesting to read I found myself asking ‘why bother?’ Maybe I’m missing something, but what value did you add to the code? It’s more concise, but not much more. Is it more testable, better performing it just different for differents sake?

  • @rob – I’m slightly obsessed with looking for ways to make code more declarative rather than imperative so I find it a bit of a challenge trying to see whether I can do it and in this case I was failing for a lot of the time.

    In general I think code written in a declarative way is easier to understand than imperative code in the sense that I find I don’t have to think about it as much because I can quickly scan it and see what it’s intent is rather than implementation.

    I think you could probably argue the opposite case here but if it’s easy to make the code declarative then I don’t see why not!

  • “Take” does what you want. (at least on Mono in the csharp repl)

    csharp>”foobar”.Take(3);
    —-> { ‘f’, ‘o’, ‘o’ }

    csharp> var words = new[] {“hello”, “to”, “the”, “world”};

    csharp> words.Select(w => w.Take(3));
    —-> { { ‘h’, ‘e’, ‘l’ }, { ‘t’, ‘o’ }, { ‘t’, ‘h’, ‘e’ }, { ‘w’, ‘o’, ‘r’ } }

    So now we have a list of list of characters, which need to get back into strings.

    csharp> words.Select(w => w.Take(3).Aggregate(“”, (acc,c) => acc + c.ToString()));
    —-> { “hel”, “to”, “the”, “wor” }

  • And I’d probably make an extension method called “Sum” on a list of characters, so the end result would look like:

    csharp> words.Select(w => w.Take(3).Sum());
    —-> { “hel”, “to”, “the”, “wor” }

  • Not sure if this is any better…just another way to skin a cat:

    words.Select(s => new string(s.ToCharArray().Take(3).ToArray()));

  • Pingback: Twitted by Jalpesh()