Mark Needham

Thoughts on Software Development

Functional C#: Using Join and GroupJoin

with 7 comments

An interesting problem which I’ve come across a few times recently is where we have two collections which we want to combine in some way to get a result, which could be either another collection or some other value.

In one example which Chris and I were playing around with, we had a collection of years and a collection of cars, each with a corresponding year. The requirement was to show all the years on the page, each with the first car we found for that year, or an empty value if there was no car for that year.

We effectively needed to do a left join on the cars collection.

This is an imperative way of solving the problem:

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}
 
var years = new[] { 2000, 2001, 2002, 2003 };
var cars = new[]
{
    new Car { Year = 2000, Description = "Honda" },
    new Car { Year = 2003, Description = "Ford" }
};
 
var newCars = new List<Car>();
foreach (var year in years)
{
    // Take the first car for the year, or fall back to a placeholder with a blank description.
    var car = cars.FirstOrDefault(x => x.Year == year) ?? new Car { Year = year, Description = "" };
    newCars.Add(car);
}

We can actually achieve the same result in a more declarative way by making use of ‘GroupJoin’:

var newCars = years.GroupJoin(cars,
                              year => year,
                              car => car.Year,
                              (year, theCars) => theCars.FirstOrDefault() ?? new Car { Year = year, Description = "" });

‘GroupJoin’ is useful if we want to keep all of the items in the first collection and, for each of them, get a collection of the items in the second collection whose keys match.

In this case it allows us to identify where there are no matching cars for a specific year and then just set a blank description for those years.

One nice side effect is that if we later want to include multiple cars for a year then we shouldn’t have to change the code too much to achieve that.
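
As a rough sketch of that variation, assuming we want every description for a year rather than just the first, the result selector can project the whole group instead of calling ‘FirstOrDefault’. The ‘YearSummary’ type and the ‘CarsByYear’ helper are hypothetical names made up for this illustration, and ‘Car’ is repeated so that the snippet stands on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

// Hypothetical result type: one entry per year with all matching descriptions.
public class YearSummary
{
    public int Year { get; set; }
    public List<string> Descriptions { get; set; }
}

public static class GroupJoinSketch
{
    // Keeps every year from the first collection; a year with no
    // matching cars ends up with an empty list of descriptions.
    public static List<YearSummary> CarsByYear(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        return years.GroupJoin(cars,
                               year => year,
                               car => car.Year,
                               (year, theCars) => new YearSummary
                               {
                                   Year = year,
                                   Descriptions = theCars.Select(c => c.Description).ToList()
                               })
                    .ToList();
    }
}
```

Only the result selector differs from the version above: the key selectors stay the same, and a year without cars now gets an empty list rather than a placeholder ‘Car’.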

Another example which I came across is where one collection contains filter criteria which we need to apply against the other collection.

We have a collection of years and need to indicate whether there is a matching car for each of those years.

[Test]
public void JoinExample()
{
    var years = new[] { 2000, 2003 };
    var cars = new[] { new Car { Year = 2000, Description = "Honda" },
                       new Car { Year = 2003, Description = "Ford" },
                       new Car { Year = 2003, Description = "Mercedes"}};
 
    Assert.That(AreThereMatchingCars(years, cars), Is.True);
}
 
public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
{
    foreach (var year in years)
    {
        if(cars.Where(c => c.Year == year).Count() == 0)
        {
            return false;
        }
    }
    return true;
}

We can rewrite this function like so:

public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
{
    var distinctCars = cars.GroupBy(x => x.Year).Select(x => x.First());
    return years.Join(distinctCars, y => y, c => c.Year, (y, c) => c).Count() == years.Count();
}

This actually became more complicated than we expected, because we were working out whether there were matching cars for each of the specified years by counting the filter items and comparing that with the number of items we got when we joined that collection with our collection of cars.

If we have more than one car for the same year that logic falls down, so we needed to get just one car per year, which is what the first line of the function does.
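
To make the duplicate-year problem concrete, here is a small sketch that counts the joined rows with and without the ‘GroupBy’ step. ‘Join’ produces one row per matching pair, so two cars for the same year give two rows for that year. The ‘JoinCountSketch’ class and its method names are hypothetical, made up for this illustration, and ‘Car’ is repeated so that the snippet stands on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

public static class JoinCountSketch
{
    // Joins years directly against cars: one row per (year, car) match.
    public static int JoinedRowCount(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        return years.Join(cars, y => y, c => c.Year, (y, c) => c).Count();
    }

    // Reduces the cars to one per year first, so each year matches at most once.
    public static int JoinedRowCountOnePerYear(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        var distinctCars = cars.GroupBy(x => x.Year).Select(x => x.First());
        return years.Join(distinctCars, y => y, c => c.Year, (y, c) => c).Count();
    }
}
```

With the years and cars from the test above, the first method gives 3 (Honda, Ford and Mercedes) while there are only 2 years, so the comparison would wrongly fail; the second gives 2, which matches the number of years.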

I can’t decide whether or not the code is easier to read and understand by making use of these functions, but it’s an approach that I picked up when playing around with F#, so it’s interesting that it can still be applied in C# code as well.


Written by Mark Needham

March 4th, 2010 at 6:55 pm

Posted in .NET


  • Instead of the GroupJoin couldn’t you, in this case, just Select over the years and output a car if there is one?

    var newCars = years.Select(year => cars.FirstOrDefault(car => car.Year == year) ?? new Car { Year = year, Description = "" });

  • And in the same spirit:

    var areThereMatchingCars = years.All(year => cars.FirstOrDefault(car => car.Year == year) == null ? false : true);

    Am I missing something here?

  • I’m also wondering if I’m missing something 🙂

    return years.All(y => cars.Any(c => c.Year == y));

  • @Jonas, Paul – at first glance it looks like you guys are right, I’ve massively overcomplicated those again, d’oh!

    I’m kinda intrigued as to why I’m doing that though – a simpler solution was pointed out for another problem I described a couple of weeks ago as well!

  • @Paul nice catch!

    @Mark I think that all of us do that a lot more than we realize. Even while commenting on exactly that, I still overcomplicated above. By being kind and posting your code in public you get the power of a thousand pair programmers. It’s great because both you and the readers are being educated.

  • I haven’t verified it, but it wouldn’t surprise me if my version (and similarly, Jonas’s) was less efficient. I haven’t investigated how the join operators work under the covers but it certainly seems possible that Mark’s version involves fewer iterations through the list of cars.

    Still, hopefully these lists aren’t big and we can go with the easier to read (if perhaps slower) approach 🙂

  • if(cars.Where(c => c.Year == year).Count() == 0)
    should really be
    if(cars.Where(c => c.Year == year).Any() == false)