Archive for the ‘c#’ tag
Functional C#: Using Join and GroupJoin
An interesting problem which I've come across a few times recently is where we have two collections which we want to use together in some way and get a result which could either be another collection or some other value.
In one example which Chris and I were playing around with, we had a collection of years and a collection of cars with corresponding years. The requirement was to show all the years on the page, each with the first car we found for that year or an empty value if there was no car for that year.
We effectively needed to do a left join on the cars collection.
This is an imperative way of solving the problem:
    public class Car
    {
        public int Year { get; set; }
        public string Description { get; set; }
    }

    var years = new[] { 2000, 2001, 2002, 2003 };
    var cars = new[]
    {
        new Car { Year = 2000, Description = "Honda" },
        new Car { Year = 2003, Description = "Ford" }
    };

    var newCars = new List<Car>();
    foreach (var year in years)
    {
        var car = cars.Where(x => x.Year == year).FirstOrDefault() ?? new Car { Year = year, Description = "" };
        newCars.Add(car);
    }
We can actually achieve the same result in a more declarative way by making use of 'GroupJoin':
    var newCars = years.GroupJoin(cars,
                                  year => year,
                                  car => car.Year,
                                  (year, theCars) => theCars.FirstOrDefault() ?? new Car { Year = year, Description = "" });
'GroupJoin' is useful if we want to keep all of the items in the first collection and get a collection of the items in the second collection which match for the specified keys.
In this case it allows us to identify where there are no matching cars for a specific year and then just set a blank description for those years.
One nice side effect is that if we later want to include multiple cars for a year then we shouldn't have to change the code too much to achieve that.
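As a rough sketch of that last point, this is one way the 'GroupJoin' call might look if we kept every matching car for a year instead of just the first. The 'CarsForYear' type and the 'GroupCarsByYear' method are hypothetical names made up for this illustration; they weren't part of the code we were working on.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

// Hypothetical result type, introduced only for this sketch.
public class CarsForYear
{
    public int Year { get; set; }
    public List<Car> Cars { get; set; }
}

public static class CarQueries
{
    // Keeps every year from the first collection and attaches all the
    // cars whose Year matches it (an empty list when there are none).
    public static List<CarsForYear> GroupCarsByYear(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        return years
            .GroupJoin(cars,
                       year => year,
                       car => car.Year,
                       (year, theCars) => new CarsForYear { Year = year, Cars = theCars.ToList() })
            .ToList();
    }
}
```

The only real change from the single car version is the result selector, which now keeps the whole group instead of calling 'FirstOrDefault' on it.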
Another example which I came across is where we have one collection containing filter criteria which we need to apply against the other collection.
We have a collection of years and need to indicate whether there is a matching car for each of those years.
    [Test]
    public void JoinExample()
    {
        var years = new[] { 2000, 2003 };
        var cars = new[]
        {
            new Car { Year = 2000, Description = "Honda" },
            new Car { Year = 2003, Description = "Ford" },
            new Car { Year = 2003, Description = "Mercedes" }
        };

        Assert.That(AreThereMatchingCars(years, cars), Is.True);
    }

    public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        foreach (var year in years)
        {
            if (cars.Where(c => c.Year == year).Count() == 0)
            {
                return false;
            }
        }
        return true;
    }
We can rewrite this function like so:
    public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        // Keep one car per year so duplicate years don't inflate the join count.
        var distinctCars = cars.GroupBy(x => x.Year).Select(x => x.First());
        return years.Join(distinctCars, y => y, c => c.Year, (y, c) => c).Count() == years.Count();
    }
This actually became more complicated than we expected because we were working out if there were matching cars for each of the specified years by counting the filter items and comparing that to the number of items we got when we joined that collection with our collection of cars.
If we have more than one car for the same year that logic falls down so we needed to get just one car per year which is what the first line of the function does.
I can't decide whether or not the code is easier to read and understand by making use of these functions but it's an approach that I picked up when playing around with F# so it's interesting that it can still be applied in C# code as well.
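For comparison, here is a sketch of a different approach which avoids the count comparison altogether by asking the question directly with 'All'. This isn't what we wrote at the time; the 'CarMatching' class name is made up for the example, and I'm assuming that building a 'HashSet' of the car years up front is acceptable.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

public static class CarMatching
{
    public static bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        // Collect the distinct car years once; several cars for the
        // same year collapse into a single entry in the set.
        var carYears = new HashSet<int>(cars.Select(c => c.Year));

        // True only if every requested year has at least one car.
        return years.All(year => carYears.Contains(year));
    }
}
```

Because the set only records whether a year is present, having more than one car for a year no longer needs any special handling.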
C#: Overcomplicating with LINQ
I recently came across an interesting bit of code which was going through a collection of strings and then only taking the first 'x' number of characters and discarding the rest.
The code looked roughly like this:
    var words = new[] { "hello", "to", "the", "world" };
    var newWords = new List<string>();
    foreach (string word in words)
    {
        if (word.Length > 3)
        {
            newWords.Add(word.Substring(0, 3));
            continue;
        }
        newWords.Add(word);
    }
For this initial collection of words we would expect 'newWords' to contain ["hel", "to", "the", "wor"]
In a way it's quite annoying that the API for 'Substring' throws an exception if you try to get just the first 3 characters of a string which contains fewer than 3 characters. If it didn't do that then we would have an easy 'Select' call on the collection.
Instead we have an annoying if statement which stops us from treating the collection as a whole – we do two different things depending on whether or not the string contains more than 3 characters.
In the spirit of the transformational mindset I tried to write some code using functional collection parameters which didn't make use of an if statement.
Following this idea we pretty much have to split the collection into two resulting in this initial attempt:
    var newWords = words
        .Where(w => w.Length > 3)
        .Select(w => w.Substring(0, 3))
        .Union(words.Where(w => w.Length <= 3).Select(w => w));
This resulted in a collection containing ["hel", "wor", "to", "the"] which is now in a different order to the original!
To keep the original order I figured that we needed to keep track of the original index position of the words, resulting in this massively overcomplicated version:
    var wordsWithIndex = words.Select((w, index) => new { w, index });

    var newWords = wordsWithIndex
        .Where(a => a.w.Length >= 3)
        .Select((a, index) => new { w = a.w.Substring(0, 3), a.index })
        .Union(wordsWithIndex.Where(a => a.w.Length < 3).Select(a => new { a.w, a.index }))
        .OrderBy(a => a.index);
We end up with a collection of anonymous types from which we can get the transformed words but it's a far worse solution than any of the others because it takes way longer to understand what's going on.
I couldn't see a good way to make use of functional collection parameters to solve this problem but luckily at this stage Chris Owen came over and pointed out that we could just do this:
var newWords = words.Select(w => w.Length > 3 ? w.Substring(0, 3) : w);
I'd been trying to avoid doing what is effectively an if statement inside a 'Select' but I think in this case it makes a lot of sense and results in a simple and easy to read solution.
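If the conditional inside the 'Select' still bothers us, one option is to hide it behind a small extension method. The following is only a sketch: 'Truncate' and 'TruncateAll' are hypothetical helpers that I've made up here, not methods which exist on 'string' or in LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Hypothetical helper: returns at most maxLength leading characters
    // and never throws for strings shorter than maxLength.
    public static string Truncate(this string value, int maxLength)
    {
        return value.Substring(0, Math.Min(value.Length, maxLength));
    }

    // Applies Truncate to every word, preserving the original order.
    public static IEnumerable<string> TruncateAll(this IEnumerable<string> words, int maxLength)
    {
        return words.Select(w => w.Truncate(maxLength));
    }
}
```

With that in place the call site could read 'words.Select(w => w.Truncate(3))', which is the easy 'Select' call I was hoping for in the first place.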
C#: A lack of covariance with generics example
One of the things I find most confusing when reading about programming languages is the idea of covariance and contravariance and while I've previously read that covariance is not possible when using generics in C# I recently came across an example where I saw that this was true.
I came across this problem while looking at how to refactor some code which has been written in an imperative style:
    public interface IFoo
    {
        string Bar { get; set; }
    }

    public class Foo : IFoo
    {
        public string Bar { get; set; }
    }

    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };

        var someFoos = new List<IFoo>();
        foreach (var s in someStrings)
        {
            someFoos.Add(new Foo { Bar = s });
        }
        return someFoos;
    }
I changed the code to read like so:
    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select(s => new Foo { Bar = s });
    }
Which fails with the following compilation error:
Error 1 Cannot implicitly convert type 'System.Collections.Generic.IEnumerable<Test.Foo>' to 'System.Collections.Generic.IEnumerable<Test.IFoo>'. An explicit conversion exists (are you missing a cast?)
I thought the compiler would infer that I actually wanted a collection of 'IFoo' given that I was returning from the method directly after the call to Select but it doesn't.
As I understand it, the reason that we can't convert an IEnumerable of 'Foo' to an IEnumerable of 'IFoo' is that we would run into problems if, later on in our program, we worked off the assumption that our original collection only contained Foos.
For example it would be possible to add any item which implemented the 'IFoo' interface into the collection even if it wasn't a 'Foo':
    // this code won't compile
    List<Foo> foos = new List<Foo>();
    // add some foos

    List<IFoo> ifoos = foos;
    // if the assignment above were allowed, this would put a non-Foo into 'foos'
    ifoos.Add(new SomeOtherTypeThatImplementsIFoo());
It's not possible to convert 'SomeOtherTypeThatImplementsIFoo' to 'Foo' so we would run ourselves into problems.
Rick Byers has a post from a few years ago where he explains how this works in more detail and also points out that covariance of generics is actually supported by the CLR, just not by C# (as of C# 3.0; C# 4.0 added variance for generic interfaces and delegates).
In the case I described we can get around the problem by casting 'Foo' to 'IFoo' inside the 'Select':
    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select(s => (IFoo) new Foo { Bar = s });
    }
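Two other ways of getting the same result, sketched here as alternatives rather than as what we actually did, are to name the type arguments to 'Select' explicitly or to call 'Cast' after the projection. The 'FooFactory' class and its method names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IFoo
{
    string Bar { get; set; }
}

public class Foo : IFoo
{
    public string Bar { get; set; }
}

public static class FooFactory
{
    // Option 1: state the result type up front, so the Foo returned
    // by the lambda is implicitly converted to IFoo.
    public static IEnumerable<IFoo> WithExplicitTypeArguments(IEnumerable<string> names)
    {
        return names.Select<string, IFoo>(s => new Foo { Bar = s });
    }

    // Option 2: project to Foo first, then cast each element to IFoo.
    public static IEnumerable<IFoo> WithCast(IEnumerable<string> names)
    {
        return names.Select(s => new Foo { Bar = s }).Cast<IFoo>();
    }
}
```

All three versions end up with an 'IEnumerable<IFoo>'; which one reads best is probably a matter of taste.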
C#: Causing myself pain with LINQ's delayed evaluation
I recently came across some code which was imperatively looping through a collection and mapping each value to something else by using an injected dependency.
I thought I'd try to make use of functional collection parameters to try and simplify the code a bit but actually ended up breaking one of the tests.
About a month ago I wrote about how I'd written a hand rolled stub to simplify a test and this was actually where I caused myself the problem!
The hand rolled stub was defined like this:
    public class AValueOnFirstCallThenAnotherValueService : IService
    {
        private int numberOfCalls = 0;

        public string SomeMethod(string parameter)
        {
            if (numberOfCalls == 0)
            {
                numberOfCalls++;
                return "aValue";
            }
            else
            {
                numberOfCalls++;
                return "differentValue";
            }
        }
    }
The test was something like this:
    [Test]
    public void SomeTest()
    {
        var fooOne = new Foo { Bar = "barOne" };
        var fooTwo = new Foo { Bar = "barTwo" };
        var aCollectionOfFoos = new List<Foo> { fooOne, fooTwo };

        var service = new AValueOnFirstCallThenAnotherValueService();

        var someObject = new SomeObject(service);

        var fooBars = someObject.Method(aCollectionOfFoos);

        Assert.That(fooBars[0].Other, Is.EqualTo("aValue"));
        // and so on
    }
The object under test looked something like this:
    public class SomeObject
    {
        private IService service;

        public SomeObject(IService service)
        {
            this.service = service;
        }

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = new List<FooBar>();
            foreach (var foo in foos)
            {
                fooBars.Add(new FooBar { Bar = foo.Bar, Other = service.SomeMethod(foo.Bar) });
            }

            // a bit further down

            var sortedFooBars = fooBars.OrderBy(f => f.Other);

            return fooBars;
        }
    }
I decided to try and incrementally refactor the code like so:
    public class SomeObject
    {
        ...

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = foos.Select(f => new FooBar { Bar = f.Bar, Other = service.SomeMethod(f.Bar) });

            // a bit further down

            var sortedFooBars = fooBars.OrderBy(f => f.Other);

            return fooBars;
        }
    }
I ran the tests after doing this and the test I described above failed – it was expecting a return value for 'Other' of 'aValue' but was actually returning 'differentValue'.
I was a bit confused about what was going on until I started watching what the test was doing through the debugger and realised that every enumeration of 'fooBars' re-runs the 'Select'. By the time the test read the result, the sequence had already been enumerated once for the 'OrderBy', so 'service.SomeMethod' was being called for the 3rd and 4th time and returned 'differentValue', since it's set up to return 'aValue' only on the 1st call.
The way to get around this problem was to force the evaluation of 'fooBars' to happen immediately by calling 'ToList()':
    public class SomeObject
    {
        ...

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = foos.Select(f => new FooBar { Bar = f.Bar, Other = service.SomeMethod(f.Bar) }).ToList();
            ...
        }
    }
In this case it was fairly easy to identify the problem but I've written similar code before which has ended up reordering collections containing thousands of items because the collection was lazily re-evaluated every time it was needed.
In Jeremy Miller's article about functional C# he suggests the idea of memoization as an optimisation technique to stop expensive calls being made more times than they need to be so perhaps this would be another way to solve the problem although I haven't tried that approach before.
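I haven't used this in anger, but a minimal sketch of what a memoizing wrapper might look like is below. The 'Memoizer' class is a made up name, and the sketch assumes single threaded use and non-null arguments (a 'Dictionary' won't accept a null key).

```csharp
using System;
using System.Collections.Generic;

public static class Memoizer
{
    // Wraps func so that each distinct argument is only evaluated once;
    // later calls with the same argument return the cached result.
    // Assumes single-threaded use and non-null arguments.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> func)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            TResult result;
            if (!cache.TryGetValue(arg, out result))
            {
                result = func(arg);
                cache[arg] = result;
            }
            return result;
        };
    }
}
```

If the 'Select' called a memoized version of 'service.SomeMethod', re-enumerating the sequence would reuse the earlier answers instead of hitting the service again, although a new 'FooBar' would still be created each time. Calling 'ToList()' is the simpler fix when we just want the sequence evaluated once.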
Functional C#: Extracting a higher order function with generics
While working on some code with Toni we realised that we'd managed to create two functions that were almost exactly the same except they made different service calls and returned collections of a different type.
The similar functions were like this:
    private IEnumerable<Foo> GetFoos(Guid id)
    {
        IEnumerable<Foo> foos = new List<Foo>();

        try
        {
            foos = fooService.GetFoosFor(id);
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return foos;
    }

    private IEnumerable<Bar> GetBars(Guid id)
    {
        IEnumerable<Bar> bars = new List<Bar>();

        try
        {
            bars = barService.GetBarsFor(id);
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return bars;
    }
We're defining the empty lists so that if the service throws an exception we can make use of an empty list further on in the code. A failure of the service in this context doesn't mean that the application should stop functioning.
My thinking here was that we should be able to pull out the service calls into a function but the annoying thing is that they return different types of collections so I initially thought that we'd be unable to remove the duplication.
Thinking about the problem later on I realised we could just define the return value of the service call in the function to use generics.
We therefore end up with this solution:
    private IEnumerable<Bar> GetBars(Guid id)
    {
        return GetValues(() => barService.GetBarsFor(id));
    }

    private IEnumerable<Foo> GetFoos(Guid id)
    {
        return GetValues(() => fooService.GetFoosFor(id));
    }

    private IEnumerable<T> GetValues<T>(Func<IEnumerable<T>> getValues)
    {
        IEnumerable<T> values = new List<T>();

        try
        {
            values = getValues();
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return values;
    }
I think the code is still quite readable and it's relatively obvious what it's supposed to be doing.
Functional C#: LINQ vs Method chaining
One of the common discussions that I've had with several colleagues when we're making use of some of the higher order functions that can be applied on collections is whether to use the LINQ style syntax or to chain the different methods together.
I tend to prefer the latter approach although when asked the question after my talk at Developer Developer Developer I didn't really have a good answer other than to suggest that it seemed to just be a personal preference thing.
Damian Marshall suggested that he preferred the method chaining approach because it more clearly describes the idea of passing a collection through a pipeline where we can apply different operations to that collection.
I quite like that explanation and I think my preference for it would have probably been influenced by the fact that when coding in F# we can use the forward piping operator to achieve code which reads like this.
For example if we had a list and wanted to get all the even numbers, double them and then add them up we might do this:
    [1..10]
    |> List.filter (fun x -> x % 2 = 0)
    |> List.map (fun x -> x * 2)
    |> List.fold (fun acc x -> acc + x) 0
If I was in C# I'd probably do this:
    Enumerable.Range(1, 10)
        .Where(x => x % 2 == 0)
        .Select(x => x * 2)
        .Sum(x => x);
I found it quite difficult to work out what the equivalent LINQ syntax would be because I don't use it but I think something like this would be what you'd need to write to do the same thing:
    (from x in Enumerable.Range(1, 10)
     where x % 2 == 0
     select x * 2).Sum(x => x);
I'm not sure if there's a way to do the sum within the LINQ statement or whether you need to do it using the method as I have here.
Even just writing this example I found that the way I had to write the LINQ code seemed quite counter intuitive for me with the way that I typically try to solve problems like this.
At least, thanks to Damian, I now understand why that is!
Functional C#: Writing a 'partition' function
One of the more interesting higher order functions that I've come across while playing with F# is the partition function which is similar to the filter function except it returns the values which meet the predicate passed in as well as the ones which don't.
I came across an interesting problem recently where we needed to do exactly this and had ended up taking a more imperative 'foreach' style approach to solve the problem because this function doesn't exist in C# as far as I know.
In F# the function makes use of a tuple to do this so if we want to create the function in C# then we need to define a tuple object first.
    public class Tuple<TFirst, TSecond>
    {
        private readonly TFirst first;
        private readonly TSecond second;

        public Tuple(TFirst first, TSecond second)
        {
            this.first = first;
            this.second = second;
        }

        public TFirst First
        {
            get { return first; }
        }

        public TSecond Second
        {
            get { return second; }
        }
    }

    public static class IEnumerableExtensions
    {
        public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerableOf, Func<T, bool> predicate)
        {
            var positives = enumerableOf.Where(predicate);
            var negatives = enumerableOf.Where(e => !predicate(e));
            return new Tuple<IEnumerable<T>, IEnumerable<T>>(positives, negatives);
        }
    }
I'm not sure of the best way to write this function – at the moment we end up creating two iterators to cover the two different filters that we're running over the collection which seems a bit strange.
In F# 'partition' is on List so the whole collection would be evaluated whereas in this case we're still only evaluating each item as it's needed so maybe there isn't a way to do it without using two iterators.
If we wanted to use this function to get the evens and odds from a collection we could write the following code:
    var evensAndOdds = Enumerable.Range(1, 10).Partition(x => x % 2 == 0);

    var evens = evensAndOdds.First;
    var odds = evensAndOdds.Second;
The other thing that's nice about F# is that we can assign the result of the expression to two separate values in one go and I don't know of a way to do that in C#.
let evens, odds = [1..10] |> List.partition (fun x -> x % 2 = 0)
We don't need to have the intermediate variable 'evensAndOdds' which doesn't really add much to the code.
I'd be interested in knowing if there's a better way to do this than what I'm trying out.
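One possible alternative, sketched here under a couple of assumptions, is to give up on lazy evaluation and walk the collection once, which is closer to what F#'s 'List.partition' does. The example also assumes a compiler with tuple support (C# 7 or later), which wasn't available when I originally wrote this; 'PartitionEagerly' is a made up name.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionExtensions
{
    // Walks the source exactly once and sorts each item into one of two
    // lists, so the whole collection is evaluated up front.
    // The tuple return type assumes C# 7 or later.
    public static (List<T> Matches, List<T> Rest) PartitionEagerly<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            (predicate(item) ? matches : rest).Add(item);
        }
        return (matches, rest);
    }
}
```

With tuple deconstruction the call site gets close to the F# version, since 'var (evens, odds) = Enumerable.Range(1, 10).PartitionEagerly(x => x % 2 == 0);' assigns both values in one go. The trade-off is that we lose the item by item evaluation of the two iterator version.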
Functional collectional parameters: Some thoughts
I've been reading through a bit of Steve Freeman and Nat Pryce's 'Growing Object Oriented Software guided by tests' book and I found the following observation in chapter 7 quite interesting:
When starting a new area of code, we might temporarily suspend our design judgment and just write code without attempting to impose much structure.
It's interesting that they don't try and write perfect code the first time around which is actually something I thought experienced developers did until I came across Uncle Bob's Clean Code book where he suggested something similar.
One thing I've noticed when working with collections is that if we want to do something more complicated than just doing a simple map or filter then I find myself initially trying to work through the problem in an imperative hacky way.
When pairing it sometimes also seems easier to talk through the code in an imperative way; once we've got that figured out we can work out a way to solve the problem more declaratively by making use of functional collection parameters.
An example of this which we came across recently was while looking to parse a file which had data like this:
some,random,data,that,i,made,up
The file was being processed later on and the values inserted into the database in field order. The problem was that we had removed two database fields so we needed to get rid of the 2nd and 3rd values from each line.
    var stringBuilder = new StringBuilder();
    using (var sr = new StreamReader("c:\\test.txt"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            var values = line.Split(',');
            var localBuilder = new StringBuilder();
            var count = 0;
            foreach (var value in values)
            {
                if (!(count == 1 || count == 2))
                {
                    localBuilder.Append(value);
                    localBuilder.Append(",");
                }
                count++;
            }
            stringBuilder.AppendLine(localBuilder.ToString().Remove(localBuilder.ToString().Length - 1));
        }
    }

    using (var writer = new StreamWriter("c:\\newfile.txt"))
    {
        writer.Write(stringBuilder.ToString());
        writer.Flush();
    }
If we wanted to refactor that to use a more declarative style then the first thing we'd look to change is the foreach loop populating the localBuilder.
We have a temporary 'count' variable which is keeping track of which column we're up to, which suggests that we should be able to use one of the higher order functions over collections which allows us to refer to the index of the item.
In this case we can use the 'Where' function to achieve this:
    ...
    while ((line = sr.ReadLine()) != null)
    {
        var localBuilder = line.Split(',').
            Where((_, index) => !(index == 1 || index == 2)).
            Aggregate(new StringBuilder(), (builder, v) => builder.Append(v).Append(","));

        stringBuilder.AppendLine(localBuilder.ToString().Remove(localBuilder.ToString().Length - 1));
    }
I've been playing around with 'Aggregate' a little bit and it seems like it's quite easy to overcomplicate code using that. It also seems that when using 'Aggregate' it makes sense if the method that we call on our seed returns itself rather than void.
I didn't realise that 'Append' did that so my original code was like this:
    var localBuilder = line.Split(',').
        Where((_, index) => !(index == 1 || index == 2)).
        Aggregate(new StringBuilder(), (builder, v) =>
        {
            builder.Append(v);
            builder.Append(",");
            return builder;
        });
I think if we end up having to call functions which return void or some other type then it would probably make sense to add on an extension method which allows us to use the object in a fluent interface style.
Of course this isn't the best solution since we would ideally avoid the need to remove the last character to get rid of the trailing comma, which could be done by creating an array of values and then using 'String.Join' on that.
Even so, I still think the solution written using functional collection parameters is easier to follow since we've managed to get rid of two variable assignments which weren't interesting as part of what we wanted to do but were details about that specific implementation.
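The 'String.Join' idea could look something like the following sketch, which handles a single line. The 'LineFilter' class and its method name are invented for the example and the file reading and writing is left out.

```csharp
using System.Linq;

public static class LineFilter
{
    // Drops the 2nd and 3rd comma-separated values (indexes 1 and 2)
    // and joins the rest back together without a trailing comma.
    public static string RemoveSecondAndThirdValues(string line)
    {
        var kept = line.Split(',')
                       .Where((_, index) => !(index == 1 || index == 2))
                       .ToArray();
        return string.Join(",", kept);
    }
}
```

Since 'String.Join' only puts the separator between values, there is no last character to strip off and no 'StringBuilder' needed for each line.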
Coding: Missing abstractions and LINQ
Something which I've noticed quite a lot on the projects that I've worked on since C# 3.0 was released is that lists seem to be passed around code much more and have LINQ style filters and transformations performed on them while failing to describe the underlying abstraction explicitly in the code.
As a result we quite frequently end up with this code in multiple places, and since it's usually not very much code the repetition goes unnoticed more than other types of duplication might.
A typical example of this might be the following:
    public class SomeFooHolder
    {
        public List<Foo> Foos { get; set; }
    }
An example of how this might be used is like so:
    var someFooHolder = new SomeFooHolder(...);
    someFooHolder.Foos.Where(f => f.Completed);
That code would typically be repeated in other places in the code where we want to get all the completed foos.
Although it's a simple change, as a first step I prefer to make that concept more explicit by putting 'CompletedFoos' on 'SomeFooHolder':
    public class SomeFooHolder
    {
        public List<Foo> Foos { get; set; }

        public List<Foo> CompletedFoos
        {
            // 'Where' filters the foos; 'ToList' matches the declared return type
            get { return Foos.Where(f => f.Completed).ToList(); }
        }
    }
Perhaps an even better solution would be to create an object 'Foos' to encapsulate that logic further:
    public class Foos
    {
        private readonly List<Foo> foos;

        public Foos(List<Foo> foos)
        {
            this.foos = new List<Foo>(foos.AsReadOnly());
        }

        public Foos Completed
        {
            // 'Where' filters the foos; 'ToList' matches the constructor parameter
            get { return new Foos(foos.Where(f => f.Completed).ToList()); }
        }
    }
As I've written about previously I prefer to wrap the list rather than extend it as the API of 'Foos' is more expressive since we don't have all the list operations available to any potential users of the class.
C# Test Builder Pattern: My current thinking
I've written previously about the test builder pattern in C# and having noticed some different implementations of this pattern I thought it'd be interesting to post my current thinking on how to use it.
One thing I've noticed is that we often end up just creating methods which effectively act as setters rather than easing the construction of an object.
This seems to happen most commonly when the value we want to set is a boolean value. The following is quite typical:
    public CarBuilder WithSomething(bool value)
    {
        this.something = value;
        return this;
    }
The usage would be like so:
new CarBuilder().WithSomething(true).Build();
It doesn't read too badly but it seems to be unnecessary typing for the user of the API. If we're going to use the fluent interface approach then I would prefer to have two methods, defined like so:
    public CarBuilder WithSomething()
    {
        this.something = true;
        return this;
    }

    public CarBuilder WithoutSomething()
    {
        this.something = false;
        return this;
    }
We could then use this like so:
    new CarBuilder().WithSomething().Build();
    ...
    new CarBuilder().WithoutSomething().Build();
That requires more code but I think it's a bit more expressive and makes life easier for the user of the API.
An alternative approach which my colleague Lu Ning showed me and which I think is actually better is to make use of the object initializer syntax if all we have are setter methods on a builder.
We might therefore end up with something like this:
    public class FooBuilder
    {
        public string Something = "DefaultSomething";
        public bool SomethingElse = false;

        public Foo Build()
        {
            return new Foo(Something, SomethingElse);
        }
    }
new FooBuilder { SomethingElse = true }.Build();
With this approach we end up writing less code and although we use public fields on the builder I don't think it's a big deal since it allows us to achieve our goal more quickly. If we need other methods that take out the complexity of construction then we can easily just add those as well.
Another thing to note when using this pattern is that we don't need to override all the object's attributes on every single test. We only need to override those ones which we are using in our test. The rest of the values can just be defaulted.
    [Test]
    public void ShouldDoSomething()
    {
        var foo = new FooBuilder
        {
            Bar = "myBar",
            Baz = "myBaz",
            SomethingElse = "mySE",
            AndAgain = "..."
        }.Build();

        // and then further on we only check the value of Bar for example
        Assert.That(someValue, Is.EqualTo("myBar"));
    }
We don't need to specifically set any of the values except 'Bar' because they are irrelevant and create clutter in the test which means it takes longer to understand what's going on.
This would be preferable:
    [Test]
    public void ShouldDoSomething()
    {
        var foo = new FooBuilder { Bar = "myBar" }.Build();

        // and then further on we only check the value of Bar for example
        Assert.That(someValue, Is.EqualTo("myBar"));
    }
Something which I've been wondering about recently is understanding the best way to describe the case where we don't want to define a specific value i.e. we want it as null.
I'd normally just set it to null like so:
new FooBuilder { Bar = null }.Build();
But we could make the API have a specific method and I haven't really decided whether there's much value to doing so yet:
new FooBuilder().WithNoBar().Build();