Archive for the ‘.NET’ Category
Functional C#: Using Join and GroupJoin
An interesting problem which I've come across a few times recently is where we have two collections which we want to use together in some way and get a result which could either be another collection or some other value.
In one example which Chris and I were playing around with, we had a collection of years and a collection of cars with corresponding years. The requirement was to show all the years on the page, each with the first car we found for that year or an empty value if there was no car for that year.
We effectively needed to do a left join on the cars collection.
This is an imperative way of solving the problem:
    public class Car
    {
        public int Year { get; set; }
        public string Description { get; set; }
    }

    var years = new[] { 2000, 2001, 2002, 2003 };
    var cars = new[]
    {
        new Car { Year = 2000, Description = "Honda" },
        new Car { Year = 2003, Description = "Ford" }
    };

    var newCars = new List<Car>();
    foreach (var year in years)
    {
        var car = cars.Where(x => x.Year == year).FirstOrDefault()
                  ?? new Car { Year = year, Description = "" };
        newCars.Add(car);
    }
We can actually achieve the same result in a more declarative way by making use of 'GroupJoin':
    var newCars = years.GroupJoin(
        cars,
        year => year,
        car => car.Year,
        (year, theCars) => theCars.FirstOrDefault() ?? new Car { Year = year, Description = "" });
'GroupJoin' is useful if we want to keep all of the items in the first collection and get a collection of the items in the second collection which match for the specified keys.
In this case it allows us to identify where there are no matching cars for a specific year and then just set a blank description for those years.
One nice side effect is that if we later want to include multiple cars for a year then we shouldn't have to change the code too much to achieve that.
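As a rough sketch of that last point, the result selector can keep the whole group instead of taking the first car. The 'DescribeCarsByYear' helper and the "year: descriptions" output format are invented for this illustration and are not from the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

public static class CarsByYearExample
{
    // Hypothetical helper: builds one "year: descriptions" line per year.
    // Years without any car still appear, with an empty description list.
    public static List<string> DescribeCarsByYear(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        return years
            .GroupJoin(
                cars,
                year => year,
                car => car.Year,
                (year, theCars) =>
                    year + ": " + string.Join(", ", theCars.Select(c => c.Description).ToArray()))
            .ToList();
    }

    public static void Main()
    {
        var years = new[] { 2000, 2001, 2003 };
        var cars = new[]
        {
            new Car { Year = 2000, Description = "Honda" },
            new Car { Year = 2003, Description = "Ford" },
            new Car { Year = 2003, Description = "Mercedes" }
        };

        foreach (var line in DescribeCarsByYear(years, cars))
        {
            Console.WriteLine(line);
        }
    }
}
```

Only the last lambda differs from the 'FirstOrDefault' version, which is why the change stays small.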
Another example which I came across is where one collection contains filter criteria which need to be applied against the other collection.
We have a collection of years and need to indicate whether there is a matching car for each of those years.
    [Test]
    public void JoinExample()
    {
        var years = new[] { 2000, 2003 };
        var cars = new[]
        {
            new Car { Year = 2000, Description = "Honda" },
            new Car { Year = 2003, Description = "Ford" },
            new Car { Year = 2003, Description = "Mercedes" }
        };

        Assert.That(AreThereMatchingCars(years, cars), Is.True);
    }

    public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        foreach (var year in years)
        {
            if (cars.Where(c => c.Year == year).Count() == 0)
            {
                return false;
            }
        }
        return true;
    }
We can rewrite this function like so:
    public bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        var distinctCars = cars.GroupBy(x => x.Year).Select(x => x.First());
        return years.Join(distinctCars, y => y, c => c.Year, (y, c) => c).Count() == years.Count();
    }
This actually became more complicated than we expected. We were working out whether there was a matching car for each of the specified years by joining the years with the cars and then comparing the number of joined items with the number of years.
If we have more than one car for the same year that logic falls down, so we needed to get just one car per year, which is what the first line of the function does.
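A different way to express the same check, which is not the approach taken above, is to ask whether every year appears among the cars' years. That avoids the counting, and with it the duplicate-car problem. A minimal sketch, with the 'MatchingCars' class name made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Year { get; set; }
    public string Description { get; set; }
}

public static class MatchingCars
{
    // True when every year has at least one car; duplicates in either
    // collection do not affect the answer.
    public static bool AreThereMatchingCars(IEnumerable<int> years, IEnumerable<Car> cars)
    {
        // Collect the distinct car years once so each lookup is a set membership test.
        var carYears = new HashSet<int>(cars.Select(c => c.Year));
        return years.All(year => carYears.Contains(year));
    }

    public static void Main()
    {
        var cars = new[]
        {
            new Car { Year = 2000, Description = "Honda" },
            new Car { Year = 2003, Description = "Ford" },
            new Car { Year = 2003, Description = "Mercedes" }
        };

        Console.WriteLine(AreThereMatchingCars(new[] { 2000, 2003 }, cars)); // True
        Console.WriteLine(AreThereMatchingCars(new[] { 2000, 2001 }, cars)); // False
    }
}
```

Like the imperative loop, this returns true for an empty collection of years.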
I can't decide whether or not the code is easier to read and understand by making use of these functions but it's an approach that I picked up when playing around with F# so it's interesting that it can still be applied in C# code as well.
C#: Overcomplicating with LINQ
I recently came across an interesting bit of code which was going through a collection of strings and then only taking the first 'x' number of characters and discarding the rest.
The code looked roughly like this:
    var words = new[] { "hello", "to", "the", "world" };
    var newWords = new List<string>();
    foreach (string word in words)
    {
        if (word.Length > 3)
        {
            newWords.Add(word.Substring(0, 3));
            continue;
        }
        newWords.Add(word);
    }
For this initial collection of words we would expect 'newWords' to contain ["hel", "to", "the", "wor"]
In a way it's quite annoying that 'Substring' throws an exception if you try to get just the first 3 characters of a string which contains fewer than 3 characters. If it didn't do that then we would have an easy 'Select' call on the collection.
Instead we have an annoying if statement which stops us from treating the collection as a whole – we do two different things depending on whether or not the string contains more than 3 characters.
In the spirit of the transformational mindset I tried to write some code using functional collection parameters which didn't make use of an if statement.
Following this idea we pretty much have to split the collection into two resulting in this initial attempt:
    var newWords = words
        .Where(w => w.Length > 3)
        .Select(w => w.Substring(0, 3))
        .Union(words.Where(w => w.Length <= 3).Select(w => w));
This resulted in a collection containing ["hel", "wor", "to", "the"] which is now in a different order to the original!
To keep the original order I figured that we needed to keep track of the original index position of the words, resulting in this massively overcomplicated version:
    var wordsWithIndex = words.Select((w, index) => new { w, index });

    var newWords = wordsWithIndex
        .Where(a => a.w.Length >= 3)
        .Select((a, index) => new { w = a.w.Substring(0, 3), a.index })
        .Union(wordsWithIndex.Where(a => a.w.Length < 3).Select(a => new { a.w, a.index }))
        .OrderBy(a => a.index);
We end up with a collection of anonymous types from which we can get the transformed words but it's a far worse solution than any of the others because it takes way longer to understand what's going on.
I couldn't see a good way to make use of functional collection parameters to solve this problem but luckily at this stage Chris Owen came over and pointed out that we could just do this:
var newWords = words.Select(w => w.Length > 3 ? w.Substring(0, 3) : w);
I'd been trying to avoid doing what is effectively an if statement inside a 'Select' but I think in this case it makes a lot of sense and results in a simple and easy to read solution.
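For what it's worth, the exception from 'Substring' can also be sidestepped by clamping the requested length with 'Math.Min', which leaves a single 'Select' with no conditional at all. This is only a sketch of that variation, and the 'TruncateAll' helper name is an assumption made for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TruncateExample
{
    // Keeps at most maxLength leading characters of each word.
    // Math.Min stops Substring from being asked for more characters than exist.
    public static List<string> TruncateAll(IEnumerable<string> words, int maxLength)
    {
        return words
            .Select(w => w.Substring(0, Math.Min(w.Length, maxLength)))
            .ToList();
    }

    public static void Main()
    {
        var words = new[] { "hello", "to", "the", "world" };
        Console.WriteLine(string.Join(", ", TruncateAll(words, 3).ToArray()));
        // prints "hel, to, the, wor"
    }
}
```

Whether this reads better than the conditional inside the 'Select' is a matter of taste; the order of the words is preserved either way.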
C#: A lack of covariance with generics example
One of the things I find most confusing when reading about programming languages is the idea of covariance and contravariance. I'd previously read that covariance is not possible when using generics in C#, and I recently came across an example which showed me that this is true.
I came across this problem while looking at how to refactor some code which has been written in an imperative style:
    public interface IFoo
    {
        string Bar { get; set; }
    }

    public class Foo : IFoo
    {
        public string Bar { get; set; }
    }

    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };
        var someFoos = new List<IFoo>();
        foreach (var s in someStrings)
        {
            someFoos.Add(new Foo { Bar = s });
        }
        return someFoos;
    }
I changed the code to read like so:
    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select(s => new Foo { Bar = s });
    }
Which fails with the following compilation error:
Error 1 Cannot implicitly convert type 'System.Collections.Generic.IEnumerable<Test.Foo>' to 'System.Collections.Generic.IEnumerable<Test.IFoo>'. An explicit conversion exists (are you missing a cast?)
I thought the compiler would infer that I actually wanted a collection of 'IFoo' given that I was returning from the method directly after the call to Select but it doesn't.
As I understand it, the reason that we can't convert an IEnumerable of 'Foo' to an IEnumerable of 'IFoo' is that we would run into problems if, later on in our program, we worked off the assumption that our original collection only contained Foos.
For example it would be possible to add any item which implemented the 'IFoo' interface into the collection even if it wasn't a 'Foo':
    // this code won't compile
    List<Foo> foos = new List<Foo>();
    // add some foos
    List<IFoo> ifoos = foos;
    ifoos.Add(new SomeOtherTypeThatImplementsIFoo());

'foos' and 'ifoos' would refer to the same list, and it's not possible to convert 'SomeOtherTypeThatImplementsIFoo' to 'Foo', so we would run ourselves into problems.
Rick Byers has a post from a few years ago where he explains how this works in more detail and also points out that covariance of generics is actually supported by the CLR, just not by C#.
In the case I described we can get around the problem by casting 'Foo' to 'IFoo' inside the 'Select':
    private IEnumerable<IFoo> GetMeFoos()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select(s => (IFoo) new Foo { Bar = s });
    }
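Two other spellings of the same workaround are sketched below: giving 'Select' its type arguments explicitly, or appending 'Cast<IFoo>()'. The 'FooFactory' class and the two method names are invented for the example. It's also worth noting that from C# 4 onwards 'IEnumerable<T>' is declared covariant, so on those versions the version without the cast compiles as it stands.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IFoo
{
    string Bar { get; set; }
}

public class Foo : IFoo
{
    public string Bar { get; set; }
}

public static class FooFactory
{
    // Option 1: state the result type of Select explicitly, so the lambda's
    // Foo is implicitly converted to IFoo.
    public static IEnumerable<IFoo> GetMeFoosWithTypeArguments()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select<string, IFoo>(s => new Foo { Bar = s });
    }

    // Option 2: project to Foo, then cast each element to IFoo.
    public static IEnumerable<IFoo> GetMeFoosWithCast()
    {
        var someStrings = new[] { "mike", "mark" };
        return someStrings.Select(s => new Foo { Bar = s }).Cast<IFoo>();
    }

    public static void Main()
    {
        foreach (var foo in GetMeFoosWithCast())
        {
            Console.WriteLine(foo.Bar);
        }
    }
}
```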
C#: Causing myself pain with LINQ's delayed evaluation
I recently came across some code which was imperatively looping through a collection and mapping each value to something else by using an injected dependency.
I thought I'd try to make use of functional collection parameters to try and simplify the code a bit but actually ended up breaking one of the tests.
About a month ago I wrote about how I'd written a hand rolled stub to simplify a test and this was actually where I caused myself the problem!
The hand rolled stub was defined like this:
    public class AValueOnFirstCallThenAnotherValueService : IService
    {
        private int numberOfCalls = 0;

        public string SomeMethod(string parameter)
        {
            if (numberOfCalls == 0)
            {
                numberOfCalls++;
                return "aValue";
            }
            else
            {
                numberOfCalls++;
                return "differentValue";
            }
        }
    }
The test was something like this:
    [Test]
    public void SomeTest()
    {
        var fooOne = new Foo { Bar = "barOne" };
        var fooTwo = new Foo { Bar = "barTwo" };
        var aCollectionOfFoos = new List<Foo> { fooOne, fooTwo };

        var service = new AValueOnFirstCallThenAnotherValueService();
        var someObject = new SomeObject(service);

        var fooBars = someObject.Method(aCollectionOfFoos);

        // 'Method' returns an IEnumerable, which can't be indexed directly
        Assert.That(fooBars.First().Other, Is.EqualTo("aValue"));
        // and so on
    }
The object under test looked something like this:
    public class SomeObject
    {
        private IService service;

        public SomeObject(IService service)
        {
            this.service = service;
        }

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = new List<FooBar>();
            foreach (var foo in foos)
            {
                fooBars.Add(new FooBar { Bar = foo.Bar, Other = service.SomeMethod(foo.Bar) });
            }

            // a bit further down
            var sortedFooBars = fooBars.OrderBy(f => f.Other);

            return fooBars;
        }
    }
I decided to try and incrementally refactor the code like so:
    public class SomeObject
    {
        ...

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = foos.Select(f => new FooBar { Bar = f.Bar, Other = service.SomeMethod(f.Bar) });

            // a bit further down
            var sortedFooBars = fooBars.OrderBy(f => f.Other);

            return fooBars;
        }
    }
I ran the tests after doing this and the test I described above failed – it was expecting a return value for 'Other' of 'aValue' but was actually returning 'differentValue'.
I was a bit confused about what was going on until I started watching what the test was doing through the debugger and realised that the 'Select' call on line 7 was being re-evaluated for the 'OrderBy' call on line 10. That meant 'service.SomeMethod' was being called for the 3rd and 4th time, and since it's set up to return 'aValue' only on the 1st call it returned 'differentValue'.
The way to get around this problem was to force the evaluation of 'fooBars' to happen immediately by calling 'ToList()':
    public class SomeObject
    {
        ...

        public IEnumerable<FooBar> Method(List<Foo> foos)
        {
            var fooBars = foos.Select(f => new FooBar { Bar = f.Bar, Other = service.SomeMethod(f.Bar) }).ToList();
            ...
        }
    }

In this case it was fairly easy to identify the problem, but I've written similar code before which has ended up reordering collections with thousands of items in them because the collection was lazily evaluated every time it was needed.
In Jeremy Miller's article about functional C# he suggests the idea of memoization as an optimisation technique to stop expensive calls being made more times than they need to be so perhaps this would be another way to solve the problem although I haven't tried that approach before.
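The repeated evaluation is easy to see in isolation by counting how often the selector runs. This is a standalone sketch rather than the code from the project; the 'DeferredEvaluationExample' class and its 'CountSelectorCalls' method are made up for the illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEvaluationExample
{
    // Returns how many times the selector runs when the projected
    // sequence is enumerated twice.
    public static int CountSelectorCalls(bool materialise)
    {
        var calls = 0;
        var source = new[] { "barOne", "barTwo" };

        // Nothing runs yet: Select only describes the projection.
        IEnumerable<string> projected = source.Select(s => { calls++; return s.ToUpper(); });

        if (materialise)
        {
            // Forces the projection to run once; later reads use the stored list.
            projected = projected.ToList();
        }

        var sorted = projected.OrderBy(s => s).ToList(); // first enumeration
        var again = projected.ToList();                  // second enumeration

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(false)); // 4: the selector runs on every enumeration
        Console.WriteLine(CountSelectorCalls(true));  // 2: the selector runs once per item
    }
}
```

With a stub that only returns the expected value on its first call, those extra selector calls are exactly what changes the result.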
F#: Passing an argument to a member constraint
I've written previously about function overloading in F# and my struggles working out how to do it and last week I came across the concept of inline functions and statically resolved parameters as a potential way to solve that problem.
I came across a problem where I thought I would be able to make use of this while playing around with some code parsing Xml today.
I had a 'descendants' function which I wanted to be applicable against 'XDocument' and 'XElement' so I originally just defined the functions separately forgetting that the compiler wouldn't allow me to do so as we would have a duplicate definition of the function:
    let descendants name (xDocument:XDocument) = xDocument.Descendants name
    let descendants name (xElement:XElement) = xElement.Descendants name
I wanted to make use of the inline function to define a function which would allow any type which supported the 'Descendants' member:
let inline descendants name (xml:^x) = (^x : (member Descendants : XName -> seq<XElement>) (xml))
I couldn't work out how I could pass the 'name' input parameter to 'Descendants' so I was getting the following error:
expected 2 expressions, got 1
I posted the problem to StackOverflow and 'Brian' pointed out the syntax that would allow me to do what I wanted:
let inline descendants name (xml:^x) = (^x : (member Descendants : XName -> seq<XElement>) (xml,name))
Tomas Petricek pointed out that in this case we could just write a function which took in 'XContainer' since both the other two types derive from that anyway:
let descendants name (xml:XContainer) = xml.Descendants name
In this situation that certainly makes more sense but it's good to know how to write the version using member constraints for any future problems I come across.
F#: Unexpected identifier in implementation file
I've been playing around with some F# code this evening and one of the bits of code needs to make an HTTP call and return the result.
I wrote this code and then tried to make use of the 'Async.RunSynchronously' function to execute the call.
The code I had looked roughly like this:
    namespace Twitter

    module RetrieveLinks
        open System.Net
        open System.IO
        open System.Web
        open Microsoft.FSharp.Control

        let AsyncHttp (url:string) =
            async {
                let request = HttpWebRequest.Create(url)
                let! response = request.AsyncGetResponse()
                let stream = response.GetResponseStream()
                use reader = new StreamReader(stream)
                return! reader.AsyncReadToEnd()
            }

        let getData =
            let request = "http://some.url"
            AsyncHttp <| request

        Async.RunSynchronously getData
The problem was I was getting the following error on the last line:
Error 3 Unexpected identifier in implementation file
I've seen that error before and it often means that you haven't imported a reference correctly and hence the compiler doesn't know what you're trying to refer to.
In this case I was fairly sure all my references were correct and I was still getting the same error when I used the full namespace to 'Async.RunSynchronously' which seemed to suggest I'd done something else wrong.
After comparing this file with another one which was quite similar but didn't throw this error I realised that I'd left off the '=' after the module definition. Putting that in solved the problem.

    namespace Twitter

    module RetrieveLinks =
        // and so on
As I understand it if we don't use the '=' then we've created a top level module declaration and if we do use the '=' then we've created a local module declaration.
You do not have to indent declarations in a top-level module. You do have to indent all declarations in local modules. In a local module declaration, only the declarations that are indented under that module declaration are part of the module.
Given this understanding another way to solve my problem would be to remove the indentation of the functions inside the module like so:
    module RetrieveLinks
    open System.Net
    open System.IO
    open System.Web
    open Microsoft.FSharp.Control

    // and so on until...

    Async.RunSynchronously getData
That compiles as expected.
Reading the MSDN page suggests that in my first example I'd created a top level module declaration, but indenting the code inside that module somehow meant that the 'Async.RunSynchronously' function wasn't recognised.
I don't quite understand why that is so if anyone can enlighten me that would be cool!
F#: Inline functions and statically resolved type parameters
One thing which I've often wondered when playing around with F# is why, when writing the following function, the type of the function is inferred to be 'int -> int -> int' rather than allowing any values which can be added together:

    > let add x y = x + y
    val add : int -> int -> int
It turns out if you use the 'inline' keyword then the compiler does exactly what we want:
    > let inline add x y = x + y
    val inline add :
       ^a -> ^b -> ^c when ( ^a or ^b) : (static member ( + ) : ^a * ^b -> ^c)
Without the inline modifier type inference forces the function to take a specific type, in this case int. With it the function has a statically resolved type parameter which means that "the type parameter is replaced with an actual type at compile time rather than run time".
In this case it's useful to us because it allows us to implicitly define a member constraint on the two input parameters to 'add'. From the MSDN page:
Statically resolved type parameters are primarily useful in conjunction with member constraints, which are constraints that allow you to specify that a type argument must have a particular member or members in order to be used. There is no way to create this kind of constraint by using a regular generic type parameter.
The neat thing about the second definition is that we can add values of any types which support the '+' operator:
    > add "mark" "needham";;
    val it : string = "markneedham"

    > add 1.0 2.0;;
    val it : float = 3.0
From a quick look at the IL code in Reflector it looks like the 'add' function defined here makes use of the 'AdditionDynamic' function internally to allow it to be this flexible.
One thing which I found quite interesting while reading about inline functions is that it sounds like it's quite similar to duck typing in that we're saying a function can be passed any value which supports a particular method.
Michael Giagnocavo has a post where he covers the idea of statically resolved type parameters in more detail and describes what he refers to as 'statically typed duck typing'.
Functional C#: Extracting a higher order function with generics
While working on some code with Toni we realised that we'd managed to create two functions that were almost exactly the same except they made different service calls and returned collections of a different type.
The similar functions were like this:
    private IEnumerable<Foo> GetFoos(Guid id)
    {
        IEnumerable<Foo> foos = new List<Foo>();
        try
        {
            foos = fooService.GetFoosFor(id);
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return foos;
    }

    private IEnumerable<Bar> GetBars(Guid id)
    {
        IEnumerable<Bar> bars = new List<Bar>();
        try
        {
            bars = barService.GetBarsFor(id);
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return bars;
    }
We're defining the empty lists so that if the service throws an exception we can make use of an empty list further on in the code. A failure of the service in this context doesn't mean that the application should stop functioning.
My thinking here was that we should be able to pull out the service calls into a function but the annoying thing is that they return different types of collections so I initially thought that we'd be unable to remove the duplication.
Thinking about the problem later on I realised we could just define the return value of the service call in the function to use generics.
We therefore end up with this solution:
    private IEnumerable<Bar> GetBars(Guid id)
    {
        return GetValues(() => barService.GetBarsFor(id));
    }

    private IEnumerable<Foo> GetFoos(Guid id)
    {
        return GetValues(() => fooService.GetFoosFor(id));
    }

    private IEnumerable<T> GetValues<T>(Func<IEnumerable<T>> getValues)
    {
        IEnumerable<T> values = new List<T>();
        try
        {
            values = getValues();
        }
        catch (Exception e)
        {
            // do some logging of the exception
        }
        return values;
    }
I think the code is still quite readable and it's relatively obvious what it's supposed to be doing.
F#: function keyword
I've been browsing through Chris Smith's Programming F# book and in the chapter on pattern matching he describes the 'function' keyword which I haven't used before.
It's used in pattern matching expressions when we want to match against one of the parameters passed into the function which contains the pattern match.
For example if we have this somewhat contrived example:
    let isEven value =
        match value with
        | x when (x % 2) = 0 -> true
        | _ -> false
That could be rewritten using the function keyword to the following:
    let isEven = function
        | x when (x % 2) = 0 -> true
        | _ -> false

It's a relatively straightforward way to simplify code like this, although one thing I noticed while looking back through some old code I've written is that if we use this syntax then we need to ensure that the parameter we want to pattern match against is passed as the last parameter to the function.
For example this function which is used to parse the arguments passed to a script was originally written like this:
    let GetArgs initialArgs =
        let rec find args matches =
            match args with
            | hd::_ when hd = "--" -> List.to_array (matches)
            | hd::tl -> find tl (hd::matches)
            | [] -> Array.empty
        find (List.rev (Array.to_list initialArgs) ) []

If we want to use 'function' then 'args' needs to become the implicit second (and last) argument passed to the recursive 'find' function:

    let GetArgs initialArgs =
        let rec find matches = function
            | hd::_ when hd = "--" -> List.to_array (matches)
            | hd::tl -> find (hd::matches) tl
            | [] -> Array.empty
        find [] (List.rev (Array.to_list initialArgs) )
I'm not sure that the resulting code is necessarily more intention revealing if the function has more than one argument passed to it. The second version of this function could be very confusing if you didn't know what the 'function' keyword actually did.
Functional C#: LINQ vs Method chaining
One of the common discussions that I've had with several colleagues when we're making use of some of the higher order functions that can be applied on collections is whether to use the LINQ style syntax or to chain the different methods together.
I tend to prefer the latter approach although when asked the question after my talk at Developer Developer Developer I didn't really have a good answer other than to suggest that it seemed to just be a personal preference thing.
Damian Marshall suggested that he preferred the method chaining approach because it more clearly describes the idea of passing a collection through a pipeline where we can apply different operations to that collection.
I quite like that explanation and I think my preference for it would have probably been influenced by the fact that when coding in F# we can use the forward piping operator to achieve code which reads like this.
For example if we had a list and wanted to get all the even numbers, double them and then add them up we might do this:
    [1..10]
    |> List.filter (fun x -> x % 2 = 0)
    |> List.map (fun x -> x * 2)
    |> List.fold (fun acc x -> acc + x) 0
If I was in C# I'd probably do this:
    Enumerable.Range(1, 10)
        .Where(x => x % 2 == 0)
        .Select(x => x * 2)
        .Sum(x => x);
I found it quite difficult to work out what the equivalent LINQ syntax would be because I don't use it but I think something like this would be what you'd need to write to do the same thing:
    (from x in Enumerable.Range(1, 10)
     where x % 2 == 0
     select x * 2).Sum(x => x);
I'm not sure if there's a way to do the sum within the LINQ statement or whether you need to do it using the method as I have here.
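As far as the query syntax itself goes, C# query expressions have no clause for summing, so the 'Sum' has to be a method call on the parenthesised query. The sketch below puts the two styles side by side, using 'Aggregate' in the method chain as the closest counterpart to F#'s 'List.fold'. The 'PipelineExample' class and its method names are invented for the illustration:

```csharp
using System;
using System.Linq;

public static class PipelineExample
{
    // Method chaining, with Aggregate playing the role of List.fold:
    // seed 0, then add each doubled even number to the accumulator.
    public static int SumWithAggregate()
    {
        return Enumerable.Range(1, 10)
            .Where(x => x % 2 == 0)
            .Select(x => x * 2)
            .Aggregate(0, (acc, x) => acc + x);
    }

    // Query syntax for the filter and projection, then a method call for the sum.
    public static int SumWithQuerySyntax()
    {
        return (from x in Enumerable.Range(1, 10)
                where x % 2 == 0
                select x * 2).Sum();
    }

    public static void Main()
    {
        Console.WriteLine(SumWithAggregate());   // 60
        Console.WriteLine(SumWithQuerySyntax()); // 60
    }
}
```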
Even just writing this example I found that the way I had to write the LINQ code seemed quite counter intuitive for me with the way that I typically try to solve problems like this.
At least, thanks to Damian, I now understand why that is!