Archive for the ‘c#’ tag
Functional C#: Extracting a higher order function with generics
While working on some code with Toni we realised that we’d managed to create two functions that were almost exactly the same except they made different service calls and returned collections of a different type.
The similar functions were like this:
private IEnumerable<Foo> GetFoos(Guid id) { IEnumerable<Foo> foos = new List<Foo>(); try { foos = fooService.GetFoosFor(id); } catch (Exception e) { // do some logging of the exception } return foos; }
private IEnumerable<Bar> GetBars(Guid id) { IEnumerable<Bar> bars = new List<Bar>(); try { bars = barService.GetBarsFor(id); } catch (Exception e) { // do some logging of the exception } return bars; }
We’re defining the empty lists so that if the service throws an exception we can make use of an empty list further on in the code. A failure of the service in this context doesn’t mean that the application should stop functioning.
My thinking here was that we should be able to pull out the service calls into a function but the annoying thing is that they return different types of collections so I initially thought that we’d be unable to remove the duplication.
Thinking about the problem later on I realised we could just define the return value of the service call in the function to use generics.
We therefore end up with this solution:
private IEnumerable<Bar> GetBars(Guid id) { return GetValues(() => barService.GetBarsFor(id)); }
private IEnumerable<Foo> GetFoos(Guid id) { return GetValues(() => fooService.GetFoosFor(id)); }
private IEnumerable<T> GetValues<T>(Func<IEnumerable<T>> getValues) { IEnumerable<T> values = new List<T>(); try { values = getValues(); } catch (Exception e) { // do some logging of the exception } return values; }
I think the code is still quite readable and it’s relatively obvious what it’s supposed to be doing.
Functional C#: LINQ vs Method chaining
One of the common discussions that I’ve had with several colleagues when we’re making use of some of the higher order functions that can be applied on collections is whether to use the LINQ style syntax or to chain the different methods together.
I tend to prefer the latter approach although when asked the question after my talk at Developer Developer Developer I didn’t really have a good answer other than to suggest that it seemed to just be a personal preference thing.
Damian Marshall suggested that he preferred the method chaining approach because it more clearly describes the idea of passing a collection through a pipeline where we can apply different operations to that collection.
I quite like that explanation and I think my preference for it would have probably been influenced by the fact that when coding in F# we can use the forward piping operator to achieve code which reads like this.
For example if we had a list and wanted to get all the even numbers, double them and then add them up we might do this:
[1..10] |> List.filter (fun x -> x % 2 = 0) |> List.map (fun x -> x * 2) |> List.fold (fun acc x -> acc + x) 0
If I was in C# I’d probably do this:
Enumerable.Range(1, 10) .Where(x => x % 2 == 0) .Select(x => x * 2) .Sum(x => x);
I found it quite difficult to work out what the equivalent LINQ syntax would be because I don’t use it but I think something like this would be what you’d need to write to do the same thing:
from x in Enumerable.Range(1, 10) where x%2 == 0 select x * 2).Sum(x => x);
I’m not sure if there’s a way to do the sum within the LINQ statement or whether you need to do it using the method as I have here.
Even just writing this example I found that the way I had to write the LINQ code seemed quite counter intuitive for me with the way that I typically try to solve problems like this.
At least now thanks to Damian I now understand why that is!
Functional C#: Writing a ‘partition’ function
One of the more interesting higher order functions that I’ve come across while playing with F# is the partition function which is similar to the filter function except it returns the values which meet the predicate passed in as well as the ones which don’t.
I came across an interesting problem recently where we needed to do exactly this and had ended up taking a more imperative for each style approach to solve the problem because this function doesn’t exist in C# as far as I know.
In F# the function makes use of a tuple to do this so if we want to create the function in C# then we need to define a tuple object first.
public class Tuple<TFirst, TSecond> { private readonly TFirst first; private readonly TSecond second; public Tuple(TFirst first, TSecond second) { this.first = first; this.second = second; } public TFirst First { get { return first; } } public TSecond Second { get { return second; } } }
public static class IEnumerableExtensions { public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerableOf, Func<T, bool> predicate) { var positives = enumerableOf.Where(predicate); var negatives = enumerableOf.Where(e => !predicate(e)); return new Tuple<IEnumerable<T>, IEnumerable<T>>(positives, negatives); } }
I’m not sure of the best way to write this function – at the moment we end up creating two iterators to cover the two different filters that we’re running over the collection which seems a bit strange.
In F# ‘partition’ is on List so the whole collection would be evaluated whereas in this case we’re still only evaluating each item as it’s needed so maybe there isn’t a way to do it without using two iterators.
If we wanted to use this function to get the evens and odds from a collection we could write the following code:
var evensAndOdds = Enumerable.Range(1, 10).Partition(x => x % 2 == 0); var evens = evensAndOdds.First; var odds = evensAndOdds.Second;
The other thing that’s nice about F# is that we can assign the result of the expression to two separate values in one go and I don’t know of a way to do that in C#.
let evens, odds = [1..10] |> List.partition (fun x -> x % 2 = 0)
We don’t need to have the intermediate variable ‘evensAndOdds’ which doesn’t really add much to the code.
I’d be interested in knowing if there’s a better way to do this than what I’m trying out.
Functional collectional parameters: Some thoughts
I’ve been reading through a bit of Steve Freeman and Nat Pryce’s ‘Growing Object Oriented Software guided by tests‘ book and I found the following observation in chapter 7 quite interesting:
When starting a new area of code, we might temporarily suspend our design judgment and just write code without attempting to impose much structure.
It’s interesting that they don’t try and write perfect code the first time around which is actually something I thought experienced developers did until I came across Uncle Bob’s Clean Code book where he suggested something similar.
One thing I’ve noticed when working with collections is that if we want to do something more complicated than just doing a simple map or filter then I find myself initially trying to work through the problem in an imperative hacky way.
When pairing it sometimes also seems easier to talk through the code in an imperative way and then after we’ve got that figured out then we can work out a way to solve the problem in a more declarative way by making use of functional collection parameters.
An example of this which we came across recently was while looking to parse a file which had data like this:
some,random,data,that,i,made,up
The file was being processed later on and the values inserted into the database in field order. The problem was that we had removed two database fields so we needed to get rid of the 2nd and 3rd values from each line.
var stringBuilder = new StringBuilder(); using (var sr = new StreamReader("c:\\test.txt")) { string line; while ((line = sr.ReadLine()) != null) { var values = line.Split(','); var localBuilder = new StringBuilder(); var count = 0; foreach (var value in values) { if (!(count == 1 || count == 2)) { localBuilder.Append(value); localBuilder.Append(","); } count++; } stringBuilder.AppendLine(localBuilder.ToString().Remove(localBuilder.ToString().Length - 1)); } } using(var writer = new StreamWriter("c:\\newfile.txt")) { writer.Write(stringBuilder.ToString()); writer.Flush(); }
If we wanted to refactor that to use a more declarative style then the first thing we’d look to change is the for loop populating the localBuilder.
We have a temporary ‘count’ variable which is keeping track of which column we’re up to and suggests that we should be able to use one of the higher order functions over collection which allows us to refer to the index of the item.
In this case we can use the ‘Where’ function to achieve this:
... while ((line = sr.ReadLine()) != null) { var localBuilder = line.Split(','). Where((_, index) => !(index == 1 || index == 2)). Aggregate(new StringBuilder(), (builder, v) => builder.Append(v).Append(",")); stringBuilder.AppendLine(localBuilder.ToString().Remove(localBuilder.ToString().Length - 1)); }
I’ve been playing around with ‘Aggregate’ a little bit and it seems like it’s quite easy to overcomplicate code using that. It also seems that when using ‘Aggregate’ it makes sense if the method that we call on our seed returns itself rather than void.
I didn’t realise that ‘Append’ did that so my original code was like this:
var localBuilder = line.Split(','). Where((_, index) => !(index == 1 || index == 2)). Aggregate(new StringBuilder(), (builder, v) => { builder.Append(v); builder.Append(","); return builder; });
I think if we end up having to call functions which return void or some other type then it would probably make sense to add on an extension method which allows us to use the object in a fluent interface style.
Of course this isn’t the best solution since we would ideally avoid the need to remove the last character to get rid of the trailling comma which could be done by creating an array of values and then using ‘String.Join’ on that.
Given that I still think the solution written using functional collection parameters is easier to follow since we’ve managed to get rid of two variable assignments which weren’t interesting as part of what we wanted to do but were details about that specific implementation.
Coding: Missing abstractions and LINQ
Something which I’ve noticed quite a lot on the projects that I’ve worked on since C# 3.0 was released is that lists seem to be passed around code much more and have LINQ style filters and transformations performed on them while failing to describe the underlying abstraction explcitly in the code.
As a result of this we quite frequently we end up with this code being in multiple places and since it’s usually not very much code the repetition goes unnoticed more than other types of duplication might do.
A typical example of this might be the following:
public class SomeFooHolder { public List<Foo> Foos { get; set } }
An example of how this might be used is like so:
var someFooHolder = new FooHolder(...); someFooHolder.Foos.Select(f => f.Completed);
That code would typically be repeated in other places in the code where we want to get all the completed foos.
Although it’s a simple change, as a first step I prefer to make that concept more explicit by putting ‘CompletedFoos’ on ‘SomeFooHolder’:
public class SomeFooHolder { public List<Foo> Foos { get; set; } public List<Foo> CompletedFoos { get { return Foos.Select(f => f.Completed); } } }
Perhaps an even better solution would be to create an object ‘Foos’ to encapsulate that logic further:
public class Foos { private readonly List<Foo> foos; public Foos(List<Foo> foos) { this.foos = new List<Foo>(foos.AsReadOnly()); } public Foos Completed { get { return new Foos(foos.Select(f => f.Completed)); } } }
As I’ve written about previously I prefer to wrap the list rather than extend it as the API of ‘Foos’ is more expressive since we don’t have all the list operations available to any potential users of the class.
C# Test Builder Pattern: My current thinking
I’ve written previously about the test builder pattern in C# and having noticed some different implementations of this pattern I thought it’d be interesting to post my current thinking on how to use it.
One thing I’ve noticed is that we often end up just creating methods which effectively act as setters rather than easing the construction of an object.
This seems to happen most commonly when the value we want to set is a boolean value. The following is quite typical:
public CarBuilder WithSomething(boolean value) { this.something = value; return this; }
The usage would be like so:
new CarBuilder().WithSomething(true).Build();
It doesn’t read too badly but it seems to be unnecessary typing for the user of the API. If we’re going to use the fluent interface approach then I would prefer to have two methods, defined like so:
public CarBuilder WithSomething() { this.something = true; return this; } public CarBuilder WithoutSomething() { this.something = false; return this; }
We could then use this like so:
new CarBuilder().WithSomething().Build(); ... new CarBuilder().WithoutSomething().Build();
That requires more code but I think it’s a bit more expressive and makes life easier for the user of the API.
An alternative approach which my colleague Lu Ning showed me and which I think is actually better is to make use of the object initializer syntax if all we have are setter methods on a builder.
We might therefore end up with something like this:
public class FooBuilder { public string Something = "DefaultSomething"; public boolean SomethingElse = false; public Foo Build() { return new Foo(Something, SomethingElse); } }
new FooBuilder { SomethingElse = true }.Build();
With this approach we end up writing less code and although we use public fields on the builder I don’t think it’s a big deal since it allows us to achieve our goal more quickly. If we need other methods that take out the complexity of construction then we can easily just add those as well.
Another thing to note when using this pattern is that we don’t need to override all the object’s attributes on every single test. We only need to override those ones which we are using in our test. The rest of the values can just be defaulted.
[Test] public void ShouldDoSomething() { var foo = new FooBuilder { Bar = "myBar", Baz = "myBaz", SomethingElse = "mySE", AndAgain = "..." }.Build(); // and then further on we only check the value of Bar for example Assert.That(someValue, Is.EqualTo("myBar"); }
We don’t need to specifically set any of the values except ‘Bar’ because they are irrelevant and create a clutter in the test which means it takes longer to understand what’s going on.
This would be preferable:
[Test] public void ShouldDoSomething() { var foo = new FooBuilder { Bar = "myBar" }.Build(); // and then further on we only check the value of Bar for example Assert.That(someValue, Is.EqualTo("myBar"); }
Something which I’ve been wondering about recently is understanding the best way to describe the case where we don’t want to define a specific value i.e. we want it as null.
I’d normally just set it to null like so:
new FooBuilder { Bar = null }.Build();
But we could make the API have a specific method and I haven’t really decided whether there’s much value to doing so yet:
new FooBuilder().WithNoBar().Build();
C# Object Initializer: More thoughts
I wrote previously about my dislike of C#’s object initializer syntax and while I still think those arguments hold I came across an interesting argument for why it is a useful feature in Jeremy Miller’s MSDN article on creating internal DSLs in C#.
In the article Jeremy works through an example where he builds up a ‘SendMessageRequest’ first by using a fluent interface and then by making use of object initializer syntax.
The fluent interface example reads like this:
public void SendMessageFluently(FluentMessageSender sender) { sender .SendText("the message body") .From("PARTNER001").To("PARTNER002"); }
Compared with the object initializer version which looks like this:
public void SendMessageAsParameter(ParameterObjectMessageSender sender) { sender.Send(new SendMessageRequest() { Text = "the message body", Receiver = "PARTNER001", Sender = "PARTNER002" }); }
As Jeremy points out:
this third incarnation of the API reduces errors in usage with much simpler mechanics than the fluent interface version.
You need much less code to get the second version to work which my colleague Lu Ning pointed out to me when we were discussing how to use the builder pattern on a project we worked on together.
In general terms the benefit of using object initializer is that we get more expressive code and the intent of that code is clearer than we would be able to achieve by constructing the object using its constructor.
The down side is that it creates a mentality of using setters and the initialisation of the object potentially leaves several fields with null values.
We then end up either getting unexpected null reference exceptions throughout the code or we code defensively and check for nulls whenever we try to access one of the properties. Seeing as one of the goals of the object initializer is to reduce the code we write it’s somewhat ironic that we end up writing this extra code to ensure the object has been setup correctly.
One way to solve this is to make use of tiny types so that the API is still expressive even though what we want to pass in is a string. That takes more code but it allows us to make use of the constructor to construct the object which is something which seems to have lost favour.
C# 4.0 will introduced named parameters into the language which seems like it will help us achieve expressiveness of our code without having to write extra classes and code to achieve this.
* Updated the examples as I had them the wrong way around. Thanks to Corey Haines and Mike Wagg for pointing it out.
DSLs: Violating the builder pattern
I recently came across an interesting post by Dave Thomas where he discussed several domain specific languages (DSLs) he’s come across and suggests that a lot of them seem to be trying too hard to read like the english language instead of focusing on describing a vocabulary for their specific domain
Reading this post reminded me that I fell into this trap earlier in the year while doing some work to create a builder pattern in our code which didn’t need to make use of a ‘Build’ method but instead would make use of C#’s implicit operator to automatically convert the builder to an object at the appropriate moment.
My motivation was to make the code more readable but the code actually ended up being quite misleading for anyone else who came across it.
When someone sees the builder pattern they have an expectation that the method chaining sequence should end with ‘Build()’ so by using the implicit operator I was essentially breaking a well defined DSL for creating new objects.
To add to our problems, we nearly always use ‘var’ to describe local variables in our code base so there we times when you couldn’t tell if what you had was a ‘Foo’ or a ‘FooBuilder’.
Quite frequently the implicit conversion only happens when the variable is passed into another method or construct which explicitly defines which value is needed.
[Test] public void SomeRandomTest() { var foo = new FooBuilder().Foo("someValue"); var someFoos = new Foo[] { foo }; }
In this example ‘foo’ is only implicitly converted to the type ‘Foo’ when it is put into the array and its actual type is ‘FooBuilder’.
In terms of what I was trying to achieve I created a piece of code that was easier to read but more difficult to understand which means that it’s not intention revealing as my colleague Sai Venkatakrishnan pointed out.
It was an interesting experiment but I think in the future I’ll stick with the more obvious approach which is a bit more verbose but a bit more intention revealing as well.
C#: Removing duplication in mapping code with partial classes
One of the problems that we’ve come across while writing the mapping code for our anti corruption layer is that there is quite a lot of duplication of mapping similar types due to the fact that each service has different auto generated classes representing the same data structure.
We are making SOAP web service calls and generating classes to represent the requests and responses to those end points using SvcUtil. We then translate from those auto generated classes to our domain model using various mapper classes.
One example of duplication which really stood out is the creation of a ‘ShortAddress’ which is a data structure consisting of a postcode, suburb and state.
In order to map address we have a lot of code similar to this:
private ShortAddress MapAddress(XsdGeneratedAddress xsdGeneratedAddress) { return new ShortAddress(xsdGeneratedAddress.Postcode, xsdGeneratedAddress.Suburb, xsdGeneratedAddress.State); }
private ShortAddress MapAddress(AnotherXsdGeneratedAddress xsdGeneratedAddress) { return new ShortAddress(xsdGeneratedAddress.Postcode, xsdGeneratedAddress.Suburb, xsdGeneratedAddress.State); }
Where the XsdGeneratedAddress might be something like this:
public class XsdGeneratedAddress { string Postcode { get; } string Suburb { get; } string State { get; } // random other code }
It’s really quite boring code to write and it’s pretty much exactly the same apart from the class name.
I realise here that if we were using a dynamic language we wouldn’t have a problem since we could just write the code as if the object being passed into the method had those properties on it.
Sadly we are in C# which doesn’t yet have that capability!
Luckily for us the SvcUtil generated classes are partial classes so (as Dave pointed out) we can create another partial class which inherits from an interface that we define. We can then refer to types which implement this interface in our mapping code, helping to reduce the duplication.
In this case we create a ‘ShortAddressDTO’ with properties that match those on the auto generated class:
public interface ShortAddressDTO { string Postcode { get; } string Suburb { get; } string State { get; } }
We then need to make the generated classes inherit from this:
public partial class XsdGeneratedAddress : ShortAddressDTO {}
Which means in our mapping code we can now do the following:
private ShortAddress MapAddress(ShortAddressDTO shortAddressDTO) { return xsdGeneratedAddress.ConvertToShortAddress(); }
Which uses this extension method:
public static class ServiceDTOExtensions { public static ShortAddress ConvertToShortAddress(ShortAddressDTO shortAddressDTO) { return new ShortAddress(shortAddressDTO.Postcode, shortAddressDTO.Suburb, shortAddressDTO.State); } }
Which seems much cleaner than what we had to do before.
Functional Collection Parameters: Handling the null collection
One of the interesting cases where I’ve noticed we tend to avoid functional collection parameters in our code base is when there’s the possibility of the collection being null.
The code is on the boundary of our application’s interaction with another service so it is actually a valid scenario that we could receive a null collection.
When using extension methods, although we wouldn’t get a null pointer exception by calling one on a null collection, we would get a ‘source is null’ exception when the expression is evaluated so we need to protect ourself against this.
As a result of defending against this scenario we have quite a lot of code that looks like this:
public IEnumerable<Foo> MapFooMessages(IEnumerable<FooMessage> fooMessages) { var result = new List<Foo>(); if(fooMessagaes != null) { foreach(var fooMessage in fooMessages) { result.Add(new Foo(fooMessage)); } } return result; }
The method that we want to apply here is ‘Select’ and even though we can’t just apply that directly to the collection we can still make use of it.
private IEnumerable<Foo> MapFooMessages(IEnumerable<FooMessage> fooMessages) { if(fooMessages == null) return new List<Foo>(); return fooMessages.Select(eachFooMessage => new Foo(eachFooMessage)); }
There’s still duplication doing it this way though so I pulled it up into a ‘SafeSelect’ extension method:
public static class ICollectionExtensions { public static IEnumerable<TResult> SafeSelect<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) { return source == null ? new List<TResult>() : source.Select(selector) ; } }
We can then make use of this extension method like so:
private IEnumerable<Foo> MapFooMessages(IEnumerable<FooMessage> fooMessages) { return fooMessages.SafeSelect(eachFooMessage => new Foo(fooMessage)); }
The extension method is a bit different to the original way that we did this as I’m not explicitly converting the result into a list at the end which means that it will only be evaluated when the data is actually needed.
In this particular case I don’t think that decision will have a big impact but it’s something interesting to keep in mind.