Archive for July, 2009
Coding Dojo #20: Groovy Sales Tax Problem
Continuing with the Groovy theme, this week we worked on the ThoughtWorks code review tax problem which involved modeling different items that a customer could buy and the associated tax rules that different types of goods had.
The Format
We had 3 people this week so most of the time we had all 3 of us involved in driving the code, which was projected onto the television screen again, while rotating every 10 minutes or so.
What We Learnt
- Although we weren’t intentionally following the ideas laid out in object calisthenics we actually ended up with code which fairly closely followed those ideas. We had very small classes, didn’t have any getter or setter methods and also wrapped the collection (Basket) which we had in our code.
- We ended up writing tests in classes which were quite context sensitive – normally when I write tests we would have one test class for each object under test but this time we created different classes depending on the way that we were making use of that object in the test. For example we ended up with ‘BasketWithOneItem’ and ‘BasketWithMultipleItems’ test classes by the end.
This seems pretty much like the Context-Spec approach to testing that I’ve read about previously in Scott Bellware’s framework. I’m noticing on the project I’m currently working on that our tests are drifting in that way almost naturally due to the different ways that objects are typically used.
- Dave took the lead in driving out a fluent interface which worked out quite well – we were able to chain together different items by including an ‘and’ method on objects which created a ‘Basket’ containing both items and then returned the Basket, therefore allowing us to add other items too. I was initially against this approach as I felt we should just create a basket with all the items that it held rather than adding them one at a time.
Objects needed to define an ‘and’ method and a ‘totalCost’ method in order for them to be chained. Due to the dynamic nature of Groovy we could just pass in objects and assume they would respond to the appropriate methods.
We therefore ended up with code that looked a bit like this:
new Book(12.49).and(new Chocolate(1.89)).totalCost()
If we were working in a static language then we would have needed to create a small interface with the two methods on and then have our objects implement that. This seemed to fit in quite nicely with the Interface Segregation Principle where the idea is to define thin cohesive interfaces which don’t inadvertently couple their different clients together.
In a discussion with a few colleagues recently it seems that one way of looking at methods on objects in dynamic languages is that each method would be the equivalent in a static language of a one method interface which our object implements.
For next time
- I’ve found the dojos we’ve done using Groovy to be quite fun – everyone who participates has worked with Java so it doesn’t seem to be too much of a jump to use Groovy. Writing less code to achieve the same goal is also nice. We may well continue with Groovy next time!
Book Club: Hexagonal Architecture (Alistair Cockburn)
In our latest book club we discussed Alistair Cockburn’s Hexagonal Architecture which I first heard about around a year ago and was another of Dave Cameron‘s recommendations.
As I understand it, the article describes an architecture for our systems where the domain sits in the centre and other parts of the system depend on the domain while the domain doesn’t depend on anything concrete but is interacted with by various adapters.
These are some of my thoughts and our discussion of the article:
- It seems like the collection of adapters that Cockburn describes as interacting with the ‘application’ form lots of different anti corruption layers in Domain Driven Design language.
I think tools like Automapper and JSON.NET might be useful when writing some of these adaptors although Dave pointed out that we need to be careful that we’re not just copying every bit of data between different representations of our model otherwise we are indirectly creating the coupling that we intended to avoid.
- I was intrigued as to how rich user interfaces which have a lot of javascript in them would fit into the idea of mainly testing the application via the API and from our discussion we came to the conclusion that perhaps the javascript code would be an application by itself which server side code would interact with by using an adaptor.
This seems to lead towards an understanding of code as consisting of lots of different hexagons which interact with each other through pipes and filters.
- Dave described how designing our code according to the hexagonal architecture can help us avoid the zone of pain whereby we have lots of concrete classes inside a package and a lot of other packages depending on us. Scott Hanselman discusses this concept as part of a post on the different graphs & metrics NDepend provides.
From my understanding the idea seems to be to try not to have our application depending on unstable packages such as the data layer which we might decide to change and will have great difficulty in doing so if it is heavily coupled to our business code. Instead we should look to rely on an abstraction which sits inside the domain package and is implemented by one of the adaptors. I haven’t read the whole paper but it sounds quite similar to Uncle Bob’s Stable Dependencies Principle.
- I’m not sure whether these applications are following the hexagonal architecture but twitter, Google Maps and WordPress all have APIs which provide us with the ability to drive at least some part of their applications using adaptors that we need to write. This seems to be the way that quite a few applications are going which I imagine would influence the way that they organise their code in some way. In twitter’s case the external adaptors that drive their application are the main source of use.
Reading Code: Rhino Mocks
I spent a bit of time recently reading through some of the Rhino Mocks to get a basic understanding of how some features work under the hood.
As well as just getting some practice at reading unfamiliar code I also wanted to know the following:
- How does the ‘VerifyAllExpectations’ extension method work?
- What’s the difference between the ‘GenerateMock’ and ‘GenerateStub’ methods on MockRepository?
- How does the ‘AssertWasNotCalled’ extension method actually work?
These are some of the things I learnt from my exploration:
- I’m using a Mac so I originally started out trying to read the code in TextMate with the C# plugin before eventually realising that I couldn’t really tell the difference between a delegate being passed around the code and a method call so I wanted an editor that would help me out with this.
I decided to try out MonoDevelop to see if I could keep on reading the code outside my VM. Unfortunately I kept making that crash every time I tried to move between classes so back to Visual Studio it was! MonoDevelop looks like quite a nice tool but it just isn’t for me at the moment.
- I’ve been playing around with an idea adapted from Michael Feathers’ Working Effectively With Legacy Code by drawing out the different classes and how they interact with each other. Where I could see a grouping between classes I’ve been drawing that into the diagram as well.
Some of the guys on Twitter showed me a cool web based tool at yuml.me that allows you to easily draw class diagrams and then grab the png/jpeg file and do whatever you want with it.

Although these diagrams are quite simple I find them more useful than I had expected and I’ve started drawing more diagrams at work to help understand bits of code that I’m not very familiar with.
I realised a couple of years ago when reading one of Scott Young’s posts about drawing diagrams that I seem to understand ideas more quickly if I’m able to draw them out so I should probably look to do it more frequently!
- The way that stubs and mocks are generated is essentially the same which I’m told is also the case with Mockito although I haven’t read Mockito’s code yet.
‘GenerateMock’ eventually calls this bit of code:
public T DynamicMock<T>(params object[] argumentsForConstructor) where T : class { if (ShouldUseRemotingProxy(typeof(T), argumentsForConstructor)) return (T)RemotingMock(typeof(T), CreateDynamicRecordState); return (T)CreateMockObject(typeof(T), CreateDynamicRecordState, new Type[0], argumentsForConstructor); }
‘GenerateStub’ eventually calls this bit of code:
public object Stub(Type type, params object[] argumentsForConstructor) { CreateMockState createStub = mockedObject => new StubRecordMockState(mockedObject, this); if (ShouldUseRemotingProxy(type, argumentsForConstructor)) return RemotingMock(type, createStub); return CreateMockObject(type, createStub, new Type[0], argumentsForConstructor); }
The main difference is that when the ‘Verify’ method is called on stubs (which would call the ‘StubReplayMockState’ class) we don’t do anything whereas with mocks a check is done to ensure that all the expectations setup in the test are met.
I found it quite interesting that the new ‘Arrange-Act-Assert’ style syntax has been written to make use of the older ‘Record-Replay’ syntax which I guess is quite a nice metaphor for the two states that you use the framework.
I haven’t looked at the Moq code yet but it would be interesting to see how the code for that differs as it was built from the ground up with the ‘Arrange-Act-Assert’ syntax.
- The ‘AssertWasNotCalled’ method works fairly similarly to how I had imagined at a high level in that it goes and gets all the expectations for the mock for that method call and then checks that they aren’t called although I found the implementation of that first bit quite interesting.
private static ExpectationVerificationInformation GetExpectationsToVerify<T>(T mock, Action<T> action, Action<IMethodOptions<object>> setupConstraints) { // ... var mockToRecordExpectation = (T)mocks.DynamicMock(FindAppropriteType(mockedObject), mockedObject.ConstructorArguments); action(mockToRecordExpectation); // .. }
The code actually creates a mock in order to record the expectation that we are checking does not happen before it goes on to check which methods were called against that method. We therefore need to ensure that any calls to ‘AssertWasNotCalled’ are made after the call to the system under test otherwise it will always return true which may not be accurate.
- I didn’t realise quite how much code there is in Rhino Mocks and I’ve only read a very small part of it. A few of the interesting parts of the code seem to be making calls to Castle DynamicProxy which is being used to do the intercepting of method calls.
I’m never sure what the best way to approach a new code base is but this time I had a couple of starting points that I wanted to investigate and starting from those points seemed to work out quite well. I still find that I struggle to know when to stop digging down into how specific code works and when to just have a general understanding of that bit and move on to the next bit of code. I often find that I click all the way through to deeper methods and then I can’t remember why I did it in the first place.
F#: Playing around with asynchronous workflows
I spent a bit of time over the weekend playing around with F# asynchronous workflows and seeing how they could be used to launch Firefox windows asynchronously for my FeedBurner graph creator.
Initially I decided to try out the ‘Async.RunWithContinuations’ function which I recently read about on Matthew Podwysocki’s blog.
Matthew describes this as being a function which is useful for executing a single operation asynchronously and this worked out quite well for me as my application only has the ability to get one feed and then create a graph from its data.
I changed the Execute function (which takes in arguments from the command line as I wrote about previously) to launch a firefox window with the graph loaded:
let launchInFirefox url = async { System.Diagnostics.Process.Start(@"C:\Program Files\Mozilla Firefox\firefox.exe", url) |> ignore } let timeNow () = System.DateTime.Now.ToLongTimeString() let Execute args = if (Array.length args <> 3) then printfn "Expected usage: [feedName] [startDate yyyy-mm-dd] [endDate yyyy-mm-dd]" else let feedName, startDate, endDate = args.[0], args.[1], args.[2] let graphUri = (ShowFeedBurnerStats feedName startDate endDate).AbsoluteUri Async.RunWithContinuations ( (fun cont -> printfn "Downloaded feed graph for %s at %s" feedName (timeNow())), (fun ex -> printfn "Failed to download feed graph for %s - %s %s " feedName (ex.Message) (ex.StackTrace)), (fun cancelled -> printfn "Feed graph downloading for %s was cancelled" feedName), (launchInFirefox graphUri) )
The function actually takes in three continuations as well as the asynchronous computation to run:
- A continuation to run if the computation completes successfully
- An exception continuation to run if the computation throws an exception. I was able to test this out by trying to launch a process which did not exist
- A cancellation continuation to run if there is a signal for the computation to be cancelled
We then pass in the asynchronous computation as the last argument to the function which in this case is a process which launches FireFox with the url of the graph.
This works quite nicely but you don’t really notice that much different between launching the browser this way and just doing it using the ‘Async.RunSynchronously’ function.
It becomes a bit more interesting if we try to execute more than one asynchronous computations which in this case means creating multiple graphs at the same time.
My first attempt was to launch each of these computations synchronously:
let CreateGraphs (feeds:seq<string>) = feeds |> Seq.map (fun f -> ShowFeedBurnerStats f "2009-03-01" "2009-07-25") |> Seq.map (fun uri -> launchInFirefox uri.AbsoluteUri ) |> Seq.iter (Async.RunSynchronously)
I called this function like so:
let feeds = seq { yield "markneedham"; yield "scotthanselman"; yield "codethinked"; yield "haacked"; yield "Iserializable" }; CreateGraphs feeds
That works fine although there is a noticeable pause as each of these is loaded into the browser one after the other.
One way to get around this is to make use of the ‘Async.Parallel’ function which converts a sequence of asynchronous computations into a single asynchronous computation which can execute all of the individual asynchronous computations.
I initially got confused here and thought that passing a sequence of asynchronous computations to ‘Async.Parallel’ actually executed them but as I learnt you actually need to pass the result to one of the functions which actually runs the asynchronous computations.
We don’t need to change our function too much to achieve this:
let CreateGraphsParallel (feeds:seq<string>) = feeds |> Seq.map (fun f -> ShowFeedBurnerStats f "2009-03-01" "2009-07-25") |> Seq.map (fun uri -> launchInFirefox uri.AbsoluteUri ) |> Async.Parallel |> Async.RunSynchronously
let feeds = seq { yield "markneedham"; yield "scotthanselman"; yield "codethinked"; yield "haacked"; yield "Iserializable" }; CreateGraphsParallel feeds
The graphs seem to get launched in FireFox much more quickly using this method and there is no real pause, just a flurry of new tabs being launched as each of the graphs is opened.
I thought the ‘Async.Start’ function might allow us to achieve a similar result as the API comments state ‘Start the asynchronous computation in the thread pool. Do not await its result’ but I saw similar behaviour to when I used ‘Async.RunSynchronously’
let CreateGraphsSpawn (feeds:seq<string>) = feeds |> Seq.map (fun f -> ShowFeedBurnerStats f "2009-03-01" "2009-07-25") |> Seq.map (fun uri -> launchInFirefox uri.AbsoluteUri ) |> Seq.iter(Async.Start)
let feeds = seq { yield "markneedham"; yield "scotthanselman"; yield "codethinked"; yield "haacked"; yield "Iserializable" }; CreateGraphsSpawn feeds
Out of these ‘Async.Parallel’ seems to be the best function to use to get quick feedback from multiple computations.
There are also a few other functions that I haven’t tried out yet and I’m intrigued as to whether we would achieve good results by making use of MailBoxProcessors or not.
F#: Values, functions and DateTime
One of the things I’ve noticed recently in my playing around with F# is that when we decide to wrap calls to the .NET DateTime methods there is a need to be quite careful that we are wrapping those calls with an F# function and not an F# value.
If we don’t do this then the DateTime method will only be evaluated once and then return the same value for every call which is probably not the behaviour we’re looking for.
The following shows how we could wrap call to get the current time in string format inside a value:
let timeNow = System.DateTime.Now.ToLongTimeString()
If we then execute ‘timeNow’ to show the current time before and after a sleep this is what we see:
printfn "The time now is %s" timeNow System.Threading.Thread.Sleep(2000) printfn "The time now is %s" timeNow
The time now is 2:00:29 PM The time now is 2:00:29 PM
As we can see the time has remained the same despite the fact that we put a sleep in between thew two calls.
Looking at the C# version of this code via Reflector we can see that ‘timeNow’ is just a string:
public string timeNow;
The way to get the real current time is to define a function to get the time so that it will be reevaluated each time we ask for the time:
let timeNowUpToDate () = System.DateTime.Now.ToLongTimeString()
If we do the same test as we did above:
printfn "The time now is %s" (timeNowUpToDate()) System.Threading.Thread.Sleep(2000) printfn "The time now is %s" (timeNowUpToDate())
We get the following results:
The time now is 2:05:13 PM The time now is 2:05:15 PM
Which is what we were looking for in the first place!
Via Reflector again this is what that code would look like in C#:
[Serializable] internal class timeNowUpToDate@127 : FastFunc<Unit, string> { // Methods internal timeNowUpToDate@127() { } public override string Invoke(Unit unitVar0) { return DateTime.Now.ToLongTimeString(); } }
As we can see every time the function is invoked a call to the DateTime API will be made which is what we want to happen.
Cruise Agents: Reducing ‘random’ build failures
As I mentioned previously we’re making use of multiple cruise agents in our build to allow us to run our acceptance tests in parallel, therefore allowing a build which would be nearly 2 hours if run in sequence to be completed in around 10 minutes.
Early on with this approach we were getting a lot of failures in our builds which weren’t directly related to the code being changed and were more to do with the various dependencies we were making use of.
We needed these dependencies but we were only finding out if they were actually setup correctly much later on instead of doing an environment check first.
One example of this was the Shibboleth daemon which needed to be started in order for our tests to be able to login to the single sign on system.
We have it running as a Windows service and despite the fact we had it setup to ‘auto start’ whenever the operating system was running we often had failures where it had just failed to start with no trace in any of the error logs as to why that was the case.
The way we have currently got around this problem is by writing a custom nant task which checks if the service is running and if not then starts it up by making use of the ‘ServiceController‘ class.
[TaskName ("startShibboleth")] public class StartShibbolethDaemon : Task { protected override void ExecuteTask() { var shibbolethDaemon = new ServiceController("shibbolethServiceName"); if(shibbolethDaemon.Status == ServiceControllerStatus.Running) { Project.Log(Level.Info, "Shibboleth already running"); } else { shibbolethDaemon.Start(); Project.Log(Level.Info, "Shibboleth started"); } } }
We can then reference that task in our build before we run any tests which rely on it. We decided to write a custom task instead of using the built in servicecontroller task so that we could record whether or not it was already running and I couldn’t see a way to do that with the built in task.
Another quite common trend of build failures came about when our tests tried to connect to selenium server and were unable to do so because it wasn’t currently running – we have a batch file to manually start it up on our agents as we are running the build from a Windows service which means we can’t directly start processes from the file system the way we can when we run the build locally.
The original way we got around this problem was to add the selenium server startup script to the ‘StartUp’ folder of the ‘Administrator’ user on each of the cruise agents.
While this works fine when a user is actually logged into the virtual machine that hosts the agent we noticed that quite often an agent would register itself with the cruise server but we had forgotten to login on that virtual machine and any builds assigned to it would fail.
My colleague came up with quite a neat way of getting around this problem by automatically logging in the cruise agents which can be done by adding the following registry entry by putting the following in a file with the extension ‘reg’ and then double clicking it.
Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon] "AutoAdminLogon"="1" "DefaultPassword"="password"
I think the key for us here was ensuring that the build was pretty much self contained and able to get itself into a working state if its dependencies aren’t setup correctly.
We spent a bit of time tracking the ‘random’ failures and working out which ones were addressable and which ones we needed to gather more information on. We certainly have some work to do in this area but we’ve managed to sort out some of the persistent offenders!
It’s really easy to just ignore the supposedly random failures and just re-run the build but it doesn’t really solve the underlying problem. We’ve found it’s quite rare that there are ever actually random failures in the build, just failures that we don’t know enough about yet.
Wrapping collections: Inheritance vs Composition
I wrote previously about the differences between wrapping collections and just creating extension methods to make our use of collections in the code base more descriptive but I’ve noticed in code I’ve been reading recently that there appear to be two ways of wrapping the collection – using composition as I described previously but also extending the collection by using inheritance.
I was discussing this with Lu Ning recently and he pointed out that if what we have is actually a collection then it might make more sense to extend the collection with a custom class whereas if the collection is just an implementation detail of some other domain concept then it would be better to use composition.
In the latter case we probably don’t want to expose all of the methods available on the collection since we don’t want to make it possible for clients of the object to perform all of these operations.
We have both of these approaches in our code base – in the case where we have used inheritance we are extending a ‘SelectList’ from the .NET API to be a ‘PleaseSelectList’ so that it will add the value ‘please select’ to the top of drop down lists that we create on our UI instructing the user to select an option.
In this case I think we really do have a ‘SelectList’ so I’m not too bothered that we’re using inheritance instead of composition – in general though my preference is to use composition because I’ve found that most of the time we don’t actually want to expose all the methods available on a collection API when we pass this data around our code.
For example a fairly common scenario might be that we load up a collection of values from a persistence mechanism and then maybe do some querying on them before displaying something to the user.
I’ve noticed that it’s quite rare that we would actually want to allow any clients of this code to have the ability to remove an item from this collection but this is one of the methods that would typically be exposed if we decided to pass around a ‘List’ which is often the case.
Even if it’s not the case we can still convert an ‘IEnumerable’ value into a ‘List’ and then do whatever we want to it unless the collection had been defined as being ‘read only‘ in which case you now have an API which is potentially misleading.
I’m finding myself moving towards the opinion that it only makes sense to create a new type if we actually get some added value in terms of expressing the intent of our code more easily by doing so and a lot of the time it seems that the added value that comes by extending a collection so that we have our own named type doesn’t seem to provide a lot of value. In addition, it can actually be quite painful later on if we decide we want to change the way we represent that data.
I think using the inheritance option is often the short term quickest choice since we can just make use of all the methods that already exist on the collection being extended instead of having to write code to delegate to those methods which is the case if we use composition.
It does seem to be a fine line between using inheritance for good and just using it because then you don’t have to spend a lot of time thinking about the best solution to the problem you’re trying to solve.
Perhaps the choice between the way that we choose to do this comes down to analysing the trade offs between using composition and inheritance as Phil Haack points out in a post he wrote a couple of years ago.
Good Lazy and Bad Lazy
One of the things I remember picking up from reading The Pragmatic Programmer is that developers need to be lazy in order to find better ways to solve problems and I came across a post by Philipp Lensson from a few years ago where he also suggests good developers are lazy and dumb.
Something which I’ve come to realise more recently is that it’s not necessarily true that being lazy as a developer is always a good thing – it depends in what way you are being lazy because there are certainly good and bad ways in which you can express your laziness!
I think bad laziness is often linked to the path of least resistance and is where we just take the easiest route to solving our problem without necessarily considering whether that solution fits in with the way the code is being written or the problems that we might have later as a result of our approach.
I’ve noticed (by doing most of them!) that there are some fairly common ways that we can fall into this trap:
- Adding setters to a class when we have some extra data that we want to put inside that class instead of taking the time to either change the constructor to take in the new data or considering if we now need to create a new class to better represent the system. The problem here is that we end up with objects that may be half initialised which makes them really difficult to work with later on.
- Wanting to change some code which has problems with it and instead of following the boy scout rule and leaving it in a better state than we found it in we hack in our fix and then get out of there as quickly as possible. A colleague of mine likens this to fixing a broken window with duct tape instead of properly fixing the underlying problem.
- Copying and pasting test code instead of writing each one individually and noticing any duplication from this process and then taking steps to remove this. I’m still unsure what the best way is to write tests which are both readable and remove duplication but copying and pasting entire tests is certainly not the way to go.
- Writing tests which make use of expectations and then not calling the ‘Verify’ method to ensure that those expectations are called – the test is now not even testing what it’s supposed to be testing
I’m sure there are more but these are just some of the ones that come to mind.
In most of these cases the laziness actually comes from not really spending a lot of time thinking about what we are actually trying to do – we just picked the simplest approach that came to mind.
On the other hand if we take the time to think when we have problems to solve we can still come up with ways to be lazy but do so in a positive way.
I think the key to good lazy is that we get longer term benefits from this laziness as compared to bad lazy where we might get some immediate benefits but will probably suffer later on as a result of that.
- Automating the startup of a service that our build depends on because we don’t want to have to remember to keep turning it on by ourself manually.
- Extracting common code out into methods so that we don’t have to keep duplicating the same code. This approach is also useful for helping to reduce the complexity we need to deal with in each method which allows us pay less attention when browsing the code.
- Extracting small classes when we see a class getting too big so that we can test the logic more easily without having to write dozens of lines of code just to get the class into a state where assertions can be made against it.
- Writing automated tests to get quick feedback so that we don’t have to launch the application and then click through all the screens to get to the place that we want to test.
Again I’m sure there are more of these and I’m still striving to make sure that when I’m lazy it’s in the second category rather than the first.
Coding: Quick feedback
One of the most important things to achieve if we are to get any sort of productivity when writing code is to find ways to get the quickest feedback possible.
My general default stance with respect to this has always been to TDD code although I’ve found when coding in F# that I’m not actually sure what the overall best way to get quick feedback is.
This is partly because I haven’t been able to find a way to run tests easily from inside Visual Studio but also partly because even when you do this the code for the whole project needs to be recompiled before the tests can be run which takes time.
I often just load functions into F# interactive and then manually test them in there with a couple of values. If the results are what I expect then I move on.
By doing this I’m getting feedback that the code works for the inputs that I’ve tried but I’m not getting feedback that I haven’t broken some old functionality with my changes which has happened a couple of times now.
I’m coming to the conclusion that when I really am just testing out a small function it is quite valuable to be able to make use of F# interactive to play around with this code and quickly know if it is working but once I start composing several different functions together in the REPL and end up in trial and error mode then it might be more effective to write a unit test to help out.
There are certainly other occasions where we can get quicker feedback by other means, and my colleague Pat Kua wrote a post last year where he pointed out that you don’t always need to use TDD – it depends what type of code you’re writing.
When you’re spiking or working in experimentation mode as he refers to it TDD won’t be as effective and if you try to use TDD you will probably end up very frustrated because it will slow down your ability to get the type of feedback that you want.
When we are spiking we are looking for feedback on whether the approach we are trying out will actually work and the quickest way to find this out is often just to hack some code together and not spend a great deal of time worrying about the quality of what we’re writing.
Kent Beck recently wrote about the trade off between the amount of time it takes to write an automated test and the feedback it gives you giving an example when developing JUnit Max of a time when his feedback cycle was quicker by not writing the test since it would take several hours to work out how to do so.
While Kent Beck probably has the experience to make this sort of trade off call fairly accurately others may need to be more careful about doing this.
When I first started coding my way of working out whether something worked was to launch the application and check that way.
That often does provide quicker immediate feedback than working out how to write a test but if we make a change to our code and want to know if it still works we need to launch the application again to find out. We frequently end up having to do this cycle multiple times since we always make mistakes and have to go back and correct these.
I think the current popularity of javascript heavy applications is leading us into the trap of thinking that we don’t need to write automated tests to cover this functionality since we can just verify it works by launching the application – sadly the number of bugs that creep in when we take this approach is quite substantial so any initial speed gains are cancelled out by the re-work we need to do later.
Another interesting area where we need to get feedback quickly is at the points at which we integrate with external systems.
An example of this on a project I worked on was that we had logic in our application which relied on the value returned from a service layer dependency.
We got caught out by this value changing and breaking our application so after this we put in an integration test which specifically tested for that value and therefore allowed us to know earlier on if there had been a breaking change made.
These are just some of the areas that I’ve come across where there are different ways that we can get feedback and we need to make trade offs to work out which is the best way for our given situation.
F#: Active patterns for parsing xml
I decided to spend some time doing some refactoring on the FeedBurner application that I started working on last week and the first area I worked on was cleaning up the way that the xml we get from FeedBurner is parsed.
While playing around with the application from the command line I realised that it didn’t actually cover error conditions – such as passing in an invalid feed name – very well and I thought this would be a good opportunity to make use of an active pattern to handle this.
I wanted to try and test drive this bit of code so my first idea was to try and call the active pattern directly from my test – I am testing using NUnit 2.5 which now allows us to create tests without the need for a class with a [TestFixture] attribute on:
[<Test>] let should_return_no_feed_given_invalid_xml () = let feedType = Xml.(|NoFeedFound|FeedBurnerFeed|) "invalid xml" // other code
let (|NoFeedFound|FeedBurnerFeed|) xml = NoFeedFound()
The problem I ran into with this approach is that the value of feedType when this test ran was ‘Microsoft.FSharp.Core.Choice`2+_Choice1Of2′ and I couldn’t see a way to access this at compile time in order to assert against it. Either way a test asserting that the return value was ‘Choice1Of2′ doesn’t seem to be the most expressive test anyway.
I chatted with about this a bit with Dave and he suggested that it would probably be easier to test the active pattern via the function while actually makes use of it.
I ended up with the following three tests:
open FeedBurnerService [<Test>] let should_throw_exception_if_feed_xml_is_invalid () = Assert.Throws<FailureException>(fun () -> FeedBurnerService.Parse "some broken xml" |> ignore) |> ignore [<Test>] let should_throw_exception_if_no_feed_found () = let feedXml = @"<?xml version=""1.0"" encoding=""utf-8"" ?> <rsp stat=""fail""> <err code=""1"" msg=""Feed Not Found"" /> </rsp>" Assert.Throws<FailureException>((fun () -> FeedBurnerService.Parse feedXml |> ignore), "Failed to process feed: Feed Not Found") |> ignore [<Test>] let should_retrieve_circulation_and_date_if_valid_xml () = let feedXml = @"<?xml version=""1.0"" encoding=""UTF-8""?> <rsp stat=""ok""> <feed id=""tdv0bg210cr731gc3nssn512cg"" uri=""MarkNeedham""> <entry date=""2009-07-16"" circulation=""630"" hits=""1389"" reach=""629"" /> </feed> </rsp>" let feedBurnerApi = FeedBurnerService.Parse feedXml let entry = feedBurnerApi |> Entries |> Seq.hd Assert.AreEqual(entry.Circulation, 630) Assert.AreEqual(entry.Date, "2009-07-16")
The interesting thing here is that the ‘Assert.Throws’ method takes in a C# delegate so we need to wrap the call to ‘FeedBurnerService.Parse’ inside a function. As with xUnit.NET’s equivalent method we need to ignore the results of the function call in these tests.
module FeedBurnerService = open System.Xml.Linq open System let GetDescendants element (xDocument:XDocument) = xDocument.Descendants(xName element) let GetAttribute element (xElement:XElement) = xElement.Attribute(xName element) type FeedBurnerApi(entries:seq<Entry>) = member x.Entries = entries and Entry(date : string, circulation : int) = member x.Date = date member x.Circulation = circulation let Entries (feedBurnerApi:FeedBurnerApi) = feedBurnerApi.Entries let (|NoFeedFound|FeedBurnerFeed|) xml = try let document = xml |> XDocument.Parse let entries = document |> GetDescendants "entry" |> Seq.map (fun element -> GetAttribute "circulation" element, GetAttribute "date" element) |> Seq.map (fun attribute -> new Entry(circulation = Int32.Parse((fst attribute).Value), date = (snd attribute).Value) ) match Seq.length entries with | 0 -> NoFeedFound((document |> GetDescendants "err" |> Seq.hd |> GetAttribute "msg").Value) | _ -> FeedBurnerFeed(new FeedBurnerApi(entries)) with | :? System.Xml.XmlException as ex -> NoFeedFound(ex.Message) let Parse xml = match xml with | NoFeedFound(error) -> failwith ("Failed to process feed: " + error) | FeedBurnerFeed(entries) -> entries
I continued using the idea of creating F# functions to wrap C# style method calls with the ‘Entries’ function which delegates to the ‘Entries’ property on ‘FeedBurnerApi’ which reduces the need to store intermediate state. I probably could have done the same for the ‘Date’ and ‘Circulation’ properties although I couldn’t see a significant improvement in the readability of the code by doing this.
I have also made use of the ‘and’ keyword to define the ‘Entry’ type because it is referenced by the ‘FeedBurnerApi’ type and therefore needs to be defined at that stage. The other way to ensure this was the case would be to define ‘Entry’ before ‘FeedBurnerApi’ although this doesn’t seem to read as nicely to me.
We are making use of a multi case active pattern in the code which means that the input we are processing with the active pattern can be split into two different things in this case. Don Syme goes into more detail on the different types of active patterns in his paper and Chris Smith also covers them in his post.
The code for the active pattern feels a bit too imperative at the moment although I wasn’t sure of the best way to cover the different scenarios without writing it this way – no doubt there is a more functional way to do this but I can’t see it yet.
Making use of the active pattern in the code has made it much easier to work with than passing around a sequence of tuples as I was doing previously. It has also made it easy to exit from the program early if there is a problem with the data inputted.