Archive for April, 2010
Coding: Generalising too early
I’ve previously written about the value of adding duplication to code before removing it and we had an interesting situation this week where we failed to do that and ended up generalising a piece of code too early to the point where it actually didn’t solve the problem anymore.
The problem we were trying to solve was around the validation of some dependent fields and to start with we had this requirement:
Given 2 fields... Field 1 Field 2 - Can be empty if Field 1 is 'Foo'. Otherwise must have a value.
We wrote an object that looked a bit like this to do that logic:
public class DependentOnField1 : IValidateFields { public bool IsSatisfied(Field field) { if(OtherFieldIsFoo()) { return true; } return string.IsNullOrEmpty(field.Value); } }
Later on we had the following requirement:
Given 2 more fields... Field 3 Field 4 - Must have a value of more than 0 if Field 3 is 'Bar'. Otherwise can be empty.
It seemed at first glance that they were the same type of validation because in both cases there is talk of dependence between fields.
We started refactoring the ‘DependentOnField1′ object into a state where we would be able to use it for both cases. This meant that we needed to parameterise the object so that we could vary the dependent field.
We eventually ended up with an interface that could be called like this:
var field1DependenceValidator = DependentOn.Field(field1);
public class DependentOn : IValidateFields { public static DependentOn Field(Field field) { return new DependentOn(field); } }
Unfortunately while refactoring this object we hadn’t taken into account the fact that the requirement for fields 3 and 4 is slightly different to that for fields 1 and 2. With fields 1 & 2 we only needed one condition to be true whereas with fields 3 and 4 two conditions needed to be true – field 3 must be ‘Bar’ AND field 4 must be greater than 0.
We initially tried to hack in the fields 3/4 requirement like this:
public class DependentOn : IValidateFields { public bool IsSatisfied(Field field) { if(OtherFieldIsX() && Double.Parse(field.Value) > 0) { return true; } return string.IsNullOrEmpty(field.Value); } }
Unfortunately that breaks the way that validation works for fields 1 & 2 and every other attempt we made seemed to make the situation worse and worse until after about half an hour of wrestling with this we decided to revert everything and just create a new object for this new type of validation.
We would accept some duplication until we were able to see a more meaningful abstraction that we could pull out.
Dave Cameron, who first taught me this approach, recently wrote a post describing how he removed some duplication in JavaScript code and you can see from the initial example in his post that he waited until he was exactly sure where the duplication was before getting rid of it.
I think this is the best approach and the rule of thumb seems to be that we should wait until we have the same thing 3 times before trying to remove the duplication.
QTB: thetrainline.com – ‘Scale at speed’
About 18 months on from the first ThoughtWorks QTB that I saw about offshoring, on Wednesday night I attended the latest QTB in Manchester titled ‘thetrainline.com – Scale at speed‘.
The presenters were thetrainline.com’s CIO David Jack and the Managing Director of ThoughtWorks India, Mahesh Baxi.
They took us on the journey that thetrainline.com have taken while working with ThoughtWorks to re-architect part of their system to allow them to quickly deliver new functionality on the 2,500 websites that their portal technology powers.
These were some of the key points that stood out for me:
- Early on in the talk David described how they needed to make a decision about whether to rewrite the system from scratch or build on top of the current one and strategically rewrite parts of it. The latter option was the chosen one and others seem to be in agreement that this is the best way to approach the problem.
The argument for doing this that I’ve often heard is that you end up playing catch up with the old system and eventually end up in the situation where you need to re-write the re-write but in this case David suggested that re-writing the whole system was not an option for the business.
- David described the early trainline approach to development where it sounded like the main goal was to get features out to market as quickly as possible. He spoke about the ‘first mover advantage’ whereby if you’re the only product available to customers then the quality of your offering isn’t the most important aspect since you are the only option.
James Shore describes this as ‘voluntary technical debt‘ and I think we need to at least consider whether it’s more important to get quick feedback on our work by getting it into production quickly and perhaps taking some shortcuts along the way. David was quick to point out that at some stage you do need to address that technical debt otherwise you’ll end up in a world of pain.
I’m not sure how exactly you know where to draw the line with respect to this because it would be very easy to just get into a hacking culture until you get to the stage where it’s almost impossible to add any new features to the code base.
- A bit of time was spent talking about various metrics which were used including looking at the cost per story point delivered. They didn’t go into further detail on that but it seems like it’s the dangerous area of metrics which are easy to game that Dan North talks about in his ‘Our obsession with efficiency‘ presentation.
I’ve not really come across a good low level metric which allows you to accurately tell how well a team is doing. The best measure seems to be just looking at whether a team is consistently delivering valuable features to their users. That’s often considered a bit wishy washy so it’d be interesting to know if anyone has a better way to measure performance that works well.
- They both spoke of the need to ensure effective communication between 13 teams operating across 3 different locations (Bangalore, Pune and London) and suggested that rotating people and making extensive use of video conferencing allowed this to be achieved.
The rotation included having subject matter experts from thetrainline.com going to India and working with the teams and David suggested that you still need to have a big commitment to a project if you want it to be successful even if it is being done offshore.
- In his summary David emphasised the importance of releasing frequently into production. In a lot of organisations releasing into production ends up becoming a big deal and so it happens less and less frequently but David suggested that thetrainline.com are releasing new features every 6 weeks and are aiming to get that down to every fortnight.
I guess the ultimate is to have continuous delivery as described by Timothy Fitz in his blog post.
Overall this was quite an interesting talk to attend and gave me a bit more insight into some of the similar and different challenges that are faced by projects operating across multiple time zones.
Listening to your tests: An example
I was recently reading a blog post by Esko Luontola where he talks about the direct and indirect effects of TDD and one particularly interesting point he makes is that driving our code with a TDD approach helps to amplify the problems caused by writing bad code.
if the code is not maintainable, it will be hard to change. Also if the code is not testable, it will be hard to write tests for it. If the code would not be written iteratively and no tests would be written for it, life would be much easier to the developer. In other words, TDD increases the pain caused by bad code, because it’s not possible to avoid changing the code, nor avoid writing tests for it.
This is something Steve Freeman and Nat Pryce talk about in their book and we recently had an example of this which really stemmed from a failure to drive this particular piece of code from the outside in.
We’d reached the stage where one object was taking in 8 different parameters in the constructor and then only used 2 of them in any one path through the code.
The code was pretty bad but it was even more noticeable how much of a mess we’d made when we had to write a new test.
This is a rough example of what the test fixture had begun to look like:
[TestFixture] public class BadObjectTests { private double parameter1 = 0.10; private double parameter2 = 0.20; private double parameter3 = 0.30 [Test] public void ATestGoneABitWrong() { var badObject = CreateBadObject(); var result = badObject.CalculateSomething() Assert.That(result, Is.EqualTo(parameter1)); } private BadObject CreateBadObject() { return new BadObject(parameter1, parameter2, parameter3...); } }
As a general guideline it’s good if we’re able to keep the context of a test all in one place but since the ‘BadObject’ had become increasingly difficult to construct we’d pulled that out into another method and extracted all its parameters as fields in the test fixture.
The alternative was to have all 8 parameters created in each test and then construct the ‘BadObject’ each time and in retrospect that would actually have made it more obvious that we were doing something wrong.
With that second approach we would undoubtably start copy/pasting tests which would provide another signal that we need to look at the design of the code and make it easier to test.
In this case the solution was to create several smaller objects which actually used all the parameters being passed in. I think we ended up with around 4 objects instead of 1 and each had simple tests that were easy to write.
Small step refactoring: Overload constructor
I’ve previously written about some approaches that I’ve been taught with respect to taking small steps when refactoring code and another approach which a couple of colleagues have been using recently is the idea of overloading the constructor when refactoring objects.
On a couple of occasions we’ve been trying to completely change the way an object was designed and changing the current constructor would mean that we’d have to change all the tests against that object before checking if the new design was actually going to work or not.
Given a simple example where we want to change the following object to take in a completely different type and not use ‘field1′ or ‘field2′ anymore:
public class Foo { public Foo(string field1, int field2) { // and so on } }
One approach would be to change that constructor to take in the extra parameter and then gradually phase out the other parameters:
public class Foo { public Foo(string field1, int field2, Bar bar) { // and so on } }
An alternative is to create another constructor and leave the old one as it is before changing the tests one by one to make use of the new approach.
public class Foo { public Foo(string field1, int field2) { // we gradually phase this one out } public(Bar bar) { // the new constructor } }
Even if we want to still use ‘field1′ and ‘field2′ we can still make use of this approach and just default the extra value until we’ve migrated our tests across.
public class Foo { public Foo(string field1, int field2, Bar bar) { // and so on } public Foo(string field1, int field2) : this(field1, field2, new Bar()) { } }
I quite like this approach as it allows us to keep the code compiling while we’re making changes to it to improve the design.
The main thing to remember with this approach is not to keep the old approach around for too long and to make sure we move our code across to use the new approach as quickly as possible otherwise it can become very confusing for other pairs which come across the code in this half way state.
Iron Ruby: ‘unitialized constant…NameError’
I’ve been playing around a bit with Iron Ruby and cucumber following Rupak Ganguly’s tutorial and I tried to change the .NET example provided in the 0.4.2 release of cucumber to call a class wrapping Castle’s WindsorContainer.
The feature file now looks like this:
# 'MyAssembly.dll' is in the 'C:/Ruby/lib/ruby/gems/1.8/gems/cucumber-0.6.4/examples/cs' folder require 'MyAssembly' ... Before do @container = Our::Namespace::OurContainer.new.Container end
The class is defined roughly like this:
public class OurContainer : IContainerAccessor { private WindsorContainer container = new WindsorContainer(); public SwintonContainer() { container.RegisterControllers(Assembly.GetExecutingAssembly()).AddComponent<IFoo, Foo>(); // and so on } public IWindsorContainer Container { get { return container; } } }
When I tried to run the feature like so:
icucumber features
I kept getting the following error:
uninitialized constant Our::Namespace::OurContainer (NameError) C:/Ruby/lib/ruby/gems/1.8/gems/cucumber-0.6.4/examples/cs/features/step_definitons/calculator_steps.rb:13:in `Before'
I’ve come across a few posts where people described the same error and they all suggested that IronRuby was unable to find the class that I was trying to call in the code.
I decided to try calling another class from the assembly to see if that was the problem but that worked fine so there wasn’t a problem with locating the class.
Somewhat by coincidence I was looking at the assembly again in Reflector and tried to look at the constructor of the ‘OurContainer’ class and was asked to give the location of the ‘Castle.Windsor’ assembly which it uses internally.
I didn’t have that assembly or any of its dependencies in the ‘C:/Ruby/lib/ruby/gems/1.8/gems/cucumber-0.6.4/examples/cs’ folder but once I’d included those it all worked fine again!
Haskell: parse error on input `=’
I’ve been trying to follow the ‘Monads for Java/C++ programmers‘ post in ghci and getting the following type of error when trying out the code snippets:
Prelude> a = 3 <interactive>:1:2: parse error on input `='
I figured there must be something wrong with my installation of the compiler since I was copying and pasting the example across and having this problem. Having reinstalled that, however, I still had the same problem.
I eventually came across this blog post which points to a mailing list thread from a few years ago where pjd explains that the ‘let’ construct is required when defining a variable from ghci and wouldn’t necessarily be needed in a normal program:
pjd osfameron: about the ghci thing, you have to prefix definitions with “let”
as in: let simple x y z = x * (y + z)
pjd the reason for this is that ghci is in an implicit do block
pjd so it’s not exactly like top-level haskell
We have to use a ‘let’ in front of any variable/function definitions and then it will work as expected:
Prelude> let a = 3 3
According to Real World Haskell:
This syntax is ghci-specific
The syntax for let that ghci accepts is not the same as we would use at the “top level” of a normal Haskell program.
Lured in by the complexity
We recently ran into an interesting problem when running the website we’re building on our ‘user replica machine’ where you can access the application via a web browser running on Citrix.
The problem we were having was that the result of a post redirect get request that we were making via the jQuery Form plugin was failing to update the fragment of the page correctly. It looked like it was replacing it with the original HTML.
We’re running on Internet Explorer 6 on that machine but it was working fine on that browser on a couple of our development machines.
We decided to temporarily change the code to not do a post request get and instead just to return the new page straight from the POST. Everything updated correctly using that approach.
Having come up with all sorts of grand theories on how the problem was somehow related to the fact that we were running through Citrix we somewhat came to/were somewhat guided on the internal mailing list by Ian Robinson that the page was probably just getting cached, either by WinINet or the corporate proxy, because we hadn’t put a ‘no-store’ directive in the headers of the response.
no-store
The purpose of the no-store directive is to prevent the
inadvertent release or retention of sensitive information (for
example, on backup tapes). The no-store directive applies to the
entire message, and MAY be sent either in a response or in a
request. If sent in a request, a cache MUST NOT store any part of
either this request or any response to it. If sent in a response,
a cache MUST NOT store any part of either this response or the
request that elicited it. This directive applies to both non-
shared and shared caches. “MUST NOT store” in this context means
that the cache MUST NOT intentionally store the information in
non-volatile storage, and MUST make a best-effort attempt to
remove the information from volatile storage as promptly as
possible after forwarding it.Even when this directive is associated with a response, users
might explicitly store such a response outside of the caching
system (e.g., with a “Save As” dialog). History buffers MAY store
such responses as part of their normal operation.
JKG posted some code defining an attribute that we can place on actions that we don’t want to be cached in an ASP.NET MVC application:
public class NoCache : ActionFilterAttribute, IActionFilter { public override void OnResultExecuting(ResultExecutingContext filterContext) { filterContext.HttpContext.Response.Cache.SetExpires(DateTime.UtcNow.AddDays(-1)); filterContext.HttpContext.Response.Cache.SetValidUntilExpires(false); filterContext.HttpContext.Response.Cache.SetRevalidation(HttpCacheRevalidation.AllCaches); filterContext.HttpContext.Response.Cache.SetCacheability(HttpCacheability.NoCache); filterContext.HttpContext.Response.Cache.SetNoStore(); base.OnResultExecuting(filterContext); } public override void OnResultExecuted(ResultExecutedContext filterContext) { base.OnResultExecuted(filterContext); } }
We put that code in and the page started refreshing correctly.
Chris Leishman pointed out that another approach would be to make use of ‘ETags‘ on the response header so that the page would only be retrieved from the server if it had actually changed.
Either way we found it amusing that we automatically assumed something complicated was going wrong when actually it was anything but!
Functional C#: An imperative to declarative example
I wrote previously about how we’ve been working on some calculations on my current project and one thing we’ve been trying to do is write this code in a fairly declarative way.
Since we’ve been test driving the code it initially started off being quite imperative and looked a bit like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | public class TheCalculator { ... public double CalculateFrom(UserData userData) { return Calculation1(userData) + Calculation2(userData) + Calculation3(userData); } public double Calculation1(UserData userData) { // do calculation stuff here } public double Calculation2(UserData userData) { // do calculation stuff here } ... } |
What we have on line 7 is a series of calculations which we can put in a collection and then sum together:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | public class TheCalculator { ... public double CalculateFrom(UserData userData) { var calculations = new Func<UserData, double>[] { Calculation1, Calculation2, Calculation3 }; return calculations.Sum(calculation => calculation(userData)); } public double Calculation1(UserData userData) { // do calculation stuff here } ... } |
We can pull out a ‘Calculation’ delegate to make that a bit more readable:
public class TheCalculator { private delegate double Calculation(UserData userData); public double CalculateFrom(UserData userData) { var calculations = new Calculation[] { Calculation1, Calculation2, Calculation3 }; return calculations.Sum(calculation => calculation(userData)); } ... }
One of the cool things about structuring the code like this is that if we want to add a new Calculation we can just go to the end of the array, type in the name of the method and then Resharper will create it for us with the proper signature.
We eventually came across some calculations which needed to be subtracted from the other ones, which seems like quite an imperative thing to do!
Luckily Christian saw a way to wrap these calculations in a ‘Subtract’ function so that we could stay in declarative land:
public class TheCalculator { private delegate double Calculation(UserData userData); public double CalculateFrom(UserData userData) { var calculations = new [] { Calculation1, Calculation2, Calculation3, Subtract(Calculation4) }; return calculations.Sum(calculation => calculation(userData)); } ... public Calculation Subtract(Calculation calculation) { return userData => calculation(userData) * -1; } }
Having a method which explicitly has the ‘Calculation’ signature allows us to remove it from the array declarative which is pretty neat.
We can also change the method signature of ‘Subtract’ to take in a variable number of calculations if we need to:
public class TheCalculator { ... public double CalculateFrom(UserData userData) { var calculations = new [] { Calculation1, Calculation2, Calculation3, Subtract(Calculation4, Calculation5) }; return calculations.Sum(calculation => calculation(userData)); } public Calculation Subtract(params Calculation[] calculations) { return userData => calculations.Sum(calculation => calculation(userData)) * -1; } }
The other nice thing about coding it this way is that we ran into a problem where when we fed real data through the code we were getting the wrong values returned and we wanted to understand where it was falling down.
We could easily temporarily add in a ‘Console.WriteLine’ statement like this to help us out:
public class TheCalculator { ... public double CalculateFrom(UserData userData) { var calculations = new [] { Calculation1, Calculation2, Calculation3, Subtract(Calculation4, Calculation5) }; return calculations .Select(calculation => { Console.WriteLine(calculation.Method.Name + " = " + calculation(userData)); return calculation; }) .Sum(calculation => calculation(userData)); } ... }
It then printed the results down the page like so:
Calculation1: 23.34 Calculation2: 45.45 ...
Coding: Another outside in example
I’ve written before about my thoughts on outside in development and we came across another example last week where we made our life difficult by not initially following this approach.
The rough design of what we were working on looked like this:

My pair and I were working on the code to do the calculations and we deliberately chose not to drive the functionality from the UI because the other pair were reworking all our validation code and we didn’t want to step on each others toes.
We therefore started driving the code directly from the individual calculations and decided to create an object to represent each of the types of calculation. It worked quite well for all but one of the calculation classes, ‘Calculation3, which became increasingly complicated as we added more test cases.
At this stage we rotated pairs and still unable to see a way to simplify ‘Calculation3′ we decided to try and drive it from the object which would actually make use of it.
As Darren Hobbs pointed out in another discussion:
Don’t design an API. Write the code that you’d want to be able to write to use your API.
In this case it became clear that the missing object which could drive out the Calculation objects was a ‘Calculator’:
We started writing tests for the calculator and initially it didn’t seem to make that much difference – the design of nearly all the individual ‘Calculation’ objects remained the same.
However, once we got to the tests which would drive out the functionality currently in ‘Calculation3′ it became clear that we had actually got 3 objects inside 1 and that what we really needed in this case was an orchestrating class which could delegate down to small objects to do the calculations.

The most interesting thing about this situation for me is that I often do drive code from the outside in and I’ve even previously written about the benefits of doing so but in this context I got it wrong and it was a slightly painful lesson!
Late integration: Some thoughts
John Daniels has an interesting post summarising GOOSgaggle, an event run a few weeks ago where people met up to talk about the ideas in ‘Growing Object Oriented Software, Guided by Tests‘.
It’s an interesting post and towards the end he states the following:
Given these two compelling justifications for starting with end-to-end tests, why is it that many people apparently don’t start there? We came up with two possibilities, although there may be many others:
Starting with the domain model can provide an illusion of rapid progress. You can show business features working while ignoring the realities of the larger system environment. Clearly, this is not normally an approach that addresses the biggest risks first. But it’s an easy option and attractive when you’re under pressure.
For some reason the system environment is not available to you; perhaps, for example, the team creating the infrastructure is late delivering. So rather than taking the correct – and brave – option of loudly declaring progress on your project to be blocked, you restrict yourself to creating those parts of the system that are within your control.
My first thought on reading this was that it seems like a reasonable thing to say but what if we can’t do anything about the fact that it’s blocked?
I’ve previously worked on projects where we’ve had components that are integral to the whole application delivered late by another team and despite doing exactly as John suggests we’ve struggled to influence that situation.
In terms of systems thinking it might be said that we didn’t have sufficient leverage to change the system we were operating within to be the way that we wanted it.
Either way we were in the situation we could just sit there and stop doing anything or we could keep going and then accept that there would be some difficult late integration work later on.
We decided to go with the second option and we had exactly those illusions of ‘rapid progress’ until we actually had to integrate with those components.
However, we did made it clear that there was going to be a high cost with respect to re-work of the code from late integration. That was accepted as being a limitation of the system that we were working within and although the situation did improve slightly as time went on we never completely fixed it.
In an ideal world it would be good if just shouting loudly that you were blocked was enough to make everything fine but in reality it just becomes very repetitive and annoying!