Mark Needham

Thoughts on Software Development

Archive for the ‘Coding’ Category

Consistency in the code base and incremental refactoring

with 6 comments

I wrote a post a while ago about keeping consistency in the code base where I covered some of the reasons that you might want to rewrite parts of a code base and the potential impact of those changes but an interesting side to this discussion which I didn’t cover that much but which seems to play a big role is the role of incremental refactoring.

In our code base we recently realised that the naming of the fields in some parts of a form don’t really make sense and I wanted to start naming new fields with the new naming style and then go back and change the existing ones incrementally when it was a good time to do so.

Richard and Christian suggested that this wouldn’t be such a good approach because it would make the naming issue even more confusing until all the fields had been renamed.

In order to avoid this problem we had to either go and change every single field to follow the new naming approach immediately or settle on the old names even though they might not be as descriptive.

Since doing the former would involve changing the names of around 15-20 fields across several different objects, in Hibernate mapping code, probably in the database, on HTML forms and in Watin tests we decided not to do that – the project is only for a short amount of time so the investment probably wouldn’t be worthwhile.

Although in this case it makes sense not to make the improvement it doesn’t strike me as being entirely satisfactory that we would need to make this type of change in a big bang fashion.

From my experience there are often insights into the code or improvements in the ubiquitous language as time goes on and while consistency is of course an important thing in any code base it’s not the only thing.

When do we decide that actually gradually moving to a better approach is worth the temporary pain that having this inconsistency will cause?

Written by Mark Needham

May 5th, 2010 at 10:34 pm

Posted in Coding,Incremental Refactoring

Tagged with

Coding: Make the mutation obvious

with 7 comments

Although I’m generally quite opposed to coding approaches whereby we mutate objects, sometimes the way a framework is designed seems to make this a preferable option.

We came across a situation like this last week when we wanted to hydrate an object with data coming back from the browser.

The signature of the action in question looked like this:

public class SomeController
{
	public ActionResult SomeAction(string id, UserData userData)
	{
	}

We were able to automatically bind most of the values onto ‘UserData’ except for the ‘id’ which was coming in from the URL.

We could have included the ‘id’ in a hidden field in the page to get around this problem but that seemed like a more complicated/hacky solution.

Instead what we needed to do is the following:

public class SomeController
{
	public ActionResult SomeAction(string id, UserData userData)
	{
		userData.Id = id.ParseIntoId();
	}

‘ParseIntoId’ is an extension method which converts a string representation of an ‘id’ into an object representation.

While this method is fine as it is, I find that as code gets added it can sometimes be less than obvious that we’ve actually mutated ‘UserData’ so I now prefer to explicitly call that out like so:

public class SomeController
{
	public ActionResult SomeAction(string id, UserData userData)
	{
		var userDataWithId = UpdateUserDataWithId(userData, id);
	}
 
	private UserData UpdateUserDataWithId(UserData userData, string id)
	{
		userData.Id = id.ParseIntoId();
		return userData;
	}

We would then want to use ‘userDataWithId’ after this point in the function.

Of course the original ‘UserData’ has still been mutated and it doesn’t seem worth the extra code that it would take to avoid this so we’ll just trust that anyone reading the code would realise that they should use ‘userDataWithId’ instead!

Written by Mark Needham

May 4th, 2010 at 6:32 pm

Posted in Coding

Tagged with

Coding: The Kestrel

with 4 comments

Reg Braithwaite has a cool series of posts where he covers the different combinators from Raymond Smullyan’s ‘To Mock a Mockingbird’ book and one of my favourites is the ‘Kestrel’ or ‘K Combinator’ which describes a function that returns a constant function.

It’s described like so:

Kxy = x

The Kestrel function would take in 2 arguments and return the value of the first one. The second argument would probably be a function that takes in the first argument and then performs some side effects with that value.

Braithwaite descirbes the ‘returning’ function from Rails as an example of this combinator whereby instead of writing code like this:

def registered_person(params = {})
  person = Person.new(params.merge(:registered => true))
  Registry.register(person)
  person.send_email_notification
  person
end

We can write this:

def registered_person(params = {})
  returning Person.new(params.merge(:registered => true)) do |person|
    Registry.register(person)
    person.send_email_notification
  end
end

i.e. we can group all the side effects together and it’s more obvious to the reader that we’re returning the value of ‘Person.new(…)’ from this method.

I’ve been writing a bit of code in F# to generate some test objects and I realised that I had code like this in a few places:

let build (t:Type) =
      let theType = Activator.CreateInstance(t)
      theType.GetType().GetProperties() |> Array.iter (fun p -> p.SetValue(t, createValueFor p, null))
      theType

We’re creating ‘theType’ and then mutating it straight away using reflection before returning the value.

We can implement a ‘returning’ function like so to simplify the code a little:

let returning t f = f(t); t
let build (t:Type) =
    returning (Activator.CreateInstance(t)) (fun t -> 
        t.GetType().GetProperties() |> Array.iter (fun p -> p.SetValue(t, createValueFor p, null)))

I think this is the same as what Martin Fowler refers to as a nested closure and it seems quite a neat way of encapsulating side effects and making the code a bit more expressive.

Written by Mark Needham

May 3rd, 2010 at 12:28 am

Posted in Coding

Tagged with

Coding: Generalising too early

with 3 comments

I’ve previously written about the value of adding duplication to code before removing it and we had an interesting situation this week where we failed to do that and ended up generalising a piece of code too early to the point where it actually didn’t solve the problem anymore.

The problem we were trying to solve was around the validation of some dependent fields and to start with we had this requirement:

Given 2 fields...
 
Field 1
Field 2 - Can be empty if Field 1 is 'Foo'. Otherwise must have a value.

We wrote an object that looked a bit like this to do that logic:

public class DependentOnField1 : IValidateFields
{
	public bool IsSatisfied(Field field)
	{
		if(OtherFieldIsFoo())
		{
			return true;
		}		
 
		return string.IsNullOrEmpty(field.Value);
	}
}

Later on we had the following requirement:

Given 2 more fields...
 
Field 3 
Field 4 - Must have a value of more than 0 if Field 3 is 'Bar'. Otherwise can be empty.

It seemed at first glance that they were the same type of validation because in both cases there is talk of dependence between fields.

We started refactoring the ‘DependentOnField1′ object into a state where we would be able to use it for both cases. This meant that we needed to parameterise the object so that we could vary the dependent field.

We eventually ended up with an interface that could be called like this:

var field1DependenceValidator = DependentOn.Field(field1);
public class DependentOn : IValidateFields
{
	public static DependentOn Field(Field field)
	{
		return new DependentOn(field);
	}
}

Unfortunately while refactoring this object we hadn’t taken into account the fact that the requirement for fields 3 and 4 is slightly different to that for fields 1 and 2. With fields 1 & 2 we only needed one condition to be true whereas with fields 3 and 4 two conditions needed to be true – field 3 must be ‘Bar’ AND field 4 must be greater than 0.

We initially tried to hack in the fields 3/4 requirement like this:

public class DependentOn : IValidateFields
{
	public bool IsSatisfied(Field field)
	{		
		if(OtherFieldIsX() && Double.Parse(field.Value) > 0)
		{
			return true;
		}		
 
		return string.IsNullOrEmpty(field.Value);
	}
}

Unfortunately that breaks the way that validation works for fields 1 & 2 and every other attempt we made seemed to make the situation worse and worse until after about half an hour of wrestling with this we decided to revert everything and just create a new object for this new type of validation.

We would accept some duplication until we were able to see a more meaningful abstraction that we could pull out.

Dave Cameron, who first taught me this approach, recently wrote a post describing how he removed some duplication in JavaScript code and you can see from the initial example in his post that he waited until he was exactly sure where the duplication was before getting rid of it.

I think this is the best approach and the rule of thumb seems to be that we should wait until we have the same thing 3 times before trying to remove the duplication.

Written by Mark Needham

April 30th, 2010 at 7:12 am

Posted in Coding

Tagged with

Listening to your tests: An example

without comments

I was recently reading a blog post by Esko Luontola where he talks about the direct and indirect effects of TDD and one particularly interesting point he makes is that driving our code with a TDD approach helps to amplify the problems caused by writing bad code.

if the code is not maintainable, it will be hard to change. Also if the code is not testable, it will be hard to write tests for it. If the code would not be written iteratively and no tests would be written for it, life would be much easier to the developer. In other words, TDD increases the pain caused by bad code, because it’s not possible to avoid changing the code, nor avoid writing tests for it.

This is something Steve Freeman and Nat Pryce talk about in their book and we recently had an example of this which really stemmed from a failure to drive this particular piece of code from the outside in.

We’d reached the stage where one object was taking in 8 different parameters in the constructor and then only used 2 of them in any one path through the code.

The code was pretty bad but it was even more noticeable how much of a mess we’d made when we had to write a new test.

This is a rough example of what the test fixture had begun to look like:

[TestFixture]
public class BadObjectTests
{
	private double parameter1 = 0.10;
	private double parameter2 = 0.20;
	private double parameter3 = 0.30
 
	[Test]
	public void ATestGoneABitWrong()
	{
		var badObject = CreateBadObject();
		var result = badObject.CalculateSomething()
 
		Assert.That(result, Is.EqualTo(parameter1));
	}
 
	private BadObject CreateBadObject()
	{
		return new BadObject(parameter1, parameter2, parameter3...);
	}
}

As a general guideline it’s good if we’re able to keep the context of a test all in one place but since the ‘BadObject’ had become increasingly difficult to construct we’d pulled that out into another method and extracted all its parameters as fields in the test fixture.

The alternative was to have all 8 parameters created in each test and then construct the ‘BadObject’ each time and in retrospect that would actually have made it more obvious that we were doing something wrong.

With that second approach we would undoubtably start copy/pasting tests which would provide another signal that we need to look at the design of the code and make it easier to test.

In this case the solution was to create several smaller objects which actually used all the parameters being passed in. I think we ended up with around 4 objects instead of 1 and each had simple tests that were easy to write.

Written by Mark Needham

April 27th, 2010 at 10:34 pm

Posted in Coding

Tagged with

Small step refactoring: Overload constructor

with 2 comments

I’ve previously written about some approaches that I’ve been taught with respect to taking small steps when refactoring code and another approach which a couple of colleagues have been using recently is the idea of overloading the constructor when refactoring objects.

On a couple of occasions we’ve been trying to completely change the way an object was designed and changing the current constructor would mean that we’d have to change all the tests against that object before checking if the new design was actually going to work or not.

Given a simple example where we want to change the following object to take in a completely different type and not use ‘field1′ or ‘field2′ anymore:

public class Foo
{
	public Foo(string field1, int field2)
	{
		// and so on
	}
}

One approach would be to change that constructor to take in the extra parameter and then gradually phase out the other parameters:

public class Foo
{
	public Foo(string field1, int field2, Bar bar)
	{
		// and so on
	}
}

An alternative is to create another constructor and leave the old one as it is before changing the tests one by one to make use of the new approach.

public class Foo
{
	public Foo(string field1, int field2)
	{
		// we gradually phase this one out
	}
 
	public(Bar bar)
	{
		// the new constructor
	}
}

Even if we want to still use ‘field1′ and ‘field2′ we can still make use of this approach and just default the extra value until we’ve migrated our tests across.

public class Foo
{
	public Foo(string field1, int field2, Bar bar)
	{
		// and so on
	}
 
	public Foo(string field1, int field2) : this(field1, field2, new Bar())
	{
	}
}

I quite like this approach as it allows us to keep the code compiling while we’re making changes to it to improve the design.

The main thing to remember with this approach is not to keep the old approach around for too long and to make sure we move our code across to use the new approach as quickly as possible otherwise it can become very confusing for other pairs which come across the code in this half way state.

Written by Mark Needham

April 25th, 2010 at 10:48 pm

Coding: Another outside in example

with 4 comments

I’ve written before about my thoughts on outside in development and we came across another example last week where we made our life difficult by not initially following this approach.

The rough design of what we were working on looked like this:

outsideIn1.gif

My pair and I were working on the code to do the calculations and we deliberately chose not to drive the functionality from the UI because the other pair were reworking all our validation code and we didn’t want to step on each others toes.

We therefore started driving the code directly from the individual calculations and decided to create an object to represent each of the types of calculation. It worked quite well for all but one of the calculation classes, ‘Calculation3, which became increasingly complicated as we added more test cases.

At this stage we rotated pairs and still unable to see a way to simplify ‘Calculation3′ we decided to try and drive it from the object which would actually make use of it.

As Darren Hobbs pointed out in another discussion:

Don’t design an API. Write the code that you’d want to be able to write to use your API.

In this case it became clear that the missing object which could drive out the Calculation objects was a ‘Calculator’:

outsideIn2.gif

We started writing tests for the calculator and initially it didn’t seem to make that much difference – the design of nearly all the individual ‘Calculation’ objects remained the same.

However, once we got to the tests which would drive out the functionality currently in ‘Calculation3′ it became clear that we had actually got 3 objects inside 1 and that what we really needed in this case was an orchestrating class which could delegate down to small objects to do the calculations.

outsideIn3.gif

The most interesting thing about this situation for me is that I often do drive code from the outside in and I’ve even previously written about the benefits of doing so but in this context I got it wrong and it was a slightly painful lesson!

Written by Mark Needham

April 18th, 2010 at 10:46 pm

Posted in Coding

Tagged with

Coding: Maybe vs Null Object patterns

with 13 comments

On the project I’m currently working on my colleague Christian Blunden has introduced a version of the Maybe type into the code base, a concept that originally derives from the world of functional programming.

The code looks a bit like this:

public interface Maybe<T>
{
	bool HasValue();
	T Value();
}
public class Some<T> : Maybe<T>
{
	private readonly T t;
 
	public Some(T t)
	{
		this.t = t;
	}
 
	public bool HasValue()
	{	
		return true;
	}
 
	public T Value()
	{		
		return t;
	}	
}
public class None<T> : Maybe<T>
{
	public bool HasValue()
	{	
		return false;
	}
 
	public T Value()
	{		
		throw new NotImplementedException();
	}	
}

We would then use it in the code like this:

public FooRepository
{
	public Maybe<Foo> Find(int fooId)
	{
		var foo = LookUpFooFromDatabase();
 
		if(foo == null)
		{
			return new None<Foo>();
		}
		return new Some<Foo>(foo);
	}
var maybeFoo = fooRepository.Find(1);
 
if(maybeFoo.HasValue())
{
	// do something with it
}
// fail in misery

The benefit we get from using this pattern is that we’re explicitly defining in the contract of ‘FooRepository.Find’ that the method might not return a ‘Foo’ rather than leaving the callee to work out whether or not they need to check for a null value.

It’s effectively the Nullable pattern except we can use it for reference types and not just primitives.

An alternative approach which Dermot pointed out is the null object pattern.

Typically when using that pattern we would treat the result of calling ‘FooRepository.Find’ the same regardless of whether we get a real ‘Foo’ or not.

That pattern would work quite well if we have to show a list of items in a grid, for example, and just showed blank cells if there isn’t a real ‘Foo’.

In our case we want to distinguish between whether we did or did not find a ‘Foo’ because the application behaves differently if we can’t find one. Therefore in this case the null object pattern doesn’t work so well.

Written by Mark Needham

April 10th, 2010 at 11:21 am

Posted in Coding

Tagged with

Coding: FindOrCreateUser and similar methods

with 8 comments

One of the general guidelines that I like to follow when writing methods is trying to ensure that it’s only doing one thing but on several recent projects I’ve noticed us breaking this guideline and it feels like the right thing to do.

The method in question typically takes in some user details, looks up that user in some data store and then returning it if there is an existing user and creating a new user if not.

It would probably look something like this:

public class UserService
{
	public User FindOrCreateUser(string username)
	{
		var user = userRepository.FindUserBy(username);
		if(user == null)
		{
			user = CreateUserFrom(username);
		}	
		return user;
	}
}

My initial thought when we wrote it was that it seemed wrong but the method name does clearly describe our requirement so I’m not so sure it’s a bad thing.

NHibernate has a ‘SaveOrUpdate’ method which to me seems to cover similar ground although Save/Update are much closer in functionality then Find/Create so maybe that’s more justifiable.

I’d be interested to know whether you think this type of method is perfectly fine or whether we’re very very bad people for writing it!

Written by Mark Needham

April 9th, 2010 at 7:09 am

Posted in Coding

Tagged with

Coding: Shared libraries

with 6 comments

On a few projects that I’ve worked on one of the things that we’ve done is create a shared library of objects which can be used across several different projects and while at the time it seemed like a good idea, in hindsight I’m not sure if it’s an entirely successful strategy.

I’m quite a fan of not recreating effort which is generally the goal when trying to pull out common code and within one team this seems to be a good approach the majority of the time.

When it comes to sharing across teams then I think we need to consider the perceived benefits a bit more because it doesn’t come without costs.

These are some of the types of code that we’ve shared previously:

Domain objects

I think this is the most dangerous type of code to share because although we often do have the same domain concepts in different projects, it’s quite rare that they mean exactly the same thing.

In addition there is an implicit coupling created with our database since we pretty much now have to make sure that our database schema matches up with the current version of that domain object.

Either that or we do have a shared database for all the applications which use that shared domain object in which case we have an even stronger coupling between applications.

We’re assuming that the two application have exactly the same domain concept and from my experience quite often that isn’t the case – even if there is a concept with the same name it may be used in different ways or mean something completely different in different applications.

This is quite similar to the problem with having a universal domain model which Dan points out in his classic SOA article.

In general I don’t think it makes sense to share this type of code.

Test code

This one seems like it should fairly universally a good idea – after all we often import 3rd party testing libraries so it seems like just sharing some common testing code shouldn’t be much different.

One piece of code that we shared was the Selenium bootstrapping code and this approach worked reasonably well until we wanted to adjust the amount of time between each command because commands were being sent to the browser before elements had the chance to load.

Apart from the fact that the other users of the library didn’t want anything change with respect to how they used the code we had to go and make the change in another project, build that and then update the reference that we had to the library.

Certainly this process would have been made easier if we’d used something like Ivy but the amount of duplication of code that we were saving didn’t seem worth the hassle it caused so we ended up inlining the code.

Infrastructure code

General infrastructure code e.g. code to handle NHibernate transactions which is quite unlikely to change seems one candidate which can work quite well in a shared library and so far I haven’t seen many problems arise from doing this.

I think the key with these bits of reusable code is that we keep them quite small and ensure that they have only one responsibility which will be useful for all the applications.

We eventually ended up slimming down our shared library and the majority of the code that remains in there is solving specific infrastructure type problems which will be the same across any applications using the same technical stack.

Things to be careful about when sharing code

One reason that we may share code is so that if there is a change then it only needs to be done in one place.

We need to have a degree of confident that if we put code in a shared library that this is actually the case.

If it’s likely that different applications might need shared code to change in different ways then we might not want to make that bit of code shared otherwise we’ll just end up with application specific code in a shared library.

From what I’ve noticed it makes most sense to put code which is unlikely to change and is generic enough to be useful across several applications as is into shared libraries.

For any other code it might actually be beneficial to accept that there will be some duplication between applications in the same organisation and not try and pull out a common piece.

Written by Mark Needham

February 26th, 2010 at 12:36 am

Posted in Coding

Tagged with