Mark Needham

Thoughts on Software Development

Archive for the ‘Incremental Refactoring’ Category

Ruby: Refactoring from hash to object

without comments

Something I’ve noticed when I play around with Ruby in my own time is that I nearly always end up passing hashes all over my code, which to start with isn’t a big deal.

Unfortunately I eventually get to the stage where I’m effectively modelling an object inside a hash and it all gets very difficult to understand.

I’ve written a few times before about incrementally refactoring code so this seemed like a pretty good chance for me to try that out.

The code in the view looked something like this:

<% @tweets.each do |tweet| %>
  <%= tweet[:key] %>  <%= tweet[:value][:something_else] %>
<% end %>

@tweets was being populated directly from a call to CouchDB so to start with I needed to change it from being a collection of hashes to a collection of objects:

I changed the Sinatra calling code from:

get '/' do
  @tweets = get_the_couchdb_tweets_hash
end

to:

get '/' do
  tweets_hash = get_the_couchdb_tweets_hash
  @tweets = tweets_hash.map { |tweet| TweetViewModel.new(tweet) }
end

where TweetViewModel is defined like so:

class TweetViewModel
  attr_accessor :key, :value
 
  def initialize(tweet_hash)
    @key = tweet_hash[:key]
    @value = tweet_hash[:value]
  end
 
  def get(lookup)
    if lookup == :key
      key
    else
      value
    end
  end
 
  alias_method :[], :get
end
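One way to sanity-check the shim is to confirm that hash-style and object-style access answer the same thing (the tweet data below is made up for the sketch):

```ruby
class TweetViewModel
  attr_accessor :key, :value

  def initialize(tweet_hash)
    @key = tweet_hash[:key]
    @value = tweet_hash[:value]
  end

  def get(lookup)
    lookup == :key ? key : value
  end

  alias_method :[], :get
end

tweet = TweetViewModel.new(key: "http://example.com/tweet",
                           value: { something_else: "hello" })
tweet[:key]   # hash-style access still works through the alias
tweet.key     # object-style access works too
```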

The next step was to get rid of the get method and rename those attr_accessor methods to something more intention revealing.

class TweetViewModel
  attr_accessor :url, :messages
 
  def initialize(tweet_hash)
    @url = tweet_hash[:key]
    @messages = tweet_hash[:value]
  end
end
<% @tweets.each do |tweet| %>
  <%= tweet.url %>  <%= tweet.messages[:something_else] %>
<% end %>

I originally didn’t realise how easy it would be to make the TweetViewModel temporarily pretend to be a Hash, but doing so made it really easy for me to change the code and know that it was working the whole way.

For someone with more Ruby experience perhaps it wouldn’t be necessary to break out the refactoring like this because they could fairly confidently do it in one go.

Written by Mark Needham

February 27th, 2011 at 8:10 pm

Posted in Incremental Refactoring,Ruby

Tagged with

Incremental Refactoring: Create factory method

with 7 comments

Dermot and I spent a bit of time today refactoring some code where the logic had ended up in the wrong place.

The code originally looked a bit like this:

public class LookupService
{
	private readonly LookupRepository lookupRepository;

	public LookUp Find(UserData userData)
	{
		var param1 = GetParam1From(userData);
		var param2 = GetParam2From(userData);
		var param3 = GetParam3From(userData);
 
		var lookupKey = new LookUpKey(param1, param2, param3);
		return lookupRepository.Find(lookupKey);
	}
}
public class LookUpKey
{
	private readonly string param1;
	private readonly string param2;
	private readonly string param3;
 
	public LookUpKey(string param1, string param2, string param3)
	{
		this.param1 = param1;
		this.param2 = param2;
		this.param3 = param3;
	}
}

We wanted to push the logic used to create the params in ‘LookUpKey’ into the ‘LookUpKey’ since it seemed that was a more natural place for that behaviour to live.

Unfortunately there were also a few other places that were using LookUpKey’s constructor, so we couldn’t just change it to take in a ‘UserData’ without breaking other parts of the code base.

My initial thought was that we could overload the constructor and temporarily have two constructors until we got rid of the original.

Dermot pointed out a better approach which was to add a static factory method to ‘LookUpKey’ and initially push the logic into that method.

public class LookUpKey
{
	...
 
	public static LookUpKey CreateFrom(UserData userData)
	{
		var param1 = GetParam1From(userData);
		var param2 = GetParam2From(userData);
		var param3 = GetParam3From(userData);
 
		return new LookUpKey(param1, param2, param3);
	}
}

The service is now much simplified:

public class LookupService
{
	public LookUp Find(UserData userData)
	{
		var lookupKey = LookUpKey.CreateFrom(userData);
		return lookupRepository.Find(lookupKey);
	}
}

We can also move the tests that were indirectly testing the key creation (by checking what was passed to the repository) so that they test directly against the ‘LookUpKey’.

Eventually we want to make LookUpKey’s constructor private but for now we’ve been able to make our change without breaking the code base elsewhere.
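For what it’s worth, the same move translates directly to Ruby as a class-level factory method; the user data fields below are invented for the sketch:

```ruby
class LookUpKey
  attr_reader :param1, :param2, :param3

  def initialize(param1, param2, param3)
    @param1 = param1
    @param2 = param2
    @param3 = param3
  end

  # The factory method owns the knowledge of how to pull the
  # params out of the user data
  def self.create_from(user_data)
    new(user_data[:name], user_data[:region], user_data[:tier])
  end
end

key = LookUpKey.create_from(name: "mark", region: "uk", tier: "gold")
```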

Written by Mark Needham

June 17th, 2010 at 12:43 am

Consistency in the code base and incremental refactoring

with 6 comments

I wrote a post a while ago about keeping consistency in the code base, where I covered some of the reasons you might want to rewrite parts of a code base and the potential impact of those changes. An interesting side to this discussion which I didn’t cover much, but which seems to play a big role, is incremental refactoring.

In our code base we recently realised that the naming of the fields in some parts of a form don’t really make sense and I wanted to start naming new fields with the new naming style and then go back and change the existing ones incrementally when it was a good time to do so.

Richard and Christian suggested that this wouldn’t be such a good approach because it would make the naming issue even more confusing until all the fields had been renamed.

In order to avoid this problem we had to either go and change every single field to follow the new naming approach immediately or settle on the old names even though they might not be as descriptive.

Since doing the former would involve changing the names of around 15-20 fields across several different objects, in Hibernate mapping code, probably in the database, on HTML forms and in Watin tests we decided not to do that – the project is only for a short amount of time so the investment probably wouldn’t be worthwhile.

Although in this case it makes sense not to make the improvement, it doesn’t strike me as entirely satisfactory that we would need to make this type of change in a big-bang fashion.

From my experience there are often insights into the code or improvements in the ubiquitous language as time goes on and while consistency is of course an important thing in any code base it’s not the only thing.

When do we decide that actually gradually moving to a better approach is worth the temporary pain that having this inconsistency will cause?

Written by Mark Needham

May 5th, 2010 at 10:34 pm

Posted in Coding,Incremental Refactoring

Tagged with

Small step refactoring: Overload constructor

with 2 comments

I’ve previously written about some approaches that I’ve been taught for taking small steps when refactoring code. Another approach, which a couple of colleagues have been using recently, is overloading the constructor when refactoring objects.

On a couple of occasions we’ve been trying to completely change the way an object was designed and changing the current constructor would mean that we’d have to change all the tests against that object before checking if the new design was actually going to work or not.

Given a simple example where we want to change the following object to take in a completely different type and no longer use ‘field1’ or ‘field2’:

public class Foo
{
	public Foo(string field1, int field2)
	{
		// and so on
	}
}

One approach would be to change that constructor to take in the extra parameter and then gradually phase out the other parameters:

public class Foo
{
	public Foo(string field1, int field2, Bar bar)
	{
		// and so on
	}
}

An alternative is to create another constructor and leave the old one as it is before changing the tests one by one to make use of the new approach.

public class Foo
{
	public Foo(string field1, int field2)
	{
		// we gradually phase this one out
	}
 
	public Foo(Bar bar)
	{
		// the new constructor
	}
}

Even if we still want to use ‘field1’ and ‘field2’ we can make use of this approach and just default the extra value until we’ve migrated our tests across.

public class Foo
{
	public Foo(string field1, int field2, Bar bar)
	{
		// and so on
	}
 
	public Foo(string field1, int field2) : this(field1, field2, new Bar())
	{
	}
}
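Ruby doesn’t have constructor overloading, but the same bridging trick can be sketched with a defaulted argument (Foo and Bar here are stand-ins):

```ruby
class Bar
end

class Foo
  attr_reader :field1, :field2, :bar

  # Old call sites can keep calling Foo.new("a", 1) while new ones
  # pass the extra collaborator explicitly
  def initialize(field1, field2, bar = Bar.new)
    @field1 = field1
    @field2 = field2
    @bar = bar
  end
end

old_style = Foo.new("a", 1)            # gets a defaulted Bar
new_style = Foo.new("a", 1, Bar.new)   # migrated call site
```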

I quite like this approach as it allows us to keep the code compiling while we’re making changes to it to improve the design.

The main thing to remember with this approach is not to keep the old constructor around for too long: we should move our code across to the new approach as quickly as possible, otherwise it can become very confusing for other pairs who come across the code in this halfway state.

Written by Mark Needham

April 25th, 2010 at 10:48 pm

Refactoring: Small steps to pull out responsibilities

with 3 comments

I wrote previously about how I’ve been using effect sketches to identify responsibilities in objects so that I can pull them out into other objects and once I’ve done this I often find that I can’t see a small next step to take.

At this stage in the past I’ve often then stopped and left the refactoring until I have more time to complete it but this hasn’t really worked and a lot of the time I end up only seeing the code change in my mind and not in the actual code.

I came across this problem again last week in an object which had 8 dependencies when I came across it.

Having drawn the effect sketch I realised that 3 of those could be pulled out into a new object which could then be injected into the original object and help to encapsulate those other 3 dependencies.

The code that I wanted to change was something like this:

public class TheObject 
{
	private readonly DependencyA dependencyA;
	private readonly DependencyB dependencyB;
	private readonly DependencyC dependencyC;
	...	
 
	public Foo FooCreation()
	{
		var dependencyAValue = dependencyA.GetSomething();
		var dependencyBValue = dependencyB.GetSomething();
		var dependencyCValue = dependencyC.GetSomething();
 
		return new Foo(dependencyAValue, dependencyBValue, dependencyCValue);
	}
 
	...
}

I wanted to pull the ‘FooCreation’ method out into another object and then change all the places that were calling ‘TheObject.FooCreation’ to call this new object directly.

The first step was to create a ‘FooFactory’ and have ‘TheObject’ delegate to it internally:

public class FooFactory 
{
	private readonly DependencyA dependencyA;
	private readonly DependencyB dependencyB;
	private readonly DependencyC dependencyC;
	...	
 
	public Foo Create()
	{
		var dependencyAValue = dependencyA.GetSomething();
		var dependencyBValue = dependencyB.GetSomething();
		var dependencyCValue = dependencyC.GetSomething();
 
		return new Foo(dependencyAValue, dependencyBValue, dependencyCValue);
	}
 
}
public class TheObject 
{
	...	
 
	public Foo FooCreation()
	{
		return new FooFactory(dependencyA, dependencyB, dependencyC).Create();
	}
 
	...
}

I ran out of time at this stage to finish off the refactoring, but it was obvious where it was going, so the next time I got the chance I injected the ‘FooFactory’ into ‘TheObject’:

public interface IFooFactory 
{
	Foo Create();
}
public class TheObject 
{
	private readonly IFooFactory fooFactory;
	...	
	public TheObject(IFooFactory fooFactory) 
	{
		this.fooFactory = fooFactory;
	}
 
	public Foo FooCreation()
	{
		return fooFactory.Create();
	}
 
	...
}

To do this I had to go and change the tests on ‘TheObject’ and move some of them to go directly against ‘FooFactory’.

The third stage of the refactoring was to change all the places which called ‘TheObject.FooCreation()’ to just call ‘FooFactory.Create()’ directly.
Some of those places were also using other methods on ‘TheObject’, so those objects now have an extra dependency, although I think the code is at least more intention revealing than it was previously.
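The end state of the refactoring can be sketched in Ruby, where duck typing stands in for the ‘IFooFactory’ interface (the dependencies are hypothetical stubs, and the factory returns a simple array in place of a real Foo to keep the sketch self-contained):

```ruby
class FooFactory
  def initialize(dependency_a, dependency_b, dependency_c)
    @dependency_a = dependency_a
    @dependency_b = dependency_b
    @dependency_c = dependency_c
  end

  # Encapsulates the three dependencies behind one creation method
  def create
    [@dependency_a.get_something,
     @dependency_b.get_something,
     @dependency_c.get_something]
  end
end

class TheObject
  def initialize(foo_factory)
    @foo_factory = foo_factory
  end

  # For now this just delegates; callers can later go to the factory directly
  def foo_creation
    @foo_factory.create
  end
end

# Hypothetical stub standing in for DependencyA/B/C
Dependency = Struct.new(:value) do
  def get_something
    value
  end
end

factory = FooFactory.new(Dependency.new(1), Dependency.new(2), Dependency.new(3))
the_object = TheObject.new(factory)
```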

I’m sure there are some other patterns for this type of small step refactoring but this is just one that I’ve noticed so far.

Written by Mark Needham

February 24th, 2010 at 12:45 am

Consistency in the code base

with 2 comments

I’ve had quite a few discussions with various different colleagues about coding consistency over the last year or so and Pat Kua and Frank Trindade have both written posts suggesting that we should look to have coding standards on projects in order to avoid the type of pain that having an inconsistent approach can lead to.

From what I’ve noticed there seem to be two reasons that we end up with inconsistent code on projects:

  1. People’s personal preferences for how certain bits of code should be written vary.
  2. The team is trying to incrementally move towards an improved solution to a problem encountered but isn’t completely there yet.

I think the first of these is the area which Frank and Pat are addressing in their posts.

However I think there is some overlap between the two. For example someone may rewrite a bit of code in their own style and while they would suggest that what they are doing is improving the code base their team mates may think that what they are doing is merely creating more inconsistency.

Potential coding inconsistency falls into three categories as I see it:

Personal Preferences

A simple example of this that we had on the last project I worked on was around how we should write simple conditional statements when used as guard clauses.

These were the 3 different ways that people preferred and we had all the different styles across the code base!

public String SomeMethod()
{
	if(SomeCondition()) return "hello";
}
public String SomeMethod()
{
	if(SomeCondition()) 
		return "hello";
}
public String SomeMethod()
{
	if(SomeCondition()) 
	{
		return "hello";
	}
}

We eventually agreed to use the last one because it was the least controversial choice and although people who preferred the other two approaches thought it was too verbose they were fine with using that as a compromise.

Pat covers a lot of the other types of code that I would classify as being linked to personal preference and I think that it would be beneficial to have coding standards/agreement in the team for the patterns we favour/names we’re going to use in this area.

It’s way easier to just go and write code in your own personal style and I do this way too frequently but it’s much better for the team if we follow a common style.

Improving the code but perhaps not significantly

I’ve written previously about the way that we’ve been making use of a ‘GetBuilderFor‘ class to hang our test builders off and while this worked quite well there is a better way to do this which requires less code being written overall.

We can make use of C#’s object initializer syntax to setup fields with values instead of having to create methods to do that and we don’t need to write the code to make the builder hang off ‘GetBuilderFor’.

As long as we put all the builders in the same namespace we can just type ‘new BuilderNameSpace.’ and then the IDE will show us the choices of builder that we have.

The problem we experienced was that we already had 30-40 builders written using the ‘GetBuilderFor’ approach and there wasn’t a significant advantage to be gained by going and rewriting all that code to use the new approach.

In the interests of consistency we therefore decided to keep using the ‘GetBuilderFor’ approach even though the next time I come across this problem I’d definitely favour the other approach.

Reg Braithwaite has a really interesting post in which he talks about ‘the narcissism of small code differences‘ and how different authors ‘improve’ the code by adding their own personal touch.

I think he describes the type of code changes that I would classify in this and the previous category although it does make me wonder where the line between refactoring and adding your own personal touch to a piece of code lies.

Improving the code significantly

These are the code changes that may create inconsistency but help us to drive a design improvement which will have a big impact on the way we work.

An example of this that we had on my last project was moving our ‘domain’ model from being defined in WCF messages to being defined independently of other layers in the system.

We wanted to make this change to make the dependency between our code base and the service layer more loosely coupled and we started making that change around about February.

It took maybe a month to move the most awkward parts across since we did it bit by bit in an incremental way rather than stopping everything and making that change.

Along the way the code in these areas was in an inconsistent state and anyone looking at the code base who didn’t know the reason why it was like that would think it was truly insane!

Even now there are still bits of code which are more reminiscent of the old approach than the new and these tend to get changed when people are working in that area.

I think this type of inconsistency is somewhat inevitable if we’re incrementally making improvements to our code base or following the boy scout rule as Uncle Bob puts it.

In Summary

These are my current observations on coding inconsistency but I’m sure there are other factors which can affect this as well.

I think we should look to reduce the inconsistencies in the first two categories as much as possible and ensure that everyone is driving towards the same outcome with respect to the last category.

Written by Mark Needham

November 4th, 2009 at 9:39 pm

Posted in Coding,Incremental Refactoring

Tagged with

Safe refactoring: Removing object initializer, introducing builder

with 3 comments

I previously wrote about an approach we took to safely remove some duplication, and I recently followed a similar mantra to replace an object initializer call, which had around 40 properties being set up, with a builder, to try and make the code a bit easier to understand.

We did have tests checking the values being set up by the object initializer, so I was already able to refactor with some degree of safety. It would probably have been possible to just create the builder, build the object from that, and then delete the old code and replace it with the new, but I’ve caused myself too many problems doing that before, so I decided to try a more incremental approach.

The idea was to have the builder and the object initializer both creating the object at the same time. As I built up a property from the builder I would set it in the object initializer, until eventually all of the properties were being set directly from the builder’s object and I could just return that instead. This approach feels quite similar to what Kent Beck describes as having parallel implementations in his recent presentation on Responsive Design.

The code I started with was something like this:

public Foo CreateMeAFoo() 
{
	return new Foo
	{
		Bar = "someValue",
		OtherBar = "someOtherValue",
		UltraBar = "aValue",
		...
		AnotherArgumentWayDownHere = 1,
		...
		AndSoOn = "yes"
	};
}

I worked together with one of the business analysts on our team who pointed out that there were actually some clear groupings in what I had just been considering ‘data’ and we were able to put those explicit domain concepts into the code as part of the builder.

My first step, however, was to remove the object initializer to avoid making any silly mistakes. An example of a mistake I have made when using object initializers is setting a property using ‘Int16.Parse’ and passing in a string with a value of “53700”, which makes the method throw an exception whose stack trace just directs you to the first line of the object initializer, making it quite difficult to work out which line has failed.

Having worked the code into a sequence of setters I gradually added methods to the builder to create the object:

public Foo CreateMeAFoo() 
{
	var fooBuilder = new FooBuilder().AddBars("someValue", "someOtherValue", "aValue");
	var newFoo = fooBuilder.Build();
 
	var foo = new Foo();
	foo.Bar = newFoo.Bar;
	foo.OtherBar = newFoo.OtherBar;
	foo.UltraBar = newFoo.UltraBar;
	...
	foo.AnotherArgumentWayDownHere = 1;
 
	return foo;
}
public class FooBuilder  
{
	private string bar;
	private string otherBar;
	private string ultraBar;
 
	public FooBuilder AddBars(string bar, string otherBar, string ultraBar)
	{
		this.bar = bar;
		this.otherBar = otherBar;
		this.ultraBar = ultraBar;
		return this;
	}
 
	public Foo Build()
	{
		var foo = new Foo();
		foo.Bar = bar;
		foo.OtherBar = otherBar;
		foo.UltraBar = ultraBar;
		return foo;
	}
}

I created some duplication by doing this – I am now creating the ‘Foo’ twice – but I didn’t check any of the code into the main branch until I had totally replaced the original Foo creation with the builder.

I did about two or three properties at a time and then ran the tests. I thought that might be too small a step, but it actually saved me on a couple of occasions, so it probably saved me time overall.

Eventually when I had all the tests passing I got rid of the original Foo and replaced it with the one from the builder:

public Foo CreateMeAFoo() 
{
	var fooBuilder = new FooBuilder().AddBars("someValue", "someOtherValue", "aValue");
	return fooBuilder.Build();
}

This code is still in a state of transition – it is still possible to create an object with half the fields not set by passing in nulls to the builder for example – and I’m trying to work out what the best step is to fix that.

I generally prefer to have everything setup in the constructor and then you know that the object is in a good state as soon as you create it, but in this case moving everything straight into the constructor will probably make the object even more difficult to deal with.

My current thinking is to maybe check the pre conditions for creating the object inside the builder and then refactor out value objects so that there are not so many properties overall but I’m open to other ideas.
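The finished builder shape carries over to Ruby almost mechanically; this is a minimal sketch with just the three ‘bar’ properties:

```ruby
class Foo
  attr_accessor :bar, :other_bar, :ultra_bar
end

class FooBuilder
  def add_bars(bar, other_bar, ultra_bar)
    @bar = bar
    @other_bar = other_bar
    @ultra_bar = ultra_bar
    self  # returning self lets the add_... calls be chained
  end

  def build
    foo = Foo.new
    foo.bar = @bar
    foo.other_bar = @other_bar
    foo.ultra_bar = @ultra_bar
    foo
  end
end

foo = FooBuilder.new.add_bars("someValue", "someOtherValue", "aValue").build
```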

Written by Mark Needham

June 26th, 2009 at 12:02 am

Refactoring: Removing duplication more safely

with 5 comments

One of the most important things that I’ve learnt from the coding dojo sessions that we’ve been running over the last six months is the importance of small step refactorings.

Granted, we have been trying to take some of the practices to the extreme, but the basic idea of keeping the tests green for as much time as possible, as well as keeping our code in a state where it still compiles (in a static language), is very useful no matter what code we’re working on.

Recently a colleague and I were doing some work on our code base to try and remove some duplication – we had two classes which were doing pretty much the same thing but they were named just slightly differently.

The implementation was also slightly different – one was a list containing objects with Key and Value properties on and the other was a dictionary/map of key/value pairs.

We spent a bit of time checking that we hadn’t misunderstood what these two different classes were doing and having convinced ourselves that we knew what was going on decided to get rid of one of them – one was used in around 50% more places than the other so we decided to keep that one.

We now needed to replace the usages of the other one.

My pre-coding dojo refactoring approach would have been to just find the first place that the one we wanted to replace was being used, delete that and then let the compiler guide me to replacing all its usages and then do that with the second usage and so on.

The problem with this approach is that we would probably have had the code in a state where it didn’t compile for maybe half an hour, leading to a situation where we would be unable to run our tests for any of this time which would mean that we would lose confidence that the changes we were making actually worked.

The approach we therefore took was to add the class we wanted to keep side by side with the one we wanted to get rid of and slowly move our tests across to set up data for that.

We therefore ended up with code looking a bit like this to start with:

public class IHaveUsages
{
	public IDictionary<OldType, string> OldType { get; set; }
	public IList<NewType> OldType2 { get; set; }
}

When changing tests we took the approach of commenting out the old setup code to start with, so that we could see exactly what was going on in the setup, and then deleting it once we had done so. I’ve written previously about my dislike of commented-out code, but here we were using it as a mechanism to guide our refactoring, and we never checked the code in with these comments, so I think it was a reasonable approach.

[Test]
public void SomeTest()
{
	// some test stuff
	new IHaveUsages 
	{
		// OldType = new Dictionary<OldType, string> {{new OldType("key"), "value"}},
		OldType2 = new List<NewType> { new NewType { Key = "key", Value = "value" } } 
	};
}

The intention was to try and reduce the number of places that OldType was being used until the point where there were none left which would allow us to safely remove it.

Once we had made that change to the test setup we needed to make the changes in the class using that data to get our green bar back.

On a couple of occasions we found methods in the production code which took in the OldType as a parameter. In order to refactor these areas we decided to take a copy of the method and rename it slightly before re-implementing it using the NewType.

private void OldMethod(OldType oldType) 
{
	// Do some stuff
}
private void OldMethod2(NewType newType) 
{
	// replicate what's being done in OldMethod
}

We then looked for the usages of ‘OldMethod’ and replaced those with calls to ‘OldMethod2’, also ensuring that we passed in NewType to the method instead of OldType.

I’m intrigued as to whether there is an even better way to perform this refactoring – when I chatted with Nick about this he suggested that it might have been even easier to create a temporary inheritance hierarchy with NewType extending OldType. We could then just change any calls which use OldType to use NewType before eventually getting rid of OldType.

I haven’t tried this approach out yet but if it makes our feedback cycle quicker and chances of failing less then I’m all for it.
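Sketching that untried idea in Ruby (types and fields are hypothetical): NewType temporarily extends OldType, so anything that expects an OldType keeps working while its callers migrate.

```ruby
class OldType
  attr_reader :key, :value

  def initialize(key, value)
    @key = key
    @value = value
  end
end

# Temporary shim: while call sites migrate, a NewType passes any
# is_a?(OldType) checks the existing code performs
class NewType < OldType
end

entry = NewType.new("key", "value")
entry.is_a?(OldType)   # true, so existing code accepts it
```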

Written by Mark Needham

May 26th, 2009 at 1:20 pm

Refactoring: Comment it out vs small steps removal

with one comment

One refactoring I was doing last week was to try and remove the use of some getters/setters on one of our objects so that it was better encapsulated and all the behaviour related to it happened in one place.

The change involved introducing a constructor to initialise the object, rather than using the new object initialiser syntax and initialising it via the properties.

My initial approach was to find all the usages of these properties and then remove each usage one by one, running our suite of tests against the code after each change to ensure that nothing had broken as a result of the change.

While we were doing this my colleague pointed out that it would probably be quicker to just comment out the properties code and then recompile the code – the list of errors would then point out the areas which relied on the properties and we could then fix the code that way.

It’s a technique I’ve used previously and it worked out reasonably well because there were only 10 or so usages, meaning we were able to make all those changes and run the tests in under ten minutes.

I think if there had been bigger changes to make – for example if the properties had been used to expose the data all over the application (luckily we only had one case of this) – then the smaller-steps approach may have been preferable.

Getting the balance between taking small steps and getting rapid feedback was also a consideration here.

We were running our unit and javascript tests as we made the changes since we were uncertain exactly what impact the changes would have. It took around 70 seconds to run these tests, which I think encouraged slightly bigger steps so that we weren’t sitting around waiting too often.

Both of these approaches are useful and it seems to me that maybe a combination of them, at different stages of our refactoring, would prove to be most useful.

Written by Mark Needham

February 8th, 2009 at 9:10 am

Posted in Coding,Incremental Refactoring

Tagged with