Archive for November, 2009
Book Club: Working Effectively With Legacy Code – Chapter 9 (Michael Feathers)
In our latest technical book club we discussed chapter 9 – ‘I Can’t Get This Class Into A Test Harness’ – of Michael Feather’s ‘Working Effectively With Legacy Code‘.
This chapter goes through various problems that we might have getting a class under test and then suggests different techniques to get around those problems.
These are some of my thoughts and our discussion of the chapter:
- One approach that Feathers describes when dealing with constructors which take in a lot of values is to just pass in nulls for the parameters that we don’t care about.
While I think this is a useful strategy it’s useful to keep in mind that the need to do this is often an indication that the class is doing too much and we probably need to look at breaking that class out into two smaller ones.
Another useful bit of advice Feathers gives is to try and get the code under test instead of discussing all the reasons why we won’t be able to do so:
The best way to see if you will have trouble instantiating a class in a test harness is to just try to do it. Write a test case and attempt to create an object in it. The compiler will tell you what you need to make it really work.
I think this advice is applicable to most aspects of software development. A lot of time is spent debating approaches when it would be far better just to try something out and see what happens.
- Several patterns are described for dealing with singletons in our code and Matt also pointed out the mono state pattern which Uncle Bob outlined as being a step on the way to getting rid of them.
- The sub class and override pattern is described for handling several of the problems we might encounter. The idea here is to get the code into a position where the code which is making it difficult to test a class is all in one protected method which we can then override in a sub class and replace with a fake/stub version of the method.
I’ve previously fallen into the trap of thinking that this is a pattern we should be aiming for when designing code from scratch. In particular it seems to happen a lot in ASP.NET MVC controllers when working out how to test code in ‘OnActionExecuting’ and similar methods.
I now believe it is another indicator that we are too highly coupled to a concrete implementation which has side effects and we might be better off looking for an abstraction that allows us to break that.
- Feathers makes another really telling comment towards the end of the chapter when discussing how we use language features to try and enforce certain things in our code:
In the end, it all comes down to responsible design and coding
This reminded me of his ‘Design Sense‘ presentation where he covers similar ground and it’s very true. No matter what language features we have it’s down to us to use them sensibly in the systems we write.
The ‘should’ word
I’ve been reading Coders at Work recently and one of my favourite answers from the first chapter interview with Jamie Zawinski is the following:
I think one thing that’s really important is not to be afraid of your ignorance. If you don’t understand how something works, ask someone who does. A lot of people are skittish about that. And that doesn’t help anybody. Not knowing something doesn’t mean you’re dumb – it just means you don’t know it yet.
A variation of this which I’ve noticed myself doing is internally telling myself that I ‘should’ know how to do certain things much better than I can.
This is most typically the case when I’m struggling with something on a new project that I’m working on and while it is indirectly useful for helping to identify areas that we can work I think the voice in itself is not that helpful to our learning.
When this happens I’ve started writing down whatever it is that I think I should know better and then taking some time to read more in that area.
Wherever possible I also try to speak to people who already have that skill and find out how they went about learning it.
For example, as I mentioned in my post about reading the Unity source code, reading code is something I want to get better at and when I’ve worked with Dave Cameron he’s able to understand how things fit together much more quickly than I am.
When we discussed this he pointed out that he’d spent a lot of time debugging through a lot of different code bases just for fun and working out how they fitted together by following the control flow.
It’s pretty much always the case that others have spent quite a bit of time working on these skills so it’s certainly something to keep in mind next time I come across something I ‘should’ know!
Clojure: A first look at recursive functions
I’m working through Stuart Halloway’s ‘Programming Clojure‘ book and I just got to the section where it first mentions recursive functions.
It’s a simple function to countdown from a given number to zero and then return that sequence.
This was one of the examples from the book:
(defn countdown [result x] (if (zero? x) result (recur (conj result x) (dec x))))
That function could then be called like this:
(countdown [] 5)
I wanted to see what the function would look if we didn’t have the empty vector as a parameter.
From playing around with F# and Scala my first thought would be to write the function like this:
(defn count-down [from] (defn inner-count [so-far x] (if (zero? x) so-far (inner-count (conj so-far x) (dec x)))) (inner-count [] from))
As the book points out a bit further on, Clojure doesn’t perform automatic tail call optimisation so we end up with a stack overflow exception if we run the function with a big enough input value.
Clojure does optimise calls to ‘recur’ so it makes more sense to use that if we want to avoid that problem.
This is an example which makes use of that:
(defn count-down [from] (defn inner-count [so-far x] (if (zero? x) so-far (recur (conj so-far x) (dec x)))) (inner-count [] from))
Looking through the Clojure mailing list at a similar problem I noticed that one of the suggestions was to arity overload the function to include an accumulator.
(defn count-down ([from] (count-down [] from)) ([so-far from] (if (zero? from) so-far (recur (conj so-far from) (dec from)))))
Written this way it feels a little bit like Haskell or Erlang but probably not idiomatic Clojure.
Anyway on the next page Halloway shows a better way to do this with much less code!
(into [] (take 5 (iterate dec 5)))
I noticed that in Scala the idea of using ‘take’ and ‘drop’ on streams of values seems to be quite popular so I’m intrigued as to whether I’ll find the same thing with Clojure.
A reminder to talk to the rubber duck
Alongside taking a break from it perhaps one of the most effective ways to solve a tricky problem is to describe it to someone else.
When pairing
This typically isn’t a problem when pair programming although it can still happen if a pair stays together too long and both start making the same possibly incorrect assumptions when trying to solve a problem.
In this case it makes sense to call someone else over who can lend a fresh perspective to the problem.
At this stage all three people can work through the problem together and then the third person can go back to whatever they were doing before or this can be the catalyst for a pair rotation whereby one of the original two will drop from the pair.
When working alone
When working alone I get into the situation where something ‘should’ be working but isn’t much more frequently. I find it really frustrating when this happens so I probably call a colleague over to look through the problem more frequently than most.
Frequently just describing the problem to them so that they can help is enough to realise where the problem lies.
I’ve been working with Raph recently and last week on a couple of occasions I found myself imagining a conversation with Raph about a problem I was having and I found that just doing that was enough for me to see where I’d gone wrong.
Luciano Passuello refers to this to as talking to the rubber duck and he describes a way that we can apply this technique silently as well:
There are situations when you can’t make noises or simply can’t afford the social awkwardness of it (“excuse me while I talk to my duck…”). Fear not: do your rubber ducking in writing instead — sure, you won’t have the benefits of verbalization, but the technique still works fine.
One way to do it that works particularly well is to write an e-mail message explaining your problem. Pretend you’re going to send it to the smartest (and busiest) person in the world. Describe the problem in detail, the solutions you tried so far and why they didn’t work. List your assumptions and make sure you include all relevant information. Be both thorough and objective.
When you are done writing the message, you’ll probably have had many ideas to try out. If not, find a specialized forum and just send it as it is: with your perfectly-crafted message, I’m sure the Internet gods will help you.
I actually hadn’t come across this blog post when I accidentally came across this idea while writing an email describing why something I’d been working on wasn’t quite working.
As I described the main part of the problem it became obvious from reading it back to myself that I’d missed out a small but important detail.
I was then able to go back to the code and fix the problem!
Mercurial: hg bisect
We’ve been using Mercurial locally on the project I’ve been working on and Phil showed me a cool feature called ‘bisect‘ a couple of weeks ago which can be helpful for working out which revision we managed to break our code in.
It’s been ported across from Git and is included in Mercurial from version 1.0.0 rather than just being an extension.
From the bisect extension page:
Its behaviour is fairly simple: it takes a first revision known to be correct (i.e. without the bug) and a last revision known to be bad (i.e. with the bug). The bisect extension ouputs a revision halfway between the good and the bad ones and lets you test it. If this revision is a good one, you mark it as good with hg bisect good, otherwise you mark it as bad with hg bisect bad. In both cases, bisect outputs a new revision to test, halfway between the good and the bad ones. You repeat until only one revision is left: the culprit.
The usage has changed a bit now that it’s included as part of the initial download.
I was working on something yesterday and checking in fairly regularly before realising that I’d broken something.
I was fairly sure that the break had happened in the tip (revision 98) but I could only remember it definitely working in revision 96.
I defined the good and bad revisions like this:
hg bisect -b tip
hg bisect -g 96
In this case revision 97 was now checked out and I checked that the code was working with that revision before marking it:
hg bisect -g
It’s now able to work out that the problem is in fact in revision 98:
The first bad revision is: changeset: 98:86260809c309 tag: tip user: mneedham date: Fri Nov 13 16:31:02 2009 +1100 summary: seem to have screwed up the graphs. Not sure how
I managed to mess up setting up the good and bad revisions a few times – I somehow had the impression that I needed to manually update to the original good and bad revisions which isn’t the case.
The reset command was my friend before I worked out what I was doing wrong:
hg bisect -r
It seems like a neat little feature. I’m sure there are lots of other cool things like this in Mercurial so if you know any let me know!
* Update *
As Rob points out in the comments the command is ‘bisect’ rather than ‘bisec’ as I had it originally. Both commands work for the moment though!
TDD: Combining the when and then steps
I’ve written before about my favoured approach of writing tests in such a way that they have clear ‘Given/When/Then’ sections and something which I come across quite frequently is tests where the latter steps have been combined into one method call which takes care of both of these.
An example of this which I came across recently was roughly like this:
@Test public void shouldCalculatePercentageDifferences() { verifyPercentage(50, 100, 100); verifyPercentage(100, 100, 0); verifyPercentage(100, 50, -50); }
private void verifyPercentage(int originalValue, int newValue, int expectedValue) { assertEquals(expectedValue, new PercentageCalculator().calculatePercentage(originalValue, newValue)); }
This code is certainly adhering to the DRY principle although it took us quite a while to work out what the different numbers being passed into ‘verifyPercentage’ were supposed to represent.
With this type of test I think it makes more sense to have a bit of duplication to make it easier for us to understand the test.
We changed this test to have its assertions inline and make use of the Hamcrest library to do those assertions:
@Test public void shouldCalculatePercentageDifferences() { assertThat(new PercentageCalculator().calculatePercentage(50, 100), is(100)); assertThat(new PercentageCalculator().calculatePercentage(100, 100), is(0)); assertThat(new PercentageCalculator().calculatePercentage(100, 50), is(-50)); }
I think we may have also created a field to instantiate ‘PercentageCalculator’ so that we didn’t have to instantiate that three times.
Although we end up writing more code than in the first example I don’t think it’s a problem because it’s now easier to understand and we’ll be able to resolve any failures more quickly than we were able to previously.
As Michael Feathers points out during Jay Fields’ ‘Beta Test‘ presentation we need to remember why we try and adhere to the DRY principle in the first place.
To paraphrase his comments:
In production code if we don’t adhere to the DRY principle then we might make a change to a piece of code and we won’t know if there’s another place where we need to make a change as well.
In test code the tests always tell us where we need to make changes because the tests will break.
Adapting our approach for the context
Amongst the many posts written recently about unit testing one which I quite liked was written by fallenrogue where he describes how in different contexts/cultures a different approach is favoured which means a technique like TDD might not work so well.
cashto, the guy who wrote the original post, agrees with this in the comments on that post:
Absolutely right. I write apps on mobile devices in C++. What works for me may not work well for someone who designs websites with RoR, and vice versa. Context definitely matters — it was one of the points I was trying to get across.
Recently I’ve started noticing that it’s quite beneficial to consider the context that we’re working in for all sorts of approaches we might want to take.
Immutability
When I was first playing around with F# I liked the idea of having everything in the code being immutable and I decided that it would be interesting to try and apply that idea to the code I wrote in C# on the project I was working on.
Unfortunately it didn’t work out as well as I’d hoped because it requires more code to do that, it’s not really idiomatic C# so it was confusing to my colleagues and we ended up having objects hanging around which weren’t updated with the latest state changes because I’d forgotten to re-assign the variable to the new instance.
I still think that we could probably make more use of value objects in our code but I’m not convinced that aiming for immutability everywhere when using C# is the way to go.
If it was a different context such as a problem we were trying to solve in Haskell or to a lesser extent F# then it would make perfect sense.
Confidence changing code
Another example of this is the confidence that we can have in making changes to the code base.
I’ve worked on projects where there’s been a lot of test coverage across the board and also ones where there wasn’t.
With the latter we’d have to put some time in to get tests around that code before thinking about making big changes.
If we don’t have the time to do that then we need to be quite careful when changing code since we don’t have a quick feedback mechanism to let us know whether we’ve broken anything or not.
Final thoughts
Corey Haines wrote a post on pragmatism about a month ago where he suggested the following:
Often times, people use the term ‘pragmatic’ as a way to hide a lack of skill and experience. Or, sometimes, it is used in ignorance: someone doesn’t realize that they don’t understand something well enough. Usually, though, it is brought to play when someone is justifying cutting corners on something.
I think this applies when deciding that an approach isn’t useful in our context. We need to be careful that we’re not just making excuses for our lack of skill in that area.
When I first started working professionally in software development I was fairly convinced that there would be an optimal approach to take in every situation that I just didn’t know about yet.
A few years later I’ve come across a lot of really good approaches to solving problems but no approach that is applicable across the board.
Coding: Pushing the logic back
I was reading a post on the law of demeter by Richard Hart recently and it reminded me that a lot of the refactorings that we typically do on code bases are about pushing the logic back into objects instead of exposing data and performing calculations elsewhere.
An example that I spotted where we did this recently was while building a ‘BusinessSummary’ object whose state was based on the state of a collection of other objects.
The code was keeping a count of the various states of the business objects:
public BusinessSummary buildSummaryObject() { BusinessSummary businessSummary = new BusinessSummary(); for(BusinessObject businessObject : businessObjects) { State state = businessObject.getState(); if(State.STATE_1.equals(state)) { businessSummary.incrementState1(); } else if(State.STATE_2.equals(state)) { businessSummary.incrementState2(); } else { // and so on } } return businessSummary; }
In this example ‘State’ is an enum representing the state the ‘BusinessObject’ is currently in.
Our first change to the code was to push the logic around state back into the ‘BusinessObject’ which results in the following code:
public BusinessSummary buildSummaryObject() { BusinessSummary businessSummary = new BusinessSummary(); for(BusinessObject businessObject : businessObjects) { if(businessObject.hasState1()) { businessSummary.incrementState1(); } else if(businessObject.hasState2()) { businessSummary.incrementState2(); } else { // and so on } } return businessSummary; }
public class BusinessObject { ... public boolean hasState1() { return State.STATE_1.equals(state); } ... }
I think this is an improvement but it still isn’t following the tell don’t ask principle, we’re just asking to be told about the data instead of asking for the data itself.
We didn’t have the time to do the next change where we would have got rid of the ‘hasState’ methods and just got the ‘BusinessObject’ to update the ‘BusinessSummary’ itself:
public BusinessSummary buildSummaryObject() { BusinessSummary businessSummary = new BusinessSummary(); for(BusinessObject businessObject : businessObjects) { businessObject.writeTo(businessSummary); } return businessSummary; }
public class BusinessObject { private final State state; ... public void writeTo(BusinessSummary businessSummary) { if(State.STATE_1.equals(state)) { businessSummary.incrementState1(); } else if(State.STATE_2.equals(state)) { businessSummary.incrementState2(); } else { // and so on } } ... }
Looking back on this code having written this post I wonder whether we need to do the loop or whether it would be better to pass the collection of ‘businessObjects’ to the ‘BusinessSummary’ and let it take care of that logic instead.
If we did that then we would need to expose ‘hasState1′ and ‘hasState2′ methods on ‘BusinessObject’ like on the first refactoring so that we could work out how to populate ‘BusinessSummary’.
I prefer the solution where ‘BusinessObject’ writes to the ‘BusinessSummary’ although it seems like ‘BusinessObject’ now has more than one reason to change – if it changes or if ‘BusinessSummary’ changes. This would violate the single responsibility principle
I discussed this with Dave and he pointed out that as long as the contract that ‘BusinessSummary’ and ‘BusinessObject’ have – i.e. the ‘incrementState1′ and ‘incrementState2′ methods – stays the same then it wouldn’t really be a problem.
Legacy Code: Sensing
In ‘Working Effectively With Legacy Code‘ Michael Feathers describes two reasons for wanting to break dependencies in our code – to allow separation and sensing.
The former describes the need to get a piece of code into a test harness while the latter describes the need to assert whether that piece of code is doing what we want it to.
On the projects I’ve worked on we’ve tended to run into problems with the latter more frequently and Matt and I actually ran into this problem when we were refactoring some code into a role based interface approach.
We started with the following code:
public class ApplicationController : Controller { public string BusinessType { get { return GetType().Replace("Controller", ""); } } public override void OnActionExecuting(ActionExecutingContext context) { base.OnActionExecuting(context); ViewData["SomeViewDataKey"] = CreateThatViewDataStuff(); } private ViewDataStuff CreateThatViewDataStuff() { return new ViewDataStuff { Content = BuildContent() // and so on }; } private IContent BuildContent() { if(BusinessType == "BusinessType1") { return CreateContentForBusinessType1(); } else if(BusinessType == "BusinessType2") { return CreateContentForBusinessType2(); } else { return CreateContentForEveryOtherBusinessType(); } } }
The ‘BuildContent’ method is the one that we’re really interested in here but it’s a private method on the controller so we currently don’t have an easy way to assert that it’s being created correctly.
In order to test that functionality more easily we decided to make the method public so we could just call it directly.
We wanted to do this so that we could write some tests around this bit of functionality before we refactored it so that we could be sure we hadn’t broken the way it worked while doing so.
As Hamlet D’Arcy points out, if we’re going to call it refactoring then we need to make sure that we’re protected by tests. Otherwise we’re just changing stuff.
[Test] public void ShouldCreateContent() { var someRepository = MockRepository.CreateMock<ISomeRepository>(); var controller = new BusinessType1Controller(someRepository); someRepository.Expect(s => s.GetBusinessType1Message()).Return("someValue"); var content = controller.BuildContent(); // and so on }
We did the same for the other business types as well.
I normally don’t like making private methods public but we didn’t intend to checkin our code until the refactoring was complete at which point the method would be made private again.
The code ended up something like this:
public class BusinessType1Controller : ApplicationController { public override IContent CreateContent() { return new BusinessType1Content(someRepository); } }
public class BusinessType1Content : IContent { private readonly ISomeRepository someRepository; public BusinessType1Content(ISomeRepository someRepository) { this.someRepository = someRepository; } public string GetMessage() { // this would be a different repository call for other business types return someRepository. GetBusinessType1Message(); } }
We were able to create a test against the business types directly instead of having to write a test against the controller:
[Test] public void ShouldCreateContent() { var someRepository = MockRepository.CreateMock<ISomeRepository>(); var content = new Content(someRepository); someRepository.Expect(s => s.GetBusinessType1Message()).Return("someValue"); var message = controller.GetMessage(); // and so on }
Having got these new tests passing the ones we’d written against ‘BuildContent’ now also worked and since they were effectively testing the same thing we were able to get rid of them and make ‘BuildContent’ private again.
Despite the fact that I was initially reluctant to expose the ‘BuildContent’ method this approach seemed to work out reasonably well.
An alternative approach would have been to create a test only ApplicationController and then create a method on that which called the ‘OnActionExecuting’ method.
We’ve done that to test some other things and it would have allowed us to test the creation of ‘Content’ in a more roundabout way without needing to make ‘BuildContent’ public.
The additional effort involved in doing that for not much gain means that given a similar situation in the future I’d probably use just make the method public again!
Coding: The agent noun class
I refer quite frequently to a post written by my colleague Peter Gillard Moss where he describes the agent noun code smell for class names.
An agent noun is defined by Wikipedia as:
In linguistics, an agent noun (or nomen agentis) is a word that is derived from another word denoting an action, and that identifies an entity that does that action.
Some typical examples of this are classes which end in the name ‘Manager’, ‘Retriever’, ‘Helper’ or even ‘Controller’ as Carlos points out.
It’s often the case that these classes better describe a method which belongs on another object and I quite like Peter’s idea that agent nouns are useful for describing the roles that our objects carry out.
We came across an interesting problem related to this a while ago where we wanted to calculate the age of a given customer and then send that data to another web page where it would be displayed.
The date of birth was an attribute on the ‘Customer’ object and we didn’t need any of the other attributes of a customer on this web page so my first thought was that we needed an ‘AgeCalculator’ class.
public class AgeCalculator { private readonly IClock clock; public AgeCalculator(IClock clock) { this.clock = clock; } public int CalculateAge(DateTime dataOfBirth) { // do some calculation with clock & date of birth here } }
public void SomeMethod() { var age = new AgeCalculator(new Clock()).CalculateAge(customer.DateOfBirth); // and so on }
This approach worked quite well as it allowed us to easily test the age calculation logic in isolation and we could then pass on the age returned to the web page.
The thing I didn’t like about this solution is that it seems a lot like an agent noun class.
The giveaway in this case is that the method name is just a variation on the class name. This suggests that this behaviour belongs on another object instead.
It’s not as object oriented as it could be but it doesn’t seem as bad as some of the other examples such as classes ending in ‘Manager’ since this object is only solving one problem and we do have calculators in real life.
Some colleagues suggested that perhaps age should be an attribute on the customer which does seem to be a better place for this logic to reside.
On the other hand age isn’t used anywhere else in the application and we don’t need any of the other attributes of customer as I mentioned earlier.
I’m not sure what the ‘right’ solution is but there seem to be a few different potential approaches:
- Should age be an attribute of the customer? There could be a tiny type ‘Age’ which does the above logic.
- Should we have a role/interface ‘ICalculateAges’ which the customer implements?
- Should we just keep the ‘AgeCalculator’?
- Is there some other solution that I’m not seeing?!
–
In reality after we’d discussed all these alternatives it turned out that the requirement was slightly wrong and that we actually needed to send down the date of birth and not the age.
I still think it’s an interesting problem though and one that I’m bound to come across again!