Mark Needham

Thoughts on Software Development

Archive for February, 2009

Encoding user entered data

without comments

I previously wrote about protecting websites from cross site scripting in the ASP.NET MVC framework by encoding user input when we are going to display it in the browser.

We can either choose to encode data like this or we can encode it straight away when we get it.

There did not seem to be a consensus on the best approach in a discussion on the ASP.NET forums but we believe it is far better to encode the data when it is outgoing rather than incoming.

If we encode the data when it comes in then we need to remember which data was pre-encoded and then un-encode it when it is going to be displayed in a non-HTML format.

If we had a user enter their name as “John O’Shea”, for example, then we would end up storing the apostrophe in the name as its HTML entity (&#39;) and might accidentally end up sending them correspondence referring to them by the wrong name.

Dave suggested ‘fixing it [encoding issue] as close to the problem as possible’ as a useful guideline to follow.

In this case our problem is that when the data we received from the user is displayed, there could be some potentially dangerous code in it, and we don’t want to allow that code to be executed on our page.

We then work off the assumption that any web applications that use our data should know how to encode the data since that falls under their responsibility. Any other applications can use the data as is.
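As a rough illustration of the outgoing approach, here is a minimal C# sketch – the Customer class and the method names are made up for the example; the point is simply where the encoding call lives:

using System.Web; // HttpUtility

public class Customer
{
    // stored exactly as the user entered it, e.g. "John O'Shea" or even "<script>..."
    public string Name { get; set; }
}

public class CustomerRenderer
{
    // HTML boundary: encode on the way out so that markup can't execute in the page
    public string NameForHtml(Customer customer)
    {
        return HttpUtility.HtmlEncode(customer.Name); // "<script>" becomes "&lt;script&gt;"
    }

    // non-HTML boundary (letters, emails, CSV exports): the raw value is still correct
    public string NameForLetter(Customer customer)
    {
        return "Dear " + customer.Name;
    }
}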

I drew this diagram to clear it up in my head:

[Diagram: dataEncoding.gif]

Written by Mark Needham

February 15th, 2009 at 1:46 am

Coding: Assertions in constructors

with 8 comments

While browsing through the ASP.NET MVC source I noticed that they use an interesting pattern on the constructors to ensure that an exception will be thrown if an object is not instantiated correctly.

public ControllerContext(HttpContextBase httpContext, RouteData routeData, ControllerBase controller)
    : base(httpContext, routeData) {
    if (controller == null) {
        throw new ArgumentNullException("controller");
    }
    Controller = controller;
}

If you pass in a null Controller you shall go no further!

We have discussed the merits of putting assertions in constructors several times in our Domain Driven Design book club but without ever really coming to a conclusion with respect to whether or not it is a good pattern.

Spring is an example of a framework which asserts internally on the values being passed to its objects and throws exceptions when they are invalid. There is even a custom Assert class for doing so.

The benefit of this approach is that our code will fail early if there is a problem. The disadvantage is that we may end up putting asserts all over our code to cater for every place where the input could be invalid. If we were to use the Spring Assert class we would also be coupled to it in our production code.
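To make that trade-off concrete, here is a minimal C# sketch of a Spring-style assertion helper – the Assert class and the ShippingService using it are hypothetical, purely for illustration:

using System;

public static class Assert
{
    public static void NotNull(object value, string message)
    {
        if (value == null)
        {
            throw new ArgumentException(message);
        }
    }
}

public interface IPriceCalculator { }

public class ShippingService
{
    private readonly IPriceCalculator priceCalculator;

    public ShippingService(IPriceCalculator priceCalculator)
    {
        // fail early: a badly constructed ShippingService blows up here rather than later,
        // but the domain class is now coupled to the Assert helper
        Assert.NotNull(priceCalculator, "priceCalculator is required");
        this.priceCalculator = priceCalculator;
    }
}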

The approach I am more familiar with is that we would validate any input from the UI or from systems that we don’t have control over, but apart from these boundary cases we would assume that our classes are good citizens and will call each other with valid data.

My understanding of the implicit contract of a constructor is that we should never pass null values into it. If we are contemplating doing so then we are probably doing something wrong and should either reconsider or create an additional constructor which defaults the values of the extra parameters.

In a way I suppose this is a limitation of the Java/C# type systems – you can’t prevent null values from being passed into a constructor, or stop a null value from existing in the first place.

Written by Mark Needham

February 14th, 2009 at 1:32 am

Posted in Coding


Ferengi Programmer and the Dreyfus Model

with 5 comments

I’ve been reading Jeff Atwood’s post regarding Joel’s comments on the podcast about Uncle Bob’s SOLID principles, and what struck me as I read through his dislike of having too many rules and guidelines is that there is a misunderstanding of how we should use these rules – I think the Dreyfus Model might help clear this up.

To briefly recap the different levels of the Dreyfus Model (you can read more about this in Pragmatic Thinking and Learning):

  • Novice – Little or no experience in a skill area
  • Advanced Beginner – Start to break away from the rules and try things alone; still have difficulty troubleshooting
  • Competent – Develop conceptual models of problem domain & can problem solve
  • Proficient – Try to understand larger conceptual framework around skill. Can reflect on performance and do better next time
  • Expert – Vast range of experience to apply in just the right context

Given these levels, which any person may be at for a given skill, as Dhananjay Nene points out, someone who is an expert in this area is not going to be referring to a checklist of design guidelines – they are already past that stage and can apply the ideas intuitively in the right context.

If you are at a stage where you still need to review your design with respect to the SOLID principles (or other appropriate design principles), please take your time to apply the principles, learn whether you are breaking them, understand the costs of doing so (I would recommend involving a senior programmer/designer in the process) and then by all means make the best judgement. Principles distilled over time and experience should preferably be adjusted by those who understand the cost and implications of doing so; the rest should strive to reach that state first.

In which case the following statement on Jeff Atwood’s post makes perfect sense:

Like James Bach, I’ve found less and less use for rules in my career. Not because I’m a self-made genius who plays by my own rules, mind you, but because I value the skills, experience, and judgment of my team far more than any static set of rules.

But it shouldn’t be used as a stick to beat the principles with – using them as a set of guidelines or rules is still very valuable for someone who is not as far up the Dreyfus Model as Jeff is for this skill.

I still think there is value in re-reading the principles even when you think you have got them ingrained in your mind and don’t need to keep on referring to them to write your code. I have often found that when I end up writing code which is difficult to work with it tends to be because I’ve broken one of the principles without realising that I was doing so.

Overall

To summarise, I don’t see a problem in following principles/rules when you are new to a skill – in fact I think this is the best way to pick up good practices; otherwise you are just guessing.

In order to progress up the Dreyfus Model we need to eventually start questioning the rules and working out when we should and should not apply them.

Leaving the last word to Mr Atwood…

Rules, guidelines, and principles are gems of distilled experience that should be studied and respected. But they’re never a substitute for thinking critically about your work.

Written by Mark Needham

February 13th, 2009 at 12:01 am

Posted in Learning


ASP.NET MVC: Preventing XSS attacks

with 12 comments

XSS (cross-site scripting) attacks on websites seem to be quite popular these days, but luckily if you’re working with the ASP.NET MVC framework Steve Sanderson has written a great post on how to protect yourself from them.

The solution Steve details works the opposite way to other solutions I have heard for this problem – we assume that everything that goes to the browser needs to be HTML encoded unless otherwise stated.

The benefit of this approach is that when we add new things to our page in the future we don’t need to worry that we might have inadvertently reintroduced a security problem.

Excluding aspx code from being encoded

After setting up the encoding we came up against the problem of how to exclude HTML generated from our aspx templates that we don’t want to be encoded.

This was a particular problem with our use of the HtmlHelper and UrlHelper provided with the framework.

Since the majority of the methods we use are extension methods on these classes, we didn’t find it easy to work out how to proxy those calls to our own RawHtmlHelper (we couldn’t just override the methods since they aren’t actually defined on HtmlHelper!), which could cast the result to RawHtml and therefore allow it to be rendered by the page.

The aim was to minimise the number of things we had to change. The worst case solution would be to go through all our aspx files and change every call to a helper method to a new encoded version of the same method.

Luckily the new keyword comes to the rescue and allows us to hack around the problem.

We hooked into the page lifecycle to create a new version of the Html property which returned our RawHtmlHelper.

public class BaseViewPage<TModel> : ViewPage<TModel> where TModel : class
{
	public override void InitHelpers()
	{
		// replace the framework helpers with our raw-aware versions
		base.Html = new RawHtmlHelper(...);
		base.Url = new RawUrlHelper(...);
	}

	// 'new' hides the framework's Html property so existing calls resolve to RawHtmlHelper
	public new RawHtmlHelper Html
	{
		get { return (RawHtmlHelper) base.Html; }
	}
}

public class RawHtmlHelper : HtmlHelper
{
	public RawHtml TextBox(string name, object value)
	{
		// delegate to the built-in extension method and mark the result as raw HTML
		// that the page should not encode again
		return (RawHtml) ((HtmlHelper)this).TextBox(name, value);
	}
}
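The RawHtml type used above comes from Steve Sanderson’s post rather than from the framework. For the casts to compile it needs a conversion from the string that the built-in helpers return – something along these lines (my reconstruction for illustration, not the code from Steve’s post):

public class RawHtml
{
	private readonly string html;

	public RawHtml(string html)
	{
		this.html = html;
	}

	// the page treats a RawHtml value as already-safe markup and writes it out unencoded
	public override string ToString()
	{
		return html;
	}

	// allows the (RawHtml) cast on the strings returned by the built-in helpers
	public static explicit operator RawHtml(string html)
	{
		return new RawHtml(html);
	}
}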

In our aspx page we don’t need to make any changes to our code in order for it to now use the RawHtmlHelper method instead of the HtmlHelper method.

<%= Html.TextBox("name", "value") %>

The only other change was to make each of our aspx code behind classes inherit from BaseViewPage so that they receive the new Html property.

Overall

There is way too much casting for my liking in this solution but I’m not sure if a better solution exists other than creating our own version of the helper methods named ‘RawTextBox’, ‘RawRadioButton’ and so on.

The amount of work with that approach would be almost the same, except that anywhere we had already used the original HtmlHelper would need to be changed as well – exactly the problem the above solution was designed to avoid.
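For comparison, that alternative would look something like the following sketch, assuming the same RawHtml wrapper as above:

using System.Web.Mvc;
using System.Web.Mvc.Html;

public static class RawInputExtensions
{
	// every call site would have to change from Html.TextBox(...) to Html.RawTextBox(...)
	public static RawHtml RawTextBox(this HtmlHelper html, string name, object value)
	{
		return (RawHtml) html.TextBox(name, value);
	}

	// ...and similarly for RawRadioButton, RawDropDownList and so on
}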

Written by Mark Needham

February 12th, 2009 at 10:47 pm

Posted in .NET


Coding Dojo #9: Refactoring Isola

with 2 comments

Our latest coding dojo involved refactoring the code we wrote a couple of weeks ago for the board game Isola.

We started a repository on Bit Bucket to store our code from these sessions.

The Format

We used the Randori approach again with four people participating for the whole session.

What We Learnt

  • Last time we had spent most of our time purely making the code functional so all the objects were completely mutable. We decided to start by removing that mutability to allow us to add additional functionality more easily. We came up with a rough idea of what we were aiming for and then started refactoring towards it.
  • The tests were really useful for this as they provided feedback after every small refactoring on whether or not it had broken the original functionality. In some cases we had to redesign the tests a bit to cater for the fact that we were no longer mutating the original Isola class, so some of our assertions were incorrect.
  • It was quite surprising to me how much time it took to refactor the code. In the first session we didn’t spend any time refactoring, which made it difficult to change bits of the code without other bits being affected, several times leading to a bit of a yak shaving exercise. Luckily we backed out of these refactorings without spending too much time on them. It pretty much drilled into us how we shouldn’t forget the Refactor part of ‘Red, Green, Refactor’ or we will suffer!
  • While trying to implement what I have previously heard referred to as a slug, but which may in fact be a variant of the flyweight pattern, we realised that our IsolaPlayer object was mutable, meaning that our tests were now dependent on each other! This was the code that led us into trouble:
    public class IsolaPlayer {
        public static final IsolaPlayer playerOne = new IsolaPlayer("-1");
        public static final IsolaPlayer playerTwo = new IsolaPlayer("-2");

        private final String playerRepresentation;
        private final String HOME_POSITION = "[]";
        private String stomach;

        public IsolaPlayer(String playerRepresentation) {
            this.playerRepresentation = playerRepresentation;
            this.stomach = HOME_POSITION;
        }

        public String toBoardRepresentation() {
            return playerRepresentation;
        }

        public String poop() {
            return stomach;
        }

        public void eat(String boardPosition) {
            stomach = boardPosition;
        }
    }

    As you can see, the class is mutable but is exposed through static instances. We quickly backed that change out and refactored to the pattern later on, once IsolaPlayer was immutable.

  • We used a combination of the techniques from Working Effectively With Legacy Code to allow us to extract an IsolaBoard from the original Isola class. IsolaBoard was kept completely inside Isola while we refactored the code so that it could exist on its own. This approach allowed us to continually validate that we hadn’t broken any tests while we gradually put more and more of the board logic into the appropriate class.
  • When we write mutable code the order of operations makes a big difference and the application doesn’t work correctly if we change the order. We learnt this with an early refactoring to inline some variables – an innocuous enough change, but one which led to 50% of our tests breaking.
  • We had an interesting discussion around how we can have code which is mutable but in a non-dangerous way. On our way to creating value objects, at one stage we had the code in a state where we were returning a new Isola object every time but were passing the same instance of our coveredSquares queue around. The queue was mutable, meaning that different instances of Isola held references to the same queue. In this case we were throwing away the old Isolas, but it might have been a problem if we had multiple games running at the same time. The next step was to refactor Isola to be completely immutable.

For next time

  • Since we spent the whole of this week’s session refactoring the code, the plan for next week is to add some more functionality to the application. There is still quite a bit of logic left before we have a working game.

Written by Mark Needham

February 12th, 2009 at 9:46 pm

Posted in Coding Dojo


C#: Properties vs Methods

with 8 comments

I was browsing through our tests today and noticed a test along these lines (simplified for example purposes):

[Test, ExpectedException(typeof(Exception))]
public void ShouldThrowExceptionIfNoBarSet()
{
    var bar = new Foo(null).Bar;
}

public class Foo
{
    private readonly string bar;

    public Foo(string bar)
    {
        this.bar = bar;
    }

    public string Bar
    {
        get
        {
            if (bar == null)
            {
                throw new Exception("No bar");
            }
            return bar;
        }
    }
}

What I found strange here is that the local variable ‘bar’ is never used – and ReSharper points this out. If we try to remove the assignment, however, the code no longer compiles.

[Test, ExpectedException(typeof(Exception))]
public void ShouldThrowExceptionIfNoBarSet()
{
    new Foo(null).Bar;
}

The compiler rejects this with: “Only assignment, call, increment, decrement, and new object expressions can be used as a statement.”

We therefore need to assign the call to Bar to a variable even though we never intend to reference it.

This problem wouldn’t happen if Bar was a method instead.

[Test, ExpectedException(typeof(Exception))]
public void ShouldThrowExceptionIfNoBarSet()
{
    new Foo(null).Bar();
}
public string Bar()
{
    if (bar == null)
    {
        throw new Exception("No bar");
    }
    return bar;
}

Which got me thinking: how do we know when to use a method and when to use a property?

Clearly if there are parameters to be passed in then we have no choice but to use a method since we can’t call properties with parameters.

If we are simply returning data values then we might as well just use properties since this is what they were made for. We can even make use of auto-implemented properties to smooth the process.
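For example, if Bar were just a plain data value with no checking, an auto-implemented property would do – the compiler generates the backing field for us:

public class Foo
{
    public string Bar { get; set; }
}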

Which leaves the situation where there is some logic to execute, no outside data is needed to do the calculation, and a result is returned to the client.

Probably because of the Java influence my preference would be to do this using a method but I don’t really have any other justification for not using a property to achieve the same thing.

Discussing this briefly with Dave, we thought that maybe if a property is used then the execution needs to be idempotent (i.e. return the same result every time), although the argument could be made that well-written methods should also be side-effect free. Eric Evans advocates writing side-effect-free functions in Domain Driven Design, for example.

Does anyone have any compelling reasons why or why not to use one or the other?

Written by Mark Needham

February 11th, 2009 at 11:20 am

Posted in .NET


Agile: Re-estimating cards

with one comment

Chris Johnston has another interesting post in which he writes about the practice of re-estimating cards after they have been completed.

I think this somewhat misses the point that the estimate is indeed supposed to be an estimate. It might turn out to be too optimistic or too pessimistic, the theory being that overall we will end up with a reasonable balance that will allow us to make a prediction on how much work we believe we can complete in a certain time period.

I’ve always seen estimates of story cards to be a relative thing – on all the teams I’ve worked on after initial cards have been estimated we’ve looked at the difficulty of upcoming cards and tried to score them relative to the ones already estimated.

Re-estimating cards (presumably) individually means that you lose the benefits of this approach and end up with a fairly meaningless value.

Re-estimating afterwards also ignores the inherent uncertainty involved in estimating, and I don’t really see it helping us much with future predictions since there is still going to be uncertainty around the implementation of future cards no matter what we try to do.

If data does need to be collected after a card has been played then I would say the time it took to complete the card might be a more useful metric, although there are more than likely going to be some cards that take longer than expected and some that take less, so I’m not sure what extra value this data would provide.

As Chris points out, there are times – when we come to play a card or just after we start work on it – when we realise that some of our assumptions are incorrect, meaning that the estimate is inaccurate.

In this case I think it’s reasonable to talk through the new assumptions and re-estimate the card so that we have a more accurate estimation of its difficulty.

When trying to work out how much work we can do in a certain time box, averaging the data we have from the original estimates is likely to provide the most accurate prediction in my opinion.

Apart from times when we have story cards overflowing from one iteration to the next, meaning that we end up with one high-scoring and then one low-scoring iteration, I have found on my projects that the amount of work, points-wise, doesn’t tend to vary very much, which lends credence to the theory that the estimates eventually balance out.

Written by Mark Needham

February 11th, 2009 at 7:25 am

Posted in Agile


Agile: What is it?

without comments

My colleague Chris Johnston wrote recently about his experiences in agile software development, posing some questions that he has.

Specifically:

  • Why are comments evil?
  • Why is design evil?
  • Why must you pair all the time?
  • Why do Agile principles become Agile rules?

Now I’m assuming that most (if not all) of Chris’ experiences with agile have been at ThoughtWorks, in which case the mix of agile we use on our projects tends to be a combination of Scrum and Extreme Programming.

The last question seems the most logical to start with:

Why is it that for many people and many shops, Agile principles become rules and the only way to do things? I have met people and worked in shops where they have Agile all figured out and any deviation is heresy. If you are not doing Agile this one way, then you are unprofessional and a very bad programmer.

I’m with Chris on this one – as far as I understand it, agile is all about adapting a set of principles to the environment that we find ourselves in. It’s unlikely that the exact same approach will work in two different projects.

I’ve worked on 6 or 7 projects at ThoughtWorks and each has taken a slightly different approach to agile, varying the amount of process we use, the amount and type of tracking done, the approach to gathering requirements, and so on.

Our client’s wishes, the availability of the business, the number of integration points and the size of the team are all things that will influence the approach we end up taking.

Why must you pair all the time?

Extreme Programming (XP) defines a set of practices that we can use on an agile project although there are certainly other approaches that are equally valid.

Pair Programming is one of the XP practices rather than something you need to do in order to be ‘agile’.

I’ve worked on teams where we haven’t paired, teams where some people pair, and teams where everyone pairs all the time, but I wouldn’t say that as a rule people have to pair all the time.

When a team is responsible for the delivery of some software pair programming is the best approach that I have worked with for ensuring that people on the team have a good understanding of all areas of the code base.

While someone working alone in ‘flow’ might be able to write some brilliant code, no one else on the team knows anything about that code, meaning that knowledge of the implementation takes longer to work its way around the rest of the team.

Why is design evil?

I don’t think design is evil, and again all the teams I’ve worked on have done some design work up front – it would be fairly foolish not to.

Finding the balance between over architecting up front and doing just enough design to allow us to keep moving forward is the key here.

Why are comments evil?

I’ve previously written about the dangers of commenting code so I don’t want to revisit that too much, suffice to say that it is usually possible to express what you would put in a comment with an appropriate method name instead.

My experience with comments is that they go out of date more quickly than code does and therefore can end up being misleading to someone reading the code.

The only time I have seen a valid use for them is to describe a problem the code is working around, where the explanation is fairly verbose and can’t be expressed easily in code – for example, when using one of the jQuery validators we ran into some strange behaviour in the plugin when different options were set.

For a while we didn’t have a comment there, and others on the team would change the code without realising that it would cause a problem.

With the comment in place people have a better context before they make a change.

Written by Mark Needham

February 9th, 2009 at 5:06 pm

Posted in Agile


OOP: What does an object’s responsibility entail?

with 9 comments

One of the interesting discussions I’ve been having recently with some colleagues is around where the responsibility lies for describing the representation of an object when it is to be used in another bounded context – e.g. on the user interface or in a call to another system.

I believe that an object should be responsible for deciding how its data is used rather than having another object reach into it, retrieve its data and then decide what to do with it.

Therefore, if we had an object Foo whose data was needed for a service call to another system, my favoured approach would be for Foo to populate the FooMessage with the required data.

public class Foo 
{
	private string bar;
 
	public Foo(string bar)
	{
		this.bar = bar;
	}
 
	public void Populate(IFooMessage fooMessage) 
	{
		fooMessage.Bar = bar;
	}
}
public interface IFooMessage 
{
	string Bar { get; set; }
}
public class FooMessage : IFooMessage
{
	public string Bar { get; set; }
}

The advantage of this approach is that Foo has kept control over its internal data – encapsulation has been preserved. Although we could just expose ‘Bar’ and make it possible to build the FooMessage from somewhere else, this violates Tell Don’t Ask and opens up the possibility that Foo’s internal data could be used elsewhere in the system outside of its control.

The question to be answered is whether or not it should be Foo’s responsibility to generate a representation of itself. In discussion about this it was suggested that Foo has more than one responsibility if we design the class as I have.

Uncle Bob’s definition of the Single Responsibility Principle (SRP) in Agile Software Development: Principles, Patterns, and Practices describes it thus:

A class should have only one reason to change.

In this example Foo would need to change if there was a change to the way that it needed to be represented as an IFooMessage as well as for other changes not related to messaging. Foo is not dependent on a concrete class though, only an interface definition, so the coupling isn’t as tight as it might be.

It might just be my interpretation of SRP but it seems like there is a trade off to be made between ensuring encapsulation and SRP in this instance.

The other way to create a FooMessage is to create a mapper class which takes the data out of the Foo class and then creates the FooMessage for us.

We might end up with something like this:

public class FooMapper 
{
	public FooMessage ConvertToMessage(Foo foo) 
	{
		return new FooMessage { Bar = foo.Bar };
	}
}

This requires Foo to expose Bar as a property in order for the mapping to be done.

This is somewhat similar to the way that NHibernate handles the persistence and loading of objects for us – unless we use the Active Record pattern an object is not responsible for saving/loading itself.

What I don’t like about this approach is that it doesn’t strike me as being particularly object oriented – in fact the mapper class would be an example of an agent noun class.

A cleaner solution which allows us to keep encapsulation intact would be to make use of reflection to build the FooMessage from our mapper, probably by creating private properties on Foo since they are easier to reflect on than fields. The downside of the reflection approach is that code written this way is probably more difficult to understand.
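A rough sketch of what that reflection-based mapper might look like, reading the existing private ‘bar’ field directly – this is illustrative only; the private-property variant mentioned above would work the same way via GetProperty:

using System.Reflection;

public class ReflectingFooMapper
{
	public FooMessage ConvertToMessage(Foo foo)
	{
		// reach into Foo's private state without Foo exposing a public property
		FieldInfo barField = typeof(Foo).GetField("bar", BindingFlags.Instance | BindingFlags.NonPublic);
		string bar = (string) barField.GetValue(foo);

		return new FooMessage { Bar = bar };
	}
}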

I know I’ve oversimplified the problem with my example, but ignoring that: where should this type of code go, or do we just accept the trade-off between the two approaches and pick whichever best suits our situation?

Written by Mark Needham

February 9th, 2009 at 4:52 pm

Posted in OOP


Quality is what I work for

with 3 comments

I’ve been reading the transcript of Joel Spolsky and Jeff Atwood’s podcast discussion on TDD/quality, and related posts on the subject by Uncle Bob and Ron Jeffries. While I guess it’s fairly inevitable that I’m likely to side with the latter two, what I’ve realised is that I get the greatest enjoyment from my job when we are writing high quality software.

Certainly delivering value to customers in a timely manner is important but if we’re not producing something that we’re proud to have written then I think we’re doing ourselves and our customer a disservice.

Taking shortcuts on quality in the short run might seem to give some immediate benefits but they don’t tend to last very long and nearly every time I’ve taken a short cut on something I’ve come to regret it.

Given this, though, what does high quality software mean to me?

  • Code that adheres to the principles of object orientation so that when we have to make changes they can all be done in one place, without having to keep the context of loads of different classes in our head. This means creating lots of small objects with single responsibilities and keeping behaviour and data in the same place, just for starters.
  • Using the language of the domain in our code so that we don’t have to translate between the language the business uses and the language that exists in our code. It’s amazing what a difference this makes and how much time we save by having everyone speak the same language.
  • Taking the time to think about how the problem we are currently working on fits into the bigger picture and coming up with a solution that not only solves the problems but doesn’t have a negative impact in other places. There is nearly always more than one solution to every problem and taking the time to analyse each of the options and then choose the most appropriate one tends to result in a better outcome.
  • Ensuring that we refactor our code into a state where someone else on the team can easily understand what is going on without having to spend a lot of time delving into implementation details. Using extract method religiously is the best way I know of achieving this. Writing code that just works is not enough.
  • Writing code that is well tested so that we have a degree of confidence that it is actually going to work when we put it in production. Zero defects is the aim, but if at the least we can ensure that bugs aren’t repeated then we are doing reasonably well.
  • Favouring a simple solution over a complex or clever one that is more difficult to understand. Writing simple code is much more difficult but when we aim for this the job of reading code becomes much easier and our team mates can understand what we’re doing much more easily.
  • Treating our tests or specifications with as much respect as we do the production code. If they are to act as part of the documentation of the system then it is our duty to ensure they are in good shape.

I’m sure there are more ways to assess quality but those are the ones that come immediately to mind.

If we are achieving the above then I tend to feel a sense of satisfaction in the work I’m doing. If not then I start to wonder what we’re doing wrong and try to find ways to get closer to my ideals where possible.

In addition to the above, I think we should be constantly looking for ways to improve the work we are doing as well as our approach.

Never settle.

Written by Mark Needham

February 9th, 2009 at 4:51 pm