Mark Needham

Thoughts on Software Development

Archive for June, 2009

F#: Useful for scripting

We recently needed to write a small script to rename the folders where we store our artifacts, so that the names show which artifacts were created by our build’s production branch and which were generated from the main branch.

The problem was that we were overwriting old artifacts from the main branch with the production branch’s artifacts.

We had already renamed some of the folders by hand, since our deployment script had been changed to read from the proposed new folder names.

We therefore had a folder structure that looked like this:

  • Artifacts
    • 12
    • 20
    • 45
    • 1000
    • 1001
    • Trunk-1050
    • Prod-23

All the folders numbered 1000 and above were from the trunk build, since our production build is only up to around 50. The tricky part of the renaming was the lower numbers, which could belong to either the production or the trunk build.

I think this could have been calculated by checking the creation date of the folders but I decided it was quicker at the time to just scan through them and manually note which were of each type.

At the time we didn’t have F# installed, so I wrote the script in Ruby, but I later rewrote it as an F# script file and used F# interactive to execute it.

This is the script I’ve ended up with:

(rename.fsx)

open System
open System.IO
open System.Text.RegularExpressions
 
let get_position_of_last_folder (dir:string) = dir.LastIndexOf('\\') + 1
 
let get_last_folder_name (dir:string) = dir.Substring(get_position_of_last_folder dir)
let get_rest_of_dir_name (dir:string) = dir.Substring(0, (get_position_of_last_folder dir)-1)
 
let create_new_dir_name dir = 
    let prodVersions = seq { yield! [47..48]; yield! [37..45]; yield! [28..34]; } 
    let folderName = get_last_folder_name dir
    let isProdDir = prodVersions |> Seq.exists (fun item -> item = Int32.Parse(folderName))
 
    let build_dir_name branch = (get_rest_of_dir_name dir) + "\\" + branch + "-" + folderName
 
    if (isProdDir) then build_dir_name "Prod" else build_dir_name "Trunk"
 
let rename_directories =  Array.filter (fun dir -> Regex.IsMatch(get_last_folder_name dir , "^[0-9]") ) >> 
                          Array.iter (fun dir -> Directory.Move(dir, create_new_dir_name dir))
 
rename_directories <| Directory.GetDirectories("C:\\artifacts")

I don’t really like the ‘get_position_of_last_folder’ function but it helped to remove the duplication in the following two functions. Maybe there’s a better way to remove this duplication that I’m not aware of.
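One way to avoid the index arithmetic is to use path helpers that split a Windows path into its last folder and the rest. .NET has `Path.GetFileName` and `Path.GetDirectoryName` for this, and Node.js has `path.win32.basename` and `path.win32.dirname`. Below is a minimal JavaScript sketch of the renaming rule. It assumes the same production build numbers as the F# script. The function names `needsRenaming` and `newDirName` are hypothetical. The sketch only computes the new paths and does not touch the file system.

```javascript
const path = require("path");

// Inclusive range helper, e.g. range(47, 48) -> [47, 48].
const range = (from, to) => Array.from({ length: to - from + 1 }, (_, i) => from + i);

// Production build numbers taken from the F# script above.
const prodVersions = new Set([...range(47, 48), ...range(37, 45), ...range(28, 34)]);

// Only folders whose whole name is a number still need renaming;
// names such as "Trunk-1050" or "Prod-23" have already been handled.
function needsRenaming(dir) {
  return /^[0-9]+$/.test(path.win32.basename(dir));
}

// Hypothetical equivalent of create_new_dir_name: no index arithmetic needed.
function newDirName(dir) {
  const folderName = path.win32.basename(dir); // e.g. "45"
  const parent = path.win32.dirname(dir);      // e.g. "C:\\artifacts"
  const branch = prodVersions.has(Number(folderName)) ? "Prod" : "Trunk";
  return path.win32.join(parent, `${branch}-${folderName}`);
}

// Usage: print the planned renames instead of performing them.
const dirs = ["C:\\artifacts\\12", "C:\\artifacts\\45", "C:\\artifacts\\1000", "C:\\artifacts\\Trunk-1050"];
for (const dir of dirs.filter(needsRenaming)) {
  console.log(`${dir} -> ${newDirName(dir)}`);
}
```

Printing the planned renames first is a cheap way to check the hand-written list of production builds before anything is moved.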

We can then execute this by using the following command (note I have added ‘C:\Program Files\FSharp-1.9.6.2\bin’ to the path):

fsi --exec --nologo rename.fsx

I wrote this script file in Visual Studio so that I could get all the Intellisense help that I need but it’s not part of any project – it stands alone!

I learnt about the possibility to do scripting in F# from a blog post by Chris Smith where he talks about some of the ways that he’s been able to use F# for scripting in his work.

I expected an F# script to run significantly slower than an equivalent Ruby one, since the F# code needs to be compiled first, but from observation alone I didn’t notice it running any slower than the Ruby version.

Written by Mark Needham

June 9th, 2009 at 11:29 pm

Posted in F#

Pair Programming: So you don’t want to do it…

I’ve worked on several software development teams over the last few years – some that pair programmed all the time and some that didn’t – and one of the key things that I’ve noticed is that the level of collaboration on these teams was significantly higher when pair programming was being done on a regular basis.

The following are some of the things I have noticed in teams which don’t pair program frequently.

A culture of silence

The amount of communication in a team which doesn’t pair is, almost by definition, going to be lower than in one that does – you don’t get the almost constant dialogue that goes on when a pair are at the keyboard, since everyone is head down coding in their own little world.

A consequence of this is that other types of communication are also reduced.

When a room is full of people pairing it’s not uncommon for someone to shout across the room for a colleague to come and help them and their pair with something. Since there is already a lot of communication going on this doesn’t feel that unusual and since other people in the room might hear us speaking we can also get the benefits of osmotic communication.

When we don’t have that, it feels very awkward to do this, and often the person you call won’t even hear you since their head is down and focused on the task they’re working on.

The inclination the next time you need help is to just work it out yourself, further harming team collaboration.

Code silos

Another consequence of people working alone is that they become very specialised in certain areas of the code base, so when they leave the project or are ill there isn’t anyone else who knows that area well enough to work in it.

It becomes obvious that we have silos in the code base when code starts to be referred to as ‘X’s code’ or ‘X’s way of doing things’. Another danger sign is a lot of talk about “handovers” – a sure sign that collaboration hasn’t been taking place.

One way to address this problem without pair programming is to mandate that code reviews must be done before code written alone is checked into the source control system.

The problem with this is that there isn’t really a way to enforce it if people don’t want their code looked at by someone else, short of reverting their changes, which seems overly confrontational.

Repeated code

The tendency to end up with several people solving the same problem is one that rears its head quite frequently when people are working alone.

Since each developer has less visibility into what their colleagues have been working on – the sharing of knowledge that pair programming encourages is absent – we often end up with several sub-par solutions to the same problem instead of collaborating on one good solution.

Repeated code is perhaps the biggest waste of any developer’s time – we are adding no value at all by doing it and are creating confusion for our colleagues as they now don’t know which bit of code is the correct one to use.

In Summary

I’m sure there are ways to overcome these problems but I’ve never seen it done effectively without having people working together more closely.

I’d be interested in hearing how others have created collaborative teams without having developers work together closely through a practice such as pair programming.

Written by Mark Needham

June 8th, 2009 at 5:05 pm

Posted in Pair Programming

Javascript: Using ‘replace’ to make a link clickable

I’ve been doing a bit more work on my twitter application over the weekend – this time taking the tweets that I’ve stored in CouchDB and displaying them on a web page.

One of the problems I had is that tweets are just plain text, so if a tweet contains a link it isn’t clickable when I display it on a web page, since it isn’t enclosed in an ‘<a href="…"></a>’ tag.

Javascript strings have a ‘replace’ method which replaces the characters matching a string or regular expression with some other characters.

What I actually wanted to do was surround some characters with the link tag but most of the examples I came across didn’t explain how to do this.

Luckily I came across a forum post from a few years ago which explained how to do it.

In this case we can use a capturing group around the link and refer back to it with ‘$1’ in the replacement string to create a clickable link:

"Interesting post... Kanban &amp; estimates http://tinyurl.com/p58o3r".replace(/(http:\/\/\S+)/g, "<a href='$1'>$1</a>");

Which results in a tweet with a nice clickable link:

"Interesting post... Kanban &amp; estimates <a href='http://tinyurl.com/p58o3r'>http://tinyurl.com/p58o3r</a>"
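The same idea can be wrapped in a small reusable function. This is a sketch rather than a robust link detector. The name `linkify` is my own. The pattern is widened to also accept `https` links, which goes beyond the original example. The special replacement pattern `$&` inserts the whole match, so no capturing group is needed. The function does no HTML escaping, and trailing punctuation such as a full stop would end up inside the link.

```javascript
// Wrap every http:// or https:// URL in an anchor tag.
// "\S+" matches everything up to the next whitespace character.
function linkify(text) {
  return text.replace(/https?:\/\/\S+/g, "<a href='$&'>$&</a>");
}

console.log(linkify("Interesting post... Kanban &amp; estimates http://tinyurl.com/p58o3r"));
```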

Written by Mark Needham

June 8th, 2009 at 11:57 am

Posted in Javascript

F#: Explicit interface implementation

I’ve been writing some code to map between CouchDB documents and F# objects, and something I re-learned while doing this is how interfaces work in F#.

In F#, when a type such as a class or record implements an interface, it always uses explicit interface implementation.

This means that in order to access any members of the interface that the class implements you need to specifically refer to the interface by upcasting the value using the ‘:>’ operator.

Given the following interface and class definitions:

type CouchDBDocument =  
    abstract DocType : string
 
type UserDocument = 
    { UserName:string; FirstName:string; Surname:string }
    interface CouchDBDocument with member x.DocType = "User"

If we had the following value:

let mark = { UserName = "mneedham"; FirstName = "Mark"; Surname = "Needham" }

In order to access the ‘DocType’ member of ‘mark’ we would need to do the following:

(mark :> CouchDBDocument).DocType

Coming from the world of C#, I had expected that it would be possible to declare a value as being of type ‘CouchDBDocument’ and then assign a UserDocument to it like this:

let mark : CouchDBDocument = { UserName = "mneedham"; FirstName = "Mark"; Surname = "Needham" };;

But that doesn’t actually compile:

error FS0191: The type CouchDBDocument does not contain a field UserName

Explicit implementation is possible in C# as well, although C# interface implementations are implicit unless we explicitly declare them as explicit, like so:

public interface CouchDBDocument
{
    string DocType { get; }
}
 
public class UserDocument : CouchDBDocument
{
    string CouchDBDocument.DocType
    {
        get { return "User"; }
    }
}

To access the ‘DocType’ property in this case, we need to refer to the value through the ‘CouchDBDocument’ interface:

CouchDBDocument mark = new UserDocument();
Console.WriteLine(mark.DocType);

Mauricio Scheffer has an interesting post where he talks about rewriting a piece of C# code in F# which required him to use interfaces in F# and Brian McNamara explains on hubfs why explicit interface implementation can actually be quite useful.

The Real World Functional Programming book also has a chapter which describes interfaces in C# and F# and the way they differ very clearly.

Written by Mark Needham

June 7th, 2009 at 8:19 am

Posted in F#

Coding: Why do we extract method?

Ever since I read Uncle Bob’s Clean Code book, my approach to coding has been all about the ‘extract method’ refactoring. I extract methods as much as I can, until extracting another method would just mean describing the language semantics in the method name.

One view of this refactoring that I’ve come across is that it should only be used to remove duplication: we move the duplicated code into one method and then call that method from both places.

While this is certainly a valuable reason for extracting a method, I think there are other reasons why we would want to do it more often.

Expressing intent

One of the main reasons we design code in an object oriented way is that it often allows us to describe the intent of our code more clearly than we could if we wrote it in a more procedural style, and I think the same applies when extracting methods inside a class.

Quite often when reading code we’re interested in knowing a bit more about a class than we can derive from its name but we’re not really that interested in all the low level details of its implementation.

If methods have been extracted to abstract that detail away from us, then we can glance at the class, quickly work out what is going on and get back to what we were actually doing in the first place.

It makes code easier to read

A consequence of extracting that detail away is that it makes the code easier to read because we don’t have to hold as much information about what is going on in our head at any one time.

The aim is to try and ensure that the chunks of code that we extract into a method are all at the same level of abstraction – Neal Ford refers to this as the Single Level of Abstraction Principle (SLAP) in The Productive Programmer.

To take an extreme example, we would therefore not mix a chunk of code describing a business concept or rule with code that interacts with the database.

I find myself most frequently extracting method when I come across several lines of code doing similar operations, the aim being that when we read the code we don’t need to care about each of the individual operations but just the fact that operations are being done.
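Here is a small, hypothetical JavaScript illustration of that principle. The order data and the discount rule are invented for the sketch. The top-level function reads at a single level of abstraction, and the details of each step live in extracted functions with intention-revealing names.

```javascript
// Top level: reads as a summary of what happens, not how.
function orderTotal(order) {
  const subtotal = sumOfLineItems(order.lineItems);
  return applyBulkDiscount(subtotal);
}

// Extracted: the low-level detail of adding up line items.
function sumOfLineItems(lineItems) {
  return lineItems.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Extracted: a hypothetical business rule (10% off orders of 100 or more),
// named so that readers don't have to decode the arithmetic.
function applyBulkDiscount(subtotal) {
  const BULK_THRESHOLD = 100;
  const BULK_DISCOUNT_PERCENT = 10;
  return subtotal >= BULK_THRESHOLD
    ? subtotal - (subtotal * BULK_DISCOUNT_PERCENT) / 100
    : subtotal;
}
```

If the database lookup for the order were added to `orderTotal` as well, that would be the mixing of abstraction levels described above. Keeping it in its own function preserves the single level.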

It exposes semantic errors

One benefit I hadn’t appreciated until recently is that extracting a method can help to identify code which shouldn’t be where it is.

We were recently working on some code around starting up a Selenium session where the ‘ResetSeleniumSession’ method was doing the following:

public ISelenium ResetSeleniumSession()
{
	if(Selenium != null)
	{
		Selenium.Stop();
	}
	Selenium = new CustomSelenium(....);
	Selenium.Start();
	Selenium.Open(ApplicationRootUrl);
	Selenium.WindowMaximize();
	// the signature promises an ISelenium, so hand back the new session
	return Selenium;
}

We didn’t think those last two lines belonged there, so we extracted them into their own method. That let us make sure the Selenium client was still being opened in all the places where ResetSeleniumSession was called:

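public ISelenium ResetSeleniumSession()
{
	...
	Selenium = new CustomSelenium(....);
	Selenium.Start();
	LoadAndMaximise(ApplicationRootUrl);
	return Selenium;
}

// the two lines pulled out of ResetSeleniumSession
private void LoadAndMaximise(string url)
{
	Selenium.Open(url);
	Selenium.WindowMaximize();
}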

Later on another colleague passed by, saw us looking at this method and pointed out that we shouldn’t be launching the client from inside this method at all. That code had probably been added there by mistake!

Maybe that code would have been spotted anyway but it had been like that for a while and I think extracting it out into its own method to make it more obvious was useful for exposing that.

In Summary

That’s all I can think of for the moment although I’m sure there are more reasons why we’d want to extract method.

In my experience, extract method is the most useful refactoring we can do. It can quickly make code that seems impossible to understand reasonably readable, and it helps keep the new code we write easy for others to understand instead of letting it turn into a legacy mess.

Written by Mark Needham

June 4th, 2009 at 8:30 pm

Posted in Coding

Coding: Putting code where people can find it

I’ve previously written about the builder pattern, which I think is a very useful pattern for helping to set up data.

It allows us to set up custom data when we care about a specific piece of data in a test, or just use default values if we’re not bothered about a piece of data but need it to be present for our test to execute successfully.

One problem I noticed was that although we had builders for quite a number of the classes used in our tests, new tests were still setting up test data by using the classes directly instead of making use of the builders, which had already done the hard work for us.

A colleague and I were pairing last week and I pointed out one of these areas and he suggested that we should try and introduce the builder pattern to try and solve it!

We didn’t actually have a builder for that particular piece of data yet, but I pointed out several other builders we did have which he hadn’t known existed.

Clearly I hadn’t done a very good job of communicating the existence of the builders, but when discussing this we realised that checking whether or not a builder existed had a slow turnaround time:

  • Start writing a test and realise that setting up the test data is a bit complicated
  • At best, search for ‘ClassNameBuilder’ if you know that is the naming convention for these builders
  • Create test data for the test by typing ‘new ClassNameBuilder()…’

We therefore came up with the idea of anchoring the builders to a common class which we called ‘GetBuilderFor’.

It is now possible to create test data by writing code like this:

var car = GetBuilderFor.Car().Year("2009").Make("Audi").Build();

The nice thing about this is that we now only have to type in ‘GetBuilderFor’ and then a ‘.’ and ReSharper will show us all the builders that are available to us. If there isn’t one then we can create it.
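A rough JavaScript sketch of the same idea follows. It assumes a simple `Car` with `year` and `make` fields, and the fields and default values are invented for illustration. The builder methods are capitalised to mirror the C# snippet above. Each builder starts from defaults, so tests only specify the values they care about. Hanging every builder off one `GetBuilderFor` object gives a single place to discover them.

```javascript
// Hypothetical domain object used only for this sketch.
class Car {
  constructor(year, make) {
    this.year = year;
    this.make = make;
  }
}

class CarBuilder {
  constructor() {
    // Defaults for tests that need a Car but don't care about these values.
    this.year = "2000";
    this.make = "Ford";
  }
  Year(year) {
    this.year = year;
    return this; // returning the builder allows chaining
  }
  Make(make) {
    this.make = make;
    return this;
  }
  Build() {
    return new Car(this.year, this.make);
  }
}

// The single entry point: typing "GetBuilderFor." lists every builder.
const GetBuilderFor = {
  Car: () => new CarBuilder(),
};

const car = GetBuilderFor.Car().Year("2009").Make("Audi").Build();
```

Adding a new builder then means adding one more entry to `GetBuilderFor`. That is what makes a missing builder easy to notice.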

In terms of communication, we’ve both been mentioning this approach in our stand ups and to other people when we pair with them, and hopefully it will stop the duplication of test data creation.

For those in the Java world Jay Fields wrote a cool post a few months ago where he describes a way to do a similar thing in Java. I think this is one place where having static imports makes the code read really fluently.

Written by Mark Needham

June 2nd, 2009 at 11:35 pm

Posted in Coding,Communication

F#: Tuples don’t seem to express intent well

Tuples are one of the data types that I learnt about at university but never actually used for anything until I started playing around with F#, which has them built into the language.

A tuple describes an ordered group of values and in that sense is similar to a C# anonymous type except an anonymous type’s values are named whereas a tuple’s are not.

In F# we can create one by separating a sequence of values with a comma in a value assignment:

> let myTuple = "mark", 7;;
 
val myTuple : string * int

As we can see the type of myTuple is ‘string * int’ and there are some functions such as ‘fst’ and ‘snd’ which allow us to extract the individual values from the tuple.

They’re also quite nice to work with in terms of pattern matching which is why I decided to make use of a tuple in my twitter application to represent the running state of the retrieval of tweets.

// centralProcessor and getStatusesBefore are defined elsewhere in the application.
let rec findStatuses (args:int64 * int * int) =
    let findOldestStatus (statuses:seq<TwitterStatus>) = 
        statuses |> Seq.sortBy (fun eachStatus -> eachStatus.Id) |> Seq.hd
    match args with 
    | (_, numberProcessed, statusesToSearch) when numberProcessed >= statusesToSearch -> centralProcessor.Stop()
    | (lastId, numberProcessed, statusesToSearch) ->  
        let latestStatuses = getStatusesBefore lastId
        centralProcessor.Send(latestStatuses)
        findStatuses(findOldestStatus(latestStatuses).Id, numberProcessed + 20, statusesToSearch)

// findStatuses has to be defined before the type that calls it.
type TwitterService() = 
    static member GetLatestTwitterStatuses(recordsToSearch) =    
        findStatuses(0L, 0, recordsToSearch)

The pattern matching is evident here and has allowed me to easily separate each of the values and give it a meaningful name inside the findStatuses function.

The problem was that when I looked at the ‘GetLatestTwitterStatuses’ method after a few weeks away from this code, I had no real idea what the first two 0s being passed to ‘findStatuses’ meant – I wasn’t expressing intent very well at all.

I decided to refactor this code to make it a bit more explicit by introducing a type to describe the search parameters.

// centralProcessor and getStatusesBefore are defined elsewhere in the application.
type TwitterBackwardsSearch(startingTweetId:int64, tweetsSoFar:int, tweetsToTraverse:int) =
    member x.ShouldKeepSearching() = tweetsSoFar < tweetsToTraverse
    member x.LastId = startingTweetId
    member x.NextSearch(newStartingTweetId: int64) = 
        new TwitterBackwardsSearch(startingTweetId = newStartingTweetId, tweetsSoFar = tweetsSoFar + 20, tweetsToTraverse = tweetsToTraverse)

let rec findStatuses(twitterBackwardsSearch:TwitterBackwardsSearch) =
    if twitterBackwardsSearch.ShouldKeepSearching() then
        let findOldestStatus (statuses:seq<TwitterStatus>) = statuses |> Seq.sortBy (fun eachStatus -> eachStatus.Id) |> Seq.hd
        let latestStatuses = getStatusesBefore twitterBackwardsSearch.LastId
        centralProcessor.Send(latestStatuses)
        findStatuses(twitterBackwardsSearch.NextSearch(findOldestStatus(latestStatuses).Id))
    else
        centralProcessor.Stop()

// Types and functions must be defined before they are used, so TwitterService comes last.
type TwitterService() = 
    static member GetLatestTwitterStatuses(recordsToSearch) =    
        findStatuses(new TwitterBackwardsSearch(startingTweetId = 0L, tweetsSoFar = 0, tweetsToTraverse = recordsToSearch))

There’s more code there than there was in the original solution but I think it is easier to work out what’s going on from reading that than from reading the original code because it’s more explicit.
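For comparison, here is a JavaScript sketch of the same refactoring, where a positional array would otherwise play the role of the tuple. In the F# code `getStatusesBefore` is defined elsewhere. Here it is passed in as a parameter so the sketch runs on its own, which is an assumption. The batch size of 20 comes from the original code. The loop collects batches instead of sending them to a processor.

```javascript
// Names each piece of search state and owns the stopping rule,
// instead of passing around an anonymous [lastId, processed, toSearch] array.
class TwitterBackwardsSearch {
  constructor(startingTweetId, tweetsSoFar, tweetsToTraverse) {
    this.lastId = startingTweetId;
    this.tweetsSoFar = tweetsSoFar;
    this.tweetsToTraverse = tweetsToTraverse;
  }
  shouldKeepSearching() {
    return this.tweetsSoFar < this.tweetsToTraverse;
  }
  nextSearch(newStartingTweetId) {
    // Each request is assumed to retrieve 20 statuses, as in the F# version.
    return new TwitterBackwardsSearch(newStartingTweetId, this.tweetsSoFar + 20, this.tweetsToTraverse);
  }
}

// getStatusesBefore(id) is a hypothetical function returning [{ id }, ...].
// Like the F# version (Seq.hd), this assumes every batch is non-empty.
function findStatuses(search, getStatusesBefore) {
  const batches = [];
  while (search.shouldKeepSearching()) {
    const latest = getStatusesBefore(search.lastId);
    batches.push(latest);
    const oldestId = Math.min(...latest.map((status) => status.id));
    search = search.nextSearch(oldestId);
  }
  return batches;
}

// Usage with a fake data source that returns two older statuses per call.
const fakeSource = (id) => [{ id: id - 1 }, { id: id - 2 }];
const batches = findStatuses(new TwitterBackwardsSearch(1000, 0, 40), fakeSource);
```

A loop stands in for the F# tail recursion because JavaScript engines generally don't guarantee tail-call elimination.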

Another advantage of this refactoring was that I could write tests describing whether or not we should keep on processing more tweets:

    [<Fact>]
    let should_not_keep_processing_if_number_processed_is_equal_or_higher_than_number_to_search () =
        let twitterBackwardsSearch = new TwitterBackwardsSearch(startingTweetId = 0L, tweetsSoFar = 20, tweetsToTraverse = 20)
        Assert.False(twitterBackwardsSearch.ShouldKeepSearching())    
 
    [<Fact>]
    let should_keep_processing_if_number_processed_is_less_than_number_to_search () =
        let twitterBackwardsSearch = new TwitterBackwardsSearch(startingTweetId = 0L, tweetsSoFar = 20, tweetsToTraverse = 40)
        Assert.True(twitterBackwardsSearch.ShouldKeepSearching())

I’ve definitely refactored the code into a more object oriented style although I don’t think that what I had before was necessarily the functional style – it felt more like a mix of different concerns in the same function.

I don’t think a tuple was the correct choice of data type for what I wanted to do although I could certainly see their value when doing mathematical calculations which require x and y values for example.

I’m intrigued to see what usages people will come up with for using tuples in C# 4.0 – I’m not really convinced that they are beneficial for use when describing most types although I can certainly see some value from the way we can make use of them in active patterns to do pattern matching against certain parts of an instance of a stronger type as Matt Podwysocki explains about half way down the page of his post.

Written by Mark Needham

June 2nd, 2009 at 10:01 pm

Posted in F#

VMware: Accessing host server

I’ve been doing all my spare time .NET development from within VMware for about the last year, and now and then it’s quite useful to be able to access the host machine, either to get some files from there or to access a server that’s running on the host.

The former problem is solved by going to ‘Virtual Machines -> Shared Folders’ and clicking on the + button on the bottom left of the menu to add a folder that you want to share.

This folder will be accessible by going to ‘My Network Places -> Entire Network -> VMWare Shared Folders -> .host -> Shared Folders’ from Windows Explorer or by typing ‘\\.host\Shared Folders’ into the Windows Explorer address bar.

The latter is something I hadn’t needed to do until today, when I wanted to access a CouchDB server I had running via CouchDBX (thanks to J Chris Anderson for the recommendation) from a .NET application running inside VMware.

From the host environment I can view all the databases in CouchDB by going to ‘http://127.0.0.1:5984/_utils’, but from inside VMware I need to use the Gateway IP address, which can be found by typing ‘ipconfig’ at the command prompt inside the VM.

The database listing is now available at ‘http://the.gateway.ip:5984/_utils’.

Written by Mark Needham

June 2nd, 2009 at 9:36 pm

Posted in Software Development
