Mark Needham

Thoughts on Software Development

Archive for September, 2011

Scala: Replacing a trait with a fake one for testing

without comments

We recently wanted to replace a trait mixed into one of our classes with a fake version to make it easier to test but forgot how exactly to do that!

The class is roughly like this:

trait Foo { def foo : String = "real foo" } 
class Mark extends Foo {}

We originally tried to replace it like this:

trait BrokenFakeFoo { def foo : String = "broken fake foo" }
val m = new Mark with BrokenFakeFoo
error: overriding method foo in trait Foo of type => String;
 method foo in trait BrokenFakeFoo of type => String needs `override' modifier
       val m = new Mark with BrokenFakeFoo

If m compiled it would have two versions of foo but it wouldn’t know which one to use, hence the error message.

Attempt two was this:

trait BrokenFakeFoo { override def foo : String = "broken fake foo" }
error: method foo overrides nothing
       trait BrokenFakeFoo { override def foo : String = "broken fake foo" }

As Uday pointed out, what we actually need to do is make our fake trait extend the original one and then override the method.

trait FakeFoo extends Foo { override def foo : String = "fake foo" }
val m = new Mark with FakeFoo
m.foo
> res5: String = fake foo

Since FakeFoo is the rightmost of the traits mixed into Mark, its foo method will be used in preference to the Foo one mixed into Mark on its class definition.

Written by Mark Needham

September 25th, 2011 at 10:24 am

Posted in Scala

jQuery: Collecting the results from a collection of asynchronous requests

with 5 comments

Liz and I recently spent some time building a pair stair to show how long ago people had paired with each other. One of the things we had to do was make AJAX requests to get the pairing data for each person and then collate it all to build the stair.

Pair stair

The original attempt to do this looked a bit like this:

var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];
 
var grid = [];
$.each(people, function(index, person) {
  $.getJSON('/git/pairs/' + person, function(data) {
    // parse data and create somethingCool
    grid.push(somethingCool);
  });  
});
 
// do something with grid

When we try to do something with grid it is of course empty because we’ve attempted to access it before all of the callbacks (which populate it) have returned.

Pedro Teixeira has a nice blog post which explains how to solve this problem in node.js and we can use the same pattern here.

We need to write our own looping mechanism which is able to determine when the last callback has returned.

This is done by creating a copy of the people array and then manually iterating through it using shift.

var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];
var peopleCopy = people.slice(0), grid = [];
(function getPairs() {
  // check before shifting so that the last person is still requested
  if(peopleCopy.length == 0) {
    // do something with grid
  } else {
    var person = peopleCopy.shift();
    $.getJSON("/git/pairs/" + person, function(data) {
      // parse data and create somethingCool
      grid.push(somethingCool);
      getPairs();
    });
  }
})();

I tried to extract the asynchronous looping and ended up with the following function:

function asyncLoop(collection, seedResult, loopFn, completionFn) {
  var copy = collection.slice(0);
  (function loop() {
    // check before shifting so that the last item is still processed
    if(copy.length == 0) {
      completionFn(seedResult);
    } else {
      var item = copy.shift();
      loopFn(item, seedResult, loop);
    }
  })();
}

Which could be called like this:

var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];
asyncLoop(people, [], function(name, grid, callBackFn) {
  $.getJSON('/git/pairs/' + name, function(data) {
    // parse data and create somethingCool
    grid.push(somethingCool);
    callBackFn();
  });
}, function(grid) {
  // do something with grid
});

I’m not sure that it reads that much more clearly but it does push some of the boilerplate code away.
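
As a rough sketch of how the loop behaves outside the browser, the following runs it over a hypothetical fakeGetJSON (a setTimeout-based stand-in for $.getJSON, not part of jQuery). The asyncLoop here checks whether the copy is empty before shifting, so the final person is processed too.

```javascript
// Sequential async loop: process one item at a time, then hand the
// accumulated result to completionFn once the collection is exhausted.
function asyncLoop(collection, seedResult, loopFn, completionFn) {
  var copy = collection.slice(0);
  (function loop() {
    if (copy.length === 0) {
      completionFn(seedResult);
    } else {
      loopFn(copy.shift(), seedResult, loop);
    }
  })();
}

// Hypothetical stand-in for $.getJSON: calls back later with made-up data.
function fakeGetJSON(url, callback) {
  setTimeout(function () {
    callback({ url: url });
  }, 10);
}

var people = ["Marc", "Liz", "Ken"];
asyncLoop(people, [], function (name, grid, next) {
  fakeGetJSON("/git/pairs/" + name, function (data) {
    grid.push(data.url);
    next();
  });
}, function (grid) {
  // Runs once, after every request has called back, in the original order.
  console.log(grid);
});
```

Because each request is only started once the previous one has called back, the requests run one after another rather than in parallel, which keeps the results in order at the cost of some speed.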

Written by Mark Needham

September 25th, 2011 at 9:26 am

Posted in Javascript,jQuery

Retrospectives: Getting overly focused on actions

with 2 comments

I’ve attended a lot of different retrospectives over the last few years and one thing that seems to happen quite frequently is that a problem will be raised and there will be a massive urgency to find an action to match that problem.

As a result of this we don’t tend to go very deeply into working out why that problem happened in the first place and how we can stop it happening again.

Any discussion tends to be quite shallow and doesn’t delve very far beyond the surface of the problem.

I’ve noticed that this tends to happen more when there are a lot of people in the retrospective and there’s a desire not to ‘waste’ everyone’s time which is understandable to some extent.

We recently had an iteration where there were a lot of stories going back and forth between the developers and testers which was leading to a lot of context switching for some developers.

Since it had felt very disruptive we tried to find some way of deciding when we should or shouldn’t context switch from the current story to fix bugs on earlier stories.

In hindsight it would have been more interesting to look at why that problem existed in the first place rather than directly addressing the problem.

In this case, as my colleague Chris pointed out, it might make more sense for a developer (pair) to go and work with a tester on the story until it was ready to be signed off rather than switching back and forth.

I’ve read about other retrospective formats such as the ‘five whys‘ which might help a team to dig deeper into the problems they’re facing but I’m curious whether it’d make sense to follow such a format with over 30 people attending.

We’d need to pick a sufficiently general problem to analyse so that everyone remained engaged.

I’d be curious whether anyone else has made a similar observation and how they made their retrospectives more effective.

Written by Mark Needham

September 24th, 2011 at 6:56 am

Posted in Agile

node.js: child_process.exec not returning all results

with one comment

I’ve been playing around with some node.js code to get each of the commits from our git repository but noticed that it didn’t seem to be returning all the results.

I had the following code:

var exec = require('child_process').exec;
 
var gitRepository = '/some/local/path';
 
exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', function(error, stdout, stderror) {
  var commits = stdout.split("\n");
 
  // do some stuff with commits
});

We have around 2000 commits in the repository but I was only getting back 1600 of them when I checked the length of commits.

Eventually I decided to print out what was in error and got the following message:

error: Error: maxBuffer exceeded.

Going back to the documentation revealed my mistake:

maxBuffer specifies the largest amount of data allowed on stdout or stderr – if this value is exceeded then the child process is killed.

The default options are

{ encoding: 'utf8',
timeout: 0,
maxBuffer: 200*1024,
killSignal: 'SIGTERM',
cwd: null,
env: null }

The limit is 204800 bytes (200*1024), which is roughly the number of bytes that have been returned by the time I get to 1600 commits.

Changing the code to increase the buffer sorts it out:

var exec = require('child_process').exec;
 
var gitRepository = '/some/local/path';
 
exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', {maxBuffer: 500*1024}, function(error, stdout, stderror) {
  var commits = stdout.split("\n");
 
  // do some stuff with commits
});
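
To see the limit in action without a large repository, here is a minimal sketch. It swaps exec for spawnSync so that it runs without callbacks, and it uses a child node process printing 2000 fixed-size lines as a stand-in for git log. The outputLines helper and the noisyScript variable are names made up for this illustration.

```javascript
var spawnSync = require("child_process").spawnSync;

// Hypothetical helper: run a command and return its non-empty stdout lines,
// throwing if the child failed (which includes exceeding maxBuffer).
function outputLines(command, args, maxBuffer) {
  var result = spawnSync(command, args, { encoding: "utf8", maxBuffer: maxBuffer });
  if (result.error) {
    throw result.error;
  }
  return result.stdout.split("\n").filter(function (line) {
    return line !== "";
  });
}

// Stand-in for `git log`: a child node process printing 2000 lines of
// 128 bytes each (127 characters plus a newline), i.e. 256000 bytes.
var noisyScript =
  'for (var i = 0; i < 2000; i++) console.log(new Array(128).join("x"));';

try {
  outputLines(process.execPath, ["-e", noisyScript], 200 * 1024);
} catch (e) {
  console.log("200KB buffer was too small: " + e.message);
}

// With a larger buffer the whole output comes back.
var lines = outputLines(process.execPath, ["-e", noisyScript], 500 * 1024);
console.log(lines.length);
```

Checking the error before touching stdout is the important part: when the buffer is exceeded the output is truncated, so a line count taken without that check looks plausible but is wrong.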

Written by Mark Needham

September 22nd, 2011 at 7:55 pm

Posted in Javascript

The ‘window fixing’ wall

with 2 comments

On my current project we have a wall where we keep track of ‘window fixing’ tasks – things that people want to fix in the code base but chose to defer until a later date.

Every now and then we take what’s on the wall and prioritise it according to Fabio Pereira’s effort/pain matrix so that we know which clean up tasks will provide the greatest value to the team.

While I think it’s a nice way of getting a team understanding of technical debt I think it can lead to a couple of problems which come with most attempts at group responsibility for something.

By writing the task up on the wall we’ve effectively pushed the responsibility for keeping the code clean away from us and onto the ‘team’.

It also seems to make it more acceptable to make a mess in the code because we’ve acknowledged that we’ve done that and either we or a team mate will fix it later.

In a way I suppose it’s good that people are at least conscious that they’re taking short cuts at times and we have a reasonable log of where those short cuts have been taken.

On the other hand, from my experience, when people are really motivated to fix a piece of code then they’ll find the time/way to do that whether or not it’s written up on the wall.

I think this is also a good thing even though the refactoring won’t have been prioritised by the rest of the team.

Sometimes it’s easier to go and fix something when you know what needs doing rather than deferring it and having to explain the problem to someone else later.

In summary I think the centralised wall is a good idea but not a complete replacement for people being diligent and taking care of the code base themselves.

Written by Mark Needham

September 20th, 2011 at 6:49 am

Scala: for comprehensions with Options

with one comment

I’ve generally avoided using for expressions in Scala because the keyword reminds me of for loops in Java/C# and I want to learn to program in a less imperative way.

After working with my colleague Mushtaq I realised that in some cases using for comprehensions can lead to much more readable code.

One interesting use case is when we want to create an object from a bunch of parameters that may or may not be set, i.e. a bunch of options.

For example we might take some input from the user where they have to enter their name but could choose to leave the field blank:

val maybeFirstName : Option[String] = Some("Mark")
val maybeSurname : Option[String] = None

We only want to create a Person if they have provided both names.

The for comprehension works quite well in allowing us to do this:

case class Person(firstName:String, surname:String)
scala> for { firstName <- maybeFirstName; surname <- maybeSurname } yield Person(firstName, surname)
res27: Option[Person] = None

If we set the surname to have a value:

val maybeSurname : Option[String] = Some("Needham")

Running the same for comprehension will yield a Person

scala> for { firstName <- maybeFirstName; surname <- maybeSurname } yield Person(firstName, surname)
res29: Option[Person] = Some(Person(Mark,Needham))

From what I understand, when we have multiple values assigned using ‘<-’ inside a for comprehension, each value will have flatMap called on it except for the last one, which will have map called instead.

The equivalent code if we didn’t use a for comprehension would therefore look like this:

scala> maybeFirstName.flatMap { firstName => maybeSurname.map { surname => Person(firstName, surname) } } 
res43: Option[Person] = Some(Person(Mark,Needham))

For me the for comprehension expresses intent much better and it seems to excel even more as we add more values to the comprehension.

Written by Mark Needham

September 15th, 2011 at 10:21 pm

Posted in Scala

Javascript: Internet Explorer 8 – trim() leads to ‘Object doesn’t support this property or method’ error

with 2 comments

We make use of the Javascript trim() function in our application but didn’t realise that it isn’t implemented by Internet Explorer until version 9.

This led to the following error on IE8 when we used it:

Message: Object doesn’t support this property or method
Line: 18
Char: 13
Code: 0
URI: http://our.app/file.js

There’s a stackoverflow thread suggesting some different ways of implementing your own ‘trim()’ method but since we’re using jQuery already we decided to just use the ‘$.trim()’ function from there.

Therefore:

var cleaned = ourString.trim();

becomes:

var cleaned = $.trim(ourString);

I’m sure I must have come across this before but I don’t remember when!
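
For code that can’t rely on jQuery, a hand-rolled fallback along the lines suggested in that thread might look roughly like this. trimString and regexTrim are hypothetical helper names, and the regular expression only strips whatever \s matches in the engine running it.

```javascript
// Regular-expression fallback: strip whitespace from both ends.
function regexTrim(text) {
  return text.replace(/^\s+|\s+$/g, "");
}

// Hypothetical helper: use the native String.prototype.trim when the
// browser provides it, otherwise fall back to the regular expression.
function trimString(value) {
  var text = String(value);
  if (typeof String.prototype.trim === "function") {
    return text.trim();
  }
  return regexTrim(text);
}

console.log("[" + trimString("  hello world \n") + "]"); // [hello world]
```

Wrapping the check in a helper rather than patching String.prototype keeps the change local, which is a design choice rather than a requirement.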

Written by Mark Needham

September 13th, 2011 at 1:33 pm

Posted in Javascript

gawk: Getting story numbers from git commit messages

with 2 comments

As I mentioned in my previous post I’ve been writing a little application to create graphs based on our git repository history and in one of them we wanted to try and create a graph showing which people had been working on which stories.

I needed a way to extract the story number from each git commit message and then store them all in a text file.

A typical commit with a story number in might look like this:

Mark/Uday #689 some awesome scala refactoring

I couldn’t think of an easy way to do this with my current knowledge of sed or the Mac version of awk but the match function of gawk (GNU awk) makes this really easy.

match(string, regexp [, array])

Search string for the longest, leftmost substring matched by the regular expression, regexp and return the character position, or index, at which that substring begins (one, if it starts at the beginning of string). If no match is found, return zero.

If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp.

The array argument is what I needed and it’s only available as a gawk extension according to the documentation.

I ended up with the following command to extract the story numbers:

git log --no-merges --pretty="format:%s" | 
gawk '{ match($0, /#([0-9]+)/, arr); if(arr[1] != "") print arr[1] }'

I had to install gawk using ports on my Mac but on Fedora the default installation of awk is gawk.
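
Since the rest of the graphing code is in node.js, the same extraction could also be sketched in JavaScript. storyNumbers is a hypothetical helper and the second and third sample subjects are borrowed from typical commit styles purely for illustration; as with the gawk version, only the first #number in each subject is picked up.

```javascript
// Hypothetical helper: given commit subject lines, return the story number
// captured by the first "#<digits>" in each line, skipping lines without one.
function storyNumbers(subjects) {
  var numbers = [];
  subjects.forEach(function (subject) {
    var match = subject.match(/#([0-9]+)/);
    if (match !== null) {
      numbers.push(match[1]);
    }
  });
  return numbers;
}

var subjects = [
  "Mark/Uday #689 some awesome scala refactoring",
  "mark,suzuki more stuff",
  "Uday:Marc #87 stuff"
];
console.log(storyNumbers(subjects)); // [ '689', '87' ]
```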

Written by Mark Needham

September 12th, 2011 at 7:05 am

Learning node.js: Step

with 2 comments

I’ve been playing around with node.js to generate some graphs from our git repository which effectively meant chaining together a bunch of shell commands to give me the repository data in the format I wanted.

I was able to do this by making use of child_process which comes with the core library.

The first version looked like this:

var exec = require('child_process').exec, _ = require("underscore");
...
function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();
 
  exec('cd ' + gitRepository + ' && git reset HEAD', function() {
    exec('git clone ' + gitRepository + ' ' + gitPlayArea, function() {
      exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', function(blank, gitEntries) {
        var commits = _(gitEntries.split("\n")).chain()
                        .filter(function(item) { return item != ""; })
                        .map(function(item) { return item.split("|") })
                        .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                        .map(function(theSplit) {  
                          var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                          return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                        .value();			
        fn(commits);
      });		
    });
  });
}

node.js has an asynchronous programming model so the majority of the time we have to pass callbacks to other functions which get called when the asynchronous computation has completed.

In this case there’s an order dependency in the parseCommitsFromRepository function such that we need to nest the second call to exec inside the callback from the first call.

i.e. we don’t want to get the log of the repository before we’ve cloned the repository to the location that we’re trying to get that log from.

As you create more and more order dependencies between asynchronous functions the nesting becomes greater and the code moves more and more to the right hand side of the screen.

I came across the Step library which allows you to stack up functions and have the results from each one get passed on to the next.

I decided to try it in my code and it ended up looking like this:

var Step = require('step');
function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();	
  Step(
    function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); },
    function cloneRepository()       { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); },
    function getGitEntries()         { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); },
    function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
                      .filter(function(item) { return item != ""; })
                      .map(function(item) { return item.split("|") })
                      .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                      .map(function(theSplit) {  
                        var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                        return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                      .value();			
      fn(commits);
    }
  );	
}

An interesting side effect of using this approach is that we can describe what each exec call is doing in the name of the function that executes it.

Another neat thing about this library is that I can easily wrap those functions inside a logging function if I want to see on the console where the process has got up to:

function log(message, fn) {
  return function logMe() {
    console.log(new Date().toString() + ": " + message);
    fn.apply(this, arguments);
  }
}
function parseCommitsFromRepository(fn) {	
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();	
  Step(
    log("Resetting repository", function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); }),
    log("Cloning repository", function cloneRepository()         { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); }),
    log("Getting log", function getGitEntries()                  { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); }),
    log("Processing log", function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
                      .filter(function(item) { return item != ""; })
                      .map(function(item) { return item.split("|") })
                      .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                      .map(function(theSplit) {  
                        var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                        return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                      .value();			
      fn(commits);
    })
  );	
}

I then get this output when executing the function:

Sun Sep 11 2011 23:33:09 GMT+0100 (BST): Resetting repository
Sun Sep 11 2011 23:33:11 GMT+0100 (BST): Cloning repository
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Getting log
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Processing log

There are more cool ways to use the Step library on the github page – what I’ve described here is only a very simple use case.
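
To give a feel for what is going on underneath, here is a much simplified sketch of the chaining idea. It is not Step’s actual implementation and covers none of its parallel or grouping features; runInSequence and delayedDouble are hypothetical names. Each function is invoked with this bound to a callback that triggers the next function with whatever arguments it receives.

```javascript
// Simplified illustration of the chaining idea (not the real Step library).
function runInSequence() {
  var steps = Array.prototype.slice.call(arguments);
  function next() {
    if (steps.length === 0) {
      return;
    }
    var step = steps.shift();
    // `this` inside the step is `next`, so passing `this` as a callback
    // hands the callback's arguments on to the following step.
    step.apply(next, arguments);
  }
  next();
}

// Hypothetical async operation using the node-style (error, result) callback.
function delayedDouble(value, callback) {
  setTimeout(function () {
    callback(null, value * 2);
  }, 10);
}

runInSequence(
  function first() { delayedDouble(1, this); },
  function second(error, value) { delayedDouble(value, this); },
  function done(error, value) { console.log(value); } // logs 4
);
```

This is also why the log wrapper shown earlier works: it forwards both this and arguments with fn.apply, so the wrapped function still sees the callback it needs.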

Written by Mark Needham

September 11th, 2011 at 10:37 pm

Posted in Javascript

Learning Regular Expressions: Non capturing match

with 5 comments

I’ve been working my way slowly through the O’Reilly ‘Mastering Regular Expressions‘ book and recently read about the non capturing match operator which came in useful for some Git log parsing I’ve been doing.

On the project I’m working on we all commit as the same user and then put our names at the beginning of the commit message.

We wanted to try and find out the statistics of who’d been pairing with each other and therefore needed to extract the pairs from commits.

Unfortunately everyone writes their names in a slightly different way so the regular expression which I used to parse each commit needed to try and handle that.

For example these are some of the ways that commit messages start:

Uday/Charles #67 did some stuff
mark,suzuki more stuff
pat, tom: very important stuff
Uday:Marc #87 stuff

The separator between the names is different in each case but in the majority of cases can be satisfied by the following regular expression:

([\/,][ ]?|:)

It’s either:

  • A forward slash or comma followed by an optional space
  • A colon

Since I want to express the fact that the separator can be one thing or the other I need to group those two things together in parentheses.

Unfortunately that means that the separator will be included in the array of captures that we have when parsing the commit.

I only wanted to have the names of the two people included in that array.

The non capturing match operator ‘(?:’ allows us to match against the expected separator without actually capturing it:

(?:[\/,][ ]?|:)

That regular expression is part of a much larger/probably over complicated one which also helps to capture the names of the people pairing:

var pairRegex = /^\[?([\w-]+)[ ]?[^\/, ]*(?:[\/,][ ]?|:)([\w-]+)\]?[^\/]*[\s:]/

Using the regex with the non capturing match gives us:

"charles/mark: adios to play, hello scalatra".match(pairRegex)
["charles/mark: adios to play, hello ", "charles", "mark"]

Whereas if we swapped the ‘(?:’ in pairRegex for a normal capturing ‘(’ we’d also capture the ‘/’:

"charles/mark: adios to play, hello scalatra".match(pairRegex)
["charles/mark: adios to play, hello ", "charles", "/", "mark"]
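
A cut-down comparison makes the difference easier to see. The two expressions below are simplified versions written for this illustration (not the full pairRegex) and differ only in whether the separator group captures; captures is a hypothetical helper.

```javascript
// Simplified for illustration: name, separator, name.
var capturing = /^([\w-]+)([\/,][ ]?|:)([\w-]+)/;
var nonCapturing = /^([\w-]+)(?:[\/,][ ]?|:)([\w-]+)/;

// Return just the captured groups, dropping the full match at index 0.
function captures(regex, message) {
  var match = message.match(regex);
  return match === null ? [] : match.slice(1);
}

var message = "pat, tom: very important stuff";
console.log(captures(capturing, message));    // [ 'pat', ', ', 'tom' ]
console.log(captures(nonCapturing, message)); // [ 'pat', 'tom' ]
```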

Written by Mark Needham

September 7th, 2011 at 8:47 pm