Archive for September, 2011
Scala: Replacing a trait with a fake one for testing
We recently wanted to replace a trait mixed into one of our classes with a fake version to make it easier to test but forgot how exactly to do that!
The class is roughly like this:
trait Foo {
  def foo : String = "real foo"
}

class Mark extends Foo {}
We originally tried to replace it like this:
trait BrokenFakeFoo {
  def foo : String = "broken fake foo"
}

val m = new Mark with BrokenFakeFoo

error: overriding method foo in trait Foo of type => String;
 method foo in trait BrokenFakeFoo of type => String needs `override' modifier
       val m = new Mark with BrokenFakeFoo

If m compiled it would have two versions of foo but it wouldn’t know which one to use, hence the error message.
Attempt two was this:
trait BrokenFakeFoo {
  override def foo : String = "broken fake foo"
}

error: method foo overrides nothing
       trait BrokenFakeFoo { override def foo : String = "broken fake foo" }

As Uday pointed out, what we actually need to do is make our fake trait extend the original one and then override the method.
trait FakeFoo extends Foo {
  override def foo : String = "fake foo"
}

val m = new Mark with FakeFoo

m.foo
> res5: String = fake foo
Since FakeFoo is the rightmost of the traits mixed into Mark, its foo method will be used in preference to the one from Foo, which is mixed into Mark in its class definition.
jQuery: Collecting the results from a collection of asynchronous requests
Liz and I recently spent some time building a pair stair to show how long ago people had paired with each other and one of the things we had to do was make AJAX requests to get the pairing data for each person and then collate it all to build the stair.
The original attempt to do this looked a bit like this:
var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];
var grid = [];

$.each(people, function(index, person) {
  $.getJSON('/git/pairs/' + person, function(data) {
    // parse data and create somethingCool
    grid.push(somethingCool);
  });
});

// do something with grid
When we try to do something with grid it is of course empty because we’ve attempted to access it before all of the callbacks (which populate it) have returned.
Pedro Teixeira has a nice blog post which explains how to solve this problem in node.js and we can use the same pattern here.
We need to write our own looping mechanism which is able to determine when the last callback has returned.
This is done by creating a copy of the people array and then manually iterating through it using shift.
var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];
var peopleCopy = people.slice(0), grid = [];

(function getPairs() {
  // check before shifting so that the last person is not skipped
  if(peopleCopy.length == 0) {
    // do something with grid
  } else {
    var person = peopleCopy.shift();
    $.getJSON("/git/pairs/" + person, function(data) {
      // parse data and create somethingCool
      grid.push(somethingCool);
      getPairs();
    });
  }
})();
I tried to extract the asynchronous looping and ended up with the following function:
function asyncLoop(collection, seedResult, loopFn, completionFn) {
  var copy = collection.slice(0);

  (function loop() {
    // check before shifting so that the last item is still passed to loopFn
    if(copy.length == 0) {
      completionFn(seedResult);
    } else {
      loopFn(copy.shift(), seedResult, loop);
    }
  })();
}
Which could be called like this:
var people = ["Marc", "Liz", "Ken", "Duncan", "Uday", "Mark", "Charles"];

asyncLoop(people, [], function(name, grid, callBackFn) {
  $.getJSON('/git/pairs/' + name, function(data) {
    // parse data and create something cool
    grid.push(somethingCool);
    callBackFn();
  });
}, function(grid) {
  // do something with grid
});
I’m not sure that it reads much more clearly but it does push some of the boilerplate code away.
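To try the looping idea outside the browser, here is a minimal self-contained sketch. It checks for an empty copy before shifting so that the final item is processed too, and it swaps the real AJAX call for fakeGetPairs, a made-up stand-in that simply answers on a timer; the shape of the data it returns is an assumption as well.

```javascript
// Sequential async loop: handle one item at a time, then report the result.
function asyncLoop(collection, seedResult, loopFn, completionFn) {
  var copy = collection.slice(0); // leave the caller's array untouched

  (function loop() {
    if (copy.length === 0) {
      completionFn(seedResult);
      return;
    }
    loopFn(copy.shift(), seedResult, loop);
  })();
}

// Hypothetical stand-in for $.getJSON('/git/pairs/' + person, callback).
function fakeGetPairs(person, callback) {
  setTimeout(function () {
    callback({ person: person, pairs: [] });
  }, 0);
}

var people = ["Marc", "Liz", "Ken"];

asyncLoop(people, [], function (person, grid, next) {
  fakeGetPairs(person, function (data) {
    grid.push(data.person);
    next(); // only now move on to the next person
  });
}, function (grid) {
  // every callback has returned by the time we get here
  console.log(grid.join(", "));
});
```

Because each request only starts once the previous one has finished, the results come back in the same order as the input, at the cost of not running the requests in parallel.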
Retrospectives: Getting overly focused on actions
I’ve attended a lot of different retrospectives over the last few years and one thing that seems to happen quite frequently is that a problem will be raised and there will immediately be a massive urgency to find an action to match that problem.
As a result we don’t tend to go very deeply into working out why that problem happened in the first place and how we can stop it happening again.
Any discussion tends to be quite shallow and doesn’t delve very far beyond the surface of the problem.
I’ve noticed that this tends to happen more when there are a lot of people in the retrospective and there’s a desire not to ‘waste’ everyone’s time which is understandable to some extent.
We recently had an iteration where there were a lot of stories going back and forth between the developers and testers which was leading to a lot of context switching for some developers.
Since it had felt very disruptive we tried to find some way of deciding when we should or shouldn’t context switch from the current story to fix bugs on earlier stories.
In hindsight it would have been more interesting to look at why that problem existed in the first place rather than directly addressing the problem.
In this case, as my colleague Chris pointed out, it might make more sense for a developer (pair) to go and work with a tester on the story until it was ready to be signed off rather than switching back and forth.
I’ve read about other retrospective formats such as the ‘five whys‘ which might help a team to dig deeper into the problems they’re facing but I’m curious whether it’d make sense to follow such a format with over 30 people attending.
We’d need to pick a sufficiently general problem to analyse so that everyone remained engaged.
I’d be curious whether anyone else has made a similar observation and how they made their retrospectives more effective.
node.js: child_process.exec not returning all results
I’ve been playing around with some node.js code to get each of the commits from our git repository but noticed that it didn’t seem to be returning me all the results.
I had the following code:
var exec = require('child_process').exec;
var gitRepository = '/some/local/path';

exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', function(error, stdout, stderror) {
  var commits = stdout.split("\n");
  // do some stuff with commits
});
We have around 2000 commits in the repository but I was only getting back 1600 of them when I checked the length of commits.
Eventually I decided to print out what was in error and got the following message:
error: Error: maxBuffer exceeded.
Going back to the documentation revealed my mistake:
maxBuffer specifies the largest amount of data allowed on stdout or stderr – if this value is exceeded then the child process is killed.
The default options are
{ encoding: 'utf8',
  timeout: 0,
  maxBuffer: 200*1024,
  killSignal: 'SIGTERM',
  cwd: null,
  env: null }
The limit is 204800 bytes (200*1024), which is roughly the number of bytes being returned by the time I get to 1600 commits.
Changing the code to increase the buffer sorts it out:
var exec = require('child_process').exec;
var gitRepository = '/some/local/path';

exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', {maxBuffer: 500*1024}, function(error, stdout, stderror) {
  var commits = stdout.split("\n");
  // do some stuff with commits
});
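The arithmetic behind those numbers can be sketched as follows. The commit counts are the rough figures quoted above rather than exact measurements, so treat the result as a back-of-the-envelope estimate only.

```javascript
// Rough buffer arithmetic using the approximate figures from this post.
var defaultMaxBuffer = 200 * 1024;   // default limit: 204800 bytes
var commitsBeforeCutOff = 1600;      // roughly where the output stopped
var totalCommits = 2000;             // roughly how many commits exist

// Average size of one formatted log line, inferred from the cut-off point.
var bytesPerCommit = defaultMaxBuffer / commitsBeforeCutOff;

// Estimated size of the full log, and the raised limit for comparison.
var estimatedLogBytes = totalCommits * bytesPerCommit;
var raisedMaxBuffer = 500 * 1024;

console.log(bytesPerCommit);                       // 128
console.log(estimatedLogBytes);                    // 256000
console.log(estimatedLogBytes <= raisedMaxBuffer); // true
```

On those assumptions the raised limit leaves about twice the headroom the full log needs, although it will have to grow again as the repository does.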
The ‘window fixing’ wall
On my current project we have a wall where we keep track of ‘window fixing’ tasks – things that people want to fix in the code base but chose to defer until a later date.
Every now and then we take what’s on the wall and prioritise it according to Fabio Pereira’s effort/pain matrix so that we know which clean up tasks will provide the greatest value to the team.
While I think it’s a nice way of getting a team understanding of technical debt I think it can lead to a couple of problems which come with most attempts at group responsibility for something.
By writing the task up on the wall we’ve effectively pushed the responsibility for keeping the code clean away from us and onto the ‘team’.
It also seems to make it more acceptable to make a mess in the code because we’ve acknowledged that we’ve done so and either we or a team mate will fix it later.
In a way I suppose it’s good that people are at least conscious that they’re taking short cuts at times and we have a reasonable log of where those short cuts have been taken.
On the other hand, from my experience, when people are really motivated to fix a piece of code then they’ll find the time/way to do that whether or not it’s written up on the wall.
I think this is also a good thing even though the refactoring won’t have been prioritised by the rest of the team.
Sometimes it’s easier to go and fix something when you know what needs doing rather than deferring it and having to explain the problem to someone else later.
In summary I think the centralised wall is a good idea but not a complete replacement for people being diligent and taking care of the code base themselves.
Scala: for comprehensions with Options
I’ve generally avoided using for expressions in Scala because the keyword reminds me of for loops in Java/C# and I want to learn to program in a less imperative way.
After working with my colleague Mushtaq I realised that in some cases using for comprehensions can lead to much more readable code.
An interesting case is when we want to create an object from a bunch of parameters that may or may not be set, i.e. a bunch of options.
For example we might take some input from the user where they have to enter their name but could choose to leave the field blank:
val maybeFirstName : Option[String] = Some("Mark")
val maybeSurname : Option[String] = None
We only want to create a Person if they have provided both names.
The for comprehension works quite well in allowing us to do this:
case class Person(firstName:String, surname:String)
scala> for { firstName <- maybeFirstName; surname <- maybeSurname } yield Person(firstName, surname)
res27: Option[Person] = None
If we set the surname to have a value:
val maybeSurname : Option[String] = Some("Needham")
Running the same for comprehension will yield a Person:
scala> for { firstName <- maybeFirstName; surname <- maybeSurname } yield Person(firstName, surname)
res29: Option[Person] = Some(Person(Mark,Needham))
From what I understand, when we have multiple values bound using ‘<-’ inside a for comprehension that ends in yield, each one has flatMap called on it except for the last, which has map called on it instead.
The equivalent code if we didn’t use a for comprehension would therefore look like this:
scala> maybeFirstName.flatMap { firstName => maybeSurname.map { surname => Person(firstName, surname) } }
res43: Option[Person] = Some(Person(Mark,Needham))
For me the for comprehension expresses intent much better and it seems to excel even more as we add more values to the comprehension.
Javascript: Internet Explorer 8 – trim() leads to ‘Object doesn’t support this property or method’ error
We make use of the Javascript trim() function in our application but didn’t realise that it isn’t implemented by Internet Explorer until version 9.
This led to the following error on IE8 when we used it:
Message: Object doesn’t support this property or method
Line: 18
Char: 13
Code: 0
URI: http://our.app/file.js
There’s a stackoverflow thread suggesting some different ways of implementing your own ‘trim()’ method but since we’re using jQuery already we decided to just use the ‘$.trim()’ function from there.
Therefore:
var cleaned = ourString.trim();
becomes:
var cleaned = $.trim(ourString);
I’m sure I must have come across this before but I don’t remember when!
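For code that can’t lean on jQuery, a regular expression based helper is one common fallback. The sketch below is only an illustration: trimString is a made-up name and isn’t part of jQuery or of our application.

```javascript
// Hypothetical fallback helper for browsers without String.prototype.trim.
function trimString(value) {
  // strip whitespace from the start and from the end of the string
  return String(value).replace(/^\s+|\s+$/g, "");
}

console.log("[" + trimString("  some input \n") + "]"); // [some input]
```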
gawk: Getting story numbers from git commit messages
As I mentioned in my previous post I’ve been writing a little application to create graphs based on our git repository history and in one of them we wanted to try and create a graph showing which people had been working on which stories.
I needed a way to extract a story number from the git commit message and then store them all in a text file.
A typical commit with a story number in might look like this:
Mark/Uday #689 some awesome scala refactoring
I couldn’t think of an easy way to do this with my current knowledge of sed or the Mac version of awk but the match function of gawk (GNU awk) makes this really easy.
match(string, regexp [, array])
Search string for the longest, leftmost substring matched by the regular expression, regexp and return the character position, or index, at which that substring begins (one, if it starts at the beginning of string). If no match is found, return zero.
…
If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp.
The array argument is what I needed and it’s only available as a gawk extension according to the documentation.
I ended up with the following command to extract the story numbers:
git log --no-merges --pretty="format:%s" |
gawk '{ match($0, /#([0-9]+)/, arr); if(arr[1] != "") print arr[1] }'

I had to install gawk using ports on my Mac but on Fedora the default installation of awk is gawk.
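Since the graphing application itself is written in JavaScript, the same extraction could also be done there rather than in gawk. This is a minimal sketch of that alternative: storyNumber is a made-up helper name and the second commit subject is invented to show the no-match case.

```javascript
// Returns the story number from a commit subject, or null when there is none.
function storyNumber(subject) {
  var match = subject.match(/#([0-9]+)/);
  return match ? match[1] : null;
}

var subjects = [
  "Mark/Uday #689 some awesome scala refactoring",
  "tidy up the build" // invented example without a story number
];

// keep only the subjects that actually mention a story
var numbers = subjects.map(storyNumber).filter(function (n) { return n !== null; });

console.log(numbers); // [ '689' ]
```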
Learning node.js: Step
I’ve been playing around with node.js to generate some graphs from our git repository which effectively meant chaining together a bunch of shell commands to give me the repository data in the format I wanted.
I was able to do this by making use of child_process which comes with the core library.
The first version looked like this:
var exec = require('child_process').exec, _ = require("underscore");
...
function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();

  exec('cd ' + gitRepository + ' && git reset HEAD', function() {
    exec('git clone ' + gitRepository + ' ' + gitPlayArea, function() {
      exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', function(blank, gitEntries) {
        var commits = _(gitEntries.split("\n")).chain()
          .filter(function(item) { return item != ""; })
          .map(function(item) { return item.split("|") })
          .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
          .map(function(theSplit) {
            var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
            return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()};
          })
          .value();

        fn(commits);
      });
    });
  });
}
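The parsing chain in the innermost callback is easier to follow on a single line of input. The sketch below pulls it out into a standalone function; parseCommitLine is a made-up name and the sample line, including its hash, is invented to match the "%H | %ad | %s%d" format with raw dates (seconds since the epoch followed by a timezone offset).

```javascript
// Parse one line of: git log --pretty=format:"%H | %ad | %s%d" --date=raw
function parseCommitLine(line) {
  var parts = line.split("|");
  if (parts.length < 3) {
    return null; // blank or malformed line
  }

  // the raw date looks like "1315780389 +0100"; keep only the seconds
  var seconds = parseInt(parts[1].trim().split(" ")[0], 10);

  return {
    hash: parts[0].trim(),
    date: new Date(seconds * 1000), // Date expects milliseconds
    message: parts[2].trim()
  };
}

// invented sample line
var sample = "abc123 | 1315780389 +0100 | Mark/Uday #689 some awesome scala refactoring";
var parsed = parseCommitLine(sample);

console.log(parsed.message);
console.log(parsed.date.toISOString());
```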
node.js has an asynchronous programming model so the majority of the time we have to pass callbacks to other functions which get called when the asynchronous computation has completed.
In this case there’s an order dependency in the parseCommitsFromRepository function such that we need to nest the second call to exec inside the callback from the first call.
i.e. we don’t want to get the log of the repository before we’ve cloned the repository to the location that we’re trying to get that log from.
As you create more and more order dependencies between asynchronous functions the nesting becomes greater and the code moves more and more to the right hand side of the screen.
I came across the Step library which allows you to stack up functions and have the results from each one get passed on to the next.
I decided to try it in my code and it ended up looking like this:
function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();

  Step(
    function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); },
    function cloneRepository() { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); },
    function getGitEntries() { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); },
    function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
        .filter(function(item) { return item != ""; })
        .map(function(item) { return item.split("|") })
        .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
        .map(function(theSplit) {
          var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
          return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()};
        })
        .value();

      fn(commits);
    }
  );
}
An interesting side effect of using this approach is that we can describe what each exec call is doing in the name of the function that executes it.
Another neat thing about this library is that I can easily wrap those functions inside a logging function if I want to see on the console where the process has got up to:
function log(message, fn) {
  return function logMe() {
    console.log(new Date().toString() + ": " + message);
    fn.apply(this, arguments);
  }
}

function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();

  Step(
    log("Resetting repository", function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); }),
    log("Cloning repository", function cloneRepository() { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); }),
    log("Getting log", function getGitEntries() { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); }),
    log("Processing log", function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
        .filter(function(item) { return item != ""; })
        .map(function(item) { return item.split("|") })
        .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
        .map(function(theSplit) {
          var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
          return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()};
        })
        .value();

      fn(commits);
    })
  );
}
I then get this output when executing the function:
Sun Sep 11 2011 23:33:09 GMT+0100 (BST): Resetting repository
Sun Sep 11 2011 23:33:11 GMT+0100 (BST): Cloning repository
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Getting log
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Processing log
There are more cool ways to use the Step library on the github page – what I’ve described here is only a very simple use case.
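To make the chaining idea concrete, here is a toy runner that passes each function’s results on to the next one. It is emphatically not Step’s implementation, just a sketch of the idea: runInSequence is a made-up name and the steps call their continuation synchronously to keep the example short.

```javascript
// Toy sequencer: runs the given functions in order. Inside each step,
// `this` is the continuation that triggers the next step.
function runInSequence() {
  var steps = Array.prototype.slice.call(arguments);

  function next() {
    var step = steps.shift();
    if (step) {
      // forward whatever the previous step passed to its continuation
      step.apply(next, arguments);
    }
  }

  next();
}

var seen = [];

runInSequence(
  function first() { seen.push("first"); this(null, 1); },
  function second(err, value) { seen.push("second got " + value); this(null, value + 1); },
  function last(err, value) { seen.push("last got " + value); }
);

console.log(seen); // [ 'first', 'second got 1', 'last got 2' ]
```

Passing `this` as the callback to exec works in the same spirit: when the shell command finishes, its (error, stdout, stderr) arguments become the arguments of the next function in the stack.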
Learning Regular Expressions: Non capturing match
I’ve been working my way slowly through the O’Reilly ‘Mastering Regular Expressions‘ book and recently read about the non capturing match operator which came in useful for some Git log parsing I’ve been doing.
On the project I’m working on we all commit as the same user and then put our names at the beginning of the commit message.
We wanted to try and find out the statistics of who’d been pairing with each other and therefore needed to extract the pairs from commits.
Unfortunately everyone writes their names in a slightly different way so the regular expression which I used to parse each commit needed to try and handle that.
For example these are some of the ways that commit messages start:
Uday/Charles #67 did some stuff
mark,suzuki more stuff
pat, tom: very important stuff
Uday:Marc #87 stuff
The separator between the names is different in each case but in the majority of cases can be satisfied by the following regular expression:
([\/,][ ]?|:)
It’s either:
- A forward slash or comma followed by an optional space
- A colon
Since I want to express the fact that the separator can be one thing or the other I need to group those two things together in parentheses.
Unfortunately that means that the separator will be included in the array of captures that we have when parsing the commit.
I only wanted to have the names of the two people included in that array.
The non capturing match operator ‘(?:’ allows us to match against the expected separator without actually capturing it:
(?:[\/,][ ]?|:)
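The difference is easiest to see on a cut-down pattern. The two regular expressions below are simplified illustrations rather than the ones used for the real parsing: each captures a name on either side of the separator and they differ only in whether the separator group itself captures.

```javascript
// Simplified patterns: name, separator, name.
var capturing = /(\w+)([\/,][ ]?|:)(\w+)/;
var nonCapturing = /(\w+)(?:[\/,][ ]?|:)(\w+)/;

// slice(1) drops the full match and leaves only the captured groups
console.log("pat, tom".match(capturing).slice(1));     // [ 'pat', ', ', 'tom' ]
console.log("pat, tom".match(nonCapturing).slice(1));  // [ 'pat', 'tom' ]
console.log("Uday:Marc".match(nonCapturing).slice(1)); // [ 'Uday', 'Marc' ]
```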
The non capturing separator is part of a much larger, and probably overcomplicated, regular expression which also captures the names of the people pairing:
var pairRegex = /^\[?([\w-]+)[ ]?[^\/, ]*(?:[\/,][ ]?|:)([\w-]+)\]?[^\/]*[\s:]/
Using the regex with the non capturing match gives us:
"charles/mark: adios to play, hello scalatra".match(pairRegex)
["charles/mark: adios to play, hello ", "charles", "mark"]
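Whereas if we used a normal capture for the separator, i.e. ‘([\/,][ ]?|:)’ in place of ‘(?:[\/,][ ]?|:)’, we’d also capture the ‘/’:
"charles/mark: adios to play, hello scalatra".match(/^\[?([\w-]+)[ ]?[^\/, ]*([\/,][ ]?|:)([\w-]+)\]?[^\/]*[\s:]/)
["charles/mark: adios to play, hello ", "charles", "/", "mark"]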