Mark Needham

Thoughts on Software Development

Archive for the ‘node.js’ tag

node.js: child_process.exec not returning all results


I’ve been playing around with some node.js code to get each of the commits from our git repository, but noticed that it wasn’t returning all the results.

I had the following code:

var exec = require('child_process').exec;
 
var gitRepository = '/some/local/path';
 
exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', function(error, stdout, stderror) {
  var commits = stdout.split("\n");
 
  // do some stuff with commits
});

We have around 2000 commits in the repository but I was only getting back 1600 of them when I checked the length of commits.

Eventually I decided to print out what was in error and got the following message:

error: Error: maxBuffer exceeded.

Going back to the documentation revealed my mistake:

maxBuffer specifies the largest amount of data allowed on stdout or stderr – if this value is exceeded then the child process is killed.

The default options are

{ encoding: 'utf8',
  timeout: 0,
  maxBuffer: 200*1024,
  killSignal: 'SIGTERM',
  cwd: null,
  env: null }

The limit is 204800 bytes (200*1024), which is roughly the number of bytes being returned by the time I get to 1600 commits.

Changing the code to increase the buffer sorts it out:

var exec = require('child_process').exec;
 
var gitRepository = '/some/local/path';
 
exec('cd ' + gitRepository + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw ', {maxBuffer: 500*1024}, function(error, stdout, stderror) {
  var commits = stdout.split("\n");
 
  // do some stuff with commits
});
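
If the repository keeps growing, any fixed maxBuffer will eventually be exceeded again. One alternative, sketched here on the assumption that streaming the output is acceptable, is child_process.spawn, which delivers stdout in chunks and is not subject to maxBuffer. The names splitCommits and collectCommits are hypothetical helpers, not part of the original code.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: turns raw `git log` output into an array of lines.
function splitCommits(output) {
  return output.split("\n").filter(function(line) { return line !== ""; });
}

// Hypothetical helper: streams `git log` instead of buffering it via exec,
// so maxBuffer does not apply. Calls callback(error, commits) exactly once.
function collectCommits(gitRepository, callback) {
  var args = ['log', '--pretty=format:%H | %ad | %s%d', '--date=raw'];
  var git = spawn('git', args, {
    cwd: gitRepository,
    stdio: ['ignore', 'pipe', 'inherit'] // only stdout is captured
  });
  var output = '';
  var finished = false;

  function finish(error, commits) {
    if (finished) { return; }
    finished = true;
    callback(error, commits);
  }

  git.stdout.setEncoding('utf8');
  git.stdout.on('data', function(chunk) { output += chunk; });
  git.on('error', finish); // e.g. git is not installed
  git.on('close', function(code) {
    if (code !== 0) { return finish(new Error('git log exited with code ' + code)); }
    finish(null, splitCommits(output));
  });
}

// The splitting step can be checked without a repository:
var sampleOutput = "a1 | 1316717700 +0100 | first\nb2 | 1316717800 +0100 | second\n";
console.log(splitCommits(sampleOutput).length); // 2
```

It would be called as collectCommits('/some/local/path', function(error, commits) { ... }), checking error before touching commits.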

Written by Mark Needham

September 22nd, 2011 at 7:55 pm

Posted in Javascript


Learning node.js: Step


I’ve been playing around with node.js to generate some graphs from our git repository which effectively meant chaining together a bunch of shell commands to give me the repository data in the format I wanted.

I was able to do this by making use of child_process which comes with the core library.

The first version looked like this:

var exec = require('child_process').exec, _ = require("underscore");
...
function parseCommitsFromRepository(fn) {
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();
 
  exec('cd ' + gitRepository + ' && git reset HEAD', function() {
    exec('git clone ' + gitRepository + ' ' + gitPlayArea, function() {
      exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', function(blank, gitEntries) {
        var commits = _(gitEntries.split("\n")).chain()
                        .filter(function(item) { return item != ""; })
                        .map(function(item) { return item.split("|") })
                        .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                        .map(function(theSplit) {  
                          var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                          return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                        .value();			
        fn(commits);
      });		
    });
  });
}
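
One fragile spot in the parsing above is item.split("|"): a commit message that itself contains a pipe gets cut short, because only theSplit[2] is kept. A minimal sketch of a more defensive parser, assuming the same "%H | %ad | %s%d" format with --date=raw, splits only on the first two separators. The function name parseCommitLine and the sample line are made up for illustration.

```javascript
// Hypothetical helper: parses one line of
//   git log --pretty=format:"%H | %ad | %s%d" --date=raw
// and returns null for lines that do not match the expected shape.
function parseCommitLine(line) {
  var first = line.indexOf('|');
  var second = line.indexOf('|', first + 1);
  if (first === -1 || second === -1) { return null; }

  // With --date=raw the date looks like "1316717700 +0100" (seconds, then offset).
  var rawDate = line.slice(first + 1, second).trim();
  var seconds = parseInt(rawDate.split(' ')[0], 10);
  if (isNaN(seconds)) { return null; }

  var date = new Date(seconds * 1000);
  return {
    hash: line.slice(0, first).trim(),
    message: line.slice(second + 1).trim(), // everything after the second pipe
    date: date.toDateString(),
    time: date.toTimeString()
  };
}

// Made-up sample line whose message contains a pipe.
var sample = 'abc123 | 1316717700 +0100 | Fix parser | handle pipes (HEAD, master)';
var commit = parseCommitLine(sample);
console.log(commit.hash + ' / ' + commit.message);
```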

node.js has an asynchronous programming model so the majority of the time we have to pass callbacks to other functions which get called when the asynchronous computation has completed.

In this case there’s an order dependency in the parseCommitsFromRepository function such that we need to nest the second call to exec inside the callback from the first call.

i.e. we don’t want to get the log of the repository before we’ve cloned the repository to the location that we’re trying to get that log from.

As you create more and more order dependencies between asynchronous functions the nesting gets deeper and the code drifts further and further towards the right-hand side of the screen.

I came across the Step library which allows you to stack up functions and have the results from each one get passed on to the next.

I decided to try it in my code and it ended up looking like this:

function parseCommitsFromRepository(fn) {	
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();	
  Step(
    function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); },
    function cloneRepository()       { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); },
    function getGitEntries()         { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); },
    function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
                      .filter(function(item) { return item != ""; })
                      .map(function(item) { return item.split("|") })
                      .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                      .map(function(theSplit) {  
                        var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                        return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                      .value();			
      fn(commits);
    }
  );	
}
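
To make the idea of passing this as the callback more concrete, here is a minimal sketch of a sequencer in the same spirit. It is not Step's actual implementation, and runInSequence is a hypothetical name; it only illustrates how each function can receive the "next" callback as its this value.

```javascript
// Hypothetical mini-sequencer, for illustration only (not the Step library).
function runInSequence() {
  var steps = Array.prototype.slice.call(arguments);

  function next() {
    var step = steps.shift();
    if (!step) { return; }
    // Invoke the step with `next` as `this`, forwarding whatever the
    // previous step passed to its callback.
    step.apply(next, arguments);
  }

  next();
}

var trace = [];
runInSequence(
  function first() { trace.push('first'); this(null, 1); },
  function second(err, value) { trace.push('second got ' + value); this(null, value + 1); },
  function third(err, value) { trace.push('third got ' + value); }
);
console.log(trace.join(', ')); // first, second got 1, third got 2
```

The steps here call back synchronously to keep the example short; the same wiring works when this is handed to an asynchronous function such as exec.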

An interesting side effect of using this approach is that we can describe what each exec call is doing in the name of the function that executes it.

Another neat thing about this library is that I can easily wrap those functions inside a logging function if I want to see on the console where the process has got up to:

function log(message, fn) {
  return function logMe() {
    console.log(new Date().toString() + ": " + message);
    fn.apply(this, arguments);
  };
}
function parseCommitsFromRepository(fn) {	
  var gitRepository = "/tmp/core";
  var gitPlayArea = "/tmp/" + new Date().getTime();	
  Step(
    log("Resetting repository", function getRepositoryUpToDate() { exec('cd ' + gitRepository + ' && git reset HEAD', this); }),
    log("Cloning repository", function cloneRepository()         { exec('git clone ' + gitRepository + ' ' + gitPlayArea, this); }),
    log("Getting log", function getGitEntries()                  { exec('cd ' + gitPlayArea + ' && git log --pretty=format:"%H | %ad | %s%d" --date=raw', this); }),
    log("Processing log", function handleResponse(blank, gitEntries) {
      var commits = _(gitEntries.split("\n")).chain()
                      .filter(function(item) { return item != ""; })
                      .map(function(item) { return item.split("|") })
                      .filter(function(theSplit) { return theSplit !== undefined && theSplit[1] !== undefined && theSplit[2] !== undefined; })
                      .map(function(theSplit) {  
                        var date = new Date(theSplit[1].trim().split(" ")[0]*1000);
                        return {message: theSplit[2].trim(), date: date.toDateString(), time : date.toTimeString()}; })
                      .value();			
      fn(commits);
    })
  );	
}

I then get this output when executing the function:

Sun Sep 11 2011 23:33:09 GMT+0100 (BST): Resetting repository
Sun Sep 11 2011 23:33:11 GMT+0100 (BST): Cloning repository
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Getting log
Sun Sep 11 2011 23:33:24 GMT+0100 (BST): Processing log
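
The wrapper only works because fn.apply(this, arguments) forwards both the this value and the arguments of the wrapping function to the wrapped one. A small standalone check of that behaviour, using a made-up context object rather than Step itself:

```javascript
function log(message, fn) {
  return function logMe() {
    console.log(new Date().toString() + ": " + message);
    // Forward the caller's `this` and arguments unchanged.
    return fn.apply(this, arguments);
  };
}

// Hypothetical context object standing in for whatever the caller supplies as `this`.
var context = { name: 'step-context' };
var seen = null;

var wrapped = log("Demo step", function(a, b) {
  seen = { self: this, sum: a + b };
});

wrapped.call(context, 2, 3);
console.log(seen.self === context, seen.sum); // true 5
```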

There are more cool ways to use the Step library on its GitHub page – what I’ve described here is only a very simple use case.

Written by Mark Needham

September 11th, 2011 at 10:37 pm

Posted in Javascript


node.js: Building a graph of build times using the Go API


I’ve been playing around with node.js again and one thing that I wanted to do was take a CSV file generated by the Go API and extract the build times so that we could display it on a graph.

Since I don’t have a Go instance on my machine I created a URL in my node application which would mimic the API and return a CSV file.

I’m using the express web framework to take care of some of the plumbing:

dashboard.js

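var express = require('express');
var fs = require('fs');     // needed for fs.readFile below
var http = require('http'); // needed by the /go/show route further down
var app = express.createServer();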
 
app.get('/fake-go', function(req, res) {
  fs.readFile('go.txt', function(err, data) {
    res.attachment("data.csv");
    res.end(data, 'UTF-8');		
  });
});

go.txt is just in my home directory and looks like this:

cruise_agent,cruise_job_duration,cruise_job_id,cruise_job_result,cruise_pipeline_counter,cruise_pipeline_label,cruise_stage_counter,cruise_timestamp_01_scheduled,cruise_timestamp_02_assigned,cruise_timestamp_03_preparing,cruise_timestamp_04_building,cruise_timestamp_05_completing,cruise_timestamp_06_completed,tests_failed_count,tests_ignored_count,tests_total_count,tests_total_duration
TheOriginalAndTheBest,275,1812,Passed,647,0.647,1,2011-08-02T14:48:33+01:00,2011-08-02T14:48:45+01:00,2011-08-02T14:48:56+01:00,2011-08-02T14:48:57+01:00,2011-08-02T14:53:11+01:00,2011-08-02T14:53:32+01:00,0,0,375,0.076
TheOriginalAndTheBest,20,1815,Cancelled,648,0.648,1,2011-08-02T15:09:32+01:00,2011-08-02T15:09:46+01:00,2011-08-02T15:09:56+01:00,2011-08-02T15:09:56+01:00,,2011-08-02T15:10:17+01:00,,,,
TheOriginalAndTheBest,268,1817,Passed,649,0.649,1,2011-08-02T15:14:20+01:00,2011-08-02T15:14:30+01:00,2011-08-02T15:14:40+01:00,2011-08-02T15:14:41+01:00,2011-08-02T15:18:49+01:00,2011-08-02T15:19:09+01:00,0,0,368,0.074
TheOriginalAndTheBest,272,1822,Passed,650,0.650,2,2011-08-02T15:30:31+01:00,2011-08-02T15:30:41+01:00,2011-08-02T15:30:51+01:00,2011-08-02T15:30:52+01:00,2011-08-02T15:35:05+01:00,2011-08-02T15:35:24+01:00,0,0,368,0.083
TheOriginalAndTheBest,271,1825,Passed,651,0.651,1,2011-08-02T15:38:33+01:00,2011-08-02T15:38:44+01:00,2011-08-02T15:38:54+01:00,2011-08-02T15:38:54+01:00,2011-08-02T15:43:06+01:00,2011-08-02T15:43:26+01:00,0,0,368,0.093

I wanted to create an end point which I could call and get back a JSON representation of all the different builds.

app.get('/go/show', function(req, res) {
  var site = http.createClient(3000, "localhost"); 
  var request = site.request("GET", "/fake-go", {'host' : "localhost"})
  request.end();
  request.on('response', function(response) {
    var data = "";
    response.setEncoding('utf8');
 
    response.on('data', function(chunk) {
      data += chunk;
    });
 
    response.on('end', function() {
      var lines = data.split("\n"), buildTimes = [];
      lines.forEach(function(line, index) {
        var columns = line.split(",");
        if(index != 0 && nonEmpty(columns[9]) && nonEmpty(columns[11]) && columns[3] == "Passed") {
          buildTimes.push({ start :  columns[9], end : columns[11]});
        }
      });
 
      res.contentType('application/json');
      res.send(JSON.stringify(buildTimes));			
    });
  });	
});
 
// Named nonEmpty to match the calls above: true when the column has a value.
function nonEmpty(column) {
  return column !== "" && column !== undefined;
}

I should probably use underscore.js for some of that code but I didn’t want to shave that yak just yet!
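
As a rough sketch of how the filtering could be pulled out of the response handler and checked on its own, the function below looks the columns up by header name instead of by index. The name extractBuildTimes is hypothetical, and the sample CSV is a trimmed-down version of go.txt that keeps only three of its columns.

```javascript
// Hypothetical helper: pulls start/end timestamps for passed builds out of
// CSV text, locating columns by their header names.
function extractBuildTimes(csv) {
  var lines = csv.split("\n").filter(function(line) { return line.trim() !== ""; });
  if (lines.length === 0) { return []; }

  var header = lines[0].split(",");
  var result = header.indexOf("cruise_job_result");
  var start = header.indexOf("cruise_timestamp_03_preparing");
  var end = header.indexOf("cruise_timestamp_05_completing");

  return lines.slice(1)
    .map(function(line) { return line.split(","); })
    .filter(function(columns) {
      // Keep only passed builds that have both timestamps.
      return columns[result] === "Passed" && !!columns[start] && !!columns[end];
    })
    .map(function(columns) {
      return { start: columns[start], end: columns[end] };
    });
}

// Trimmed-down sample: only three of the columns from go.txt are kept here.
var sampleCsv = [
  "cruise_job_result,cruise_timestamp_03_preparing,cruise_timestamp_05_completing",
  "Passed,2011-08-02T14:48:56+01:00,2011-08-02T14:53:11+01:00",
  "Cancelled,2011-08-02T15:09:56+01:00,"
].join("\n");

console.log(JSON.stringify(extractBuildTimes(sampleCsv)));
```

Looking columns up by name means the code keeps working if the API ever adds or reorders columns, at the cost of depending on the exact header spelling.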

I have a default route set up so that I can just go to localhost:3000 and see the graphs:

app.get('/', function(req, res){
  res.render('index.jade', { title: 'Dashboard' });
});

On the client side we can then create a graph using the RGraph API:

index.jade

h2(align="center") Project Dashboard
script
  function drawGoGraph(buildTimes) {		
    var go = new RGraph.Line('go', _(buildTimes).map(function(buildTime) { return (new Date(buildTime.end) - new Date(buildTime.start)) / 1000 }).filter(function(diff) { return diff > 0; }));
    go.Set('chart.title', 'Build Times');		
    go.Set('chart.gutter.top', 45);
    go.Set('chart.gutter.bottom', 125);
    go.Set('chart.gutter.left', 50);
    go.Set('chart.text.angle', 90);
    go.Set('chart.shadow', true);
    go.Set('chart.linewidth', 1);
 
    go.Draw();		
  }
 
  $(document).ready(function() {
    $.getJSON('/go/show', function(data) {
	  drawGoGraph(data);
    });
  });
 
div(align="center")
  canvas(id="go", width="500", height="400")
    [Please wait...]

We just do some simple subtraction between the start and end build times and then filter out any results which have an end time before the start time. I’m not entirely sure why we end up with entries like that but having those in the graph totally ruins it!
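
The subtraction and filtering can be tried outside the browser too. This is a small sketch using plain array methods; toDurationsInSeconds is a hypothetical name, and the second entry is made up to show a build whose end time precedes its start.

```javascript
// Hypothetical helper mirroring the subtraction done in drawGoGraph.
function toDurationsInSeconds(buildTimes) {
  return buildTimes
    .map(function(buildTime) {
      // Subtracting two Dates gives milliseconds.
      return (new Date(buildTime.end) - new Date(buildTime.start)) / 1000;
    })
    .filter(function(diff) { return diff > 0; });
}

var durations = toDurationsInSeconds([
  { start: "2011-08-02T14:48:56+01:00", end: "2011-08-02T14:53:11+01:00" },
  // Made-up entry whose end precedes its start; it should be dropped.
  { start: "2011-08-02T15:10:00+01:00", end: "2011-08-02T15:09:00+01:00" }
]);
console.log(durations); // [ 255 ]
```

The diff > 0 check also drops entries with unparseable timestamps, since those produce NaN and NaN > 0 is false.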

We include all the .js files in the layout.jade file.

layout.jade

!!! 5
html(lang="en")
  head
    title Project Dashboard
    script(src='RGraph/libraries/RGraph.common.core.js')
    script(src="RGraph/libraries/RGraph.common.context.js")
    script(src="RGraph/libraries/RGraph.common.annotate.js")
    script(src="RGraph/libraries/RGraph.common.tooltips.js")
    script(src="RGraph/libraries/RGraph.common.zoom.js")
    script(src="RGraph/libraries/RGraph.common.resizing.js")
    script(src="RGraph/libraries/RGraph.line.js")
    script(src="jquery-1.6.2.min.js")
    script(src="underscore-min.js")

Et voila:

Build graph

Written by Mark Needham

August 13th, 2011 at 2:52 pm

Posted in Javascript


node.js: A little application with Twitter & CouchDB


I’ve been continuing to play around with node.js and I thought it would be interesting to write a little application to poll Twitter every minute and save any new Tweets into a CouchDB database.

I first played around with CouchDB in May last year and initially spent a lot of time trying to work out how to install it before coming across CouchDBX which gives you one click installation for Mac OS X.

I’m using sixtus’ node-couch library to communicate with CouchDB and I’ve written a little bit of code that allows me to call the Twitter API.

What did I learn?

  • I’ve been reading through Brian Guthrie’s slides from his ‘Advanced Ruby Idioms So Clean You Can Eat Off Of Them’ talk from RubyConfIndia and one of the suggestions he makes is that in Ruby there are only 6 acceptable types of signatures for functions:
    • 0 parameters
    • 1 parameter
    • 2 parameters
    • A hash
    • 1 parameter and a hash
    • A variable number of arguments passed in as an array

    It seems to me that the same guidelines would be applicable in JavaScript as well except instead of a Hash we can pass in an object with properties and values to serve the same purpose. A lot of the jQuery libraries I’ve used actually do this anyway so it’s an idiom that’s in both languages.

    I originally wrote my twitter function so that it would take in several of the arguments individually:

    exports.query = function(username, password, method) { ... }

    After reading Brian’s slides I realised that this was quickly going to become a mess so I’ve changed the signature to only take in the most important parameter (‘method’) on its own with the other parameters passed in an ‘options’ object:

    exports.query = function(method, options) { ... }

    I’ve not written functions that take in parameters like this before but I really like it so far. It really helps simplify signatures while allowing you to pass in extra values if necessary.

  • I find myself porting higher order functions from C#/F# into JavaScript whenever I can’t find a function to do what I want – it’s fun writing them but I’m not sure how idiomatic the code I’m writing is.

    For example I wanted to write a function to take query parameters out of an options object and create a query string out of them. I adapted the code from node-couch and ended up with the following:

    Object.prototype.filter = function(fn) {
        var result = {};
        for (var name in this) {
            if (this.hasOwnProperty(name)) {
                if (fn.call(this[name], name, this[name])) {
                    result[name] = this[name];
                }
            }
        }
        return result;
    };
     
    Object.prototype.into = function(theArray, fn) {
        for (var name in this) {
            if (this.hasOwnProperty(name)) {
                theArray.push(fn.call(this[name], name, this[name]));
            }
        }
        return theArray;
    };
     
    function encodeOptions(options) {
        var parameters = [];
        if (typeof(options) === "object" && options !== null) {
            parameters = options
                            .filter(function(name) {
                                return !(name === "username" || name === "password" || name === "callback");})
                            .into([], function(name, value) {
                                return encodeURIComponent(name) + "=" + encodeURIComponent(value); });
        }
        return parameters.length ? ("?" + parameters.join("&")) : "";
    }

    I’m not sure how wise it is adding these functions to the object prototype – I haven’t had any problems so far but I guess if other libraries I’m using changed the prototype of these built in types in the same way as I am then I might get unexpected behaviour.

    Would the typical way to defend against this be to check if a function is defined before trying to define one and throwing an exception if so? Or is adding to the prototype just a dangerous thing to do altogether?

    Either way I’m not altogether convinced that the code with these higher order functions actually reads better than it would without them.

  • I’m finding it quite interesting that a lot of the code I write around node.js depends on callbacks which means that if you have 3 operations that depend on each other then you end up with nested callbacks which almost reads like code written in a continuation passing style.

    For example I have some code which needs to do the following:

    • Query CouchDB to get the ID of the most recently saved tweet
    • Query Twitter with that ID to get any tweets since that one
    • Save those tweets to CouchDB

    var server = http.createServer(function (req, res) {
        couchDB.view("application/sort-by-id", {
            descending : true,
            success : function(response) {
                twitter.query("friends_timeline", {
                    ...
                    since_id : response.rows[0].value.id,
                    callback : function(tweets) {
                        tweets.each(function(tweet) {
                            couchDB.saveDoc(tweet, {
                                success : function(doc) {
                                    sys.log("Document " + doc._id + " successfully saved")
                                },
                                error : function() {
                                    sys.log("Document failed to save");
                                }
                            });
                        }); 
     
                    }
                });
            },
            error : function() {
                sys.log("Failed to retrieve documents");
            }
        });
     
        ...
    });

    There’s a ‘success’ callback for calling ‘couchDB.view’ and then a ‘callback’ callback for calling ‘twitter.query’ and finally a ‘success’ callback from calling ‘couchDB.saveDoc’.

    To me it’s not that obvious what the code is doing at first glance – perhaps because I’m not that used to this style of programming – but I’m intrigued if there’s a way to write the code to make it more readable.

  • I haven’t yet worked out a good way to test drive code in a node.js module. As I understand it all the functions we define except for ones added to the ‘exports’ object are private to the module so there’s no way to test against that code directly unless you pull it out into another module.

    At the moment I’m just changing the code and then restarting the server and checking if it’s working or not. It’s probably not the most effective feedback cycle but it’s working reasonably well so far.
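
On the question of extending Object.prototype in the second point above: functions assigned onto Object.prototype are enumerable, so they show up in every for...in loop in the program unless each loop guards with hasOwnProperty. A minimal sketch of the same query-string helper that leaves the prototype alone, using Object.keys and the built-in array methods, might look like this. The EXCLUDED list simply restates the three names filtered out in the original encodeOptions.

```javascript
// Option names that are not sent as query parameters (taken from the
// original encodeOptions filter).
var EXCLUDED = ["username", "password", "callback"];

// Sketch of encodeOptions without touching Object.prototype.
function encodeOptions(options) {
  if (typeof options !== "object" || options === null) { return ""; }

  var parameters = Object.keys(options) // own enumerable properties only
    .filter(function(name) { return EXCLUDED.indexOf(name) === -1; })
    .map(function(name) {
      return encodeURIComponent(name) + "=" + encodeURIComponent(options[name]);
    });

  return parameters.length ? ("?" + parameters.join("&")) : "";
}

var queryString = encodeOptions({ username: "me", password: "secret", since_id: 42, count: 10 });
console.log(queryString); // ?since_id=42&count=10
```

This also fits the options-object signature from the first point: callers pass a single object and the helper picks out what it needs.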

I’ve put the code that I’ve written so far as gists on github:

That can be run with the following command from the terminal:

node twitter-server.js

Written by Mark Needham

March 21st, 2010 at 10:13 pm

Posted in Javascript
