Mark Needham

Thoughts on Software Development

Archive for the ‘Coding’ Category

Coding: Explore and retreat

without comments

When refactoring code or looking for the best way to integrate a new piece of functionality, I generally favour a small steps/incremental approach, but recent experiences have led me to believe that this isn’t always the quickest way to go.

Sometimes it seems to make more sense to go on little discovery missions in the code, take some bigger steps and then, if necessary, retreat, revert our changes and apply the lessons learnt on our next discovery mission. This technique isn’t anything novel but I think it’s quite effective.

Michael and I were recently looking at the Smart Local Moving algorithm which is used for community detection in large networks and decided to refactor the code to make sure we understood how it worked. When we started the outline of the main class was like this:

public class Network implements Cloneable, Serializable
{
    private static final long serialVersionUID = 1;
    private int numberOfNodes;
    private int[] firstNeighborIndex;
    private int[] neighbor;
    private double[] edgeWeight;
    private double totalEdgeWeightSelfLinks;
    private double[] nodeWeight;
    private int nClusters;
    private int[] cluster;
    private double[] clusterWeight;
    private int[] numberNodesPerCluster;
    private int[][] nodePerCluster;
    private boolean clusteringStatsAvailable;
}

My initial approach was to put methods around things to make it a bit easier to understand and then step by step replace each of those fields with nodes and relationships. I spent the first couple of hours doing this and while it was making the code more readable it wasn’t progressing very quickly and I wasn’t much wiser about how the code worked.

Michael and I paired on it for a few hours and he adopted a slightly different but more successful approach where we looked at slightly bigger chunks of code e.g. all the loops that used the firstNeighborIndex field and then created a hypothesis of what that code was doing.

In this case firstNeighborIndex acts as an offset into neighbor and is used to iterate through a node’s relationships. We thought we could probably replace that with something more similar to the Neo4j model where you have classes for nodes and relationships and a node has a method which returns a collection of relationships.
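To make that hypothesis concrete, here’s a minimal sketch of the layout we inferred (the graph data here is made up, but the field names follow the class above): firstNeighborIndex holds one offset per node plus a final sentinel entry, so a node’s relationships live in a contiguous slice of neighbor:

```java
import java.util.ArrayList;
import java.util.List;

public class NeighborIteration {
    // Tiny example graph: node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {0}
    // Node i's neighbours live at positions [firstNeighborIndex[i], firstNeighborIndex[i + 1])
    static int[] firstNeighborIndex = {0, 2, 3, 4};
    static int[] neighbor = {1, 2, 0, 0};

    static List<Integer> neighboursOf(int node) {
        List<Integer> result = new ArrayList<>();
        for (int i = firstNeighborIndex[node]; i < firstNeighborIndex[node + 1]; i++) {
            result.add(neighbor[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(neighboursOf(0)); // [1, 2]
    }
}
```

Once you see it this way, wrapping the slice in a Node#relationships() method, Neo4j-style, is a fairly mechanical change.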

We tried tearing out everywhere that used those two fields and replacing them with our new nodes/relationships code but that didn’t work because we hadn’t realised that edgeWeight and nodeWeight are also tied to the contents of the original fields.

We therefore needed to retreat and try again. This time I put the new approach alongside the existing approach and then slowly replaced existing bits of code.

Along the way I came up with other ideas about how to restructure the code, tried some bigger leaps to validate my ideas and then moved back into incremental mode again.

In summary I’ve found the combination of incrementally changing code and going on bigger exploratory missions works quite well.

Now I’m trying to work out when each approach is appropriate and I’ll write that up when I learn more! You can see my progress via the GitHub commits.

Written by Mark Needham

June 17th, 2015 at 5:23 pm

Posted in Coding

Tagged with

Coding: Visualising a bitmap

without comments

Over the last month or so I’ve spent some time each day reading a new part of the Neo4j code base to get more familiar with it, and one of my favourite classes is the Bits class which does all things low level on the wire and to disk.

In particular I like its toString method which returns a binary representation of the values that we’re storing in bytes, ints and longs.

I thought it’d be a fun exercise to try and write my own function which takes in a 32 bit value and returns a string containing a 1 or 0 depending on whether each bit is set or not.

The key insight is that we need to iterate down from the highest order bit and then create a bit mask of that value and do a bitwise and with the full bitmap. If the result of that calculation is 0 then the bit isn’t set, otherwise it is.

For example, to check if the highest order bit (index 31) was set our bit mask would have the 32nd bit set and all of the others 0’d out.

java> (1 << 31) & 0x80000000
java.lang.Integer res5 = -2147483648

If we wanted to check whether the lowest order bit was set then we’d run this computation instead, first against a bitmap with that bit clear and then against one with it set:

java> (1 << 0) & 0x00000000
java.lang.Integer res7 = 0
java> (1 << 0) & 0x00000001
java.lang.Integer res8 = 1

Now let’s put that into a function which checks all 32 bits of the bitmap rather than just the ones we define:

private String asString( int bitmap )
{
    StringBuilder sb = new StringBuilder();
    sb.append( "[" );
    for ( int i = Integer.SIZE - 1; i >= 0; i-- )
    {
        int bitMask = 1 << i;
        boolean bitIsSet = (bitmap & bitMask) != 0;
        sb.append( bitIsSet ? "1" : "0" );
        if ( i > 0 && i % 8 == 0 )
        {
            sb.append( "," );
        }
    }
    sb.append( "]" );
    return sb.toString();
}

And a quick test to check it works:

@Test
public void shouldInspectBits()
{
    System.out.println( asString( 0x00000001 ) );
    // [00000000,00000000,00000000,00000001]
    System.out.println( asString( 0x80000000 ) );
    // [10000000,00000000,00000000,00000000]
    System.out.println( asString( 0xA0 ) );
    // [00000000,00000000,00000000,10100000]
    System.out.println( asString( 0xFFFFFFFF ) );
    // [11111111,11111111,11111111,11111111]
}

Written by Mark Needham

May 3rd, 2015 at 12:19 am

Posted in Coding,Java

Tagged with

Coding: Hack then revert

without comments

For a long while my default approach when I came across a new code base that I wanted to change was to read all the code and try and understand how it all fitted together by sketching out flow of control diagrams.

Only after I’d done that would I start planning how I could make my changes.

This works reasonably well but it’s quite time consuming and a couple of years ago a former colleague (I can’t remember who!) showed me another technique which seems to be more effective.

Rather than trying to understand how all the code fits together we briefly skim it to get a general understanding but don’t drill into the specifics.

Instead once we’ve got a general understanding we make changes to the code and then either run the application or run the tests to see if it works as we expected.

There’ll often be a couple of cycles before we understand exactly what changes we need to make and I’ve found that reverting the code after each attempt works quite well.

When we change the same bit of code for the 2nd/3rd/4th time it takes a fraction of the time it did on the 1st occasion and we’ll often spot improvements that we can make which we didn’t notice before.

I’d recommend this as an exploratory tool if you haven’t already tried it and as an added bonus it’s much more fun than statically analysing code and trying to figure out how it’s meant to work!

Written by Mark Needham

August 19th, 2013 at 11:13 pm

Posted in Coding

Tagged with

Coding: Is there a name for everything?

with one comment

A month ago I wrote a post describing an approach my team has been taking to avoid premature abstractions whereby we leave code inline until we know enough about the domain to pull out meaningful classes or methods.

Since I wrote that post we’ve come across a couple of examples where there doesn’t seem to be a name to describe a data structure.

We are building a pricing engine where the input is a set of configurations and the output is a set of pricing rows associated with each configuration.

We modelled the problem using a List of Pairs of Configuration/PricingItems:

List<Pair<Configuration, PricingItem>> configurationToPricingItems = buildThoseCombinations();

We don’t need to do any lookups by Configuration – just show the results to the user – which is why we haven’t used a Map.
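As a minimal sketch of that shape (the Pair, Configuration and PricingItem classes here are hypothetical stand-ins, simplified for illustration), the only thing we ever do with the structure is iterate it and display each entry:

```java
import java.util.List;

public class PricingReport {
    // Hypothetical stand-ins for the domain classes in the post
    record Configuration(String name) {}
    record PricingItem(double amount) {}
    record Pair<A, B>(A first, B second) {}

    static String display(Pair<Configuration, PricingItem> entry) {
        return entry.first().name() + ": " + entry.second().amount();
    }

    public static void main(String[] args) {
        List<Pair<Configuration, PricingItem>> configurationToPricingItems = List.of(
                new Pair<>(new Configuration("basic"), new PricingItem(10.0)),
                new Pair<>(new Configuration("premium"), new PricingItem(25.0)));

        // No lookups by Configuration - we just walk the pairs and show them
        for (Pair<Configuration, PricingItem> entry : configurationToPricingItems) {
            System.out.println(display(entry));
        }
    }
}
```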

Our object oriented background suggested that there should be a name in the business domain for this but when we spoke to our business analyst and subject matter expert it became clear that they didn’t actually have a word.

Despite that it still feels strange to have to pass around a List of Pairs but I wonder if that’s because in Java we tend to abstract concepts behind classes rather than because it makes sense to do so.

If we were using Clojure then I don’t think we’d feel as uncomfortable about passing around basic data structures, because the language and the culture around it encourage this: we should only create a type when it’s strictly necessary.

In this case it’s a data structure to carry those combinations around and we don’t actually apply any logic to the data structure as a whole, only to the individual entries.

We wrote the code about three weeks ago now and haven’t experienced any difficulties in terms of the code being understandable or easy to work with.

I’m intrigued as to whether others have noticed a similar thing, or whether we aren’t embracing Domain Driven Design fully and need to dig deeper to find a missing domain concept.

Written by Mark Needham

April 23rd, 2012 at 12:20 am

Posted in Coding

Tagged with

Coding: Packaging by vertical slice

with 31 comments

On most of the applications I’ve worked on we’ve tended to organise/package classes by the function that they have or the layer that they fit in.

A typical package structure might therefore end up looking like this:

  • com.awesome.project
    • common
      • StringUtils
    • controllers
      • LocationController
      • PricingController
    • domain
      • Address
      • Cost
      • CostFactory
      • Location
      • Price
    • repositories
      • LocationRepository
      • PriceRepository
    • services
      • LocationService

This works reasonably well and makes it easy to find code which is similar in function, but I find that more often than not a lot of the code that lives immediately around the code you’re currently working on isn’t actually relevant at the time.

On the last couple of applications that I’ve worked on we’ve been trying to group code around a domain concept or vertical slice of functionality.

Therefore instead of the above code we’d end up with something more like this:

  • com.awesome.project
    • location
      • Address
      • Location
      • LocationController
      • LocationRepository
      • LocationService
    • platform
      • StringUtils
    • price
      • Cost
      • CostFactory
      • Distance
      • Price
      • PriceController
      • PriceRepository

We were having a discussion about grouping code like this last week and I was struggling to describe what I prefer about the latter approach.

In the code base that I’m currently working on, which provides an API for other systems to do stuff with, it seems to lead to a design where we have created lots of potential micro services which could be deployed separately if we wanted.

That possibility wasn’t as clear to me until we started grouping code this way.

Another cool thing is that it’s made us think about the domain of the code more and whether the grouping of classes actually makes sense. We can also see which classes fall inside an aggregate root.

In the above example, under ‘price’ we can tell that Price is an aggregate root because it has a repository which allows us to get one, and we can also tell that Cost is probably contained by Price since we don’t have a way of directly getting a Cost.
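As a rough sketch of what that structure implies (all names hypothetical and compressed into one file for illustration), the repository hands out the aggregate root and Cost is only reachable through it:

```java
import java.util.List;

// Hypothetical sketch of the 'price' package: Price is the aggregate root
// (it has a repository); Cost has no direct access path of its own.
public class PricePackageSketch {
    record Cost(double amount) {}
    record Price(String productId, List<Cost> costs) {
        double total() { return costs.stream().mapToDouble(Cost::amount).sum(); }
    }
    interface PriceRepository {
        Price findBy(String productId); // note: no CostRepository anywhere
    }

    public static void main(String[] args) {
        // A toy in-memory repository, just to show the access pattern
        PriceRepository repository = id -> new Price(id, List.of(new Cost(10.0), new Cost(2.5)));
        System.out.println(repository.findBy("product-1").total());
    }
}
```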

We stop thinking about the domain classes as a whole, instead we think about them in their groups and how their aggregate roots might interact with each other if at all.

One disadvantage of grouping code like this is that if we’re writing a new repository, for example, we’ve got further to navigate to find another one to base ours on.

On the other hand you could argue that if we’re doing that then perhaps there’s an abstraction we can pull out to remove the problem.

It’s an interesting approach to grouping code and one thing we’ve started noticing is that we end up with some packages which have a lot of classes in them and others which have very few.

We’re not sure whether this is a symptom of us not breaking down those particular packages enough or if there are just some areas of the domain that are bigger than others.

These are just some of my early observations so it’d be interesting to hear others’ thoughts on whether this is a good/bad idea.

Written by Mark Needham

February 20th, 2012 at 9:54 pm

Posted in Coding

Tagged with

Getting stuck and agile software teams

without comments

I came across an interesting set of posts by Jeff Wofford where he talks about programmers getting stuck and it made me think that, despite its faults, agile software development does have some useful practices for stopping us getting stuck for too long.

Many of the examples that Jeff describes sound like yak shaving to me which is part of what makes programming fun but doesn’t always correlate to adding value to the product that you’re building.

Although I wrote about some of the disadvantages of pair programming a while ago it is actually a very useful practice for ensuring that we don’t get stuck.

We’re much less likely to go off down a rabbit hole trying to solve some interesting but unrelated problem if we have to try and convince someone else to come along on that journey.

On most teams that I’ve worked on at least a reasonable percentage of the team is co-located so there’s almost certainly going to be someone sitting nearby who will be able to help.

If that isn’t enough, we tend to have a very visible story wall of what everyone’s working on right next to the work space and it becomes pretty obvious when something has been stuck in one of the columns for a long time.

Another team member is bound to point that out and if they don’t then the standup at the beginning of the day provides a good opportunity to see if anyone else on the team has a way around the problem you’re working on.

It also provides an opportunity to find out whether the problem you’re trying to solve is actually worth solving or not by talking to the product owner/one of the business analysts.

For the types of problems I work on, more often than not it isn’t actually vital to solve a problem we think we need to, and the product owner would much rather we just parked it and worked on something else that is valuable to them.

Jeff goes on to describe some other more general ways of getting unstuck but the above are some which might not be available to us with a less collaborative approach.

Written by Mark Needham

October 20th, 2011 at 10:09 pm

Posted in Coding

Tagged with

Coding: The value in finding the generic abstraction

without comments

I recently worked on adding the meta data section for each of the different document types that our application serves, which involved showing 15-20 pieces of data for each document type.

There are around 4-5 document types and although the meta data for each document type is similar it’s not exactly the same!

When we got to the second document type it wasn’t obvious where the abstraction was so we went for the copy/paste approach to see if it would be any easier to see the commonality if we put the two templates side by side.

We saw some duplication in the way that we were building up each individual piece of meta data but couldn’t see any higher abstraction.

We eventually got through all the document types and hadn’t really found a clean solution to the problem.

I wanted to spend some time playing around with the code to see if I could find one but Duncan pointed out that it was important to consider that refactoring in the bigger context of the application.

Even if we did find a really nice design it’s probably not going to give us any benefit since we’ve covered most of the document types and there will maybe be just one that we have to add the meta data section for.

The return on investment for finding a clean generic abstraction won’t be very high in this case.

In another part of our application we need to make it possible for the user to do faceted search but it hasn’t been decided what the final list of facets to search on will be.

It therefore needs to be very easy to make it possible to search on a new facet and include details about that facet in all search results.

We spent a couple of days about 5/6 weeks ago working out how to model that bit of code so that it would be really easy to add a new facet since we knew that there would be more coming in future.

When that time eventually came last week it took just 2 or 3 lines of code to get the new facet up and running.
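The post doesn’t show the actual design we ended up with, but one way to get that kind of one-line extensibility is to describe each facet as data rather than code paths — something like this hypothetical sketch, where adding a facet means adding one entry to a list:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: each facet knows its name and how to extract
// its value from a document, so new facets are a one-line change.
public class FacetedSearch {
    record Document(Map<String, String> fields) {}
    record Facet(String name, Function<Document, String> extractor) {}

    static final List<Facet> FACETS = List.of(
            new Facet("author", doc -> doc.fields().get("author")),
            new Facet("year", doc -> doc.fields().get("year")));

    public static void main(String[] args) {
        Document doc = new Document(Map.of("author", "Smith", "year", "2011"));
        for (Facet facet : FACETS) {
            System.out.println(facet.name() + ": " + facet.extractor().apply(doc));
        }
    }
}
```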

In this case spending the time to find the generic abstraction had a good return on investment.

I sometimes find it difficult to know exactly which bits of code we should invest a lot of time in because there are always loads of places where improvements can be made.

Analysing whether there’s going to be a future return on investment from cleaning it up/finding the abstraction seems to be a useful thing to do.

Of course the return on investment I’m talking about here relates to the speed at which we can add future functionality.

I guess another return on investment could be reducing the time it takes to understand a piece of code if it’s likely to be read frequently.

Written by Mark Needham

August 31st, 2011 at 6:49 am

Posted in Coding

Tagged with

Coding: Light weight wrapper vs serialisation/deserialisation

with 5 comments


As I’ve mentioned before, we’re making use of a MarkLogic database on the project I’m working on which means that we’re getting quite big XML data structures coming into our application whenever we execute a query.

The normal way that I’ve seen for dealing with external systems would be to create an anti corruption layer where we initialise objects in our system with the required data from the external system.

In this case we’ve decided that approach doesn’t seem to make as much sense because we don’t need to do that much with the data that we get back.

We effectively map straight into a read model where the only logic is some formatting for how the data will be displayed on the page.

The read model objects look a bit like this:

class Content(root : xml.Node) {
    def numberOfResults: Int = (root \ "@count").text.toInt
}

They are just lightweight wrapper objects and we make use of Scala’s XML support to retrieve the various bits of content onto the page.

The advantage of doing things this way is that it means we have less code to write than we would with the serialisation/deserialisation approach although it does mean that we’re strongly coupled to the data format that our storage mechanism uses.

However, since this is one bit of the architecture which is not going to change, it seems to make sense to accept the leakage of that layer.

So far the approach seems to be working out fine but it’ll be interesting to see how well it holds up if those lightweight wrappers do end up needing to have more logic in them.

Written by Mark Needham

June 26th, 2011 at 1:58 pm

Posted in Coding

Tagged with

Coding: Reflection vs Action mode

without comments

It recently struck me while preparing some ThoughtWorks University sessions that there appear to be two modes that I spend my time switching between while coding:

  • Action mode – we’re focused on getting things done, making things happen
  • Reflective mode – we’re a bit more detached and looking at things from a higher level

I spent the majority of 2008 and 2009 in reflective mode on the systems I was working on which can be seen by scanning through a lot of the blog posts that I wrote during that time.

I’m sure there would have been times when I was in action mode but I was far more interested in how something was being built and whether that could be done more successfully.

In 2010/2011 I became much more interested in building stuff and spent more time thinking about that rather than how to refactor code or design it in a more functional way for example.

I’m now coming to the conclusion that both of these mentalities are useful at different times and I need to be more aware of when I’ve been spending too long in one or the other.

Examples of the two modes

When pair programming these modes describe the roles of the driver/navigator reasonably well.

While driving we’re head down focusing on building the required functionality whereas the navigator focuses more on quality of what’s being written, whether we can do that better, whether we’re going to cause ourselves big problems down the line and so on.

When we’re learning something new we’ll mostly be in action mode to start with – we want to see something working so that we get some feedback that we’re getting somewhere.

After a while when we’re a bit more comfortable with the language/tool/technique we’ll probably step back and reflect on how it’s going.

I think you can be in these two mindsets in non coding situations too.

For example I’ve recently become much more vocal in trying to get teams to use approaches that I’ve seen work before whereas previously I was happier to sit back and watch how things panned out and learn from that.

Are the modes this clear cut?

While writing this post I’ve started to wonder whether the way we use these two modes is actually this clear cut, and whether I’m only aware of the ‘obvious’ switches between the two when in fact there are many more.

I’ll keep observing…

Written by Mark Needham

March 6th, 2011 at 4:19 am

Posted in Coding

Tagged with

Ruby: Refactoring from hash to object

without comments

Something I’ve noticed when I play around with Ruby in my own time is that I nearly always end up with the situation where I’m passing hashes all over my code and to start with it’s not a big deal.

Unfortunately I eventually get to the stage where I’m effectively modelling an object inside a hash and it all gets very difficult to understand.

I’ve written a few times before about incrementally refactoring code so this seemed like a pretty good chance for me to try that out.

The code in the view looked something like this:

<% @tweets.each do |tweet| %>
  <%= tweet[:key] %>  <%= tweet[:value][:something_else] %>
<% end %>

@tweets was being populated directly from a call to CouchDB so to start with I needed to change it from being a collection of hashes to a collection of objects:

I changed the Sinatra calling code from:

get '/' do
  @tweets = get_the_couchdb_tweets_hash
end

to:

get '/' do
  tweets_hash = get_the_couchdb_tweets_hash
  @tweets = tweets_hash.map { |tweet| TweetViewModel.new(tweet) }
end

where TweetViewModel is defined like so:

class TweetViewModel
  attr_accessor :key, :value

  def initialize(tweet_hash)
    @key = tweet_hash[:key]
    @value = tweet_hash[:value]
  end

  def get(lookup)
    lookup == :key ? key : value
  end
  alias_method :[], :get
end

The next step was to get rid of the get method and rename those attr_accessor methods to something more intention revealing.

class TweetViewModel
  attr_accessor :url, :messages

  def initialize(tweet_hash)
    @url = tweet_hash[:key]
    @messages = tweet_hash[:value]
  end
end

And the view code now looks like this:

<% @tweets.each do |tweet| %>
  <%= tweet.url %>  <%= tweet.messages[:something_else] %>
<% end %>

I originally didn’t realise how easy it would be to make the TweetViewModel temporarily pretend to be a Hash, but doing so let me change the code in small steps and know that it was working the whole way.

For someone with more Ruby experience perhaps it wouldn’t be necessary to break out the refactoring like this because they could fairly confidently do it in one go.

Written by Mark Needham

February 27th, 2011 at 8:10 pm

Posted in Incremental Refactoring,Ruby

Tagged with