Mark Needham

Thoughts on Software Development

Archive for the ‘Coding’ tag

Coding: Explore and retreat

When refactoring code or looking for the best way to integrate a new piece of functionality, I generally favour a small-steps, incremental approach, but recent experiences have led me to believe that this isn’t always the quickest approach.

Sometimes it seems to make more sense to go on little discovery missions in the code, make some bigger steps and then, if necessary, retreat, revert our changes and apply the lessons learnt on our next discovery mission. This technique isn’t anything novel, but I think it’s quite effective.

Michael and I were recently looking at the Smart Local Moving algorithm, which is used for community detection in large networks, and decided to refactor the code to make sure we understood how it worked. When we started, the outline of the main class looked like this:

public class Network implements Cloneable, Serializable
{
    private static final long serialVersionUID = 1;
 
    private int numberOfNodes;
    private int[] firstNeighborIndex;
    private int[] neighbor;
    private double[] edgeWeight;
    private double totalEdgeWeightSelfLinks;
    private double[] nodeWeight;
    private int nClusters;
    private int[] cluster;
 
    private double[] clusterWeight;
    private int[] numberNodesPerCluster;
    private int[][] nodePerCluster;
    private boolean clusteringStatsAvailable;
...
}

My initial approach was to put methods around things to make the code a bit easier to understand and then, step by step, replace each of those fields with nodes and relationships. I spent the first couple of hours doing this and, while it was making the code more readable, progress was slow and I wasn’t much wiser about how the code worked.

Michael and I paired on it for a few hours and he adopted a slightly different, and more successful, approach where we looked at bigger chunks of code, e.g. all the loops that used the firstNeighborIndex field, and then formed a hypothesis about what that code was doing.

In this case firstNeighborIndex acts as an offset into neighbor and is used to iterate through a node’s relationships. We thought we could probably replace those fields with something closer to the Neo4j model, where you have classes for nodes and relationships and a node has a method which returns a collection of relationships.
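
To make that hypothesis concrete, here’s a rough sketch of the two models side by side (imports elided). The first method could sit inside the Network class above and assumes firstNeighborIndex carries one extra entry marking the end of the last node’s relationships; the Node and Relationship classes are hypothetical names for the direction we wanted to move in, not actual Neo4j code:

public void printRelationshipsOf( int node )
{
    // node's relationships live in neighbor[firstNeighborIndex[node]] up to
    // (but not including) neighbor[firstNeighborIndex[node + 1]], with
    // edgeWeight holding the corresponding weights at the same offsets
    for ( int offset = firstNeighborIndex[node]; offset < firstNeighborIndex[node + 1]; offset++ )
    {
        System.out.println( node + " -> " + neighbor[offset] + " @ " + edgeWeight[offset] );
    }
}

// the model we were moving towards: a node owns its relationships
class Node
{
    private final List<Relationship> relationships = new ArrayList<>();

    public List<Relationship> relationships()
    {
        return relationships;
    }
}

class Relationship
{
    public Node target;
    public double weight;
}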

We tried tearing out everywhere that used those two fields and replacing them with our new nodes/relationships code, but that didn’t work because we hadn’t realised that edgeWeight and nodeWeight were also tied to the contents of the original fields.

We therefore needed to retreat and try again. This time I put the new approach alongside the existing approach and then slowly replaced existing bits of code.
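
One way to make the side by side approach safer is to compute the same value from both representations and check they agree before deleting the old one. This is a sketch of the idea rather than what’s in the actual commits, and it builds on the hypothetical Node/Relationship classes above plus an imagined nodes array:

public double totalWeightOf( int node )
{
    // old representation: walk the parallel arrays
    double oldResult = 0;
    for ( int offset = firstNeighborIndex[node]; offset < firstNeighborIndex[node + 1]; offset++ )
    {
        oldResult += edgeWeight[offset];
    }

    // new representation: ask the node for its relationships
    double newResult = 0;
    for ( Relationship relationship : nodes[node].relationships() )
    {
        newResult += relationship.weight;
    }

    // retreat if the hypothesis about the fields was wrong
    assert Math.abs( oldResult - newResult ) < 1e-9;
    return newResult;
}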

Along the way I came up with other ideas about how to restructure the code, tried some more big leaps to validate those ideas and then moved back into incremental mode again.

In summary I’ve found the combination of incrementally changing code and going on bigger exploratory missions works quite well.

Now I’m trying to work out when each approach is appropriate and I’ll write that up when I learn more! You can see my progress via the GitHub commits.

Written by Mark Needham

June 17th, 2015 at 5:23 pm

Posted in Coding

Coding: Visualising a bitmap

Over the last month or so I’ve spent some time each day reading a new part of the Neo4j code base to get more familiar with it, and one of my favourite classes is the Bits class which does all things low level on the wire and to disk.

In particular I like its toString method which returns a binary representation of the values that we’re storing in bytes, ints and longs.

I thought it’d be a fun exercise to try and write my own function which takes in a 32-bit bitmap and returns a string containing a 1 or 0 depending on whether each bit is set or not.

The key insight is that we need to iterate down from the highest order bit, create a bit mask for each bit and do a bitwise AND with the full bitmap. If the result of that calculation is 0 then the bit isn’t set, otherwise it is.

For example, to check if the highest order bit (index 31) was set our bit mask would have the 32nd bit set and all of the others 0’d out.

java> (1 << 31) & 0x80000000
java.lang.Integer res5 = -2147483648

If we wanted to check whether the lowest order bit was set we’d run the equivalent computation for bit 0, first against a bitmap where that bit isn’t set and then against one where it is:

java> (1 << 0) & 0x00000000
java.lang.Integer res7 = 0
 
java> (1 << 0) & 0x00000001
java.lang.Integer res8 = 1

Now let’s put that into a function which checks all 32 bits of the bitmap rather than just the ones we define:

private String asString( int bitmap )
{
    StringBuilder sb = new StringBuilder();
    sb.append( "[" );
    for ( int i = Integer.SIZE - 1; i >= 0; i-- )
    {
        // build a mask with only bit i set and AND it against the bitmap
        int bitMask = 1 << i;
        boolean bitIsSet = (bitmap & bitMask) != 0;
        sb.append( bitIsSet ? "1" : "0" );
 
        // separate each group of 8 bits with a comma for readability
        if ( i > 0 && i % 8 == 0 )
        {
            sb.append( "," );
        }
    }
    sb.append( "]" );
    return sb.toString();
}

And a quick test to check it works:

@Test
public void shouldInspectBits()
{
    System.out.println(asString( 0x00000001 ));
    // [00000000,00000000,00000000,00000001]
 
    System.out.println(asString( 0x80000000 ));
    // [10000000,00000000,00000000,00000000]
 
    System.out.println(asString( 0xA0 ));
    // [00000000,00000000,00000000,10100000]
 
    System.out.println(asString( 0xFFFFFFFF ));
    // [11111111,11111111,11111111,11111111]
}

Neat!

Written by Mark Needham

May 3rd, 2015 at 12:19 am

Posted in Coding,Java

Coding: Hack then revert

For a long while my default approach when I came across a new code base that I wanted to change was to read all the code and try and understand how it all fitted together by sketching out flow of control diagrams.

Only after I’d done that would I start planning how I could make my changes.

This works reasonably well, but it’s quite time consuming, and a couple of years ago a former colleague (I can’t remember who!) showed me another technique which seems to be more effective.

Rather than trying to understand how all the code fits together we briefly skim it to get a general understanding but don’t drill into the specifics.

Instead once we’ve got a general understanding we make changes to the code and then either run the application or run the tests to see if it works as we expected.

There’ll often be a couple of cycles before we understand exactly what changes we need to make and I’ve found that reverting the code after each attempt works quite well.

When we change the same bit of code for the 2nd/3rd/4th time it takes a fraction of the time it did on the 1st occasion and we’ll often spot improvements that we can make which we didn’t notice before.

I’d recommend this as an exploratory tool if you haven’t already tried it and, as an added bonus, it’s much more fun than statically analysing code and trying to figure out how it’s meant to work!

Written by Mark Needham

August 19th, 2013 at 11:13 pm

Posted in Coding

Testing XML generation with vimdiff

A couple of weeks ago I spent a bit of time writing a Ruby DSL to automate the setup of load balancer, firewall and NAT rules through the vCloud API.

The vCloud API deals primarily in XML so the DSL is just a thin layer which creates the appropriate markup.

When we started out we configured everything manually through the web console and then exported the XML, so the first thing the DSL needed to do was create XML that matched what we already had.

My previous experience using testing frameworks for this is that they’ll tell you whether the XML you’ve generated is equivalent to your expected XML, but if the two differ it isn’t easy to work out what was different.

I therefore decided to use a poor man’s approach where I first copied one rule into an XML file, attempted to replicate that in the DSL, and then used vimdiff to compare the files.

Although I had to manually verify whether or not the code was working I found this approach useful as any differences between the two pieces of XML were very easy to see.

90% of the rules were almost identical so I focused on the 10% that were different and once I’d got those working it was reasonably plain sailing.

My vimdiff command read like this:

ruby generate_networking_xml.rb > bar && vimdiff -c 'set diffopt+=iwhite' bar initialFirewall.xml

After I was reasonably confident that I understood the way the XML should be generated, I created an RSpec test which checked that we could correctly create all of the existing configurations using the DSL.

While discussing this approach with Jen, she suggested that an alternative would be to start with an RSpec test with most of the rules hard coded in XML and then replace them one by one with the DSL.

I think that probably does make more sense but I still quite like my hacky approach as well!

Written by Mark Needham

September 30th, 2012 at 3:48 pm

Posted in Testing

Performance: Caching per request

A couple of years ago I wrote a post describing an approach my then colleague Christian Blunden used to help improve the performance of an application: try to do expensive things less often or find another way to do them.

On the application I’m currently working on we load reference data from an Oracle database into memory based on configurations provided by the user.

There are multiple configurations and then multiple ways that those configurations can be priced so we have two nested for loops in which we load data and then perform calculations on it.

When profiling the application we realised that some of the database calls were being made with the same parameters and were therefore loading back the same reference data that we’d already loaded.

The most obvious way to solve this problem would be to move the code out of the loop and make fewer calls to the database that way, but the domain is expressed more clearly when the code lives inside the loop.

Alex therefore came up with an alternative approach where we wrap the database calling code in a caching decorator.

The caching decorator is a request-scoped object, so we’re only caching those results for a short amount of time. That means we don’t have to worry about cache invalidation because the cache is thrown away once the request has been processed.
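
As a rough sketch of the pattern rather than the actual project code (the repository interface, the type names and the use of Java 8’s computeIfAbsent are all mine for illustration):

import java.util.HashMap;
import java.util.Map;

class ReferenceData
{
    // stand-in for whatever the loaded reference data looks like
}

interface ReferenceDataRepository
{
    ReferenceData load( String configurationId );
}

class CachingReferenceDataRepository implements ReferenceDataRepository
{
    private final ReferenceDataRepository delegate;
    private final Map<String, ReferenceData> cache = new HashMap<>();

    CachingReferenceDataRepository( ReferenceDataRepository delegate )
    {
        this.delegate = delegate;
    }

    @Override
    public ReferenceData load( String configurationId )
    {
        // only hit the database on a cache miss; because this decorator is
        // request scoped, the cache is thrown away with the request
        return cache.computeIfAbsent( configurationId, delegate::load );
    }
}

Wiring it up is then just a matter of constructing the decorator around the real repository at the start of the request and letting it go out of scope at the end.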

I’ve previously seen this problem solved by using a Hibernate second level cache, which would cache results across requests.

In our application, though, calls to the database with the same parameters are more likely to happen within the same request than across requests.

The load on the system is likely to come from complex requests where many prices need to be calculated rather than from a huge frequency of requests.

If that changes then we always have the option of caching at both levels but at the moment our current approach seems to work pretty well.

Written by Mark Needham

April 30th, 2012 at 9:45 pm

Coding: Is there a name for everything?

A month ago I wrote a post describing an approach my team has been taking to avoid premature abstractions whereby we leave code inline until we know enough about the domain to pull out meaningful classes or methods.

Since I wrote that post we’ve come across a couple of examples where there doesn’t seem to be a name to describe a data structure.

We are building a pricing engine where the input is a set of configurations and the output is a set of pricing rows associated with each configuration.

We modelled the problem using a List of Pairs of Configuration/PricingItems:

List<Pair<Configuration, PricingItem>> configurationToPricingItems = buildThoseCombinations();

We don’t need to do any lookups by Configuration – just show the results to the user – which is why we haven’t used a Map.
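
A Pair here is nothing more than a two field value object, whether it comes from a library or is hand rolled. This minimal version is my own sketch rather than the code we actually used:

public final class Pair<A, B>
{
    private final A first;
    private final B second;

    public Pair( A first, B second )
    {
        this.first = first;
        this.second = second;
    }

    public A first() { return first; }
    public B second() { return second; }
}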

Our object oriented background suggested that there should be a name in the business domain for this but when we spoke to our business analyst and subject matter expert it became clear that they didn’t actually have a word.

Despite that it still feels strange to have to pass around a List of Pairs but I wonder if that’s because in Java we tend to abstract concepts behind classes rather than because it makes sense to do so.

If we were using Clojure then I don’t think we’d feel as uncomfortable about passing around basic data structures, because the language and the culture around it encourage this: you should only create a type when it’s strictly necessary.

In this case it’s a data structure to carry those combinations around and we don’t actually apply any logic to the data structure as a whole, only to the individual entries.

We wrote the code about three weeks ago now and haven’t experienced any difficulties in terms of the code being understandable or easy to work with.

I’m intrigued as to whether others have noticed a similar thing, or whether we aren’t embracing Domain Driven Design fully and need to dig deeper to find a missing domain concept.

Written by Mark Needham

April 23rd, 2012 at 12:20 am

Posted in Coding

Coding: Packaging by vertical slice

On most of the applications I’ve worked on we’ve tended to organise/package classes by the function that they have or the layer that they fit in.

A typical package structure might therefore end up looking like this:

  • com.awesome.project
    • common
      • StringUtils
    • controllers
      • LocationController
      • PricingController
    • domain
      • Address
      • Cost
      • CostFactory
      • Location
      • Price
    • repositories
      • LocationRepository
      • PriceRepository
    • services
      • LocationService

This works reasonably well and allows you to find code which is similar in function, but I find that more often than not a lot of the code that lives immediately around the code you’re currently working on isn’t actually relevant at the time.

On the last couple of applications that I’ve worked on we’ve been trying to group code around a domain concept or vertical slice of functionality.

Therefore instead of the above code we’d end up with something more like this:

  • com.awesome.project
    • location
      • Address
      • Location
      • LocationController
      • LocationRepository
      • LocationService
    • platform
      • StringUtils
    • price
      • Cost
      • CostFactory
      • Distance
      • Price
      • PriceController
      • PriceRepository

We were having a discussion about grouping code like this last week and I was struggling to describe what I prefer about the latter approach.

In the code base that I’m currently working on, which provides an API for other systems to do stuff with, this approach seems to lead to a design where we have created lots of potential micro services which could be deployed separately if we wanted.

That possibility wasn’t as clear to me until we started grouping code this way.

Another cool thing is that it’s made us think about the domain of the code more and whether the grouping of classes actually makes sense. We can also see which classes fall inside an aggregate root.

In the above example, under ‘price’ we can tell that Price is an aggregate root because it has a repository which allows us to get one, and we can also tell that Cost is probably contained by Price since we don’t have a way of directly getting a Cost.

We stop thinking about the domain classes as a whole, instead we think about them in their groups and how their aggregate roots might interact with each other if at all.

One disadvantage of grouping code like this is that if we’re writing a new repository, for example, we’ve got further to navigate to find another one to base ours on.

On the other hand you could argue that if we’re doing that then perhaps there’s an abstraction we can pull out to remove the problem.

It’s an interesting approach to grouping code and one thing we’ve started noticing is that we end up with some packages which have a lot of classes in them and others which have very few.

We’re not sure whether this is a symptom of us not breaking down those particular packages enough or if there are just some areas of the domain that are bigger than others.

These are just some of my early observations, so it’d be interesting to hear others’ thoughts on whether this is a good/bad idea.

Written by Mark Needham

February 20th, 2012 at 9:54 pm

Posted in Coding

Getting stuck and agile software teams

I came across an interesting set of posts by Jeff Wofford where he talks about programmers getting stuck and it made me think that, despite its faults, agile software development does have some useful practices for stopping us getting stuck for too long.

Many of the examples that Jeff describes sound like yak shaving to me which is part of what makes programming fun but doesn’t always correlate to adding value to the product that you’re building.

Although I wrote about some of the disadvantages of pair programming a while ago it is actually a very useful practice for ensuring that we don’t get stuck.

We’re much less likely to go off down a rabbit hole trying to solve some interesting but unrelated problem if we have to try and convince someone else to come along on that journey.

On most teams that I’ve worked on at least a reasonable percentage of the team is co-located so there’s almost certainly going to be someone sitting nearby who will be able to help.

If that isn’t enough, we tend to have a very visible story wall showing what everyone’s working on right next to the work space, and it becomes pretty obvious when something has been stuck in one of the columns for a long time.

Another team member is bound to point that out and if they don’t then the standup at the beginning of the day provides a good opportunity to see if anyone else on the team has a way around the problem you’re working on.

It also provides an opportunity to find out whether the problem you’re trying to solve is actually worth solving or not by talking to the product owner/one of the business analysts.

For the types of problems that I work on, more often than not it isn’t vital to solve a lot of the problems we think we need to, and the product owner would much rather we just parked them and worked on something else that is valuable to them.

Jeff goes on to describe some other more general ways of getting unstuck but the above are some which might not be available to us with a less collaborative approach.

Written by Mark Needham

October 20th, 2011 at 10:09 pm

Posted in Coding

Coding: The value in finding the generic abstraction

I recently worked on adding the meta data section for each of the different document types that the application I’m working on serves, which involved showing 15-20 pieces of data for each document type.

There are around 4-5 document types and although the meta data for each document type is similar it’s not exactly the same!

When we got to the second document type it wasn’t obvious where the abstraction was so we went for the copy/paste approach to see if it would be any easier to see the commonality if we put the two templates side by side.

We saw some duplication in the way that we were building up each individual piece of meta data but couldn’t see any higher abstraction.

We eventually got through all the document types and hadn’t really found a clean solution to the problem.

I wanted to spend some time playing around with the code to see if I could find one, but Duncan pointed out that it was important to consider that refactoring in the bigger context of the application.

Even if we did find a really nice design it probably wouldn’t give us any benefit, since we’ve covered most of the document types already and there will maybe be just one more that we have to add the meta data section for.

The return on investment for finding a clean generic abstraction won’t be very high in this case.

In another part of our application we need to make it possible for the user to do faceted search, but it hasn’t been decided what the final list of facets to search on will be.

It therefore needs to be very easy to make it possible to search on a new facet and include details about that facet in all search results.

We spent a couple of days about 5/6 weeks ago working out how to model that bit of code so that it would be really easy to add a new facet since we knew that there would be more coming in future.

When that time eventually came last week it took just 2 or 3 lines of code to get the new facet up and running.
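
To give a flavour of the kind of abstraction I mean, here’s a hypothetical sketch rather than our actual code; the Facet and Document names below are mine:

import java.util.Arrays;
import java.util.List;

// stand-in for whatever is being searched over
class Document
{
    String author() { return "..."; }
}

// each facet knows its name and how to extract its value from a document
interface Facet
{
    String name();
    String valueFrom( Document document );
}

class AuthorFacet implements Facet
{
    public String name() { return "author"; }
    public String valueFrom( Document document ) { return document.author(); }
}

class SearchConfiguration
{
    // making a new facet searchable, and included in every search result,
    // is then one small class plus one entry in this list
    static final List<Facet> FACETS = Arrays.<Facet>asList( new AuthorFacet() );
}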

In this case spending the time to find the generic abstraction had a good return on investment.

I sometimes find it difficult to know exactly which bits of code we should invest a lot of time in because there are always loads of places where improvements can be made.

Analysing whether there’s going to be a future return on investment from cleaning it up/finding the abstraction seems to be a useful thing to do.

Of course the return on investment I’m talking about here relates to the speed at which we can add future functionality.

I guess another return on investment could be reducing the time it takes to understand a piece of code if it’s likely to be read frequently.

Written by Mark Needham

August 31st, 2011 at 6:49 am

Posted in Coding

Coding: Light weight wrapper vs serialisation/deserialisation

As I’ve mentioned before, we’re making use of a MarkLogic database on the project I’m working on which means that we’re getting quite big XML data structures coming into our application whenever we execute a query.

The normal way that I’ve seen for dealing with external systems would be to create an anti-corruption layer where we initialise objects in our system with the required data from the external system.

In this case we’ve decided that approach doesn’t seem to make as much sense because we don’t need to do that much with the data that we get back.

We effectively map straight into a read model where the only logic is some formatting for how the data will be displayed on the page.

The read model objects look a bit like this:

class Content(root: xml.Node) {
    // pull the count attribute off the root element of the query result
    def numberOfResults: Int = (root \ "@count").text.toInt
}

They are just lightweight wrapper objects and we make use of Scala’s XML support to retrieve the various bits of content onto the page.

The advantage of doing things this way is that we have less code to write than we would with the serialisation/deserialisation approach, although it does mean that we’re strongly coupled to the data format that our storage mechanism uses.

However, since this is one bit of the architecture which is not going to change, it seems to make sense to accept the leakage of that layer.

So far the approach seems to be working out fine but it’ll be interesting to see how well it holds up if those lightweight wrappers do end up needing to have more logic in them.

Written by Mark Needham

June 26th, 2011 at 1:58 pm

Posted in Coding
