Mark Needham

Thoughts on Software Development

Archive for the ‘Software Development’ Category

4 types of user

with 2 comments

I’ve been working with Neo4j full time for slightly more than a year now, and from interacting with the community I’ve noticed that people fall into 4 categories when using different features of the product.

These are as follows:

[diagram: quadrant with ‘loudness’ on one axis and ‘success’ on the other]
On one axis we have ‘loudness’ i.e. how vocal somebody is, whether on Twitter, Stack Overflow or by email, and on the other we have ‘success’ i.e. how well a product feature is working for them.

The people in the top half of the diagram will get the most attention because they’re the most visible.

Of those people we’ll tend to spend more time on the ones who are unhappy and vocal, to try and help them solve the problems they’re having.

When working with the people in the top left it’s difficult to know how representative they are of the whole user base.

It could be the case that they aren’t representative at all and actually there is a quiet majority who the product is working for and are just getting on with it with no fuss.

However, it could equally be the case that they are absolutely representative and there are a lot of users quietly suffering / giving up using the product.

I haven’t come up with a good way of finding the less vocal users, but in my experience they’ll often be passive users of the user group or Stack Overflow i.e. they’ll read existing issues but not post anything themselves.

Given this uncertainty I think it makes sense to assume that the silent majority suffer the same problems as the more vocal minority.

Another interesting thing I’ve noticed about this quadrant is that the people in the top right are often the best people in the community to help those who are struggling.

It’d be interesting to know whether anyone has noticed a similar thing with the products they’ve worked on, and if so, what approach do you take to unveiling the silent majority?

Written by Mark Needham

July 29th, 2014 at 7:07 pm

Thoughts on meetups

without comments

I recently came across an interesting blog post by Zach Tellman in which he explains a new approach that he’s been trialling at The Bay Area Clojure User Group.

Zach explains that a lecture based approach isn’t necessarily the most effective way for people to learn and that half of the people attending the meetup are likely to be novices and would struggle to follow more advanced content.

He then goes on to explain an alternative approach:

We’ve been experimenting with a Clojure meetup modelled on a different academic tradition: office hours.

At a university, students who have questions about the lecture content or coursework can visit the professor and have a one-on-one conversation.

At the beginning of every meetup, we give everyone a name tag, and provide a whiteboard with two columns, “teachers” and “students”.

Attendees are encouraged to put their name and interests in both columns. From there, everyone can [...] go in search of someone from the opposite column who shares their interests.

While running Neo4j meetups we’ve had similar observations and my colleagues Stefan and Cedric actually ran a meetup in Paris a few months ago which sounds very similar to Zach’s ‘office hours’ style one.

However, we’ve also been experimenting with the idea that one size doesn’t need to fit all by running different styles of meetups aimed at different people.

For example, we have:

  • An introductory meetup which aims to get people to the point where they can follow talks about more advanced topics.
  • A more hands on session for people who want to learn how to write queries in cypher, Neo4j’s query language.
  • An advanced session for people who want to learn how to model a problem as a graph and import data into a graph.

I’m also thinking of running something similar to the Clojure Dojo but focused on data and graphs where groups of people could work together and build an app.

I noticed that Nick Manning has been doing a similar thing with the New York City Neo4j meetup as well, which is cool.

I’d be interested in hearing about different / better approaches that other people have come across so if you know of any let me know in the comments.

Written by Mark Needham

May 31st, 2014 at 7:50 pm

Posted in Software Development

Tagged with

install4j and AppleScript: Creating a Mac OS X Application Bundle for a Java application

without comments

We have a few internal applications at Neo which can be launched using ‘java -jar ‘ and I always forget where the jars are, so I thought I’d wrap a Mac OS X application bundle around them to make life easier.

My favourite installation pattern is the one where when you double click the dmg it shows you a window where you can drag the application into the ‘Applications’ folder, like this:

[screenshot: dmg window with the application and an ‘Applications’ folder shortcut]

I’m not a fan of the installation wizards and the installation process here is so simple that a wizard seems overkill.

I started out learning about the structure of an application bundle which is well described in the Apple Bundle Programming guide. I then worked my way through a video which walks you through bundling a JAR file in a Mac application.

I figured that bundling a JAR was probably a solved problem and had a look at App Bundler, JAR Bundler and Iceberg before settling on Install4j which we used for Neo4j desktop.

I started out by creating an installer using Install4j and then manually copying the launcher it created into an Application bundle template but it was incredibly fiddly and I ended up with a variety of indecipherable messages in the system error log.

Eventually I realised that I didn’t need to create an installer and that what I actually wanted was a Mac OS X single bundle archive media file.

After I’d got install4j creating that for me I just needed to figure out how to create the background image telling the user to drag the application into their ‘Applications’ folder.

Luckily I came across this StackOverflow post which provided some AppleScript to do just that and with a bit of tweaking I ended up with the following shell script which seems to do the job:

# Assumes ${title}, ${applicationName}, ${backgroundPictureName} and
# ${finalDMGName} have been set earlier in the build
rm target/DBench_macos_1_0_0.tgz
/Applications/install4j\ 5/bin/install4jc TestBench.install4j
# Unpack the install4j archive into a staging directory
rm -rf target/dmg && mkdir -p target/dmg
tar -C target/dmg -xvf target/DBench_macos_1_0_0.tgz
# Copy in the background image and create the Applications symlink
cp -r src/packaging/.background target/dmg
ln -s /Applications target/dmg
cd target
rm "${finalDMGName}"
umount -f /Volumes/"${title}"
# Build a writable dmg from the staging directory and mount it
hdiutil create -volname "${title}" -size 100m -srcfolder dmg/ -ov -format UDRW pack.temp.dmg
device=$(hdiutil attach -readwrite -noverify -noautoopen "pack.temp.dmg" | egrep '^/dev/' | sed 1q | awk '{print $1}')
sleep 5
# Use AppleScript to set the window's background image and position the icons
echo '
   tell application "Finder"
     tell disk "'${title}'"
           set current view of container window to icon view
           set toolbar visible of container window to false
           set statusbar visible of container window to false
           set the bounds of container window to {400, 100, 885, 430}
           set theViewOptions to the icon view options of container window
           set arrangement of theViewOptions to not arranged
           set icon size of theViewOptions to 72
           set background picture of theViewOptions to file ".background:'${backgroundPictureName}'"
           set position of item "'${applicationName}'" of container window to {100, 100}
           set position of item "Applications" of container window to {375, 100}
           update without registering applications
           delay 5
     end tell
   end tell
' | osascript
# Detach the writable image and convert it to a compressed, read-only dmg
hdiutil detach ${device}
hdiutil convert "pack.temp.dmg" -format UDZO -imagekey zlib-level=9 -o "${finalDMGName}"
rm -f pack.temp.dmg
cd ..

To summarise, this script creates a symlink to ‘Applications’, puts a background image in a directory titled ‘.background’, sets that as the background of the window and positions the symlink and application appropriately.

Et voila:

[screenshot: the finished dmg window]

The Firefox guys wrote a couple of blog posts detailing their experiences writing an installer which were quite an interesting read as well.

Written by Mark Needham

April 7th, 2014 at 12:04 am

Soulver: For all your random calculations

without comments

I often find myself doing random calculations, and I used to do them part manually and part using Alfred’s calculator until Alistair pointed me at Soulver, a desktop/iPhone/iPad app, which is even better.

I thought I’d write some examples of calculations I use it for, partly so I’ll remember the syntax in future!

Calculating how much memory Neo4j memory mapping will take up

800 mb + 2660mb + 6600mb + 9500mb + 40mb in GB = 19.6 GB

How long would it take to cover 20,000 km at 100 km / day?

20,000 km / 100 km/day in months = 6.57097681677241832481 months

How long did an import of some data using the Neo4j shell take?

4550855 ms in minutes = 75.84758333333333333333 minutes

Bit shift 1 by 32 places

1 << 32 = 4,294,967,296

Translating into easier to digest units

32381KB / second in MB per minute = 1,942.86 MB/minute
500,000 / 3 years in per hour = 19.01324310408685857874 per hour

How long would it take to process a chunk of data?

100 GB / (32381KB / second in MB per minute)  = 51.47051254336390681778 minutes

Hexadecimal to base 10

0x1111 = 4,369
1 + 16 + 16^2 + 16^3 = 4,369
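Those last two examples translate directly to Java, with one caveat around the bit shift: a 32-bit `int` can't actually be shifted by 32 places (the shift distance is taken mod 32), so you need a `long` to reproduce Soulver's answer:

```java
public class SoulverChecks {
    public static void main(String[] args) {
        // Shifting an int by 32 wraps around to a shift of 0,
        // so a long is needed to get 4,294,967,296
        System.out.println(1L << 32);          // 4294967296
        System.out.println(1 << 32);           // 1 -- int shift distance is mod 32!

        // Hexadecimal to base 10
        System.out.println(0x1111);            // 4369

        // 4550855 ms in minutes
        System.out.println(4550855 / 60000.0); // ~75.8476
    }
}
```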

I’m sure there’s much more that you can do that I haven’t figured out yet but even for these simple examples it saves me a bunch of time.

Written by Mark Needham

March 30th, 2014 at 2:48 pm

Posted in Software Development

Tagged with

Automating Skype’s ‘This message has been removed’

with one comment

One of the stranger features of Skype is that it allows you to delete the contents of a message that you’ve already sent to someone – something I haven’t seen on any other messaging system I’ve used.

For example if I wrote a message in Skype and wanted to edit it I would press the ‘up’ arrow:

[screenshot: editing a sent message in Skype]

Once I’ve deleted the message I’d see this in the space where the message used to be:

[screenshot: ‘This message has been removed’]

I’m almost certainly too obsessed with this, but I find it quite amusing when I see people posting and retracting messages so I wanted to see if it could be automated.

Alistair showed me Automator, a built in tool on the Mac for automating work flows.

Automator allows you to execute Applescript so we wrote the following code which selects the current chat in Skype, writes a message and then deletes it one character at a time:

on run {input, parameters}
	tell application "Skype"
		activate -- bring Skype to the front so it receives the keystrokes
	end tell
	tell application "System Events"
		set message to "now you see me, now you don't"
		keystroke message
		keystroke return
		keystroke (ASCII character 30) -- up arrow to edit the last message
		repeat length of message times
			keystroke (ASCII character 8) -- backspace
		end repeat
		keystroke return
	end tell
	return input
end run

We wired up the Applescript via the Utilities > Run Applescript menu option in Automator:

[screenshot: the Automator workflow]

We can then go further and wire that up to a keyboard shortcut if we want by saving the workflow as a service in Automator but for my messing around purposes clicking the ‘Run’ button from Automator didn’t seem too much of a hardship!

Written by Mark Needham

February 20th, 2014 at 11:16 pm

Learning about bitmaps

with 3 comments

A few weeks ago Alistair and I were working on the code used to model the labels that a node has attached to it in a Neo4j database.

The way this works is that chunks of 32 node ids are represented as a 32 bit bitmap for each label, where a 1 for a bit means that a node has the label and a 0 means that it doesn’t.
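As a rough sketch of that idea (the class and method names here are mine, not Neo4j's), mapping a node id to its chunk and bit might look like this:

```java
// Hypothetical sketch of a chunked label bitmap; names are mine, not Neo4j's.
// Node ids are grouped into chunks of 32, with one int bitmap per chunk.
public class LabelBitmap {
    private final int[] chunks;

    public LabelBitmap(int maxNodeId) {
        this.chunks = new int[(maxNodeId / 32) + 1];
    }

    // Within a chunk, node 0 maps to the highest bit and node 31 to the lowest
    private static int mask(long nodeId) {
        return 1 << (31 - (int) (nodeId % 32));
    }

    public void addLabel(long nodeId) {
        chunks[(int) (nodeId / 32)] |= mask(nodeId);   // set the bit
    }

    public boolean hasLabel(long nodeId) {
        return (chunks[(int) (nodeId / 32)] & mask(nodeId)) != 0;
    }

    public void removeLabel(long nodeId) {
        chunks[(int) (nodeId / 32)] &= ~mask(nodeId);  // clear the bit
    }

    public static void main(String[] args) {
        LabelBitmap person = new LabelBitmap(63);
        person.addLabel(0);
        System.out.println(person.hasLabel(0)); // true
        System.out.println(person.hasLabel(1)); // false
    }
}
```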

For example, let’s say we have node ids 0-31, where node 0 maps to the highest bit and node 31 to the lowest bit. If only node 0 has the label then that’d be represented as the following value:

java> int bitmap = 1 << 31;
int bitmap = -2147483648

If we imagine the 32 bits positioned next to each other it would look like this:

[diagram: 32-bit bitmap with only the highest bit set]
java> 0X80000000;
Integer res16 = -2147483648

The next thing we want to do is work out whether a node has a label applied or not. We can do this by using a bitwise AND.

For example to check whether the highest bit is set we would write the following code:

java> bitmap & (1 << 31);
Integer res10 = -2147483648

That is set, as we would imagine. Now let’s check a few bits that we know aren’t set:

java> bitmap & (1 << 0);
Integer res11 = 0
java> bitmap & (1 << 1);
Integer res12 = 0
java> bitmap & (1 << 30);
Integer res13 = 0

Another operation we might want to do is set another bit on our existing bitmap, for which we can use a bitwise inclusive OR.

A bitwise inclusive OR means that a bit will be set if either value has the bit set or if both have it set.

Let’s set the second highest bit and visualise that calculation:

[diagram: OR-ing the second highest bit into the bitmap]

If we evaluate that we’d expect the two highest bits to be set:

java> bitmap |= (1 << 30);
Integer res14 = -1073741824

Now if we visualise the bitmap we’ll see that is indeed the case:

[diagram: bitmap with the two highest bits set]
java> 0XC0000000;
Integer res15 = -1073741824

The next operation we want to do is unset a bit that we’ve already set, for which we can use a bitwise exclusive OR.

An exclusive OR means that a bit will only remain set if there’s a combination of (0 and 1) or (1 and 0) in the calculation. If there are two 1s or two 0s then it’ll be unset.

Let’s unset the 2nd highest bit so that we’re left with just the top bit being set.

If we visualise that we have the following calculation:

[diagram: XOR-ing the second highest bit back out of the bitmap]

And if we evaluate that we’re back to our original bitmap:

java> bitmap ^= (1 << 30);
Integer res2 = -2147483648

I used the Java REPL to evaluate the code samples in this post and this article explains bitshift operators very clearly.

The Neo4j version of the bitmap described in this post is in the BitmapFormat class on github.

Written by Mark Needham

January 12th, 2014 at 5:44 pm

Supporting production code: Start with the simple things

without comments

A few months ago I wrote about my experiences supporting production code while working at uSwitch.

Since then I’ve been working on support for Neo4j customers and I’ve realised that there are a couple of other things to keep in mind while debugging production problems that I missed from the initial list.

Keep a clear head / Hold back your assumptions

The first is that it’s very helpful to completely clear your head of any assumptions when looking at a problem.

I’ve got into the habit of pattern matching different error messages that I come across with root causes and while that’s sometimes useful, often there are subtle differences which mean the root cause is different.

Although I still sometimes fall into the assumptions trap I’ve found that it helps to ask exactly what someone is trying to do rather than immediately trying to solve the problem.

Look for the simple things

Along with the assumptions another mistake I make is to imagine the most complicated version of events that could lead to a problem manifesting.

Sometimes this is the case but more frequently a configuration setting may have been misunderstood or a query poorly designed and the problem can be resolved more easily.

To stop myself making this mistake I have a rough flow chart in my head working down from simpler causes to more complicated ones for different problem areas.

As I said, I still do make assumptions and look for complicated reasons for problems but by keeping these two things in mind I think/hope I’m doing it less often than I used to!

Written by Mark Needham

December 20th, 2013 at 6:07 pm

Neo4j’s Graph Café London – 28th August 2013

with one comment

On Wednesday evening I attended an interesting spin on the monthly Neo4j meetup, where instead of the usual ‘talk then go to the pub afterwards’ format my colleagues Rik and Arturas organised Graph Café in the Doggetts Coat and Badge pub in Blackfriars.

The format was changed as well – the evening consisted of ~10 lightning talks spread out over about 3 hours, an approach Rik has used at similar events in Belgium and Holland earlier in the year.

In the gaps in between the talks people mingled with each other and shared tips/talked through the problems they were trying to solve using graphs.

There was a strong turnout and it was much more interactive than a normal meet-up, where the main interaction comes from people asking the speaker questions. While there’s usually a pub afterwards, there’s always a noticeable drop off in numbers after the talk, so it was good to have everyone together chatting this time.

Frank Gibson described what I thought was the coolest use of graphs of the evening. He’s modelling different drugs, which medical conditions they treat and which other drugs they aren’t compatible with. The next step is to bring that together with patients’ medical records to help doctors make treatment recommendations.

As well as the talks, tablecloths were laid out on the tables where people could sketch out the problems they were working on and get input from others. Tobias went a bit meta and drew a graph about graph databases.


While it’s often said that graphs are whiteboard friendly I was still surprised at how effective this was. When someone was explaining what they were working on I found myself sketching out what I’d interpreted and then they’d join in and point out bits I’d misunderstood and bits they were thinking about changing.

Along those lines, my colleague Alistair also demoed Arrows – a Javascript library for drawing graphs which I use for most of the graph related diagrams on here.


Overall it was a fun meet up and now we need to try and work out how to keep the interactive aspect when it isn’t 25 degrees outside and we’re not in a pub overlooking St Paul’s.

Rik has put up a slide show of pictures from the event and if you’re interested in hosting your own similar event you should probably ping Rik for some tips!

Written by Mark Needham

August 31st, 2013 at 10:52 am

Ranking Systems: What I’ve learnt so far

with 5 comments

I often go off on massive tangents reading all about a new topic, but I don’t record what I’ve read, so if I come back to the topic in the future I have to start from scratch, which is quite frustrating.

In this instance after playing around with calculating the eigenvector centrality of a sub graph I learnt that this algorithm can also be used in ranking systems.

I started off by reading a paper written by James Keener about the Perron-Frobenius Theorem and the ranking of American football teams.

The Perron-Frobenius Theorem asserts the following:

a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components

This is applicable for network based ranking systems as we can build up a matrix of teams, store a value representing their performance against each other, and then calculate an ordered ranking based on eigenvector centrality.
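As a sketch of how that works in practice (the matrix values below are made up for illustration), power iteration converges to the Perron eigenvector of a positive matrix, whose components can then be read off as a ranking:

```java
public class PowerIteration {
    // Returns the dominant (Perron) eigenvector of a square matrix with
    // positive entries, normalised so its components sum to 1
    public static double[] perronVector(double[][] m, int iterations) {
        int n = m.length;
        double[] v = new double[n];
        java.util.Arrays.fill(v, 1.0 / n);           // start from a uniform vector
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++)              // multiply: next = m * v
                for (int j = 0; j < n; j++)
                    next[i] += m[i][j] * v[j];
            double sum = 0;
            for (double x : next) sum += x;
            for (int i = 0; i < n; i++) next[i] /= sum; // renormalise each step
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Made-up 'performance against each other' matrix for 3 teams:
        // entry (i, j) is how well team i did against team j
        double[][] results = {
                {0.1, 3.0, 2.0},
                {1.0, 0.1, 2.0},
                {1.0, 1.0, 0.1}
        };
        // Team 0 did best against the others, so it gets the largest component
        System.out.println(java.util.Arrays.toString(perronVector(results, 100)));
    }
}
```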

I also came across the following articles describing different network-based approaches to ranking teams/players in tennis and basketball respectively:

Unfortunately I haven’t come across any corresponding code showing how to implement those algorithms so I need to do a bit more reading and figure out how to do it.

In the world of non network based ranking systems I came across 3 algorithms:

  • Elo – this is a method originally developed to calculate the relative skill of chess players.

    Players start out with an average rating which then increases/decreases based on games they take part in. If they beat someone much more highly ranked then they’d gain a lot of points whereas losing to someone similarly ranked wouldn’t affect their ranking too much.

    I came across a version used to rank national football teams, and the algorithm is quite well described in Christopher Allen’s article on competitive ranking systems.

  • Glicko – this method was developed as the author, Mark Glickman, detected some flaws in the Elo rating system around the reliability of players’ ratings.

    This algorithm therefore introduces the concept of a ratings deviation (RD) to measure the uncertainty in a rating. If a player plays regularly they’d have a low RD and if they don’t it’d be higher. This is then taken into account when assigning points based on games between different players.

    Rob Kohr has an implementation of this one using Javascript on his github.

  • TrueSkill – this one was developed by Microsoft Research to rank players using XBox Live. This seems similar to Glicko in that it has a rating and uncertainty for each player. TrueSkill’s FAQs suggest the following difference between the two:

    Glicko was developed as an extension of ELO and was thus naturally limited to two player matches which end in either win or loss. Glicko cannot update skill levels of players if they compete in multi-player events or even in teams. The logistic model would make it computationally expensive to deal with team and multi-player games. Moreover, chess is usually played in pre-set tournaments and thus matching the right opponents was not considered a relevant problem in Glicko. In contrast, the TrueSkill ranking system offers a way to measure the quality of a match between any set of players.

Scott Hamilton has an implementation of all these algorithms in Python which I need to play around with. He based his algorithms on a blog post written by Jeff Moser in which he explains probabilities, the Gaussian distribution, Bayesian probability and factor graphs in deciphering the TrueSkill algorithm. Moser’s created a project implementing TrueSkill in C# on github.
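Of the three, Elo is simple enough to sketch in a few lines. Assuming the conventional parameters (a 400-point scale and a K-factor of 32 – standard choices rather than anything from the articles above), a single rating update looks like this:

```java
public class Elo {
    private static final double K = 32;  // maximum rating change per game

    // Expected score (win probability) for a player rated 'rating'
    // against one rated 'opponent'
    public static double expectedScore(double rating, double opponent) {
        return 1.0 / (1.0 + Math.pow(10, (opponent - rating) / 400.0));
    }

    // score is 1 for a win, 0.5 for a draw, 0 for a loss
    public static double update(double rating, double opponent, double score) {
        return rating + K * (score - expectedScore(rating, opponent));
    }

    public static void main(String[] args) {
        // Beating a much stronger opponent gains far more points
        // than beating an equal one (K/2 = 16 for an even match)
        System.out.println(update(1500, 1900, 1) - 1500); // ~29.1
        System.out.println(update(1500, 1500, 1) - 1500); // 16.0
    }
}
```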

I follow tennis and football reasonably closely so I thought I’d do a bit of reading about the main two rankings I know about there as well:

  • UEFA club coefficients – used to rank football clubs that have taken part in a European competition over the last 5 seasons. It takes into account the importance of the match but not the strength of the opposition.
  • ATP Tennis Rankings – used to rank tennis players on a rolling basis over the last 12 months. They take into account the importance of a tournament and the round a player reached to assign ranking points.

Now that I’ve recorded all that it’s time to go and play with some of them!

Written by Mark Needham

August 24th, 2013 at 11:05 am

Products & Infinite configurability

without comments

One of the common feature requests on the ThoughtWorks projects that I worked on was that the application we were working on should be almost infinitely configurable to cover potential future use cases.

My experience of attempting to do this was that you ended up with an extremely complicated code base and those future use cases often didn’t come to fruition.

It therefore made more sense to solve the problem at hand and then make the code more configurable if/when the need arose.

Now that I’m working on a product and associated tools I’m trying to understand whether those rules of application development apply.

One thing which I think makes sense is the idea of convention over configuration, an approach that I became familiar with after working with Ruby/Rails in 2010/2011.

The phrase essentially means a developer only needs to specify unconventional aspects of the application.

Even if we do this I wonder if it goes far enough. The more things we make configurable, the more complexity we add and the more opportunity there is for people to create problems for themselves through misconfiguration.

Perhaps we should only make a few things configurable and have our application work out appropriate values for everything else.

There are a reasonable number of people using a product who don’t have much interest in learning how to configure it. They just want to use it to solve a problem they have without having to think too much.

Although I haven’t used it I’m told that Azul’s Zing JVM takes the minimal configuration approach by only requiring you to specify one parameter – the heap size – and it handles everything else for you.

Of course I’m still new to this so perhaps it still does make sense to default most things but allow power users full control in case their use case differs from the average one that the defaults were created for.

I’d be interested in hearing the opinions of people more experienced in this arena, of which there are undoubtedly many.

Written by Mark Needham

August 22nd, 2013 at 10:11 pm