Mark Needham

Thoughts on Software Development

Data visualisation: Is ‘interesting’ enough?

with 6 comments

I recently read a blog post by Julian Boot titled ‘visualisation without analysis is fine’ where he suggests that we can learn things from visualising data in the right way – detailed statistical analysis isn’t always necessary.

I thought this was quite an interesting observation because over the past couple of months I’ve been playing around with ThoughtWorks data and looking at different ways to visualise aspects of the data.

For example the following visualisation shows the strength of colleague relationships between the various ThoughtWorks offices:

Map 2

We can learn some interesting things from looking at it such as:

  • There’s a very strong connection between Bangalore and London
  • There’s also a very strong connection between Porto Alegre and Delhi/Dallas
  • If we look a bit closer we can see the connections from China aren’t as strong – there’s a reasonable link to San Francisco and weaker ones to Australia but not so much to other countries.

If we know more about the domain then we’d know that there are some distributed projects going on between Porto Alegre and the other two places and there are currently quite a lot of people from Bangalore working in London.
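
For anyone curious how a ‘strength’ between offices might be worked out, here is a minimal sketch. It is not the code behind the map above: the `colleague_pairs` data and the ‘count the cross-office pairs’ measure are both assumptions invented purely for illustration.

```python
from collections import Counter

# Hypothetical data: each tuple is one colleague relationship, recorded as
# the offices of the two people involved. Invented for illustration only.
colleague_pairs = [
    ("Bangalore", "London"),
    ("London", "Bangalore"),
    ("Bangalore", "London"),
    ("Porto Alegre", "Dallas"),
    ("Porto Alegre", "Delhi"),
    ("Sydney", "San Francisco"),
    ("London", "London"),  # same-office pair, ignored below
]


def office_connection_strengths(pairs):
    """Count relationships per unordered pair of distinct offices."""
    strengths = Counter()
    for office_a, office_b in pairs:
        if office_a == office_b:
            continue  # only cross-office links are of interest here
        # Sort so (London, Bangalore) and (Bangalore, London) share a key.
        key = tuple(sorted((office_a, office_b)))
        strengths[key] += 1
    return strengths


strengths = office_connection_strengths(colleague_pairs)
for (office_a, office_b), count in strengths.most_common():
    print(f"{office_a} <-> {office_b}: {count}")
```

Counts like these could then drive something like the thickness of each line on a map, although a real measure would probably need to account for office size as well.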

Although these observations are quite interesting I’m not sure if they are anything more than that.

In my opinion the intent of visualisations is to provoke some sort of action by helping people see something in the data which they couldn’t see before.

In Julian’s post he refers to a visualisation showing the colours used in movie posters and how there’s a trend towards a 50/50 split between orange and blue.

It’s really interesting to look at but, as Julian points out, it’s well known that movie posters make heavy use of orange and blue, and presumably at some stage someone has already worked out that these are the two most effective colours.

In my example most people I talked to were able to predict where the strongest connections would be before I showed them the visualisation.

My current thinking is that if a visualisation is only ‘interesting’ then perhaps I haven’t played around with the data enough to find some insight that would actually lead to an outcome/improvement of some sort.

I’m definitely interested in hearing others’ opinions/experiences in this area though!

Written by Mark Needham

July 8th, 2012 at 10:45 pm

  • Peter Gillard-Moss

    We often fall into the trap of believing that proving through data what we can infer using heuristics has low value, or what people often refer to as proving the bleedin’ obvious.
    However, there are a number of problems with this line of thought. Firstly, we may have guessed but we did not know with any objective degree of certainty. You say ‘most’ people guessed right, but how many? 55%, 60%, 90%? Well, now 100% have access to the correct data. Lastly, but perhaps most importantly, the process is repeatable: if those relationships shift in 6 months’ time, there is now the ability to re-establish, in reliable form, what the new relationships are, rather than relying on someone’s guesswork, which may be biased around old ‘data’.

  • Anonymous

    There’s a fair amount you can do to move a visualisation past the point of interesting but it does require deepening the dataset you have. Edward Tufte is pretty big on providing more information in a manner that is not only data-rich but accessible.

    With what you’ve got, you could try isolating specific subsets of data. For instance, the relationships that the Sydney office has tell us a lot about the interactions that they have. Whilst we can see that Sydney has a whole range of connections, they’re pretty hard to isolate on this visualisation.

    But if you can gain extra data (and depending on the data available), you could differentiate between project and social relationships, frequency of interaction (knowing that frequent communication is a stronger indicator of the strength of the relationship than the number of relationships), movements (eg. people who are originally from a US office but now based in Porto Alegre would still maintain strong relationships with their base office), size of an office (perhaps as a bubble) and then measure over time.

    A visualisation could then tell you about the way that we build relationships (socially vs project based), how we keep those types of relationships (do social bonds hold stronger than project bonds over time?) and how transitions affect our interaction with our colleagues. That’s valuable information for managing and understanding us.

    I guess the point at which a visualisation moves from interesting to valuable is either where you can discover non-obvious relationships (as you say, most people were able to predict the strongest connections) or where it provides actionable data (the visualisation as it stands tells us that China has weaker bonds but doesn’t give us much context to do something about it – do we push for more distributed projects involving China, or do we need to boost the social links in China and externally? Would sending people from other offices to China help or hinder their social interactions?).

    Cheers
    Tim

  • http://www.markhneedham.com/blog Mark Needham

    @Peter Gillard-Moss yeah, fair enough, I hadn’t thought of it that way. And my sample size for ‘most’ people is obviously pretty small, so maybe I just picked a few people who were interested in/observant about this type of thing.

    Repeating it in 6 months’ time sounds like a neat idea. It will be interesting to see if there are any obvious differences.

  • http://www.markhneedham.com/blog Mark Needham

    @menceau awesome, lots of ideas for me to have a look at. I’ve ordered Tufte’s The Visual Display of Quantitative Information – will that cover the stuff that you mentioned about data rich/accessible?

    I like the idea about zooming in on specific offices. I was thinking of doing some sort of chord diagram showing the strength of the connections between that office and the others, e.g. http://mbostock.github.com/d3/ex/chord.html

    The ‘over time’ thing is cool – it would be cool if the ThoughtWorks data was available as an event stream type of thing but at the moment I think I’d have to sample the data store at different times to create that effect. It would be particularly cool to get the data about people transferring/assigning between offices over time to look at the migration patterns.

  • http://twitter.com/julianboot Julian Boot

    Mark, Tufte’s book has a wealth of examples. It’s not a “how to” book, more of an anthology of great visual communication. There are many examples of very data-rich displays. One of the things I liked the most about Vijay’s work was how few pixels were used per poster. Very dense in terms of information, but easy to convey as he was looking at colour. The John Snow map of London is in one of Tufte’s books (not sure which one right now).

    Enjoy!

    -julian boot

  • Bipolardisorder

    I’m enjoying your blog. Excellent posts.