Mark Needham

Thoughts on Software Development

neo4j: What question do you want to answer?

with 12 comments

Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j, and I’ve realised that my approach is to consider what question I want to answer and then build a graph to answer it.

When I first started doing this, the main question I wanted to answer was ‘how connected are people to each other?’, which led me to model the data like this:

[Figure: Initial]

The ‘colleagues with’ relationship stored information about the project the two people had worked on together and how long they’d worked together.
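As a rough illustration of this first model, here is a minimal Python sketch. It assumes made-up people and projects and uses plain in-memory structures rather than neo4j; the names `COLLEAGUES_WITH`, `build_graph` and `degrees_of_separation` are hypothetical. Each ‘colleagues with’ edge carries the project and the time worked together, and the ‘how connected’ question becomes a breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical sample data, not real project data:
# (person, person, project worked on together, months worked together)
COLLEAGUES_WITH = [
    ("Alice", "Bob", "Project A", 6),
    ("Bob", "Carol", "Project B", 3),
    ("Carol", "Dave", "Project C", 12),
]


def build_graph(edges):
    """Build an undirected adjacency map from 'colleagues with' edges."""
    graph = defaultdict(set)
    for a, b, _project, _months in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of hops between two people, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        # .get avoids adding unknown people to the defaultdict
        for other in graph.get(person, ()):
            if other == goal:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None
```

Note that the project only exists as a property on the edge here, which is why questions about projects or clients are awkward to ask of this shape.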

This design was fine while that was the only question I wanted to answer, but after I showed it to a few people it became clear that there were other questions we could ask which would be difficult to answer with this design.

e.g.

  • Which people on project X have I never worked with?
  • Which person has worked for client X for the longest?
  • Which people worked for the same client, even if not on the same project?

I therefore need to make ‘client’ and ‘project’ first-class entities in the graph rather than leaving them implicit, which favours a design more along these lines:

[Figure: Initial]
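With clients and projects as nodes of their own, each of the three questions listed earlier becomes a short traversal. What follows is a minimal Python sketch, assuming made-up people, projects and clients, with simple in-memory lists standing in for neo4j. All names (`WORKED_ON`, `PROJECT_CLIENT` and the functions) are hypothetical.

```python
# Hypothetical sample data standing in for the graph; not real project data.
# Each tuple is a person -> project edge carrying the months worked.
WORKED_ON = [
    ("Alice", "Project A", 6),
    ("Bob", "Project A", 4),
    ("Carol", "Project B", 12),
    ("Dave", "Project C", 9),
    ("Alice", "Project C", 2),
]
# Each entry is a project -> client edge.
PROJECT_CLIENT = {
    "Project A": "Client X",
    "Project B": "Client X",
    "Project C": "Client Y",
}


def people_on(project):
    """Everyone with an edge to the given project."""
    return {person for person, proj, _ in WORKED_ON if proj == project}


def projects_of(person, client=None):
    """Projects a person worked on, optionally limited to one client."""
    return {
        proj
        for p, proj, _ in WORKED_ON
        if p == person and (client is None or PROJECT_CLIENT[proj] == client)
    }


def never_worked_with(me, project):
    """People on `project` who share no project at all with `me`."""
    mine = projects_of(me)
    return {p for p in people_on(project) if p != me and not (projects_of(p) & mine)}


def longest_serving(client):
    """Person with the most total months across the client's projects."""
    totals = {}
    for person, project, months in WORKED_ON:
        if PROJECT_CLIENT[project] == client:
            totals[person] = totals.get(person, 0) + months
    return max(totals, key=totals.get) if totals else None


def same_client_not_same_project(client):
    """Pairs who both worked for `client` but on none of the same projects there."""
    people = sorted(
        person for person in {p for p, _, _ in WORKED_ON} if projects_of(person, client)
    )
    return {
        (a, b)
        for i, a in enumerate(people)
        for b in people[i + 1:]
        if not (projects_of(a, client) & projects_of(b, client))
    }
```

The sketch treats sharing a project as having worked together; real data would probably also need to check that the dates overlap.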

It makes it a little more difficult to answer the initial question about connections between people, but it makes it possible to answer other questions such as the ones detailed above.
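The extra difficulty is that a connection between two people is no longer a single edge: it has to be derived by going from a person to a project and back out to another person. Here is a minimal Python sketch of that two-hop derivation, assuming made-up data and hypothetical names (`MEMBERSHIPS`, `colleague_pairs`):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical person -> project edges; not real project data.
MEMBERSHIPS = [
    ("Alice", "Project A"),
    ("Bob", "Project A"),
    ("Bob", "Project B"),
    ("Carol", "Project B"),
]


def colleague_pairs(memberships):
    """Map each pair of people to the set of projects they have in common."""
    members = defaultdict(set)
    for person, project in memberships:
        members[project].add(person)

    pairs = defaultdict(set)
    for project, people in members.items():
        # Every two people on the same project are treated as colleagues.
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)].add(project)
    return dict(pairs)
```

The result has the same shape as the ‘colleagues with’ edges of the first model, so the original connectedness question can still be asked once the pairs have been derived.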

I’m still getting used to this way of modelling data, but it feels like you’re driven towards designing your data in a way that’s useful to you, as opposed to the relational approach, where you tend to model relations first and then work out what you want to do with the data.

Written by Mark Needham

May 5th, 2012 at 1:20 pm

Posted in neo4j

  • http://www.facebook.com/people/Dave-Cameron/570906207 Dave Cameron

    I like that it is a “different way of thinking.” No doubt you could also model “Client” as a table and “Project” and “Person” as other tables, with relationships between them, but it sounds like maybe the graph approach helped you discover these relationships?

    Is it possible to “traceback” through the edges to project and client, to infer that original edge directly between people?

    In your final model, are there multiple nodes for each person, one for each project they were on, or is the same node hit by multiple edges coming from the different projects?

  • http://www.markhneedham.com/blog Mark Needham

    @Dave Cameron yeh you’re right, I’m sure it could be modelled the same way in tables, but the graph does seem to drive you to some interesting discoveries.

    It is possible to find connections between people by following edges from them to the project and then from the project to the other person, is that what you meant?

    I’m still working on the final model but the idea is there will only be one node per person and then multiple edges going to the project.

    In the initial model there are multiple edges between people – one for each project they worked on together.

  • Devdas B

    This works well when you know what questions you want to ask. It fails when you need to change the questions later.

  • http://andypalmer.com Andy Palmer

    Your original question is fairly easy to solve with Cypher (I’m going to make some assumptions about your edges)
    start p2=node(1) 
    match workedOnSameProject = 
    (p2)-[:WORKED_ON]->(project)<-[:WORKED_ON]-(other)
    return p2.name,project.name,other.name

    If you were using type nodes, you could map this for the entire company
    start person_root=node(1)
    match workedOnSameProject =
    (person_root)<--(p1)-[:WORKED_ON]->(project)<-[:WORKED_ON]-(p2)-->(person_root)
    return p1.name,project.name,p2.name

    You could also extend this, using a multiple path query, to match people who’d worked on different projects at the same client

  • http://joshadell.com Josh Adell

    The neat thing about graphs is that multiple subgraphs can live in the same data-space. There’s no reason not to keep the original Person-colleague->Person relationships in addition to the Client-->Project-->Person relationships. You can query only the relationships you want and ignore the ones that are irrelevant for any given question. Nodes can have as many relationships of as many different types as you need.

    If you haven’t already, check out Cypher for some of the neat queries you can do with multiple relationships. For instance, assuming you kept the colleagues relationships, the answer to “Which people on project X have I never worked with?” can be found with:

    START project=node(X), me=node(Y)
    MATCH project<-[:worked_on]-neverColleague-[c?:colleague]-me
    WHERE c IS NULL
    RETURN neverColleague

    X and Y are node ids, or index look-ups. What this query is looking for is any person with a “worked_on” relationship to project X where there is not a corresponding “colleague” relationship with you.

    Keep your data model rich! Don’t be afraid to have as many relationships as you need. The power of graph databases comes from finding surprising results when you have strongly interconnected data.

  • http://www.facebook.com/people/Dave-Cameron/570906207 Dave Cameron

    “It is possible to find connections between people by following edges from them to the project and then from the project to the other person, is that what you meant?”

    Yes, that was what I meant. I take it that is possible. So a “through” relationship actually goes through the node.

  • http://www.markhneedham.com/blog Mark Needham

    @Devdas B yeh that’s what I’m finding now that I’ve changed the question I want to ask…I have to remodel the whole graph as far as I can tell.

    Of course I’m new to this so maybe I shouldn’t be rebuilding the whole thing…

  • http://www.markhneedham.com/blog Mark Needham

    @jadell didn’t think about the subgraph idea but that sounds pretty neat. I guess I initially ruled that out because it seemed like it would be creating some duplication in the graph – the normal form lectures from university DB courses coming back to haunt me!

  • http://andypalmer.com Andy Palmer

    You should be able to transform the database from one style to another using a traverser. The graph database equivalent of a migration :-)

  • http://www.markhneedham.com/blog Mark Needham

    @andypalmer will have to check that out..but the initial model doesn’t contain as much information as the new one.

    e.g. I’m adding in the concept of people belonging to offices and projects having an account manager which isn’t anywhere in the initial graph. 

    I guess for all the other stuff I can transform it though?

  • http://andypalmer.com Andy Palmer

    Anything that’s new is straightforward enough. You might have to compose a nifty query to find the people you want to attach to an office, but growing the graph would be unlikely to cause any significant issues.
    And changing the structure of the database should usually be possible by traversal and transformation.

    Graphs are the future ;-)

  • Balaji

    i kinda think that we would have ended up with a similar relationship graph if we normalized our tables in a relational db…