Mark Needham

Thoughts on Software Development

neo4j: Embracing the sub graph

with 3 comments

In May I wrote a blog post explaining how I’d been designing a neo4j graph by thinking about what questions I wanted to answer about the data.

In the comments Josh Adell gave me the following advice:

The neat thing about graphs is that multiple subgraphs can live in the same data-space.

Keep your data model rich! Don’t be afraid to have as many relationships as you need. The power of graph databases comes from finding surprising results when you have strongly interconnected data.

At the time I didn’t really understand the advice but I’ve since updated my graph so that it includes ‘colleagues’ relationships which can be derived by looking at the projects that people had worked together on.
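As a rough illustration of how a 'colleagues' relationship can be derived from 'worked_on' relationships, here is a minimal in-memory sketch in Python rather than in neo4j itself. The names and the `derive_colleagues` function are hypothetical, and the sketch ignores dates, so two people who were on the same project at different times would still be linked.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical worked_on relationships as (person, project) pairs.
worked_on = [
    ("Marc", "Project X"),
    ("Anna", "Project X"),
    ("Ben", "Project Y"),
    ("Anna", "Project Y"),
]


def derive_colleagues(worked_on):
    """Return the set of unordered person pairs that share at least one project."""
    people_by_project = defaultdict(set)
    for person, project in worked_on:
        people_by_project[project].add(person)

    colleagues = set()
    for people in people_by_project.values():
        # Sorting keeps each pair in a stable (a, b) order so it is stored once.
        for a, b in combinations(sorted(people), 2):
            colleagues.add((a, b))
    return colleagues


print(sorted(derive_colleagues(worked_on)))
# [('Anna', 'Ben'), ('Anna', 'Marc')]
```

In a graph each pair would become one 'colleagues' relationship, which is the second sub graph living alongside the project one.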


When I was showing the graph to Marc he thought it would be quite interesting to see all the people that he hadn’t worked with before, a query that’s very easy to write when we have these two sub graphs.

We could write the following cypher query to find who Marc hadn’t worked with on a specific project:

START project=node:projects(name="Project X"), me=node:people(name="Marc Hofer")
MATCH project<-[:worked_on]-neverColleague-[c?:colleagues]->me
WHERE c is NULL
RETURN neverColleague

We do an index lookup to find the appropriate project node and to find the node that represents Marc.

The MATCH clause starts from the project and then works backwards to the people who have worked on it. We then follow an optional colleagues relationship back to Marc.

The WHERE clause makes sure that we only return people that Marc doesn’t have a ‘colleagues’ relationship with i.e. people Marc hasn’t worked with.
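Following an optional relationship and then keeping only the rows where it is missing amounts to a set difference: everyone on the project, minus everyone Marc has a 'colleagues' relationship with. A minimal Python sketch of that idea, using made-up data and a hypothetical `never_colleagues` function (this models the logic only, it is not how neo4j evaluates the query):

```python
# Hypothetical data standing in for the two sub graphs.
worked_on = {
    "Project X": {"Anna", "Ben", "Carla"},
}
colleagues = {
    "Marc Hofer": {"Anna"},
}


def never_colleagues(project, me, worked_on, colleagues):
    """People who worked on `project` and have no colleagues link to `me`."""
    on_project = worked_on.get(project, set())
    known = colleagues.get(me, set())
    # Dropping `me` is a choice made for this sketch, in case `me` is on the project.
    return on_project - known - {me}


print(sorted(never_colleagues("Project X", "Marc Hofer", worked_on, colleagues)))
# ['Ben', 'Carla']
```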

Along the same lines we could also find out all the people that Marc has worked with from a specific office:

START office=node:offices(name="London - UK South"), me=node:people(name="Marc Hofer")
MATCH office<-[r:member_of]-colleague-[c:colleagues]->me
WHERE (NOT(HAS(r.end_date)))
RETURN colleague, c

This is reasonably similar to the previous query except the colleagues relationship is no longer optional since we only want to find people that Marc has worked with.
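Seen the same way, making the relationship mandatory turns the set difference into an intersection, and the check on end_date keeps only current office members. A small Python sketch under those assumptions, with hypothetical data in which a missing end_date is modelled as `None`:

```python
# Hypothetical member_of relationships: (person, office, end_date or None).
member_of = [
    ("Anna", "London - UK South", None),
    ("Ben", "London - UK South", "2011-06-30"),  # has left the office
    ("Carla", "Chicago", None),
]
colleagues_of_marc = {"Anna", "Ben", "Carla"}


def current_colleagues_in_office(office, member_of, colleagues):
    """Colleagues whose membership of `office` has no end date."""
    current = {
        person
        for person, person_office, end_date in member_of
        if person_office == office and end_date is None
    }
    return current & colleagues


print(current_colleagues_in_office("London - UK South", member_of, colleagues_of_marc))
# {'Anna'}
```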

I’m sure there are other queries we could run but these were a couple that I hadn’t thought of before I had multiple sub graphs together on the same overall graph.

Written by Mark Needham

July 21st, 2012 at 10:46 pm

Posted in neo4j


  • John Hume

    What would it look like to try to answer these questions without the colleagues relationship? It seems to me your instinct to eliminate that relationship when you made the model richer was correct, since it’s redundant, and maintaining the same fact in multiple places makes the model harder to update. (Every time someone joins a project, in addition to the one worked_on, you have to create colleague relationships with everyone else on the team.)

    It seems like queries about working together would be pretty painful due to the date overlap logic, so maybe the redundant relationship is more practical, but it’s worth considering the cost.

  • Mark Needham

    Yeh, the fact that the data is redundant is what made me shy away from it in the first place actually. In this case I think it’s probably ok to have made the extra relationship because of the dates, but like you say it is hard to update.
    Since I’m just playing around with the data I haven’t bothered updating but I guess you’d need some sort of scheduled job running against the graph to make sure it was up to date if it was a prod app.

    Without the colleagues relationship you’d need to traverse to each of a person’s projects and then go out from the project to find colleagues. Something like this I guess:

    START project=node:projects(name="Project X"), me=node:people(name="Marc Hofer")
    MATCH project<-[:worked_on]-neverColleague-[?:worked_on]->otherProject<-[c?:worked_on]-me
    WHERE c is NULL
    RETURN neverColleague

    I’m not sure how that would work if you picked a project that you had worked on – don’t know if it would pick up that project on the second ‘worked_on’ relationship in the query.

  • China Consulting

    Very interesting and impressive. Relationships are more important in China and other Asian countries than in Western countries; guanxi (the Chinese word for relationships) has even been recorded as an entry in English dictionaries.
    There is a big difference between Asian and Western people: Westerners are more direct while Easterners are more indirect, which often confuses Westerners when the two communicate with each other.