neo4j: Embracing the sub graph
In May I wrote a blog post explaining how I’d been designing a neo4j graph by thinking about what questions I wanted to answer about the data.
In the comments Josh Adell gave me the following advice:
The neat things about graphs is that multiple subgraphs can live in the same data-space.
Keep your data model rich! Don’t be afraid to have as many relationships as you need. The power of graph databases comes from finding surprising results when you have strongly interconnected data.
At the time I didn’t really understand the advice, but I’ve since updated my graph so that it includes 'colleagues' relationships, which can be derived by looking at the projects that people have worked on together.
When I was showing the graph to Marc he thought it would be quite interesting to see all the people that he hadn’t worked with before, a query that’s very easy to write now that the project and colleagues sub graphs live in the same graph.
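To make that derivation concrete, here is a minimal in-memory sketch in plain Python (not neo4j or Cypher) of how 'colleagues' pairs could be derived from 'worked_on' data. The sample names and the `derive_colleagues` helper are made up for illustration, and the sketch assumes that sharing a project is enough to count as colleagues, ignoring whether people's time on the project overlapped.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (person, project) pairs standing in for
# 'worked_on' relationships. None of these names come from the real graph.
worked_on = [
    ("Marc", "Project X"),
    ("Alice", "Project X"),
    ("Bob", "Project Y"),
    ("Alice", "Project Y"),
]


def derive_colleagues(worked_on):
    """Return the set of unordered person pairs that share at least one project."""
    # Group people by the project they worked on.
    people_by_project = defaultdict(set)
    for person, project in worked_on:
        people_by_project[project].add(person)

    # Every pair of people on the same project becomes a 'colleagues' pair.
    colleagues = set()
    for people in people_by_project.values():
        for a, b in combinations(sorted(people), 2):
            colleagues.add(frozenset((a, b)))
    return colleagues


colleagues = derive_colleagues(worked_on)
```

Storing each pair as a `frozenset` treats the relationship as undirected, which is an assumption of this sketch rather than something the graph itself requires.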
We could write the following cypher query to find who Marc hadn’t worked with on a specific project:
START project=node:projects(name="Project X"), me=node:people(name="Marc Hofer")
MATCH project<-[:worked_on]-neverColleague-[c?:colleagues]->me
WHERE c IS NULL
RETURN neverColleague
We do an index lookup to find the appropriate project node and to find the node that represents Marc.
The MATCH clause starts from the project and then works backwards to the people who have worked on it. We then follow an optional 'colleagues' relationship back to Marc.
The WHERE clause makes sure that we only return people that Marc doesn’t have a 'colleagues' relationship with, i.e. people Marc hasn’t worked with.
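The optional relationship plus the NULL check is effectively an anti-join. The following is a rough sketch of the same logic in plain Python over in-memory data, not a translation that neo4j would run. The sample people and the `never_colleagues` helper are hypothetical, 'colleagues' is treated as undirected, and the sketch skips Marc himself, which the query does not do explicitly.

```python
# Hypothetical sample data standing in for the two sub graphs.
worked_on = {"Project X": {"Alice", "Bob"}}
colleagues = {frozenset(("Marc Hofer", "Alice"))}


def never_colleagues(project, me):
    """Return people on the project who have no 'colleagues' link to `me`."""
    result = []
    for person in sorted(worked_on.get(project, set())):
        if person == me:
            # Assumption of this sketch: never report the person themselves.
            continue
        pair = frozenset((person, me))
        # Look up the optional relationship; None plays the role of NULL.
        c = pair if pair in colleagues else None
        if c is None:
            result.append(person)
    return result


people = never_colleagues("Project X", "Marc Hofer")
```

With this sample data Alice is filtered out because she already has a 'colleagues' link to Marc, leaving only Bob.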
Along the same lines we could also find out all the people that Marc has worked with from a specific office:
START office=node:offices(name="London - UK South"), me=node:people(name="Marc Hofer")
MATCH office<-[r:member_of]-colleague-[c:colleagues]->me
WHERE NOT(HAS(r.end_date))
RETURN colleague, c
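This is reasonably similar to the previous query except the 'colleagues' relationship is no longer optional since we only want to find people that Marc has worked with. The WHERE clause keeps only 'member_of' relationships that have no end_date, which here is taken to mean people who are still members of that office.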
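As a rough illustration of what this second query selects, here is a small in-memory Python sketch. The sample data, the date value, and the `colleagues_in_office` helper are invented for the example, and a missing end_date is modelled as `None`.

```python
# Hypothetical 'member_of' data: (person, office, end_date).
# An end_date of None stands in for a relationship without that property.
member_of = [
    ("Alice", "London - UK South", None),
    ("Bob", "London - UK South", "2012-03-31"),
    ("Carol", "London - UK South", None),
]

# Hypothetical 'colleagues' pairs, treated as undirected in this sketch.
colleagues = {
    frozenset(("Marc Hofer", "Alice")),
    frozenset(("Marc Hofer", "Bob")),
}


def colleagues_in_office(office, me):
    """Return current members of the office who are colleagues of `me`."""
    return sorted(
        person
        for person, member_office, end_date in member_of
        if member_office == office
        and end_date is None  # mirrors NOT(HAS(r.end_date))
        and frozenset((person, me)) in colleagues  # required, not optional
    )


current = colleagues_in_office("London - UK South", "Marc Hofer")
```

Bob is excluded because his membership has an end_date, and Carol is excluded because she has no 'colleagues' link to Marc.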
I’m sure there are other queries we could run but these were a couple that I hadn’t thought of before I had multiple sub graphs together on the same overall graph.
About the author
I'm currently working on real-time user-facing analytics with Apache Pinot at StarTree. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.