Mark Needham

Thoughts on Software Development

Neo4j: Cypher – avoid duplicate calls to NOT patterns

with 2 comments

I’ve been reacquainting myself with the meetup.com dataset ahead of Wednesday’s meetup in London and wanted to write a collaborative filtering type query to work out which groups people in my groups were in.

This started simple enough:

MATCH (member:Member {name: "Mark Needham"})-[:MEMBER_OF]->(group:Group)<-[:MEMBER_OF]-(other:Member)-[:MEMBER_OF]->(otherGroup:Group)
RETURN otherGroup, COUNT(*) AS commonMembers
ORDER BY commonMembers DESC
LIMIT 5

And doesn’t take too long to run:

Cypher version: CYPHER 2.3, planner: COST. 1084378 total db hits in 1103 ms.

However, it was showing up several groups that I’m already a member of so I added in a “WHERE NOT” clause to sort that out:

MATCH (member:Member {name: "Mark Needham"})-[:MEMBER_OF]->(group:Group)<-[:MEMBER_OF]-(other:Member)-[:MEMBER_OF]->(otherGroup:Group)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup, COUNT(*) AS commonMembers
ORDER BY commonMembers DESC
LIMIT 5

Unfortunately when I ran this the amount of db hits increased by 14x and it now took 3x as long to run:

Cypher version: CYPHER 2.3, planner: COST. 14061442 total db hits in 3364 ms.


The problem is that we’re making lots of duplicate calls to NOT (member)-[:MEMBER_OF]->(otherGroup) because each group shows up lots of times.

This is the ‘reduce cardinality of work in progress’ tip from Michael Hunger’s blog post:

Bonus Query Tuning Tip: Reduce Cardinality of Work in Progress

When following longer paths, you’ll encounter duplicates. If you’re not interested in all the possible paths – but just distinct information from stages of the path – make sure that you eagerly eliminate duplicates, so that later matches don’t have to be executed many multiple times.

We can reduce the WIP in our query by doing the counting of common members first and then filtering out the groups we’re already a member of:

MATCH (member:Member {name: "Mark Needham"})-[:MEMBER_OF]->(group:Group)<-[:MEMBER_OF]-(other:Member)-[:MEMBER_OF]->(otherGroup:Group)
WITH otherGroup, member, COUNT(*) AS commonMembers
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup, commonMembers
ORDER BY commonMembers DESC
LIMIT 5

This gets us back down to something closer to the running time/db hits of our initial query:

Cypher version: CYPHER 2.3, planner: COST. 1097114 total db hits in 1004 ms.
Be Sociable, Share!

Written by Mark Needham

January 17th, 2016 at 12:19 pm

Posted in neo4j

Tagged with

  • ikwattro

    Thanks for the blog post, I hope this will be solved later when TriadicSelection will work on longer patterns.

  • @ikwattro:disqus yes I was wondering why it wasn’t picking up the TriadicSelection actually but when I sketched it out I realised I had a path of length 4 instead of 3. Damn it!