
Neo4j 2.0.0-M06 -> 2.0.0-RC1: Optional relationships with OPTIONAL MATCH

One of the breaking changes in Neo4j 2.0.0-RC1 compared to previous versions is that the -[?]-> syntax for matching optional relationships has been retired and replaced with the OPTIONAL MATCH construct (http://docs.neo4j.org/chunked/milestone/query-optional-match.html).

One case where we might want to match an optional relationship is finding colleagues that we haven’t worked with, given the following model:

[Image: diagram of the data model]

Suppose we have the following data set:

CREATE (steve:Person {name: "Steve"})
CREATE (john:Person {name: "John"})
CREATE (david:Person {name: "David"})
CREATE (paul:Person {name: "Paul"})
CREATE (sam:Person {name: "Sam"})

CREATE (londonOffice:Office {name: "London Office"})

CREATE UNIQUE (steve)-[:WORKS_IN]->(londonOffice)
CREATE UNIQUE (john)-[:WORKS_IN]->(londonOffice)
CREATE UNIQUE (david)-[:WORKS_IN]->(londonOffice)
CREATE UNIQUE (paul)-[:WORKS_IN]->(londonOffice)
CREATE UNIQUE (sam)-[:WORKS_IN]->(londonOffice)

CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(john)
CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(david)

In versions before 2.0.0-RC1 we might write the following query to find people who work in the same office as Steve but whom he hasn’t worked with:

MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person)
WHERE c IS null
RETURN potentialColleague
==> +----------------------+
==> | potentialColleague   |
==> +----------------------+
==> | Node[4]{name:"Paul"} |
==> | Node[5]{name:"Sam"}  |
==> +----------------------+

We first find which office Steve works in and find the people who also work in that office. Then we optionally match the 'COLLEAGUES_WITH' relationship and only return people who Steve doesn’t have that relationship with.

If we run that query in 2.0.0-RC1 we get this exception:

==> SyntaxException: Question mark is no longer used for optional patterns - use OPTIONAL MATCH instead (line 1, column 199)
==> "MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague"
==>

Based on that advice we might translate our query to read like this:

MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person)
WHERE c IS null
RETURN potentialColleague

If we run that we get back more people than we’d expect:

==> +------------------------+
==> | potentialColleague     |
==> +------------------------+
==> | Node[15]{name:"John"}  |
==> | Node[14]{name:"David"} |
==> | Node[13]{name:"Paul"}  |
==> | Node[12]{name:"Sam"}   |
==> +------------------------+

This query doesn’t work as we’d expect because the WHERE clause immediately following OPTIONAL MATCH is evaluated as part of the optional pattern, rather than as a filter on the resulting rows as we’ve become used to.

The OPTIONAL MATCH part of the query tries to match a 'COLLEAGUES_WITH' relationship that is also null, something of a contradiction! No relationship can satisfy that, but since the match is optional a row is still returned for each potential colleague, with 'c' set to null.

If we include 'c' in the RETURN part of the query we can see that this is the case:

MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person)
WHERE c IS null
RETURN potentialColleague, c
==> +---------------------------------+
==> | potentialColleague     | c      |
==> +---------------------------------+
==> | Node[15]{name:"John"}  | <null> |
==> | Node[14]{name:"David"} | <null> |
==> | Node[13]{name:"Paul"}  | <null> |
==> | Node[12]{name:"Sam"}   | <null> |
==> +---------------------------------+
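
To see that the WHERE really does belong to the OPTIONAL MATCH, we can give it a predicate that only some rows satisfy. This is just a sketch that reuses the data set above. It is an extra illustration rather than one of the queries from the original investigation:

```cypher
MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
// The WHERE below restricts what the optional pattern may match;
// it does not remove rows from the result.
OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person)
WHERE potentialColleague.name = "John"
RETURN potentialColleague, c
```

We’d expect all four potential colleagues to come back, with 'c' bound only on John’s row and null everywhere else, including David’s row even though he does have the relationship.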

If we take out the WHERE part of the OPTIONAL MATCH the query is a bit closer to what we want:

MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person)
RETURN potentialColleague, c
==> +-----------------------------------------------+
==> | potentialColleague    | c                     |
==> +-----------------------------------------------+
==> | Node[2]{name:"John"}  | :COLLEAGUES_WITH[5]{} |
==> | Node[3]{name:"David"} | :COLLEAGUES_WITH[6]{} |
==> | Node[4]{name:"Paul"}  | <null>                |
==> | Node[5]{name:"Sam"}   | <null>                |
==> +-----------------------------------------------+

If we introduce a WITH after the OPTIONAL MATCH, the WHERE attaches to the WITH instead and filters the rows, so we can remove the people that Steve has already worked with:

MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
WITH person, potentialColleague
OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person)
WITH potentialColleague, c
WHERE c IS null
RETURN potentialColleague

If we evaluate that query it returns the same output as our original query:

==> +----------------------+
==> | potentialColleague   |
==> +----------------------+
==> | Node[4]{name:"Paul"} |
==> | Node[5]{name:"Sam"}  |
==> +----------------------+
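
For this particular question we may not need an optional match at all. One alternative, offered as a sketch and not tried against 2.0.0-RC1 here, is to put a negated pattern predicate in the WHERE clause of the first MATCH:

```cypher
MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
WHERE person.name = "Steve" AND office.name = "London Office"
// Keep only the people who have no COLLEAGUES_WITH relationship
// with Steve, in either direction.
AND NOT (potentialColleague)-[:COLLEAGUES_WITH]-(person)
RETURN potentialColleague
```

This should return the same people as the query above. OPTIONAL MATCH is still the better fit when we want to return the relationship itself where it exists.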