Mark Needham

Thoughts on Software Development

neo4j/cypher: Converting queries from 1.9 to 2.0 – ‘Can’t use optional patterns without explicit START clause’

without comments

I’ve been playing around with the most recent Neo4j 2.0 milestone release – 2.0.0-M05 – and one of the first things I did was translate the queries from my football data set which were written against Neo4j 1.9.

The following query calculates the number of goals scored by players in matches that were shown on television, not on television and in total.

START player=node:players('name:*')
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

We use a legacy index to get all of the players, find the games they participated in, check if those games were on television and then group by player to work out whether the goals were scored in televised games.

When we evaluate that query we get the following result:

==> +--------------------------------------------------------+
==> | player.name        | nonTvGoals | tvGoals | totalGoals |
==> +--------------------------------------------------------+
==> | "Robin Van Persie" | 11         | 15      | 26         |
==> | "Gareth Bale"      | 8          | 13      | 21         |
==> | "Luis Suárez"      | 12         | 11      | 23         |
==> | "Theo Walcott"     | 5          | 9       | 14         |
==> | "Demba Ba"         | 7          | 8       | 15         |
==> | "Edin Dzeko"       | 7          | 7       | 14         |
==> | "Santi Cazorla"    | 5          | 7       | 12         |
==> | "Juan Mata"        | 6          | 6       | 12         |
==> | "Steven Gerrard"   | 3          | 6       | 9          |
==> | "Carlos Tevez"     | 5          | 6       | 11         |
==> +--------------------------------------------------------+
==> 10 rows

The first step was to get rid of the legacy index and replace it with a label based one. I created a label ‘Player’ for this purpose:

START player = node:players('name:*') SET player :Player RETURN player
CREATE INDEX on :Player(name)

Then I got rid of the legacy index lookup and replaced it with a label based one:

MATCH (player:Player)-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

Unfortunately you can’t use optional relationships like this:

PatternException: Can't use optional patterns without explicit START clause. Optional relationships: `t`

A neat workaround which Andres showed me is to first match all of the players and then make them available using a WITH statement:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

If we evaluate this query it returns us the same results as before. However, given that we now have better collection support it seemed a shame to still use HEAD and TAIL so I replaced those with the following:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] IS NULL)| goals + g'[0]) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] <> NULL)| goals + g'[0]) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'[0]) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

I didn’t like those array indexes either and since we can now create maps I thought the query would be clearer if we created one on line 4 of the query rather than an array:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv IS NULL)| goals + g'.goals) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv <> NULL)| goals + g'.goals) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

The next tweak would be to remove the FILTER and replace that with a list comprehension:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv IS NULL] | goals + g'.goals) AS nonTvGoals,
       REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv <> NULL] | goals + g'.goals) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

Although the query ends up being a couple of lines longer than its 1.9 cousin I think there’s less noise and the intent is clearer.

Written by Mark Needham

October 3rd, 2013 at 4:16 pm

Posted in neo4j

Tagged with ,