Mark Needham

Thoughts on Software Development

neo4j/cypher: SQL style GROUP BY WITH LIMIT query

without comments

A few weeks ago I wrote a blog post where I described how we could construct a SQL GROUP BY style query in cypher and last week I wanted to write a similar query but with what I think would be a LIMIT clause in SQL.

I wanted to find the maximum number of goals that players had scored in a match for a specific team and started off with the following query to find all the matches that players had scored in:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, stats.goals, game.name

We find all the matches where Manchester United were playing and then get a list of the players who scored for them in those games:

+--------------------------------------------------------------------------------+
| player.name         | stats.goals | game.name                                  |
+--------------------------------------------------------------------------------+
| "Javier Hernández"  | 1           | "Manchester United vs Wigan Athletic"      |
| "Robin Van Persie"  | 1           | "Manchester United vs Sunderland"          |
| "Danny Welbeck"     | 1           | "Manchester United vs Stoke City"          |
| "Rafael"            | 1           | "Queens Park Rangers vs Manchester United" |
| "Wayne Rooney"      | 1           | "Manchester United vs Norwich City"        |
| "Shinji Kagawa"     | 1           | "Manchester United vs Fulham"              |
| "Shinji Kagawa"     | 3           | "Manchester United vs Norwich City"        |
...
+--------------------------------------------------------------------------------+
50 rows

Our next step would be to only return the unique combinations of players and goals:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, stats.goals
+-----------------------------------+
| player.name         | stats.goals |
+-----------------------------------+
| "Nemanja Vidic"     | 1           |
| "Shinji Kagawa"     | 1           |
| "Danny Welbeck"     | 1           |
| "Darren Fletcher"   | 1           |
| "Wayne Rooney"      | 2           |
| "Javier Hernández"  | 1           |
| "Nani"              | 1           |
| "Tom Cleverley"     | 1           |
| "Robin Van Persie"  | 2           |
| "Shinji Kagawa"     | 3           |
...
| "Robin Van Persie"  | 3           |
+-----------------------------------+
21 rows

Since we only want to return each player once along with their maximum value from the ‘stats.goals’ column all we have to do now is make use of the MAX function:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, MAX(stats.goals) AS goals
ORDER BY goals DESC
+-----------------------------+
| player.name         | goals |
+-----------------------------+
| "Robin Van Persie"  | 3     |
| "Shinji Kagawa"     | 3     |
| "Javier Hernández"  | 2     |
| "Wayne Rooney"      | 2     |
| "Paul Scholes"      | 1     |
| "Ryan Giggs"        | 1     |
| "Tom Cleverley"     | 1     |
...
| "Rafael"            | 1     |
| "Darren Fletcher"   | 1     |
| "Nani"              | 1     |
+-----------------------------+
16 rows

Written by Mark Needham

March 18th, 2013 at 11:19 pm

Posted in neo4j

Tagged with ,