Mark Needham

Thoughts on Software Development

neo4j/cypher: Returning a row with zero count when no relationship exists

with 3 comments

I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received.

1 – Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels.

To refresh this is the sub graph that we’ll need to look at to work it out:

Sent off

I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for:

START game = node:matches('match_id:*')
MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards
LIMIT 5

When we run this we get the following results:

+------------------------------+
| team.name         | redCards |
+------------------------------+
| "Sunderland"      | 1        |
| "West Ham United" | 1        |
| "Norwich City"    | 1        |
| "Reading"         | 1        |
| "Liverpool"       | 2        |
+------------------------------+
5 rows

The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the ‘sent_off_in’ relationship.

I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on.

The first step towards getting Manchester United to return with a count of 0 is to make the ‘sent_off_in’ relationship optional.

However, that on its own that isn’t enough because it now returns a count of all the player performances for each team:

START game = node:matches('match_id:*')
MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards ASC
LIMIT 5
+-----------------------------+
| team.name        | redCards |
+-----------------------------+
| "Chelsea"        | 448      |
| "Wigan Athletic" | 459      |
| "Fulham"         | 460      |
| "Liverpool"      | 466      |
| "Everton"        | 467      |
+-----------------------------+
5 rows

Instead what we need to do is collect up all the ‘sent_off_in’ relationships and sum them up.

We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need:

START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COLLECT(r) AS redCards
LIMIT 5
+-----------------------------------------------------------------------------------------------------+
| team.name          | redCards                                                                       |
+-----------------------------------------------------------------------------------------------------+
| "Wigan Athletic"   | [:sent_off_in[26443] {},:sent_off_in[37785] {}]                                |
| "Everton"          | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] |
| "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}]  |
| "Southampton"      | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}]              |
| "West Ham United"  | [:sent_off_in[21734] {minute:67}]                                              |
+-----------------------------------------------------------------------------------------------------+
5 rows

We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done:

START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, LENGTH(COLLECT(r)) AS redCards
ORDER BY redCards
LIMIT 5
+--------------------------------+
| team.name           | redCards |
+--------------------------------+
| "Manchester United" | 0        |
| "West Ham United"   | 1        |
| "Sunderland"        | 1        |
| "Norwich City"      | 1        |
| "Reading"           | 1        |
+--------------------------------+
5 rows
Be Sociable, Share!

Written by Mark Needham

April 30th, 2013 at 7:02 am

Posted in neo4j

Tagged with ,

  • http://andypalmer.com Andy Palmer

    This feels like it’s too complicated to me. It also feels wrong that you’re starting with games when the information that you’re trying to extract relates to teams.
    I think that if you start with teams, collect a list of each game for those teams and then sum the number of sent_off relationships, you’d get the same data (including 0 results). It would probably be easier to read and understand too.

  • http://www.markhneedham.com/blog Mark Needham

    @andypalmer:disqus will give it a try the other way around when I get home.

    The tricky bit is attributing the red cards to a specific team – that’s where the complexity has come in from. Can you see an easier way to model that bit?

  • http://andypalmer.com Andy Palmer

    It should be similar to your first attempt, just with the graph traversal reversed
    Try something like:
    START team=node:matches(‘team_id:*’)
    MATCH team-[:for]-thisGame-[:played]-byPlayer-[:sent_off_in]-disgrace-[:in]-thisGame
    RETURN team.name, count(matches)

    (I’d probably start adding in some extra relationships for this query, because the name fudging has gone a bit too far :-D )