Mark Needham

Thoughts on Software Development

Neo4j: Cypher – Creating a time tree down to the day


Michael recently wrote a blog post showing how to create a time tree representing time down to the second using Neo4j’s Cypher query language, something I built on top of for a side project I’m working on.

The domain I want to model is RSVPs to meetup invites – I want to understand how much in advance people respond and how likely they are to drop out at a later stage.

For this problem I only need to measure time down to the day so my task is a bit easier than Michael’s.

After a bit of fiddling around with leap years, I believe the following query creates a time tree representing all the days from 2011 to 2014, which covers the time the London Neo4j meetup has been running:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
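                          WHEN year % 4 <> 0 THEN range(1,28)    // not divisible by 4: common year
                          WHEN year % 100 <> 0 THEN range(1,29)  // divisible by 4, not a century: leap year
                          WHEN year % 400 <> 0 THEN range(1,28)  // century not divisible by 400: common year
                          ELSE range(1,29)                       // divisible by 400: leap year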
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))
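The February branch of the CASE expression is meant to encode the Gregorian leap-year rule: a year is a leap year if it is divisible by 4, except century years, which are leap years only when divisible by 400. As a quick sanity check outside Neo4j, here is a minimal Python sketch that mirrors the same branch order and compares it with the standard library's `calendar.monthrange`. The helper name `days_in_month` is hypothetical and not part of Cypher or Neo4j.

```python
import calendar

# Months that always have 31 days, as in the Cypher CASE expression.
LONG_MONTHS = {1, 3, 5, 7, 8, 10, 12}


def days_in_month(year: int, month: int) -> int:
    """Hypothetical helper mirroring the CASE expression's branch order."""
    if month in LONG_MONTHS:
        return 31
    if month == 2:
        if year % 4 != 0:
            return 28  # not divisible by 4: common year
        if year % 100 != 0:
            return 29  # divisible by 4 but not a century year: leap year
        if year % 400 != 0:
            return 28  # century year not divisible by 400 (e.g. 1900): common year
        return 29  # divisible by 400 (e.g. 2000): leap year
    return 30


# Cross-check against the standard library over a wide span of years.
for year in range(1800, 2101):
    for month in range(1, 13):
        assert days_in_month(year, month) == calendar.monthrange(year, month)[1]

# Number of Day nodes for 2011-2014. Python's range excludes its end value,
# whereas Cypher's range(2011, 2014) includes 2014.
total_days = sum(days_in_month(y, m) for y in range(2011, 2015) for m in range(1, 13))
print(total_days)  # 1461
```

Within 2011 to 2014 only 2012 is a leap year, and it is caught by the `% 100` branch, so the two century branches only matter for years like 1900 or 2000.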

The next step is to link each day to the one after it so that we can move between adjacent days without going back up and down the tree. For example, we want something like this:

(jan31)-[:NEXT]->(feb1)-[:NEXT]->(feb2)

We can build this by first collecting all the ‘day’ nodes in date order like so:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
RETURN days
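`ORDER BY year.year, month.month, day.day` sorts lexicographically: by year first, then by month within a year, then by day within a month, which is exactly chronological order. A small Python sketch of the same idea, assuming plain `(year, month, day)` tuples as stand-ins for the Year/Month/Day nodes (an illustration only, not how Neo4j stores them):

```python
import datetime
import random

# Build (year, month, day) tuples for every day of 2011-2014, i.e. the same
# set of dates the time tree contains.
start = datetime.date(2011, 1, 1)
end = datetime.date(2014, 12, 31)
triples = []
current = start
while current <= end:
    triples.append((current.year, current.month, current.day))
    current += datetime.timedelta(days=1)

# Shuffle to mimic MATCH returning rows in no particular order.
shuffled = triples[:]
random.shuffle(shuffled)

# Tuples compare year, then month, then day: the Python analogue of
# ORDER BY year.year, month.month, day.day followed by collect(day).
days = sorted(shuffled)

assert days == triples
print(len(days), days[0], days[-1])
```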

And then iterating over adjacent nodes to create the ‘NEXT’ relationship:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))
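Cypher's `range()` includes both of its bounds, so `RANGE(0, length(days)-2)` yields the indices 0 through n-2 for a list of n days. That gives exactly n-1 adjacent pairs and never reads past the end of the list; the nested single-element `FOREACH` clauses are just a way of binding `days[i]` and `days[i+1]` to names. A minimal Python sketch of the same pairing, assuming `datetime.date` values in place of Day nodes (the helper `next_links` is hypothetical):

```python
import datetime


def next_links(days):
    """Return (day1, day2) pairs, mirroring FOREACH(i in RANGE(0, length(days)-2) | ...).

    Cypher's range(0, n - 2) includes n - 2, so the Python equivalent is range(n - 1).
    """
    return [(days[i], days[i + 1]) for i in range(len(days) - 1)]


# Hypothetical sample: four consecutive days spanning the January/February boundary.
sample = [datetime.date(2014, 1, 30) + datetime.timedelta(days=k) for k in range(4)]
links = next_links(sample)

for day1, day2 in links:
    print(f"({day1})-[:NEXT]->({day2})")
```

In idiomatic Python the same pairs come from `zip(days, days[1:])`; the index-based version is kept here only to line up with the Cypher.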

Now, if we want to find the 5 days before 1st February 2014 (plus 1st February itself, since the path length starts at 0), we could write the following query:

MATCH (y:Year {year: 2014})-[:HAS_MONTH]->(m:Month {month: 2})-[:HAS_DAY]->(:Day {day: 1})<-[:NEXT*0..5]-(day)
RETURN y,m,day
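Because the variable-length pattern `[:NEXT*0..5]` starts at zero hops, the query returns six days: 1st February itself and the five days before it. The equivalent window in plain date arithmetic, as a minimal Python sketch (the helper name `previous_days` is hypothetical):

```python
import datetime


def previous_days(anchor: datetime.date, max_hops: int) -> list:
    """Dates reached by following :NEXT backwards 0..max_hops times from anchor."""
    return [anchor - datetime.timedelta(days=hops) for hops in range(max_hops + 1)]


window = previous_days(datetime.date(2014, 2, 1), 5)
print([d.isoformat() for d in window])
# ['2014-02-01', '2014-01-31', '2014-01-30', '2014-01-29', '2014-01-28', '2014-01-27']
```

This correspondence only holds because the NEXT chain links every consecutive calendar day, so each hop along `:NEXT` is exactly one day.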

If we want, we can create the time tree and connect the day nodes in a single query by using ‘WITH *’, like so:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
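                          WHEN year % 4 <> 0 THEN range(1,28)    // not divisible by 4: common year
                          WHEN year % 100 <> 0 THEN range(1,29)  // divisible by 4, not a century: leap year
                          WHEN year % 400 <> 0 THEN range(1,28)  // century not divisible by 400: common year
                          ELSE range(1,29)                       // divisible by 400: leap year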
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))
 
WITH *
 
MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now I need to connect the RSVP events to the tree!


Written by Mark Needham

April 19th, 2014 at 9:15 pm

Posted in neo4j


  • http://andypalmer.com Andy Palmer

    There are well-known algorithms for calculating the difference between two dates, so you only really need to capture dates where something interesting happened (like an accept and a cancellation).

    Can you help me understand why I might want to populate the database with a calendar?

  • http://www.markhneedham.com/blog Mark Needham

    Hey Andy,

    Long time no see!

    A couple of reasons:

    1. Neo4j doesn’t have first-class support for dates, so you can’t easily do much with dates if you just store them as properties. Hopefully that’ll change at some stage.
    2. Ideally you’d create only the parts of the calendar that you need, but the code to do that is much trickier than creating the whole calendar up front, and since it’s only a few thousand nodes/relationships I’ll take that trade-off. Laziness #ftw

    Cheers
    Mark

  • http://andypalmer.com Andy Palmer

    I wouldn’t use properties for dates; they’re definitely first-class relationships (I might want to see which days got the most replies).
    I can see why it’s easier to create the whole calendar in advance, but it still feels aesthetically wrong to have so many irrelevant nodes. If you’re trying to solve the entire problem in Neo and cypher, I guess it’s ok, but if you’re using another language to interact with the data then I’d use a different tactic.

    I like that you can hack a “days between” into cypher by counting the path length between two dates via the :NEXT relationship :-)

  • Jim Salmons

    Andy, I take your points from a practical/solution-design context. But one of the things this piece does is dramatically show the power of Cypher. The docs explain the various clauses, etc., but piecemeal examples often don’t show us neophytes how to put it all together.

    While this exercise has its own utility to Mark in the context of his problem, the Cypher itself is very instructive, because we all share basic knowledge about dates, month lengths, etc. That shared familiarity makes the otherwise deep weeds of this elaborate code snippet understandable and instructive. I’m pretty sure I will never need queries like this to handle calendar data in my work, but I sure will learn a lot about Cypher by understanding it.

  • Karthik Srinivasan

    Thank you. Extremely useful code, not only for Cypher but also for understanding the underlying date-manipulation logic.