Mark Needham

Thoughts on Software Development

2015: A year in the life of the Neo4j London meetup group

without comments

Given we’ve only got a few more hours left of 2015 I thought it’d be fun to do a quick overview of how things have been going in the London chapter of the Neo4j meetup using Neo4j with a bit of R mixed in.

We’re going to be using the RNeo4j library to interact with the database along with a few other libraries which will help us out with different tasks:

library(RNeo4j)
library(ggplot2)
library(dplyr)
library(zoo)
 
graph = startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "myPassword")

Let’s get to it:

Members

query = "MATCH (:Group {name: {name}})<-[membership:MEMBER_OF]-()
         RETURN membership.joined AS timestamp"
 
joinedDF = cypher(graph, query, name = "Neo4j - London User Group")
joinedDF$joinDate = as.Date(as.POSIXct(joinedDF$timestamp / 1000, origin="1970-01-01"))
joinedDF$joinDate = as.Date(as.POSIXct(joinedDF$timestamp / 1000, origin="1970-01-01"))
 
ggplot(aes(x = year, y = n, label = n), 
       data = joinedDF %>% mutate(year = format(joinDate, "%Y")) %>% count(year)) + 
  geom_bar(stat = "identity", fill = "Dark Blue") + 
  ggtitle("Number of new members by year") +
  geom_text(vjust=-0.5)
2015 12 31 12 23 06

A bit down on 2014 but not too far away. We’re still attracting new people who are interested in learning about graphs. Let’s drill into those numbers a bit:

byYearMon = joinedDF %>% 
  filter(format(joinDate, "%Y") == 2015) %>% 
  mutate(yearmon = as.Date(as.yearmon(joinDate))) %>% 
  count(yearmon)
 
ggplot(aes(x = yearmon, y = n, label = n), data = byYearMon) + 
  geom_bar(stat = "identity", fill = "Dark Blue") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_date(labels = date_format("%B"), breaks = "1 month") +
  ggtitle("Number of new members by month/year")
2015 12 31 12 39 04

We had a bit of an end of year surge in October/November which was unexpected. December has been low in previous years, there was an April dip which I think is because we stopped doing events before Graph Connect 2015. I’m not sure about the September dip so let’s have a look:

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)
               RETURN event.time + event.utcOffset AS timestamp"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group")
eventsDF$timestamp = as.Date(as.POSIXct(eventsDF$timestamp / 1000, origin="1970-01-01"))
 
eventsByYearMon = eventsDF %>% 
  filter(format(timestamp, "%Y") == 2015) %>% 
  mutate(yearmon = as.Date(as.yearmon(timestamp))) %>% 
  count(yearmon) 
 
merge(eventsByYearMon, byYearMon, by="yearmon")
 
      yearmon n.x n.y
1  2015-01-01   3  80
2  2015-02-01   6  76
3  2015-03-01   2  70
4  2015-04-01   2  53
5  2015-05-01   4  78
6  2015-06-01   5  83
7  2015-07-01   3  73
8  2015-08-01   5  73
9  2015-09-01   3  40
10 2015-10-01   3  94
11 2015-11-01   4 117
12 2015-12-01   3  48

At first glance there doesn’t seem to be any correlation between the number of events held and the number of new members so I think we’ll have to look for another predictor of that variable!

Events

Next let’s have a look at the events we ran in 2015. We’ll start with a quick chart showing the number of events we’ve run over the years:

ggplot(aes(x = year, y = n, label = n), data = eventsDF %>% mutate(year = format(timestamp, "%Y")) %>% count(year)) + 
  geom_bar(stat = "identity", fill = "Dark Blue") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Number of events")

2015 12 31 13 43 15

So less events than last year but how many people RSVPD ‘yes’ to the ones we did host?

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)<-[:RSVPD {response: 'yes'}]-()
               WHERE event.time + event.utcOffset < timestamp()
               WITH event, COUNT(*) AS rsvps
               RETURN event.time + event.utcOffset AS timestamp, rsvps"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group")
eventsDF$timestamp = as.Date(as.POSIXct(eventsDF$timestamp / 1000, origin="1970-01-01"))
 
ggplot(aes(x = year, y = rsvps), 
       data = eventsDF %>% mutate(year = format(timestamp, "%Y")) %>% group_by(year) %>% summarise(rsvps= sum(rsvps)) ) + 
  geom_bar(stat = "identity", fill = "Dark Blue") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Number of attendees")
2015 12 31 13 54 50

Slightly more ‘yes’ RSVPs than last year. Now let’s drill into the repeat events we ran this year:

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)
               WHERE {startYear} <= (event.time + event.utcOffset) < {endYear}
               RETURN event.name AS event, COUNT(*) AS times
               ORDER BY times DESC"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group", 
                  startYear  = as.numeric(as.POSIXct("2015-01-01", format="%Y-%m-%d")) * 1000, 
                  endYear = as.numeric(as.POSIXct("2015-12-31", format="%Y-%m-%d")) * 1000)
eventsDF %>% filter(times > 1)
 
                                                       event times
1                      Relational to graph: A worked example     7
2                                            Intro to Graphs     6
3                          Graph Modelling - Do's and Don'ts     5
4          Hands On Intro to Cypher - Neo4j's Query Language     3
5 Build your own recommendation engine with Neo4j in an hour     2
6                                Fraud Detection using Neo4j     2

I thought we’d run ‘Intro to Graphs’ most often but the data doesn’t lie – it’s all about relational to graph. And which were the most popular repeat events in terms of ‘yes’ RSVPs?

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)
               WHERE {startYear} <= (event.time + event.utcOffset) < {endYear}
               MATCH (event)<-[:RSVPD {response: 'yes'}]-()
               WITH event, COUNT(*) AS yesRSVPs
               WITH event.name AS event, COUNT(*) AS times, SUM(yesRSVPs) AS rsvps
               RETURN event, times, rsvps, rsvps / times AS rsvpsPerEvent
               ORDER BY rsvpsPerEvent DESC"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group", 
                  startYear  = as.numeric(as.POSIXct("2015-01-01", format="%Y-%m-%d")) * 1000, 
                  endYear = as.numeric(as.POSIXct("2015-12-31", format="%Y-%m-%d")) * 1000)
eventsDF %>% filter(times > 1)
 
                                                       event times rsvps rsvpsPerEvent
1                                Fraud Detection using Neo4j     2   150            75
2                                            Intro to Graphs     6   352            58
3                          Graph Modelling - Do's and Don'ts     5   281            56
4                      Relational to graph: A worked example     7   367            52
5 Build your own recommendation engine with Neo4j in an hour     2    85            42
6          Hands On Intro to Cypher - Neo4j's Query Language     3   104            34

It looks like fraud is a popular topic although we’ve only run it twice so perhaps best not to read too much into that. We’re running that one again in a couple of weeks if you’re interested.

Ignoring repeat events let’s see which event drew the biggest crowd:

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)
               WHERE {startYear} <= (event.time + event.utcOffset) < {endYear}
               MATCH (event)<-[:RSVPD {response: 'yes'}]-()
               WITH event.id AS id, event.name AS event, COUNT(*) AS rsvps
               RETURN event, rsvps
               ORDER BY rsvps DESC"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group", 
                  startYear  = as.numeric(as.POSIXct("2015-01-01", format="%Y-%m-%d")) * 1000, 
                  endYear = as.numeric(as.POSIXct("2015-12-31", format="%Y-%m-%d")) * 1000)
eventsDF %>% head(5)
 
                                                                         event rsvps
1 Neo4j Full Stack Applications + Python, R and Neo4j - The Data Science Stack   133
2                          Modelling a recommendation engine: A worked example   118
3                    Building a repository of biomedical ontologies with Neo4j   107
4                     GraphHack @ Graph Connect: The night before Election Day    91
5                                        Bootstrapping a Recommendation Engine    88

A double header featuring Nicole White and Matt Wright proved to be the most popular event of the year and in fact the most popular in terms of ‘yes’ RSVPs so far:

eventsQuery = "MATCH (:Group {name: {name}})-[:HOSTED_EVENT]->(event)<-[:RSVPD {response: 'yes'}]-()
               WITH event, COUNT(*) AS rsvps
               RETURN event.name AS event, event.time + event.utcOffset AS time, rsvps
               ORDER BY rsvps DESC"
eventsDF = cypher(graph, eventsQuery, name = "Neo4j - London User Group")
eventsDF$time = as.Date(as.POSIXct(eventsDF$time / 1000, origin="1970-01-01"))
eventsDF %>% mutate(year = format(time, "%Y")) %>% dplyr::select(-time) %>% head(10)
 
                                                                          event rsvps year
1  Neo4j Full Stack Applications + Python, R and Neo4j - The Data Science Stack   133 2015
2                           Modelling a recommendation engine: A worked example   118 2015
3                     Building a repository of biomedical ontologies with Neo4j   107 2015
4                                                    Real world Neo4j use cases    98 2014
5                                                           The transport graph    94 2014
6                                                     The Visualisation Special    93 2014
7                  Impossible is Nothing by Jim Webber, Neo4j's Chief Scientist    93 2014
8                      GraphHack @ Graph Connect: The night before Election Day    91 2015
9                                         Bootstrapping a Recommendation Engine    88 2015
10                                    Scraping and Graphing the Apple app store    88 2015

3 of the top 4 belong to 2015 and 6 of the top 10. Let’s see what 2016 has in store.

Thanks to everyone who’s come along to one of our meetups and Happy New Year!

Be Sociable, Share!

Written by Mark Needham

December 31st, 2015 at 1:58 pm

Posted in neo4j

Tagged with