Mark Needham

Thoughts on Software Development

Archive for the ‘neo4j’ tag

Neo4j: Cypher – Deleting duplicate nodes

I had a problem on a graph I was working on recently where I’d managed to create duplicate nodes because I hadn’t applied any unique constraints.

I wanted to remove the duplicates, and came across Jimmy Ruts’ excellent post which shows some ways to do this.

Let’s first create a graph with some duplicate nodes to play with:

UNWIND range(0, 100) AS id
CREATE (p1:Person {id: toInteger(rand() * id)})
MERGE (p2:Person {id: toInteger(rand() * id)})
MERGE (p3:Person {id: toInteger(rand() * id)})
MERGE (p4:Person {id: toInteger(rand() * id)})
CREATE (p1)-[:KNOWS]->(p2)
CREATE (p1)-[:KNOWS]->(p3)
CREATE (p1)-[:KNOWS]->(p4)
 
Added 173 labels, created 173 nodes, set 173 properties, created 5829 relationships, completed after 408 ms.

How do we find the duplicate nodes?

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒══════════════════════╤═════════════╕
│"ids"                 │"size(nodes)"│
╞══════════════════════╪═════════════╡
│[1,1,1,1,1,1,1,1]     │8            │
├──────────────────────┼─────────────┤
│[0,0,0,0,0,0,0,0]     │8            │
├──────────────────────┼─────────────┤
│[17,17,17,17,17,17,17]│7            │
├──────────────────────┼─────────────┤
│[4,4,4,4,4,4,4]       │7            │
├──────────────────────┼─────────────┤
│[2,2,2,2,2,2]         │6            │
├──────────────────────┼─────────────┤
│[5,5,5,5,5,5]         │6            │
├──────────────────────┼─────────────┤
│[19,19,19,19,19,19]   │6            │
├──────────────────────┼─────────────┤
│[11,11,11,11,11]      │5            │
├──────────────────────┼─────────────┤
│[25,25,25,25,25]      │5            │
├──────────────────────┼─────────────┤
│[43,43,43,43,43]      │5            │
└──────────────────────┴─────────────┘

Let’s zoom in on all the people with ‘id:1’ and work out how many relationships they have. Our plan is to keep the node that has the most connections and get rid of the others.

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
WITH nodes ORDER BY size(nodes) DESC
LIMIT 1
UNWIND nodes AS n 
RETURN n.id, id(n) AS internalId, size((n)--()) AS rels
ORDER BY rels DESC
 
╒══════╤════════════╤══════╕
│"n.id"│"internalId"│"rels"│
╞══════╪════════════╪══════╡
│1     │175         │1284  │
├──────┼────────────┼──────┤
│1     │184         │721   │
├──────┼────────────┼──────┤
│1     │180         │580   │
├──────┼────────────┼──────┤
│1     │2           │391   │
├──────┼────────────┼──────┤
│1     │195         │361   │
├──────┼────────────┼──────┤
│1     │199         │352   │
├──────┼────────────┼──────┤
│1     │302         │5     │
├──────┼────────────┼──────┤
│1     │306         │1     │
└──────┴────────────┴──────┘

So in this example we want to keep the node that has 1284 relationships and delete the rest.

To make things easy we need the node with the most relationships to be first or last in our list. We can ensure that’s the case by ordering the nodes before we group them.

MATCH (p:Person)
WITH p 
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | {id: n.id,rels: size((n)--()) } ] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒══════════════════════════════════════════════════════════════════════╤═════════════╕
│"ids"                                                                 │"size(nodes)"│
╞══════════════════════════════════════════════════════════════════════╪═════════════╡
│[{"id":1,"rels":1284},{"id":1,"rels":721},{"id":1,"rels":580},{"id":1,│8            │
│"rels":391},{"id":1,"rels":361},{"id":1,"rels":352},{"id":1,"rels":5},│             │
│{"id":1,"rels":1}]                                                    │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":0,"rels":2064},{"id":0,"rels":2059},{"id":0,"rels":1297},{"id":│8            │
│0,"rels":1124},{"id":0,"rels":995},{"id":0,"rels":928},{"id":0,"rels":│             │
│730},{"id":0,"rels":702}]                                             │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":17,"rels":153},{"id":17,"rels":105},{"id":17,"rels":81},{"id":1│7            │
│7,"rels":31},{"id":17,"rels":15},{"id":17,"rels":14},{"id":17,"rels":1│             │
│}]                                                                    │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":4,"rels":394},{"id":4,"rels":320},{"id":4,"rels":250},{"id":4,"│7            │
│rels":201},{"id":4,"rels":162},{"id":4,"rels":162},{"id":4,"rels":14}]│             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":2,"rels":514},{"id":2,"rels":329},{"id":2,"rels":318},{"id":2,"│6            │
│rels":241},{"id":2,"rels":240},{"id":2,"rels":2}]                     │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":5,"rels":487},{"id":5,"rels":378},{"id":5,"rels":242},{"id":5,"│6            │
│rels":181},{"id":5,"rels":158},{"id":5,"rels":8}]                     │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":19,"rels":153},{"id":19,"rels":120},{"id":19,"rels":84},{"id":1│6            │
│9,"rels":53},{"id":19,"rels":45},{"id":19,"rels":1}]                  │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":11,"rels":222},{"id":11,"rels":192},{"id":11,"rels":172},{"id":│5            │
│11,"rels":152},{"id":11,"rels":89}]                                   │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":25,"rels":133},{"id":25,"rels":107},{"id":25,"rels":98},{"id":2│5            │
│5,"rels":15},{"id":25,"rels":2}]                                      │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":43,"rels":92},{"id":43,"rels":85},{"id":43,"rels":9},{"id":43,"│5            │
│rels":5},{"id":43,"rels":1}]                                          │             │
└──────────────────────────────────────────────────────────────────────┴─────────────┘

Now it’s time to delete the duplicates:

MATCH (p:Person)
WITH p 
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
UNWIND nodes[1..] AS n
DETACH DELETE n
 
Deleted 143 nodes, deleted 13806 relationships, completed after 29 ms.

Now if we run our duplicate query:

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
(no changes, no records)

What about if we remove the WHERE clause?

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒═════╤═════════════╕
│"ids"│"size(nodes)"│
╞═════╪═════════════╡
│[23] │1            │
├─────┼─────────────┤
│[86] │1            │
├─────┼─────────────┤
│[77] │1            │
├─────┼─────────────┤
│[59] │1            │
├─────┼─────────────┤
│[50] │1            │
├─────┼─────────────┤
│[32] │1            │
├─────┼─────────────┤
│[41] │1            │
├─────┼─────────────┤
│[53] │1            │
├─────┼─────────────┤
│[44] │1            │
├─────┼─────────────┤
│[8]  │1            │
└─────┴─────────────┘

Hoorah, no more duplicates! Finally, let’s check that we kept the node we expected to keep. We expect it to have an ‘internalId’ of 175:

MATCH (p:Person {id: 1})
RETURN size((p)--()), id(p) AS internalId
 
╒═══════════════╤════════════╕
│"size((p)--())"│"internalId"│
╞═══════════════╪════════════╡
│242            │175         │
└───────────────┴────────────┘

Which it does! There are many fewer relationships than we had before because a lot of those relationships were to duplicate nodes that we’ve now deleted.
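
Before wrapping up, it’s also worth creating the unique constraint that would have prevented the duplicates in the first place, now that the data is clean. A minimal sketch, using the Neo4j 3.x syntax (newer versions use CREATE CONSTRAINT ... FOR ... REQUIRE instead):

// run after the clean-up; this fails if duplicates still exist
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE

With that in place, any attempt to create a second Person with the same id will fail instead of silently succeeding.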

If we want to go a step further we could ‘merge’ the duplicate node’s relationships onto the nodes that we did keep, but that’s for another post!
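
One way to do that would be APOC’s apoc.refactor.mergeNodes procedure, which merges every node in a list into the first one. Here’s a rough sketch, assuming an APOC release that ships that procedure, reusing the ordering trick from above so that the best-connected node is the survivor:

MATCH (p:Person)
WITH p
ORDER BY p.id, size((p)--()) DESC
WITH p.id AS id, collect(p) AS nodes
WHERE size(nodes) > 1
// the first node in each list has the most relationships and survives the merge
CALL apoc.refactor.mergeNodes(nodes) YIELD node
RETURN node.id

The trade-off is that we may end up with parallel relationships between the same pairs of nodes, which would then need de-duplicating themselves.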

Written by Mark Needham

October 6th, 2017 at 4:13 pm

Posted in neo4j

AWS: Spinning up a Neo4j instance with APOC installed

One of the first things I do after installing Neo4j is install the APOC library, but I find it’s a bit of a manual process when spinning up a server on AWS so I wanted to simplify it a bit.

There’s already a Neo4j AMI which installs Neo4j 3.2.0 and my colleague Michael pointed out that we could download APOC into the correct folder by writing a script and sending it as UserData.

I’ve been doing some work in JavaScript over the last two weeks so I thought I’d automate all the steps using the AWS library. You can find the full script on GitHub.

This script creates a key pair and a security group, and opens up that security group on ports 22 (SSH), 7474 (HTTP), 7473 (HTTPS), and 7687 (Bolt). The server created is an m3.medium, but you can change that to something else if you prefer.

The UserData part of the script is actually very simple:

#!/bin/bash
curl -L https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.2.0.3/apoc-3.2.0.3-all.jar -O
sudo cp apoc-3.2.0.3-all.jar /var/lib/neo4j/plugins/

We can run it like this:

$ node neo4j-with-apoc.js 
Creating a Neo4j server
Key pair created. Save this to a file - you'll need to use it if you want to ssh into the Neo4j server
-----BEGIN RSA PRIVATE KEY-----
<Private key details>
-----END RSA PRIVATE KEY-----
Created Group Id:<Group Id>
Opened Neo4j ports
Instance Id: <Instance Id>
Your Neo4j server is now ready!
You'll need to login to the server and change the default password:
https://ec2-ip-address.compute-1.amazonaws.com:7473 or http://ec2-ip-address.compute-1.amazonaws.com:7474
User:neo4j, Password:<Instance Id>

We’ll need to wait a few seconds for Neo4j to spin up, but it’ll be accessible at the URI specified.

Once it’s accessible we can log in with the username neo4j and, as the password, the instance ID printed by the script. We’ll then be instructed to choose a new password.
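
That step can be scripted too. Here’s a sketch, assuming the dbms.security.changePassword procedure that ships with Neo4j 3.x, run after authenticating with the initial password:

// changes the password of the currently authenticated user
CALL dbms.security.changePassword('my-new-password')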

We can then run the following query to check that APOC has been installed:

call dbms.procedures() YIELD name
WHERE name starts with "apoc"
RETURN count(*)
 
╒══════════╕
│"count(*)"│
╞══════════╡
│214       │
└──────────┘

Cool, it worked, and we can now use Neo4j and APOC to our heart’s content! If we want to SSH into the server we can do that as well by first saving the private key printed on the command line to a file and then executing the following command:

$ cat aws-private-key.pem
-----BEGIN RSA PRIVATE KEY-----
<Private key details>
-----END RSA PRIVATE KEY-----
 
$ chmod 600 aws-private-key.pem
 
$ ssh -i aws-private-key.pem ubuntu@ec2-ip-address.compute-1.amazonaws.com
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-1013-aws x86_64)
 
 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
 
  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud
 
106 packages can be updated.
1 update is a security update.
 
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

You can start/stop neo4j by running the following command:

$ /etc/init.d/neo4j 
Usage: /etc/init.d/neo4j {start|stop|status|restart|force-reload}

The other commands you may be used to finding in the bin folder can be found here:

$ ls -lh /usr/share/neo4j/bin/
total 48K
-rwxr-xr-x 1 neo4j adm   15K May  9 09:22 neo4j
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-admin
-rwxr-xr-x 1 root  root  612 May 12 00:03 neo4j-awspasswd
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-import
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-shell
drwxr-xr-x 2 neo4j adm  4.0K May 11 22:13 tools

Let me know if this is helpful and if you have any suggestions/improvements.

Written by Mark Needham

September 30th, 2017 at 9:23 pm

Posted in neo4j

Neo4j: Cypher – Create Cypher map with dynamic keys

I was recently trying to create a map in a Cypher query but wanted to have dynamic keys in that map. I started off with this query:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN { dynamicKey: dynamicValue } AS map
 
 
╒══════════════════╕
│"map"             │
╞══════════════════╡
│{"dynamicKey":"b"}│
└──────────────────┘

Not quite what we want! We want dynamicKey to be evaluated rather than treated as a literal. As usual, APOC comes to the rescue!

In fact APOC has several functions that will help us out here. Let’s take a look at them:

CALL dbms.functions() yield name, description
WHERE name STARTS WITH "apoc.map.from"
RETURN name, description
 
╒═════════════════════╤═════════════════════════════════════════════════════╕
│"name"               │"description"                                        │
╞═════════════════════╪═════════════════════════════════════════════════════╡
│"apoc.map.fromLists" │"apoc.map.fromLists([keys],[values])"                │
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromNodes" │"apoc.map.fromNodes(label, property)"                │
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromPairs" │"apoc.map.fromPairs([[key,value],[key2,value2],...])"│
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromValues"│"apoc.map.fromValues([key1,value1,key2,value2,...])" │
└─────────────────────┴─────────────────────────────────────────────────────┘

So we can generate a map like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromValues([dynamicKey, dynamicValue]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘

or like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromLists([dynamicKey], [dynamicValue]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘

or even like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromPairs([[dynamicKey, dynamicValue]]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘
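
These functions come into their own once the keys live in the graph rather than in literals. Here’s a small sketch, assuming some hypothetical Person nodes with name and age properties:

MATCH (p:Person)
WITH collect([p.name, p.age]) AS pairs
// one map keyed by whatever names happen to be in the graph
RETURN apoc.map.fromPairs(pairs) AS ageByName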

Written by Mark Needham

September 19th, 2017 at 7:30 pm

Posted in neo4j

Neo4j: Cypher – Rounding of floating point numbers/BigDecimals

I was doing some data cleaning a few days ago and wanted to multiply a value by 1 million. My Cypher code to do this looked like this:

with "8.37" as rawNumeric 
RETURN toFloat(rawNumeric) * 1000000 AS numeric
 
╒═════════════════╕
│"numeric"        │
╞═════════════════╡
│8369999.999999999│
└─────────────────┘

Unfortunately that suffers from the classic rounding error when working with floating point numbers. I couldn’t figure out a way to solve it using pure Cypher, but there tends to be an APOC function to solve every problem and this was no exception.

I’m using Neo4j 3.2.3 so I downloaded the corresponding APOC jar and put it in a plugins directory:

$ ls -lh plugins/
total 3664
-rw-r--r--@ 1 markneedham  staff   1.8M  9 Aug 09:14 apoc-3.2.0.4-all.jar

I’m using Docker so I needed to tell it where my plugins folder lives:

$ docker run -v $PWD/plugins:/plugins \
    -p 7474:7474 \
    -p 7687:7687 \
    -e NEO4J_AUTH="none" \
    neo4j:3.2.3

Now we’re ready to try out our new function:

with "8.37" as rawNumeric 
RETURN apoc.number.exact.mul(rawNumeric,"1000000") AS apocConversion
 
╒════════════════╕
│"apocConversion"│
╞════════════════╡
│"8370000.00"    │
└────────────────┘

That almost does what we want, but the result is a string rather than a numeric value. It’s not too difficult to fix though:

with "8.37" as rawNumeric 
RETURN toFloat(apoc.number.exact.mul(rawNumeric,"1000000")) AS apocConversion
 
╒════════════════╕
│"apocConversion"│
╞════════════════╡
│8370000         │
└────────────────┘

That’s more like it!
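
The apoc.number.exact namespace covers the other arithmetic operations in the same style. Assuming your APOC release also ships apoc.number.exact.add, sub, and div alongside mul, they all operate on string representations of the numbers:

with "8.37" as rawNumeric
// exact decimal division, avoiding the floating point representation entirely
RETURN toFloat(apoc.number.exact.div(rawNumeric, "2")) AS apocConversion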

Written by Mark Needham

August 13th, 2017 at 7:23 am

Posted in neo4j

Docker: Building custom Neo4j images on Mac OS X

I sometimes need to create custom Neo4j Docker images to try things out and wanted to share my workflow, mostly for future Mark but also in case it’s useful to someone else.

There’s already a docker-neo4j repository so we’ll just tweak the files in there to achieve what we want.

$ git clone git@github.com:neo4j/docker-neo4j.git
$ cd docker-neo4j

If we want to build a Docker image for Neo4j Enterprise Edition we can run the following build target:

$ make clean build-enterprise
Makefile:9: *** This Make does not support .RECIPEPREFIX. Please use GNU Make 4.0 or later.  Stop.

Denied at the first hurdle! What version of make have we got on this machine?

$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
 
This program built for i386-apple-darwin11.3.0

We can sort that out by installing a newer version using brew:

$ brew install make
$ gmake --version
GNU Make 4.2.1
Built for x86_64-apple-darwin15.6.0
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

That’s more like it! brew installs make with the ‘g’ prefix and since I’m not sure if anything else on my system relies on the older version of make I won’t bother changing the symlink.

Let’s retry our original command:

$ gmake clean build-enterprise
Makefile:14: *** NEO4J_VERSION is not set.  Stop.

It’s still not happy with us! Let’s set that environment variable to the latest released version as of writing:

$ export NEO4J_VERSION="3.2.2"
$ gmake clean build-enterprise
...
Successfully built c16b6f2738de
Successfully tagged test/18334:latest
Neo4j 3.2.2-enterprise available as: test/18334

We can see that image in Docker land by running the following command:

$ docker images | head -n2
REPOSITORY                                     TAG                          IMAGE ID            CREATED             SIZE
test/18334                                     latest                       c16b6f2738de        4 minutes ago       303MB

If I wanted to deploy that image to my own Docker Hub I could run the following commands:

$ docker login --username=markhneedham
$ docker tag c16b6f2738de markhneedham/neo4j:3.2.2
$ docker push markhneedham/neo4j

Putting Neo4j Enterprise 3.2.2 on my Docker Hub isn’t very interesting though – that version is already on the official Neo4j Docker Hub.

I’ve actually been building versions of Neo4j against the HEAD of the Neo4j 3.2 branch (i.e. 3.2.3-SNAPSHOT), deploying those to S3, and then building a Docker image based on those archives.

To change the destination of the Neo4j artifact we need to tweak this line in the Makefile:

$ git diff Makefile
diff --git a/Makefile b/Makefile
index c77ed1f..98e05ca 100644
--- a/Makefile
+++ b/Makefile
@@ -15,7 +15,7 @@ ifndef NEO4J_VERSION
 endif
 
 tarball = neo4j-$(1)-$(2)-unix.tar.gz
-dist_site := http://dist.neo4j.org
+dist_site := https://s3-eu-west-1.amazonaws.com/core-edge.neotechnology.com/20170726
 series := $(shell echo "$(NEO4J_VERSION)" | sed -E 's/^([0-9]+\.[0-9]+)\..*/\1/')
 
 all: out/enterprise/.sentinel out/community/.sentinel

We can then update the Neo4j version environment variable:

$ export NEO4J_VERSION="3.2.3-SNAPSHOT"

And then repeat the Docker commands above. You’ll need to sub in your own Docker Hub user and repository names.

I’m using these custom images as part of Kubernetes deployments but you can use them anywhere that accepts a Docker container.

If anything on the post didn’t make sense or you want more clarification let me know @markhneedham.

Written by Mark Needham

July 26th, 2017 at 10:20 pm

Posted in neo4j

Neo4j: apoc.date.parse – java.lang.IllegalArgumentException: Illegal pattern character ‘T’ / java.text.ParseException: Unparseable date: “2012-11-12T08:46:15Z”

I often find myself wanting to convert date strings into Unix timestamps using Neo4j’s APOC library and unfortunately some sources don’t use the format that apoc.date.parse expects.

e.g.

return apoc.date.parse("2012-11-12T08:46:15Z",'s') 
AS ts
 
Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: java.text.ParseException: Unparseable date: "2012-11-12T08:46:15Z"

We need to define the format explicitly, so the SimpleDateFormat documentation comes in handy. I tried the following:

return apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-ddTHH:mm:ssZ") 
AS ts
 
Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: Illegal pattern character 'T'

Hmmm, we need to quote the ‘T’ character – we can’t just include it in the pattern. Let’s try again:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ssZ") 
AS ts
 
Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: java.text.ParseException: Unparseable date: "2012-11-12T08:46:15Z"

The problem now is that we haven’t quoted the ‘Z’ but the error doesn’t indicate that – not sure why!

We can either quote the ‘Z’:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ss'Z'") 
AS ts
 
╒══════════╕
│"ts"      │
╞══════════╡
│1352709975│
└──────────┘

Or we can match the timezone using ‘XXX’:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ssXXX") 
AS ts
 
╒══════════╕
│"ts"      │
╞══════════╡
│1352709975│
└──────────┘
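
As a sanity check we can go back the other way. Assuming the same APOC release also has apoc.date.format, formatting the timestamp with the same pattern should give us the original string back:

return apoc.date.format(1352709975, 's', "yyyy-MM-dd'T'HH:mm:ss'Z'")
AS formatted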

Written by Mark Needham

March 6th, 2017 at 8:52 pm

Posted in neo4j

Neo4j: Graphing the ‘My name is…I work’ Twitter meme

Over the last few days I’ve been watching the chain of ‘My name is…’ tweets kicked off by DHH with interest. As I understand it, the idea is to show that coding interview riddles/hard tasks on a whiteboard are ridiculous.

Other people quoted that tweet and added their own piece, and yesterday Eduardo Hernacki suggested that traversing this chain of tweets seemed tailor-made for Neo4j.

Michael was quickly on the scene and created a Cypher query which calls the Twitter API and creates a Neo4j graph from the resulting JSON response. The only tricky bit is creating a ‘bearer token’ but Jason Kotchoff has a helpful gist showing how to generate one from your Twitter consumer key and consumer secret.

Now that we’ve got our bearer token let’s create a parameter to store it. Type the following in the Neo4j browser:

:param bearer: '<your-bearer-token-goes-here>'

Now we’re ready to query the Twitter API. We’ll start with the search API and find all tweets which contain the text ‘”my name” “I work”‘. That will return a JSON response containing lots of tweets. We’ll then create a node for each tweet it returns, a node for the user who posted the tweet, a node for the tweet it quotes, and relationships to glue them all together.

We’re going to use the apoc.load.jsonParams procedure from the APOC library to help us import the data. If you want to follow along you can use a Neo4j sandbox instance which comes with APOC installed. For your local Neo4j installation, grab the APOC jar and put it into your plugins folder before restarting Neo4j.

This is the query in full:

WITH 'https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=' as url, {bearer} as bearer
 
CALL apoc.load.jsonParams(url + "%22my%20name%22%20is%22%20%22I%20work%22",{Authorization:"Bearer "+bearer},null) yield value
 
UNWIND value.statuses as status
WITH status, status.user as u, status.entities as e
WHERE status.quoted_status_id is not null
 
// create a node for the original tweet
MERGE (t:Tweet {id:status.id}) 
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count
 
// create a node for the author + a POSTED relationship from the author to the tweet
MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)
 
// create a MENTIONED relationship from the tweet to any users mentioned in the tweet
FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))
 
// create a node for the quoted tweet and create a QUOTED relationship from the original tweet to the quoted one
MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)-[:QUOTED]->(q)
 
// repeat the above steps for the quoted tweet
WITH t as t0, status.quoted_status as status WHERE status is not null
WITH t0, status, status.user as u, status.entities as e
 
MERGE (t:Tweet {id:status.id}) 
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count
 
MERGE (t0)-[:QUOTED]->(t)
 
MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)
 
FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))
 
MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)-[:QUOTED]->(q);

The resulting graph looks like this:

MATCH p=()-[r:QUOTED]->() RETURN p LIMIT 25

[Image: graph visualisation of the tweets connected by QUOTED relationships]

A more interesting query would be to find the path from DHH to Eduardo which we can find with the following query:

match path = (dhh:Tweet {id: 834146806594433025})<-[:QUOTED*]-(eduardo:Tweet{id: 836400531983724545})
UNWIND NODES(path) AS tweet
MATCH (tweet)<-[:POSTED]-(user)
RETURN tweet, user

This query:

  • starts from DHH’s tweet
  • traverses all QUOTED relationships until it finds Eduardo’s tweet
  • collects all those tweets and then finds the author
  • returns the tweet and the author

And this is the output:

[Image: the chain of QUOTED tweets from DHH to Eduardo, with each tweet’s author]

I ran a couple of other queries against the Twitter API to hydrate some nodes that we hadn’t set all the properties on – you can see all the queries on this gist.

For the next couple of days I also have a sandbox running at https://10-0-1-157-32898.neo4jsandbox.com/browser/. You can log in using the credentials readonly/twitter.

If you have any questions/suggestions let me know in the comments, @markhneedham on twitter, or email the Neo4j DevRel team – devrel@neo4j.com.

Written by Mark Needham

February 28th, 2017 at 3:50 pm

Posted in neo4j

Neo4j: How do null values even work?

Every now and then I find myself wanting to import a CSV file into Neo4j and I always get confused with how to handle the various null values that can lurk within.

Let’s start with an example that doesn’t have a CSV file in sight. Consider the following list and my attempt to only return null values:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value = null
RETURN value
 
(no changes, no records)

Hmm, that’s weird. I’d have expected that to at least keep the first value in the collection. What about if we do the inverse?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value <> null
RETURN value
 
(no changes, no records)

Still nothing! Let’s try returning the output of our comparisons rather than filtering rows:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value, value = null AS outcome
 
╒═══════╤═════════╕
│"value"│"outcome"│
╞═══════╪═════════╡
│null   │null     │
├───────┼─────────┤
│"null" │null     │
├───────┼─────────┤
│""     │null     │
├───────┼─────────┤
│"Mark" │null     │
└───────┴─────────┘

Ok so that isn’t what we expected. Everything has an ‘outcome’ of ‘null’! What about if we want to check whether the value is the string “Mark”?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value, value = "Mark" AS outcome
 
╒═══════╤═════════╕
│"value"│"outcome"│
╞═══════╪═════════╡
│null   │null     │
├───────┼─────────┤
│"null" │false    │
├───────┼─────────┤
│""     │false    │
├───────┼─────────┤
│"Mark" │true     │
└───────┴─────────┘

From executing this query we learn that if one side of a comparison is null then the return value is always going to be null.

So how do we exclude a row if it’s null?

It turns out we have to use the ‘is’ keyword rather than using the equality operator. Let’s see what that looks like:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value is null
RETURN value
 
╒═══════╕
│"value"│
╞═══════╡
│null   │
└───────┘

And the positive case:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value is not null
RETURN value
 
╒═══════╕
│"value"│
╞═══════╡
│"null" │
├───────┤
│""     │
├───────┤
│"Mark" │
└───────┘

What if we want to get rid of empty strings?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value <> ""
RETURN value
 
╒═══════╕
│"value"│
╞═══════╡
│"null" │
├───────┤
│"Mark" │
└───────┘

Interestingly that also gets rid of the null value which I hadn’t expected. But if we look for values matching the empty string:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value = ""
RETURN value
 
╒═══════╕
│"value"│
╞═══════╡
│""     │
└───────┘

The null value isn’t there either! Hmm, what’s going on here:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value, value = "" AS isEmpty, value <> "" AS isNotEmpty
 
╒═══════╤═════════╤════════════╕
│"value"│"isEmpty"│"isNotEmpty"│
╞═══════╪═════════╪════════════╡
│null   │null     │null        │
├───────┼─────────┼────────────┤
│"null" │false    │true        │
├───────┼─────────┼────────────┤
│""     │true     │false       │
├───────┼─────────┼────────────┤
│"Mark" │false    │true        │
└───────┴─────────┴────────────┘

null values seem to get filtered out for every type of equality match unless we explicitly check that a value ‘is null’.
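
If we want null and the empty string handled the same way, coalesce gives us a null-safe comparison by folding null into a default value before the check:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
// coalesce turns null into "" so the comparison never sees a null
WITH value WHERE coalesce(value, "") <> ""
RETURN value

That keeps "null" and "Mark" and drops both the null and the empty string.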

So how do we use this knowledge when we’re parsing CSV files using Neo4j’s LOAD CSV tool?

Let’s say we have a CSV file that looks like this:

$ cat nulls.csv
name,company
"Mark",
"Michael",""
"Will",null
"Ryan","Neo4j"

So none of the first three rows has a usable value for ‘company’: Mark has no value at all, Michael has an empty string, and Will has the literal text ‘null’. Let’s see how LOAD CSV interprets this:

load csv with headers from "file:///nulls.csv" AS row
RETURN row
 
╒═════════════════════════════════╕
│"row"                            │
╞═════════════════════════════════╡
│{"name":"Mark","company":null}   │
├─────────────────────────────────┤
│{"name":"Michael","company":""}  │
├─────────────────────────────────┤
│{"name":"Will","company":"null"} │
├─────────────────────────────────┤
│{"name":"Ryan","company":"Neo4j"}│
└─────────────────────────────────┘

We’ve got the full sweep of all the combinations from above. We’d like to create a Person node for each row but only create a Company node and associated ‘WORKS_FOR’ relationship if an actual company is defined – we don’t want to create a null company.

So we only want to create a company node and ‘WORKS_FOR’ relationship for the Ryan row.

The following query does the trick:

load csv with headers from "file:///nulls.csv" AS row
MERGE (p:Person {name: row.name})
WITH p, row
WHERE row.company <> "" AND row.company <> "null"
MERGE (c:Company {name: row.company})
MERGE (p)-[:WORKS_FOR]->(c)
 
Added 5 labels, created 5 nodes, set 5 properties, created 1 relationship, statement completed in 117 ms.

And if we visualise what’s been created:

[Image: four Person nodes, with Ryan connected to the Neo4j Company node by a WORKS_FOR relationship]

Perfect. Perhaps this behaviour is obvious but it always trips me up so hopefully it’ll be useful to someone else as well!
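
As a footnote, if the implicit null filtering in that WHERE clause feels too subtle to rely on, the same check can be written explicitly with coalesce. This is a variant sketch rather than the query I originally ran:

load csv with headers from "file:///nulls.csv" AS row
MERGE (p:Person {name: row.name})
WITH p, row
// fold the missing value into "" so all three bad cases are handled visibly
WHERE coalesce(row.company, "") <> "" AND row.company <> "null"
MERGE (c:Company {name: row.company})
MERGE (p)-[:WORKS_FOR]->(c)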

There’s also a section on the Neo4j developer pages describing even more null scenarios that’s worth checking out.

Written by Mark Needham

February 22nd, 2017 at 11:28 pm

Posted in neo4j

Neo4j: Analysing a CSV file using LOAD CSV and Cypher

Last week we ran our first online meetup for several years and I wanted to analyse the stats that YouTube lets you download for an event.

The file I downloaded looked like this:

$ cat ~/Downloads/youtube_stats_pW9boJoUxO0.csv 
Video IDs:, pW9boJoUxO0, Start time:, Wed Feb 15 08:57:55 2017, End time:, Wed Feb 15 10:03:10 2017
Playbacks, Peak concurrent viewers, Total view time (hours), Average session length (minutes)
348, 112, 97.125, 16.7456896552, 
 
Country code, AR, AT, BE, BR, BY, CA, CH, CL, CR, CZ, DE, DK, EC, EE, ES, FI, FR, GB, HU, IE, IL, IN, IT, LB, LU, LV, MY, NL, NO, NZ, PK, PL, QA, RO, RS, RU, SE, TR, US, VN, ZA
Playbacks, 2, 2, 1, 14, 1, 10, 2, 1, 1, 1, 27, 1, 1, 1, 3, 1, 25, 54, 1, 4, 6, 8, 1, 1, 1, 1, 1, 23, 1, 1, 1, 1, 1, 1, 2, 6, 22, 1, 114, 1, 1
Peak concurrent viewers, 2, 1, 1, 4, 1, 5, 1, 1, 0, 0, 11, 1, 1, 1, 2, 1, 6, 25, 1, 3, 3, 2, 1, 1, 1, 1, 1, 9, 1, 1, 0, 1, 0, 1, 1, 3, 7, 0, 44, 1, 0
Total view time (hours), 1.075, 0.0166666666667, 0.175, 2.58333333333, 0.00833333333333, 3.01666666667, 0.858333333333, 0.0583333333333, 0.0, 0.0, 8.69166666667, 0.8, 0.0166666666667, 0.0583333333333, 0.966666666667, 0.0166666666667, 4.20833333333, 20.8333333333, 0.00833333333333, 1.39166666667, 1.75, 0.766666666667, 0.00833333333333, 0.15, 0.0333333333333, 1.05833333333, 0.0333333333333, 7.36666666667, 0.0583333333333, 0.916666666667, 0.0, 0.00833333333333, 0.0, 0.00833333333333, 0.4, 1.10833333333, 5.28333333333, 0.0, 32.7333333333, 0.658333333333, 0.0
Average session length (minutes), 32.25, 0.5, 10.5, 11.0714285714, 0.5, 18.1, 25.75, 3.5, 0.0, 0.0, 19.3148148148, 48.0, 1.0, 3.5, 19.3333333333, 1.0, 10.1, 23.1481481481, 0.5, 20.875, 17.5, 5.75, 0.5, 9.0, 2.0, 63.5, 2.0, 19.2173913043, 3.5, 55.0, 0.0, 0.5, 0.0, 0.5, 12.0, 11.0833333333, 14.4090909091, 0.0, 17.2280701754, 39.5, 0.0

I want to look at the country-specific stats so the first 4 lines aren’t interesting to me:

$ tail -n+5 youtube_stats_pW9boJoUxO0.csv > youtube.csv

I then put the youtube.csv file into the import directory of Neo4j and wrote the following query to return a row representing each country and its score for each of the metrics:

load csv with headers from "file:///youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
RETURN key AS country, heading AS key, row[key] AS value
 
╒═════════╤═══════════╤═══════╕
│"country"│"key"      │"value"│
╞═════════╪═══════════╪═══════╡
│" SE"    │"Playbacks"│"22"   │
├─────────┼───────────┼───────┤
│" GB"    │"Playbacks"│"54"   │
├─────────┼───────────┼───────┤
│" FR"    │"Playbacks"│"25"   │
├─────────┼───────────┼───────┤
│" RS"    │"Playbacks"│"2"    │
├─────────┼───────────┼───────┤
│" LV"    │"Playbacks"│"1"    │
└─────────┴───────────┴───────┘

Now I want to create a node representing each country and create a property for each of the metrics. Since the property names are going to be dynamic I’ll make use of the APOC library which I drop into my plugins directory. I then tweaked the query to create the nodes:

load csv with headers from "https://dl.dropboxusercontent.com/u/14493611/youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
WITH key AS country, heading AS key, row[key] AS value
MERGE (c:Country {name: replace(country, " ", "")})
WITH *
CALL apoc.create.setProperty(c, key, toInteger(value))
YIELD node
RETURN COUNT(*)

We can now see which country provided the most viewers:

MATCH (n:Country) 
RETURN n.name, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10
 
╒════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕
│"n.name"│"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│
╞════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡
│"US"    │"114"      │"32"             │"44"             │"17"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"GB"    │"54"       │"20"             │"25"             │"23"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"DE"    │"27"       │"8"              │"11"             │"19"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"FR"    │"25"       │"4"              │"6"              │"10"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"NL"    │"23"       │"7"              │"9"              │"19"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"SE"    │"22"       │"5"              │"7"              │"14"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"BR"    │"14"       │"2"              │"4"              │"11"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"CA"    │"10"       │"3"              │"5"              │"18"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"IN"    │"8"        │"0"              │"2"              │"5"             │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"IL"    │"6"        │"1"              │"3"              │"17"            │
└────────┴───────────┴─────────────────┴─────────────────┴────────────────┘

The United States is first, unsurprisingly, followed by the UK, Germany, and France. We ran the meetup at 5pm UK time so it was a friendly enough time for this side of the globe, but not so friendly for Asia or Australia, so it’s not too surprising we don’t see anybody from there!

For my last trick I wanted to see the full names of the countries, so I downloaded the 2-digit codes for each country along with their full names.

I then updated my graph:

load csv with headers from "file:///countries.csv" AS row
MATCH (c:Country {name: row.Code})
SET c.fullName = row.Name;
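
Since the codes in the graph had their leading spaces stripped earlier, it’s worth a quick check that every country matched a row in countries.csv; any code that didn’t will be missing a fullName:

MATCH (c:Country)
WHERE c.fullName is null
RETURN c.name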

Now let’s re-run our query and show the country fullnames instead:

MATCH (n:Country) 
RETURN n.fullName, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10
 
╒════════════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕
│"n.fullName"    │"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│
╞════════════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡
│"United States" │"114"      │"32"             │"44"             │"17"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"United Kingdom"│"54"       │"20"             │"25"             │"23"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Germany"       │"27"       │"8"              │"11"             │"19"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"France"        │"25"       │"4"              │"6"              │"10"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Netherlands"   │"23"       │"7"              │"9"              │"19"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Sweden"        │"22"       │"5"              │"7"              │"14"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Brazil"        │"14"       │"2"              │"4"              │"11"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Canada"        │"10"       │"3"              │"5"              │"18"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"India"         │"8"        │"0"              │"2"              │"5"             │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Israel"        │"6"        │"1"              │"3"              │"17"            │
└────────────────┴───────────┴─────────────────┴─────────────────┴────────────────┘

And that’s the end of my analysis with no relationships in sight!

Written by Mark Needham

February 19th, 2017 at 10:39 pm

Posted in neo4j

Neo4j: Graphing the ThoughtWorks Technology Radar

For a bit of Christmas holiday fun I thought it’d be cool to create a graph of the different blips on the ThoughtWorks Technology Radar and how the recommendations have changed over time.

I wrote a script to extract each blip (e.g. .NET Core) and the recommendation made in each radar that it appeared in. I ended up with a CSV file:

|----------------------------------------------+----------+-------------|
|  technology                                  | date     | suggestion  |
|----------------------------------------------+----------+-------------|
|  AppHarbor                                   | Mar 2012 | Trial       |
|  Accumulate-only data                        | Nov 2015 | Assess      |
|  Accumulate-only data                        | May 2015 | Assess      |
|  Accumulate-only data                        | Jan 2015 | Assess      |
|  Buying solutions you can only afford one of | Mar 2012 | Hold        |
|----------------------------------------------+----------+-------------|

I then wrote a Cypher script to create the following graph model:

[Image: graph model – Recommendation nodes connected to Position, Technology, and Date nodes]

WITH ["Hold", "Assess", "Trial", "Adopt"] AS positions
UNWIND RANGE (0, size(positions) - 2) AS index
WITH positions[index] AS pos1, positions[index + 1] AS pos2
MERGE (position1:Position {value: pos1})
MERGE (position2:Position {value: pos2})
MERGE (position1)-[:NEXT]->(position2);
 
load csv with headers from "file:///blips.csv" AS row
MATCH (position:Position {value:  row.suggestion })
MERGE (tech:Technology {name:  row.technology })
MERGE (date:Date {value: row.date})
MERGE (recommendation:Recommendation {
  id: tech.name + "_" + date.value + "_" + position.value})
MERGE (recommendation)-[:ON_DATE]->(date)
MERGE (recommendation)-[:POSITION]->(position)
MERGE (recommendation)-[:TECHNOLOGY]->(tech);
 
match (date:Date)
SET date.timestamp = apoc.date.parse(date.value, "ms", "MMM yyyy");
 
MATCH (date:Date)
WITH date
ORDER BY date.timestamp
WITH COLLECT(date) AS dates
UNWIND range(0, size(dates)-2) AS index
WITH dates[index] as month1, dates[index+1] AS month2
MERGE (month1)-[:NEXT]->(month2);
 
MATCH (tech)<-[:TECHNOLOGY]-(reco:Recommendation)-[:ON_DATE]->(date)
WITH tech, reco, date
ORDER BY tech.name, date.timestamp
WITH tech, COLLECT(reco) AS recos
UNWIND range(0, size(recos)-2) AS index
WITH recos[index] AS reco1, recos[index+1] AS reco2
MERGE (reco1)-[:NEXT]->(reco2);

Note that I installed the APOC procedures library so that I could convert the string representation of a date into a timestamp using the apoc.date.parse function. The blips.csv file needs to go in the import directory of Neo4j.

Now we’re ready to write some queries.

The Technology Radar has 4 positions that can be taken for a given technology: Hold, Assess, Trial, and Adopt:

  • Hold: Proceed with caution.
  • Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
  • Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
  • Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.
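
As a quick sanity check that the import picked all of these up, we can count the recommendations sitting at each position (assuming the import script above has been run):

MATCH (reco:Recommendation)-[:POSITION]->(position:Position)
RETURN position.value AS position, COUNT(*) AS recommendations
ORDER BY recommendations DESC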

I was curious whether there had ever been a technology where the advice was initially to ‘Hold’ but had later changed to ‘Assess’. I wrote the following query to find out:

MATCH (pos1:Position {value:"Hold"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Assess"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;
 
╒════════════╤══════════════╕
│"technology"│"dateOfChange"│
╞════════════╪══════════════╡
│"Azure"     │"Aug 2010"    │
└────────────┴──────────────┘

Only Azure! The page doesn’t have any explanation for the initial ‘Hold’ advice in April 2010 which was presumably just before ‘the cloud’ became prominent. What about the other way around? Are there any technologies where the suggestion was initially to ‘Assess’ but later to ‘Hold’?

MATCH (pos1:Position {value:"Assess"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Hold"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;
 
╒═══════════════════════════════════╤══════════════╕
│"technology"                       │"dateOfChange"│
╞═══════════════════════════════════╪══════════════╡
│"RIA"                              │"Apr 2010"    │
├───────────────────────────────────┼──────────────┤
│"Backbone.js"                      │"Oct 2012"    │
├───────────────────────────────────┼──────────────┤
│"Pace-layered Application Strategy"│"Nov 2015"    │
├───────────────────────────────────┼──────────────┤
│"SPDY"                             │"May 2015"    │
├───────────────────────────────────┼──────────────┤
│"AngularJS"                        │"Nov 2016"    │
└───────────────────────────────────┴──────────────┘

A couple of these are Javascript libraries/frameworks so presumably the advice is now to use React instead. Let’s check:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "React.js"
RETURN pos.value, date.value 
ORDER BY date.timestamp
 
╒═══════════╤════════════╕
│"pos.value"│"date.value"│
╞═══════════╪════════════╡
│"Assess"   │"Jan 2015"  │
├───────────┼────────────┤
│"Trial"    │"May 2015"  │
├───────────┼────────────┤
│"Trial"    │"Nov 2015"  │
├───────────┼────────────┤
│"Adopt"    │"Apr 2016"  │
├───────────┼────────────┤
│"Adopt"    │"Nov 2016"  │
└───────────┴────────────┘

Ember is also popular:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "Ember"
RETURN pos.value, date.value 
ORDER BY date.timestamp
 
╒═══════════╤════════════╕
│"pos.value"│"date.value"│
╞═══════════╪════════════╡
│"Assess"   │"May 2015"  │
├───────────┼────────────┤
│"Assess"   │"Nov 2015"  │
├───────────┼────────────┤
│"Trial"    │"Apr 2016"  │
├───────────┼────────────┤
│"Adopt"    │"Nov 2016"  │
└───────────┴────────────┘

Let’s go off on a different tangent: how many technologies were introduced in the most recent radar?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)
WHERE NOT (reco)<-[:NEXT]-()
RETURN COUNT(*) 
 
╒══════════╕
│"COUNT(*)"│
╞══════════╡
│"45"      │
└──────────┘

Wow, 45 new things! How were they spread across the different positions?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)-[:TECHNOLOGY]->(tech), 
      (reco)-[:POSITION]->(position)
WHERE NOT (reco)<-[:NEXT]-()
WITH position, COUNT(*) AS count, COLLECT(tech.name) AS technologies
ORDER BY LENGTH((position)-[:NEXT*]->()) DESC
RETURN position.value, count, technologies
 
╒════════════════╤═══════╤══════════════════════════════════════════════╕
│"position.value"│"count"│"technologies"                                │
╞════════════════╪═══════╪══════════════════════════════════════════════╡
│"Hold"          │"1"    │["Anemic REST"]                               │
├────────────────┼───────┼──────────────────────────────────────────────┤
│"Assess"        │"28"   │["Nuance Mix","Micro frontends","Three.js","Sc│
│                │       │ikit-learn","WebRTC","ReSwift","Vue.js","Elect│
│                │       │ron","Container security scanning","wit.ai","D│
│                │       │ifferential privacy","Rapidoid","OpenVR","AWS │
│                │       │Application Load Balancer","Tarantool","IndiaS│
│                │       │tack","Ethereum","axios","Bottled Water","Cass│
│                │       │andra carefully","ECMAScript 2017","FBSnapshot│
│                │       │Testcase","Client-directed query","JuMP","Cloj│
│                │       │ure.spec","HoloLens","Android-x86","Physical W│
│                │       │eb"]                                          │
├────────────────┼───────┼──────────────────────────────────────────────┤
│"Trial"         │"13"   │["tmate","Lightweight Architecture Decision Re│
│                │       │cords","APIs as a product","JSONassert","Unity│
│                │       │ beyond gaming","Galen","Enzyme","Quick and Ni│
│                │       │mble","Talisman","fastlane","Auth0","Pa11y","P│
│                │       │hoenix"]                                      │
├────────────────┼───────┼──────────────────────────────────────────────┤
│"Adopt"         │"3"    │["Grafana","Babel","Pipelines as code"]       │
└────────────────┴───────┴──────────────────────────────────────────────┘

Lots of new things to explore over the holidays! The CSV files, import script, and queries used in this post are all available on github if you want to play around with them.

Written by Mark Needham

December 23rd, 2016 at 5:40 pm

Posted in neo4j
