Mark Needham

Thoughts on Software Development

Neo4j: LOAD CSV – Handling empty columns

with 3 comments

A common problem that people encounter when trying to import CSV files into Neo4j using Cypher’s LOAD CSV command is how to handle empty or ‘null’ entries in said files.

For example let’s try and import the following file which has 3 columns, 1 populated, 2 empty:

$ cat /tmp/foo.csv
a,b,c
mark,,
load csv with headers from "file:/tmp/foo.csv" as row
MERGE (p:Person {a: row.a})
SET p.b = row.b, p.c = row.c
RETURN p

When we execute that query we’ll see that our Person node has properties ‘b’ and ‘c’ with no value:

==> +-----------------------------+
==> | p                           |
==> +-----------------------------+
==> | Node[5]{a:"mark",b:"",c:""} |
==> +-----------------------------+
==> 1 row
==> Nodes created: 1
==> Properties set: 3
==> Labels added: 1
==> 26 ms

That isn’t what we want – we don’t want those properties to be set unless they have a value.

TO achieve this we need to introduce a conditional when setting the ‘b’ and ‘c’ properties. We’ll assume that ‘a’ is always present as that’s the key for our Person nodes.

The following query will do what we want:

load csv with headers from "file:/tmp/foo.csv" as row
MERGE (p:Person {a: row.a})
FOREACH(ignoreMe IN CASE WHEN trim(row.b) <> "" THEN [1] ELSE [] END | SET p.b = row.b)
FOREACH(ignoreMe IN CASE WHEN trim(row.c) <> "" THEN [1] ELSE [] END | SET p.c = row.c)
RETURN p

Since there’s no if or else statements in cypher we create our own conditional statement by using FOREACH. If there’s a value in the CSV column then we’ll loop once and set the property and if not we won’t loop at all and therefore no property will be set.

==> +-------------------+
==> | p                 |
==> +-------------------+
==> | Node[4]{a:"mark"} |
==> +-------------------+
==> 1 row
==> Nodes created: 1
==> Properties set: 1
==> Labels added: 1
Be Sociable, Share!

Written by Mark Needham

August 22nd, 2014 at 12:51 pm

Posted in neo4j

Tagged with

  • Marco Guado

    What is the best strategy to import data to the deployed server.

  • Nice, Mark! And, if we want also a (hence the entire node) to be created only if present in the csv (a good use case is if we have a openRefine csv using records as sets of rows), we could use the same syntax putting the entire MERGE clause in the FOREACH, as this:

    load csv with headers from "file:/tmp/foo.csv" as row
    FOREACH(ignoreMe IN CASE WHEN trim(row.a) "" THEN [1] ELSE [] END | MERGE (p:Person {a: row.a}) )

    Is it right? (I gave it a try and it works).

  • Arijeet Patidar

    Is there a better way to do this than writing 1000 case statements, if instead of only two properties ‘b’ and ‘c’ I had to do this check for 1000 properties?