Mark Needham

Thoughts on Software Development

Neo4j: Cypher – Getting the hang of MERGE

with 3 comments

I’ve been trying to get the hang of cypher’s MERGE function and started out by writing a small file to import some people with random properties using the java-faker library.

public class Merge {
    private static Label PERSON = DynamicLabel.label("Person");
 
    public static void main(String[] args) throws IOException {
        File dbFile = new File("/tmp/test-db");
        FileUtils.deleteRecursively(dbFile);
 
        Faker faker = new Faker();
        Random random = new Random();
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(dbFile.getPath());
 
        Transaction tx = db.beginTx();
 
        for (int i = 0; i < 100000; i++) {
            Node person = db.createNode(PERSON);
 
            person.setProperty("name", faker.name());
            person.setProperty("firstName", faker.firstName());
            person.setProperty("lastName", faker.lastName());
            person.setProperty("country", faker.country());
            person.setProperty("age", random.nextInt(50));
        }
 
        tx.success();
        tx.close();
    }
}

We can write the following query to get back a sample of the people that have been imported:

$ MATCH (p:Person) RETURN p LIMIT 5;
==> +------------------------------------------------------------------------------------------------------------------+
==> | p                                                                                                                |
==> +------------------------------------------------------------------------------------------------------------------+
==> | Node[1344]{name:"Benton Swaniawski",firstName:"Rossie",lastName:"Ankunding",country:"Guadeloupe",age:30}         |
==> | Node[1345]{name:"Dagmar Bartell",firstName:"Ashlynn",lastName:"Watsica",country:"French Guiana",age:35}          |
==> | Node[1346]{name:"Ms. Missouri Gaylord",firstName:"Muriel",lastName:"Streich",country:"Chile",age:43}             |
==> | Node[1347]{name:"Melvina Heathcote",firstName:"Geovanni",lastName:"Marks",country:"United Arab Emirates",age:33} |
==> | Node[1348]{name:"Brendan Schaefer",firstName:"Dayne",lastName:"Haley",country:"Tokelau",age:24}                  |
==> +------------------------------------------------------------------------------------------------------------------+

We can use the MERGE function to ensure that a node with specific properties exists so we might write something like the following:

MERGE (p:Person {name: "Benton Swaniawski",
                 firstName:"Rossie",
                 lastName:"Ankunding", 
                 country:"Guadeloupe",
                 age:30})
RETURN p

If we have a look at the PROFILE output of the query we’d see something like the following:

UpdateGraph(commands=["
	MergeNodeAction(
		p,
		Map(firstName(1) -> Literal(Rossie), country(3) -> Literal(Guadeloupe), 
		name(0) -> Literal(Benton Swaniawski), 
		lastName(2) -> Literal(Ankunding), 
		age(4) -> Literal(30)),
		List(Person(0)),
		ArrayBuffer(Property(p,lastName(2)) == Literal(Ankunding), 
		Property(p,name(0)) == Literal(Benton Swaniawski), 
		Property(p,age(4)) == Literal(30), 
		Property(p,country(3)) == Literal(Guadeloupe),
		Property(p,firstName(1)) == Literal(Rossie)),
		List(LabelAction(p,LabelSetOp,List(Person(0))), 
		PropertySetAction(Property(p,name),Literal(Benton Swaniawski)), 
		PropertySetAction(Property(p,country),Literal(Guadeloupe)), 
		PropertySetAction(Property(p,age),Literal(30)), 
		PropertySetAction(Property(p,lastName),Literal(Ankunding)), 
		PropertySetAction(Property(p,firstName),
		Literal(Rossie))),
		List(),
		Some(PlainMergeNodeProducer(<function2>)))"], _rows=1, _db_hits=100219)

The bit that stands out is that there were 100,219 db_hits which slows down the query considerably.

If we want to use MERGE then we need to make sure that we have an index or constraint on one of the properties e.g.

$ CREATE INDEX ON :Person(name);
==> +-------------------+
==> | No data returned. |
==> +-------------------+
==> Indexes added: 1

If we look at the profile of that we’ll see that the number of db_hits has reduced as it’s now using the index to do part of the lookup that the MERGE requires:

UpdateGraph(commands=["
	MergeNodeAction(
		p,
		Map(firstName(1) -> Literal(Rossie), country(3) -> Literal(Guadeloupe), 
		name(0) -> Literal(Benton Swaniawski), 
		... 
		Some(PlainMergeNodeProducer(<function2>)))"], _rows=1, _db_hits=4)

We can go one step further by only including the property that’s acting as our ‘key’ (i.e. name) in the first part of the statement and setting the other properties only if necessary:

MERGE (p:Person {name: "Benton Swaniawski"})
ON CREATE SET p.firstName="Rossie", 
              p.lastName="Ankunding", 
              p.country="Guadeloupe",
              p.age=30
RETURN p

If we profile that query we can see that things have improved:

 UpdateGraph(commands=["MergeNodeAction(
	p,
	Map(name(0) -> Literal(Benton Swaniawski)),
	List(Person(0)),ArrayBuffer(),
	List(LabelAction(p,LabelSetOp,List(Person(0))), 
	PropertySetAction(Property(p,name),Literal(Benton Swaniawski)), 
	PropertySetAction(Property(p,firstName),Literal(Rossie)), 
	PropertySetAction(Property(p,lastName),Literal(Ankunding)), 
	PropertySetAction(Property(p,country),Literal(Guadeloupe)), 
	PropertySetAction(Property(p,age),Literal(30))),
	List(),
	Some(PlainMergeNodeProducer(<function2>)))"], _rows=1, _db_hits=0)

In some cases we might want to update a property every time that ‘key’ gets matched in a MERGE statement which we could do like so:

MERGE (p:Person {name: "Benton Swaniawski"})
ON MATCH SET p.times = COALESCE(p.times, 0) + 1
RETURN p

You can also use MERGE to create relationships but for now I just wanted to explore how it should be used in the context of nodes which I think I’ve now figured out.

Always happy to take tips on how to do things better though!

Be Sociable, Share!

Written by Mark Needham

December 10th, 2013 at 11:46 pm

Posted in neo4j

Tagged with ,

  • http://jotomo.de/ Johannes Mockenhaupt

    Just a minor point (style point) on transactions. Rather than writing

    Transaction tx = db.beginTx();
    // stuff
    tx.success();
    tx.close();

    the new and encouraged 2.0 way(tm) is to use Java 7’s try-with-resources construct:

    try (Transaction tx = db.beginTx()) {
    // stuff
    tx.success();
    }

    which IMHO reads very nicely and not only handles the success case, but also takes care of properly cleaning up the transaction if an exception occurs (see Javadoc of Transaction for more details).

  • http://www.markhneedham.com/blog Mark Needham

    @jotomo:disqus yep you’re right. I’d never used the try-with-resources construct before but since your comment I’ve been using it all over the place so thanks for the tip!

  • Pingback: Neo4j: Cypher – Using MERGE with schema indexes/constraints at Mark Needham