Mark Needham

Thoughts on Software Development

Neo4j: Summarising neo4j-shell output

without comments

I frequently find myself trying to optimise a set of cypher queries and I tend to group them together in a script that I fed to the Neo4j shell.

When tweaking the queries it’s easy to make a mistake and end up not creating the same data so I decided to write a script which will show me the aggregates of all the commands executed.

I want to see the number of constraints created, indexes added, nodes, relationships and properties created. The first 2 don’t need to match across the scripts but the latter 3 should be the same.

I put together the following script:

import re
import sys
from tabulate import tabulate
 
lines = sys.stdin.readlines()
 
def search(term, line):
    m =  re.match(term + ": (.*)", line)
    return (int(m.group(1)) if m else 0)
 
nodes_created, relationships_created, constraints_added, indexes_added, labels_added, properties_set = 0, 0, 0, 0, 0, 0
for line in lines:
    nodes_created = nodes_created + search("Nodes created", line)
    relationships_created = relationships_created + search("Relationships created", line)
    constraints_added = constraints_added + search("Constraints added", line)
    indexes_added = indexes_added + search("Indexes added", line)
    labels_added = labels_added + search("Labels added", line)
    properties_set = properties_set + search("Properties set", line)
 
    time_match = re.match("real.*([0-9]+m[0-9]+\.[0-9]+s)$", line)
 
    if time_match:
        time = time_match.group(1)
 
table = [
            ["Constraints added", constraints_added],
            ["Indexes added", indexes_added],
            ["Nodes created", nodes_created],
            ["Relationships created", relationships_created],
            ["Labels added", labels_added],
            ["Properties set", properties_set],
            ["Time", time]
         ]
print tabulate(table)

Its input is the piped output of the neo4j-shell command which will contain a description of all the queries it executed.

$ cat import.sh
#!/bin/sh
 
{ ./neo4j-community-2.2.3/bin/neo4j stop; } 2>&1
rm -rf neo4j-community-2.2.3/data/graph.db/
{ ./neo4j-community-2.2.3/bin/neo4j start; } 2>&1
{ time ./neo4j-community-2.2.3/bin/neo4j-shell --file $1; } 2>&1

We can use the script in two ways.

Either we can pipe the output of our shell straight into it and just get the summary e.g.

$ ./import.sh local.import.optimised.cql | python summarise.py
 
---------------------  ---------
Constraints added      5
Indexes added          1
Nodes created          13249
Relationships created  32227
Labels added           21715
Properties set         36480
Time                   0m17.595s
---------------------  ---------

…or we can make use of the ‘tee’ function in Unix and pipe the output into stdout and into the file and then either tail the file on another window or inspect it afterwards to see the detailed timings. e.g.

$ ./import.sh local.import.optimised.cql | tee /tmp/output.txt |  python summarise.py
 
---------------------  ---------
Constraints added      5
Indexes added          1
Nodes created          13249
Relationships created  32227
Labels added           21715
Properties set         36480
Time                   0m11.428s
---------------------  ---------
$ tail -f /tmp/output.txt
+-------------+
| appearances |
+-------------+
| 3771        |
+-------------+
1 row
Nodes created: 3439
Properties set: 3439
Labels added: 3439
289 ms
+------------------------------------+
| appearances -> player, match, team |
+------------------------------------+
| 3771                               |
+------------------------------------+
1 row
Relationships created: 10317
1006 ms
...

My only dependency is the tabulate package to get the pretty table:

$ cat requirements.txt
 
tabulate==0.7.5

The cypher script I’m running creates a BBC football graph which is available as a github project. Feel free to grab it and play around – any problems let me know!

Written by Mark Needham

August 21st, 2015 at 8:59 pm

Posted in neo4j

Tagged with

Python: Extracting Excel spreadsheet into CSV files

without comments

I’ve been playing around with the Road Safety open data set and the download comes with several CSV files and an excel spreadsheet containing the legend.

There are 45 sheets in total and each of them looks like this:

2015 08 17 23 33 19

I wanted to create a CSV file for each sheet so that I can import the data set into Neo4j using the LOAD CSV command.

I came across the Python Excel website which pointed me at the xlrd library since I’m working with a pre 2010 Excel file.


The main documentation is very extensive but I found the github example much easier to follow.

I ended up with the following script which iterates through all but the first two sheets in the spreadsheet – the first two sheets contain instructions rather than data:

from xlrd import open_workbook
import csv
 
wb = open_workbook('Road-Accident-Safety-Data-Guide-1979-2004.xls')
 
for i in range(2, wb.nsheets):
    sheet = wb.sheet_by_index(i)
    print sheet.name
    with open("data/%s.csv" %(sheet.name.replace(" ","")), "w") as file:
        writer = csv.writer(file, delimiter = ",")
        print sheet, sheet.name, sheet.ncols, sheet.nrows
 
        header = [cell.value for cell in sheet.row(0)]
        writer.writerow(header)
 
        for row_idx in range(1, sheet.nrows):
            row = [int(cell.value) if isinstance(cell.value, float) else cell.value
                   for cell in sheet.row(row_idx)]
            writer.writerow(row)

I’ve replaced spaces in the sheet name so that the file name on a disk is a bit easier to work with. For some reason the numeric values were all floats whereas I wanted them as ints so I had to explicitly apply that transformation.

Here are a few examples of what the CSV files look like:

$ cat data/1stPointofImpact.csv
code,label
0,Did not impact
1,Front
2,Back
3,Offside
4,Nearside
-1,Data missing or out of range
 
$ cat data/RoadType.csv
code,label
1,Roundabout
2,One way street
3,Dual carriageway
6,Single carriageway
7,Slip road
9,Unknown
12,One way street/Slip road
-1,Data missing or out of range
 
$ cat data/Weather.csv
code,label
1,Fine no high winds
2,Raining no high winds
3,Snowing no high winds
4,Fine + high winds
5,Raining + high winds
6,Snowing + high winds
7,Fog or mist
8,Other
9,Unknown
-1,Data missing or out of range

And that’s it. Not too difficult!

Written by Mark Needham

August 19th, 2015 at 11:27 pm

Posted in Python

Tagged with

Unix: Stripping first n bytes in a file / Byte Order Mark (BOM)

without comments

I’ve previously written a couple of blog posts showing how to strip out the byte order mark (BOM) from CSV files to make loading them into Neo4j easier and today I came across another way to clean up the file using tail.

The BOM is 3 bytes long at the beginning of the file so if we know that a file contains it then we can strip out those first 3 bytes tail like this:

$ time tail -c +4 Casualty7904.csv > Casualty7904_stripped.csv
 
real	0m31.945s
user	0m31.370s
sys	0m0.518s

The -c command is described thus;

-c number
             The location is number bytes.

So in this case we start reading at byte 4 (i.e. skipping the first 3 bytes) and then direct the output into a new file.

Although using tail is quite simple, it took 30 seconds to process a 300MB CSV file which might actually be slower than opening the file with a Hex editor and manually deleting the bytes!

Written by Mark Needham

August 19th, 2015 at 11:27 pm

Posted in Shell Scripting

Tagged with

Unix: Redirecting stderr to stdout

without comments

I’ve been trying to optimise some Neo4j import queries over the last couple of days and as part of the script I’ve been executed I wanted to redirect the output of a couple of commands into a file to parse afterwards.

I started with the following script which doesn’t do any explicit redirection of the output:

#!/bin/sh
 
./neo4j-community-2.2.3/bin/neo4j start

Now let’s run that script and redirect the output to a file:

$ ./foo.sh > /tmp/output.txt
Unable to find any JVMs matching version "1.7".
 
$ cat /tmp/output.txt
Starting Neo4j Server...WARNING: not changing user
process [48230]... waiting for server to be ready.... OK.
http://localhost:7474/ is ready.

So the line about not finding a matching JVM is being printed to stderr. That’s reasonably easy to fix:

#!/bin/sh
 
./neo4j-community-2.2.3/bin/neo4j start 2>&1

Let’s run the script again:

$ ./foo.sh > /tmp/output.txt
 
$ cat /tmp/output.txt
Unable to find any JVMs matching version "1.7".
Starting Neo4j Server...WARNING: not changing user
process [47989]... waiting for server to be ready.... OK.
http://localhost:7474/ is ready.

Great, that worked as expected. Next I extended the script to stop Neo4j, delete all it’s data, start it again and execute a cypher script:

#!/bin/sh
 
./neo4j-community-2.2.3/bin/neo4j start 2>&1
rm -rf neo4j-community-2.2.3/data/graph.db/
./neo4j-community-2.2.3/bin/neo4j start 2>&1
time ./neo4j-community-2.2.3/bin/neo4j-shell --file foo.cql 2>&1

Let’s run that script and redirect the output:

$ ./foo.sh > /tmp/output.txt
Unable to find any JVMs matching version "1.7".
 
real	0m0.604s
user	0m0.334s
sys	0m0.054s
 
$ cat /tmp/output.txt
Unable to find any JVMs matching version "1.7".
Another server-process is running with [50614], cannot start a new one. Exiting.
Unable to find any JVMs matching version "1.7".
Another server-process is running with [50614], cannot start a new one. Exiting.
+---------+
| "hello" |
+---------+
| "hello" |
+---------+
1 row
4 ms

It looks like our stderr -> stdout redirection on the last line didn’t work. My understanding is that the ‘time’ command swallows all the arguments that follow whereas we want the redirection to be run afterwards.

We can work our way around this problem by putting the actual command in a code block and redirected the output of that:

#!/bin/sh
 
./neo4j-community-2.2.3/bin/neo4j start 2>&1
rm -rf neo4j-community-2.2.3/data/graph.db/
./neo4j-community-2.2.3/bin/neo4j start 2>&1
{ time ./neo4j-community-2.2.3/bin/neo4j-shell --file foo.cql; } 2>&1
$ ./foo.sh  > /tmp/output.txt
 
$ cat /tmp/output.txt
Unable to find any JVMs matching version "1.7".
Another server-process is running with [50614], cannot start a new one. Exiting.
Unable to find any JVMs matching version "1.7".
Another server-process is running with [50614], cannot start a new one. Exiting.
Unable to find any JVMs matching version "1.7".
+---------+
| "hello" |
+---------+
| "hello" |
+---------+
1 row
4 ms
 
real	0m0.615s
user	0m0.316s
sys	0m0.050s

Much better!

Written by Mark Needham

August 15th, 2015 at 3:55 pm

Posted in Shell Scripting

Tagged with

Sed: Using environment variables

without comments

I’ve been playing around with the BBC football data set that I wrote about a couple of months ago and I wanted to write some code that would take the import script and replace all instances of remote URIs with a file system path.

For example the import file contains several lines similar to this:

LOAD CSV WITH HEADERS 
FROM "https://raw.githubusercontent.com/mneedham/neo4j-bbc/master/data/matches.csv" 
AS row

And I want that to read:

LOAD CSV WITH HEADERS 
FROM "file:///Users/markneedham/repos/neo4j-bbc/data/matches.csv" 
AS row

The start of that path also happens to be my working directory:

$ echo $PWD
/Users/markneedham/repos/neo4j-bbc

So I wanted to write a script that would look for occurrences of ‘https://raw.githubusercontent.com/mneedham/neo4j-bbc/master’ and replace it with $PWD. I’m a fan of Sed so I thought I’d try and use it to solve my problem.

The first thing we can do to make life easy is to change the default delimiter. Sed usually uses ‘/’ to separate parts of the command but since we’re using URIs that’s going to be horrible so we’ll use an underscore instead.

For a first cut I tried just removing that first part of the URI but not replacing it with anything in particular:

$ sed 's_https://raw.githubusercontent.com/mneedham/neo4j-bbc/master__' import.cql
 
$ sed 's_https://raw.githubusercontent.com/mneedham/neo4j-bbc/master__' import.cql | grep LOAD
LOAD CSV WITH HEADERS FROM "/data/matches.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/fouls.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "/data/subs.csv" AS row

Cool! That worked as expected. Now let’s try and replace it with $PWD:

$ sed 's_https://raw.githubusercontent.com/mneedham/neo4j-bbc/master_file://$PWD_' import.cql | grep LOAD
LOAD CSV WITH HEADERS FROM "file://$PWD/data/matches.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/fouls.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file://$PWD/data/subs.csv" AS row

Hmmm that didn’t work as expected. The $PWD is being treated as a literal instead of being evaluated like we want it to be.

It turns out this is a popular question on Stack Overflow and there are lots of suggestions – I tried a few of them and found that single quotes did the trick:

$ sed 's_https://raw.githubusercontent.com/mneedham/neo4j-bbc/master_file://'$PWD'_' import.cql | grep LOAD
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/matches.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/fouls.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/subs.csv" AS row

We could also use double quotes everywhere if we prefer:

$ sed "s_https://raw.githubusercontent.com/mneedham/neo4j-bbc/master_file://"$PWD"_" import.cql | grep LOAD
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/matches.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/players.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/fouls.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/attempts.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/corners.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/cards.csv" AS row
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/repos/neo4j-bbc/data/subs.csv" AS row

Written by Mark Needham

August 13th, 2015 at 7:30 pm

Posted in Shell Scripting

Tagged with

Java: Jersey – java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper. getContextClassLoaderPA()Ljava/security/PrivilegedAction;

without comments

I’ve been trying to put some tests around an Neo4j unmanaged extension I’ve been working on and ran into the following stack trace when launching the server using the Neo4j test harness:

public class ExampleResourceTest {
    @Rule
    public Neo4jRule neo4j = new Neo4jRule()
            .withFixture("CREATE (:Person {name: 'Mark'})")
            .withFixture("CREATE (:Person {name: 'Nicole'})")
            .withExtension( "/unmanaged", ExampleResource.class );
 
    @Test
    public void shouldReturnAllTheNodes() {
        // Given
        URI serverURI = neo4j.httpURI();
        // When
        HTTP.Response response = HTTP.GET(serverURI.resolve("/unmanaged/example/people").toString());
 
        // Then
        assertEquals(200, response.status());
        List content = response.content();
 
        assertEquals(2, content.size());
    }
}
07:51:32.985 [main] WARN  o.e.j.u.component.AbstractLifeCycle - FAILED o.e.j.s.ServletContextHandler@29eda4f8{/unmanaged,null,STARTING}: java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
	at com.sun.jersey.spi.scanning.AnnotationScannerListener.<init>(AnnotationScannerListener.java:94) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.scanning.PathProviderScannerListener.<init>(PathProviderScannerListener.java:59) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:89) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:696) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:674) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:374) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:557) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at javax.servlet.GenericServlet.init(GenericServlet.java:244) ~[javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:612) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHolder.initialize(ServletHolder.java:395) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:871) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.start(Server.java:387) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.doStart(Server.java:354) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.neo4j.server.web.Jetty9WebServer.startJetty(Jetty9WebServer.java:381) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.web.Jetty9WebServer.start(Jetty9WebServer.java:184) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.startWebServer(AbstractNeoServer.java:474) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:230) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerControls.start(InProcessServerControls.java:59) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerBuilder.newServer(InProcessServerBuilder.java:72) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.junit.Neo4jRule$1.evaluate(Neo4jRule.java:64) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:na]
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160) [junit-4.11.jar:na]
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) [junit-rt.jar:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_51]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_51]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_51]
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) [idea_rt.jar:na]
07:51:32.991 [main] WARN  o.e.j.u.component.AbstractLifeCycle - FAILED org.eclipse.jetty.server.handler.HandlerList@2b22a1cc[o.e.j.s.h.MovedContextHandler@5e671e20{/,null,AVAILABLE}, o.e.j.s.ServletContextHandler@29eda4f8{/unmanaged,null,STARTING}, o.e.j.s.ServletContextHandler@62573c86{/db/manage,null,null}, o.e.j.s.ServletContextHandler@2418ba04{/db/data,null,null}, o.e.j.w.WebAppContext@14229fa7{/browser,jar:file:/Users/markneedham/.m2/repository/org/neo4j/app/neo4j-browser/2.2.3/neo4j-browser-2.2.3.jar!/browser,null}, o.e.j.s.ServletContextHandler@2ab0702e{/,null,null}]: java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
	at com.sun.jersey.spi.scanning.AnnotationScannerListener.<init>(AnnotationScannerListener.java:94) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.scanning.PathProviderScannerListener.<init>(PathProviderScannerListener.java:59) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:89) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:696) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:674) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:374) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:557) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at javax.servlet.GenericServlet.init(GenericServlet.java:244) ~[javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:612) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHolder.initialize(ServletHolder.java:395) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:871) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.start(Server.java:387) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.doStart(Server.java:354) [jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.neo4j.server.web.Jetty9WebServer.startJetty(Jetty9WebServer.java:381) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.web.Jetty9WebServer.start(Jetty9WebServer.java:184) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.startWebServer(AbstractNeoServer.java:474) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:230) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerControls.start(InProcessServerControls.java:59) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerBuilder.newServer(InProcessServerBuilder.java:72) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.junit.Neo4jRule$1.evaluate(Neo4jRule.java:64) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:na]
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160) [junit-4.11.jar:na]
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) [junit-rt.jar:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_51]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_51]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_51]
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) [idea_rt.jar:na]
07:51:33.013 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@19962194{HTTP/1.1}{localhost:7475}
07:51:33.014 [main] WARN  o.e.j.u.component.AbstractLifeCycle - FAILED org.eclipse.jetty.server.Server@481e91b6: java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
	at com.sun.jersey.spi.scanning.AnnotationScannerListener.<init>(AnnotationScannerListener.java:94) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.scanning.PathProviderScannerListener.<init>(PathProviderScannerListener.java:59) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:89) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:696) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:674) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:374) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:557) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at javax.servlet.GenericServlet.init(GenericServlet.java:244) ~[javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:612) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHolder.initialize(ServletHolder.java:395) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:871) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298) ~[jetty-servlet-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.start(Server.java:387) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.server.Server.doStart(Server.java:354) ~[jetty-server-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) ~[jetty-util-9.2.4.v20141103.jar:9.2.4.v20141103]
	at org.neo4j.server.web.Jetty9WebServer.startJetty(Jetty9WebServer.java:381) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.web.Jetty9WebServer.start(Jetty9WebServer.java:184) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.startWebServer(AbstractNeoServer.java:474) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:230) [neo4j-server-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerControls.start(InProcessServerControls.java:59) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.internal.InProcessServerBuilder.newServer(InProcessServerBuilder.java:72) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.neo4j.harness.junit.Neo4jRule$1.evaluate(Neo4jRule.java:64) [neo4j-harness-2.2.3.jar:2.2.3]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) [junit-4.11.jar:na]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) [junit-4.11.jar:na]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:na]
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160) [junit-4.11.jar:na]
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) [junit-rt.jar:na]
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) [junit-rt.jar:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_51]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_51]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_51]
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) [idea_rt.jar:na]
 
org.neo4j.server.ServerStartupException: Starting Neo4j Server failed: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:258)
	at org.neo4j.harness.internal.InProcessServerControls.start(InProcessServerControls.java:59)
	at org.neo4j.harness.internal.InProcessServerBuilder.newServer(InProcessServerBuilder.java:72)
	at org.neo4j.harness.junit.Neo4jRule$1.evaluate(Neo4jRule.java:64)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getContextClassLoaderPA()Ljava/security/PrivilegedAction;
	at com.sun.jersey.spi.scanning.AnnotationScannerListener.<init>(AnnotationScannerListener.java:94)
	at com.sun.jersey.spi.scanning.PathProviderScannerListener.<init>(PathProviderScannerListener.java:59)
	at com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79)
	at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104)
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78)
	at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:89)
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:696)
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:674)
	at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203)
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:374)
	at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:557)
	at javax.servlet.GenericServlet.init(GenericServlet.java:244)
	at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:612)
	at org.eclipse.jetty.servlet.ServletHolder.initialize(ServletHolder.java:395)
	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:871)
	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298)
	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
	at org.eclipse.jetty.server.Server.start(Server.java:387)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
	at org.eclipse.jetty.server.Server.doStart(Server.java:354)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
	at org.neo4j.server.web.Jetty9WebServer.startJetty(Jetty9WebServer.java:381)
	at org.neo4j.server.web.Jetty9WebServer.start(Jetty9WebServer.java:184)
	at org.neo4j.server.AbstractNeoServer.startWebServer(AbstractNeoServer.java:474)
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:230)
	... 22 more

I was a bit baffled at first but if we look closely about 5 – 10 lines down the stack trace we can see the mistake I’ve made:

	at com.sun.jersey.api.core.PackagesResourceConfig.(PackagesResourceConfig.java:89) ~[jersey-server-1.19.jar:1.19]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:696) ~[jersey-servlet-1.17.1.jar:1.17.1]
	at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:674) ~[jersey-servlet-1.17.1.jar:1.17.1]

We have different versions of the Jersey libraries. Since we’re writing the extension for Neo4j 2.2.3 we can quickly check which version it depends on:

$ ls -alh neo4j-community-2.2.3/system/lib/ | grep jersey
-rwxr-xr-x@  1 markneedham  staff   426K 22 Jun 04:57 jersey-core-1.19.jar
-rwxr-xr-x@  1 markneedham  staff    52K 22 Jun 05:02 jersey-multipart-1.19.jar
-rwxr-xr-x@  1 markneedham  staff   686K 22 Jun 05:02 jersey-server-1.19.jar
-rwxr-xr-x@  1 markneedham  staff   126K 22 Jun 05:02 jersey-servlet-1.19.jar

So we should’t have 1.17.1 in our project and it was easy enough to find my mistake by looking in the pom file:

<properties>
...
    <jersey.version>1.17.1</jersey.version>
...
</properties>
 
<dependencies>
...
 
    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-servlet</artifactId>
        <version>${jersey.version}</version>
    </dependency>
 
...
</dependencies>

Also easy enough to fix!

<properties>
...
    <jersey.version>1.19</jersey.version>
...
</properties>

You can see an example of this working on github.

Written by Mark Needham

August 11th, 2015 at 6:59 am

Posted in Java

Tagged with ,

Neo4j 2.2.3: Unmanaged extensions – Creating gzipped streamed responses with Jetty

without comments

Back in 2013 I wrote a couple of blog posts showing examples of an unmanaged extension which had a streamed and gzipped response but two years on I realised they were a bit out of date and deserved a refresh.

When writing unmanaged extensions in Neo4j a good rule of thumb is to try and reduce the amount of objects you keep hanging around. In this context this means that we should stream our response to the client as quickly as possible rather than building it up in memory and sending it in one go.

The documentation has a good example showing how to stream a list of colleagues but in this blog post we’ll look at how to do something simpler – we’ll create a couple of nodes representing people and then write an unmanaged extension to return them.

We’ll first create an unmanaged extension which runs a cypher query, iterates through the rows returned and sends them to the client:

@Path("/example")
public class ExampleResource {
    private final GraphDatabaseService db;
    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
 
    public ExampleResource(@Context GraphDatabaseService db) {
        this.db = db;
    }
 
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @Path("/people")
    public Response allNodes() throws IOException {
        StreamingOutput stream = streamQueryResponse("MATCH (n:Person) RETURN n.name AS name");
        return Response.ok().entity(stream).type(MediaType.APPLICATION_JSON).build();
    }
 
    private StreamingOutput streamQueryResponse(final String query) {
        return new StreamingOutput() {
                @Override
                public void write(OutputStream os) throws IOException, WebApplicationException {
                    JsonGenerator jg = OBJECT_MAPPER.getJsonFactory().createJsonGenerator(os, JsonEncoding.UTF8);
                    jg.writeStartArray();
 
                    writeQueryResultTo(query, jg);
 
                    jg.writeEndArray();
                    jg.flush();
                    jg.close();
                }
            };
    }
 
    private void writeQueryResultTo(String query, JsonGenerator jg) throws IOException {
        try (Result result = db.execute(query)) {
            while (result.hasNext()) {
                Map<String, Object> row = result.next();
 
                jg.writeStartObject();
                for (Map.Entry<String, Object> entry : row.entrySet()) {
                    jg.writeFieldName(entry.getKey());
                    jg.writeString(entry.getValue().toString());
                }
                jg.writeEndObject();
            }
        }
    }
}

There’s nothing too complicated going on here although notice that we make much more fine grained calls to the JSON Library rather than created a JSON object in memory and calling ObjectMapper#writeValueAsString on it.

To get this to work we’d build a JAR containing this class, put that into the plugins folder and then add the following property to conf/neo4j-server.properties (or the Neo4j desktop equivalent) before restarting the server:

org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.unmanaged=/unmanaged

We can then test it out like this:

$ curl http://localhost:7474/unmanaged/example/people
[{"name":"Mark"},{"name":"Nicole"}]

I’ve put in a couple of test people nodes – full instructions are available on the github README page.

Next we want to make it possible to send that response in the gzip format. To do that we need to add a GzipFilter to the Neo4j lifecycle. This class has moved to a different namespace in Jetty 9 which Neo4j 2.2.3 depends on, but the following class does the job:

import org.eclipse.jetty.servlets.GzipFilter;
 
public class GZipInitialiser implements SPIPluginLifecycle {
    private WebServer webServer;
 
    @Override
    public Collection<Injectable<?>> start(NeoServer neoServer) {
        webServer = getWebServer(neoServer);
        GzipFilter filter = new GzipFilter();
 
        webServer.addFilter(filter, "/*");
        return Collections.emptyList();
    }
 
    private WebServer getWebServer(final NeoServer neoServer) {
        if (neoServer instanceof AbstractNeoServer) {
            return ((AbstractNeoServer) neoServer).getWebServer();
        }
        throw new IllegalArgumentException("expected AbstractNeoServer");
    }
 
    @Override
    public Collection<Injectable<?>> start(GraphDatabaseService graphDatabaseService, Configuration configuration) {
        throw new IllegalAccessError();
    }
 
    @Override
    public void stop() {
 
    }
}

I needed to include the jersey-servlets JAR in my unmanaged extension JAR in order for this to work correctly. Once we redeploy the JAR and restart Neo4j we can try making the same request as above but with a gzip header:

$ curl -v -H "Accept-Encoding:gzip,deflate" http://localhost:7474/unmanaged/example/people
��V�K�MU�R�M,�V�Ձ��2��sR�jcf(#

We can unpack that on the fly by piping it through gunzip to check we get a sensible result:

$ curl -v -H "Accept-Encoding:gzip,deflate" http://localhost:7474/unmanaged/example/people | gunzip
[{"name":"Mark"},{"name":"Nicole"}]

And there we have it – a gzipped streamed response. All the code is on github so give it a try and give me a shout if it doesn’t work. The fastest way to get me is probably on our new shiny neo4j-users Slack group.

Written by Mark Needham

August 10th, 2015 at 11:57 pm

Posted in neo4j

Tagged with

Record Linkage: Playing around with Duke

without comments

I’ve become quite interesting in record linkage recently and came across the Duke project which provides some tools to help solve this problem. I thought I’d give it a try.

The typical problem when doing record linkage is that we have two records from different data sets which represent the same entity but don’t have a common key that we can use to merge them together. We therefore need to come up with a heuristic that will allow us to do so.

Duke has a few examples showing it in action and I decided to go with the linking countries one. Here we have countries from Dbpedia and the Mondial database and we want to link them together.

The first thing we need to do is build the project:

export JAVA_HOME=`/usr/libexec/java_home`
mvn clean package -DskipTests

At the time of writing this will put a zip fail containing everything we need at duke-dist/target/. Let’s unpack that:

unzip duke-dist/target/duke-dist-1.3-SNAPSHOT-bin.zip

Next we need to download the data files and Duke configuration file:

wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-dbpedia.csv
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries.xml
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-mondial.csv
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-test.txt

Now we’re ready to give it a go:

java -cp "duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --testfile=countries-test.txt --testdebug --showmatches countries.xml
 
...
 
NO MATCH FOR:
ID: '7706', NAME: 'guatemala', AREA: '108890', CAPITAL: 'guatemala city',
 
MATCH 0.9825124555160142
ID: '10052', NAME: 'pitcairn islands', AREA: '47', CAPITAL: 'adamstown',
ID: 'http://dbpedia.org/resource/Pitcairn_Islands', NAME: 'pitcairn islands', AREA: '47', CAPITAL: 'adamstown',
 
Correct links found: 200 / 218 (91.7%)
Wrong links found: 0 / 24 (0.0%)
Unknown links found: 0
Percent of links correct 100.0%, wrong 0.0%, unknown 0.0%
Records with no link: 18
Precision 100.0%, recall 91.74311926605505%, f-number 0.9569377990430622

We can look in countries.xml to see how the similarity between records is being calculated:

  <schema>
    <threshold>0.7</threshold>
...
    <property>
      <name>NAME</name>
      <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
      <low>0.09</low>
      <high>0.93</high>
    </property>
    <property>
      <name>AREA</name>
      <comparator>no.priv.garshol.duke.comparators.NumericComparator</comparator>
      <low>0.04</low>
      <high>0.73</high>
    </property>
    <property>
      <name>CAPITAL</name>
      <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
      <low>0.12</low>
      <high>0.61</high>
    </property>
  </schema>

So we’re working out similarity of the capital city and country by calculating their Levenshtein distance i.e. the minimum number of single-character edits required to change one word into the other

This works very well if there is a typo or difference in spelling in one of the data sets. However, I was curious what would happen if the country had two completely different names e.g Cote d’Ivoire is sometimes know as Ivory Coast. Let’s try changing the country name in one of the files:

"19147","Cote dIvoire","Yamoussoukro","322460"
java -cp "duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --testfile=countries-test.txt --testdebug --showmatches countries.xml
 
NO MATCH FOR:
ID: '19147', NAME: 'ivory coast', AREA: '322460', CAPITAL: 'yamoussoukro',

I also tried it out with the BBC and ESPN match reports of the Man Utd vs Tottenham match – the BBC references players by surname, while ESPN has their full names.

When I compared the full name against surname using the Levenshtein comparator there were no matches as you’d expect. I had to split the ESPN names up into first name and surname to get the linking to work.

Equally when I varied the team name’s to be ‘Man Utd’ rather than ‘Manchester United’ and ‘Tottenham’ rather than ‘Tottenham Hotspur’ that didn’t work either.

I think I probably need to write a domain specific comparator but I’m also curious whether I could come up with a bunch of training examples and then train a model to detect what makes two records similar. It’d be less deterministic but perhaps more robust.

Written by Mark Needham

August 8th, 2015 at 10:50 pm

Spark: Convert RDD to DataFrame

without comments

As I mentioned in a previous blog post I’ve been playing around with the Databricks Spark CSV library and wanted to take a CSV file, clean it up and then write out a new CSV file containing some of the columns.

I started by processing the CSV file and writing it into a temporary table:

import org.apache.spark.sql.{SQLContext, Row, DataFrame}
 
val sqlContext = new SQLContext(sc)
val crimeFile = "Crimes_-_2001_to_present.csv"
sqlContext.load("com.databricks.spark.csv", Map("path" -> crimeFile, "header" -> "true")).registerTempTable("crimes")

I wanted to get to the point where I could call the following function which writes a DataFrame to disk:

private def createFile(df: DataFrame, file: String, header: String): Unit = {
  FileUtil.fullyDelete(new File(file))
  val tmpFile = "tmp/" + System.currentTimeMillis() + "-" + file
  df.distinct.save(tmpFile, "com.databricks.spark.csv")
}

The first file only needs to contain the primary type of crime, which we can extract with the following query:

val rows = sqlContext.sql("select `Primary Type` as primaryType FROM crimes LIMIT 10")
 
rows.collect()
res4: Array[org.apache.spark.sql.Row] = Array([ASSAULT], [ROBBERY], [CRIMINAL DAMAGE], [THEFT], [THEFT], [BURGLARY], [THEFT], [BURGLARY], [THEFT], [CRIMINAL DAMAGE])

Some of the primary types have trailing spaces which I want to get rid of. As far as I can tell Spark’s variant of SQL doesn’t have the LTRIM or RTRIM functions but we can map over ‘rows’ and use the String ‘trim’ function instead:

rows.map { case Row(primaryType: String) => Row(primaryType.trim) }
res8: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[29] at map at DataFrame.scala:776

Now we’ve got an RDD of Rows which we need to convert back to a DataFrame again. ‘sqlContext’ has a function which we might be able to use:

sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })
 
<console>:27: error: overloaded method value createDataFrame with alternatives:
  [A <: Product](data: Seq[A])(implicit evidence$4: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$3: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
 cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row])
              sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })
                         ^

These are the signatures we can choose from:

2015 08 06 21 58 12

If we want to pass in an RDD of type Row we’re going to have to define a StructType or we can convert each row into something more strongly typed:

case class CrimeType(primaryType: String)
 
sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) })
res14: org.apache.spark.sql.DataFrame = [primaryType: string]

Great, we’ve got our DataFrame which we can now plug into the ‘createFile’ function like so:

createFile(
  sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }),
  "/tmp/crimeTypes.csv",
  "crimeType:ID(CrimeType)")

We can actually do better though!

Since we’ve got an RDD of a specific class we can make use of the ‘rddToDataFrameHolder’ implicit function and then the ‘toDF’ function on ‘DataFrameHolder’. This is what the code looks like:

import sqlContext.implicits._
createFile(
  rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }.toDF(),
  "/tmp/crimeTypes.csv",
  "crimeType:ID(CrimeType)")

And we’re done!

Written by Mark Needham

August 6th, 2015 at 9:11 pm

Posted in Spark

Tagged with

Spark: pyspark/Hadoop – py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

without comments

I’ve been playing around with pyspark – Spark’s Python library – and I wanted to execute the following job which takes a file from my local HDFS and then counts how many times each FBI code appears using Spark SQL:

from pyspark import SparkContext
from pyspark.sql import SQLContext
 
sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)
 
file = "hdfs://localhost:9000/user/markneedham/Crimes_-_2001_to_present.csv"
 
sqlContext.load(source="com.databricks.spark.csv", header="true", path = file).registerTempTable("crimes")
rows = sqlContext.sql("select `FBI Code` AS fbiCode, COUNT(*) AS times FROM crimes GROUP BY `FBI Code` ORDER BY times DESC").collect()
 
for row in rows:
    print("{0} -> {1}".format(row.fbiCode, row.times))

I submitted the job and waited:

$ ./spark-1.3.0-bin-hadoop1/bin/spark-submit --driver-memory 5g --packages com.databricks:spark-csv_2.10:1.1.0 fbi_spark.py
...
Traceback (most recent call last):
  File "/Users/markneedham/projects/neo4j-spark-chicago/fbi_spark.py", line 11, in <module>
    sqlContext.load(source="com.databricks.spark.csv", header="true", path = file).registerTempTable("crimes")
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/pyspark/sql/context.py", line 482, in load
    df = self._ssql_ctx.load(source, joptions)
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
	at org.apache.hadoop.ipc.Client.call(Client.java:1070)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
	at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1189)
	at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:129)
	at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:127)
	at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:109)
	at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:62)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:115)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:667)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)

It looks like my Hadoop Client and Server are using different versions which in fact they are! We can see from the name of the spark folder that I’m using Hadoop 1.x there and if we check the local Hadoop version we’ll notice it’s using the 2.x seris:

$ hadoop version
Hadoop 2.6.0

In this case the easiest fix is use a version of Spark that’s compiled against Hadoop 2.6 which as of now means Spark 1.4.1.

Let’s try and run our job again:

$ ./spark-1.4.1-bin-hadoop2.6/bin/spark-submit --driver-memory 5g --packages com.databricks:spark-csv_2.10:1.1.0 fbi_spark.py
 
06 -> 859197
08B -> 653575
14 -> 488212
18 -> 457782
26 -> 431316
05 -> 257310
07 -> 197404
08A -> 188964
03 -> 157706
11 -> 112675
04B -> 103961
04A -> 60344
16 -> 47279
15 -> 40361
24 -> 31809
10 -> 22467
17 -> 17555
02 -> 17008
20 -> 15190
19 -> 10878
22 -> 8847
09 -> 6358
01A -> 4830
13 -> 1561
12 -> 835
01B -> 16

And it’s working!

Written by Mark Needham

August 4th, 2015 at 6:35 am

Posted in Spark

Tagged with