Archive for the ‘Java’ tag
clojure/Java Interop: The doto macro
I recently wrote about some code I’ve been playing with to import neo4j spatial data and while looking to simplify the code I came across the doto macro.
The doto macro allows us to chain method calls on an initial object and then returns the resulting object. e.g.
(doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2)) -> {a=1, b=2}
In our case this comes in quite useful in the function used to create a stadium node which initially reads like this:
(defn create-stadium-node [db line]
  (let [stadium-node (.. db createNode)]
    (.. stadium-node (setProperty "wkt" (format "POINT(%s %s)" (:long line) (:lat line))))
    (.. stadium-node (setProperty "name" (:stadium line)))
    stadium-node))
Here we first create a node, set a couple of properties on the node and then return it.
Using the macro it would read like this:
(defn create-stadium-node [db line]
  (doto (.. db createNode)
    (.setProperty "wkt" (format "POINT(%s %s)" (:long line) (:lat line)))
    (.setProperty "name" (:stadium line))))
We can also use it to close the transaction at the end of our function although we don't actually have a need for the transaction object which gets returned:
; the end of our main function
(.. tx success)
(.. tx finish)
...becomes…
(doto tx (.success) (.finish))
As far as I can tell this is pretty similar in functionality to the Object#tap function in Ruby:
{}.tap { |x| x[:a] = 1; x[:b] = 2 } => {:a=>1, :b=>2}
Either way it's a pretty neat way of simplifying code.
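For comparison on the Java side, here is a minimal sketch of the same idea: apply a series of side-effecting steps to an object and hand the object back. The `Doto` class and its `doto` method are hypothetical names made up for this illustration; nothing like this ships with the JDK.

```java
import java.util.function.Consumer;

// Hypothetical helper, loosely mirroring Clojure's doto and Ruby's tap.
final class Doto {
    private Doto() {}

    // Runs each step against the target for its side effects,
    // then returns the target itself (not the result of any step).
    @SafeVarargs
    static <T> T doto(T target, Consumer<? super T>... steps) {
        for (Consumer<? super T> step : steps) {
            step.accept(target);
        }
        return target;
    }
}
```

It could be called as `Doto.doto(new HashMap<String, Integer>(), m -> m.put("a", 1), m -> m.put("b", 2))`, which returns the populated map.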
clojure/Java Interop – Importing neo4j spatial data
I wrote a post about a week ago where I described how I’d added football stadiums to my football graph using neo4j spatial and after I’d done that I wanted to put it into my import script along with the rest of the data.
I thought leiningen would probably work quite well for this as you can point it at a Java class and have it be executed.
To start with I had to change the import code slightly to link stadiums to teams which have already been added to the graph:
package main.java;

// imports excluded

public class StadiumsImport {
    public static void main(String[] args) throws IOException {
        List<String> lines = readFile("data/stadiums.csv");
        EmbeddedGraphDatabase db = new EmbeddedGraphDatabase("neo4j-community-1.9.M04/data/graph.db");
        Index<Node> stadiumsIndex = createSpatialIndex(db, "stadiumsLocation");
        Transaction tx = db.beginTx();

        for (String stadium : lines) {
            String[] columns = stadium.split(",");
            Index<Node> teamsIndex = db.index().forNodes("teams");
            String team = columns[1].replaceAll("\"","");
            Node teamNode = teamsIndex.get("name", team).getSingle();

            if(teamNode != null) {
                Node stadiumNode = db.createNode();
                stadiumNode.setProperty("wkt", String.format("POINT(%s %s)", columns[4], columns[3]));
                stadiumNode.setProperty("name", columns[0].replaceAll("\"",""));
                stadiumsIndex.add(stadiumNode, "dummy", "value");
                teamNode.createRelationshipTo(stadiumNode, DynamicRelationshipType.withName("play_at"));
            }
        }

        tx.success();
        tx.finish();
    }

    private static Index<Node> createSpatialIndex(EmbeddedGraphDatabase db, String indexName) {
        return db.index().forNodes(indexName, SpatialIndexProvider.SIMPLE_WKT_CONFIG);
    }

    // readFile excluded
}
I’ve excluded some bits of the code for brevity but it’s on this gist if you’re interested.
The only change from last week’s version is that we’re now looking up the team that a stadium belongs to and creating a ‘play_at’ relationship from the team to the stadium.
I was then able to execute that code by calling ‘lein run’ based on the following project.clj file:
(defproject neo4jfootball "1.0.0-SNAPSHOT"
:description "neo4j football project"
:main "main.java.StadiumsImport"
:dependencies [[org.clojure/clojure "1.4.0"]
[org.neo4j/neo4j-spatial "0.11-SNAPSHOT"]
[clojure-csv/clojure-csv "2.0.0-alpha1"]]
:jvm-opts ["-Xmx2g"]
:plugins [[lein-idea "1.0.1"]]
:repositories {"local" ~(str (.toURI (java.io.File. "maven_repository")))}
:java-source-paths ["src/main/java"])
I’m using a local Maven repository to store the neo4j spatial JAR. The Maven entry was created by executing the following command from where I had the neo4j spatial project checked out:
mvn install:install-file -Dfile=target/neo4j-spatial-0.11-SNAPSHOT.jar -DartifactId=neo4j-spatial -Dversion=0.11-SNAPSHOT -DgroupId=org.neo4j -Dpackaging=jar -DlocalRepositoryPath=/path/to/neo4j-football/maven_repository -DpomFile=pom.xml
That worked reasonably well but I thought it’d be interesting to see what the above code would look like if it was written in clojure instead.
This is what I ended up with:
(ns neo4jfootball.core
  (:require [clojure-csv.core :as csv])
  (:use clojure.java.io)
  (:import (org.neo4j.kernel EmbeddedGraphDatabase)
           (org.neo4j.gis.spatial.indexprovider SpatialIndexProvider)
           (org.neo4j.graphdb DynamicRelationshipType)))

(defn take-csv [fname]
  (with-open [file (reader fname)]
    (csv/parse-csv (slurp file))))

(defn transform [line]
  {:stadium (get line 0) :team (get line 1) :lat (get line 3) :long (get line 4)})

(def not-nil? (comp not nil?))

(defn create-stadium-node [db line]
  (let [stadium-node (.. db createNode)]
    (.. stadium-node (setProperty "wkt" (format "POINT(%s %s)" (:long line) (:lat line))))
    (.. stadium-node (setProperty "name" (:stadium line)))
    stadium-node))

(defn -main []
  (do
    (let [db (new EmbeddedGraphDatabase "neo4j-community-1.9.M04/data/graph.db")
          tx (.beginTx db)
          stadiums-index (.. db index (forNodes "stadiumsLocation" (SpatialIndexProvider/SIMPLE_WKT_CONFIG)))
          teams-index (.. db index (forNodes "teams"))]
      (doseq [line (drop 1 (map transform (take-csv "data/stadiums.csv")))]
        (let [team-node (.. teams-index (get "name" (:team line)) getSingle)]
          (if (not-nil? team-node)
            (let [stadium-node (create-stadium-node db line)]
              (.. stadiums-index (add stadium-node "dummy" "value"))
              (.. team-node (createRelationshipTo stadium-node (DynamicRelationshipType/withName "play_at")))))))
      (.. tx success)
      (.. tx finish))))
The code is simplified quite a bit by using the clojure CSV library so I could probably have achieved similar in the Java version by using an equivalent library.
It’s also a bit easier to see which fields of each CSV row are being used, because the transform function converts each row from a vector into a map with named keys.
It would have taken quite a bit more code to achieve a similar thing in Java so I didn’t bother.
The Java Interop page on the clojure website was quite useful for working out how to call the various methods on the Java API.
I’m mainly using the .. macro which allows us to chain Java method calls together. In a couple of cases we could just as easily have used the . special form instead.
We can then call this code from lein like so:
lein run -m neo4jfootball.core
Jersey: com.sun.jersey.api.client.ClientHandlerException: A message body reader for Java class [...] and MIME media type application/json was not found
We’ve used the Jersey library on the last couple of Java based applications that I’ve worked on and one thing we’ve done on both of them is write services that communicate with each other using JSON.
On both occasions we didn’t quite set up the Jersey client correctly and ended up with an error along these lines when making a call to an end point:
com.sun.jersey.api.client.ClientHandlerException: A message body reader for Java class java.util.ArrayList, and Java type java.util.ArrayList<com.blah.Message>, and MIME media type application/json was not found
! at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:561)
! at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:535)
! at com.sun.jersey.api.client.WebResource.handle(WebResource.java:696)
! at com.sun.jersey.api.client.WebResource.access$300(WebResource.java:74)
! at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:512)
...
! at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
! at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
! at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.lang.reflect.Method.invoke(Method.java:601)
! at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
! at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
! at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
! at com.yammer.metrics.jersey.InstrumentedResourceMethodDispatchProvider$TimedRequestDispatcher.dispatch(InstrumentedResourceMethodDispatchProvider.java:34)
! at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
! at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
! at com.sun.jersey.server.impl.uri.rules.ResourceObjectRule.accept(ResourceObjectRule.java:100)
! at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
! at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesR
To get around this problem we need to make sure we’ve added the ‘JacksonJsonProvider’ to the Jersey client config like so:
DefaultClientConfig defaultClientConfig = new DefaultClientConfig();
defaultClientConfig.getClasses().add(JacksonJsonProvider.class);
Client client = Client.create(defaultClientConfig);
I’m pretty sure that’s documented somewhere in the depths of the Jersey wiki but since we’ve now ended up debugging this problem twice I thought it was worth writing down!
Java: java.lang.UnsupportedClassVersionError – Unsupported major.minor version 51.0
On my current project we’ve spent the last day or so setting up an environment where we can deploy a couple of micro services to.
Although the machines are Windows based we’re deploying the application onto a vagrant managed VM since the production environment will be a flavour of Linux.
Initially I kept getting confused about whether or not I was inside the VM and ended up with this error when trying to run the compiled JAR:
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/whatever/SomeService : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(Unknown Source)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$000(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: com/whatever/SomeService. Program will exit.
This error means that the code was compiled with a newer JDK than the one you’re now trying to run it against: class file version 51.0 is the format produced by Java 7.
Since I was accidentally trying to run the JAR against our Windows environment’s 1.6 JDK rather than the VM’s 1.7 JDK this is exactly what I was doing!
Muppet!
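If you want to check which JDK a class was compiled for without running it, the version is stored in the first eight bytes of the class file: a 4 byte magic number, then the minor and major versions as 2 byte values. The sketch below reads those bytes; the `ClassFileVersion` class and its method names are made up for this post rather than taken from any library.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that inspects the header of a compiled .class file.
final class ClassFileVersion {
    private ClassFileVersion() {}

    // Expects at least the first 8 bytes of a class file.
    static int majorVersion(byte[] classFileBytes) {
        ByteBuffer header = ByteBuffer.wrap(classFileBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        header.getShort();                  // minor version, ignored here
        return header.getShort() & 0xFFFF;  // major version as an unsigned value
    }

    // Major version 50 is Java 6, 51 is Java 7, 52 is Java 8 and so on.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }
}
```

The bytes could come from something like `Files.readAllBytes(Paths.get("SomeService.class"))`, where the path is only an example.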
Java: Parsing CSV files
As I mentioned in a previous post I recently moved a bunch of neo4j data loading code from Ruby to Java and as part of that process I needed to parse some CSV files.
In Ruby I was using FasterCSV which became the standard CSV library from Ruby 1.9 but it’s been a while since I had to parse CSV files in Java so I wasn’t sure which library to use.
I needed a library which could parse a comma separated file where there might be commas in the values of one of the fields. I think that’s fairly standard behaviour in any CSV library but my googling led me to OpenCSV.
It can be downloaded from here and so far seems to do the job!
This is an example of how I’m using it:
String filePath = "/Users/mneedham/data/awesome-csv-file.csv";
CSVReader reader = new CSVReader(new FileReader(filePath), ',');
List<String[]> csvEntries = reader.readAll();
Iterator<String[]> iterator = csvEntries.iterator();
while (iterator.hasNext()) {
    String[] row = iterator.next();
    System.out.println("field 1: " + row[0]);
}
There are more use cases described on the home page.
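To show what “commas in the values” means in practice, here is a small hand-rolled sketch that splits a single line while respecting double quotes. This is not how OpenCSV is implemented and `QuotedCsvLine` is a name made up for this example; it only illustrates the behaviour I wanted the library to take care of for me.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter: handles quoted fields and "" escapes,
// but not fields that span multiple lines.
final class QuotedCsvLine {
    private QuotedCsvLine() {}

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);   // commas are kept while inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

A naive `line.split(",")` would break a value such as "Old Trafford, Manchester" into two fields, whereas the quote-aware version keeps it as one.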
Java: Faking a closure with a factory to create a domain object
Recently we wanted to create a domain object which needed to have an external dependency in order to do a calculation and we wanted to be able to stub out that dependency in our tests.
Originally we were just new’ing up the dependency inside the domain class but that makes it impossible to control its value in a test.
Equally it didn’t seem like we should be passing that dependency into the constructor of the domain object since it’s not a piece of state which defines the object, just something that it uses.
We ended up with something similar to the following code where we have our domain object as an inner class:
public class FooFactory {
    private final RandomService randomService;

    public FooFactory(RandomService randomService) {
        this.randomService = randomService;
    }

    public Foo createFoo(String bar, int baz) {
        return new Foo(bar, baz);
    }

    class Foo {
        private String bar;
        private int baz;

        public Foo(String bar, int baz) {
            this.bar = bar;
            this.baz = baz;
        }

        public int awesomeStuff() {
            int random = randomService.random(bar, baz);
            return random * 3;
        }
    }
}
A test on that code could then read like this:
public class FooFactoryTest {
    @Test
    public void createsAFoo() {
        RandomService randomService = mock(RandomService.class);
        when(randomService.random("bar", 12)).thenReturn(13);
        FooFactory.Foo foo = new FooFactory(randomService).createFoo("bar", 12);
        assertThat(foo.awesomeStuff(), equalTo(39));
    }
}
It’s a bit of a verbose way of getting around the problem but it seems to work reasonably well.
Java: Fooled by java.util.Arrays.asList
I’ve been playing around with the boilerpipe code base by writing some tests around it to check my understanding but ran into an interesting problem using java.util.Arrays.asList to pass a list into one of the functions.
I was testing the BlockProximityFusion class which is used to merge together adjacent text blocks.
I started off calling that class like this:
import static java.util.Arrays.asList;

@Test
public void willCallBlockProximityFustion() throws Exception {
    TextDocument document = new TextDocument(asList(contentBlock("some words"), contentBlock("followed by more words")));
    BlockProximityFusion.MAX_DISTANCE_1.process(document);
}

private TextBlock contentBlock(String words) {
    TextBlock textBlock = new TextBlock(words, new BitSet(), wordCount(words), 0, 0, 0, 0);
    textBlock.setIsContent(true);
    return textBlock;
}
Which blows up like this:
java.lang.UnsupportedOperationException
    at java.util.AbstractList.remove(AbstractList.java:144)
    at java.util.AbstractList$Itr.remove(AbstractList.java:360)
    at de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion.process(BlockProximityFusion.java:115)
    at de.l3s.boilerpipe.filters.heuristics.BlockProximityFusionTest.willCallBlockProximityFustion(BlockProximityFusionTest.java:63)
The code around that area is trying to remove an element from an iterator…
113 if (ok) {
114     prevBlock.mergeNext(block);
115     it.remove();
116     changes = true;
117 } else {
…which was created from the list that we passed into the constructor of TextDocument:
98 for (Iterator<TextBlock> it = textBlocks.listIterator(offset); it
The remove method is not implemented on the list created by ‘Arrays.asList’, which is weird since I thought it created a java.util.ArrayList, which does implement remove!
I’ve now learnt that the ArrayList created by ‘Arrays.asList’ is actually a private nested class of Arrays (java.util.Arrays$ArrayList). It’s a fixed-size list backed by the array we passed in, so it doesn’t support the remove method!
Who knew…
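The behaviour is easy to reproduce outside boilerpipe. In the sketch below (`AsListDemo` is just a name for this example) we try to remove the first element through an iterator, once on the list returned by ‘Arrays.asList’ and once on a real java.util.ArrayList copied from it, which is probably the simplest way to fix a test like mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class AsListDemo {
    private AsListDemo() {}

    // Returns true if the first element could be removed via the iterator,
    // false if the list refused with UnsupportedOperationException.
    static boolean canRemoveFirst(List<String> list) {
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Fixed-size view over the array: removal is not supported.
    static List<String> fixedSize() {
        return Arrays.asList("some words", "followed by more words");
    }

    // Copying into java.util.ArrayList gives a list we can shrink.
    static List<String> resizable() {
        return new ArrayList<>(fixedSize());
    }
}
```

Here `canRemoveFirst(fixedSize())` comes back false while `canRemoveFirst(resizable())` comes back true.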
Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state
On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.
As a result we delegate from Scala code to the system unzip command like so:
def extract {
  var command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}
We ran into a problem where the unzipping process was hanging and executing ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state which suggested that it was waiting for some sort of input.
After a bit of googling Duncan found this blog which explained that we needed to process the output stream from our process otherwise it might end up hanging
a.k.a. RTFM:
The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.
The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).
The parent process uses these streams to feed input to and get output from the subprocess.
Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that long.
In order to get around the problem we needed to gobble up the output stream from unzip like so:
import org.apache.commons.io.IOUtils

def extract {
  var command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}
We need to do the same thing with the error stream too, in case ‘unzip’ ends up filling that buffer.
On a couple of blog posts that we came across it was suggested that we should ‘gobble up’ the output and error streams on separate threads but we weren’t sure why exactly that was considered necessary…
If anyone knows then please let me know in the comments.
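My best guess at the reasoning is this: if we read STDOUT to the end on one thread and only then look at STDERR, the child can fill the STDERR buffer in the meantime and block, at which point STDOUT never closes and both sides wait forever. Separate threads avoid that. A simpler alternative, when we don’t need to tell the two streams apart, is to merge STDERR into STDOUT with ProcessBuilder so that there is only one stream to drain. The sketch below is written in Java and `DrainedProcess` is a made-up name; it is one way to do it, not the code we ended up with.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a command and drains its combined output.
final class DrainedProcess {
    final int exitCode;
    final List<String> lines;

    private DrainedProcess(int exitCode, List<String> lines) {
        this.exitCode = exitCode;
        this.lines = lines;
    }

    static DrainedProcess run(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // STDERR is merged into STDOUT
            Process process = builder.start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                // Keep reading until the child closes its output, so it can never
                // block on a full pipe while we are waiting for it to exit.
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return new DrainedProcess(process.waitFor(), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + command, e);
        }
    }
}
```

For our case the call would look something like `DrainedProcess.run(Arrays.asList("unzip", "/file/to/unzip.zip", "-d", "/place/to/unzip/to"))`.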
Java: Faking System.in
We ran a refactoring dojo a couple of days ago at ThoughtWorks University and in preparation I wrote some system level tests around the coding problem that we were going to use during the session.
It’s a command line application which is called through the main method of ‘Program’ and since there’s no dependency injection we need to be able to set System.in and System.out in order to do any testing.
My initial thinking was that it should be possible to fake System.in with the following code:
String input = "1\n9\n";
System.setIn(new ByteArrayInputStream(input.getBytes()));
This works fine when I just want to simulate one value being passed to System.in but it doesn’t work so well if I want to simulate passing more than one value, because the program creates a new BufferedReader each time it loops:
...
while(true) {
    ...
    InputStreamReader inputStream = new InputStreamReader(System.in);
    BufferedReader reader = new BufferedReader(inputStream);
    ...
}
The first BufferedReader reads ahead and pulls everything that’s available from System.in into its own buffer, which means that the second time System.in gets read it is empty.
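The effect can be seen in isolation with a few lines of Java. This sketch reads from a ByteArrayInputStream directly rather than going through System.setIn, and `ReadAheadDemo` is just a name for the example, but the readers are created the same way as in the loop above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadAheadDemo {
    private ReadAheadDemo() {}

    // Reads one line with a fresh BufferedReader, then tries again with a second
    // fresh reader over the same underlying stream.
    static String[] readTwiceWithFreshReaders(String input) {
        InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        try {
            String first = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8)).readLine();
            // The first reader has already drained the stream into its own buffer,
            // so there is nothing left for the second reader to see.
            String second = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8)).readLine();
            return new String[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the input "1\n9\n" the first read gives "1" and the second gives null, even though "9" was never handed to the program.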
Jim and I paired on the problem for a bit and came to the conclusion that we’d need to ‘stub’ the ‘read’ method of ‘InputStream’ if we wanted to be able to control exactly what was being returned by System.in.
We eventually ended up with the following StubbedInputStream:
class StubbedInputStream extends InputStream {
    private Queue<String> input;

    public StubbedInputStream(Queue<String> input) {
        this.input = input;
    }

    @Override
    public int read(byte[] bytes, int offset, int length) throws IOException {
        if (input.isEmpty()) {
            return -1;
        }
        // Hand back exactly one queued value per call, followed by a newline,
        // starting at the offset the caller asked for.
        // Assumes the value plus the newline fits within length.
        int byteLocation = offset;
        for (byte b : input.remove().getBytes()) {
            bytes[byteLocation] = b;
            byteLocation++;
        }
        bytes[byteLocation] = "\n".getBytes()[0];
        return byteLocation + 1 - offset;
    }

    public static InputStreamBuilder stubInputStream() {
        return new InputStreamBuilder();
    }
    ...
}
Which can be constructed using the following DSL:
System.setIn(stubInputStream().toReturn("1").then("9").atSomePoint());
The code we wrote is on github – I’m not sure that it covers every possible scenario that you might come up with but it does pass the tests that I’ve managed to come up with!
Writing a Java function in Clojure
A function that we had to write in Java on a project that I worked on recently needed to indicate whether there was a gap in a series of data points or not.
If there were gaps at the beginning or end of the sequence then that was fine but gaps in the middle of the sequence were not.
null, 1, 2, 3 => no gaps
1, 2, 3, null => no gaps
1, null, 2, 3 => gaps
The Java version looked a bit like this:
public boolean hasGaps(List<BigInteger> values) {
    Iterator<BigInteger> fromHead = values.iterator();
    while (fromHead.hasNext() && fromHead.next() == null) {
        fromHead.remove();
    }

    Collections.reverse(values);

    Iterator<BigInteger> fromTail = values.iterator();
    while (fromTail.hasNext() && fromTail.next() == null) {
        fromTail.remove();
    }

    return values.contains(null);
}
We take the initial list and remove all the null values from the beginning of it, then reverse the list and remove all the null values from what was the end.
We then check if there’s a null value and if there is then it would indicate there is indeed a gap in the list.
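One thing to be aware of is that the version above removes elements from, and reverses, the list that gets passed in. A variation that leaves the input alone is to find the first and last non-null positions and only look for nulls in between. This is a sketch rather than the code from the project, and the `Gaps` class name is made up:

```java
import java.math.BigInteger;
import java.util.List;

final class Gaps {
    private Gaps() {}

    // True if there is a null strictly between the first and last non-null values.
    static boolean hasGaps(List<BigInteger> values) {
        int first = 0;
        while (first < values.size() && values.get(first) == null) {
            first++; // skip leading nulls
        }
        int last = values.size() - 1;
        while (last > first && values.get(last) == null) {
            last--; // skip trailing nulls
        }
        for (int i = first + 1; i < last; i++) {
            if (values.get(i) == null) {
                return true;
            }
        }
        return false;
    }
}
```

It uses index access, so it suits an ArrayList better than a LinkedList.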
To write this function in Clojure we can start off by using the ‘drop-while‘ function to get rid of the leading nil values.
I started off with this attempt:
(defn has-gaps? [list] let [no-nils] [drop-while #(= % nil) list] no-nils)
Unfortunately that gives us the following error!
Can't take value of a macro: #'clojure.core/let (NO_SOURCE_FILE:16)
It thinks we’re trying to pass around the ‘let’ macro instead of evaluating it – I forgot to put in the brackets around the ‘let’!
I fixed that with this next version:
(defn has-gaps? [list] (let [no-nils] [drop-while nil? list] no-nils))
But again, no love:
java.lang.IllegalArgumentException: let requires an even number of forms in binding vector (NO_SOURCE_FILE:23)
The way I understand it the ‘let’ macro takes in a vector of bindings as its first argument and what I’ve done here is pass in two vectors instead of one.
In the bindings vector we need to ensure that there are an even number of forms so that each symbol can be bound to an expression.
I fixed this by putting the two vectors defined above into another vector:
(defn has-gaps? [list] (let [[no-nils] [(drop-while nil? list)]] no-nils))
We can simplify that further so that we don’t have nested vectors:
(defn has-gaps? [list] (let [no-nils (drop-while nil? list)] no-nils))
The next step was to make ‘no-nils’ a function so that I could make use of that function when the list was reversed as well:
(defn has-gaps? [list] (let [no-nils (fn [x] (drop-while nil? x))] (no-nils list)))
I then wrote the rest of the function to reverse the list and then check the remaining list for nil:
(defn has-gaps? [list]
  (let [[no-nils] [(fn [x] (drop-while nil? x))]
        [nils-removed] [(fn [x] ((comp no-nils reverse no-nils) x))]]
    (some nil? (nils-removed list))))
The ‘comp‘ function can be used to compose a set of functions which is what I needed.
It seemed like the ‘nils-removed’ function wasn’t really necessary so I inlined that:
(defn has-gaps? [list]
  (let [no-nils (fn [x] (drop-while nil? x))]
    (some nil? ((comp no-nils reverse no-nils) list))))
The function can now be used like this:
user=> (has-gaps? '(1 2 3))
nil
user=> (has-gaps? '(nil 1 2 3))
nil
user=> (has-gaps? '(1 2 3 nil))
nil
user=> (has-gaps? '(1 2 nil 3))
true
I’d be intrigued to know if there’s a better way to do this.