Mark Needham

Thoughts on Software Development

Archive for the ‘Clojure’ Category

Neo4j’s Cypher vs Clojure – Group by and Sorting

One of the points that I emphasised during my talk on building Neo4j-backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

First let’s create some future and some past events based on the current timestamp of 1404006050535:

CREATE (event1:Event {name: "Future Event 1", timestamp: 1414002772427 })
CREATE (event2:Event {name: "Future Event 2", timestamp: 1424002772427 })
CREATE (event3:Event {name: "Future Event 3", timestamp: 1416002772427 })
 
CREATE (event4:Event {name: "Past Event 1", timestamp: 1403002772427 })
CREATE (event5:Event {name: "Past Event 2", timestamp: 1402002772427 })

If we return all the events we see the following:

$ MATCH (e:Event) RETURN e;
==> +------------------------------------------------------------+
==> | e                                                          |
==> +------------------------------------------------------------+
==> | Node[15414]{name:"Future Event 1",timestamp:1414002772427} |
==> | Node[15415]{name:"Future Event 2",timestamp:1424002772427} |
==> | Node[15416]{name:"Future Event 3",timestamp:1416002772427} |
==> | Node[15417]{name:"Past Event 1",timestamp:1403002772427}   |
==> | Node[15418]{name:"Past Event 2",timestamp:1402002772427}   |
==> +------------------------------------------------------------+
==> 5 rows
==> 13 ms

We can achieve the desired grouping and sorting with the following Cypher query, stored here in a Clojure var:

(def sorted-query "MATCH (e:Event)
WITH COLLECT(e) AS events
WITH [e IN events WHERE e.timestamp <= timestamp()] AS pastEvents,
     [e IN events WHERE e.timestamp > timestamp()] AS futureEvents
UNWIND pastEvents AS pastEvent
WITH pastEvent, futureEvents ORDER BY pastEvent.timestamp DESC
WITH COLLECT(pastEvent) as orderedPastEvents, futureEvents
UNWIND futureEvents AS futureEvent
WITH futureEvent, orderedPastEvents ORDER BY futureEvent.timestamp
RETURN COLLECT(futureEvent) AS orderedFutureEvents, orderedPastEvents")

We then use the following function to call through to the Neo4j server using the excellent neocons library:

(ns neo4j-meetup.db
  (:require [clojure.walk :as walk]
            [clojurewerkz.neocons.rest :as nr]
            [clojurewerkz.neocons.rest.cypher :as cy]))
 
(def NEO4J_HOST "http://localhost:7521/db/data/")
 
(defn cypher
  ([query] (cypher query {}))
  ([query params]
     (let [conn (nr/connect! NEO4J_HOST)]
       (->> (cy/tquery query params)
            walk/keywordize-keys))))

We call that function and grab the first row since we know there won’t be any other rows in the result:

(def query-result (->> (db/cypher sorted-query) first))

Now we need to extract the past and future collections so that we can display them on the page which we can do like so:

> (map #(% :data) (query-result :orderedPastEvents))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (map #(% :data) (query-result :orderedFutureEvents))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

An alternative approach is to return the events from Cypher and then handle the grouping and sorting in Clojure. In that case our query is much simpler:

(def unsorted-query "MATCH (e:Event) RETURN e")

We’ll use the clj-time library to determine the current time:

(def now (clj-time.coerce/to-long (clj-time.core/now)))

First let’s split the events into past and future:

> (def grouped-by-events 
     (->> (db/cypher unsorted-query)
          (map #(->> % :e :data))
          (group-by #(> (->> % :timestamp) now))))
 
> grouped-by-events
{true [{:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1424002772427, :name "Future Event 2"} {:timestamp 1416002772427, :name "Future Event 3"}], 
 false [{:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"}]}

And finally we sort appropriately using these functions:

(defn time-descending [row] (* -1 (->> row :timestamp)))
(defn time-ascending [row] (->> row :timestamp))
 
> (sort-by time-descending (get grouped-by-events false))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (sort-by time-ascending (get grouped-by-events true))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})
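The grouping and sorting steps can also be folded into one function. This is only a sketch on in-memory data: split-and-sort is a hypothetical name, the events vector stands in for the rows returned by the query, and sort-by is given a comparator (> or <) instead of the helper functions above.

```clojure
;; Sample data standing in for the rows returned by db/cypher.
(def events
  [{:name "Future Event 1" :timestamp 1414002772427}
   {:name "Future Event 2" :timestamp 1424002772427}
   {:name "Future Event 3" :timestamp 1416002772427}
   {:name "Past Event 1"   :timestamp 1403002772427}
   {:name "Past Event 2"   :timestamp 1402002772427}])

(defn split-and-sort
  "Splits events around now: past events newest first, future events soonest first."
  [events now]
  (let [grouped (group-by #(> (:timestamp %) now) events)]
    {:past   (sort-by :timestamp > (get grouped false))
     :future (sort-by :timestamp < (get grouped true))}))

;; 1404006050535 is the 'current' timestamp used earlier in the post.
(split-and-sort events 1404006050535)
```

Passing now in as an argument, rather than reading the clock inside the function, keeps the function easy to try out at the REPL.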

I used Clojure to do the sorting and grouping in my project because the query to get the events was a bit more complicated and became very difficult to read with the sorting and grouping mixed in.

Unfortunately Cypher doesn’t provide an easy way to sort within a collection, so we have to UNWIND the collection into rows, sort in the row context, and then collect the elements back again afterwards.

Written by Mark Needham

June 29th, 2014 at 2:56 am

Posted in Clojure,neo4j

Clojure: Destructuring group-by’s output

One of my favourite features of Clojure is that it allows you to destructure a data structure into values that are a bit easier to work with.

I often find myself referring to Jay Fields’ article which contains several examples showing the syntax and is a good starting point.

One recent use of destructuring I had was where I was working with a vector containing events like this:

user> (def events [{:name "e1" :timestamp 123} {:name "e2" :timestamp 456} {:name "e3" :timestamp 789}])

I wanted to split the events in two – those containing events with a timestamp greater than 123 and those less than or equal to 123.

After remembering that the function I wanted was group-by and not partition-by (I always make that mistake!) I had the following:

user> (group-by #(> (->> % :timestamp) 123) events)
{false [{:name "e1", :timestamp 123}], true [{:name "e2", :timestamp 456} {:name "e3", :timestamp 789}]}

I wanted to get 2 vectors that I could pass to the web page and this is fairly easy with destructuring:

user> (let [{upcoming true past false} (group-by #(> (->> % :timestamp) 123) events)] 
       (println upcoming) (println past))
[{:name e2, :timestamp 456} {:name e3, :timestamp 789}]
[{:name e1, :timestamp 123}]
nil
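One thing to watch for: if no event falls on one side of the split, group-by leaves that key out of the map and the destructured name is bound to nil. A minimal sketch of guarding against that with :or defaults follows; split-events is a hypothetical helper name, not something from the original code.

```clojure
(def events [{:name "e1" :timestamp 123}
             {:name "e2" :timestamp 456}
             {:name "e3" :timestamp 789}])

(defn split-events
  "Splits events around threshold, defaulting a missing group to an empty vector."
  [events threshold]
  (let [{upcoming true past false
         :or {upcoming [] past []}}
        (group-by #(> (:timestamp %) threshold) events)]
    {:upcoming upcoming :past past}))

(split-events events 123)   ; both groups present
(split-events events 1000)  ; no upcoming events, so :upcoming falls back to []
```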

Simple!

Written by Mark Needham

May 31st, 2014 at 12:03 am

Posted in Clojure

Clojure: Create a directory

I spent much longer than I should have trying to work out how to create a directory in Clojure as part of an import script I’m working on, so for my future self this is how you do it:

(.mkdir (java.io.File. "/path/to/dir/to/create"))

I’m creating a directory which contains today’s date so I’d want something like ‘members-2014-05-24’ if I was running it today. The clj-time library is very good for working with dates.

To create a folder containing today’s date this is what we’d have:

(ns neo4j-meetup.core
  (:require [clj-time.core :as t]
            [clj-time.format :as f]))
 
(def format-as-year-month-day (f/formatter "yyyy-MM-dd"))
 
(defn create-directory-for-today []
  (let [date (f/unparse format-as-year-month-day (t/now))]
    (.mkdir (java.io.File. (str "data/members-" date)))))
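If clj-time isn’t on the classpath, the same idea can be sketched with java.time, assuming Java 8 or later. The names directory-name-for and create-directory-for! are hypothetical, and .mkdirs is used instead of .mkdir so that any missing parent directories are created as well.

```clojure
(require '[clojure.java.io :as io])
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

(defn directory-name-for
  "Builds a name such as members-2014-05-24 from a LocalDate."
  [date]
  (str "members-" (.format date DateTimeFormatter/ISO_LOCAL_DATE)))

(defn create-directory-for!
  "Creates parent/members-<date>, including missing parents, and returns the File."
  [parent date]
  (let [dir (io/file parent (directory-name-for date))]
    (.mkdirs dir)
    dir))

;; For today's date: (create-directory-for! "data" (LocalDate/now))
```

Taking the date as a parameter means the naming logic can be checked with a fixed date.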

Initial code shamelessly stolen from Shu Wang’s gist so thanks to him as well!

Written by Mark Needham

May 24th, 2014 at 12:12 am

Posted in Clojure

Clojure: Paging meetup data using lazy sequences

I’ve been playing around with the meetup API to do some analysis on the Neo4j London meetup and one thing I wanted to do was download all the members of the group.

A feature of the meetup API is that each end point will only allow you to return a maximum of 200 records so I needed to make use of offsets and paging to retrieve everybody.

It seemed like a good chance to use some lazy sequences to keep track of the offsets and then stop making calls to the API once I wasn’t retrieving any more results.

I wrote the following functions to take care of that bit:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
 
(defn offsets []
  (unchunk (range)))
 
 
(defn get-all [api-fn]
  (flatten
   (take-while seq
               (map #(api-fn {:perpage 200 :offset % :orderby "name"}) (offsets)))))

I previously wrote about the chunking behaviour of lazy collections which meant that I ended up with a minimum of 32 calls to each URI which wasn’t what I had in mind!
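To check that the paging stops where expected without hitting the real API, the same shape can be exercised against an in-memory stand-in. This is a sketch only: fake-members, api-calls and fake-api are made-up names, the page size is passed in as an argument, and apply concat is used here in place of flatten.

```clojure
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

(defn get-all
  "Calls api-fn with increasing offsets until it returns an empty page."
  [api-fn per-page]
  (->> (unchunk (range))
       (map #(api-fn {:perpage per-page :offset %}))
       (take-while seq)
       (apply concat)))

;; In-memory stand-in for the meetup API, counting how often it is called.
(def fake-members (mapv #(hash-map :name (str "member-" %)) (range 5)))
(def api-calls (atom 0))

(defn fake-api [{:keys [perpage offset]}]
  (swap! api-calls inc)
  (->> fake-members
       (drop (* perpage offset))
       (take perpage)))

(doall (get-all fake-api 2))
```

With five fake members and a page size of two, the stand-in is called for three non-empty pages plus the one empty page that ends the sequence.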

To get all the members in the group I wrote the following function which is passed to get-all:

;; MEETUP_NAME and MEETUP_KEY are assumed to be defined elsewhere in the namespace
(require '[clj-http.client :as client])
 
(defn members
  [{perpage :perpage offset :offset orderby :orderby}]
  (->> (client/get
        (str "https://api.meetup.com/2/members?page=" perpage
             "&offset=" offset
             "&orderby=" orderby
             "&group_urlname=" MEETUP_NAME
             "&key=" MEETUP_KEY)
        {:as :json})
       :body :results))

So to get all the members we’d do this:

(defn all-members []
  (get-all members))

I’m told that using lazy collections when side effects are involved is a bad idea – presumably because the calls to the API might never end – but since I only run it manually I can just kill the process if anything goes wrong.

I’d be interested in how others would go about solving this problem – core.async was suggested but that seems to result in much more / more complicated code than this version.

The code is on github if you want to take a look.

Written by Mark Needham

April 30th, 2014 at 12:20 am

Posted in Clojure

Clojure: clj-time – Formatting a date / timestamp with day suffixes e.g. 1st, 2nd, 3rd

I’ve been using the clj-time library recently – a Clojure wrapper around Joda Time – and one thing I wanted to do is format a date with day suffixes e.g. 1st, 2nd, 3rd.

I started with the following timestamp:

1309368600000

The first step was to convert that into a DateTime object like so:

user> (require '[clj-time.coerce :as c])
user> (c/from-long 1309368600000)
#<DateTime 2011-06-29T17:30:00.000Z>

I wanted to output that date in the following format:

29th June 2011

We can get quite close by using a custom time formatter:

user> (require '[clj-time.format :as f])
nil
user> (f/unparse (f/formatter "d MMMM yyyy") (c/from-long 1309368600000))
"29 June 2011"

Unfortunately I couldn’t find anywhere in the documentation explaining how to get the elusive ‘th’ or ‘st’ to print. I was hoping for something similar to PHP’s date formatting, where the ‘S’ format character prints the English ordinal suffix for the day of the month.

Eventually I came across a Stack Overflow post about Joda Time suggesting that you can’t actually format a day in the way I was hoping to.

So I now have the following function to do it for me:

(defn day-suffix [day]
  (let [stripped-day (if (< day 20) day (mod day 10))]
    (cond (= stripped-day 1) "st"
          (= stripped-day 2) "nd"
          (= stripped-day 3) "rd"
          :else "th")))

and the code to get the date in my favoured format looks like this:

user> (def my-time (c/from-long 1309368600000))
#'user/my-time
user> (def day (read-string (f/unparse (f/formatter "d") my-time)))
#'user/day
user> (str day (day-suffix day) " " (f/unparse (f/formatter "MMMM yyyy") my-time))
"29th June 2011"

I’m assuming there’s a better way but what is it?!
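One possible tidy-up is to read the day of the month from the date object as a number instead of formatting it to a string and parsing it back with read-string. The sketch below uses java.time rather than clj-time so that it runs without extra dependencies (this assumes Java 8 or later); format-with-suffix is a hypothetical name.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter)
        '(java.util Locale))

(defn day-suffix [day]
  (let [stripped-day (if (< day 20) day (mod day 10))]
    (cond (= stripped-day 1) "st"
          (= stripped-day 2) "nd"
          (= stripped-day 3) "rd"
          :else "th")))

(defn format-with-suffix
  "Formats epoch milliseconds (interpreted in UTC) as e.g. 29th June 2011."
  [millis]
  (let [date (-> (Instant/ofEpochMilli millis)
                 (.atZone ZoneOffset/UTC)
                 .toLocalDate)
        day  (.getDayOfMonth date)
        month-and-year (DateTimeFormatter/ofPattern "MMMM yyyy" Locale/ENGLISH)]
    (str day (day-suffix day) " " (.format date month-and-year))))

(format-with-suffix 1309368600000)
```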

Written by Mark Needham

April 26th, 2014 at 7:50 am

Posted in Clojure

Clojure: Not so lazy sequences a.k.a chunking behaviour

I’ve been playing with Clojure over the weekend and got caught out by the behaviour of lazy sequences due to chunking – something which was obvious to experienced Clojurians although not me.

I had something similar to the following bit of code which I expected to only evaluate the first item of the infinite sequence that the range function generates:

> (take 1 (map (fn [x] (println (str "printing..." x))) (range)))
(printing...0
printing...1
printing...2
printing...3
printing...4
printing...5
printing...6
printing...7
printing...8
printing...9
printing...10
printing...11
printing...12
printing...13
printing...14
printing...15
printing...16
printing...17
printing...18
printing...19
printing...20
printing...21
printing...22
printing...23
printing...24
printing...25
printing...26
printing...27
printing...28
printing...29
printing...30
printing...31
nil)

This was annoying because I wanted to short-circuit the lazy sequence using take-while, much like the poster of this StackOverflow question.

As I understand it when we have a lazy sequence the granularity of that laziness is 32 items at a time a.k.a one chunk, something that Michael Fogus wrote about 4 years ago. This was a bit surprising to me but it sounds like it makes sense for the majority of cases.

However, if we want to work around that behaviour we can wrap the lazy sequence in the following unchunk function provided by Stuart Sierra:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

Now if we repeat our initial code we’ll see it only prints once:

> (take 1 (map (fn [x] (println (str "printing..." x))) (unchunk (range))))
(printing...0
nil)
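To see the effect without reading through printed output, the calls can be counted with an atom. This is only a sketch: call-count is a hypothetical helper, and it uses a bounded (range 100) rather than the infinite (range), on the assumption that a bounded range is chunked on the Clojure version in use.

```clojure
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

(defn call-count
  "Counts how often the mapped function runs when one item is taken from coll."
  [coll]
  (let [calls (atom 0)]
    (doall (take 1 (map (fn [x] (swap! calls inc) x) coll)))
    @calls))

(call-count (range 100))            ; chunked: the whole first chunk is realised
(call-count (unchunk (range 100)))  ; unchunked: one item at a time
```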

Written by Mark Needham

April 6th, 2014 at 10:07 pm

Posted in Clojure

Clojure: Writing JSON to a file – “Exception Don’t know how to write JSON of class org.joda.time.DateTime”

As I mentioned in an earlier post I’ve been transforming Clojure hashes into JSON strings using data.json but ran into trouble while trying to serialise a hash which contained a Joda Time DateTime instance.

The date in question was constructed like this:

(ns json-date-example
  (:require [clj-time.format :as f])
  (:require [clojure.data.json :as json]))
 
(defn as-date [date-field]
  (f/parse (f/formatter "dd MMM YYYY") date-field ))
 
(def my-date 
  (as-date "18 Mar 2012"))

And when I tried to convert a hash containing that object into a string I got the following exception:

> (json/write-str {:date my-date})
 
java.lang.Exception: Don't know how to write JSON of class org.joda.time.DateTime
 at clojure.data.json$write_generic.invoke (json.clj:367)
    clojure.data.json$eval2818$fn__2819$G__2809__2826.invoke (json.clj:284)
    clojure.data.json$write_object.invoke (json.clj:333)
    clojure.data.json$eval2818$fn__2819$G__2809__2826.invoke (json.clj:284)
    clojure.data.json$write.doInvoke (json.clj:450)
    clojure.lang.RestFn.invoke (RestFn.java:425)

Luckily it’s quite easy to get around this by passing a function to write-str that converts the DateTime into a string representation before writing that part of the hash to a string.

The function looks like this:

(defn as-date-string [date]
  (f/unparse (f/formatter "dd MMM YYYY") date))
 
(defn date-aware-value-writer [key value] 
  (if (= key :date) (as-date-string value) value))

And we make use of the writer like so:

> (json/write-str {:date my-date} :value-fn date-aware-value-writer)
"{\"date\":\"18 Mar 2012\"}"

If we want to read that string back again and reify our date we create a reader function which converts a string into a DateTime. The as-date function from the beginning of this post does exactly what we want so we’ll use that:

(defn date-aware-value-reader [key value] 
  (if (= key :date) (as-date value) value))

We can then pass the reader as an argument to read-str:

> (json/read-str "{\"date\":\"18 Mar 2012\"}" :value-fn date-aware-value-reader :key-fn keyword)
{:date #<DateTime 2012-03-18T00:00:00.000Z>}

Written by Mark Needham

September 26th, 2013 at 7:11 pm

Posted in Clojure

Clojure: Writing JSON to a file/reading JSON from a file

A few weeks ago I described how I’d scraped football matches using Clojure’s Enlive, and the next step after translating the HTML representation into a Clojure map was to save it as a JSON document.

I decided to follow a two step process to achieve this:

  • Convert hash to JSON string
  • Write JSON string to file

I imagine there’s probably a way to convert the hash to a stream and pipe that into a file but my JSON document isn’t very large so I think this way is ok for now.

data.json seems to be the way to go to convert a Hash to a JSON string and I had the following code:

> (require '[clojure.data.json :as json])
nil
 
> (json/write-str { :key1 "val1" :key2 "val2" })
"{\"key2\":\"val2\",\"key1\":\"val1\"}"

The next step was to write that into a file and this StackOverflow post describes a couple of ways that we can do this:

> (use 'clojure.java.io)
> (with-open [wrtr (writer "/tmp/test.json")]
    (.write wrtr (json/write-str {:key1 "val1" :key2 "val2"})))

or

> (spit "/tmp/test.json" (json/write-str {:key1 "val1" :key2 "val2"}))

Now I wanted to read the file back into a hash and I started with the following:

> (json/read-str (slurp "/tmp/test.json"))
{"key2" "val2", "key1" "val1"}

That’s not bad but I wanted the keys to be what I know as symbols (e.g. ‘:key1’) from Ruby land. I re-learnt that this is called a keyword in Clojure.

Since I’m not very good at reading the documentation I wrote a function to convert all the keys in a map from strings to keywords:

> (defn string-keys-to-symbols [map]
    (reduce #(assoc %1 (-> (key %2) keyword) (val %2)) {} map))
 
> (string-keys-to-symbols (json/read-str (slurp "/tmp/test.json")))
{:key1 "val1", :key2 "val2"}

What I should have done is pass the keyword function as an argument to read-str instead:

> (json/read-str (slurp "/tmp/test.json") :key-fn keyword)
{:key2 "val2", :key1 "val1"}
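For a map that has already been parsed, it’s worth knowing that the hand-rolled function only converts the top-level keys, whereas clojure.walk/keywordize-keys (which ships with Clojure) also walks into nested maps. A small sketch, with a made-up parsed map standing in for the output of read-str:

```clojure
(require '[clojure.walk :as walk])

(defn string-keys-to-symbols [m]
  (reduce #(assoc %1 (-> (key %2) keyword) (val %2)) {} m))

;; Hypothetical parsed document with one level of nesting.
(def parsed {"key1" "val1" "nested" {"key2" "val2"}})

(string-keys-to-symbols parsed)  ; only the outer keys become keywords
(walk/keywordize-keys parsed)    ; nested keys are converted as well
```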

Simple!

Written by Mark Needham

September 26th, 2013 at 7:47 am

Posted in Clojure

Clojure: Anonymous functions using short notation and the ‘ArityException Wrong number of args (0) passed to: PersistentVector’

In the time I’ve spent playing around with Clojure one thing I’ve always got confused by is the error message you get when trying to return a vector using the anonymous function shorthand.

For example, if we want a function which creates a vector with the values 1, 2, and the argument passed into the function we could write the following:

> ((fn [x] [1 2 x]) 6)
[1 2 6]

However, when I tried to convert it to the shorthand ‘#()’ syntax I got the following exception:

> (#([1 2 %]) 6)
clojure.lang.ArityException: Wrong number of args (0) passed to: PersistentVector
                                      AFn.java:437 clojure.lang.AFn.throwArity
                                       AFn.java:35 clojure.lang.AFn.invoke
                                  NO_SOURCE_FILE:1 user/eval575[fn]
                                  NO_SOURCE_FILE:1 user/eval575

On previous occasions I’ve just stopped there and gone back to the long hand notation but this time I wanted to figure out why it didn’t work as I expected.

I came across this StackOverflow post which explained the way the shorthand gets expanded:

#() becomes (fn [arg1 arg2] (...))

which means that:

(#([1 2 %]) 6) becomes ((fn [arg] ([1 2 arg])) 6)

We are evaluating the vector [1 2 arg] as a function but aren’t passing any arguments to it. One way it can be used as a function is if we want to return a value at a specific index e.g.

> ([1 2 6] 2)
6

We don’t want to evaluate a vector as a function, rather we want to return the vector using the shorthand syntax. To do that we need to find a function which will return the argument passed to it and then pass the vector to that function.

The identity function is one such function:

> (#(identity [1 2 %]) 6)
[1 2 6]

Or if we want to be more concise the thread-first macro (->) works too:

> (#(-> [1 2 %]) 6)
[1 2 6]
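The expansion can be inspected directly by handing the shorthand to the reader, and calling the vector function explicitly is another way around the problem. A minimal sketch follows; expanded, calling-broken-shorthand-throws? and make-vector are names made up for illustration.

```clojure
;; The reader turns the shorthand into (fn* [arg] ([1 2 arg])),
;; so the vector ends up in function-call position.
(def expanded (read-string "#([1 2 %])"))

(defn calling-broken-shorthand-throws?
  "Returns true if invoking the broken shorthand raises an ArityException."
  []
  (try
    (#([1 2 %]) 6)
    false
    (catch clojure.lang.ArityException _ true)))

;; Calling vector explicitly puts a real function in call position.
(def make-vector #(vector 1 2 %))

(make-vector 6)
```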

Written by Mark Needham

September 23rd, 2013 at 9:42 pm

Posted in Clojure

Clojure/Emacs/nrepl: Stacktrace-less error messages

Ever since I started using the Emacs + nrepl combination to play around with Clojure I’ve been getting fairly nondescript error messages whenever I pass the wrong parameters to a function.

For example if I try to update a non-existent key in a map I get a NullPointerException:

> (update-in {} [:mark] inc)
NullPointerException   clojure.lang.Numbers.ops (Numbers.java:942)

In this case it’s clear that the hash doesn’t have a key ‘:mark’ so the function blows up. However, sometimes the functions are more complicated and this type of reduced stack trace isn’t very helpful for working out where the problem lies.

I eventually came across a thread in the nrepl-el forum where Tim King suggested that adding the following lines to the Emacs configuration file should sort things out:

~/.emacs.d/init.el

(setq nrepl-popup-stacktraces nil)
(setq nrepl-popup-stacktraces-in-repl t)

I added those two lines, restarted Emacs and after calling the function again got a much more detailed stack trace:

> (update-in {} [:mark] inc)
 
java.lang.NullPointerException: 
                 Numbers.java:942 clojure.lang.Numbers.ops
                 Numbers.java:110 clojure.lang.Numbers.inc
                     core.clj:863 clojure.core/inc
                     AFn.java:161 clojure.lang.AFn.applyToHelper
                     AFn.java:151 clojure.lang.AFn.applyTo
                     core.clj:603 clojure.core/apply
                    core.clj:5472 clojure.core/update-in
                  RestFn.java:445 clojure.lang.RestFn.invoke
                 NO_SOURCE_FILE:1 user/eval9
...

From reading this stack trace we learn that the problem happens when the inc function is called with a parameter of ‘nil’. We’d see the same thing if we called it directly:

> (inc nil)
 
java.lang.NullPointerException: 
                                  Numbers.java:942 clojure.lang.Numbers.ops
                                  Numbers.java:110 clojure.lang.Numbers.inc
                                  NO_SOURCE_FILE:1 user/eval14
...
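Once the stack trace points at inc receiving nil, one common way to avoid the exception is to wrap inc with fnil so that a missing value is treated as 0. A minimal sketch, where safe-inc and increments-cleanly? are hypothetical names:

```clojure
;; fnil replaces a nil argument with 0 before inc runs.
(def safe-inc (fnil inc 0))

(update-in {} [:mark] safe-inc)          ; missing key: counting starts from 0
(update-in {:mark 41} [:mark] safe-inc)  ; existing key: behaves like plain inc

(defn increments-cleanly?
  "Returns false when plain inc is handed nil, the case shown in the stack trace."
  [m k]
  (try
    (update-in m [k] inc)
    true
    (catch NullPointerException _ false)))
```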

Although Clojure error messages do baffle me at times, I hope things will be better now that I’ll be able to see on which line the error occurred.

Written by Mark Needham

September 22nd, 2013 at 11:07 pm

Posted in Clojure
