Mark Needham

Thoughts on Software Development

Archive for the ‘Clojure’ tag

Clojure: First steps with reducers

without comments

I’ve been playing around with Clojure a bit today in preparation for a talk I’m giving next week and found myself writing the following code to apply the same function to three different scores:

(defn log2 [n]
  (/ (Math/log n) (Math/log 2)))
 
(defn score-item [n]
  (if (= n 0) 0 (log2 n)))
 
(+ (score-item 12) (score-item 13) (score-item 5)) 9.60733031374961

I’d forgotten about folding over a collection but quickly remembered that I could achieve the same result with the following code:

(reduce #(+ %1 (score-item %2)) 0 [12 13 5]) 9.60733031374961

The added advantage here is that if I want to add a 4th score to the mix all I need to do is append it to the end of the vector:

(reduce #(+ %1 (score-item %2)) 0 [12 13 5 6]) 12.192292814470767

However, while Googling to remind myself of the order of the arguments to reduce I kept coming across articles and documentation about reducers which I’d heard about but never used.

As I understand they’re used to achieve performance gains and easier composition of functions over collections so I’m not sure how useful they’ll be to me but I thought I’d give them a try.

Our first step is to bring the namespace into scope:

(require '[clojure.core.reducers :as r])

Now we can compute the same result using the reduce function:

(r/reduce #(+ %1 (score-item %2)) 0 [12 13 5 6]) 12.192292814470767

So far, so identical. If we wanted to calculate individual scores and then filter out those below a certain threshold the code would behave a little differently:

(->>[12 13 5 6]
    (map score-item)
    (filter #(> % 3))) (3.5849625007211565 3.700439718141092)
 
(->> [12 13 5 6]
     (r/map score-item)
     (r/filter #(> % 3))) #object[clojure.core.reducers$folder$reify__19192 0x5d0edf21 "clojure.core.reducers$folder$reify__19192@5d0edf21"]

Instead of giving us a vector of scores the reducers version returns a reducer which can pass into reduce or fold if we want an accumulated result or into if we want to output a collection. In this case we want the latter:

(->> [12 13 5 6]
     (r/map score-item)
     (r/filter #(> % 3))
     (into [])) (3.5849625007211565 3.700439718141092)

With a measly 4 item collection I don’t think the reducers are going to provide much speed improvement here but we’d need to use the fold function if we want processing of the collection to be done in parallel.

One for next time!

Written by Mark Needham

January 24th, 2016 at 10:01 pm

Posted in Clojure

Tagged with

Neo4j’s Cypher vs Clojure – Group by and Sorting

without comments

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

First let’s create some future and some past events based on the current timestamp of 1404006050535:

CREATE (event1:Event {name: "Future Event 1", timestamp: 1414002772427 })
CREATE (event2:Event {name: "Future Event 2", timestamp: 1424002772427 })
CREATE (event3:Event {name: "Future Event 3", timestamp: 1416002772427 })
 
CREATE (event4:Event {name: "Past Event 1", timestamp: 1403002772427 })
CREATE (event5:Event {name: "Past Event 2", timestamp: 1402002772427 })

If we return all the events we see the following:

$ MATCH (e:Event) RETURN e;
==> +------------------------------------------------------------+
==> | e                                                          |
==> +------------------------------------------------------------+
==> | Node[15414]{name:"Future Event 1",timestamp:1414002772427} |
==> | Node[15415]{name:"Future Event 2",timestamp:1424002772427} |
==> | Node[15416]{name:"Future Event 3",timestamp:1416002772427} |
==> | Node[15417]{name:"Past Event 1",timestamp:1403002772427}   |
==> | Node[15418]{name:"Past Event 2",timestamp:1402002772427}   |
==> +------------------------------------------------------------+
==> 5 rows
==> 13 ms

We can achieve the desired grouping and sorting with the following cypher query:

(def sorted-query "MATCH (e:Event)
WITH COLLECT(e) AS events
WITH [e IN events WHERE e.timestamp <= timestamp()] AS pastEvents,
     [e IN events WHERE e.timestamp > timestamp()] AS futureEvents
UNWIND pastEvents AS pastEvent
WITH pastEvent, futureEvents ORDER BY pastEvent.timestamp DESC
WITH COLLECT(pastEvent) as orderedPastEvents, futureEvents
UNWIND futureEvents AS futureEvent
WITH futureEvent, orderedPastEvents ORDER BY futureEvent.timestamp
RETURN COLLECT(futureEvent) AS orderedFutureEvents, orderedPastEvents")

We then use the following function to call through to the Neo4j server using the excellent neocons library:

(ns neo4j-meetup.db
  (:require [clojure.walk :as walk])
  (:require [clojurewerkz.neocons.rest.cypher :as cy])
  (:require [clojurewerkz.neocons.rest :as nr]))
 
(def NEO4J_HOST "http://localhost:7521/db/data/")
 
(defn cypher
  ([query] (cypher query {}))
  ([query params]
     (let [conn (nr/connect! NEO4J_HOST)]
       (->> (cy/tquery query params)
            walk/keywordize-keys))))

We call that function and grab the first row since we know there won’t be any other rows in the result:

(def query-result (->> ( db/cypher sorted-query) first))

Now we need to extract the past and future collections so that we can display them on the page which we can do like so:

> (map #(% :data) (query-result :orderedPastEvents))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (map #(% :data) (query-result :orderedFutureEvents))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

An alternative approach is to return the events from cypher and then handle the grouping and sorting in clojure. In that case our query is much simpler:

(def unsorted-query "MATCH (e:Event) RETURN e")

We’ll use the clj-time library to determine the current time:

(def now (clj-time.coerce/to-long (clj-time.core/now)))

First let’s split the events into past and future:

> (def grouped-by-events 
     (->> (db/cypher unsorted-query)
          (map #(->> % :e :data))
          (group-by #(> (->> % :timestamp) now))))
 
> grouped-by-events
{true [{:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1424002772427, :name "Future Event 2"} {:timestamp 1416002772427, :name "Future Event 3"}], 
 false [{:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"}]}

And finally we sort appropriately using these functions:

(defn time-descending [row] (* -1 (->> row :timestamp)))
(defn time-ascending [row] (->> row :timestamp))
> (sort-by time-descending (get grouped-by-events false))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (sort-by time-ascending (get grouped-by-events true))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

I used Clojure to do the sorting and grouping in my project because the query to get the events was a bit more complicated and became very difficult to read with the sorting and grouping mixed in.

Unfortunately cypher doesn’t provide an easy way to sort within a collection so we need our sorting in the row context and then collect the elements back again afterwards.

Written by Mark Needham

June 29th, 2014 at 2:56 am

Posted in Clojure,neo4j

Tagged with , ,

Clojure: Destructuring group-by’s output

with 2 comments

One of my favourite features of Clojure is that it allows you to destructure a data structure into values that are a bit easier to work with.

I often find myself referring to Jay Fields’ article which contains several examples showing the syntax and is a good starting point.

One recent use of destructuring I had was where I was working with a vector containing events like this:

user> (def events [{:name "e1" :timestamp 123} {:name "e2" :timestamp 456} {:name "e3" :timestamp 789}])

I wanted to split the events in two – those containing events with a timestamp greater than 123 and those less than or equal to 123.

After remembering that the function I wanted was group-by and not partition-by (I always make that mistake!) I had the following:

user> (group-by #(> (->> % :timestamp) 123) events)
{false [{:name "e1", :timestamp 123}], true [{:name "e2", :timestamp 456} {:name "e3", :timestamp 789}]}

I wanted to get 2 vectors that I could pass to the web page and this is fairly easy with destructuring:

user> (let [{upcoming true past false} (group-by #(> (->> % :timestamp) 123) events)] 
       (println upcoming) (println past))
[{:name e2, :timestamp 456} {:name e3, :timestamp 789}]
[{:name e1, :timestamp 123}]
nil

Simple!

Written by Mark Needham

May 31st, 2014 at 12:03 am

Posted in Clojure

Tagged with

Clojure: Create a directory

with 4 comments

I spent much longer than I should have done trying to work out how to create a directory in Clojure as part of an import script I’m working out so for my future self this is how you do it:

(.mkdir (java.io.File. "/path/to/dir/to/create"))

I’m creating a directory which contains today’s date so I’d want something like ‘members-2014-05-24’ if I was running it today. The clj-time library is very good for working with dates.

To create a folder containing today’s date this is what we’d have:

(ns neo4j-meetup.core
  (:require [clj-time.format :as f]))
 
(def format-as-year-month-day (f/formatter "yyyy-MM-dd"))
 
(defn create-directory-for-today []
  (let [date (f/unparse format-as-year-month-day (t/now))]
    (.mkdir (java.io.File. (str "data/members-" date)))))

Initial code shamelessly stolen from Shu Wang’s gist so thanks to him as well!

Written by Mark Needham

May 24th, 2014 at 12:12 am

Posted in Clojure

Tagged with

Clojure: Paging meetup data using lazy sequences

with one comment

I’ve been playing around with the meetup API to do some analysis on the Neo4j London meetup and one thing I wanted to do was download all the members of the group.

A feature of the meetup API is that each end point will only allow you to return a maximum of 200 records so I needed to make use of offsets and paging to retrieve everybody.

It seemed like a good chance to use some lazy sequences to keep track of the offsets and then stop making calls to the API once I wasn’t retrieving any more results.

I wrote the following functions to take care of that bit:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
 
(defn offsets []
  (unchunk (range)))
 
 
(defn get-all [api-fn]
  (flatten
   (take-while seq
               (map #(api-fn {:perpage 200 :offset % :orderby "name"}) (offsets)))))

I previously wrote about the chunking behaviour of lazy collections which meant that I ended up with a minimum of 32 calls to each URI which wasn’t what I had in mind!

To get all the members in the group I wrote the following function which is passed to get-all:

(:require [clj-http.client :as client])
 
(defn members
  [{perpage :perpage offset :offset orderby :orderby}]
  (->> (client/get
        (str "https://api.meetup.com/2/members?page=" perpage
             "&offset=" offset
             "&orderby=" orderby
             "&group_urlname=" MEETUP_NAME
             "&key=" MEETUP_KEY)
        {:as :json})
       :body :results))

So to get all the members we’d do this:

(defn all-members []
  (get-all members))

I’m told that using lazy collections when side effects are involved is a bad idea – presumably because the calls to the API might never end – but since I only run it manually I can just kill the process if anything goes wrong.

I’d be interested in how others would go about solving this problem – core.async was suggested but that seems to result in much more / more complicated code than this version.

The code is on github if you want to take a look.

Written by Mark Needham

April 30th, 2014 at 12:20 am

Posted in Clojure

Tagged with

Clojure: clj-time – Formatting a date / timestamp with day suffixes e.g. 1st, 2nd, 3rd

without comments

I’ve been using the clj-time library recently – a Clojure wrapper around Joda Time – and one thing I wanted to do is format a date with day suffixes e.g. 1st, 2nd, 3rd.

I started with the following timestamp:

1309368600000

The first step was to convert that into a DateTime object like so:

user> (require '[clj-time.coerce :as c])
user> (c/from-long 1309368600000)
#<DateTime 2011-06-29T17:30:00.000Z>

I wanted to output that date in the following format:

29th June 2011

We can get quite close by using a custom time formatter:

user> (require '[clj-time.format :as f])
nil
user> (f/unparse (f/formatter "d MMMM yyyy") (c/from-long 1309368600000))
"29 June 2011"

Unfortunately I couldn’t find anywhere in the documentation explaining how to get the elusive ‘th’ or ‘st’ to print. I was hoping for something similar to PHP date formatting:

2014 04 26 08 38 39

Eventually I came across a Stack Overflow post about Joda Time suggesting that you can’t actually format a day in the way I was hoping to.

So I now have the following function to do it for me:

(defn day-suffix [day]
  (let [stripped-day (if (< day 20) day (mod day 10))]
    (cond (= stripped-day 1) "st"
          (= stripped-day 2) "nd"
          (= stripped-day 3) "rd"
          :else "th")))

and the code to get the date in my favoured format looks like this:

user> (def my-time (c/from-long 1309368600000))
#'user/my-time
user> (def day (read-string (f/unparse (f/formatter "d") my-time)))
#'user/day
user> (str day (day-suffix day) " " (f/unparse (f/formatter "MMMM yyyy") my-time))
"29th June 2011"

I’m assuming there’s a better way but what is it?!

Written by Mark Needham

April 26th, 2014 at 7:50 am

Posted in Clojure

Tagged with

Clojure: Not so lazy sequences a.k.a chunking behaviour

with 3 comments

I’ve been playing with Clojure over the weekend and got caught out by the behaviour of lazy sequences due to chunking – something which was obvious to experienced Clojurians although not me.

I had something similar to the following bit of code which I expected to only evaluate the first item of the infinite sequence that the range function generates:

> (take 1 (map (fn [x] (println (str "printing..." x))) (range)))
(printing...0
printing...1
printing...2
printing...3
printing...4
printing...5
printing...6
printing...7
printing...8
printing...9
printing...10
printing...11
printing...12
printing...13
printing...14
printing...15
printing...16
printing...17
printing...18
printing...19
printing...20
printing...21
printing...22
printing...23
printing...24
printing...25
printing...26
printing...27
printing...28
printing...29
printing...30
printing...31
nil)

The reason this was annoying is because I wanted to shortcut the lazy sequence using take-while, much like the poster of this StackOverflow question.

As I understand it when we have a lazy sequence the granularity of that laziness is 32 items at a time a.k.a one chunk, something that Michael Fogus wrote about 4 years ago. This was a bit surprising to me but it sounds like it makes sense for the majority of cases.

However, if we want to work around that behaviour we can wrap the lazy sequence in the following unchunk function provided by Stuart Sierra:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

Now if we repeat our initial code we’ll see it only prints once:

> (take 1 (map (fn [x] (println (str "printing..." x))) (unchunk (range))))
(printing...0
nil)

Written by Mark Needham

April 6th, 2014 at 10:07 pm

Posted in Clojure

Tagged with

Elo Rating System: Ranking Champions League teams using Clojure Part 2

without comments

A few weeks ago I wrote about ranking Champions League teams using the Elo Rating algorithm, and since I wrote that post I’ve collated data for 10 years worth of matches so I thought an update was in order.

After extracting the details of all those matches I saved them to a JSON file so that I wouldn’t have to parse the HTML pages every time I tweaked the algorithm. This should also make it easier for other people to play with the data.

I described the algorithm in the previous post and had analysed the rankings for one season. However, it was difficult to understand why teams has been ranked in a certain order so I drilled into the data to find out.

Since the 2012/2013 season is the freshest in my memory I started with that.

The first thing to do was load the matches from disk:

(ns ranking-algorithms.uefa
  (:require [clj-time.format :as f])
  (:require [clojure.data.json :as json]))
 
(defn as-date [date-field] (f/parse (f/formatter "dd MMM YYYY") date-field ))
 
(defn date-aware-value-reader [key value] (if (= key :date) (as-date value) value))
 
(defn read-from-file [file]
  (json/read-str (slurp file)
                 :value-fn date-aware-value-reader
                 :key-fn keyword))

I’ve written previously about reifying a date masquerading as a string but for now we want to give those matches a name and then process them.

> (def the-matches (read-from-file "data/cl-matches-2013.json"))
#'ranking-algorithms.uefa/the-matches
 
> (count the-matches)
213

I already had a function top-teams which would apply the Elo algorithm across the matches, but since I wanted to drill into each team’s performance I wrapped that function in another one called print-top-teams:

(comment "other functions excluded for brevity")
 
(defn format-for-printing [all-matches idx [team ranking & [rd]]]
  (let [team-matches (show-matches team all-matches)]
    (merge  {:rank (inc idx)
             :team team
             :ranking ranking
             :rd rd
             :round (performance team-matches)}
            (match-record team-matches))))
 
(defn print-top-teams
  ([number all-matches] (print-top-teams number all-matches {}))
  ([number all-matches base-rankings]
      (clojure.pprint/print-table
       [:rank :team :ranking :round :wins :draw :loses]
       (map-indexed
        (partial format-for-printing all-matches)
        (top-teams number all-matches base-rankings)))))
> (ranking-algorithms.core/print-top-teams 10 the-matches)
========================================================================
:rank | :team       | :ranking | :round         | :wins | :draw | :loses
========================================================================
1     | Bayern      | 1272.74  | Final          | 10    | 1     | 2     
2     | PSG         | 1230.02  | Quarter-finals | 6     | 3     | 1     
3     | Dortmund    | 1220.96  | Final          | 7     | 4     | 2     
4     | Real Madrid | 1220.33  | Semi-finals    | 6     | 3     | 3     
5     | Porto       | 1216.97  | Round of 16    | 5     | 1     | 2     
6     | CFR Cluj    | 1216.56  | Group stage    | 7     | 1     | 2     
7     | Galatasaray | 1215.56  | Quarter-finals | 5     | 2     | 3     
8     | Juventus    | 1214.0   | Quarter-finals | 5     | 3     | 2     
9     | Málaga      | 1211.53  | Quarter-finals | 5     | 5     | 2     
10    | Valencia    | 1211.0   | Round of 16    | 4     | 2     | 2     
========================================================================

I’ve excluded most of the functions but you can find the source in core.clj on github.

Clojure-wise I learnt about the print-table function which came in handy and Elo-wise I realised that the ranking places a heavy emphasis on winning matches.

If you follow the Champions League closely you’ll have noticed that Barcelona are missing from the top 10 despite reaching the Semi Final. The Elo algorithm actually ranks them in 65th position:

=========================================================================================
:rank | :team               | :ranking | :round                  | :wins | :draw | :loses
=========================================================================================
63    | Motherwell          | 1195.04  | Third qualifying round  | 0     | 0     | 2     
64    | Feyenoord           | 1195.04  | Third qualifying round  | 0     | 0     | 2     
65    | Barcelona           | 1194.68  | Semi-finals             | 5     | 3     | 4     
66    | BATE                | 1194.36  | Group stage             | 5     | 3     | 4     
67    | Anderlecht          | 1193.41  | Group stage             | 4     | 2     | 4 
=========================================================================================

It’s all about winning

I thought there might be a bug in my implementation but having looked through it multiple times, Barcelona’s low ranking results from losing multiple games – to Celtic, AC Milan and twice to Bayern Munich – and progressing from their Quarter Final tie against PSG without winning either match.

I did apply a higher weighting to matches won later on in the competition but otherwise the Elo algorithm doesn’t take into account progress in a tournament.

Low variation in rankings

The initial ranking of each team was 1200, so I was surprised to see that the top ranked team had only achieved a ranking of 1272 – I expected it to be higher.

I read a bit more about the algorithm and learnt that a 200 points gap in ranking signifies that the higher ranked team should win 75% of the time.

For this data set the top ranked team has 1272 points and the lowest ranked team has 1171 points so we probably need to tweak the algorithm to make it more accurate.

Accuracy of the Elo algorithm

My understanding of the Elo algorithm is that it becomes more accurate as teams play more matches so I decided to try it out on all the matches from 2004 – 2012.

I adapted the print-top-teams function to exclude ‘:round’ since it doesn’t make sense in this context:

(comment "I really need to pull out the printing stuff into a function but I'm lazy so I haven't...yet")
 
(defn print-top-teams-without-round
  ([number all-matches] (print-top-teams-without-round number all-matches {}))
  ([number all-matches base-rankings]
      (clojure.pprint/print-table
       [:rank :team :ranking :wins :draw :loses]
       (map-indexed
        (partial format-for-printing all-matches)
        (top-teams number all-matches base-rankings)))))

If we evaluate that function we see the following rankings:

> (def the-matches (read-from-file "data/cl-matches-2004-2012.json"))
#'ranking-algorithms.uefa/the-matches
 
> (ranking-algorithms.core/print-top-teams-without-round 10 the-matches)
==========================================================
:rank | :team          | :ranking | :wins | :draw | :loses
==========================================================
1     | Barcelona      | 1383.85  | 55    | 25    | 12    
2     | Man. United    | 1343.54  | 49    | 21    | 14    
3     | Chelsea        | 1322.0   | 44    | 27    | 17    
4     | Real Madrid    | 1317.68  | 42    | 14    | 18    
5     | Bayern         | 1306.18  | 42    | 13    | 19    
6     | Arsenal        | 1276.83  | 47    | 21    | 18    
7     | Liverpool      | 1272.52  | 41    | 17    | 17    
8     | Internazionale | 1260.27  | 36    | 18    | 21    
9     | Milan          | 1257.63  | 34    | 22    | 18    
10    | Bordeaux       | 1243.04  | 12    | 3     | 7     
==========================================================

The only finalists missing from this list are Monaco and Porto who contested the final in 2004 but haven’t reached that level of performance since.

Bordeaux are the only unlikely entry in this list and have played ~60 games less than the other teams which suggests that we might not have as much confidence in their ranking. It was at this stage that I started looking at the Glicko algorithm which calculates a rating reliability as well as the rating itself.

I’ve added instructions to the github README showing some examples but if you have any questions feel free to ping me on here or twitter.

Written by Mark Needham

September 30th, 2013 at 8:26 pm

Posted in Ranking Systems

Tagged with ,

Clojure: Writing JSON to a file – “Exception Don’t know how to write JSON of class org.joda.time.DateTime”

with 4 comments

As I mentioned in an earlier post I’ve been transforming Clojure hash’s into JSON strings using data.json but ran into trouble while trying to parse a hash which contained a Joda Time DateTime instance.

The date in question was constructed like this:

(ns json-date-example
  (:require [clj-time.format :as f])
  (:require [clojure.data.json :as json]))
 
(defn as-date [date-field]
  (f/parse (f/formatter "dd MMM YYYY") date-field ))
 
(def my-date 
  (as-date "18 Mar 2012"))

And when I tried to convert a hash containing that object into a string I got the following exception:

> (json/write-str {:date my-date)})
 
java.lang.Exception: Don't know how to write JSON of class org.joda.time.DateTime
 at clojure.data.json$write_generic.invoke (json.clj:367)
    clojure.data.json$eval2818$fn__2819$G__2809__2826.invoke (json.clj:284)
    clojure.data.json$write_object.invoke (json.clj:333)
    clojure.data.json$eval2818$fn__2819$G__2809__2826.invoke (json.clj:284)
    clojure.data.json$write.doInvoke (json.clj:450)
    clojure.lang.RestFn.invoke (RestFn.java:425)

Luckily it’s quite easy to get around this by passing a function to write-str that converts the DateTime into a string representation before writing that part of the hash to a string.

The function looks like this:

(defn as-date-string [date]
  (f/unparse (f/formatter "dd MMM YYYY") date))
 
(defn date-aware-value-writer [key value] 
  (if (= key :date) (as-date-string value) value))

And we make use of the writer like so:

> (json/write-str {:date my-date} :value-fn date-aware-value-writer)
"{\"date\":\"18 Mar 2012\"}"

If we want to read that string back again and reify our date we create a reader function which converts a string into a DateTime. The as-date function from the beginning of this post does exactly what we want so we’ll use that:

(defn date-aware-value-reader [key value] 
  (if (= key :date) (as-date value) value))

We can then pass the reader as an argument to read-str:

> (json/read-str "{\"date\":\"18 Mar 2012\"}" :value-fn date-aware-value-reader :key-fn keyword)
{:date #<DateTime 2012-03-18T00:00:00.000Z>}

Written by Mark Needham

September 26th, 2013 at 7:11 pm

Posted in Clojure

Tagged with

Clojure: Writing JSON to a file/reading JSON from a file

with 9 comments

A few weeks ago I described how I’d scraped football matches using Clojure’s Enlive, and the next step after translating the HTML representation into a Clojure map was to save it as a JSON document.

I decided to follow a two step process to achieve this:

  • Convert hash to JSON string
  • Write JSON string to file

I imagine there’s probably a way to convert the hash to a stream and pipe that into a file but my JSON document isn’t very large so I think this way is ok for now.

data.json seems to be the way to go to convert a Hash to a JSON string and I had the following code:

> (require '[clojure.data.json :as json])
nil
 
> (json/write-str { :key1 "val1" :key2 "val2" })
"{\"key2\":\"val2\",\"key1\":\"val1\"}"

The next step was to write that into a file and this StackOverflow post describes a couple of ways that we can do this:

> (use 'clojure.java.io)
> (with-open [wrtr (writer "/tmp/test.json")]
    (.write wrtr (json/write-str {:key1 "val1" :key2 "val2"})))

or

> (spit "/tmp/test.json" (json/write-str {:key1 "val1" :key2 "val2"}))

Now I wanted to read the file back into a hash and I started with the following:

> (json/read-str (slurp "/tmp/test.json"))
{"key2" "val2", "key1" "val1"}

That’s not bad but I wanted the keys to be what I know as symbols (e.g. ‘:key1’) from Ruby land. I re-learnt that this is called a keyword in Clojure.

Since I’m not very good at reading the documentation I wrote a function to convert all the keys in a map from strings to keywords:

> (defn string-keys-to-symbols [map]
    (reduce #(assoc %1 (-> (key %2) keyword) (val %2)) {} map))
 
> (string-keys-to-symbols (json/read-str (slurp "/tmp/test.json")))
{:key1 "val1", :key2 "val2"}

What I should have done is pass the keyword function as an argument to read-str instead:

> (json/read-str (slurp "/tmp/test.json") :key-fn keyword)
{:key2 "val2", :key1 "val1"}

Simple!

Written by Mark Needham

September 26th, 2013 at 7:47 am

Posted in Clojure

Tagged with