Mark Needham

Thoughts on Software Development

Archive for the ‘Clojure’ Category

Clojure/Emacs/nrepl: Stacktrace-less error messages

without comments

Ever since I started using the Emacs + nrepl combination to play around with Clojure I’ve been getting fairly nondescript error messages whenever I pass the wrong parameters to a function.

For example if I try to update a non-existent key in a map I get a NullPointerException:

> (update-in {} [:mark] inc)
NullPointerException   clojure.lang.Numbers.ops (Numbers.java:942)

In this case it’s clear that the hash doesn’t have a key ‘:mark’ so the function blows up. However, sometimes the functions are more complicated and this type of reduced stack trace isn’t very helpful for working out where the problem lies.

I eventually came across a thread in the nrepl-el forum where Tim King suggested that adding the following lines to the Emacs configuration file should sort things out:

~/.emacs.d/init.el

(setq nrepl-popup-stacktraces nil)
(setq nrepl-popup-stacktraces-in-repl t)

I added those two lines, restarted Emacs and after calling the function again got a much more detailed stack trace:

> (update-in {} [:mark] inc)
 
java.lang.NullPointerException: 
                 Numbers.java:942 clojure.lang.Numbers.ops
                 Numbers.java:110 clojure.lang.Numbers.inc
                     core.clj:863 clojure.core/inc
                     AFn.java:161 clojure.lang.AFn.applyToHelper
                     AFn.java:151 clojure.lang.AFn.applyTo
                     core.clj:603 clojure.core/apply
                    core.clj:5472 clojure.core/update-in
                  RestFn.java:445 clojure.lang.RestFn.invoke
                 NO_SOURCE_FILE:1 user/eval9
...

From reading this stack trace we learn that the problem happens when the inc function is called with a parameter of ‘nil’. We’d see the same thing if we called it directly:

> (inc nil)
 
java.lang.NullPointerException: 
                                  Numbers.java:942 clojure.lang.Numbers.ops
                                  Numbers.java:110 clojure.lang.Numbers.inc
                                  NO_SOURCE_FILE:1 user/eval14
...
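If the intent was to treat a missing key as zero rather than blow up, one option is to wrap inc with fnil, which substitutes a default whenever the argument would have been nil. This is only a minimal sketch: the name safe-inc and the default of 0 are my own assumptions, not something from the nrepl thread.

```clojure
;; fnil returns a version of inc that sees 0 whenever it would have been passed nil.
;; safe-inc is a hypothetical helper name; the default of 0 is an assumption.
(def safe-inc (fnil inc 0))

(update-in {} [:mark] safe-inc)
;; => {:mark 1}

(update-in {:mark 41} [:mark] safe-inc)
;; => {:mark 42}
```

Whether a silent default is preferable to the exception depends on the data; sometimes the stack trace is exactly what we want to see.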

Although Clojure error messages do baffle me at times, I hope things will be better now that I’ll be able to see on which line the error occurred.

Written by Mark Needham

September 22nd, 2013 at 11:07 pm

Posted in Clojure


Clojure/Emacs/nrepl: Ctrl X + Ctrl E leads to ‘FileNotFoundException Could not locate […] on classpath’

without comments

I’ve been playing around with Clojure using Emacs and nrepl recently and my normal workflow is to write some code in Emacs and then have it evaluated in nrepl by typing Ctrl X + Ctrl E at the end of the function.

I tried this once recently and got the following exception instead of a successful evaluation:

FileNotFoundException Could not locate ranking_algorithms/ranking__init.class or ranking_algorithms/ranking.clj on classpath: clojure.lang.RT.load (RT.java:432)

I was a bit surprised because I had nrepl running already (via (Meta + X) + Enter + nrepl-jack-in) and I’d only ever seen that exception refer to dependencies which weren’t in my project.clj file at the time I launched nrepl.

I eventually came across this StackOverflow post which suggested that you either launch nrepl using leiningen and then connect to it from Emacs or have your project.clj open when running (Meta + X) + Enter + nrepl-jack-in.

To launch nrepl from leiningen we’d run the following command from the terminal:

$ lein repl
nREPL server started on port 52265
REPL-y 0.1.0-beta10
Clojure 1.4.0
    Exit: Control+D or (exit) or (quit)
Commands: (user/help)
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
          (user/sourcery function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
Examples from clojuredocs.org: [clojuredocs or cdoc]
          (user/clojuredocs name-here)
          (user/clojuredocs "ns-here" "name-here")

We can then connect to that nrepl server from Emacs by typing (Meta + X) + Enter + nrepl which seems to work quite nicely.

To check the nrepl-jack-in approach works when we’ve got project.clj open we need to first kill the existing server by typing (Meta + X) + Enter + nrepl-quit.

Now if we type (Meta + X) + Enter + nrepl-jack-in our functions are evaluated correctly and all is well with the world again.

Written by Mark Needham

September 22nd, 2013 at 9:23 pm

Posted in Clojure


Clojure: Stripping all the whitespace

with 8 comments

When putting together data sets to play around with, one of the more boring tasks is stripping out characters that you’re not interested in and more often than not those characters are white spaces.

Since I’ve been building data sets using Clojure I wanted to write a function that would do this for me.

I started out with the following string:

(def word " with a  little bit of space we can make it through the night  ")

which I wanted to format in such a way that there would be a maximum of one space between each word.

I started out by using the trim function but that only removes white space from the beginning and end of a string:

> (clojure.string/trim word)
"with a  little bit of space we can make it through the night"

I wanted to get rid of the space in between ‘a’ and ‘little’ as well so I wrote the following code to split on a space and filter out any excess spaces that still remained before joining the words back together:

> (clojure.string/join " " 
                       (filter #(not (clojure.string/blank? %)) 
                               (clojure.string/split word #" ")))
"with a little bit of space we can make it through the night"

I wanted to try and make it a bit easier to read by using the thread last (->>) macro but that didn’t work as well as I’d hoped because clojure.string/split doesn’t take the string in as its last parameter:

>  (->> (clojure.string/split word #" ") 
   (filter #(not (clojure.string/blank? %))) 
   (clojure.string/join " "))
"with a little bit of space we can make it through the night"

I worked around it by creating a specific function for splitting on a space:

(defn split-on-space [word] 
  (clojure.string/split word #"\s"))

which means we can now chain everything together nicely:

>  (->> word 
        split-on-space 
        (filter #(not (clojure.string/blank? %))) 
        (clojure.string/join " "))
"with a little bit of space we can make it through the night"

I couldn’t find a cleaner way to do this but I’m sure there is one and my googling just isn’t up to scratch so do let me know in the comments!
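One candidate for a shorter version is to trim the string and then collapse every run of whitespace with a regular expression. This is a sketch rather than a definitive answer, and squish is a hypothetical name of my own:

```clojure
(require '[clojure.string :as str])

(def word " with a  little bit of space we can make it through the night  ")

;; squish is a hypothetical name: trim both ends, then replace each run of
;; whitespace characters with a single space.
(defn squish [s]
  (str/replace (str/trim s) #"\s+" " "))

(squish word)
;; => "with a little bit of space we can make it through the night"
```

This avoids the split/filter/join round trip, at the cost of treating tabs and newlines the same as spaces.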

Written by Mark Needham

September 22nd, 2013 at 6:54 pm

Posted in Clojure


Clojure: Converting an array/set into a hash map

with 3 comments

When I was implementing the Elo Rating algorithm a few weeks ago one thing I needed to do was come up with a base ranking for each team.

I started out with a set of teams that looked like this:

(def teams #{ "Man Utd" "Man City" "Arsenal" "Chelsea"})

and I wanted to transform that into a map from the team to their ranking e.g.

Man Utd -> {:points 1200}
Man City -> {:points 1200}
Arsenal -> {:points 1200}
Chelsea -> {:points 1200}

I had read the documentation of array-map, a function which can be used to transform a collection of pairs into a map, and it seemed like it might do the trick.

I started out by building an array of pairs using mapcat:

> (mapcat (fn [x] [x {:points 1200}]) teams)
("Chelsea" {:points 1200} "Man City" {:points 1200} "Arsenal" {:points 1200} "Man Utd" {:points 1200})

array-map constructs a map from alternating keys and values passed as individual arguments e.g.

> (array-map "Chelsea" {:points 1200} "Man City" {:points 1200} "Arsenal" {:points 1200} "Man Utd" {:points 1200})
{"Chelsea" {:points 1200}, "Man City" {:points 1200}, "Arsenal" {:points 1200}, "Man Utd" {:points 1200}}

Since we have a collection of pairs rather than individual pairs we need to use the apply function as well:

> (apply array-map ["Chelsea" {:points 1200} "Man City" {:points 1200} "Arsenal" {:points 1200} "Man Utd" {:points 1200}])
{"Chelsea" {:points 1200}, "Man City" {:points 1200}, "Arsenal" {:points 1200}, "Man Utd" {:points 1200}}

And if we put it all together we end up with the following:

> (apply array-map (mapcat (fn [x] [x {:points 1200}]) teams))
{"Chelsea" {:points 1200}, "Man City" {:points 1200}, "Arsenal" {:points 1200}, "Man Utd" {:points 1200}}

It works but the function we pass to mapcat feels a bit clunky. Since we just need to create a collection of team/ranking pairs we can use the vector and repeat functions to build that up instead:

> (mapcat vector teams (repeat {:points 1200}))
("Chelsea" {:points 1200} "Man City" {:points 1200} "Arsenal" {:points 1200} "Man Utd" {:points 1200})

And if we put the apply array-map code back in we still get the desired result:

> (apply array-map (mapcat vector teams (repeat {:points 1200})))
{"Chelsea" {:points 1200}, "Man City" {:points 1200}, "Arsenal" {:points 1200}, "Man Utd" {:points 1200}}

Alternatively we could use assoc like this:

> (apply assoc {} (mapcat vector teams (repeat {:points 1200})))
{"Man Utd" {:points 1200}, "Arsenal" {:points 1200}, "Man City" {:points 1200}, "Chelsea" {:points 1200}}

I also came across the into function which seemed useful but took in a collection of vectors:

> (into {} [["Chelsea" {:points 1200}] ["Man City" {:points 1200}] ["Arsenal" {:points 1200}] ["Man Utd" {:points 1200}] ])

We therefore need to change the code to use map instead of mapcat:

> (into {} (map vector teams (repeat {:points 1200})))
{"Chelsea" {:points 1200}, "Man City" {:points 1200}, "Arsenal" {:points 1200}, "Man Utd" {:points 1200}}

However, my favourite version so far uses the zipmap function like so:

> (zipmap teams (repeat {:points 1200}))
{"Man Utd" {:points 1200}, "Arsenal" {:points 1200}, "Man City" {:points 1200}, "Chelsea" {:points 1200}}

I’m sure there are other ways to do this as well so if you know any let me know in the comments.
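Two more variations, sketched here with the same teams set and the same default ranking as above. Neither comes from the comments; they are just other core functions that happen to fit:

```clojure
(def teams #{"Man Utd" "Man City" "Arsenal" "Chelsea"})

;; for builds one [team ranking] pair per team and into pours the pairs into a map.
(def rankings-with-for
  (into {} (for [team teams] [team {:points 1200}])))

;; reduce builds up the same map one assoc at a time.
(def rankings-with-reduce
  (reduce (fn [rankings team] (assoc rankings team {:points 1200})) {} teams))

(= rankings-with-for rankings-with-reduce (zipmap teams (repeat {:points 1200})))
;; => true
```

The zipmap version is still the shortest, but the for version is handy when the value needs to be computed from the key.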

Written by Mark Needham

September 20th, 2013 at 9:13 pm

Posted in Clojure


Clojure: Converting a string to a date

without comments

I wanted to do some date manipulation in Clojure recently and figured that since clj-time is a wrapper around Joda Time it’d probably do the trick.

The first thing we need to do is add the dependency to our project file and then run lein deps to pull down the appropriate JARs. The project file should look something like this:

project.clj

(defproject ranking-algorithms "0.1.0-SNAPSHOT"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [clj-time "0.6.0"]])

Now let’s load the clj-time.format namespace into the REPL since we know we’ll be parsing dates:

> (require '(clj-time [format :as f]))

The string that I want to convert into a date looks like this:

(def string-date "18 September 2012")

The first thing we should do is check whether there is an existing formatter that we can use by evaluating the following function:

> (f/show-formatters)
...
:hour-minute                            06:45
:hour-minute-second                     06:45:22
:hour-minute-second-fraction            06:45:22.473
:hour-minute-second-ms                  06:45:22.473
:mysql                                  2013-09-20 06:45:22
:ordinal-date                           2013-263
:ordinal-date-time                      2013-263T06:45:22.473Z
:ordinal-date-time-no-ms                2013-263T06:45:22Z
:rfc822                                 Fri, 20 Sep 2013 06:45:22 +0000
...

There are a lot of different built in formatters but unfortunately I couldn’t find one that exactly matched our date format so we’ll have to write our own one.

For that we’ll need to refresh our knowledge of Java date formatting patterns. In Joda Time ‘dd’ is the day of the month, ‘MMM’ is the month as text and ‘YYYY’ is the year of era.

We end up with the following formatter:

> (f/parse (f/formatter "dd MMM YYYY") string-date)
#<DateTime 2012-09-18T00:00:00.000Z>

It took me much longer than it should have to remember that ‘MMM’ is the pattern to match a short form of a month but it’s just the same as what we’d have to do in Java but with some neat wrapper functions.
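For comparison, here is roughly what the plain Java route looks like through interop, without clj-time. It is a sketch using java.text.SimpleDateFormat rather than Joda Time, and the helper names parse-date and format-date are hypothetical:

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util Locale))

(def string-date "18 September 2012")

;; parse-date is a hypothetical helper: "MMMM" is the full month name in
;; SimpleDateFormat, and the locale is pinned so "September" is always understood.
(defn parse-date [s]
  (.parse (SimpleDateFormat. "dd MMMM yyyy" Locale/ENGLISH) s))

;; format-date is a hypothetical helper that prints a java.util.Date back out
;; in year-month-day form, using the same default time zone as parse-date.
(defn format-date [d]
  (.format (SimpleDateFormat. "yyyy-MM-dd" Locale/ENGLISH) d))

(format-date (parse-date string-date))
;; => "2012-09-18"
```

Note that this uses lowercase ‘yyyy’: in SimpleDateFormat an uppercase ‘Y’ means week year, which is a different thing from what it means in Joda Time.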

Written by Mark Needham

September 20th, 2013 at 7:00 am

Posted in Clojure


Clojure: See every step of a reduce

without comments

Last year I wrote about a Haskell function called scanl which returned the intermediate steps of a fold over a collection and last week I realised that I needed a similar function in Clojure to analyse a reduce I’d written.

A simple reduce which adds together the numbers 1-10 would look like this:

> (reduce + 0 (range 1 11))
55

If we want to see the intermediate values of this computation then instead of reduce we can use a function called reductions, which gives us exactly what we want:

> (reductions + 0 (range 1 11))
(0 1 3 6 10 15 21 28 36 45 55)
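A quick sketch of how the two functions relate: the last element of reductions is what reduce would have returned, and with an initial value there is one more element than there are inputs. The name steps is just for illustration.

```clojure
;; steps holds the initial value followed by every intermediate total.
(def steps (reductions + 0 (range 1 11)))

(last steps)
;; => 55

(= (last steps) (reduce + 0 (range 1 11)))
;; => true

;; 10 inputs plus the initial value gives 11 elements.
(count steps)
;; => 11
```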

I found this function especially useful when analysing my implementation of the Glicko ranking algorithm to work out whether a team’s ranking was being updated correctly after a round of matches.

I initially thought the reductions function was only useful as a debugging tool and that you’d always end up changing your code back to use reduce after you’d solved the problem but I realise I was mistaken.

As part of my implementation of the Glicko algorithm I wrote a bit of code that applied a reduce across a collection of football seasons and initially just returned the final ranking of each team:

(def initial-team-rankings { "Man Utd" {:points 1200} "Man City" {:points 1300}})
 
(defn update-team-rankings [teams year]
  (reduce (fn [ts [team _]] (update-in ts [team :points] inc)) teams teams))
> (reduce update-team-rankings initial-team-rankings (range 2004 2013))
{"Man City" {:points 1309}, "Man Utd" {:points 1209}}

I realised it would actually be quite interesting to see the rankings after each season for which reductions comes in quite handy.

For example if we want to find the rankings after 3 seasons we could write the following code:

> (nth (reductions update-team-rankings initial-team-rankings (range 2004 2013)) 3)
{"Man City" {:points 1303}, "Man Utd" {:points 1203}}

Or we could join the result back onto our collection of years and create a map so we can look up the year more easily:

(def final-rankings
  (zipmap (range 2003 2013) (reductions update-team-rankings initial-team-rankings (range 2004 2013))))
> (get final-rankings 2006)
{"Man City" {:points 1303}, "Man Utd" {:points 1203}}

Written by Mark Needham

September 19th, 2013 at 11:57 pm

Posted in Clojure


Clojure: Merge two maps but only keep the keys of one of them

with 2 comments

I’ve been playing around with Clojure maps recently and I wanted to merge two maps of rankings where the rankings in the second map overrode those in the first while only keeping the teams from the first map.

The merge function overrides keys in earlier maps but also adds keys that only appear in later maps. For example, if we merge the following maps:

> (merge {"Man. United" 1500 "Man. City" 1400} {"Man. United" 1550 "Arsenal" 1450})
{"Arsenal" 1450, "Man. United" 1550, "Man. City" 1400}

we get back all 3 teams but I wanted a function which only returned ‘Man. United’ and ‘Man. City’ since those keys appear in the first map and ‘Arsenal’ doesn’t.

I wrote the following function:

(defn merge-rankings [initial-rankings override-rankings]
  (merge initial-rankings
         (into {} (filter #(contains? initial-rankings (key %)) override-rankings))))

If we call that we get the desired result:

> (merge-rankings {"Man. United" 1500 "Man. City" 1400} {"Man. United" 1550 "Arsenal" 1450})
{"Man. United" 1550, "Man. City" 1400}

An alternative version of that function could use select-keys like so:

(defn merge-rankings [initial-rankings override-rankings]
  (select-keys (merge initial-rankings override-rankings) (map key initial-rankings)))

bitemyapp points out in the comments that we can go even further and use the keys function instead of map key, like so:

(defn merge-rankings [initial-rankings override-rankings]
  (select-keys (merge initial-rankings override-rankings) (keys initial-rankings)))

Now let’s generify the function so it would make sense in the context of any maps, not just ranking related ones:

(defn merge-keep-left [left right]
  (select-keys (merge left right) (keys left)))
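As a quick check, a sketch of calling the generic version with the ranking maps from earlier, plus the edge case of an empty left map:

```clojure
(defn merge-keep-left [left right]
  (select-keys (merge left right) (keys left)))

;; Values from the right map win, but only keys from the left map survive.
(merge-keep-left {"Man. United" 1500 "Man. City" 1400}
                 {"Man. United" 1550 "Arsenal" 1450})
;; => {"Man. United" 1550, "Man. City" 1400}

;; With nothing on the left there are no keys to keep.
(merge-keep-left {} {"Arsenal" 1450})
;; => {}
```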

Written by Mark Needham

September 17th, 2013 at 1:03 am

Posted in Clojure


Clojure: Updating keys in a map

with one comment

I’ve been playing with Clojure over the last few weeks and as a result I’ve been using a lot of maps to represent the data.

For example if we have the following map of teams to Glicko ratings and ratings deviations:

(def teams { "Man. United" {:points 1500 :rd 350} 
             "Man. City"   {:points 1450 :rd 300} })

We might want to increase Man. United’s points score by one for which we could use the update-in function:

> (update-in teams ["Man. United" :points] inc)
{"Man. United" {:points 1501, :rd 350}, "Man. City" {:points 1450, :rd 300}}

The first argument to update-in is a nested associative structure and the second is a sequence of keys describing the path to the value we want to update.

If we wanted to reset Man. United’s points score we could use assoc-in:

> (assoc-in teams ["Man. United" :points] 1)
{"Man. United" {:points 1, :rd 350}, "Man. City" {:points 1450, :rd 300}}

If we want to update multiple keys at once then we can chain them using the -> (thread first) macro:

(-> teams
    (assoc-in ["Man. United" :points] 1600)
    (assoc-in ["Man. United" :rd] 200))
{"Man. United" {:points 1600, :rd 200}, "Man. City" {:points 1450, :rd 300}}

If instead of replacing just one part of the value we want to replace the whole entry we could use assoc instead:

> (assoc teams "Man. United" {:points 1600 :rd 300})
{"Man. United" {:points 1600, :rd 300}, "Man. City" {:points 1450, :rd 300}}

assoc can also be used to add a new key/value to the map. e.g.

> (assoc teams "Arsenal" {:points 1500 :rd 330})
{"Man. United" {:points 1500, :rd 350}, "Arsenal" {:points 1500, :rd 330}, "Man. City" {:points 1450, :rd 300}}

dissoc plays the opposite role and returns a new map without the specified keys:

> (dissoc teams "Man. United" "Man. City")
{}

And those are all the map based functions I’ve played around with so far…

Written by Mark Needham

September 17th, 2013 at 12:24 am

Posted in Clojure


Glicko Rating System: A simple example using Clojure

with 2 comments

A couple of weeks ago I wrote about the Elo Rating system and when reading more about it I learnt that one of its weaknesses is that it doesn’t take into account the reliability of a player’s rating.

For example, a player may not have played for a long time. When they next play a match we shouldn’t assume that the accuracy of that rating is the same as for another player with the same rating but who plays regularly.

Mark Glickman wrote the Glicko Rating System to take the uncertainty into account by introducing a ‘ratings deviation’ (RD). A low RD indicates that a player competes frequently and a higher RD indicates that they don’t.

One other difference between Glicko and Elo is the following:

It is interesting to note that, in the Glicko system, rating changes are not balanced as they usually are in the Elo system.

If one player’s rating increases by x, the opponent’s rating does not usually decrease by x as in the Elo system.

In fact, in the Glicko system, the amount by which the opponent’s rating decreases is governed by both players’ RD’s.

The RD value effectively tells us the range in which the player’s actual rating probably exists. i.e. a 95% confidence interval.

e.g. if a player has a rating of 1850 and an RD of 50 then the interval is 1750 – 1950, i.e. (Rating – 2*RD) to (Rating + 2*RD).
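That interval is simple enough to sketch as a function. rating-interval is a hypothetical name and is not part of the code on github:

```clojure
;; rating-interval is a hypothetical helper: it returns the lower and upper
;; bounds of the approximate 95% interval, i.e. rating minus/plus two RDs.
(defn rating-interval [rating rd]
  [(- rating (* 2 rd)) (+ rating (* 2 rd))])

(rating-interval 1850 50)
;; => [1750 1950]

;; An unrated player (1500, RD 350) has a much wider interval.
(rating-interval 1500 350)
;; => [800 2200]
```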

The algorithm has 2 steps:

  1. Determine a rating and RD for each player at the onset of the rating period. If the player is unrated use a value of 1500 and RD of 350. If they do have a rating we’ll calculate the new RD from the old RD using this formula:
    $$RD = \min\left(\sqrt{RD_{old}^2 + c^2 t},\ 350\right)$$

    where:

    • t is the number of rating periods since last competition (e.g., if the player
      competed in the most recent rating period, t = 1)
    • c is a constant that governs the increase in uncertainty over time.
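  2. Update each player’s rating and RD separately using the following formulas:
    $$r' = r + \frac{q}{1/RD^2 + 1/d^2} \sum_{j=1}^{m} g(RD_j)\,\bigl(s_j - E(s \mid r, r_j, RD_j)\bigr)$$
    $$RD' = \sqrt{\left(\frac{1}{RD^2} + \frac{1}{d^2}\right)^{-1}}$$

    with
    $$q = \frac{\ln 10}{400}, \qquad g(RD) = \frac{1}{\sqrt{1 + 3 q^2 RD^2 / \pi^2}}$$
    $$E(s \mid r, r_j, RD_j) = \frac{1}{1 + 10^{-g(RD_j)(r - r_j)/400}}$$
    $$d^2 = \left(q^2 \sum_{j=1}^{m} g(RD_j)^2 \, E(s \mid r, r_j, RD_j)\,\bigl(1 - E(s \mid r, r_j, RD_j)\bigr)\right)^{-1}$$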

    where:

    • r is the player’s pre-period rating
    • RD is the player’s pre-period ratings deviation
    • r1, r2,…,rm are the pre-period ratings of their opponents
    • RD1, RD2,…,RDm are the pre-period ratings deviations of their opponents
    • s1, s2,…,sm are the scores against the opponents. 1 is a win, 1/2 is a draw, 0 is a defeat.
    • r’ is the player’s post-period rating
    • RD’ is the player’s post-period ratings deviation

The paper provides an example to follow and includes the intermediate workings which made it easier to build the algorithm one function at a time.

The q function was the simplest to implement so I created that and the g function at the same time:

(ns ranking-algorithms.glicko
  (:require [clojure.math.numeric-tower :as math]))
 
(def q
  (/ (java.lang.Math/log 10) 400))
 
(defn g [rd]
  (/ 1
     (java.lang.Math/sqrt (+ 1
                             (/ (* 3 (math/expt q 2) (math/expt rd 2))
                                (math/expt ( . Math PI) 2))))))

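We can use the following table from the paper’s example to check we get the right results when we call it:

j   r_j    RD_j   g(RD_j)   E(s|r,r_j,RD_j)   s_j
1   1400   30     0.9955    0.639             1
2   1550   100    0.9531    0.432             0
3   1700   300    0.7242    0.303             0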

> (g 30)
0.9954980060779481
> (g 100)
0.953148974234587
> (g 300)
0.7242354637384434

The next easiest function to write was the E function:

(defn e [rating opponent-rating opponent-rd]
  (/ 1
     (+ 1
        (math/expt 10 (/ (* (- (g opponent-rd))
                            (- rating opponent-rating))
                         400)))))

And if we test that assuming that we have a rating of 1500 with a RD of 200:

> (e 1500 1400 30)
0.639467736007921
> (e 1500 1550 100)
0.43184235355955686
> (e 1500 1700 300)
0.30284072524764

Finally we need to write the d2 supporting function:

;; process-opponent has to be defined before d2, which refers to it
(defn process-opponent [total opponent]
  (let [{:keys [g e]} opponent]
    (+ total (* (math/expt g 2) e (- 1 e)))))
 
(defn d2 [opponents]
  (/ 1  (* (math/expt q 2)
           (reduce process-opponent 0 opponents))))

In this function we need to sum a combination of the g and e values we calculated earlier for each opponent so we can use a reduce over a collection of those values for each opponent to do that:

> (d2 [{:g (g 30) :e (e 1500 1400 30)} 
       {:g (g 100) :e (e 1500 1550 100)} 
       {:g (g 300) :e (e 1500 1700 300)}])
53685.74290197874

I get a slightly different value for this function which I think is because I didn’t round the intermediate values to 2 decimal places as the example does.

Now we can introduce the r’ function which returns our ranking after taking the matches against these opponents into account:

(defn update-ranking [ranking-delta opponent]
  (let [{:keys [ranking opponent-ranking opponent-ranking-rd score]} opponent]
    (+ ranking-delta
       (* (g opponent-ranking-rd)
          (- score (e ranking opponent-ranking opponent-ranking-rd))))))
 
(defn g-and-e
  [ranking {o-rd :opponent-ranking-rd o-ranking :opponent-ranking}]
  {:g (g o-rd) :e (e ranking o-ranking o-rd)})
 
(defn ranking-after-round
  [{ ranking :ranking rd :ranking-rd opponents :opponents}]  
  (+ ranking
     (* (/ q
           (+ (/ 1 (math/expt rd 2))
              (/ 1 (d2 (map (partial g-and-e ranking) opponents)))))
        (reduce update-ranking 0 (map #(assoc-in % [:ranking] ranking) opponents)))))

One thing I wasn’t sure about here was the use of partial which is a bit of a Haskell idiom. I’m not sure what the favoured approach is in Clojure land yet.

If we execute that function we get the expected result:

> (ranking-after-round { :ranking 1500 
                         :ranking-rd 200 
                         :opponents[{:opponent-ranking 1400 :opponent-ranking-rd 30 :score 1} 
                                    {:opponent-ranking 1550 :opponent-ranking-rd 100 :score 0} 
                                    {:opponent-ranking 1700 :opponent-ranking-rd 300 :score 0}]})
1464.1064627569112

The only function missing now is RD’ which returns our RD after taking these matches into account:

(defn rd-after-round
  [{ ranking :ranking rd :ranking-rd opponents :opponents}]
  (java.lang.Math/sqrt (/ 1 (+ (/ 1 (math/expt rd 2))
                               (/ 1 (d2 (map (partial g-and-e ranking) opponents)))))))

If we execute that function we get the expected result and we’re done!

> (rd-after-round { :ranking 1500 
                    :ranking-rd 200 
                    :opponents[{:opponent-ranking 1400 :opponent-ranking-rd 30 :score 1} 
                               {:opponent-ranking 1550 :opponent-ranking-rd 100 :score 0} 
                               {:opponent-ranking 1700 :opponent-ranking-rd 300 :score 0}]})
151.39890244796933

The next step is to run this algorithm against the football data and see if its results differ from the ones I got with the Elo algorithm.

I’m still not quite sure what I should set the rating period to. My initial thinking was that the rating period could be a season but that would mean that a team’s rating only really makes sense after a few seasons of matches.

The code is on github if you want to play with it and if you have any suggestions on how to make the code more idiomatic I’d love to hear them.

Written by Mark Needham

September 14th, 2013 at 9:02 pm

Clojure: All things regex

without comments

I’ve been doing some scraping of web pages recently using Clojure and Enlive and as part of that I’ve had to write regular expressions to extract the data I’m interested in.

On my travels I’ve come across a few different functions and I’m never sure which is the right one to use so I thought I’d document what I’ve tried for future me.

Check if regex matches

The first regex I wrote was while scraping the Champions League results from the Rec.Sport.Soccer Statistics Foundation and I wanted to determine which spans contained the match result and which didn’t.

A matching line would look like this:

Real Madrid-Juventus Turijn 2 - 1

And a non matching one like this:

53’Nedved 0-1, 66'Xavi Hernández 1-1, 114’Zalayeta 1-2

I wrote the following regex to detect match results:

[a-zA-Z\s]+-[a-zA-Z\s]+ [0-9][\s]?.[\s]?[0-9]

I then wrote the following function using re-matches which would return true or false depending on the input:

(defn recognise-match? [row]
  (not (clojure.string/blank? (re-matches #"[a-zA-Z\s]+-[a-zA-Z\s]+ [0-9][\s]?.[\s]?[0-9]" row))))
> (recognise-match? "Real Madrid-Juventus Turijn 2 - 1")
true
> (recognise-match? "53’Nedved 0-1, 66'Xavi Hernández 1-1, 114’Zalayeta 1-2")
false

re-matches only returns matches if the whole string matches the pattern which means if we had a line with some spurious text after the score it wouldn’t match:

> (recognise-match? "Real Madrid-Juventus Turijn 2 - 1 abc")
false

If we don’t mind that and we just want some part of the string to match our pattern then we can use re-find instead:

(defn recognise-match? [row]
  (not (clojure.string/blank? (re-find #"[a-zA-Z\s]+-[a-zA-Z\s]+ [0-9][\s]?.[\s]?[0-9]" row))))
> (recognise-match? "Real Madrid-Juventus Turijn 2 - 1 abc")
true
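Since re-matches and re-find both return nil when nothing matches, a slightly shorter way to get a true/false answer is to wrap them in boolean. A sketch with the same pattern pulled out into a var; the names match-pattern, whole-line-match? and partial-match? are hypothetical:

```clojure
(def match-pattern #"[a-zA-Z\s]+-[a-zA-Z\s]+ [0-9][\s]?.[\s]?[0-9]")

;; True only when the entire row matches the pattern.
(defn whole-line-match? [row]
  (boolean (re-matches match-pattern row)))

;; True when any part of the row matches the pattern.
(defn partial-match? [row]
  (boolean (re-find match-pattern row)))

(whole-line-match? "Real Madrid-Juventus Turijn 2 - 1")
;; => true
(whole-line-match? "Real Madrid-Juventus Turijn 2 - 1 abc")
;; => false
(partial-match? "Real Madrid-Juventus Turijn 2 - 1 abc")
;; => true
```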

Extract capture groups

The next thing I wanted to do was to capture the teams and the score of the match which I initially did using re-seq:

> (first (re-seq #"([a-zA-Z\s]+)-([a-zA-Z\s]+) ([0-9])[\s]?.[\s]?([0-9])" "FC Valencia-Internazionale Milaan 2 - 1"))
["FC Valencia-Internazionale Milaan 2 - 1" "FC Valencia" "Internazionale Milaan" "2" "1"]

I then extracted the various parts like so:

> (def result (first (re-seq #"([a-zA-Z\s]+)-([a-zA-Z\s]+) ([0-9])[\s]?.[\s]?([0-9])" "FC Valencia-Internazionale Milaan 2 - 1")))
 
> result
["FC Valencia-Internazionale Milaan 2 - 1" "FC Valencia" "Internazionale Milaan" "2" "1"]
 
 
> (nth result 1)
"FC Valencia"
 
> (nth result 2)
"Internazionale Milaan"

re-seq returns a list which contains consecutive matches of the regex. The list will either contain strings if we don’t specify capture groups or a vector containing the pattern matched and each of the capture groups.

For example if we now match only sequences of A-Z or spaces and remove the rest of the pattern from above we’d get the following results:

> (re-seq #"([a-zA-Z\s]+)" "FC Valencia-Internazionale Milaan 2 - 1")
(["FC Valencia" "FC Valencia"] ["Internazionale Milaan " "Internazionale Milaan "] [" " " "] [" " " "])
 
> (re-seq #"[a-zA-Z\s]+" "FC Valencia-Internazionale Milaan 2 - 1")
("FC Valencia" "Internazionale Milaan " " " " ")

In our case re-find or re-matches actually makes more sense since we only want to match the pattern once. If there are further matches after this those aren’t included in the results. e.g.

> (re-find #"[a-zA-Z\s]+" "FC Valencia-Internazionale Milaan 2 - 1")
"FC Valencia"
 
> (re-matches #"[a-zA-Z\s]*" "FC Valencia-Internazionale Milaan 2 - 1")
nil

re-matches returns nil here because there are characters in the string which don’t match the pattern i.e. the hyphens and the digits of the score.

If we tie that in with our capture groups we end up with the following:

> (def result 
    (re-find #"([a-zA-Z\s]+)-([a-zA-Z\s]+) ([0-9])[\s]?.[\s]?([0-9])" "FC Valencia-Internazionale Milaan 2 - 1"))
 
> result
["FC Valencia-Internazionale Milaan 2 - 1" "FC Valencia" "Internazionale Milaan" "2" "1"]
 
> (nth result 1)
"FC Valencia"
 
> (nth result 2)
"Internazionale Milaan"

I also came across the re-pattern function which provides a more verbose way of creating a pattern and then evaluating it with re-find:

> (re-find (re-pattern "([a-zA-Z\\s]+)-([a-zA-Z\\s]+) ([0-9])[\\s]?.[\\s]?([0-9])") "FC Valencia-Internazionale Milaan 2 - 1")
["FC Valencia-Internazionale Milaan 2 - 1" "FC Valencia" "Internazionale Milaan" "2" "1"]

One difference here is that I had to escape the special sequence ‘\s’ otherwise I was getting the following exception:

RuntimeException Unsupported escape character: \s  clojure.lang.Util.runtimeException (Util.java:170)

I wanted to play around with re-groups as well but that seemed to throw an exception reasonably frequently when I expected it to work.

The last function I looked at was re-matcher, which takes a pattern and a string and returns a java.util.regex.Matcher that we can pass to re-find instead of passing the pattern and the string separately:

> (re-find (re-matcher #"([a-zA-Z\s]+)-([a-zA-Z\s]+) ([0-9])[\s]?.[\s]?([0-9])" "FC Valencia-Internazionale Milaan 2 - 1"))
["FC Valencia-Internazionale Milaan 2 - 1" "FC Valencia" "Internazionale Milaan" "2" "1"]
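My best guess at the re-groups exceptions is that it only works on a matcher that has already matched something: calling it before a successful re-find throws an IllegalStateException. A sketch of the stateful style, using a simpler score pattern and a hypothetical scores function of my own:

```clojure
;; scores is a hypothetical helper: each re-find advances the matcher to the
;; next score, and re-groups then reads the groups of that most recent match.
(defn scores [line]
  (let [m (re-matcher #"(\d+)-(\d+)" line)]
    (loop [acc []]
      (if (re-find m)
        (recur (conj acc (re-groups m)))
        acc))))

(scores "0-1, 1-1, 1-2")
;; => [["0-1" "0" "1"] ["1-1" "1" "1"] ["1-2" "1" "2"]]
```

For this particular job re-seq would give the same matches with less ceremony, which is probably why the matcher functions rarely show up in everyday code.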

In summary

So in summary I think most use cases are covered by re-find and re-matches and maybe re-seq on special occasions. I couldn’t see where I’d use the other functions but I’m happy to be proved wrong.
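To round things off, here is a sketch of how re-find and destructuring could turn a result line into a map in one go. The name parse-result and the shape of the returned map are hypothetical choices of mine:

```clojure
(def result-pattern
  #"([a-zA-Z\s]+)-([a-zA-Z\s]+) ([0-9])[\s]?.[\s]?([0-9])")

;; parse-result is a hypothetical helper. when-let returns nil for rows that
;; don't match, otherwise the vector from re-find is destructured: the first
;; element (the whole match) is ignored and the four groups are named.
(defn parse-result [row]
  (when-let [[_ home away home-score away-score] (re-find result-pattern row)]
    {:home  home
     :away  away
     :score [(Integer/parseInt home-score) (Integer/parseInt away-score)]}))

(parse-result "FC Valencia-Internazionale Milaan 2 - 1")
;; => {:home "FC Valencia", :away "Internazionale Milaan", :score [2 1]}

(parse-result "no score here")
;; => nil
```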

Written by Mark Needham

September 14th, 2013 at 1:24 am

Posted in Clojure
