Archive for the ‘Clojure’ Category
Clojure: Unit testing in the REPL
One thing which I think is great about coding with F# is the quick feedback that we can get by defining and then testing out functions in the REPL.
We can do the same thing in Clojure but it’s even better because we can also define and run unit tests which I think is pretty neat.
Nurullah Akkaya has a good post which describes how to use clojure.test, a testing framework written by Stuart Sierra, so I’ve been using that to define some test cases for the little RSS feed parser that I’m writing.
To use clojure.test straight out of the box you need the latest version of the Clojure source code, as Stuart Sierra points out on his website.
I ran the ant task for the project and then launched the REPL pointing to the ‘alpha snapshot’ jar instead of the ‘1.0.0’ jar and it seems to work fine.
I managed to break the ‘get-title’ function while playing with it before so I thought that would be a good one to try out the tests in the REPL with.
This function is supposed to strip out the name and the following colon which appears in every title and just show the title of the blog post.
I originally had this definition:
(defn get-title [title] (second (first (re-seq #".*:\s(.*)" title))))
I hadn’t realised that, because ‘.*’ is greedy, this strips everything up to the last colon in the string and therefore returns the wrong result for titles which contain a colon themselves.
I created the following tests:
(use 'clojure.test)

(deftest test-get-title
  (is (= "Clojure - It's awesome" (get-title "Mark Needham: Clojure - It's awesome")))
  (is (= "A Book: Book Review" (get-title "Mark Needham: A Book: Book Review"))))
We can run those with the following function:
(run-tests)
FAIL in (test-get-title) (NO_SOURCE_FILE:19)
expected: (= "A Book: Book Review" (get-title "Mark Needham: A Book: Book Review"))
  actual: (not (= "A Book: Book Review" "Book Review"))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
Changing the function helps solve the problem:
(defn- get-title [title] (second (first (re-seq #"[a-zA-Z0-9 ]+:\s(.*)" title))))
Ran 1 tests containing 2 assertions. 0 failures, 0 errors.
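Another possible fix, offered only as a sketch, is to keep the original pattern but make the leading ‘.*’ reluctant (‘.*?’) so that it stops at the first colon rather than the last. The name ‘get-title-lazy’ is hypothetical and isn’t part of the script described here.

```clojure
;; Hypothetical variant of get-title: the reluctant quantifier .*? stops
;; at the first colon, so colons inside the post title are preserved.
(defn get-title-lazy [title]
  (second (re-find #".*?:\s(.*)" title)))

(get-title-lazy "Mark Needham: A Book: Book Review")
;; => "A Book: Book Review"
```

One thing in its favour is that it doesn’t depend on which characters appear in the author’s name.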
We can also run the assertions directly without having to call ‘run-tests’:
(is (= "A Book: Book Review" (get-title "Mark Needham: A Book: Book Review")))
true
(is (= "Something Else" (get-title "Mark Needham: A Book: Book Review")))
expected: (= "Something Else" (get-title "Mark Needham: A Book: Book Review"))
  actual: (not (= "Something Else" "A Book: Book Review"))
false
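When several assertions share the same shape, clojure.test’s ‘are’ macro lets us list them as a table of arguments. The following is a minimal sketch rather than part of the original tests; the test name ‘test-get-title-table’ is made up and ‘get-title’ is repeated so that the snippet runs on its own.

```clojure
(use 'clojure.test)

;; get-title as defined above (the fixed version), repeated for completeness.
(defn get-title [title]
  (second (first (re-seq #"[a-zA-Z0-9 ]+:\s(.*)" title))))

;; `are` expands into one `is` assertion per row of arguments.
(deftest test-get-title-table
  (are [expected input] (= expected (get-title input))
    "Clojure - It's awesome" "Mark Needham: Clojure - It's awesome"
    "A Book: Book Review"    "Mark Needham: A Book: Book Review"))

;; run-tests prints a report and returns a summary map.
(run-tests)
```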
Nurullah has more detail in his post about how to integrate tests into a build although I don’t need to do that just yet!
Clojure: Parsing an RSS feed
I’ve been playing around with a little script in Clojure to parse the ThoughtWorks Blogs RSS feed and then create a tweet for each post which contains a link to the blog post and the author’s Twitter ID if they have one.
It’s not finished yet but I’m finding the way that we parse documents like this in Clojure quite intriguing.
The xml to parse looks roughly like this:
<rss version="2.0">
  <channel>
    ...
    <item>
      <title>Simon Brunning: Links for 2009-11-27 [del.icio.us]</title>
      <link>http://feedproxy.google.com/~r/SmallValuesOfCool/~3/WDqeLyMA-RE/brunns</link>
    </item>
    <item>
      <title>Alex Hung: Extending iPhone battery life</title>
      <link>http://alexhung.vox.com/library/post/extending-iphone-battery-life.html?_c=feed-atom-full</link>
    </item>
    ...
  </channel>
</rss>
I’ve only included the parts of the document that I’m interested in getting.
Following the examples from Stuart Halloway’s book, one approach is to make use of the ‘clojure.xml/parse’ and ‘clojure.core/xml-seq’ functions to create a sequence representing the tree structure of the feed.
I’m used to parsing XML with XPath but that doesn’t make as much sense when we have a sequence of hash maps. Instead I’m using ‘filter‘ and ‘map‘ to try and achieve the same outcome.
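As a rough illustration of the idea, here is a self-contained sketch that does the equivalent of an XPath query such as ‘//item/title’ with ‘filter’ and ‘map’. The feed contents, the example.com links and the names ‘sample-feed’, ‘parse-str’, ‘items’ and ‘item-titles’ are all made up for the example.

```clojure
(require '[clojure.xml :as xml])

;; A made-up two-item feed, inlined so the example needs no file on disk.
(def sample-feed
  (str "<rss version=\"2.0\"><channel>"
       "<item><title>Alex Hung: Extending iPhone battery life</title>"
       "<link>http://example.com/a</link></item>"
       "<item><title>Mark Needham: A Book: Book Review</title>"
       "<link>http://example.com/b</link></item>"
       "</channel></rss>"))

;; clojure.xml/parse accepts an InputStream, so wrap the string's bytes.
(defn parse-str [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; xml-seq flattens the tree into a seq of nodes; filter keeps the <item>s.
(def items
  (filter #(= :item (:tag %)) (xml-seq (parse-str sample-feed))))

;; For each item, pick out the text of its <title> child.
(def item-titles
  (map (fn [item]
         (->> (:content item)
              (filter #(= :title (:tag %)))
              first
              :content
              first))
       items))
;; item-titles
;; => ("Alex Hung: Extending iPhone battery life" "Mark Needham: A Book: Book Review")
```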
I found that while I was trying to work out how to use these functions together I was often trying to solve the whole problem in one go instead of breaking it down into smaller more manageable pieces.
I also noticed that I was using ‘filter’ more often than I needed to instead of filtering the data to the point that everything I wanted to extract was in the remaining data set.
When I was playing with F# I got into the habit of trying to minimise the number of intermediate values I created but this seemed to be making life more difficult so I’ve allowed myself some intermediate values for the moment!
The goal is to poll the ThoughtWorks RSS feed and then update the planettw account with the latest blog posts. The current setup does that but doesn’t include people’s Twitter names in the tweet so I’m trying to sort that out.
This is the code I have so far:
(use '[clojure.xml :only (parse)])

(def feed (xml-seq (parse (java.io.File. "clojure-play/tw-blogs-rss.txt"))))

(def rss-entries (filter #(= :item (:tag %)) feed))

(defn- get-href [link]
  ((comp :href :attrs) link))

(defn- get-value [node]
  (first (:content node)))

(defn rss-link [entry]
  (get-value (first (filter #(= :link (:tag %)) (:content entry)))))

(defn rss-title [entry]
  (get-value (first (filter #(= :title (:tag %)) (:content entry)))))

(def rss-titles (map #(rss-title %) rss-entries))
(def rss-links (map #(rss-link %) rss-entries))

(defn- get-author [title]
  (second (first (re-seq #"([\w ]+):" title))))

(defn- get-title [title]
  (second (first (re-seq #"[a-zA-Z0-9 ]+:\s(.*)" title))))

(def authors (map #(get-author %) rss-titles))
(def titles (map #(get-title %) rss-titles))

(defn- get-display-name [twitter-names real-name]
  (let [twitter-name (twitter-names real-name)]
    (if twitter-name
      (str "@" twitter-name)
      real-name)))

(def twitter-names
  {"Mark Needham" "markhneedham"
   "Alex Hung" "alexhung"
   "Simon Brunning" "brunns"
   "Ola Bini" "olabini"
   "Patrick Kua" "patkua"
   "Marc McNeill" "dancingmango"
   "Dahlia Bock" "dlbock"
   "Sumeet Moghe" "sumeet_moghe"
   "Brian Guthrie" "bguthrie"
   "Ian Robinson" "iansrobinson"
   "Ian Cartwright" "cartwrightian"
   "Duncan Cragg" "duncancragg"
   "David Cameron" "davcamer"
   "Steven List" "athought"
   "Philip Calcado" "pcalcado"
   "Perryn Fowler" "perrynfowler"
   "Jason Yip" "jchyip"
   "Christopher Read" "cread"
   "Jim Webber" "jimwebber"
   "John Hume" "duelin_markers"})

(defn- create-blog-post [title link author]
  {:tweet (str title " by " (get-display-name twitter-names author) " " link)})

(defn create-blog-posts [titles links authors]
  (map #(create-blog-post %1 %2 %3) titles links authors))
To use that you’d need to do this:
(create-blog-posts titles rss-links authors)
Which returns a sequence of hash maps, each with a ‘:tweet’ key whose value is the tweet to display on Twitter:
({:tweet "Links for 2009-11-27 [del.icio.us] by @brunns http://feedproxy.google.com/~r/SmallValuesOfCool/~3/WDqeLyMA-RE/brunns"}
{:tweet "Extending iPhone battery life by @alexhung http://alexhung.vox.com/library/post/extending-iphone-battery-life.html?_c=feed-atom-full"}
{:tweet "Threshold Anxiety by Adrian Wible http://thoughtadrian.blogspot.com/2009/11/threshold-anxiety.html"})
The next step is to get this hooked up to the Twitter API.
There are still some things I’m unsure of when it comes to writing applications in Clojure:
- I’m not sure what to do with ‘twitter-names’. It’s pretty much a global data store so I can’t decide whether to just refer to it directly inside other functions or if it should be passed in as a parameter.
- I used ‘first’ quite a few times in the code to get the first value in a sequence but it doesn’t feel like the code expresses the structure of the document very well.
- What’s the best way to lay out code for ‘defn’ expressions? I’ve been putting the signature on its own line and then the implementation on other lines, which seems to be the way that it’s done in the Clojure source code, but it sometimes seems like I could just write it all on one line.
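One way the first two questions might be approached, sketched here under my own assumptions rather than as a settled answer, is to pass the lookup table in explicitly and to hide the repeated ‘first’ calls behind a helper named after what it extracts. The names ‘child-text’, ‘display-name’, ‘tweet’ and ‘sample-entry’ are hypothetical, as is the example.com link.

```clojure
;; Hypothetical helper: text of the first child element with the given tag.
(defn child-text [entry tag]
  (->> (:content entry)
       (filter #(= tag (:tag %)))
       first
       :content
       first))

;; The lookup table is an explicit argument rather than a global.
(defn display-name [twitter-names real-name]
  (if-let [twitter-name (get twitter-names real-name)]
    (str "@" twitter-name)
    real-name))

;; Splits "Author: Title" in one go and builds the tweet text.
(defn tweet [twitter-names entry]
  (let [[_ author title] (re-find #"([\w ]+):\s(.*)" (child-text entry :title))]
    (str title " by " (display-name twitter-names author)
         " " (child-text entry :link))))

;; A hand-built entry in the shape that clojure.xml/parse produces.
(def sample-entry
  {:tag :item
   :content [{:tag :title :content ["Alex Hung: Extending iPhone battery life"]}
             {:tag :link :content ["http://example.com/post"]}]})

(tweet {"Alex Hung" "alexhung"} sample-entry)
;; => "Extending iPhone battery life by @alexhung http://example.com/post"
```

Passing the map in makes the functions easy to try out in the REPL with a small table, at the cost of threading one more argument through.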
Clojure: The ‘apply’ function
In my continued playing around with Clojure I came across the ‘apply’ function, which is used when we want to call another function with a number of arguments but have been given a single collection which contains those arguments.
The example that I’ve been trying to understand is applying ‘str’ to a collection of values.
I started off with the following:
(str [1 2 3]) => "[1 2 3]"
This just returns the string representation of the vector that we passed it, but what we actually want is to get an output of “123”.
The ‘apply’ function allows us to do that:
(apply str [1 2 3]) => "123"
That is semantically/conceptually the same as doing this:
(str 1 2 3) => "123"
I didn’t quite understand how that could work though and my assumption was that somewhere in the Clojure source the above function call would be happening.
The definition of ‘apply’ is as follows:
(defn apply
  "Applies fn f to the argument list formed by prepending args to argseq."
  {:arglists '([f args* argseq])}
  [#^clojure.lang.IFn f & args]
  (. f (applyTo (spread args))))
The first thing which I hadn’t realised is that when you have an ‘&’ before a parameter name then any remaining arguments are collected into a sequence bound to that parameter.
If we break down the example above we end up with the following:
(. str (applyTo (spread [[1 2 3]])))
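To see why ‘spread’ receives a sequence wrapping the vector, here is a small sketch of how ‘&’ collects arguments. ‘show-rest’ is a made-up function which just returns whatever its rest parameter was bound to.

```clojure
;; Hypothetical function that simply returns what `& args` was bound to.
(defn show-rest [f & args]
  args)

(show-rest str [1 2 3])   ;; => ([1 2 3])  one argument, itself a vector
(show-rest str 1 2 3)     ;; => (1 2 3)
(show-rest str)           ;; => nil        no extra arguments at all
```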
The ‘spread’ function is defined like so:
(defn spread
  {:private true}
  [arglist]
  (cond
   (nil? arglist) nil
   (nil? (next arglist)) (seq (first arglist))
   :else (cons (first arglist) (spread (next arglist)))))
In this case we only have one item in ‘arglist’ so on line 6 the ‘(next arglist)’ expression evaluates to nil.
This means that we create a seq from the first argument of the ‘arglist’ which is ‘[1 2 3]‘.
Working our way back up to the ‘apply’ function what we end up with is this:
(. str (applyTo (seq [1 2 3]))) => "123"
This calls through to an ‘applyTo’ method defined on the ‘clojure.lang.IFn’ interface.
I’m not sure which of the implementations ‘str’ maps to but it seems like the ‘str’ function would eventually be called from the Java code with each of the values in the sequence passed in as a separate argument which is pretty neat!
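The steps above can be tried out in the REPL with a public copy of ‘spread’. This is only a sketch: ‘my-spread’ is my own name for the copy, and as far as I can tell variadic functions such as ‘str’ are compiled to subclasses of ‘clojure.lang.RestFn’, which is where their ‘applyTo’ comes from.

```clojure
;; A public copy of the private `spread` shown above, renamed for illustration.
(defn my-spread [arglist]
  (cond
    (nil? arglist) nil
    (nil? (next arglist)) (seq (first arglist))
    :else (cons (first arglist) (my-spread (next arglist)))))

;; Leading arguments are prepended to the final sequence.
(my-spread '(1 2 [3 4]))             ;; => (1 2 3 4)
(apply str "a" "b" [1 2 3])          ;; => "ab123"

;; Calling the Java method directly gives the same result as apply.
(.applyTo str (seq [1 2 3]))         ;; => "123"
(instance? clojure.lang.RestFn str)  ;; => true
```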
Writing a Java function in Clojure
A function that we had to write in Java on a project that I worked on recently needed to indicate whether there was a gap in a series of data points or not.
If there were gaps at the beginning or end of the sequence then that was fine but gaps in the middle of the sequence were not.
null, 1, 2, 3 => no gaps
1, 2, 3, null => no gaps
1, null, 2, 3 => gaps
The Java version looked a bit like this:
public boolean hasGaps(List<BigInteger> values) {
    Iterator<BigInteger> fromHead = values.iterator();
    while (fromHead.hasNext() && fromHead.next() == null) {
        fromHead.remove();
    }

    Collections.reverse(values);

    Iterator<BigInteger> fromTail = values.iterator();
    while (fromTail.hasNext() && fromTail.next() == null) {
        fromTail.remove();
    }

    return values.contains(null);
}
We take the initial list and remove all the null values from the beginning of it, then reverse the list and remove the null values which are now at the beginning (originally the end).
We then check if there’s a null value and if there is then it would indicate there is indeed a gap in the list.
To write this function in Clojure we can start off by using the ‘drop-while’ function to get rid of the leading nil values.
I started off with this attempt:
(defn has-gaps? [list] let [no-nils] [drop-while #(= % nil) list] no-nils)
Unfortunately that gives us the following error!
Can't take value of a macro: #'clojure.core/let (NO_SOURCE_FILE:16)
It thinks we’re trying to pass around the ‘let’ macro instead of evaluating it – I forgot to put in the brackets around the ‘let’!
I fixed that with this next version:
(defn has-gaps? [list] (let [no-nils] [drop-while nil? list] no-nils))
But again, no love:
java.lang.IllegalArgumentException: let requires an even number of forms in binding vector (NO_SOURCE_FILE:23)
The way I understand it the ‘let’ macro takes in a vector of bindings as its first argument and what I’ve done here is pass in two vectors instead of one.
In the bindings vector we need to ensure that there are an even number of forms so that each symbol can be bound to an expression.
I fixed this by putting the two vectors defined above into another vector:
(defn has-gaps? [list] (let [[no-nils] [(drop-while nil? list)]] no-nils))
We can simplify that further so that we don’t have nested vectors:
(defn has-gaps? [list] (let [no-nils (drop-while nil? list)] no-nils))
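As a small illustration of how the binding vector is read, here is a sketch with made-up names and values: the forms are consumed as symbol/expression pairs, and a vector on the left-hand side destructures the value on the right, which is why the nested version above also worked.

```clojure
;; Bindings are read as symbol/expression pairs, left to right.
(let [a 1
      b (+ a 1)          ; later bindings can see earlier ones
      [x y] [10 20]]     ; a vector on the left destructures the value
  (+ a b x y))
;; => 33
```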
The next step was to make ‘no-nils’ a function so that I could make use of that function when the list was reversed as well:
(defn has-gaps? [list]
  (let [no-nils (fn [x] (drop-while nil? x))]
    (no-nils list)))
I then wrote the rest of the function to reverse the list and then check the remaining list for nil:
(defn has-gaps? [list]
  (let [[no-nils] [(fn [x] (drop-while nil? x))]
        [nils-removed] [(fn [x] ((comp no-nils reverse no-nils) x))]]
    (some nil? (nils-removed list))))
The ‘comp’ function can be used to compose a set of functions which is what I needed.
It seemed like the ‘nils-removed’ function wasn’t really necessary so I inlined that:
(defn has-gaps? [list]
  (let [no-nils (fn [x] (drop-while nil? x))]
    (some nil? ((comp no-nils reverse no-nils) list))))
The function can now be used like this:
user=> (has-gaps? '(1 2 3))
nil
user=> (has-gaps? '(nil 1 2 3))
nil
user=> (has-gaps? '(1 2 3 nil))
nil
user=> (has-gaps? '(1 2 nil 3))
true
I’d be intrigued to know if there’s a better way to do this.
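One possible variation, which I wouldn’t claim is better, is to write the same steps as a pipeline with the ‘->>’ threading macro and finish with ‘boolean’ so that the result is always ‘true’ or ‘false’. The name ‘gaps?’ is made up to avoid clashing with the function above.

```clojure
;; Same algorithm as has-gaps?, written top to bottom as a pipeline.
(defn gaps? [coll]
  (->> coll
       (drop-while nil?)   ; strip leading nils
       reverse
       (drop-while nil?)   ; strip what were the trailing nils
       (some nil?)         ; any nil left must be in the middle
       boolean))           ; turn nil into false

(gaps? '(1 2 3 nil))   ;; => false
(gaps? '(1 2 nil 3))   ;; => true
```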
Clojure: Checking for a nil value in a collection
Something which I wanted to do recently was write a function that would indicate whether a collection contained a nil value.
I initially incorrectly thought the ‘contains?’ function was the one that I wanted:
(contains? '(1 nil 2 3) nil) => false
I thought it would work the same as the Java equivalent but that function actually checks whether a key exists in a collection rather than a value. It’s more useful when dealing with maps.
There’s more discussion on the consistency of the API on the mailing list.
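A few REPL expressions make the key-versus-value distinction clearer. These are my own examples rather than ones from the documentation; for a vector the ‘keys’ are its indices.

```clojure
(contains? {:a 1 :b nil} :b)   ;; => true   the key :b exists
(contains? {:a 1} 1)           ;; => false  1 is a value, not a key
(contains? [10 20 30] 0)       ;; => true   index 0 exists
(contains? [10 20 30] 10)      ;; => false  there is no index 10
(contains? #{1 nil 3} nil)     ;; => true   set members act as their own keys
```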
Luckily the documentation guides us towards the ‘some’ function.
My first attempt was to write an anonymous function to check if there was a ‘nil’ in the list:
(some #(= % nil) '(1 nil 2 3)) => true
(some #(= % nil) '(1 2 3)) => nil
fogus showed me an even better way by making use of the built in ‘nil?’ function:
(some nil? '(1 nil 2 3))
Another approach would be to make use of the Java ‘contains’ method as Philip Schwarz pointed out:
(.contains '(1 nil 2 3) nil) => true
I noticed that when you use Java methods in Clojure with collections then the result will either be ‘true’ or ‘false’ whereas when you use Clojure built in functions then it’s more likely to be ‘true’ or ‘nil’.
I guess this is linked to the idea that ‘nil’ is false in Clojure so it doesn’t make much difference what the return value is.
When I’m using a language I’ve got into the habit of just trying out the API in the way that I expect it to work rather than paying a lot of attention to what the API documentation says.
I think this is something I’ll need to work out to avoid much frustration!
Clojure: A few things I’ve been tripping up on
In my continued playing with Clojure I’m noticing a few things that I keep getting confused about.
The meaning of parentheses
Much like Keith Bennett I’m not used to parentheses playing such an important role in the way that an expression gets evaluated.
As I understand it, if an expression is enclosed in parentheses then its first element will be called as a function with the remaining elements as its arguments.
For example I spent quite a while trying to work out why the following code kept throwing a class cast exception:
(if (true) 1 0)
If you run that code in the REPL you’ll get the following exception because ‘true’ isn’t a function and therefore can’t be applied as such:
java.lang.ClassCastException: java.lang.Boolean cannot be cast to clojure.lang.IFn (NO_SOURCE_FILE:0)
If we don’t want something to be treated this way then the parentheses need to disappear!
(if true 1 0)
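A short sketch of the difference between a list which gets evaluated and one which is just data (the quote stops evaluation):

```clojure
(+ 1 2)            ;; => 3        evaluated: + is called with 1 and 2
'(+ 1 2)           ;; => (+ 1 2)  quoted: just a list of three elements
(eval '(+ 1 2))    ;; => 3        evaluating the quoted list calls +
(first '(+ 1 2))   ;; => +        the symbol, not the function
```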
Truthyness
Somewhat related to the above is understanding which expressions evaluate to ‘true’ or ‘false’.
I’m told there are some edge cases but that as a general rule everything except for ‘false’ and ‘nil’ evaluates to true.
I think that’s an idea which is more common in languages like Ruby but I’m not yet used to the idea that we can write something like this and have it execute:
(if "mark" 1 0)
In C# or Java I would expect to have to compare “mark” to something in order for it to evaluate to a boolean result.
It seems like a really neat way to reduce the amount of code we have to write though so I like it so far.
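To check the general rule in the REPL, here is a tiny sketch. ‘truthy?’ is a made-up helper which reports which branch of ‘if’ a value would take.

```clojure
;; Hypothetical helper: which branch of `if` does a value take?
(defn truthy? [x]
  (if x true false))

;; Zero, the empty string and the empty vector all count as true.
(map truthy? [0 "" [] "mark" nil false])
;; => (true true true true false false)
```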
Character Literals
I’ve been working through Mark Volkmann’s Clojure tutorial and in one example he defines the following function:
(def vowel? (set "aeiou"))
I wanted to try it out to see if a certain character was a vowel so I initially did this:
=> (vowel? "a")
nil
“a” is actually a string though, which means it’s a sequence of characters when what we really want is a single character.
I thought the following would be what I wanted:
(vowel? 'a')
Instead what I got was the following exception:
java.lang.Exception: Unmatched delimiter: )
This one just turned out to be a case of me not reading the manual very carefully and actually the following is what I wanted:
=> (vowel? \a)
\a
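Since a set can be called as a function which returns the matching element or nil, ‘vowel?’ also works as a predicate over the characters of a string. A few more expressions of my own to try:

```clojure
(def vowel? (set "aeiou"))

(vowel? \a)                 ;; => \a
(vowel? (first "a"))        ;; => \a   first pulls the character out of the string
(filter vowel? "clojure")   ;; => (\o \u \e)
(vowel? "a")                ;; => nil  a one-character string is not a character
```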
Clojure: A first look at recursive functions
I’m working through Stuart Halloway’s ‘Programming Clojure’ book and I just got to the section where it first mentions recursive functions.
It’s a simple function to count down from a given number towards zero and then return that sequence.
This was one of the examples from the book:
(defn countdown [result x]
  (if (zero? x)
    result
    (recur (conj result x) (dec x))))
That function could then be called like this:
(countdown [] 5)
I wanted to see what the function would look like if we didn’t have the empty vector as a parameter.
From playing around with F# and Scala my first thought would be to write the function like this:
(defn count-down [from]
  (defn inner-count [so-far x]
    (if (zero? x)
      so-far
      (inner-count (conj so-far x) (dec x))))
  (inner-count [] from))
As the book points out a bit further on, Clojure doesn’t perform automatic tail call optimisation so we end up with a stack overflow exception if we run the function with a big enough input value.
Clojure does optimise calls to ‘recur’ so it makes more sense to use that if we want to avoid that problem.
This is an example which makes use of that:
(defn count-down [from]
  (defn inner-count [so-far x]
    (if (zero? x)
      so-far
      (recur (conj so-far x) (dec x))))
  (inner-count [] from))
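A ‘defn’ nested inside another ‘defn’ still defines a top-level var each time the outer function runs, so a sketch of an alternative is to use ‘loop’, which gives ‘recur’ a local target and keeps the accumulator out of the function’s signature. This is my own variation rather than the book’s example.

```clojure
;; `loop` sets up the accumulator locally and is the target of `recur`.
(defn count-down [from]
  (loop [so-far []
         x from]
    (if (zero? x)
      so-far
      (recur (conj so-far x) (dec x)))))

(count-down 5)                ;; => [5 4 3 2 1]
(count (count-down 100000))   ;; => 100000, with no stack overflow
```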
Looking through the Clojure mailing list at a similar problem I noticed that one of the suggestions was to arity overload the function to include an accumulator.
(defn count-down
  ([from] (count-down [] from))
  ([so-far from]
    (if (zero? from)
      so-far
      (recur (conj so-far from) (dec from)))))
Written this way it feels a little bit like Haskell or Erlang but probably not idiomatic Clojure.
Anyway on the next page Halloway shows a better way to do this with much less code!
(into [] (take 5 (iterate dec 5)))
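For completeness, the same sequence can also be produced with ‘range’ and a negative step. This is a sketch of my own and ‘count-down-range’ is a made-up name.

```clojure
;; range counts from n down to (but not including) 0 in steps of -1.
(defn count-down-range [n]
  (vec (range n 0 -1)))

(count-down-range 5)
;; => [5 4 3 2 1]

;; Same result as the iterate/take version above.
(= (count-down-range 5) (into [] (take 5 (iterate dec 5))))
;; => true
```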
I noticed that in Scala the idea of using ‘take’ and ‘drop’ on streams of values seems to be quite popular so I’m intrigued as to whether I’ll find the same thing with Clojure.