Mark Needham

Thoughts on Software Development

Archive for the ‘Clojure’ Category

Clojure: Casting to a Java class…or not!

without comments

I have a bit of Java code for working out the final destination of a URL assuming that there might be one redirect which looks like this:

private String resolveUrl(String url) {
  try {
    HttpURLConnection con = (HttpURLConnection) (new URL(url).openConnection());
    con.setInstanceFollowRedirects(false);
    con.connect();
    int responseCode = con.getResponseCode();
 
    if (String.valueOf(responseCode).startsWith("3")) {
      return con.getHeaderField("Location");
    }
  } catch (IOException e) {
    return url;
  }
 
  return url;
}

I need to cast to HttpURLConnection on the first line so that I can make the call to setInstanceFollowRedirects which isn’t available on URLConnection.

I wanted to write some similar code in Clojure and my first thought was that I needed to work out how to do the cast, which I didn’t know how to do.

I then remembered that Clojure is actually dynamically typed so there isn’t any need – as long as the object has the method that we want to call on it everything will be fine.

In this case we end up with the following code:

(import 'java.net.URL)

(defn resolve-url [url]
  (let [con (.. (new URL url) openConnection)]
    ;; `do` runs each call for its side effects; `doall` is for realising lazy sequences
    (do
      (.setInstanceFollowRedirects con false)
      (.connect con))
    (if (.startsWith (str (.getResponseCode con)) "3")
      (.getHeaderField con "Location")
      url)))

Which can be simplified to this:

(defn resolve-url [url]
  (let [con (doto (.. (new URL url) openConnection)
                  (.setInstanceFollowRedirects false)
                  (.connect))]
  (if (.startsWith (str (.getResponseCode con)) "3")
    (.getHeaderField con "Location")
    url)))
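
If we want to avoid the reflective method lookup, or get back the IOException handling from the Java version, a type hint plays the role of the cast. The following is only a minimal sketch under a couple of assumptions: the name resolve-url-hinted is made up for illustration, and it assumes we want the same "return the original URL on failure" behaviour as the Java method.

```clojure
(import '(java.net URL HttpURLConnection)
        '(java.io IOException))

(defn resolve-url-hinted [url]
  (try
    ;; The ^HttpURLConnection hint lets the compiler resolve the method calls
    ;; without reflection; a non-HTTP connection would fail with a ClassCastException.
    (let [^HttpURLConnection con (.openConnection (URL. url))]
      (doto con
        (.setInstanceFollowRedirects false)
        (.connect))
      (if (.startsWith (str (.getResponseCode con)) "3")
        (.getHeaderField con "Location")
        url))
    ;; MalformedURLException is an IOException, so a bad URL also ends up here.
    (catch IOException _ url)))
```

Calling it with something that isn’t a URL at all, such as (resolve-url-hinted "not a url"), falls into the catch clause and hands back the input unchanged.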

Written by Mark Needham

December 31st, 2011 at 5:47 pm

Posted in Clojure


Leiningen: Using goose via a local Maven repository

with 3 comments

I’ve been playing around a little bit with goose, an HTML content/article extractor, originally in Java but later in Clojure, where I needed to work out how to include goose and all its dependencies via Leiningen.

goose isn’t included in a Maven repository so I needed to create a local repository, something which I’ve got stuck on in the past.

Luckily Paul Gross has written a cool blog post explaining how his team got past this problem.

Following the instructions from Paul’s post this is how I got goose playing nicely with Clojure:

Inside my Clojure project:

/Users/mneedham/github/android/text-extraction $ mkdir maven_repository

I then ran the following command from where I had goose checked out on my machine:

mvn install:install-file -Dfile=target/goose-2.1.6.jar -DartifactId=goose -Dversion=2.1.6 -DgroupId=goose -Dpackaging=jar -DlocalRepositoryPath=/Users/mneedham/github/android/text-extraction/maven_repository -DpomFile=pom.xml

I added the repository and goose dependency to my project.clj file which now looks like this:

(defproject textextraction "0.1.0"
  :description "Extract text from urls"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [ring/ring-jetty-adapter "0.3.11"]
                 [compojure "0.6.4"]
                 [goose "2.1.6"]]
  :dev-dependencies [[swank-clojure "1.2.1"]]
  :repositories {"local" ~(str (.toURI (java.io.File. "maven_repository")))}
  :main textextraction.main)

I then run:

/Users/mneedham/github/android/text-extraction $ lein run

And goose and all its dependencies are included in the ‘lib’ directory.

Written by Mark Needham

December 27th, 2011 at 12:48 pm

Posted in Clojure


Clojure: Getting caught out by lazy collections

with one comment

Most of the work that I’ve done with Clojure has involved running a bunch of functions directly in the REPL or through Leiningen’s run target which led to me getting caught out when I created a JAR and tried to run that.

As I mentioned a few weeks ago I’ve been rewriting part of our system in Clojure to see how the design would differ, and a couple of levels down the Clojure version consists of applying a map function over a collection of documents.

The code in question originally looked like this:

(ns aim.main (:gen-class))
 
(defn import-zip-file [zipFile working-dir]
  (let [xml-files (filter xml-file? (unzip zipFile working-dir))]
    (map import-document xml-files)))
 
(defn -main [& args]
  (import-zip-file "our/file.zip", "/tmp/unzip/to/here"))

Which led to absolutely nothing happening when run like this!

$ lein uberjar && java -jar my-project-0.1.0-standalone.jar

I originally assumed that I had something wrong in the code but my colleague Uday reminded me that the sequences returned by functions like map and filter are lazily evaluated in Clojure and there was nothing in the code that would force the evaluation of ours.

In this situation we had to wrap the map with a doall in order to force evaluation of the collection:

(ns aim.main (:gen-class))
 
(defn import-zip-file [zip-file working-dir]
  (let [xml-files (filter xml-file? (unzip zip-file working-dir))]
    (doall (map import-document xml-files))))
 
(defn -main [& args]
  (import-zip-file "our/file.zip", "/tmp/unzip/to/here"))

When we run the code in the REPL the result of the call gets printed, which forces the lazy sequence to be evaluated, and as far as I understand it something similar happens when running through ‘lein run’, which is why we see different behaviour than when we run the JAR on its own.
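
The laziness is easy to see in isolation. This is a small sketch rather than the real import code: import-stub is a hypothetical stand-in for import-document which just counts how many times it has been called.

```clojure
;; Hypothetical stand-in for import-document: counts how often it is called.
(def calls (atom 0))

(defn import-stub [doc]
  (swap! calls inc)
  doc)

;; Building the lazy sequence does not call import-stub.
(def pending (map import-stub ["a.xml" "b.xml"]))
(def calls-before @calls)

;; dorun walks the sequence for its side effects and discards the results;
;; doall does the same walk but keeps and returns the realised sequence.
(dorun pending)
(def calls-after @calls)
```

Here calls-before is 0 and calls-after is 2. When the results aren’t needed, as with an import that is run purely for its side effects, dorun or doseq are alternatives to doall which avoid holding on to the whole sequence.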

I also got caught out on another occasion where I tried to pass around a collection of input streams which I’d retrieved from a zip file only to realise that when the code which used the input stream got evaluated the ZIP file was no longer around!

Written by Mark Needham

July 31st, 2011 at 9:40 pm

Posted in Clojure


Clojure: Creating XML document with namespaces

without comments

As I mentioned in an earlier post we’ve been parsing XML documents with the Clojure zip-filter API and the next thing we needed to do was create a new XML document containing elements which needed to be inside a namespace.

We wanted to end up with a document which looked something like this:

<root>
<mynamespace:foo xmlns:mynamespace="http://www.magicalurlfornamespace.com">
	<mynamespace:bar>baz</mynamespace:bar>
</mynamespace:foo>
</root>

We can make use of lazy-xml/emit to output an XML string from a map of ‘:tag’, ‘:attrs’ and ‘:content’ entries by wrapping it inside with-out-str like so:

(require '[clojure.contrib.lazy-xml :as lxml])
(defn xml-string [xml-zip] (with-out-str (lxml/emit xml-zip)))

I was initially confused about how we’d be able to create a map representing namespaced elements to pass to xml-string but it turned out to be reasonably simple.

To create a non-namespaced XML string we might pass xml-string the following map:

(xml-string {:tag :root :content [{:tag :foo :content [{:tag :bar :content ["baz"]}]}]})

Which gives us this:

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<root>
	<foo>
		<bar>baz</bar>
	</foo>
</root>"

Ideally I wanted to prefix :foo and :bar with ‘:mynamespace’ but I thought that wouldn’t work since that type of syntax would be invalid in Ruby and I thought it’d be the same in Clojure.

mneedham@Administrators-MacBook-Pro-5.local ~$ irb
>> { :mynamespace:foo "bar" }
SyntaxError: compile error
(irb):1: odd number list for Hash
{ :mynamespace:foo "bar" }
               ^
(irb):1: syntax error, unexpected ':', expecting '}'
{ :mynamespace:foo "bar" }
               ^
(irb):1: syntax error, unexpected '}', expecting $end
	from (irb):1
>>

In fact it isn’t so we can just do this:

(xml-string {:tag :root 
  :content [{:tag :mynamespace:foo :attrs {:xmlns:mynamespace "http://www.magicalurlfornamespace.com"} 
              :content [{:tag :mynamespace:bar :content ["baz"]}]}]})
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<root>
<mynamespace:foo xmlns:mynamespace=\"http://www.magicalurlfornamespace.com\">
	<mynamespace:bar>baz</mynamespace:bar>
</mynamespace:foo>
</root>"

As a refactoring step, since I had to prepend the namespace to a lot of tags, I was able to make use of the keyword function to do so:

(defn tag [name value] {:tag (keyword (str "mynamespace" name)) :content [value]})
> (tag :Foo "hello")  
{:tag :mynamespace:Foo, :content ["hello"]}
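
The tag function above works because calling str on a keyword keeps its leading colon, which then ends up as the separator. A variation which makes the separator explicit might look like the following sketch; ns-tag and its prefix parameter are names I’ve made up for illustration.

```clojure
(defn ns-tag
  "Builds an element map whose tag is prefix:tag-name."
  [prefix tag-name value]
  ;; name strips the leading colon from a keyword and passes strings through
  {:tag (keyword (str prefix ":" (name tag-name)))
   :content [value]})

(def example (ns-tag "mynamespace" :Foo "hello"))
```

Since name accepts both keywords and strings, (ns-tag "mynamespace" "Foo" "hello") produces the same map.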

Written by Mark Needham

July 20th, 2011 at 8:28 pm

Posted in Clojure


Clojure: Extracting child elements from an XML document with zip-filter

without comments

I’ve been following Nurullah Akkaya’s blog post about navigating XML documents using the Clojure zip-filter API and I came across an interesting problem in a document I’m parsing which goes beyond what’s covered in his post.

Nurullah provides a neat zip-str function which we can use to convert an XML string into a zipper object:

(require '[clojure.zip :as zip] '[clojure.xml :as xml])
(use '[clojure.contrib.zip-filter.xml])
 
(defn zip-str [s]
  (zip/xml-zip (xml/parse (java.io.ByteArrayInputStream. (.getBytes s)))))

The fragment of the document I’m parsing looks like this:

(def test-doc (zip-str "<?xml version='1.0' encoding='UTF-8'?>
<root>
  <Person>
    <FirstName>Charles</FirstName>
    <LastName>Kubicek</LastName>
  </Person>
  <Person>
    <FirstName>Mark</FirstName>
    <MiddleName>H</MiddleName>
    <LastName>Needham</LastName>
  </Person>	
</root>"))

I wanted to be able to get the full names of each of the people such that I’d have a collection which looked like this:

("Charles Kubicek" "Mark H Needham")

My initial thinking was to get all the child elements of the Person element and operate on those:

(require '[clojure.contrib.zip-filter :as zf])
 
(xml-> test-doc :Person zf/children text)

Unfortunately that gives back all the names in one collection like so:

("Charles" "Kubicek" "Mark" "H" "Needham")

Since it’s not mandatory to have a MiddleName element it’s not possible to work out which names go with which person!

A bit of googling led me to Stack Overflow where Timothy Pratley suggests that we need to get up to the Person element and then pick each of the child elements individually.

We can do that by mapping over the collection with a function which creates a vector for each Person containing all their names.

In pseudo-code this is what we want to do:

> (map magic-function (xml-> test-doc :Person))
(["Charles" "Kubicek"] ["Mark" "H" "Needham"])

Timothy suggests the juxt function which is defined like so:

juxt

Takes a set of functions and returns a fn that is the juxtaposition of those fns. The returned fn takes a variable number of args, and returns a vector containing the result of applying each fn to the args (left-to-right).

A simple use of juxt could be to create some values containing my name:

((juxt #(str % " loves Clojure") #(str % " loves Scala")) "Mark")

Which returns:

["Mark loves Clojure" "Mark loves Scala"]

We can use juxt to build the collection of names and then use clojure.string/join to separate them with a space.

The code to do this ends up looking like this:

(require '[clojure.string :as str])
 
(defn get-names [doc]
  (->> (xml-> doc :Person)
       (map (juxt #(xml1-> % :FirstName text) #(xml1-> % :MiddleName text) #(xml1-> % :LastName text)))
       (map (partial filter seq))
       (map (partial str/join " "))))

We use a filter on the second last line to get rid of any nil values in the vector (e.g. no middle name) and then combine the names on the last line.

We can then call the function:

> (get-names test-doc)
("Charles Kubicek" "Mark H Needham")
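
The juxt, filter and join steps don’t depend on zip-filter, so the same shape can be tried out on plain maps. This is just a sketch with made-up data: the people vector and the full-name function are assumptions for illustration, and remove nil? stands in for the filter seq used above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical data standing in for the parsed Person elements.
(def people [{:first "Charles" :last "Kubicek"}
             {:first "Mark" :middle "H" :last "Needham"}])

(defn full-name [person]
  (->> ((juxt :first :middle :last) person) ; e.g. ["Charles" nil "Kubicek"]
       (remove nil?)                        ; drop the missing middle name
       (str/join " ")))

(def full-names (map full-name people))
```

Keywords are functions of maps, which is why they can be handed straight to juxt here.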

Written by Mark Needham

July 16th, 2011 at 10:19 pm

Posted in Clojure


Clojure: Language as thought shaper

with 3 comments

I recently read an interesting article by Tom Van Cutsem where he describes some of the goals that influence the design of programming languages and one which stood out to me is that of viewing ‘language as a thought shaper’:

Language as thought shaper: to induce a paradigm shift in how one should structure software (changing the “path of least resistance”).

To quote Alan Perlis: “a language that doesn’t affect the way you think about programming, is not worth knowing.”

The goal of a thought shaper language is to change the way a programmer thinks about structuring his or her program.

I’ve been rewriting part of the current system that I’m working on in Clojure in my spare time to see how the design would differ and it’s interesting to see that it’s quite different.

The part of the system I’m working on needs to extract a bunch of XML files from ZIP files and then import those into the database.

From a high level the problem can be described as follows:

  • Get all files in specified directory
  • Find only the ZIP files
  • Find the XML files in those ZIP files
  • Categorise the XML files depending on whether we can import them
  • Add an additional section to good files to allow for easier database indexing
  • Import the new version of the files into the database

Clojure encourages a design based around processing lists and this problem seems to fit that paradigm very neatly.

We can make use of the ->> macro to chain together a bunch of functions originally acting on the specified directory to allow us to achieve this.

At the moment this is what the entry point of the code looks like:

(defn parse-directory [dir]
  (->> (all-files-in dir)
       (filter #(.. (canonical-path %1) (endsWith ".zip")))
       (mapcat (fn [file] (extract file)))
       (filter (fn [entry] (. (entry :name) (endsWith ".xml"))))
       (map #(categorise %))))
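
For anyone who hasn’t come across ->> before, it threads the result of each step in as the last argument of the next one. A tiny sketch on numbers rather than files shows the equivalence with the nested version:

```clojure
;; Threaded: read top to bottom, like the list of steps above.
(def threaded
  (->> (range 10)
       (filter even?)
       (map #(* % %))))

;; Nested: the same calls, read inside out.
(def nested
  (map #(* % %) (filter even? (range 10))))
```

Both evaluate to (0 4 16 36 64); the threaded form just keeps the steps in the order in which they happen.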

The design of the Scala code is a bit different even though the language constructs exist to make a similar design possible.

The following are some of the classes involved:

  • ImportManager – finds the XML files in the ZIP files, delegates to DocumentMatcher
  • DeliveryManager – gets all the ZIP files from specified directory
  • DocumentMatcher – checks if XML document matches any validation rules and wraps in appropriate object
  • ValidDocument/InvalidDocument – wrap the XML document and upload to database in the case of the former
  • ValidationRule – checks if the document can be imported into the system

It was interesting to me that when I read the Scala code the problem appeared quite complicated whereas in Clojure it’s easier to see the outline of what the program does.

I think this is because we’re trying to shoehorn a pipes and filters problem into objects, which leaves us with a design that feels quite unnatural.

I originally learnt this design style while playing around with F# a couple of years ago and it seems to work reasonably well in most functional languages.

Written by Mark Needham

July 10th, 2011 at 10:21 pm

Posted in Clojure


Clojure: Equivalent to Scala’s flatMap/C#’s SelectMany

without comments

I’ve been playing around with Clojure a bit over the weekend and one thing I got stuck with was working out how to achieve the functionality provided by Scala’s flatMap or C#’s SelectMany methods on collections.

I had a collection of zip files and wanted to transform that into a collection of all the file entries in those files.

If we just use map then we’ll end up with a collection of collections which is more difficult to deal with going forward.

In Scala we’d do the following:

import scala.collection.JavaConversions._
 
val zip1 = new ZipFile(new File("/Users/mneedham/Documents/my-zip-file.zip"))
val zip2 = new ZipFile(new File("/Users/mneedham/Documents/my-zip-file2.zip"))
 
List(zip1, zip2).flatMap(_.entries)

I was originally making use of map followed by flatten but I learnt from my colleague Phil Calcado that the function I wanted is mapcat which leads to this solution:

(import 'java.util.zip.ZipFile)
(use '[clojure.java.io :only (file)])

(def zip1 (new ZipFile (file "/Users/mneedham/Documents/my-zip-file.zip")))
(def zip2 (new ZipFile (file "/Users/mneedham/Documents/my-zip-file2.zip")))
 
(mapcat (fn [file] (enumeration-seq (.entries file))) (list zip1 zip2))

I also learnt about the various functions available to create sequences from other types, such as enumeration-seq, which are listed at the bottom of this page.

Scala uses implicit conversions to do that and presumably you’d hide away the conversion in a helper function in Clojure.
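
A sketch of what such a helper might look like, using java.util.StringTokenizer (which is also an Enumeration) so that it can be run without any zip files to hand. The tokens function is a made-up name for illustration.

```clojure
(import 'java.util.StringTokenizer)

;; Hypothetical helper that hides the Enumeration -> seq conversion.
(defn tokens [s]
  (enumeration-seq (StringTokenizer. s)))

;; map gives one sequence per input...
(def per-input (map tokens ["a b" "c d e"]))

;; ...whereas mapcat concatenates them into a single sequence.
(def flattened (mapcat tokens ["a b" "c d e"]))
```

mapcat here behaves like (apply concat (map tokens ...)), which is the one-level flattening that flatMap and SelectMany give us.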

Written by Mark Needham

July 3rd, 2011 at 10:50 pm

Posted in Clojure


Clojure: My first attempt at a macro

with one comment

I’m up to the chapter on using macros in Stuart Halloway’s ‘Programming Clojure’ book and since I’ve never used a language which has macros before I thought it’d be cool to write one.

In reality there’s no reason to create a macro to do what I want to do but I wanted to keep the example simple so I could try and understand exactly how macros work.

I want to create a macro which takes in one argument and then prints hello and the person’s name.

In the book Halloway suggests that we should start with the expression that we want to end up with, so this is what I want:

(println "Hello" person)

My first attempt to do that was:

(defmacro say-hello [person]
  println "Hello" person)

I made the mistake of forgetting to include the brackets around the ‘println’ expression so it doesn’t actually pass ‘"Hello"’ and ‘person’ to ‘println’. Instead each form in the macro body is evaluated individually and the last one is returned.

When we evaluate this in the REPL we therefore don’t quite get what we want:

user=> (say-hello "mark")          
"mark"

Expanding the macro results in:

user=> (macroexpand-1 '(say-hello "Mark"))
"Mark"

Which is the equivalent of doing this:

user=> (eval (do println "hello" "Mark")) 
"Mark"

As I wrote previously this is because ‘do’ evaluates each argument in order and then returns the last one which in this case is “Mark”.

I fixed that mistake and got the following:

(defmacro say-hello [person]
  (println "Hello" person))

Which returns the right result…

user=> (say-hello "Mark")
Hello Mark
nil

…but actually evaluated the expression rather than expanding it because I didn’t escape it correctly:

user=> (macroexpand-1 '(say-hello "Mark"))
Hello Mark
nil

After these failures I decided to try and change one of the examples from the book instead of my trial and error approach.

One approach used is to build a list of Clojure symbols inside the macro definition:

(defmacro say-hello [person]
  (list println "hello" person))
user=> (macroexpand-1 '(say-hello "Mark"))
(#<core$println__5440 clojure.core$println__5440@681ff4> "hello" "Mark")

This is pretty much what we want and although the ‘println’ symbol has been evaluated at macro expansion time it doesn’t actually make any difference to the way the macro works.

We can fix that by escaping ‘println’ so that it won’t be evaluated until evaluation time:

(defmacro say-hello [person]
  (list 'println "hello" person))
user=> (macroexpand-1 '(say-hello "Mark"))
(println "hello" "Mark")

I thought it should also be possible to quote(‘) the whole expression instead of building up the list:

(defmacro say-hello [person] 
  '(println "hello" person))

This expands correctly but when we try to use it this happens:

user=> (say-hello "Mark")
java.lang.Exception: Unable to resolve symbol: person in this context

The problem is that when we use quote there is no evaluation of any of the symbols in the expression so the symbol ‘person’ is only evaluated at runtime and since it hasn’t been bound to any value we end up with the above error.

If we want to use the approach of non evaluation then we need to make use of the backquote(`) which stops evaluation of anything unless it’s preceded by a ~.

(defmacro say-hello [person]
  `(println "hello" ~person))

This allows us to evaluate ‘person’ at expand time and replace it with the appropriate value.
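
Expanding the backquoted version shows both halves of that at work. This is a small sketch which passes an expression instead of a literal string so that it’s clearer what gets substituted.

```clojure
(defmacro say-hello [person]
  `(println "hello" ~person))

;; Syntax-quote resolves println to its fully qualified name,
;; while ~person is replaced by the form that was passed to the macro.
(def expansion (macroexpand-1 '(say-hello (str "Ma" "rk"))))
```

The expansion is (clojure.core/println "hello" (str "Ma" "rk")): the ‘(str "Ma" "rk")’ form is dropped into the template as it is and only gets evaluated when the expanded code runs.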

In hindsight the approach I took to write this macro was pretty ineffective although it’s been quite interesting to see all the different ways that I’ve found to mess up the writing of one!

Thanks to A. J. Lopez, Patrick Logan and fogus for helping me to understand all this a bit better than I did to start with!

Written by Mark Needham

December 12th, 2009 at 3:53 am

Posted in Clojure


Clojure: Forgetting the brackets

without comments

I’ve been playing around with macros over the last few days and while writing a simple one forgot to include the brackets to make it evaluate correctly:

(defmacro say-hello [person]
  println "Hello" person)

This macro doesn’t even expand like I thought it would:

user=> (macroexpand-1 '(say-hello blah))
blah

That seemed a bit strange to me but I eventually realised that I’d missed off the brackets around ‘println’ and the arguments following it which would have resulted in ‘println’ being evaluated with those arguments.

I was a bit curious as to why that happened so I tried the following expression without any brackets to see what would happen:

user=> println "hello" "mark"
#<core$println__5440 clojure.core$println__5440@681ff4>
"hello"
"mark"

It seems to just evaluate each thing individually and when we put this type of expression into a function definition the function will do the same thing but also return the last thing evaluated:

(defn say-hello [] println "hello" "mark")
user=> (say-hello)
"mark"

A. J. Lopez pointed out that this is quite like progn in other LISPs and is the same as doing the following:

user=> (do println "hello" "mark")
"mark"

do is defined as follows:

(do exprs*)
Evaluates the expressions in order and returns the value of the last. If no expressions are supplied, returns nil.

The way to write a function which passes those two arguments to ‘println’ is of course to put brackets around the statement:

(defn say-hello [] (println "hello" "mark"))
user=> (say-hello) 
hello mark
nil

Written by Mark Needham

December 12th, 2009 at 3:51 am

Posted in Clojure


Clojure: when-let macro

with 6 comments

In my continued playing around with Clojure I came across the ‘when-let‘ macro.

‘when-let’ is used when we want to bind an expression to a symbol and only execute the body provided as the second argument to the macro if that symbol evaluates to true.

As I wrote previously, a value of ‘false’ or ‘nil’ would result in the second argument not being evaluated.

A simple example of using ‘when-let’ would be:

(when-let [a 2] (println "The value of a is:" a))

This is the definition:

(defmacro when-let
  "bindings => binding-form test
 
  When test is true, evaluates body with binding-form bound to the value of test"
  [bindings & body]
  (assert-args when-let
     (vector? bindings) "a vector for its binding"
     (= 2 (count bindings)) "exactly 2 forms in binding vector")
   (let [form (bindings 0) tst (bindings 1)]
    `(let [temp# ~tst]
       (when temp#
         (let [~form temp#]
           ~@body)))))

The ‘assert-args’ call at the beginning of the macro is quite interesting.

Two assertions are stated:

  • The first argument should be a vector
  • That vector should contain exactly two forms

I’ve not used dynamic languages very much before but it seems like this is one way for a dynamic language to fail fast by checking that the arguments are as expected. In a static language that would be a compile time check.

Line 9 of the definition, ‘(let [form (bindings 0) tst (bindings 1)]’, is quite interesting as we know that ‘bindings’ will be a vector so we can take the ‘0th’ and ‘1st’ elements from it and bind them to ‘form’ and ‘tst’ respectively. I didn’t quite pick up on that the first few times I read it.

On line 10, ‘`(let [temp# ~tst]’, it makes use of ‘auto-gensym’ to create a unique name which begins with ‘temp’ and is bound to the value of ‘tst’, which in the simple example provided would be the value ‘2’. As I understand it the name would be something like ‘temp__304’ or something similarly random!

‘(when 2 ...)’ treats 2 as true, which means that we execute the body provided as the second argument.

user=> (when-let [a 2] (println "The value of a is:" a))
The value of a is: 2
nil

This is a bit of a contrived example of using the construct and it seems to be more typically used when we’re getting a value out of a list and want to check whether or not we’ve reached the end of that list. If we have then eventually we’ll have a value of ‘nil’ bound by the ‘let’ and then we’ll know we’re finished.

An example of where the body wouldn’t be evaluated is:

user=> (when-let [a nil] (println "This won't get printed"))
nil

I don’t really understand why we need to bind ‘form’ to ‘temp’ on the second last line as it doesn’t seem like the value is used? I’m sure there’s probably something I’m missing there so if anyone could point it out that’d be cool!
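
One possible answer, as far as I can tell from the definition: ‘form’ is whatever was written on the left hand side of the binding vector (‘a’ in the simple example), so ‘(let [~form temp#] ...)’ is what makes that name available to the body. Testing ‘temp#’ first and binding ‘form’ afterwards means the truthiness check is made on the whole value even when ‘form’ is a destructuring pattern. A minimal sketch, using the made-up function name sum-pair:

```clojure
;; The binding form can be a destructuring pattern rather than a plain symbol.
(defn sum-pair [pair]
  (when-let [[a b] pair]
    (+ a b)))

(def with-pair (sum-pair [1 2])) ; the vector is truthy, so a and b are bound and the body runs
(def with-nil  (sum-pair nil))   ; the test is nil, so the body is skipped
```

with-pair is 3 and with-nil is nil.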

As I understand it, the ‘~@body’ on the last line is called the ‘splicing unquote’ and it allows the individual values in ‘body’ to be put into the template started at ‘`(let [temp# ~tst]’ individually rather than just being put in there as a list.
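
The difference between ‘~’ and ‘~@’ is easiest to see side by side. In this small sketch ‘body’ is just a quoted list of two forms, standing in for what a macro would receive through ‘& body’.

```clojure
(def body '((println "a") (println "b")))

;; ~ inserts the list as a single element...
(def unquoted `(do ~body))   ; (do ((println "a") (println "b")))

;; ...whereas ~@ splices its elements into the surrounding form.
(def spliced  `(do ~@body))  ; (do (println "a") (println "b"))
```

Only the spliced version is something we’d actually want to evaluate, since the unquoted one would try to call the result of the first println as a function.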

Written by Mark Needham

December 9th, 2009 at 2:41 am

Posted in Clojure
