Mark Needham

Thoughts on Software Development

Archive for the ‘Scala’ Category

Scala/Haskell: A simple example of type classes

without comments

I never really understood type classes when I was working with Scala but I recently came across a video where Dan Rosen explains them pretty well.

Since the last time I worked in Scala I’ve been playing around with Haskell where type classes are much more common – for example if we want to compare two values we need to make sure that their type is an instance of the ‘Eq’ type class.

Learn You a Haskell’s chapter on type classes defines them like so:

Typeclasses are like interfaces.

A typeclass defines some behaviour (like comparing for equality, comparing for ordering, enumeration) and then types that can behave in that way are made instances of that typeclass.

The behaviour of typeclasses is achieved by defining functions or just type declarations that we then implement. So when we say that a type is an instance of a typeclass, we mean that we can use the functions that the typeclass defines with that type.

In that chapter we then go on to create a ‘YesNo’ type class which defines Boolean semantics for all different types.

We start by defining the type class like so:

class YesNo a where
  yesno :: a -> Bool

Any type which is made an instance of that type class implements this ‘yesno’ function so we can find out the truthiness of its value. For Int it looks like this:


instance YesNo Int where
  yesno 0 = False
  yesno _ = True

If we call that:

> yesno (0 :: Int)
False
> yesno (1 :: Int)
True

We get the expected result, but if we try to call it for a type which hasn’t defined an instance of ‘YesNo’:

> yesno "mark"
    No instance for (YesNo [Char])
      arising from a use of `yesno'
    Possible fix: add an instance declaration for (YesNo [Char])
    In the expression: yesno "mark"
    In an equation for `it': it = yesno "mark"

In Scala we can use traits and implicits to achieve the same effect. First we define the ‘YesNo’ trait:

trait YesNo[A] {
  def yesno(value:A) : Boolean
}

Then we define an implicit value in a companion object which creates an instance of the ‘YesNo’ type class for Ints:

object YesNo {
  implicit val intYesNo = new YesNo[Int] {
    def yesno(value:Int) =
      value match { case 0 => false; case _ => true }
  }
}

We then create a ‘YesNoWriter’ object whose ‘write’ method calls the ‘yesno’ function: if we’ve defined a type class instance for the type we call it with then it will return us a boolean value, and if not we’ll get a compilation error.

object YesNoWriter {
  def write[A](value:A)(implicit conv: YesNo[A]) : Boolean = {
    conv.yesno(value)
  }
}

If we call that:

> YesNoWriter.write(1)
res1: Boolean = true
> YesNoWriter.write(0)
res2: Boolean = false

It works as expected, but if we try to call it with a type which wasn’t defined for the ‘YesNo’ type class we run into trouble:

> YesNoWriter.write("mark")
:10: error: could not find implicit value for parameter conv: YesNo[java.lang.String]
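The error message suggests its own fix: provide an instance for the missing type. Here's a self-contained sketch of doing that for String (treating only the empty string as falsy mirrors Learn You a Haskell's list instance and is my assumption, not code from the original post):

```scala
// The trait and Int instance from above, plus a hypothetical String instance
trait YesNo[A] { def yesno(value: A): Boolean }

object YesNo {
  implicit val intYesNo: YesNo[Int] = new YesNo[Int] {
    def yesno(value: Int) = value != 0
  }
  // Assumption: only the empty string counts as false
  implicit val stringYesNo: YesNo[String] = new YesNo[String] {
    def yesno(value: String) = value.nonEmpty
  }
}

object YesNoWriter {
  def write[A](value: A)(implicit conv: YesNo[A]): Boolean = conv.yesno(value)
}
```

With the String instance in scope, `YesNoWriter.write("mark")` now compiles and returns true.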

We can also define YesNoWriter like this by making use of context bounds:

object YesNoWriter {
  def write[A:YesNo](value:A) : Boolean = {
    implicitly[YesNo[A]].yesno(value)
  }
}

I think this pattern is preferred when we might just be tunnelling the implicit parameter through to another method, but we can still use it here by calling the ‘implicitly’ method to get access to the implicit value.

I’m still not entirely sure about the use of implicits but in this case they provide another way to implement polymorphism without having to use inheritance.

Dan Rosen goes into much more detail about type classes and implicits and I wrote about how we were using them on a project I worked on last year in an earlier blog post if you want to learn more.

Written by Mark Needham

May 22nd, 2012 at 10:26 am

Posted in Haskell,Scala


Scala: Counting number of inversions (via merge sort) for an unsorted collection

with 2 comments

The first programming question of algo-class requires you to calculate the number of inversions found when using merge sort to sort a collection in ascending order.

I found quite a nice explanation here too:

Finding “similarity” between two rankings. Given a sequence of n numbers 1..n (assume all numbers are distinct), define a measure that tells us how far this list is from being in ascending order. The value should be 0 if a_1 < a_2 < ... < a_n and should be higher as the list is more "out of order". e.g. the sequence 2 4 1 3 5 (sorted: 1 2 3 4 5) has three inversions: (2,1), (4,1), (4,3).

The simple/naive way of solving this problem is to iterate through the collection in two loops and compare each value and its current index with the others, looking for ones which are not in the right order.

I wrote the following Ruby code which seems to do the job:

def number_of_inversions(arr)
  count = 0
  arr.each_with_index do |item_a, index_a|
    arr.each_with_index do |item_b, index_b|
      if index_b >= index_a && item_a > item_b
        count += 1
      end
    end
  end
  count
end

>> number_of_inversions [6,5,4,3,2,1]
=> 15

That works fine for small collections but since we’re effectively doing nested for loops it actually takes O(n²) time which means that it pretty much kills my machine on the sample data set of 100,000 numbers.

An O(n log(n)) solution is explained in the lectures which involves recursively splitting the collection in half (like in merge sort) and then counting how many elements remain in the left collection when we select an item from the right collection since this will tell us how many positions/inversions out of place that element is.

e.g. if we’re trying to work out how many inversions there are in the collection [1,3,5,2,4,6] we eventually end up merging [1,3,5] and [2,4,6]

  • Our first iteration puts ‘1’ from the left collection into the new array.
  • The second iteration puts ‘2’ from the right collection into the new array and there are two elements left in the left collection (‘3’ and ‘5’) so we have 2 inversions (3,2 and 5,2).
  • Our third iteration puts ‘3’ from the left collection into the new array.
  • Our fourth iteration puts ‘4’ from the right collection into the new array and there is one element left in the left collection (‘5’) so we have 1 inversion (5,4).
  • Our fifth iteration puts ‘5’ from the left collection and the sixth puts ‘6’ from the right collection in.

The number of inversions for that example is therefore 3.
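As a sanity check on the worked example, the naive O(n²) approach described earlier fits in a couple of lines of Scala (my sketch, not code from the post):

```scala
// Naive O(n^2) inversion count: every pair (i, j) with i < j and xs(i) > xs(j)
def naiveInversions(xs: Vector[Int]): Int =
  (for (i <- xs.indices; j <- i + 1 until xs.length if xs(i) > xs(j)) yield 1).sum

naiveInversions(Vector(1, 3, 5, 2, 4, 6)) // 3, agreeing with the merge walkthrough
```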

My colleague Amir and I decided to try and write some Scala code to do this and ended up with the following adapted merge sort:

import io.Source

object MSort {
  def main(args:Array[String]) {
    val fileWithNumbers = "/Users/mneedham/Documents/algo-class/IntegerArray.txt"
    val inversions: BigInt = numberOfInversions(Source.fromFile(new java.io.File(fileWithNumbers)).getLines().map(_.toInt).toList)
    println(inversions)
  }

  def numberOfInversions(collection: List[Int]): BigInt = {
    var count: BigInt = 0
    def inversionsInner(innerCollection: List[Int]): List[Int] = {
      def merge(left: List[Int], right: List[Int]): Stream[Int] = (left, right) match {
        case (x :: xs, y :: ys) if x < y => Stream.cons(x, merge(xs, right))
        case (x :: xs, y :: ys) => { count = count + left.length; Stream.cons(y, merge(left, ys)) }
        case _ => if (left.isEmpty) right.toStream else left.toStream
      }

      val n = innerCollection.length / 2
      if (n == 0) innerCollection
      else {
        val (lowerHalf, upperHalf) = innerCollection splitAt n
        merge(inversionsInner(lowerHalf), inversionsInner(upperHalf)).toList
      }
    }
    inversionsInner(collection)
    count
  }
}

The interesting case is the second one in ‘merge’, where we are going to select a value from the right collection and therefore increment our count by the number of items left in the left collection.

It works but it’s a bit annoying that we had to use a ‘var’ to keep track of the count since that’s not really idiomatic Scala.

I’ve been trying to find a way of passing the count around as an accumulator and returning it at the end but really struggled to get the code to compile when I started doing that and seemed to make the code a lot more complicated than it is now.

I’m sure there is a way to do it but I can’t see it at the moment!

Since the mutation is reasonably well encapsulated I’m not sure whether it really matters that much, but it’s always an interesting exercise to try and write code which doesn’t mutate state, so if anyone can see a good way to do it let me know.
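For what it's worth, one way to thread the count through without a 'var' is to have the merge return a pair of the merged list and the inversions it found, summing the counts up the tree. A sketch of that idea (not the code we ended up with):

```scala
// Var-free version: merge returns (merged list, inversions found)
def sortAndCount(xs: List[Int]): (List[Int], BigInt) = {
  def merge(left: List[Int], right: List[Int], acc: BigInt): (List[Int], BigInt) =
    (left, right) match {
      case (Nil, _) => (right, acc)
      case (_, Nil) => (left, acc)
      case (x :: rest, y :: _) if x <= y =>
        val (merged, count) = merge(rest, right, acc)
        (x :: merged, count)
      case (_, y :: rest) =>
        // y jumps ahead of everything remaining in the left collection
        val (merged, count) = merge(left, rest, acc + left.length)
        (y :: merged, count)
    }

  val n = xs.length / 2
  if (n == 0) (xs, 0)
  else {
    val (lowerHalf, upperHalf) = xs splitAt n
    val (sortedLower, lowerCount) = sortAndCount(lowerHalf)
    val (sortedUpper, upperCount) = sortAndCount(upperHalf)
    val (merged, mergeCount) = merge(sortedLower, sortedUpper, 0)
    (merged, lowerCount + upperCount + mergeCount)
  }
}
```

The merge here isn't tail recursive, so for the 100,000-element data set it would need reworking, but it shows the accumulator-passing shape.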

Written by Mark Needham

March 20th, 2012 at 6:53 am

Posted in Scala


Scala: Converting a scala collection to java.util.List

with 3 comments

I’ve been playing around a little with Goose – a library for extracting the main body of text from web pages – and I thought I’d try converting some of the code to be more scala-esque in style.

The API of the various classes/methods is designed so it’s interoperable with Java code but in order to use functions like map/filter we need the collection to be a Scala one.

That’s achieved by importing ‘scala.collection.JavaConversions._’ which will apply an implicit conversion to convert the Java collection into a Scala one.

I needed to go back to the Java one again which can be achieved with the following code:

import scala.collection.JavaConversions._
val javaCollection = seqAsJavaList(Seq("abc"))

I also used that function in the StopWords.scala object in Goose.

There are a load of other functions available in JavaConversions as well for going to a Dictionary, Map, Set and so on.
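As an aside for anyone reading this on a newer Scala: JavaConversions has since been removed in 2.13 in favour of the explicit conversions in scala.jdk.CollectionConverters, so the modern equivalent (assuming Scala 2.13+) looks like this:

```scala
// Explicit, modern replacement for JavaConversions (Scala 2.13+)
import scala.jdk.CollectionConverters._

val javaList: java.util.List[String] = Seq("abc").asJava
val javaMap: java.util.Map[String, Int] = Map("a" -> 1).asJava
val backToScala: Seq[String] = javaList.asScala.toSeq
```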

Written by Mark Needham

February 5th, 2012 at 9:40 pm

Posted in Scala


Scala: Our Retrospective of the benefits/drawbacks

with 7 comments

As the closing part of a Scala experience report that Liz and I gave at XP Day, we detailed a retrospective we’d carried out on the project after 3 months in which the team outlined the positives/negatives of working with Scala.

The team members who were there right at the beginning of the project 3 months earlier had come up with what they thought the proposed benefits/drawbacks would be so it was quite interesting to look at our thoughts at both times.

Some of this is available in our slides from the talk but Nat Pryce suggested it’d be interesting to post it up in more detail.

We weren’t aware that we’d be doing this exercise until the session where we did it, and no one looked at the original answers, so hopefully some of the potential biases have been removed!

What we expected (June)

  • +++ Increased developer productivity

    • Higher-level language constructs (functional programming, actors, pattern matching, mixins, etc.)
    • Less code -> less time spent reading code / less defects
    • Syntax is better suited for writing DSLs (e.g. SBT, Scalatra, ScalaTest, etc.)
  • +++ Bigger potential to attract talented developers (not using the same old ‘boring’ stack)
  • ++ Gentle learning curve for Java devs
  • + Built-in support at language-level for handling XML
  • + Comes with SBT, a powerful build tool
  • + Seamlessly integrates with Java and its ecosystem
  • + Runs on the JVM (i.e. no operational concerns)
  • -- Bigger potential to screw things up (think: “with great power comes…”)
  • -- Tool support is less mature and polished (e.g. IDEs, profilers, metrics, etc.)
  • - Community is younger and smaller
  • - Scala compiler seems to be slower than Java counterparts


What we liked (3 months later)

  • +8 Easy to learn
  • +8 Functional Language (Immutable, closures, etc)
  • +6 Concise code
  • +5 SBT power
  • +4 Case classes
  • +4 XML support
  • +4 Java integration
  • +3 List processing
  • +3 DSL support
  • +2 Helpful community (IRC, StackOverflow)
  • +2 Performance

What we disliked

  • -8 IDE support (refactoring, plugin quality)
  • -5 Slow compiler
  • -3 Code can become complex to read
  • -2 Lack of XPath support in XML
  • -2 SBT complexity
  • -2 Immature frameworks

Quite a few of the expected benefits from June were indeed observed three months later, such as having to write less code, functional programming constructs, XML support and the ability to write DSLs.

The community was one benefit which wasn’t expected: we’ve found that every time we get stuck on something we can go on Stack Overflow and find the answer, and if that doesn’t work then someone on IRC will be able to help us almost immediately.

Complexity

Our experience with Scala’s complexity partly matches with that of Stephen Colebourne who suggests the following:

Scala appears to have attracted developers who are very comfortable with type theory, hard-core functional programming and the mathematical end of programming.

There is also a sense that many in the Scala community struggle to understand how other developers cannot grasp Scala/Type/FP concepts which seem simple to them. This sometimes leads Scala aficionados to castigate those that don’t understand as lazy or poor quality developers.

We’ve tried to be reasonably sensible with the language and only used bits of it that the whole team are likely to understand rather than learning some obscure way of solving a problem and checking that in.

On the other hand reading the code of Scala libraries such as scalaz or SBT is something that I, at least, find extremely difficult.

Changing the SBT build files can be quite a scary experience while you try and remember what all the different symbols mean and how they integrate together.

Learning curve

The learning curve for Java developers has been a bit of a mixed experience.

When we started working on the project we were effectively writing Java in Scala and we’ve slowly learnt/introduced more Scala features into our code as time has passed.

I think everyone who has come on that journey has found the transition reasonably okay, but we’ve had other team members who joined later on and went straight into code that they weren’t familiar with, and for them it’s been more difficult.

Again, again!

It will be interesting to see the team’s thoughts if we do the exercise again 3 more months on.

I would imagine there would be more ‘dislikes’ around code complexity now that the code has grown even more in size.

It will probably also mean that the lack of IDE support becomes more annoying as people want to refactor code and can’t get the seamless experience that you get when editing Java code.

Written by Mark Needham

November 28th, 2011 at 12:15 am

Posted in Scala


Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state

with 7 comments

On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.

As a result we delegate from Scala code to the system unzip command like so:

def extract {
  val command = "unzip %s -d %s" format("/file/to/", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We ran into a problem where the unzipping process was hanging and executing ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state which suggested that it was waiting for some sort of input.

After a bit of googling, Duncan found a blog post which explained that we needed to process the output stream of our process, otherwise it might end up hanging.

a.k.a. RTFM:

The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.

The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).

The parent process uses these streams to feed input to and get output from the subprocess.

Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.

For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that long.

In order to get around the problem we needed to gobble up the output stream from unzip like so:

def extract {
  val command = "unzip %s -d %s" format("/file/to/", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    // IOUtils comes from Apache Commons IO
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We need to do the same thing with the error stream as well in case ‘unzip’ ends up overflowing that buffer as well.

On a couple of blog posts that we came across it was suggested that we should ‘gobble up’ the output and error streams on separate threads but we weren’t sure why exactly that was considered necessary…

If anyone knows then please let me know in the comments.
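For what it's worth, the usual explanation is that draining the streams one after the other can itself deadlock: while we block reading stdout to completion, the child may block writing to a full stderr pipe, so neither side makes progress. Hence a thread per stream. A rough sketch (not our production code):

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}

// Drain a stream on its own thread so the child process can never
// block on a full pipe buffer for this particular stream
def drain(in: InputStream): Thread = {
  val t = new Thread(() => {
    val reader = new BufferedReader(new InputStreamReader(in))
    Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(_ => ())
    reader.close()
  })
  t.start()
  t
}

def run(command: Array[String]): Int = {
  val process = Runtime.getRuntime.exec(command)
  // Consume stdout and stderr concurrently; draining them sequentially
  // could deadlock if the child fills the pipe we're not reading yet
  val out = drain(process.getInputStream)
  val err = drain(process.getErrorStream)
  val exitCode = process.waitFor
  out.join(); err.join()
  exitCode
}
```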

Written by Mark Needham

November 20th, 2011 at 8:20 pm

Posted in Java,Scala


Scala: scala.xml.SpecialNode: StackOverFlowError

without comments

We have some code in our application where we parse reasonably complex XML structures and then sometimes choose to get rid of certain elements from the structure.

When we wanted to get rid of an element we replaced that element with a SpecialNode:

val emptyNode = new scala.xml.SpecialNode() {
  def buildString(sb:StringBuilder) = new StringBuilder()
  def label = null
}

Unfortunately, calling #text on that node results in the following exception, which we only found out about today:

> emptyNode.text
java.lang.StackOverflowError
	at scala.xml.NodeSeq$$anonfun$text$1.apply(NodeSeq.scala:152)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
	at scala.collection.Iterator$class.foreach(Iterator.scala:652)
	at scala.collection.LinearSeqLike$$anon$1.foreach(LinearSeqLike.scala:50)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
	at scala.xml.NodeSeq.foreach(NodeSeq.scala:43)
	at scala.collection.TraversableLike$
	at scala.xml.NodeSeq.text(NodeSeq.scala:152)
	at scala.xml.Node.text(Node.scala:200)

The way to get around that problem is to override the text method so it returns empty:

val emptyNode = new scala.xml.SpecialNode() {
  def buildString(sb:StringBuilder) = new StringBuilder()
  def label = null
  override def text = ""
}

> emptyNode.text
res1: String = ""

It took a seriously long time for us to track down what was going on, not helped by the fact that that bit of code wasn’t unit tested.


Written by Mark Needham

November 15th, 2011 at 12:26 am

Posted in Scala


Scala: Setting default argument for function parameter

without comments

Yesterday I wrote about a problem we’ve been having with trying to work out how to default a function parameter that we have in one of our methods.

Our current version of the code defines the function parameter as implicit which means that if it isn’t passed in it defaults to Predef.conforms():

def foo[T](bar: String)(implicit blah:(String => T)) = { println(blah(bar)); bar }

It’s not entirely clear just from reading the code where the implicit value is coming from so we want to try and make the code a bit more expressive.

The way we wanted to do this was by making ‘blah’ have a default value rather than making it implicit.

Our equivalent to Predef.conforms() is the identity function and our first attempt at defaulting the parameter looked like this:

def foo[T](bar: String, blah:(String => T) = identity _) = { println(blah(bar)); bar }

Unfortunately when we try to use that function without providing the second argument we get the following exception:

scala> foo("mark")
<console>:18: error: polymorphic expression cannot be instantiated to expected type;
 found   : [T](Nothing) => Nothing
 required: (String) => ?
Error occurred in an application involving default arguments.

From what I understand the compiler is unable to infer the type of the input parameter, a problem we can fix by specifying it explicitly:

def foo[T](bar: String, blah:(String => T) = identity[String] _) = { println(blah(bar)); bar }

We can then either choose to provide a function:

scala> foo("mark", _ + "needham")
res17: String = mark

…or not:

scala> foo("mark")
res16: String = mark

This solves the problem for this simple example but an interesting problem that we then ran into is that we actually had overloaded versions of the method in question and only one overload is allowed to specify default arguments as per the spec.

Each overload actually takes in different parameter types so one way to get around this problem would be to make some of the parameters optional and then default them to None.
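To illustrate that idea, here's a hypothetical sketch (the names and parameters are made up, not our actual API) of collapsing two overloads into a single method by making the extra parameter an Option defaulted to None:

```scala
// Hypothetical: one method instead of two overloads; the extra
// parameter is optional and the function parameter is defaulted
def foo(bar: String,
        extra: Option[Int] = None,
        blah: String => String = identity[String] _): String =
  blah(bar + extra.map(_.toString).getOrElse(""))

foo("mark")          // uses both defaults
foo("mark", Some(1)) // supplies the optional parameter
```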

At the moment we’ve ended up leaving the implicit conversion in because the change is a bit bigger in nature than anticipated.

Written by Mark Needham

November 8th, 2011 at 10:46 pm

Posted in Scala


Scala: Which implicit conversion is being used?

with one comment

Last week my colleague Pat created a method which had a parameter which he wanted to make optional so that consumers of the API wouldn’t have to provide it if they didn’t want to.

We ended up making the method take in an implicit value such that the method signature looked a bit like this:

def foo[T](implicit blah:(String => T)) = { println(blah("mark")); "foo" }

We can call foo with or without an argument:

scala> foo { x => x + " Needham" }
mark Needham
res16: java.lang.String = foo

scala> foo
mark
res17: java.lang.String = foo

In the second case it seems like the function is defaulting to an identity function of some sort since the same value we pass to it is getting printed out.

We figured that it was probably using one of the implicit conversions in Predef but weren’t sure which one.

I asked about this on the Scala IRC channel and Heikki Vesalainen suggested running scala with the ‘-print’ flag to work it out.

scala -print

The output is pretty verbose but having defined foo as above this is some of the output we get when calling it:

scala> foo
[[syntax trees at end of cleanup]]// Scala source: <console>
package $line2 {
  final object $read extends java.lang.Object with ScalaObject {
    def this(): object $line2.$read = {
  final object $read$$iw$$iw extends java.lang.Object with ScalaObject {
    private[this] val res0: java.lang.String = _;
    <stable> <accessor> def res0(): java.lang.String = $read$$iw$$iw.this.res0;
    def this(): object $line2.$read$$iw$$iw = {
      $read$$iw$$iw.this.res0 = $line1.$read$$iw$$iw.foo(scala.this.Predef.conforms());
  final object $read$$iw extends java.lang.Object with ScalaObject {
    def this(): object $line2.$read$$iw = {

Buried in that output is the call to scala.this.Predef.conforms(), which is the implicit value that’s been substituted into ‘foo’.

It’s defined like so:


implicit def conforms[A]: A <:< A = new (A <:< A) { def apply(x: A) = x }

I’m not sure where that would be legitimately used but the comments just above it suggest the following:

An instance of `A <:< B` witnesses that `A` is a subtype of `B`.

This is probably a misuse of implicits and we intend to replace the implicit in our code with a default function value, but it was interesting investigating where the implicit had come from!
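As an aside, `<:<` does have legitimate uses: the standard library's `toMap`, for example, only compiles when the element type is provably a pair. A small sketch of the same trick:

```scala
// Only callable when A is known to be a pair (K, V); Predef.conforms
// supplies the evidence precisely when the types line up
def pairsToMap[A, K, V](xs: List[A])(implicit ev: A <:< (K, V)): Map[K, V] =
  xs.map(ev).toMap

pairsToMap(List(("a", 1), ("b", 2))) // compiles: (String, Int) <:< (String, Int)
// pairsToMap(List(1, 2, 3))         // would not compile: Int is not a pair
```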

Written by Mark Needham

November 6th, 2011 at 9:25 pm

Posted in Scala


Scala: Option.isDefined as the new null check

with 3 comments

One cool thing about using Scala on my current project is that we don’t have nulls anywhere in our code, instead when something may or may not be there we make use of the Option type.

Unfortunately what we’ve ended up with in our code base (with heavy contributions from me) is repeated use of the isDefined method whenever we want to make a decision depending on whether or not the option is populated.

For example the following is quite common:

case class Foo(val bar:String)
val foo : Option[Foo] = Some(Foo("mark"))

> val bar = if(foo.isDefined) Some(foo.get.bar) else None
bar: Option[String] = Some(mark)

We can actually get rid of the if statement by making use of collect instead:

> val bar = foo.collect { case f => f.bar }
bar: Option[String] = Some(mark)

And if foo is None:

> val foo : Option[Foo] = None
> val bar = foo.collect { case f => f.bar }
bar: Option[String] = None

The code is now simpler and, as long as you understand collect, easier to understand as well.
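In fact, when the partial function covers every case, as it does here, collect degenerates to map, so the same thing can be written even more directly:

```scala
case class Foo(bar: String)

val foo: Option[Foo] = Some(Foo("mark"))

// map gives Some(f.bar) when foo is defined and None otherwise
val bar =
```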

Another quite common example would be something like this:

case class Foo(val bar:Option[String])
> val foos = List(Foo(Some("mark")), Foo(None), Foo(Some("needham")))
foos: List[Foo] = List(Foo(Some(mark)), Foo(None), Foo(Some(needham)))
> foos.filter( => + " awesome")
res23: List[java.lang.String] = List(mark awesome, needham awesome)

Which we can simplify down to:

foos.collect { case Foo(Some(bar)) => bar + " awesome" }

When I was playing around with F# a couple of years ago I learnt that wherever possible I should try and keep chaining functions together rather than breaking the code up into conditionals and I think the same applies here.

There are loads of methods available on TraversableLike to help us achieve this.
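For instance, the filter/map example above can also be written with flatMap, since each Option flattens away as the list is traversed:

```scala
case class Foo(bar: Option[String])

val foos = List(Foo(Some("mark")), Foo(None), Foo(Some("needham")))

// flatMap drops the Nones and unwraps the Somes in a single pass
val res = foos.flatMap( => b + " awesome"))
```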

Written by Mark Needham

November 1st, 2011 at 12:58 am

Posted in Scala


Scala: Adding logging around a repository

with one comment

We wanted to add some logging around one of our repositories to track how many times users were trying to do various things on the application and came across a cool blog post explaining how we might be able to do this.

We ended up with the following code:

case class Bar(name:String)

class BarRepository {
  def all: Seq[Bar] = Seq()
  def find(barId:String) : Bar = Bar("myBar")
}

class TrackService(val barRepository:BarRepository) {
  def all : Seq[Bar] = {
    val bars = barRepository.all
    println("tracking all bars")
    bars
  }
}

implicit def trackServiceToBarRepository(t:TrackService) : BarRepository = t.barRepository

We can then use it like this:

scala> val service = new TrackService(new BarRepository())
service: TrackService = TrackService@4e5394c
scala> service.all
tracking all bars
res6: Seq[Bar] = List()

If a method doesn’t exist on TrackService then the implicit conversion ensures that the appropriate method will be called on BarRepository directly:

scala> service.find("mark")
res7: Bar = Bar(myBar)

I came across another way to achieve the same results by making use of traits although we’d need to change our design a little bit to achieve this pattern:

trait IProvideBars {
  def all : Seq[Bar]
  def find(barId:String) : Bar
}

class BarRepository extends IProvideBars {
  def all: Seq[Bar] = Seq()
  def find(barId:String) : Bar = Bar("myBar")
}

trait Tracking extends IProvideBars {
  abstract override def all : Seq[Bar] = {
    val bars = super.all
    println("tracking all bars")
    bars
  }
}
scala> val b = new BarRepository() with Tracking
b: BarRepository with Tracking = $anon$1@ddc652f
scala> b.all
tracking all bars
res8: Seq[Bar] = List()

Written by Mark Needham

October 25th, 2011 at 9:19 pm

Posted in Scala
