Archive for the ‘Scala’ Category
Scala: Converting a scala collection to java.util.List
I’ve been playing around a little with Goose – a library for extracting the main body of text from web pages – and I thought I’d try converting some of the code to be more scala-esque in style.
The API of the various classes/methods is designed so it’s interoperable with Java code but in order to use functions like map/filter we need the collection to be a Scala one.
That’s achieved by importing ‘scala.collection.JavaConversions._’, which applies an implicit conversion to turn the Java collection into a Scala one.
I needed to go back to the Java one again which can be achieved with the following code:
import scala.collection.JavaConversions._
val javaCollection = seqAsJavaList(Seq("abc"))
I also used that function in the StopWords.scala object in Goose.
There are a load of other functions available in JavaConversions as well for going to a Dictionary, Map, Set and so on.
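On newer Scala versions the implicit JavaConversions import is no longer available (it was deprecated in 2.12 and removed in 2.13), so the same round trip needs the explicit converters instead. A minimal sketch, assuming Scala 2.13 or later; the object and value names are made up for illustration and aren't from Goose:

```scala
import scala.jdk.CollectionConverters._

object CollectionRoundTrip {
  // Scala -> Java: asJava wraps the Seq as a java.util.List
  val javaList: java.util.List[String] = Seq("abc", "def").asJava

  // Java -> Scala: asScala gives a Scala collection, so map/filter are available
  val upperCased: List[String] = javaList.asScala.map(_.toUpperCase).toList
}
```

The explicit asJava/asScala calls make it visible in the code where a conversion happens, which the implicit version hides.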
Scala: Our Retrospective of the benefits/drawbacks
As the closing part of a Scala Experience Report Liz and I gave at XP Day we detailed a retrospective that we’d carried out on the project after 3 months where the team outlined the positives/negatives of working with Scala.
The team members who were there right at the beginning of the project 3 months earlier had come up with what they thought the proposed benefits/drawbacks would be so it was quite interesting to look at our thoughts at both times.
Some of this is available in our slides from the talk but Nat Pryce suggested it’d be interesting to post it up in more detail.
We weren’t aware that we’d be doing this exercise until the session where we did it and no one looked at the original answers, so hopefully some of the potential biases have been removed!
JUNE
- +++ Increased developer productivity
  - Higher-level language constructs (functional programming, actors, pattern matching, mixins, etc.)
  - Less code -> less time spent reading code / fewer defects
  - Syntax is better suited for writing DSLs (e.g. SBT, Scalatra, ScalaTest, etc.)
- +++ Bigger potential to attract talented developers (not using the same old ‘boring’ stack)
- ++ Gentle learning curve for Java devs
- + Built-in support at language-level for handling XML
- + Comes with SBT, a powerful build tool
- + Seamlessly integrates with Java and its ecosystem
- + Runs on the JVM (i.e. no operational concerns)
- --- Bigger potential to screw things up (think: “with great power comes…”)
- -- Tool support is less mature and polished (e.g. IDEs, profilers, metrics, etc.)
- - Community is younger and smaller
- - Scala compiler seems to be slower than Java counterparts
SEPTEMBER
Liked:
- +8 Easy to learn
- +8 Functional Language (Immutable, closures, etc)
- +6 Concise code
- +5 SBT power
- +4 Case classes
- +4 XML support
- +4 Java integration
- +3 List processing
- +3 DSL support
- +2 Helpful community (IRC, StackOverflow)
- +2 Performance
Disliked:
- -8 IDE support (refactoring, plugin quality)
- -5 Slow compiler
- -3 Code can become complex to read
- -2 Lack of XPath support in XML
- -2 SBT complexity
- -2 Immature frameworks
Quite a few of the expected benefits from June were observed in September, such as having to write less code, functional programming constructs, XML support and the ability to write DSLs.
The community was one benefit which wasn’t expected – we’ve found that every time we get stuck on something we can go on Stack Overflow and find the answer and if that doesn’t work then someone on IRC will be able to help us almost immediately.
Complexity
Our experience with Scala’s complexity partly matches that of Stephen Colebourne, who suggests the following:
Scala appears to have attracted developers who are very comfortable with type theory, hard-core functional programming and the mathematical end of programming.
…
There is also a sense that many in the Scala community struggle to understand how other developers cannot grasp Scala/Type/FP concepts which seem simple to them. This sometimes leads Scala aficionados to castigate those that don’t understand as lazy or poor quality developers.
We’ve tried to be reasonably sensible with the language and only used bits of it that the whole team are likely to understand rather than learning some obscure way of solving a problem and checking that in.
On the other hand reading the code of Scala libraries such as scalaz or SBT is something that I, at least, find extremely difficult.
Changing the SBT build files can be quite a scary experience while you try and remember what all the different symbols mean and how they integrate together.
Learning curve
The learning curve for Java developers has been a bit of a mixed experience.
When we started working on the project we were effectively writing Java in Scala and we’ve slowly learnt/introduced more Scala features into our code as time has passed.
I think everyone who has come on that journey has found the transition reasonably okay but we’ve had other team members who joined later on and went straight into code that they weren’t familiar with and for them it’s been more difficult.
Again, again!
It will be interesting to see the team’s thoughts if we do the exercise again 3 more months on.
I would imagine there would be more ‘dislikes’ around code complexity now that the code has grown even more in size.
It probably also means the lack of IDE support becomes more annoying as people want to refactor code and can’t get the seamless experience that you get when editing Java code.
Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state
On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.
As a result we delegate from Scala code to the system unzip command like so:
def extract {
  var command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}
We ran into a problem where the unzipping process was hanging and executing ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state which suggested that it was waiting for some sort of input.
After a bit of googling Duncan found this blog which explained that we needed to process the output stream from our process, otherwise it might end up hanging.
a.k.a. RTFM:
The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.
The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).
The parent process uses these streams to feed input to and get output from the subprocess.
Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that high.
In order to get around the problem we needed to gobble up the output stream from unzip like so:
import org.apache.commons.io.IOUtils

def extract {
  var command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null
  try {
    process = Runtime.getRuntime.exec(command)
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}
We need to do the same thing with the error stream as well in case ‘unzip’ ends up overflowing that buffer as well.
On a couple of blog posts that we came across it was suggested that we should ‘gobble up’ the output and error streams on separate threads but we weren’t sure why exactly that was considered necessary…
If anyone knows then please let me know in the comments.
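One likely reason for the separate-threads advice: if the parent reads stdout to the end before touching stderr, a child that fills the stderr buffer in the meantime blocks on its write, never closes stdout, and both sides wait forever. A minimal sketch of draining both streams concurrently, using only the JDK; the names StreamGobbling, drain and run are hypothetical and not from our code base:

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.util.concurrent.{Callable, Executors}

object StreamGobbling {
  // Reads a stream to the end so the child process never blocks on a full pipe.
  def drain(in: InputStream): List[String] = {
    val reader = new BufferedReader(new InputStreamReader(in))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    finally reader.close()
  }

  // Runs a command and drains stdout and stderr on separate threads.
  // Returns (exit code, stdout lines, stderr lines).
  def run(command: String*): (Int, List[String], List[String]) = {
    val process = new ProcessBuilder(command: _*).start()
    val pool = Executors.newFixedThreadPool(2)
    try {
      val out = pool.submit(new Callable[List[String]] {
        def call(): List[String] = drain(process.getInputStream)
      })
      val err = pool.submit(new Callable[List[String]] {
        def call(): List[String] = drain(process.getErrorStream)
      })
      val exitCode = process.waitFor()
      (exitCode, out.get(), err.get())
    } finally pool.shutdown()
  }
}
```

It could be called as StreamGobbling.run("unzip", "/file/to/unzip.zip", "-d", "/place/to/unzip/to"). If the two outputs don't need to be kept apart, ProcessBuilder.redirectErrorStream(true) merges stderr into stdout so that a single reader is enough.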
Scala: scala.xml.SpecialNode: StackOverFlowError
We have some code in our application where we parse reasonably complex XML structures and then sometimes choose to get rid of certain elements from the structure.
When we wanted to get rid of an element we replaced that element with a SpecialNode:
val emptyNode = new scala.xml.SpecialNode() {
  def buildString(sb:StringBuilder) = new StringBuilder()
  def label = null
}
Unfortunately when you call #text on the node it results in the following exception which we only found out today:
> emptyNode.text
java.lang.StackOverflowError
  at scala.xml.NodeSeq$$anonfun$text$1.apply(NodeSeq.scala:152)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
  at scala.collection.Iterator$class.foreach(Iterator.scala:652)
  at scala.collection.LinearSeqLike$$anon$1.foreach(LinearSeqLike.scala:50)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
  at scala.xml.NodeSeq.foreach(NodeSeq.scala:43)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
  at scala.xml.NodeSeq.map(NodeSeq.scala:43)
  at scala.xml.NodeSeq.text(NodeSeq.scala:152)
  at scala.xml.Node.text(Node.scala:200)
The way to get around that problem is to override the text method so it returns empty:
val emptyNode = new scala.xml.SpecialNode() {
  def buildString(sb:StringBuilder) = new StringBuilder()
  def label = null
  override def text = ""
}
> emptyNode.text
res1: String = ""
It took a seriously long time for us to track down what was going on and that bit of code wasn’t unit tested.
#fail
Scala: Setting default argument for function parameter
Yesterday I wrote about a problem we’ve been having with trying to work out how to default a function parameter that we have in one of our methods.
Our current version of the code defines the function parameter as implicit which means that if it isn’t passed in it defaults to Predef.conforms():
def foo[T](bar: String)(implicit blah:(String => T)) = { println(blah(bar)); bar }
It’s not entirely clear just from reading the code where the implicit value is coming from so we want to try and make the code a bit more expressive.
The way we wanted to do this was by making ‘blah’ have a default value rather than making it implicit.
Our equivalent to Predef.conforms() is the identity function and our first attempt at defaulting the parameter looked like this:
def foo[T](bar: String, blah:(String => T) = identity _) = { println(blah(bar)); bar }
Unfortunately when we try to use that function without providing the second argument we get the following compilation error:
scala> foo("mark")
<console>:18: error: polymorphic expression cannot be instantiated to expected type;
found : [T](Nothing) => Nothing
required: (String) => ?
Error occurred in an application involving default arguments.
foo("mark")
From what I understand the compiler is unable to infer the type of the input parameter, a problem we can fix by explicitly specifying that:
def foo[T](bar: String, blah:(String => T) = identity[String] _) = { println(blah(bar)); bar }
We can then either choose to provide a function:
scala> foo("mark", _ + "needham")
markneedham
res17: String = mark
…or not:
scala> foo("mark")
mark
res16: String = mark
This solves the problem for this simple example but an interesting problem that we then ran into is that we actually had overloaded versions of the method in question and only one overload is allowed to specify default arguments as per the spec.
Each overload actually takes in different parameter types so one way to get around this problem would be to make some of the parameters optional and then default them to None.
At the moment we’ve ended up leaving the implicit conversion in because the change is a bit bigger in nature than anticipated.
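For what it's worth, the "default them to None" idea could look roughly like the following. This is only a sketch under assumptions: our real overloads aren't shown here, so the method name describe and the parameters transform and repeat are invented to stand in for them.

```scala
object MergedOverloads {
  // One method replaces two hypothetical overloads:
  //   describe(name, transform) and describe(name, repeat)
  // Only one method now carries default arguments, which the spec allows.
  def describe(name: String,
               transform: String => String = identity[String],
               repeat: Option[Int] = None): String = {
    val base = transform(name)
    // None means "the caller did not ask for repetition"
    repeat.fold(base)(n => base * n)
  }
}
```

Callers can then write MergedOverloads.describe("mark"), pass a function as the second argument, or use a named argument such as repeat = Some(2) to pick the behaviour that used to live in a separate overload.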
Scala: Which implicit conversion is being used?
Last week my colleague Pat created a method which had a parameter which he wanted to make optional so that consumers of the API wouldn’t have to provide it if they didn’t want to.
We ended up making the method take in an implicit value such that the method signature looked a bit like this:
def foo[T](implicit blah:(String => T)) = {
  println(blah("mark"))
  "foo"
}
We can call foo with or without an argument:
scala> foo { x => x + " Needham" }
mark Needham
res16: java.lang.String = foo
scala> foo
mark
res17: java.lang.String = foo
In the second case it seems like the function is defaulting to an identity function of some sorts since the same value we pass to it is getting printed out.
We figured that it was probably using one of the implicit conversions in Predef but weren’t sure which one.
I asked about this on the Scala IRC channel and Heikki Vesalainen suggested running scala with the ‘-print’ flag to work it out.
scala -print
The output is pretty verbose but having defined foo as above this is some of the output we get when calling it:
scala> foo
[[syntax trees at end of cleanup]]// Scala source: <console>
package $line2 {
  final object $read extends java.lang.Object with ScalaObject {
    def this(): object $line2.$read = {
      $read.super.this();
      ()
    }
  };
  final object $read$$iw$$iw extends java.lang.Object with ScalaObject {
    private[this] val res0: java.lang.String = _;
    <stable> <accessor> def res0(): java.lang.String = $read$$iw$$iw.this.res0;
    def this(): object $line2.$read$$iw$$iw = {
      $read$$iw$$iw.super.this();
      $read$$iw$$iw.this.res0 = $line1.$read$$iw$$iw.foo(scala.this.Predef.conforms());
      ()
    }
  };
  final object $read$$iw extends java.lang.Object with ScalaObject {
    def this(): object $line2.$read$$iw = {
      $read$$iw.super.this();
      ()
    }
  }
}
The part to look for is the call to scala.this.Predef.conforms() in the constructor of $read$$iw$$iw, which is the implicit conversion that’s been substituted into ‘foo’.
It’s defined like so:
implicit def conforms[A]: A <:< A = new (A <:< A) { def apply(x: A) = x }
I’m not sure where that would be legitimately used but the comments just above it suggest the following:
An instance of `A <:< B` witnesses that `A` is a subtype of `B`.
This is probably a misuse of implicits and we intend to replace the implicit in our code with a default function value but it was interesting investigating where the implicit had come from!
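As far as I can tell, the intended use of `A <:< B` is to let a method on a generic class demand extra evidence about its type parameter. A minimal sketch of that idea; Box and increment are hypothetical names made up for this example:

```scala
object SubtypeWitness {
  final case class Box[A](value: A) {
    // Only callable when the compiler can prove that A is a subtype of Int.
    // The evidence is also a function A => Int, so it can convert the value.
    def increment(implicit ev: A <:< Int): Int = ev(value) + 1
  }

  val two: Int = Box(1).increment
  // Box("mark").increment would not compile: there is no evidence String <:< Int
}
```

The class itself stays fully generic and only the one method is restricted, which is something a plain type bound on Box could not express.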
Scala: Option.isDefined as the new null check
One cool thing about using Scala on my current project is that we don’t have nulls anywhere in our code, instead when something may or may not be there we make use of the Option type.
Unfortunately what we’ve (heavily contributed by me) ended up with in our code base is repeated use of the isDefined method whenever we want to make a decision depending on whether or not the option is populated.
For example the following is quite common:
case class Foo(val bar:String)
val foo : Option[Foo] = Some(Foo("mark"))
> val bar = if(foo.isDefined) Some(foo.get.bar) else None
bar: Option[String] = Some(mark)
We can actually get rid of the if statement by making use of collect instead:
> val bar = foo.collect { case f => f.bar }
bar: Option[String] = Some(mark)
And if foo is None:
> val foo : Option[Foo] = None
> val bar = foo.collect { case f => f.bar }
bar: Option[String] = None
The code is now simpler and as long as you understand collect then it’s easier to understand as well.
Another quite common example would be something like this:
case class Foo(val bar:Option[String])
> val foos = List(Foo(Some("mark")), Foo(None), Foo(Some("needham")))
foos: List[Foo] = List(Foo(Some(mark)), Foo(None), Foo(Some(needham)))
> foos.filter(_.bar.isDefined).map(_.bar.get + " awesome")
res23: List[java.lang.String] = List(mark awesome, needham awesome)
Which we can simplify down to:
foos.collect { case Foo(Some(bar)) => bar + " awesome" }
When I was playing around with F# a couple of years ago I learnt that wherever possible I should try and keep chaining functions together rather than breaking the code up into conditionals and I think the same applies here.
There are loads of methods available on TraversableLike to help us achieve this.
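Besides collect, the plain map, flatMap and getOrElse methods cover most of the places where isDefined tends to creep in. A small sketch of the same kind of chaining; the Person class and the value names are invented for the example and aren't from our code base:

```scala
object OptionChaining {
  final case class Person(name: Option[String])

  val people = List(Person(Some("mark")), Person(None), Person(Some("needham")))

  // map transforms the value inside a Some and leaves a None alone
  val shouted: Option[String] = people.head.name.map(_.toUpperCase)

  // flatMap drops the Nones and unwraps the Somes in one pass
  val names: List[String] = people.flatMap(_.name).map(_ + " awesome")

  // getOrElse supplies a fallback at the point where a plain value is needed
  val fallback: String = people(1).name.getOrElse("unknown")
}
```

When the case pattern matches everything, as in foo.collect { case f => f.bar }, map says the same thing with a little less ceremony; collect earns its keep when the pattern also filters, as in the Foo(Some(bar)) example above.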
Scala: Adding logging around a repository
We wanted to add some logging around one of our repositories to track how many times users were trying to do various things on the application and came across a cool blog post explaining how we might be able to do this.
We ended up with the following code:
class BarRepository {
  def all: Seq[Bar] = Seq()
  def find(barId:String) : Bar = Bar("myBar")
}
// barRepository is a val so that the implicit conversion below can reach it
class TrackService(val barRepository:BarRepository) {
  def all : Seq[Bar] = {
    val bars = barRepository.all
    println("tracking all bars")
    bars
  }
}
implicit def trackServiceToBarRepository(t:TrackService) : BarRepository = t.barRepository
We can then use it like this:
scala> val service = new TrackService(new BarRepository())
service: TrackService = TrackService@4e5394c
scala> service.all
tracking all bars
res6: Seq[Bar] = List()
If a method doesn’t exist on TrackService then the implicit conversion ensures that the appropriate method will be called on BarRepository directly:
scala> service.find("mark") res7: Bar = Bar(myBar)
I came across another way to achieve the same results by making use of traits although we’d need to change our design a little bit to achieve this pattern:
trait IProvideBars {
  def all : Seq[Bar]
  def find(barId:String) : Bar
}
class BarRepository extends IProvideBars {
  def all: Seq[Bar] = Seq()
  def find(barId:String) : Bar = Bar("myBar")
}
trait Tracking extends IProvideBars {
  abstract override def all : Seq[Bar] = {
    val bars = super.all
    println("tracking all bars")
    bars
  }
}
scala> val b = new BarRepository() with Tracking
b: BarRepository with Tracking = $anon$1@ddc652f
scala> b.all
tracking all bars
res8: Seq[Bar] = List()
Scala: Creating an Xml element with an optional attribute
We have a lot of Xml in our application and one of the things that we need to do reasonably frequently in our test code is create elements which have optional attributes on them.
Our simple first approach looked like this:
def createElement(attribute: Option[String]) = if(attribute.isDefined) <p bar={attribute.get} /> else <p />
That works but it always seemed like we should be able to do it in a simpler way.
Our first attempt was this:
def createElement(attribute: Option[String]) = <p bar={attribute} />
But that ends up in a compilation error:
error: overloaded method constructor UnprefixedAttribute with alternatives:
(key: String,value: Option[Seq[scala.xml.Node]],next: scala.xml.MetaData)scala.xml.UnprefixedAttribute <and>
(key: String,value: String,next: scala.xml.MetaData)scala.xml.UnprefixedAttribute <and>
(key: String,value: Seq[scala.xml.Node],next1: scala.xml.MetaData)scala.xml.UnprefixedAttribute
cannot be applied to (java.lang.String, Option[String], scala.xml.MetaData)
def createElement1(attribute: Option[String]) = <p bar={attribute} />
We really need to extract the string value from the option if there is one and not do anything if there isn’t one but with the above approach we try to shove an option in as the attribute value. Unfortunately there isn’t an overload of the constructor which lets us do that.
Eventually one of my colleagues suggested we try passing null in as the attribute value if we had a None option:
def createElement(attribute: Option[String]) = <p bar={attribute.getOrElse(null)} />
Which works pretty well:
scala> createElement(Some("mark"))
res0: scala.xml.Elem = <p bar="mark"></p>
scala> createElement(None)
res1: scala.xml.Elem = <p ></p>
Scala: Replacing a trait with a fake one for testing
We recently wanted to replace a trait mixed into one of our classes with a fake version to make it easier to test but forgot how exactly to do that!
The class is roughly like this:
trait Foo { def foo : String = "real foo" }
class Mark extends Foo {}
We originally tried to replace it like this:
trait BrokenFakeFoo { def foo : String = "broken fake foo" }
val m = new Mark with BrokenFakeFoo
error: overriding method foo in trait Foo of type => String;
method foo in trait BrokenFakeFoo of type => String needs `override' modifier
val m = new Mark with BrokenFakeFoo
If m compiled it would have two versions of foo but it wouldn’t know which one to use, hence the error message.
Attempt two was this:
trait BrokenFakeFoo { override def foo : String = "broken fake foo" }
error: method foo overrides nothing
trait BrokenFakeFoo { override def foo : String = "broken fake foo" }
As Uday pointed out, what we actually need to do is make our fake trait extend the original one and then override the method.
trait FakeFoo extends Foo { override def foo : String = "fake foo" }
val m = new Mark with FakeFoo
> m.foo
res5: String = fake foo
Since FakeFoo is the rightmost of the traits mixed into Mark, its foo method will be used over the Foo one mixed into Mark in its class definition.