Mark Needham

Thoughts on Software Development

Archive for March, 2012

Haskell: Pattern matching data types with named fields

with one comment

One of my favourite things about coding in Haskell is that I often end up pattern matching against data types.

I’ve been playing around with modelling cars coming into and out of a car park and changing the state of the car park accordingly.

I started with these data type definitions:

data CarParkState = Available Bool Int Int | AlmostFull Bool Int Int | Full Bool Int deriving (Show)
data Action = Entering | Leaving deriving (Show) 
data Sticker = Handicap | None deriving (Show)

which were used in the following function:

car_park_state :: CarParkState -> (Action, Sticker) -> CarParkState
car_park_state (Available _ 8 handicap) (Entering, None) = AlmostFull True 9 handicap
car_park_state (AlmostFull _ 11 handicap) (Entering, None) = Full True handicap
car_park_state (AlmostFull _ 9 handicap) (Leaving, None) = Available True 8 handicap
 
-- there are more states but I've cut them for brevity

This code isn’t too bad but it’s not obvious what the numbers ‘8’, ’11’ and ‘9’ are referring to without scrolling back up to the data type definition and checking.

By coincidence I came across a Haskell data types wiki page which introduced me to the concept of named fields.

This means that rather than referring to the arguments of a data type by position we can give them a name instead:

data CarParkState = Available  { changed :: Bool, normalSpaces :: Int, handicapSpaces :: Int } | 
                    AlmostFull { changed :: Bool, normalSpaces :: Int, handicapSpaces :: Int } | 
                    Full       { changed :: Bool, handicapSpaces :: Int } deriving (Show)

The car_park_state function now reads like this:

car_park_state :: CarParkState -> (Action, Sticker) -> CarParkState
car_park_state state@(Available  {normalSpaces=8})  (Entering, None) = AlmostFull True 9 (handicapSpaces state)
car_park_state state@(AlmostFull {normalSpaces=11}) (Entering, None) = Full True (handicapSpaces state)
car_park_state state@(AlmostFull {normalSpaces=9})  (Leaving, None)  = Available True 8 (handicapSpaces state)

There’s more code in this solution than in the original although to me car_park_state is now clearer since you can tell what it’s doing just by glancing at the code.
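Named fields also give us the option of constructing values by field name, so that the right-hand sides don’t depend on argument order either. A small sketch of my own (not from the original post), reusing the definitions above:

-- sketch: building the new state by field name rather than by position
car_park_state state@(Available {normalSpaces=8}) (Entering, None) =
  AlmostFull { changed = True, normalSpaces = 9, handicapSpaces = handicapSpaces state }

And since the fields are named, the derived Show instance prints them by name as well, e.g. ‘AlmostFull {changed = True, normalSpaces = 9, handicapSpaces = 2}’.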

I’m not sure which of these approaches is considered idiomatic Haskell but my feeling is that it’s probably the former since it’s more concise.

On the other hand, the value of named fields goes up as the total number of parameters in a data type increases since people will find it increasingly difficult to remember which parameter is in each position.

Written by Mark Needham

March 31st, 2012 at 10:49 pm

Posted in Haskell

Micro Services: A simple example

with 4 comments

In our code base we had the concept of a ‘ProductSpeed’ with two different constructors which initialised the object in different ways:

public class ProductSpeed {
  public ProductSpeed(String name) {
    ...
  }
 
  public ProductSpeed(String name, int order) {
    ...
  }
}

In the cases where the first constructor was used the order of the product was irrelevant.

When the second constructor was used we did care about it because we wanted to be able to sort the products before showing them in a drop down list to the user.

The reason for the discrepancy was that this object was being constructed from data which originated from two different systems and in one the concept of order existed and in the other it didn’t.

What we actually needed was to have two different versions of that object but we probably wouldn’t want to name them ‘ProductSpeedForSystem1’ and ‘ProductSpeedForSystem2’!

In Domain Driven Design terms we actually have the concept of a ‘ProductSpeed’ but in two different bounded contexts which could just mean that they come under different packages if we’re building everything in one (monolithic) application.

However, we could see from the way the ‘ProductSpeed’ created by the second constructor was being used in the application that it didn’t interact with anything else, and so it could easily be pulled out into its own mini application or micro service.

We’re actually building an API for other systems to interact with and the initial design of the code described above was:

[Diagram: API before]

We get a product from the product list (which is sorted based on the ordering described!) and then post a request which includes the product amongst other things.

After we’d pulled out a micro service it looked like this:

[Diagram: API after]

The choice of product is actually a step that you do before you make your request to the main API whereas we’d initially coupled them into the same deployable.

These are the advantages I see from what we’ve done:

  • We can now easily change the underlying data source of the products micro service if we want to since it now has its own schema which we could switch out if necessary.
  • It takes about 5 minutes to populate all the products and we run the script to repopulate the main DB quite frequently. Now products can be loaded separately.
  • Our code is now much simplified!

And some disadvantages:

  • We now have to deploy two jars instead of one so our deployment has become a bit more complicated.

    My colleague James Lewis points out that we’re effectively pushing the complexity from the application into the infrastructure when we design systems with lots of mini applications doing one thing.

  • Overall I think we have more code since there are some similarities between the objects in both contexts and we’ve now got two versions of each object since they’re deployed separately. My experience is that sharing domain code generally leads to suffering so we’re not doing that.

Written by Mark Needham

March 31st, 2012 at 9:06 am

Posted in Micro Services

IntelliJ: Find/Replace using regular expressions with capture groups

with 2 comments

Every now and then we end up having to write a bunch of mapping code and I quite like using IntelliJ’s ‘Replace’ option to do it, but I always end up spending about 5 minutes trying to remember how to do capture groups so I thought I’d write it down this time.

Given the following text in our file:

val mark = 0
val dave = 0
val john = 0
val alex = 0

Let’s say we wanted to prefix each of those names with ‘cool’ and had decided not to use Column mode for whatever reason.

One way of doing that is to capture the names and then replace each of them with the same name prefixed with ‘cool’:

A (very hacky) find regex could be this:

\s([a-zA-Z]+)\s=

Where we capture all the letters in the variable name inside a group and then build our replacement string like so:

 cool$1 =

I always expect the capture group replacement syntax to be ‘\1’, ‘\2’ and so on but it actually uses a ‘$’ instead.
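Running the replace across the file above should leave it looking like this:

val coolmark = 0
val cooldave = 0
val cooljohn = 0
val coolalex = 0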

Hopefully now I shall remember!

Written by Mark Needham

March 30th, 2012 at 6:21 am

Readability/Performance

with 3 comments

I recently read the Graphite chapter of The Architecture of Open Source Applications book which mostly tells the story of how Chris Davis incrementally built out Graphite – a pretty cool tool that can be used to do real time graphing of metrics.

The whole chapter is a very good read but I found the design reflections especially interesting:

One of Graphite’s greatest strengths and greatest weaknesses is the fact that very little of it was actually “designed” in the traditional sense. [It] evolved gradually, hurdle by hurdle, as problems arose. Many times the hurdles were foreseeable and various pre-emptive solutions seemed natural. However it can be useful to avoid solving problems you do not actually have yet, even if it seems likely that you soon will.

One of the main success criteria of the application that I’m currently working on is its performance – it doesn’t have millisecond-ish latency requirements but it does need to do a lot of calculations and return within a reasonable amount of time.

Several times we’ve been working on a bit of code and have written something which is easy to read but runs in quadratic time because we have a nested iteration over a collection.
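To make that concrete, here’s a small sketch of my own in Haskell (the post doesn’t show the actual code): a duplicate check that reads nicely but is quadratic, because ‘elem’ walks the rest of the list for every element.

hasDuplicates :: Eq a => [a] -> Bool
hasDuplicates []     = False
hasDuplicates (x:xs) = x `elem` xs || hasDuplicates xs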

Having this performance requirement in mind has made us think twice about whether we should be looking for a better way of writing that code but that would seem to be a premature optimisation since we don’t actually know that this bit of code is going to be the bottleneck.

Instead we’ve started to put a performance testing infrastructure in place so that we can actually gather some data that tells us where the problems in the code are.

Davis goes on to state the following in his article:

The reason is that you can learn much more from closely studying actual failures than from theorizing about superior strategies. Problem solving is driven by both the empirical data we have at hand and our own knowledge and intuition. I’ve found that doubting your own wisdom sufficiently can force you to look at your empirical data more thoroughly.

As I mentioned in my last post the main logic in our application involves loading a bunch of data from Oracle and then processing various calculations on it.

At the moment we’re getting sub second responses from most requests and where that isn’t the case the bottleneck has been in one of our database queries rather than in the code.

Maybe eventually we’ll want to optimise those bits of code which have quadratic running time, and the good thing is that we’ve abstracted them so we could do that with minimal impact on the rest of the code base.

I’ve been working through Algo Class and one of the things that I’ve learnt is that you only really see problems with algorithms of quadratic running time if you are iterating through really big collections – otherwise it’s barely noticeable.

In our case the collections will probably be small so we’re unlikely to see that problem.

On the other hand we are likely to be repeating the same calculations multiple times so we will probably look to add some caching if that starts to become a problem.

Of course when we do that we’ll need to check our performance tests before and after the change to ensure that we’ve actually addressed a real problem rather than theorised about it!

For now though we’re continuing to write our code in the way that is most readable/easy to understand and waiting for our performance tests to force us to change that approach.

—–

I’ve not worked closely on anything which had proprietary trading style latency requirements so if your experience is different please let me know in the comments.

Written by Mark Needham

March 29th, 2012 at 6:45 am

Testing: Trying not to overdo it

with 3 comments

The design of the code which contains the main logic of the application that I’m currently working on looks a bit like this diagram:

[Diagram: orchestration code]

We load a bunch of stuff from an Oracle database, construct some objects from the data and then invoke a sequence of methods on those objects in order to execute our domain logic.

Typically we might expect to see unit level tests against all the classes described in this diagram but we’ve actually been trying out an approach where we don’t test the orchestration code directly but rather only test it via the resource which makes use of it.

We originally started off writing some tests around that code but they ended up being really similar to our database and resource tests.

Having them around also made it difficult to change the way the orchestration worked since we’d end up breaking most of the tests when we tried to change anything.

One disadvantage of not testing this code is that we end up using the debugger more when trying to work out why resource tests aren’t working, since there’s now more code being tested indirectly through them.

[Diagram: orchestration tests]

On the other hand we’ve been forced to drive logic into the domain objects as a result since we don’t have any other place to test that functionality from.

Testing directly against the domain objects is much easier since everything’s in memory and we can easily setup the data to be how we want it to be and inject it into the objects.

Another approach we could have taken would be to mock out the dependencies of the orchestration code but since this code is mostly coordinating other classes there are a lot of dependencies and the tests ended up being quite complicated and brittle.

Initially I was of the opinion that it wasn’t a good idea to not test the orchestration code but looking back a month later I think it’s working reasonably well and putting this constraint on ourselves has made the code easier to change while still being well tested.

Written by Mark Needham

March 28th, 2012 at 12:10 am

Posted in Testing

Haskell: Memoization using the power of laziness

with 3 comments

I’ve been trying to solve problem 15 of Project Euler which requires you to find the number of routes that can be taken to navigate from the top left corner of a grid down to the bottom right corner.

For example there are six routes across a 2×2 grid:

[Diagram: the six routes across a 2×2 grid]

My initial solution looked like this:

routes :: (Int, Int) -> Int -> Int
routes origin size =
  inner origin size
  where
    inner origin@(x, y) size 
      | x == size && y == size = 0
      | x == size || y == size = 1
      | otherwise = inner (x+1, y) size + inner (x, y+1) size

Which can be called like this:

routes (0,0) 2

Once we reach the edge of the grid i.e. ‘x==size’ or ‘y==size’ there’s only one route left from that position (straight along the edge) because you’re not allowed to backtrack, so we return a value of 1.

At every other position we recurse twice, once going down the grid and once going across it to get all of the routes.

This solution works fine for small sizes but it starts to take a serious amount of time to finish once ‘size’ gets above 11. Since the problem requires you to solve the problem for a grid size of 20 it doesn’t suffice!

From working through the problem on paper it was clear that a lot of the calculations were being repeated since there are many different ways of reaching each point on the grid. We therefore need to cache the calculations so that they aren’t repeated multiple times.

The normal imperative language way of doing that would be to create a map or 2D array containing each grid position and then updating it once a grid position had been calculated.

Unfortunately that doesn’t really work with the normal array in Haskell because it’s immutable, and passing it around as a parameter doesn’t seem to work either since we only get the updated array when it’s too late to be useful for other calculations.

I came across a cool blog post by Matt Giuca where he suggests that we need to think of things which have been computed and those which haven’t rather than thinking about the problem in terms of which elements of the array have been mutated.

He suggests creating an array where (in our case) the key represents a grid position and the value is a function call to work out how many routes we have from that position.

Since Haskell is lazily evaluated that function won’t be evaluated until that array position is accessed and once it’s been calculated the value will be stored in the array for any future lookups.

The code now ends up looking like this:

import Data.Array
 
routes :: Int -> Int
routes size =
  arr ! (size, size)
  where
    -- every grid position maps to a lazily evaluated call to 'inner', so each
    -- position is only calculated once and then served from the array
    arr = array ((0,0),(size,size)) [((x,y), inner (x,y) size) | x<-[0..size], y<-[0..size]]
    inner origin@(x, y) size
      | x == 0 && y == 0 = 0
      | x == 0 || y == 0 = 1
      | otherwise = arr ! (x-1, y) + arr ! (x, y-1)

We first create an array which contains entries for every position on the grid and a corresponding call to the function which works out the number of routes from there.

I had to change the way we navigate through the grid to be from the bottom right up to the top left corner since that made it much easier to just do a lookup on position (size, size) as the entry point.

And now it returns instantly!

> routes 20
137846528820
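
The same trick works for any recursive definition with overlapping subproblems. Here’s a minimal sketch of my own (not from the post) applying the pattern to Fibonacci numbers, where each array cell is a lazy thunk that gets computed at most once:

import Data.Array
 
fib :: Int -> Integer
fib n = memo ! n
  where
    memo = listArray (0, n) [f i | i <- [0..n]]
    f 0 = 0
    f 1 = 1
    f i = memo ! (i - 1) + memo ! (i - 2)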

Written by Mark Needham

March 24th, 2012 at 12:28 pm

Posted in Haskell

Saving the values of dynamically populated dropdown on back button

without comments

We wanted to be able to retain the value of a drop down menu that was being dynamically populated (via an AJAX call) when the user hit the back button, but the AJAX request re-runs when we hit back, therefore losing our selection.

Our initial thinking was that we might be able to store the value of the dropdown in a hidden field and then restore it into the dropdown using jQuery on page load but that approach didn’t work since hidden fields don’t seem to retain their values when you hit back.

Another approach would be to update the URL with a new # value on each change of the dropdown and then parse the URL on page load to figure out which value should be selected.

That would probably work but it seemed a bit too complicated.

Eventually we realised that we could create an extra text box, ‘hide’ it by use of ‘display:none’ in our CSS and then copy the dropdown value into that every time it changed.

We ended up with code something like this to capture the selected value:

$("#dropdown-menu").live('change', function() {
  $("#hidden-field').val($(this).val());
});

And then we populated the dropdown with the saved value on the callback from the AJAX call:

$.ajax({
  url: 'url/to/populate_dropdown',
  success: function(data) {
    $("#dropdown-menu").html(data);
	$("#dropdown-menu").val($("#hidden-field").val());
  }
});

It seems to work pretty well, and since it’s a simple technique it worked in all the browsers that we tested it in.

Written by Mark Needham

March 24th, 2012 at 12:40 am

Oracle Spatial: Querying by a point/latitude/longitude

without comments

We’re using Oracle Spatial on the application I’m working on and while most of the time any spatial queries we make are done from Java code we wanted to be able to run them directly from SQL as well to verify the code was working correctly.

We normally end up forgetting how to construct a query so I thought I’d document it.

Assuming we have a table table_with_shape which has a column shape which is a polygon, if we want to check whether a lat/long value interacts with that shape we can do that with the following query:

SELECT *
FROM table_with_shape tws
WHERE SDO_ANYINTERACT(
        tws.shape,
        SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(<LongValueHere>, <LatValueHere>, NULL), NULL, NULL)
      ) = 'TRUE'

The first parameter to SDO_GEOMETRY defines the type of geometry which in this case is a point.

The second parameter is the coordinate system which is 8307 since we’re using the ‘WGS 84 longitude/latitude’ system.

The third parameter is our point and the rest of the parameters aren’t interesting here so we pass null for them.

Written by Mark Needham

March 23rd, 2012 at 11:54 pm

Posted in Software Development

Functional Programming: Handling the Options

with 7 comments

A couple of weeks ago Channing Walton tweeted the following:

Every time you call get on an Option a kitten dies.

As Channing points out in the comments he was referring to unguarded calls to ‘get’ which would lead to an exception if the Option was empty, therefore pretty much defeating the point of using an Option in the first place!

We’re using Dan Bodart’s totallylazy library on the application I’m currently working on and in fact were calling ‘get’ on an Option so I wanted to see if we could get rid of it.

The code in question is used to work out a price based on a base value and an increment value, both of which are optional.

I’ve re-created some tests to show roughly how it works:

@Test
public void myOptionsExample() {
    Assert.assertThat(calculate(Option.<Double>none(), Option.<Double>none(), 20), is(Option.<Double>none()));
    Assert.assertThat(calculate(Option.<Double>none(), Option.some(12.0), 20), is(Option.some(240.0)));
    Assert.assertThat(calculate(Option.some(50.0), Option.some(12.0), 20), is(Option.some(290.0)));
    Assert.assertThat(calculate(Option.some(50.0), Option.<Double>none(), 20), is(Option.some(50.0)));
}
private Option<Double> calculate(Option<Double> base, final Option<Double> increment, final int multiplier) {
    if(base.isEmpty() && increment.isEmpty()) {
        return Option.none();
    }
 
    if(increment.isEmpty()){
        return base;
    }
 
    double baseBit = base.getOrElse(0.0);
    double incrementBit = increment.get() * multiplier;
 
    return Option.some(baseBit + incrementBit);
}

We can get rid of the ‘increment.isEmpty()’ line by making use of a ‘getOrElse’ a few lines later like so:

private Option<Double> calculate(Option<Double> base, final Option<Double> increment, final int multiplier) {
    if(base.isEmpty() && increment.isEmpty()) {
        return Option.none();
    }
 
    double baseBit = base.getOrElse(0.0);
    double incrementBit = increment.getOrElse(0.0) * multiplier;
 
    return Option.some(baseBit + incrementBit);
}

In Scala we can often make use of a for expression when combining Options but I’m not sure it would help in this case because we want to return a ‘Some’ if either of the Options has a value and only return a ‘None’ if they’re both empty.
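
To make that concrete, here’s a sketch of my own in Haskell (the post’s code is Java using totallylazy): sequencing two Maybes in a do block, the analogue of a for expression, short-circuits to Nothing as soon as either value is missing, which is exactly the behaviour we don’t want here.

-- sketch: a do block over Maybe
combine :: Maybe Double -> Maybe Double -> Maybe Double
combine base increment = do
  b <- base
  i <- increment
  return (b + i)

-- combine (Just 50.0) Nothing evaluates to Nothing,
-- whereas the calculate method above returns the base value in that case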

I can’t figure out how to get rid of that opening if statement – I tried pushing the options into a list and then calling ‘foldLeft’ on that to sum up the components but we still have the problem of how to deal with returning a ‘None’ when the list is empty.

If you know how to simplify that code either using totallylazy or another language please let me know in the comments!

Written by Mark Needham

March 21st, 2012 at 12:50 am

Posted in Java

Haskell: Newbie currying mistake

without comments

As I mentioned in my last post I’ve spent a bit of this evening writing a merge sort function and one of the mistakes I made a few times was incorrectly passing arguments to the recursive calls of ‘merge’.

For example, this is one of the earlier versions of the function:

middle :: [Int] -> Int
middle = floor . (\y -> y / 2) .  fromIntegral . length	
msort :: [Int] -> [Int]
msort unsorted = 	
  let n = middle unsorted
  in 
    if n == 0 then unsorted 
    else  
      let (left, right) = splitAt n unsorted
      in merge (msort left) (msort right)
  where 
    merge [] right = right
    merge left [] = left
    merge left@(x:xs) right@(y:ys) = if x < y then x : merge(xs, right) else y : merge (left, ys)

Which doesn’t actually compile:

    Couldn't match expected type `[a0]' with actual type `[a0] -> [a0]'
    In the return type of a call of `merge'
    In the second argument of `(:)', namely `merge (xs, right)'
    In the expression: x : merge (xs, right)

My defence for this mistake is that many of the other languages I’ve programmed in pass function arguments separated by commas, but here the parentheses and comma create a single tuple, so I’ve actually only succeeded in partially applying the (curried) ‘merge’ function.

i.e. it thinks I only wanted to pass one value to it and return a function, hence the error message!

One correction would be to change the code to read like this so we can explicitly see that a function which takes 2 arguments can be called with each argument separately:

msort :: [Int] -> [Int]
msort unsorted = 	
  let n = middle unsorted
  in 
    if n == 0 then unsorted 
    else  
      let (left, right) = splitAt n unsorted
      in merge (msort left) (msort right)
  where 
    merge [] right = right
    merge left [] = left
    merge left@(x:xs) right@(y:ys) = if x < y then x : merge(xs)(right) else y : merge (left)(ys)

We can do exactly the same with the ‘/’ function to use a simpler example:

-- a long way of dividing 2 by 3
> ((/) 2) 3
0.6666666666666666

The type of ‘/’ is:

> :t (/)
(/) :: Fractional a => a -> a -> a

which means it takes in a ‘Fractional’ and then returns a function which takes in a ‘Fractional’ and returns a ‘Fractional’.

In Haskell function application is left associative which means that ‘f x y’ is the same as ‘(f x) y’ so we can omit the parentheses and pass both arguments together:

> (/) 2 3
0.6666666666666666

And we can do the same thing in the merge sort of course:

msort :: [Int] -> [Int]
msort unsorted = 	
  let n = middle unsorted
  in 
    if n == 0 then unsorted 
    else  
      let (left, right) = splitAt n unsorted
      in merge (msort left) (msort right)
  where 
    merge [] right = right
    merge left [] = left
    merge left@(x:xs) right@(y:ys) = if x < y then x : merge xs right else y : merge left ys
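
As an aside, this is also what makes partial application useful in its own right: supplying just the first argument gives back a new function that we can pass around. A quick sketch of my own (not from the post):

-- sketch: (/) 1 is a 'one divided by' function which we can map over a list
reciprocals :: [Double]
reciprocals = map ((/) 1) [2, 4, 8]   -- [0.5, 0.25, 0.125]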

A fairly simple mistake to make but it had me confused for a while!

——

Updated the explanation around how we can pass arguments to ‘/’ after my colleague Gavri pointed out that the initial explanation was incorrect.

Written by Mark Needham

March 20th, 2012 at 11:55 pm

Posted in Haskell
