# Mark Needham

Thoughts on Software Development

## Knapsack Problem in Haskell


I recently described two versions of the Knapsack problem written in Ruby and Python, and one thing they have in common is that I used a global cache to store the results of previous calculations.

From my experience of coding in Haskell it’s not considered very idiomatic to write code like that and, although I haven’t actually tried it, it’s potentially more tricky to achieve.

I thought it’d be interesting to try and write the algorithm in Haskell with that constraint in mind and my first version looked like this:

```
ref :: a -> IORef a
ref x = unsafePerformIO (newIORef x)

knapsackCached1 :: [[Int]] -> Int -> Int -> IORef (Map.Map (Int, Int) Int) -> Int
knapsackCached1 rows knapsackWeight index cacheContainer = unsafePerformIO $ do
  cache <- readIORef cacheContainer
  if index == 0 || knapsackWeight == 0
    then do return 0
    else let (value:weight:_) = rows !! index
             best = knapsackCached1 rows knapsackWeight prevIndex cacheContainer
         in if weight > knapsackWeight && lookupPreviousIn cache == Nothing
              then do
                let updatedCache = Map.insert (prevIndex, knapsackWeight) best cache
                writeIORef cacheContainer updatedCache
                return $ fromJust $ lookupPreviousIn updatedCache
              else if lookupPreviousIn cache == Nothing
                then do
                  let newBest = maximum [best, value + knapsackCached1 rows (knapsackWeight-weight) prevIndex cacheContainer]
                      updatedCache = Map.insert (prevIndex, knapsackWeight) newBest cache
                  writeIORef cacheContainer updatedCache
                  return $ fromJust $ lookupPreviousIn updatedCache
                else return $ fromJust $ lookupPreviousIn cache
  where lookupPreviousIn cache = Map.lookup (prevIndex, knapsackWeight) cache
        prevIndex = index - 1
```

We then call it like this:

```
let (knapsackWeight, numberOfItems, rows) = process contents
    cache = ref (Map.empty :: Map.Map (Int, Int) Int)
knapsackCached1 rows knapsackWeight (numberOfItems-1) cache
```

As you can see, we’re passing around the cache as a parameter where the cache is a Map wrapped inside an IORef – a data type which allows us to pass around a mutable variable in the IO monad.
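If you haven’t come across IORef before, this minimal example (my own illustration, not part of the knapsack code) shows the whole lifecycle – creating a reference to a map, mutating it and reading it back:

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Create a mutable reference holding a map, write a value into it
-- and read it back out.
cacheDemo :: IO (Maybe Int)
cacheDemo = do
  cacheContainer <- newIORef (Map.empty :: Map.Map (Int, Int) Int)
  modifyIORef cacheContainer (Map.insert (0, 100) 42)
  cache <- readIORef cacheContainer
  return (Map.lookup (0, 100) cache)
```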

We write our new value into the cache in both branches where the lookup misses so that our updates to the map will be picked up in the other recursive steps.

Apart from that the shape of the code is the same as the Ruby and Python versions except I’m now only using a map with a pair as the key instead of an array + map as in the other versions.

The annoying thing about this solution is that we have to pass the cache around as a parameter when it’s just a means of optimisation and not part of the actual problem.

An alternative solution could be the following where we abstract the writing/reading of the map into a memoize function which we wrap our function in:

```
memoize :: ((Int, Int) -> Int) -> (Int, Int) -> Int
memoize fn mapKey = unsafePerformIO $ do
  let cache = ref (Map.empty :: Map.Map (Int, Int) Int)
  items <- readIORef cache
  if Map.lookup mapKey items == Nothing
    then do
      let result = fn mapKey
      writeIORef cache $ Map.insert mapKey result items
      return result
    else return (fromJust $ Map.lookup mapKey items)

knapsackCached :: [[Int]] -> Int -> Int -> Int
knapsackCached rows weight numberOfItems = inner (numberOfItems-1, weight)
  where
    inner = memoize (\(i, w) ->
      if i < 0 || w == 0
        then 0
        else let best = inner (i-1, w)
                 (vi:wi:_) = rows !! i
             in if wi > w
                  then best
                  else maximum [best, vi + inner (i-1, w-wi)])
```

We can call that function like this:

```
let (knapsackWeight, numberOfItems, rows) = process contents
knapsackCached rows knapsackWeight numberOfItems
```

Here we define an inner function inside knapsackCached which is a partial application of the memoize function. We then pass our cache key to that function on the previous line.

One thing which I noticed while writing this code is that there is some strangeness around the use of ‘in’ after let bindings. The rule seems to be that in an ordinary expression (such as inside an if/else) you need ‘in’, whereas in the context of a monad (a do block) you don’t.

I was staring at a screen of compilation errors for about an hour until I realised this!
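To make the rule concrete, here’s a tiny pair of functions (my own example, not from the knapsack code) showing both forms side by side:

```haskell
-- Inside a do block a 'let' stands on its own: no 'in' needed.
doubleAndShow :: Int -> IO String
doubleAndShow x = do
  let doubled = x * 2
  return (show doubled)

-- In a pure expression (here inside an if/else) 'let' must be
-- closed with 'in'.
doublePure :: Int -> Int
doublePure x =
  if x >= 0
    then let doubled = x * 2 in doubled
    else 0
```

In do notation the `let` is a statement in its own right, whereas in a pure expression it is only part of a larger expression and has to be completed with `in`.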

These are the timings for the two versions of the algorithm:

```
# First one
$ time ./k knapsack2.txt
real	0m14.993s
user	0m14.646s
sys	0m0.320s

# Second one
$ time ./k knapsack2.txt
real	0m12.594s
user	0m12.259s
sys	0m0.284s
```

I’m still trying to understand exactly how to profile and then optimise the program so any tips are always welcome.

Written by Mark Needham

January 9th, 2013 at 12:12 am


In writing the clustering algorithm which I’ve mentioned way too many times already, I needed to process a text file which contained all the points, and my initial approach looked like this:

```
import System.IO

main = do
  withFile "clustering2.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStrLn contents)
```

It felt a bit clunky but I didn’t realise there was an easier way until I came across this thread. We can simplify reading a file to the following by using the readFile function:

```
main = do
  contents <- readFile "clustering2.txt"
  putStrLn contents
```

We need to read the file in the IO monad which explains why we have the ‘do’ notation on the first line.

Another thing I didn’t realise until recently was that you don’t actually need to worry about the ‘do’ notation if you try to read from the IO monad inside GHCI.

In this context we’re reading from the IO monad when we bind ‘readFile’ to the variable ‘contents’ since ‘readFile’ returns type ‘IO String’:

```
> :t readFile
readFile :: FilePath -> IO String
```

We can therefore play around with the code pretty easily:

```
> contents <- readFile "clustering2.txt"
> let (bits, nodes) = process contents
> bits
24
> length nodes
19981
> take 10 nodes
[379,1669,5749,6927,7420,9030,9188,9667,11878,12169]
```

I think we’re able to do this because the GHCI prompt itself runs in the context of the IO monad, but I’m happy to be corrected if I haven’t explained that correctly.

Written by Mark Needham

January 2nd, 2013 at 12:16 am


I’ve started playing around with Haskell again and since I’m doing so on a new machine I don’t have a copy of the language source code.

I wanted to rectify that situation but my Google fu was weak and it took me way too long to figure out how to get it so I thought I’d better document it for future me.

The easiest way is to clone the copy of the GHC repository on github:

`git clone https://github.com/ghc/ghc.git`

Initially that doesn’t have any of the code for the core libraries but running the following command (which takes ages!) sorts it out:

```
cd ghc
./sync-all get
```

Noticing that the core libraries weren’t there initially I thought I must have done something wrong so I went to the documentation for the function that I wanted to see the source for – getAssocs.

From that page there is a link to the source for that function and a bit of tweaking of the URL lets us know that this function is defined in the array package.

Most of the base packages are available from the Haskell darcs repository so I ended up cloning the ones I wanted in the time that it took for the sync-all script to run.

```
darcs get http://darcs.haskell.org/packages/base/  # gets most of the packages we'd be interested in
darcs get http://darcs.haskell.org/packages/array/ # gets the array package
```

Of course I could have just been patient and waited for the script to finish…

Written by Mark Needham

December 31st, 2012 at 10:39 pm


## Haskell: Strictness and the monadic bind


As I mentioned towards the end of my post about implementing the union find data structure in Haskell, I wrote another version using a mutable array and, having not seen much of a performance improvement, started commenting out code to try and find the problem.

I eventually narrowed it down to the union function which was defined like so:

```
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
  actualArray <- arrayContainer
  ls <- getAssocs actualArray
  leader1 <- readArray actualArray x
  leader2 <- readArray actualArray y
  let newValues = (map (\(index, value) -> (index, leader1)) .
                   filter (\(index, value) -> value == leader2)) ls
  sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
  return actualArray
```

I was using Unix’s time function to get the execution time since this meant I didn’t need to make any changes to the program and this level of granularity was ok.

The first time I ran the program it executed in 36.379 seconds and my first hunch was that a lot of time was being taken up writing to the array so I commented out that line:

```
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
  actualArray <- arrayContainer
  ls <- getAssocs actualArray
  leader1 <- readArray actualArray x
  leader2 <- readArray actualArray y
  let newValues = (map (\(index, value) -> (index, leader1)) .
                   filter (\(index, value) -> value == leader2)) ls
  -- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
  return actualArray
```

The execution time decreased to 33.381 seconds so the writing of the array was actually only a small part of the total execution time.

I thought it was quite strange that it was taking so long to execute since things are generally lazily evaluated in Haskell and my assumption was that newValues wasn’t being evaluated since I hadn’t used it anywhere. I decided to comment that out to see what difference it would make:

```
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
  actualArray <- arrayContainer
  ls <- getAssocs actualArray
  leader1 <- readArray actualArray x
  leader2 <- readArray actualArray y
  -- let newValues = (map (\(index, value) -> (index, leader1)) .
  --                  filter (\(index, value) -> value == leader2)) ls
  -- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
  return actualArray
```

The execution time was now 33.579 seconds so commenting out that line hadn’t actually made any difference. I assumed ls wasn’t being evaluated since it isn’t being used but I thought I’d better check:

```
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
  actualArray <- arrayContainer
  -- ls <- getAssocs actualArray
  leader1 <- readArray actualArray x
  leader2 <- readArray actualArray y
  -- let newValues = (map (\(index, value) -> (index, leader1)) .
  --                  filter (\(index, value) -> value == leader2)) ls
  -- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
  return actualArray
```

The execution time now reduced to 3.882 seconds thereby suggesting that getAssocs was being strictly evaluated.

Each of those bindings is what’s called a monadic bind, and how a bind behaves depends on the monad in question – it isn’t necessarily evaluated the same way everywhere:

Monad operations (bind and return) have to be non-strict in fact, always! However other operations can be specific to each monad. For instance some are strict (like IO), and some are non-strict (like []).

From my observations it would seem that IO is one of those monads which evaluates bind strictly, which explains why getAssocs was doing all its work up front.
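The difference is easy to demonstrate: binding an action with `<-` runs it on the spot, whereas a `let` merely gives the action a name. This little experiment (my own sketch, not from the clustering code) returns the counter before and after:

```haskell
import Data.IORef

-- 'let' only names the action; '<-' actually runs it.
bindRunsAction :: IO (Int, Int)
bindRunsAction = do
  counter <- newIORef (0 :: Int)
  let action = modifyIORef counter (+ 1) >> readIORef counter
  before <- readIORef counter  -- still 0: the let didn't run anything
  after  <- action             -- the bind executes the increment
  return (before, after)
```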

I tried looking at the Haskell source code to see if I could find any code to prove what I’d observed but I’m not entirely sure what I should be looking for!

Written by Mark Needham

December 31st, 2012 at 10:27 pm


## Haskell: An impressively non performant union find

I’ve spent the best part of the last day debugging a clustering algorithm I wrote as part of the Algorithms 2 course, eventually coming to the conclusion that the union find data structure I was using wasn’t working as expected.

In our algorithm we’re trying to group together points which are ‘close’ to each other and the data structure is particularly useful for doing that.

To paraphrase from my previous post about how we use the union find data structure:

We start out with n connected components i.e. every point is in its own connected component.

We then merge these components together as we calculate the neighbours of each point until we’ve iterated through all the points and have grouped them into the appropriate components.

I came across 3 libraries which implement this data structure – union-find, equivalence and persistent-equivalence.

union-find seemed like it had the easiest API to understand so I plugged it into my program only to eventually realise that it wasn’t putting the points into components as I expected.

I eventually narrowed the problem down to the following example:

```
> let uf = emptyEquivalence (0,9)
[(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)]

> components $ equate 0 1 uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)]

> components $ equate 8 9 $ equate 0 1 $ uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,8)]

> components $ equate 0 8 $ equate 8 9 $ equate 0 1 $ uf
[(0,0),(1,0),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,0),(9,8)]
```

We start out with a union-find where every point is in its own component. The next line puts points ‘0’ and ‘1’ into the same component which it does by making indexes ‘0’ and ‘1’ of the array both have the same value, in this case 0, which is known as the component’s leader.

We continue doing that for points ‘8’ and ‘9’ which works fine, and our union find now consists of 8 components – the ones with leaders 8 & 0, which contain two elements each, and the ones with leaders 2, 3, 4, 5, 6 & 7, which only contain themselves.

Things go wrong on our next step where we try to join nodes ‘0’ and ‘8’. As I understand it what should happen here is that all the points connected to either ‘0’ or ‘8’ should end up in the same component so we should have a component containing points ‘0’, ‘1’, ‘8’ and ‘9’ but ‘9’ has been missed off in this case.

The implementation is deliberately written to work like this so I thought I’d try writing my own version based on the following Ruby version:

```
class UnionFind
  def initialize(n)
    @leaders = 1.upto(n).inject([]) { |leaders, i| leaders[i] = i; leaders }
  end

  def connected?(id1, id2)
    @leaders[id1] == @leaders[id2]
  end

  def union(id1, id2)
    leader_1, leader_2 = @leaders[id1], @leaders[id2]
    @leaders.map! { |i| (i == leader_1) ? leader_2 : i }
  end
end
```

This is my Haskell equivalent which I adapted from the union-find one that I mentioned above:

```
module Leaders (UnionSet, create, components, numberOfComponents, indexes, inSameComponent, union) where

import Control.Concurrent.MVar
import Control.Monad
import Data.Array.Diff as ArrayDiff
import Data.IORef
import qualified Data.List
import Data.Maybe
import System.IO.Unsafe
import qualified Data.Set as Set

arrayFrom :: (IArray a e, Ix i) => (i, i) -> (i -> e) -> a i e
arrayFrom rng f = array rng [ (x, f x) | x <- range rng ]

ref :: a -> IORef a
ref x = unsafePerformIO (newIORef x)

data UnionSet i = UnionSet { leaders :: IORef (DiffArray i i) }

create :: Ix i => (i, i) -> UnionSet i
create is = UnionSet (ref (arrayFrom is id))

extractComponents :: Ix i => DiffArray i i -> [(i, i)]
extractComponents = Set.toList . Set.fromList . ArrayDiff.assocs

components :: Ix i => UnionSet i -> [(i, i)]
components (UnionSet leaders) = unsafePerformIO $ do
  l <- readIORef leaders
  return (extractComponents l)

numberOfComponents :: Ix i => UnionSet i -> Int
numberOfComponents (UnionSet leaders) = unsafePerformIO $ do
  l <- readIORef leaders
  return (length $ extractComponents l)

indexes :: Ix i => UnionSet i -> [(i, i)]
indexes (UnionSet leaders) = unsafePerformIO $ do
  l <- readIORef leaders
  return (ArrayDiff.assocs l)

inSameComponent :: Ix i => UnionSet i -> i -> i -> Bool
inSameComponent (UnionSet leaders) x y = unsafePerformIO $ do
  l <- readIORef leaders
  return (l ! x == l ! y)

union x y (UnionSet leaders) = unsafePerformIO $ do
  ls <- readIORef leaders
  let leader1 = ls ! x
      leader2 = ls ! y
      newLeaders = map (\(index, value) -> (index, leader2)) .
                   filter (\(index, value) -> value == leader1) $ assocs ls
  modifyIORef leaders (\l -> l // newLeaders)
  return $ UnionSet leaders
```

We can recreate the above example like so:

```
> indexes $ Leaders.union 0 8 $ Leaders.union 8 9 $ Leaders.union 0 1 $ create (0,9)
[(0,9),(1,9),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,9),(9,9)]
```

Unfortunately it takes 44 seconds to execute, which is mainly down to the call to assocs inside the union function. assocs gives us a list of all the indexes and their corresponding values, which we use to work out which indexes need to be updated with a new leader.

The rest of the code is mostly boilerplate around getting the array out of the IORef. The IORef allows us to have a mutable array in this instance. There is a page on the c2 wiki which explains how to use IORef in more detail.

Although using the DiffArray allows us to provide a pure external interface around its use, it is known to be 10-100x slower than an MArray.

I’ve been playing around with a version of the union-find data structure which makes use of an MArray instead and has decreased the execution time to 34 seconds.

Unless anyone has any ideas for how I can get this to perform more quickly I’m thinking that perhaps an array isn’t a good choice of underlying data structure at least when using Haskell.
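One standard trick that avoids scanning the whole array on every union is ‘union by size’: keep a list of members per leader and relabel only the smaller component. A rough pure sketch of the idea using `Data.Map` (my own illustration, not benchmarked against the versions above):

```haskell
import qualified Data.Map as Map

-- Hypothetical sketch: a pure union-find that keeps each leader's member
-- list so that a union only relabels the smaller component.
data UF = UF { leaders :: Map.Map Int Int     -- element -> leader
             , members :: Map.Map Int [Int] } -- leader  -> elements

create :: (Int, Int) -> UF
create (lo, hi) = UF (Map.fromList [(i, i) | i <- [lo .. hi]])
                     (Map.fromList [(i, [i]) | i <- [lo .. hi]])

union :: Int -> Int -> UF -> UF
union x y uf@(UF ls ms)
  | lx == ly  = uf
  | otherwise = UF ls' ms'
  where
    lx = ls Map.! x
    ly = ls Map.! y
    xs = ms Map.! lx
    ys = ms Map.! ly
    -- relabel only the smaller component
    (small, big, smallEls, bigEls) =
      if length xs <= length ys then (lx, ly, xs, ys) else (ly, lx, ys, xs)
    ls' = foldr (\e m -> Map.insert e big m) ls smallEls
    ms' = Map.insert big (smallEls ++ bigEls) (Map.delete small ms)

inSameComponent :: UF -> Int -> Int -> Bool
inSameComponent (UF ls _) a b = ls Map.! a == ls Map.! b
```

Because each element only changes leader when it sits in the smaller of two components, it can be relabelled at most a logarithmic number of times overall, which is where the speed-up comes from.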

Written by Mark Needham

December 31st, 2012 at 8:44 pm


## Bitwise operations in Ruby and Haskell

Part of one of the most recent problems in the Algorithms 2 course required us to find the ‘neighbours’ of binary values.

In this case a neighbour is described as being any other binary value which has an equivalent value or differs in 1 or 2 bits.

e.g. the neighbours of ‘10000’ would be ‘00000’, ‘00001’, ‘00010’, ‘00100’, ‘01000’, ‘10001’, ‘10010’, ‘10011’, ‘10100’, ‘10101’, ‘10110’, ‘11000’, ‘11001’, ‘11010’ and ‘11100’

I initially treated ‘10000’ as an array of 1s and 0s and wrote a function to recursively come up with the above combinations before it was pointed out to me that it’d be much easier to use bitwise logic instead.

I don’t remember ever having to write any code using bitwise operations since university so I thought I’d document what I learnt.

The first thing to do is convert those binary values into a decimal equivalent which is pretty easy in Ruby:

```
> "10000".to_i(2)
=> 16
```

In Haskell I stole a function from PLEAC to do the job:

```
> import Data.Char
> (foldr (\c s -> s * 2 + c) 0 . reverse . map digitToInt) "10000"
16
```
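As an aside, a left fold avoids the `reverse` because it consumes the most significant digit first – something like this (my own variant, not from PLEAC):

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')

-- Consume the string left to right, doubling the accumulator as we go.
binToInt :: String -> Int
binToInt = foldl' (\acc d -> acc * 2 + d) 0 . map digitToInt
```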

I initially worked out the neighbours by hand but I needed to work out how to convert that into code, so I ran through an example.

We want to get from ‘10000’ (16) to ‘00000’ (0) which we can do by using an XOR operation:

Binary XOR Operator copies the bit if it is set in one operand but not both.

`'10000' XOR '10000'`

In Ruby it would read like this:

```
> 16 ^ 16
=> 0
```

If we do the same XOR operation but changing the other bits instead we end up with all the neighbours of distance 1:

```
> [0,1,2,4,16].map { |x| 16 ^ x }
=> [16, 17, 18, 20, 0]
```

We can generalise the function that creates that array of values like so:

```
> bits = 5
> offsets = (0..(bits - 1)).map { |x| 2 ** x }
=> [1, 2, 4, 8, 16]
```

or if we want to use bit shifting:

```
> offsets = (0..(bits - 1)).map { |x| 1 << x }
=> [1, 2, 4, 8, 16]
```

With that approach the left operand’s value is moved left by the number of bits specified by the right operand, i.e. we move ‘1’ left by 0 bits (from ‘1’ to ‘1’), then by 1 bit (from ‘1’ to ’10’), then by 2 bits (from ‘1’ to ‘100’) and so on.

We still need to get all the neighbours which differ by 2 bits which we can do by getting all the ways that we can combine those offsets together. The combination function and Bitwise or do the trick here:

```
> offsets.combination(2).to_a
=> [[1, 2], [1, 4], [1, 8], [1, 16], [2, 4], [2, 8], [2, 16], [4, 8], [4, 16], [8, 16]]
> offsets.combination(2).to_a.map { |a,b| a|b }
=> [3, 5, 9, 17, 6, 10, 18, 12, 20, 24]
```

Now if we put it all together:

```
> initial_value = "10000"
=> "10000"
> bits = initial_value.length
=> 5
> value = "10000".to_i(2)
=> 16
> single_bit_offsets = (0..(bits - 1)).map { |x| 1 << x }
=> [1, 2, 4, 8, 16]
> all_the_offsets = single_bit_offsets + single_bit_offsets.combination(2).to_a.map { |a,b| a|b }
=> [1, 2, 4, 8, 16, 3, 5, 9, 17, 6, 10, 18, 12, 20, 24]
> all_the_offsets.map { |offset| value ^ offset }.sort
=> [0, 1, 2, 4, 8, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28]
```

The final Haskell version looks like this:

```
> import Data.Char
> import Data.Bits
> import Data.List

> let initialValue = "10000"
> let bits = length initialValue
> let value = (foldr (\c s -> s * 2 + c) 0 . reverse . map digitToInt) initialValue
> let singleBitOffsets = map (shiftL 1) [0..(bits - 1)] :: [Int]
> let allTheOffsets = singleBitOffsets ++ map (\(x:y:_) -> (x .|. y)) (combinationsOf 2 singleBitOffsets) :: [Int]
> Data.List.sort $ map (xor value) allTheOffsets
[0,1,2,4,8,17,18,19,20,21,22,24,25,26,28]
```

I took the combinationsOf function from David Amos’ Combinatorics Generation library and it reads like so:

```
-- subsets of size k
combinationsOf 0 _ = [[]]
combinationsOf _ [] = []
combinationsOf k (x:xs) = map (x:) (combinationsOf (k-1) xs) ++ combinationsOf k xs
```

A real life version of this problem occurs in the world of genome analysis where scientists need to work out whether phylogenetic profiles are functionally linked.

Written by Mark Needham

December 31st, 2012 at 1:14 pm


## Haskell: Using qualified imports to avoid polluting the namespace


In most of the Haskell code I’ve read any functions from other modules have been imported directly into the namespace and I reached the stage where I had this list of imports in a file:

```
import System.IO
import Data.List.Split
import Data.Char
import Data.Bits
import Control.Monad
import Data.Map
import Data.Set
import Data.List
import Data.Maybe
```

This becomes a problem when you want to use a function which is defined in multiple modules such as filter:

```
clustering.hs:53:43:
    Ambiguous occurrence `filter'
    It could refer to either `Data.List.filter', imported from `Data.List' at clustering.hs:11:1-16
                             (and originally defined in `GHC.List')
                          or `Data.Set.filter', imported from `Data.Set' at clustering.hs:10:1-16
                          or `Data.Map.filter', imported from `Data.Map' at clustering.hs:9:1-16
```

One way to solve this is to change occurrences of filter to Data.List.filter but it’s a bit long winded, and in this case there is a filter function in the Prelude module which is available without us importing anything.

Unfortunately we’d have to use the prefix Prelude to refer to it since all the other versions of the function have made it ambiguous.

We still want to use some of the functions in those other modules though so we can do a qualified import which will make the functions available to us but only if we refer to them by their full name. It won’t import them into our namespace.

For example to initialise a map we’d do this:

```
> import qualified Data.Map
> Data.Map.assocs $ Data.Map.fromList [(1,2), (3,7)]
[(1,2),(3,7)]
```

That’s a bit long winded though so we can rename imports with a shorter name to make our life a bit easier:

```
import System.IO
import Data.List.Split
import Data.Char
import Data.Bits
import qualified Control.Monad as Monad
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.List as List
import qualified Data.Maybe as Maybe
```

We can then use functions in those packages like so:

```
> Maybe.maybeToList (Just 3)
[3]
```

For this particular function I haven’t come across any with the same name so we might want to import that one into our namespace but require the use of the Maybe prefix for any other functions:

```
import qualified Data.Maybe as Maybe
import Data.Maybe (maybeToList)
```

```
> maybeToList (Just 3)
[3]
```

There’s a wiki entry I came across which explains Haskell modules in a bit more detail and is worth a read.

Written by Mark Needham

December 30th, 2012 at 11:16 pm


## Haskell: Pattern matching a list


As I mentioned in a post yesterday I’ve been converting a clustering algorithm into Haskell and I wanted to get the value from doing a bitwise or on two values in a list.

I forgot it was possible to pattern match on lists until I came across a post I wrote about 8 months ago where I’d done exactly that, so my initial code looked like this:

```
> import Data.Bits
> map (\pair -> (pair !! 0) .|. (pair !! 1)) [[1,2], [3,4]]
[3,7]
```

We can pattern match against the list like so:

```
> map (\(x:y:_) -> x .|. y) [[1,2], [3,4]]
[3,7]
```

Here x takes the first value, y takes the second value and the rest of the list is assigned to _ which we don’t use in this case.

There are loads of examples of pattern matching against different data structures in Learn You A Haskell and hopefully next time I’ll remember and won’t write hideous code like the first example!
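As an aside, if each inner list is known to contain exactly two elements, an exact-length pattern also works (my own example); it fails at runtime for any other length, which can be preferable to silently ignoring extra elements:

```haskell
import Data.Bits ((.|.))

-- Prefix pattern: takes the first two elements, ignores the rest.
orPairs :: [[Int]] -> [Int]
orPairs = map (\(x:y:_) -> x .|. y)

-- Exact-length pattern: a list of any other length is a runtime
-- pattern-match failure rather than being silently truncated.
orPairsExact :: [[Int]] -> [Int]
orPairsExact = map (\[x, y] -> x .|. y)
```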

Written by Mark Needham

December 30th, 2012 at 10:39 pm


## Haskell: A cleaner way of initialising a map

I recently wrote a blog post showing a way of initialising a Haskell map and towards the end of the post I realised how convoluted my approach was and wondered if there was an easier way and indeed there is!

To recap, this is the code I ended up with to populate a map with binary based values as the keys and node ids as the values:

```
import Data.Map

toMap :: [Int] -> Map Int [Int]
toMap nodes = fromList $ map asMapEntry $ (groupIgnoringIndex . sortIgnoringIndex) nodesWithIndexes
  where nodesWithIndexes = zip [0..] nodes

groupIgnoringIndex = groupBy (\(_,x) (_,y) -> x == y)
sortIgnoringIndex = sortBy (\(_,x) (_,y) -> x `compare` y)

asMapEntry :: [(Int, Int)] -> (Int, [Int])
asMapEntry nodesWithIndexes =
  ((snd . head) nodesWithIndexes, Prelude.foldl (\acc (x,_) -> acc ++ [x]) [] nodesWithIndexes)
```

```
> assocs $ toMap [1,2,5,7,2,4]
[(1,[0]),(2,[4,1]),(4,[5]),(5,[2]),(7,[3])]
```

To sum up what we’re trying to do: when a key doesn’t have an entry we want to create one with a list containing the appropriate value and if the key already has a value then we want to append that value to the existing list.

As it turns out the insertWith function does exactly what we want:

```
> let emptyMap = empty :: Map Int [Int]
> assocs $ foldl (\acc (id,val) -> insertWith (++) val [id] acc) emptyMap nodesWithIndexes
[(1,[0]),(2,[4,1]),(4,[5]),(5,[2]),(7,[3])]
```

Based on this experience it would appear that the same type of thing applies when coding in Haskell as when coding in Clojure.

If you’ve written a fair amount of Clojure code […] then chances are you’ve probably reinvented a few functions that are already available in the standard library.

Written by Mark Needham

December 29th, 2012 at 8:14 pm


## Haskell: Initialising a map

I’ve been converting a variation of Kruskal’s algorithm from Ruby into Haskell and one thing I needed to do was create a map of binary based values to node ids.

In Ruby I wrote the following code to do this:

```
nodes = [1,2,5,7,2,4]
@magical_hash = {}
nodes.each_with_index do |node, index|
  @magical_hash[node] ||= []
  @magical_hash[node] << index
end

=> {1=>[0], 2=>[1, 4], 5=>[2], 7=>[3], 4=>[5]}
```

From looking at the documentation it seemed like the easiest way to do this in Haskell would be to convert the nodes into an appropriate list and then call the fromList function to build the map.

I needed to get the data to look like this:

```
> let nodesMap = Data.Map.fromList [(1, [0]), (2, [1,4]), (5, [2]), (7, [3]), (4, [5])]
> Data.Map.assocs nodesMap
[(1,[0]),(2,[1,4]),(4,[5]),(5,[2]),(7,[3])]
```

The first step was to create a list of tuples with the nodes ids and values:

```
> zip [0..] [1,2,5,7,2,4]
[(0,1),(1,2),(2,5),(3,7),(4,2),(5,4)]
```

I wanted to group the collection by value so that in this instance the 2 nodes with a value of ‘2’ could map from the same key in the map.

The following code helped do that:

```
groupIgnoringIndex = groupBy (\(_,x) (_,y) -> x == y)
sortIgnoringIndex = sortBy (\(_,x) (_,y) -> x `compare` y)
```

```
> (groupIgnoringIndex . sortIgnoringIndex) (zip [0..] [1,2,5,7,2,4])
[[(0,1)],[(1,2),(4,2)],[(5,4)],[(2,5)],[(3,7)]]
```

I wrote the following function to convert that collection into one that could be passed to fromList:

```
asMapEntry :: [(Int, Int)] -> (Int, [Int])
asMapEntry nodesWithIndexes =
  ((snd . head) nodesWithIndexes, foldl (\acc (x,_) -> acc ++ [x]) [] nodesWithIndexes)
```

```
> asMapEntry [(1,2),(4,2)]
(2,[1,4])
```

We can then put all those functions together into the following top level function:

```
toMap :: [Int] -> Map Int [Int]
toMap nodes = Data.Map.fromList $ map asMapEntry $ (groupIgnoringIndex . sortIgnoringIndex) nodesWithIndexes
  where nodesWithIndexes = zip [0..] nodes
```

```
> Data.Map.assocs $ toMap [1,2,5,7,2,4]
[(1,[0]),(2,[4,1]),(4,[5]),(5,[2]),(7,[3])]
```

I haven’t properly investigated all the functions available in the Data.Map package but my gut feeling is there must be a better way to do this – the sort/group combination is ugly in the extreme!
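For what it’s worth, `Data.Map` also has a `fromListWith` function, which merges values for duplicate keys as the map is built – it seems to collapse the whole sort/group pipeline into a single call, something like this:

```haskell
import qualified Data.Map as Map

-- fromListWith combines values whenever the same key appears twice;
-- here (++) prepends the later index list onto the earlier one.
toMap :: [Int] -> Map.Map Int [Int]
toMap nodes = Map.fromListWith (++) [(node, [index]) | (index, node) <- zip [0 ..] nodes]
```

Note that the indexes for duplicate keys come out in reverse order of appearance (e.g. `(2,[4,1])`), just as in the version above.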

Written by Mark Needham

December 29th, 2012 at 7:27 pm