Mark Needham

Thoughts on Software Development

Haskell: A cleaner way of initialising a map

with 7 comments

I recently wrote a blog post showing a way of initialising a Haskell map and towards the end of the post I realised how convoluted my approach was and wondered if there was an easier way and indeed there is!

To recap, this is the code I ended up with to populate a map with binary based values as the keys and node ids as the values:

import Data.Map 
toMap :: [Int] -> Map Int [Int]
toMap nodes = fromList $ map asMapEntry $ (groupIgnoringIndex . sortIgnoringIndex) nodesWithIndexes
              where nodesWithIndexes = (zip [0..] nodes)
groupIgnoringIndex = groupBy (\(_,x) (_,y) -> x == y) 
sortIgnoringIndex = sortBy (\(_,x) (_,y) -> x `compare` y)
asMapEntry :: [(Int, Int)] -> (Int, [Int]) 
asMapEntry nodesWithIndexes = 
   ((snd . head) nodesWithIndexes, Prelude.foldl (\acc (x,_) -> acc ++ [x]) [] nodesWithIndexes)
> assocs $ toMap [1,2,5,7,2,4]

To sum up what we’re trying to do: when a key doesn’t have an entry we want to create one with a list containing the appropriate value and if the key already has a value then we want to append that value to the existing list.

As it turns out the insertWith function does exactly what we want:

> let emptyMap = empty :: Map Int [Int]
> assocs $ foldl (\acc (id,val) -> insertWith (++) val [id] acc) emptyMap nodesWithIndexes

Based on this experience it would appear that the same type of thing applies when coding in Haskell as when coding in Clojure.

To paraphrase Jay Fields:

If you’ve written a fair amount of Clojure code […] then chances are you’ve probably reinvented a few functions that are already available in the standard library.

Be Sociable, Share!

Written by Mark Needham

December 29th, 2012 at 8:14 pm

Posted in Haskell

Tagged with

  • Dave C Turner

    Hi Mark,

    I think fromListWith (++) does what you want too. ūüôā



  • @5d71bb97759c0c3482849e7106fec73c:disqus¬†
    Hey David, 

    We meet again! I did try using fromListWith to start with but I was struggling to work out how exactly to use it:

    >  fromListWith (++) [(0, 2), (1,4)]

    :28:24:¬† ¬† No instance for (Num [a0])¬† ¬† ¬† arising from the literal `2′¬† ¬† Possible fix: add an instance declaration for (Num [a0])¬† ¬† In the expression: 2¬† ¬† In the expression: (0, 2)¬† ¬†In the second argument of `fromListWith’, namely `[(0, 2), (1, 4)]’

    I think it wants the second argument in the tuple to be a list

    > assocs $ fromListWith (++) [(0, [2]), (1,[4]), (1, [5])]

    So I guess I’d need to run a map over ‘nodesWithIndexes’ to get it into the right shape to pass to the function.

    Unless there’s a another/better way?

  • Dave C Turner

    The following does what you want, I think:

    fromListWith (++) $ map (second return . swap) nodesWithIndexes

    ‘second’ is in Control.Arrow and is effectively defined as second f (x,y) = (x, f y)

    ‘swap’ is in Data.Tuple – it is simply (x,y) -> (y,x)

    ‘return’ here means just x -> [x]

    ‘second’ and ‘return’ have more general types than the above so reading the documentation for them may or may not help. ‘return’ applies in any monad but here the (++) forces it to be in the list monad, where return just turns things into a singleton list. ‘second’ applies to arrows which generalise functions, but I find it mostly helpful to ignore the generality.

  • Dave C Turner

    I didn’t mention why I think fromListWith is better. Well one thing is that I think it reveals your intention better than the explicit fold – there’s a lot less syntax to write. A second reason is that there are more efficient algorithms for fromListWith than iteratively calling insertWith: maps are implemented as balanced trees, and by calling fromListWith you give the implementation freedom to allow the tree to become very unbalanced and then do a single rebalancing step at the end.

    In fact, if you look at the definition of fromListWith, it basically does a foldl insertWith just as you did (although it’s slightly stricter). So for now at least there’s not much difference.



  • @5d71bb97759c0c3482849e7106fec73c:disqus¬†cool, thanks – I didn’t think about giving the implementation the freedom to optimise the building of the map!¬†

    And yes less syntax is better. It’s amusing that my initial implementation was 8 lines and now we’re down to about 30 characters.¬†

  • @5d71bb97759c0c3482849e7106fec73c:disqus¬†yeh I don’t really follow the documentation for second:

    second :: a b c -> a (d, b) (d, c)

    From what I can tell, in the example you gave we’re partially applying ‘second return’ and then passing through the reversed tuple as the second argument to the function? But that’s not what the signature implies to me so yep…confused!

  • Dave C Turner

    Read ‘a x y’ as ‘x -> y’ and it might be clearer. ‘a’ stands for ‘arrow’ and is written in prefix position, but function spaces are written as an infix ‘->’.