Mark Needham

Thoughts on Software Development

Haskell: Int and Integer

without comments

In my last post about the Rabin Karp algorithm I mentioned that I was having some problems when trying to write a hash function which closely matched its English description.

((rm-1 * ascii char) + (rm-2 * ascii char) + … (r0 * ascii char)) % q
where r = 256, q = 1920475943

This is my current version of the hash function:

hash = hash' globalR globalQ
hash' r q string m = foldl (\acc x -> (r * acc + ord x) `mod` q) 0 $ take m string

And my initial attempt to write the alternate version was this:

hash2 :: [Char] -> Int -> Int
hash2 = hash2' globalR  globalQ	
hash2' r q string m = (flip mod q . sum . map (\(pow, char) -> ord char *  (r ^ pow)) . zip [m-1, m-2..0]) string

Unfortunately this version breaks down as we try to create hashes for bigger lists of characters:

> hash "markusaere" 10
849698536
> hash2 "markusaere" 10
1239828806

We can see what’s going on if we execute the core bit of the hash function in the REPL:

> map (\(pow, char) -> ord char * 256 ^ pow) $ zip [9, 8..0] "markusaere"
[0,0,8214565720323784704,30117822508040192,128642860449792,493921239040,1627389952,6619136,29184,101]

In this case the hash value for the first character is:

> 256 ^ 9 * ord 'm'
0

Be default if we do calculations on integers the ‘Int‘ type is used:

“Int” is the more common 32 or 64 bit integer. Implementations vary, although it is guaranteed to be at least 30 bits.

The maximum value a 64 bit Int can hold is 263, a value that we’re exceeding with the first two calculations in our list.

Since our calculation has exceeded the maximum value of a 64 bit integer we need to coerce our calculation to use the ‘Integer’ type which has arbitrary precision up to the limit of the machine’s memory.

We can use the ‘toInteger’ function to do this:

> toInteger (256 ^ 9) * toInteger (ord 'm')
514737946632791328292864

We can change the core of the function to use ‘toInteger’ like so:

> map (\(pow, char) -> toInteger (ord char) * 256 ^ pow) $ zip [9, 8..0] "markusaere"
[514737946632791328292864,1789334175149826506752,8214565720323784704,30117822508040192,128642860449792,493921239040,1627389952,6619136,29184,101]

We can then change the hash function to read like this:

hash2 :: [Char] -> Int -> Int
hash2 value m = fromInteger $ hash2' (toInteger globalR) (toInteger globalQ) value m
 
hash2' :: Integer -> Integer -> [Char] -> Int -> Integer
hash2' r q string m = (sum $ map (\(pow, char) -> toInteger (ord char) *  toInteger (r ^ pow)) $ zip [m-1, m-2..0] string) `mod` q

When we call the function with the original parameters it now returns the correct answer:

> hash2 "markusaere" 10
849698536
Be Sociable, Share!

Written by Mark Needham

April 28th, 2012 at 5:39 pm

Posted in Haskell

Tagged with