Knapsack Problem in Haskell
I recently described two versions of the Knapsack problem written in Ruby and Python, and one thing common to both was that I used a global cache to store the results of previous calculations.
In my experience that style isn't considered very idiomatic in Haskell and, although I hadn't actually tried it, I suspected it would be more tricky to achieve.
I thought it’d be interesting to try and write the algorithm in Haskell with that constraint in mind and my first version looked like this:
```haskell
import qualified Data.Map as Map
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Maybe (fromJust)
import System.IO.Unsafe (unsafePerformIO)

ref :: a -> IORef a
ref x = unsafePerformIO (newIORef x)

knapsackCached1 :: [[Int]] -> Int -> Int -> IORef (Map.Map (Int, Int) Int) -> Int
knapsackCached1 rows knapsackWeight index cacheContainer = unsafePerformIO $ do
  cache <- readIORef cacheContainer
  if index == 0 || knapsackWeight == 0
    then return 0
    else
      let (value:weight:_) = rows !! index
          best = knapsackCached1 rows knapsackWeight prevIndex cacheContainer
      in if weight > knapsackWeight && lookupPreviousIn cache == Nothing
           then do
             let updatedCache = Map.insert (prevIndex, knapsackWeight) best cache
             writeIORef cacheContainer updatedCache
             return $ fromJust $ lookupPreviousIn updatedCache
           else if lookupPreviousIn cache == Nothing
                  then do
                    let newBest = maximum [best, value + knapsackCached1 rows (knapsackWeight - weight) prevIndex cacheContainer]
                        updatedCache = Map.insert (prevIndex, knapsackWeight) newBest cache
                    writeIORef cacheContainer updatedCache
                    return $ fromJust $ lookupPreviousIn updatedCache
                  else return $ fromJust $ lookupPreviousIn cache
  where lookupPreviousIn cache = Map.lookup (prevIndex, knapsackWeight) cache
        prevIndex = index - 1
```
We then call it like this:
```haskell
let (knapsackWeight, numberOfItems, rows) = process contents
    cache = ref (Map.empty :: Map.Map (Int, Int) Int)
knapsackCached1 rows knapsackWeight (numberOfItems - 1) cache
```
As you can see, we’re passing around the cache as a parameter where the cache is a Map wrapped inside an IORef – a data type which allows us to pass around a mutable variable in the IO monad.
We write our new value into the cache in the two writeIORef calls so that our updates to the map will be picked up in the other recursive steps.
Apart from that the shape of the code is the same as the Ruby and Python versions except I’m now only using a map with a pair as the key instead of an array + map as in the other versions.
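The two cache shapes are interchangeable in any language: a single map keyed by an (index, weight) pair, or a list with one map per item keyed by weight. A minimal Python sketch of the two representations (the variable names here are mine, not from the original programs):

```python
# 1. A single dictionary keyed by an (index, weight) pair,
#    as in the Haskell version above.
pair_cache = {}
pair_cache[(3, 10)] = 42

# 2. A list holding one dictionary per item, keyed by weight,
#    as in the Ruby and Python versions.
number_of_items = 5
list_cache = [dict() for _ in range(number_of_items + 1)]
list_cache[3][10] = 42

# Both answer the same question: the best value achievable
# using the first 3 items with remaining capacity 10.
assert pair_cache[(3, 10)] == list_cache[3][10] == 42
```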
The annoying thing about this solution is that we have to pass the cache around as a parameter when it’s just a means of optimisation and not part of the actual problem.
An alternative solution could be the following where we abstract the writing/reading of the map into a memoize function which we wrap our function in:
```haskell
memoize :: ((Int, Int) -> Int) -> (Int, Int) -> Int
memoize fn mapKey = unsafePerformIO $ do
  let cache = ref (Map.empty :: Map.Map (Int, Int) Int)
  items <- readIORef cache
  if Map.lookup mapKey items == Nothing
    then do
      let result = fn mapKey
      writeIORef cache $ Map.insert mapKey result items
      return result
    else return (fromJust $ Map.lookup mapKey items)

knapsackCached :: [[Int]] -> Int -> Int -> Int
knapsackCached rows weight numberOfItems = inner (numberOfItems - 1, weight)
  where inner = memoize (\(i, w) ->
          if i < 0 || w == 0
            then 0
            else let best = inner (i - 1, w)
                     (vi:wi:_) = rows !! i
                 in if wi > w
                      then best
                      else maximum [best, vi + inner (i - 1, w - wi)])
```
We can call that function like this:
```haskell
let (knapsackWeight, numberOfItems, rows) = process contents
knapsackCached rows knapsackWeight numberOfItems
```
Here we define an inner function inside knapsackCached which is a partial application of the memoize function, and we then pass our cache key to that partially applied function.
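For comparison, the same trick of hiding the cache behind a higher-order memoize function is easy to sketch in Python. This is a hypothetical translation of the idea, not code from the original post; rows here are (value, weight) pairs:

```python
def memoize(fn):
    # Wrap fn so results are cached by argument; the cache lives in
    # the closure rather than being threaded through every call.
    cache = {}
    def wrapped(key):
        if key not in cache:
            cache[key] = fn(key)
        return cache[key]
    return wrapped

def knapsack(rows, weight, number_of_items):
    def solve(key):
        i, w = key
        if i < 0 or w == 0:
            return 0
        value, item_weight = rows[i]
        best = inner((i - 1, w))         # best value without item i
        if item_weight > w:
            return best
        return max(best, value + inner((i - 1, w - item_weight)))
    inner = memoize(solve)
    return inner((number_of_items - 1, weight))

# rows are (value, weight) pairs
print(knapsack([(60, 10), (100, 20), (120, 30)], 50, 3))  # -> 220
```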
One thing I noticed while writing this code is that there is some strangeness around the use of 'in' after let statements: inside an if/else block you need to use 'in', but in the context of a monad (a do block) you don't.
I was staring at a screen of compilation errors for about an hour until I realised this!
These are the timings for the two versions of the algorithm:
```
# First one
$ time ./k knapsack2.txt

real	0m14.993s
user	0m14.646s
sys	0m0.320s

# Second one
$ time ./k knapsack2.txt

real	0m12.594s
user	0m12.259s
sys	0m0.284s
```
I’m still trying to understand exactly how to profile and then optimise the program so any tips are always welcome.
Knapsack Problem: Python vs Ruby
The latest algorithm that we had to code in Algorithms 2 was the Knapsack problem which is as follows:
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
We did a slight variation on this in that you could only pick each item once, which is known as the 0-1 knapsack problem.
In our case we were given an input file from which you could derive the size of the knapsack, the total number of items and the individual weights & values of each one.
The pseudocode of the version of the algorithm which uses a 2D array as part of a dynamic programming solution is as follows:
```
Let A = 2D array of size n (number of items) * W (size of the knapsack)
Initialise A[0, x] = 0 for x = 0, 1, ..., W

for i = 1, 2, 3, ..., n
  for x = 0, 1, 2, ..., W
    A[i, x] = max { A[i-1, x], A[i-1, x - w_i] + v_i }
    (where v_i is the value of the i-th element and w_i is the weight of the i-th element)

return A[n, W]
```
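Translated directly into Python (with hypothetical names, and items represented as (value, weight) pairs), the pseudocode becomes:

```python
def knapsack_bottom_up(items, capacity):
    # A[i][x] holds the best value achievable using the first i items
    # with capacity x; row 0 is all zeros (no items available).
    n = len(items)
    A = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = items[i - 1]
        for x in range(capacity + 1):
            if weight > x:
                A[i][x] = A[i - 1][x]
            else:
                A[i][x] = max(A[i - 1][x], A[i - 1][x - weight] + value)
    return A[n][capacity]

print(knapsack_bottom_up([(60, 10), (100, 20), (120, 30)], 50))  # -> 220
```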
This version runs in O(nW) time and O(nW) space. This is the main body of my Ruby solution for that:
```ruby
number_of_items, knapsack_size = # calculated from file

cache = [].tap { |m| (number_of_items + 1).times { m << Array.new(knapsack_size + 1) } }
cache[0].each_with_index { |value, weight| cache[0][weight] = 0 }

(1..number_of_items).each do |i|
  value, weight = rows[i - 1]
  (0..knapsack_size).each do |x|
    if weight > x
      cache[i][x] = cache[i - 1][x]
    else
      cache[i][x] = [cache[i - 1][x], cache[i - 1][x - weight] + value].max
    end
  end
end

p cache[number_of_items][knapsack_size]
```
This approach works reasonably well when n and W are small but in the second part of the problem n was 500 and W was 2,000,000 which means the 2D array would contain 1 billion entries.
If we’re storing integers of 4 bytes each in that data structure then the amount of memory required is 3.72GB – slightly too much for my machine to handle!
Instead a better data structure would be one where we don’t have to allocate everything up front but can just fill it in as we go. In this case we can still use an array for the number of items but instead of storing another array in each slot we’ll use a dictionary/hash map instead.
If we take a bottom-up approach to this problem we end up solving a lot of subproblems which aren't relevant to the final solution, so I decided to try a top-down recursive approach instead. This is what I ended up with:
```ruby
@new_cache = [].tap { |m| (@number_of_items + 1).times { m << {} } }

def knapsack_cached(rows, knapsack_size, index)
  return 0 if knapsack_size == 0 || index == 0
  value, weight = rows[index]
  if weight > knapsack_size
    stored_value = @new_cache[index - 1][knapsack_size]
    return stored_value unless stored_value.nil?
    return @new_cache[index - 1][knapsack_size] = knapsack_cached(rows, knapsack_size, index - 1)
  else
    stored_value = @new_cache[index - 1][knapsack_size]
    return stored_value unless stored_value.nil?
    option_1 = knapsack_cached(rows, knapsack_size, index - 1)
    option_2 = value + knapsack_cached(rows, knapsack_size - weight, index - 1)
    return @new_cache[index - 1][knapsack_size] = [option_1, option_2].max
  end
end

p knapsack_cached(rows, @knapsack_size, @number_of_items - 1)
```
The code is pretty similar to the previous version except we're starting from the last item and working our way inwards. We end up storing 2,549,110 items in @new_cache, which we can work out by running this:
```ruby
p @new_cache.inject(0) { |acc, x| acc + x.length }
```
If we’d used the 2D array that would mean we’d only populated 0.25% of the data structure, truly wasteful!
I wanted to do a little profiling of how fast this algorithm ran in Ruby compared to JRuby. I also recently came across nailgun – which allows you to start a persistent JVM and run your code via that instead of starting a new one each time – so I thought I could play around with that as well!
```
# Ruby
$ time ruby knapsack/knapsack_rec.rb

real	0m18.889s
user	0m18.613s
sys	0m0.138s

# JRuby
$ time ruby knapsack/knapsack_rec.rb

real	0m6.380s
user	0m10.862s
sys	0m0.428s

# JRuby with nailgun
$ ruby --ng-server & # start up the nailgun server
$ time ruby --ng knapsack/knapsack_rec.rb

real	0m6.734s
user	0m0.023s
sys	0m0.021s

$ time ruby --ng knapsack/knapsack_rec.rb

real	0m5.213s
user	0m0.022s
sys	0m0.021s
```
The first run is a bit slow as the JVM gets launched but after that we get a marginal improvement. I thought the JVM startup time would be a bigger proportion of the running time but I guess not!
I thought I’d try it out in Python as well because on one of the previous problems Isaiah had been able to write much faster versions in Python so I wanted to see if that’d be the case here too.
This was the python solution:
```python
def knapsack_cached(rows, knapsack_size, index):
    global cache
    if index == 0 or knapsack_size == 0:
        return 0
    else:
        value, weight = rows[index]
        if weight > knapsack_size and knapsack_size not in cache[index - 1]:
            cache[index - 1][knapsack_size] = knapsack_cached(rows, knapsack_size, index - 1)
        else:
            if knapsack_size not in cache[index - 1]:
                option_1 = knapsack_cached(rows, knapsack_size, index - 1)
                option_2 = value + knapsack_cached(rows, knapsack_size - weight, index - 1)
                cache[index - 1][knapsack_size] = max(option_1, option_2)
        return cache[index - 1][knapsack_size]

knapsack_size, number_of_items, rows = # worked out from the file
result = knapsack_cached(rows, knapsack_size, number_of_items - 1)
print(result)
```
The code is pretty much exactly the same as the Ruby version but interestingly it seems to run more quickly:
```
$ time python knapsack/knapsack.py

real	0m4.568s
user	0m4.480s
sys	0m0.078s
```
I have no idea why that would be the case but it has been for all the algorithms we’ve written so far. If anyone has any ideas I’d be intrigued to hear them!
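As an aside, Python's standard library already provides the memoize abstraction from the Haskell post as functools.lru_cache. A sketch of the same recursion using it, with made-up example data and a raised recursion limit since the real input recurses one level per item:

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(10_000)  # the recursion goes one level per item

def knapsack(rows, knapsack_size, number_of_items):
    @lru_cache(maxsize=None)  # cache results keyed by (index, capacity)
    def solve(index, capacity):
        if index < 0 or capacity == 0:
            return 0
        value, weight = rows[index]
        best = solve(index - 1, capacity)  # best value without this item
        if weight > capacity:
            return best
        return max(best, value + solve(index - 1, capacity - weight))
    return solve(number_of_items - 1, knapsack_size)

rows = [(1, 1), (4, 3), (5, 4), (7, 5)]  # (value, weight) pairs
print(knapsack(rows, 7, 4))  # -> 9 (take the items weighing 3 and 4)
```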