18 Jan 2013

Bellman-Ford algorithm in Python

The latest problem of the Algorithms 2 class required us to write an algorithm to calculate the shortest path between two nodes on a graph and one algorithm which allows us to do this is Bellman-Ford.

Bellman-Ford computes the single source shortest path which means that if we have a 5 vertex graph we’d need to run it 5 times to find the shortest path for each vertex and then find the shortest paths of those shortest paths.

One nice thing about Bellman-Ford compared to Djikstra is that it’s able to handle negative edge weights.

The pseudocode of the algorithm is as follows:

Let A = 2-D array of size n * n where n is the number of vertices
Initialise A[0,s] = 0; A[0,v] = infinity for all v ≠ s where s is the node we’re finding the shortest path for
for i=1,2,3,…n-1
- for each v ∈ V
  - A[i,v] = min { A[i-1, v],
    min (w,v) ∈ E { A[k-1, w] + Cost_wv }
    &nbsp}
  - where (w,v) are the incoming edges of vertex v
check for negative cycles by running one more time and checking A[n] = A[n-1]
If they are equal then return A[n-1] otherwise report a negative cycle.

My first version looked like this:

import os
file = open(os.path.dirname(os.path.realpath(__file__)) + "/g_small.txt")
vertices, edges = map(lambda x: int(x), file.readline().replace("\n", "").split(" "))

adjacency_list = [[] for k in xrange(vertices)]
for line in file.readlines():
    tail, head, weight = line.split(" ")
    adjacency_list[int(head)-1].append({"from" : int(tail), "weight" : int(weight)})

s=0
cache = [[0 for k in xrange(vertices+1)] for j in xrange(vertices+1)]
cache[0][s] = 0

for v in range(0, vertices):
  if v != s:
    cache[0][v] = float("inf")

for i in range(1, vertices):
  for v in range(0, vertices):
    least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[i-1])
    cache[i][v] = min(cache[i-1][v], least_adjacent_cost)

# detecting negative cycles
for v in range(0, vertices):
  least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[vertices-1])
  cache[vertices][v] = min(cache[vertices-1][v], least_adjacent_cost)

if(not cache[vertices] == cache[vertices-1]):
    raise Exception("negative cycle detected")

shortest_path = min(cache[vertices-1])
print("Shortest Path: " + str(shortest_path))

where calculate_least_adjacent_cost is defined like so:

def calculate_least_adjacent_cost(adjacency_list, i, v, cache):
    adjacent_nodes = adjacency_list[v]

    least_adjacent_cost = float("inf")
    for node in adjacent_nodes:
      adjacent_cost = cache[node["from"]-1] + node["weight"]
      if adjacent_cost < least_adjacent_cost:
        least_adjacent_cost = adjacent_cost
    return least_adjacent_cost

As you can see we’re using an adjacency list as our graph representation, mapping from each node to its corresponding edges. We then initialise the cache as per the algorithm and then have two nested for loops which we use to work out the shortest path from s to each vertex.

The calculate_least_adjacent_cost function is used to work out which of the incoming edges to a vertex has the lowest cost, taking into account previous shortest path calculations that we’ve done up to the source vertices of those incoming edges.

We then have an extra call at the end to check for negative cycles. If there is no change in the values calculated from s to each vertex then we know we don’t have any negative cycles because otherwise one of them would have come into effect and given us a different result.

This algorithm works but it’s inefficient in its use of space - our cache has size n*n when we only ever care about 2 of the rows - the previous and current ones - so we can just use a list instead.

If we do this the code now looks like this:

s=0
cache = [[] for j in xrange(vertices+1)]
cache[s] = 0

for v in range(0, vertices):
  if v != s:
    cache[v] = float("inf")

for i in range(1, vertices):
  for v in range(0, vertices):
    previous_cache = cache
    least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache)
    cache[v] = min(previous_cache[v], least_adjacent_cost)

# detecting negative cycles
for v in range(0, vertices):
  previous_cache = copy.deepcopy(cache)
  least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache)
  cache[v] = min(previous_cache[v], least_adjacent_cost)

if(not cache == previous_cache):
    raise Exception("negative cycle detected")

By doing this we lose the history of the algorithm over the run which means in its current state we wouldn’t be able to work our what the shortest path was, we just know its cost.

For a 1000 node, 47978 edge graph it takes 27 seconds to go over the whole graph and detect a negative cycle if there is one.

The code for this algorithm is on github as always.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.