Archive for the ‘graphs’ tag
Bellman-Ford algorithm in Python
The latest problem of the Algorithms 2 class required us to write an algorithm to calculate the shortest path between two nodes on a graph and one algorithm which allows us to do this is Bellman-Ford.
Bellman-Ford computes the single source shortest path which means that if we have a 5 vertex graph we’d need to run it 5 times to find the shortest path for each vertex and then find the shortest paths of those shortest paths.
One nice thing about Bellman-Ford compared to Djikstra is that it’s able to handle negative edge weights.
The pseudocode of the algorithm is as follows:
- Let A = 2-D array of size n * n where n is the number of vertices
- Initialise A[0,s] = 0; A[0,v] = infinity for all v ≠ s where s is the node we’re finding the shortest path for
-
for i=1,2,3,…n-1
- for each v ∈ V
- A[i,v] = min { A[i-1, v],
min (w,v) ∈ E { A[k-1, w] + Costwv }
 } - where (w,v) are the incoming edges of vertex v
- A[i,v] = min { A[i-1, v],
- for each v ∈ V
- check for negative cycles by running one more time and checking A[n] = A[n-1]
- If they are equal then return A[n-1] otherwise report a negative cycle.
My first version looked like this:
import os file = open(os.path.dirname(os.path.realpath(__file__)) + "/g_small.txt") vertices, edges = map(lambda x: int(x), file.readline().replace("\n", "").split(" ")) adjacency_list = [[] for k in xrange(vertices)] for line in file.readlines(): tail, head, weight = line.split(" ") adjacency_list[int(head)-1].append({"from" : int(tail), "weight" : int(weight)}) s=0 cache = [[0 for k in xrange(vertices+1)] for j in xrange(vertices+1)] cache[0][s] = 0 for v in range(0, vertices): if v != s: cache[0][v] = float("inf") for i in range(1, vertices): for v in range(0, vertices): least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[i-1]) cache[i][v] = min(cache[i-1][v], least_adjacent_cost) # detecting negative cycles for v in range(0, vertices): least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[vertices-1]) cache[vertices][v] = min(cache[vertices-1][v], least_adjacent_cost) if(not cache[vertices] == cache[vertices-1]): raise Exception("negative cycle detected") shortest_path = min(cache[vertices-1]) print("Shortest Path: " + str(shortest_path))
where calculate_least_adjacent_cost is defined like so:
def calculate_least_adjacent_cost(adjacency_list, i, v, cache): adjacent_nodes = adjacency_list[v] least_adjacent_cost = float("inf") for node in adjacent_nodes: adjacent_cost = cache[node["from"]-1] + node["weight"] if adjacent_cost < least_adjacent_cost: least_adjacent_cost = adjacent_cost return least_adjacent_cost
As you can see we’re using an adjacency list as our graph representation, mapping from each node to its corresponding edges. We then initialise the cache as per the algorithm and then have two nested for loops which we use to work out the shortest path from s to each vertex.
The calculate_least_adjacent_cost function is used to work out which of the incoming edges to a vertex has the lowest cost, taking into account previous shortest path calculations that we’ve done up to the source vertices of those incoming edges.
We then have an extra call at the end to check for negative cycles. If there is no change in the values calculated from s to each vertex then we know we don’t have any negative cycles because otherwise one of them would have come into effect and given us a different result.
This algorithm works but it’s inefficient in its use of space – our cache has size n*n when we only ever care about 2 of the rows – the previous and current ones – so we can just use a list instead.
If we do this the code now looks like this:
s=0 cache = [[] for j in xrange(vertices+1)] cache[s] = 0 for v in range(0, vertices): if v != s: cache[v] = float("inf") for i in range(1, vertices): for v in range(0, vertices): previous_cache = cache least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache) cache[v] = min(previous_cache[v], least_adjacent_cost) # detecting negative cycles for v in range(0, vertices): previous_cache = copy.deepcopy(cache) least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache) cache[v] = min(previous_cache[v], least_adjacent_cost) if(not cache == previous_cache): raise Exception("negative cycle detected")
By doing this we lose the history of the algorithm over the run which means in its current state we wouldn’t be able to work our what the shortest path was, we just know its cost.
For a 1000 node, 47978 edge graph it takes 27 seconds to go over the whole graph and detect a negative cycle if there is one.
The code for this algorithm is on github as always.
gephi: Centring a graph around an individual node
I spent some time recently playing around with gephi – an open source platform for creating visualisations of graphs – to get a bit more insight into the ThoughtWorks graph which I’ve created in neo4j.
I followed Max De Marxi’s blog post to create a GEFX (Graph Exchange XML Format) file to use in gephi although I later learned that you can import directly from neo4j into gephi which I haven’t tried yet.
I loaded it into gephi and this is what it looks like:
There are massive numbers of connections between almost every node so it’s pretty hard to see what’s going on.
I wanted to be able to centre the graph around an individual person and see just the links related to them.
To do that we need to make use of an ‘ego filter‘ so the first step is to make the filters menu visible by clicking ‘Window > Filters’.
We can then choose the ego filter which sits under ‘Library > Topology’ and fill in the ID of the node that we want to centre the graph around as well as choose the depth of the traversal.
In this case any traversal above 1 will end up traversing the majority of the graph since the average distance between nodes in this graph is just above 2.5.
I run the ‘Force Atlas’ layout algorithm over it and this is what we end up with:
My graph shows some interesting clustering of nodes where the ones on the right are people I worked with in India, top left are people I worked with in Australia, bottom left people in the UK and the ones towards the lower middle are people I went to ThoughtWorks University with.
There are a load of other filters to choose from but the ego filter is pretty cool!