23 Dec 2012

Kruskal's Algorithm in Ruby

Last week I wrote a couple of posts showing different implementations of Prim’s algorithm - an algorithm using to find a minimum spanning tree in a graph - and a similar algorithm is Kruskal’s algorithm.

Kruskal’s algorithm also finds a minimum spanning tree but it goes about it in a slightly different way.

Prim’s algorithm takes an approach whereby we select nodes and then find connecting edges until we’ve covered all the nodes. Kruskal’s algorithm, on the other hand, drives from the edges of lowest costs and makes sure that we don’t create a cycle in our spanning tree by adding an edge to it.

The pseudocode for Kruskal’s algorithm reads like so:

sort edges E in order of increasing cost
let T = minimum spanning tree, m = number of edges
for i=1 to m:
- let e = selected edge (u,v)
- if there’s no way to go between u and v using edges in T:
  - add e to T
return T

The full codeis on github and the outline of the algorithm reads like so:

@minimum_spanning_tree = []

edges = file.drop(1).
        map { |x| x.gsub(/\n/, "").split(" ").map(&:to_i) }.
        map { |one, two, weight| { :from => one, :to => two, :weight => weight}}.
        sort_by { |x| x[:weight]}

edges.each do |edge|
  @minimum_spanning_tree << edge unless has_cycles edge
end

The main bit of code is a variation on depth first search to check if adding an edge creates a cycle which would mean we’re adding an edge which isn’t needed as part of a minimum spanning tree.

def has_cycles(edge)
  node_one, node_two = edge[:from], edge[:to]
  @minimum_spanning_tree.each { |x| x[:explored] = false }
  cycle_between(node_one, node_two, @minimum_spanning_tree.dup)
end

def cycle_between(one, two, edges)
  adjacent_edges = edges.select { |edge| edge[:to] == one || edge[:from] == one}
  return false if adjacent_edges.empty?
  adjacent_edges.reject {|edge| edge[:explored] }.each do |edge|
    edge[:explored] = true
    other_node = (edge[:from] == one) ? edge[:to] : edge[:from]
    return true if other_node == two || cycle_between(other_node, two, edges)
  end
  false
end

cycle_between is the depth first search but we’re using it to tell us whether there’s already a path between two nodes in the graph and we exit as soon as we determine that.

I added some code on line 3 to reset the explored status of edges each time that has_cycles is called but I’m not sure why this is necessary because I am making a copy of @minimum spanning_tree (on line 4) before mutating it.

Otherwise I think this is a fairly standard solution. If we wanted to go from node 1 to node 3 and we had the following edges: 1 -> 4 -> 5 -> 3 we’d make these method calls:

# excluding weights and explored status for brevity
cycle_between(1,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3})
cycle_between(4,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from =>; 5, :to => 3})
cycle_between(5,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3})

The function would exit on that last iteration because we’d match our target node '3'.

This algorithm wasn’t one of the assignments for the class but I used the same data set that was provided for Prim’s algorithm and was able to get the same output so I figure it works!

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.