Kruskal's Algorithm in Ruby
Last week I wrote a couple of posts showing different implementations of Prim’s algorithm  an algorithm using to find a minimum spanning tree in a graph  and a similar algorithm is Kruskal’s algorithm.
Kruskal’s algorithm also finds a minimum spanning tree but it goes about it in a slightly different way.
Prim’s algorithm takes an approach whereby we select nodes and then find connecting edges until we’ve covered all the nodes. Kruskal’s algorithm, on the other hand, drives from the edges of lowest costs and makes sure that we don’t create a cycle in our spanning tree by adding an edge to it.
The pseudocode for Kruskal’s algorithm reads like so:

sort edges E in order of increasing cost

let T = minimum spanning tree, m = number of edges

for i=1 to m:

let e = selected edge (u,v)

if there’s no way to go between u and v using edges in T:

add e to T



return T
The full codeis on github and the outline of the algorithm reads like so:
@minimum_spanning_tree = []
edges = file.drop(1).
map { x x.gsub(/\n/, "").split(" ").map(&:to_i) }.
map { one, two, weight { :from => one, :to => two, :weight => weight}}.
sort_by { x x[:weight]}
edges.each do edge
@minimum_spanning_tree << edge unless has_cycles edge
end
The main bit of code is a variation on depth first search to check if adding an edge creates a cycle which would mean we’re adding an edge which isn’t needed as part of a minimum spanning tree.
def has_cycles(edge)
node_one, node_two = edge[:from], edge[:to]
@minimum_spanning_tree.each { x x[:explored] = false }
cycle_between(node_one, node_two, @minimum_spanning_tree.dup)
end
def cycle_between(one, two, edges)
adjacent_edges = edges.select { edge edge[:to] == one  edge[:from] == one}
return false if adjacent_edges.empty?
adjacent_edges.reject {edge edge[:explored] }.each do edge
edge[:explored] = true
other_node = (edge[:from] == one) ? edge[:to] : edge[:from]
return true if other_node == two  cycle_between(other_node, two, edges)
end
false
end
cycle_between is the depth first search but we’re using it to tell us whether there’s already a path between two nodes in the graph and we exit as soon as we determine that.
I added some code on line 3 to reset the explored status of edges each time that has_cycles is called but I’m not sure why this is necessary because I am making a copy of @minimum spanning_tree (on line 4) before mutating it.
Otherwise I think this is a fairly standard solution. If we wanted to go from node 1 to node 3 and we had the following edges: 1 > 4 > 5 > 3 we’d make these method calls:
# excluding weights and explored status for brevity
cycle_between(1,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3})
cycle_between(4,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from =>; 5, :to => 3})
cycle_between(5,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3})
The function would exit on that last iteration because we’d match our target node '3'.
This algorithm wasn’t one of the assignments for the class but I used the same data set that was provided for Prim’s algorithm and was able to get the same output so I figure it works!
About the author
I'm currently working on realtime userfacing analytics with Apache Pinot at StarTree. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also coauthored the O'Reilly Graph Algorithms Book with Amy Hodler.