# Mark Needham

Thoughts on Software Development

## Kruskal’s Algorithm using union find in Ruby

I recently wrote a blog post describing my implementation of Kruskal’s algorithm – a greedy algorithm using to find a minimum spanning tree (MST) of a graph – and while it does the job it’s not particularly quick.

It takes 20 seconds to calculate the MST for a 500 node, ~2000 edge graph.

One way that we can improve the performance of the algorithm is by storing the MST in a union find/disjoint set data structure.

To refresh, Kruskal’s algorithm does the following:

• sort edges E in order of increasing cost
• let T = minimum spanning tree, m = number of edges
• for i=1 to m:
• let e = selected edge (u,v)
• if there’s no way to go between u and v using edges in T:
• return T

In the first version of the algorithm we stored the MST in an array and then we did a depth first search to check that adding an edge to the array wasn’t going to create a cycle which would mean we no longer had an MST.

As I understand it the way that we use the union find data structure is that we start out with n connected components i.e. every node is in its own connected component.

We then merge these components together as we come across new edges which connect nodes in different components until we only have one component left at which point we should have our MST.

• sort edges E in order of increasing cost
• initialise union-find uf with n connected components
• let T = minimum spanning tree, m = number of edges
• for i=1 to m:
• let e = selected edge (u,v)
• if u and v not in same connected component in uf:
• merge u and v into same connected component in uf
• return T

I came across this bit of code written by Michael Luckeneder which implements the union find data structure in Ruby and adapted that slightly to fit the nomenclature used in the Algorithms 2 videos.

The union find data structure looks like this:

```class UnionFind def initialize(n) @leaders = 1.upto(n).inject([]) { |leaders, i| leaders[i] = i; leaders } end   def connected?(id1,id2) @leaders[id1] == @leaders[id2] end   def union(id1,id2) leader_1, leader_2 = @leaders[id1], @leaders[id2] @leaders.map! {|i| (i == leader_1) ? leader_2 : i } end end```

We have two methods:

• connected? which we use to check whether or not two nodes are in the same connected component.
• union which we use to put two nodes into the same connected component.

The way it’s implemented is that we have a collection of ‘leaders’ and initially each node is its own leader. As we find edges which belong in the MST we call union passing the two nodes as arguments. After this method executes every node which has the same leader as the first node will now instead have the same leader as the second node.

For example if we have a simple graph with edges 1 -> 2 -> 3 -> 4 -> 5 our initial union find data structure would look like this:

```> uf = UnionFind.new 5 => #<UnionFind:0x45e5a9b3 @leaders=[nil, 1, 2, 3, 4, 5]>```

At this stage each node is in its own connected component but if we process the edge 1 -> 2 we’d first check if those two nodes were already in the same connected component:

```> uf.connected?(1,2) => false```

Given that they aren’t we can call union:

```> uf.union(1,2) => [nil, 2, 2, 3, 4, 5]```

As we can see from the output nodes 1 & 2 now both have the same leader since they are in the same connected component while the other nodes still remain on their own.

We could then process the 2 -> 3 edge which would put nodes 1, 2 & 3 together:

```> uf.union(2,3) => [nil, 3, 3, 3, 4, 5]```

The outline for Kruskal’s algorithm which makes use of this data structure is like so:

```set = UnionFind.new number_of_nodes   @minimum_spanning_tree = []   edges = file.drop(1). map { |x| x.gsub(/\n/, "").split(" ").map(&:to_i) }. map { |one, two, weight| { :from => one, :to => two, :weight => weight}}. sort_by { |x| x[:weight]}   edges.each do |edge| if !set.connected?(edge[:from], edge[:to]) @minimum_spanning_tree << edge set.union(edge[:from], edge[:to]) end end   puts "MST: #{@minimum_spanning_tree}" puts "Cost: #{@minimum_spanning_tree.inject(0) { |acc, x| acc + x[:weight]}}"```

This version of the algorithm runs in 1.9 seconds, a significant improvement on the initial version. The full code is on github as usual!

Written by Mark Needham

December 23rd, 2012 at 9:43 pm

Posted in Algorithms

Tagged with , ,

## Kruskal’s Algorithm in Ruby

Last week I wrote a couple of posts showing different implementations of Prim’s algorithm – an algorithm using to find a minimum spanning tree in a graph – and a similar algorithm is Kruskal’s algorithm.

Kruskal’s algorithm also finds a minimum spanning tree but it goes about it in a slightly different way.

Prim’s algorithm takes an approach whereby we select nodes and then find connecting edges until we’ve covered all the nodes. Kruskal’s algorithm, on the other hand, drives from the edges of lowest costs and makes sure that we don’t create a cycle in our spanning tree by adding an edge to it.

The pseudocode for Kruskal’s algorithm reads like so:

• sort edges E in order of increasing cost
• let T = minimum spanning tree, m = number of edges
• for i=1 to m:
• let e = selected edge (u,v)
• if there’s no way to go between u and v using edges in T:
• return T

The full code is on github and the outline of the algorithm reads like so:

```@minimum_spanning_tree = []   edges = file.drop(1). map { |x| x.gsub(/\n/, "").split(" ").map(&amp;:to_i) }. map { |one, two, weight| { :from => one, :to => two, :weight => weight}}. sort_by { |x| x[:weight]}   edges.each do |edge| @minimum_spanning_tree << edge unless has_cycles edge end```

The main bit of code is a variation on depth first search to check if adding an edge creates a cycle which would mean we’re adding an edge which isn’t needed as part of a minimum spanning tree.

```def has_cycles(edge) node_one, node_two = edge[:from], edge[:to] @minimum_spanning_tree.each { |x| x[:explored] = false } cycle_between(node_one, node_two, @minimum_spanning_tree.dup) end   def cycle_between(one, two, edges) adjacent_edges = edges.select { |edge| edge[:to] == one || edge[:from] == one} return false if adjacent_edges.empty? adjacent_edges.reject {|edge| edge[:explored] }.each do |edge| edge[:explored] = true other_node = (edge[:from] == one) ? edge[:to] : edge[:from] return true if other_node == two || cycle_between(other_node, two, edges) end false end```

cycle_between is the depth first search but we’re using it to tell us whether there’s already a path between two nodes in the graph and we exit as soon as we determine that.

I added some code on line 3 to reset the explored status of edges each time that has_cycles is called but I’m not sure why this is necessary because I am making a copy of @minimum spanning_tree (on line 4) before mutating it.

Otherwise I think this is a fairly standard solution. If we wanted to go from node 1 to node 3 and we had the following edges: 1 -> 4 -> 5 -> 3 we’d make these method calls:

```# excluding weights and explored status for brevity cycle_between(1,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3}) cycle_between(4,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from =>; 5, :to => 3}) cycle_between(5,3, [{:from => 1, :to => 4}, {:from => 4, :to => 5}, {:from => 5, :to => 3})```

The function would exit on that last iteration because we’d match our target node ‘3’.

This algorithm wasn’t one of the assignments for the class but I used the same data set that was provided for Prim’s algorithm and was able to get the same output so I figure it works!

Written by Mark Needham

December 23rd, 2012 at 2:18 pm

Posted in Algorithms

Tagged with , ,