1 Oct 2013 - Greedy Algorithms: Spanning Trees, Disjoint Sets, Path Compression in Union-Find

Greedy algorithms: "myopic" - only looking nearby
- repeatedly pick the next piece of the solution that looks best
- not in GENERAL an optimal strategy
  eg: not good for shortest paths
- BUT: for many problems it is optimal.

Minimum spanning trees
- example: finding the least expensive way to network a set of computers
  [picture]
What is a possible solution? [cost should be 16]

Property of a solution:
- Cannot contain cycles -- because removing an edge on a cycle cannot disconnect the graph, so dropping it only lowers the cost.

Kruskal's algorithm:
- Repeatedly add the next lightest edge that does not create a cycle.
  // NOTE: the partial solution is not necessarily connected until the end!
  // Run example

WHY does Kruskal's algorithm find an optimal solution?

Def: CUT: a partition of the vertices into two groups, S and V-S.

Cut property: Suppose the edges chosen so far are part of some MST T, and none of them crosses the cut (S, V-S). Then a lowest-cost edge e connecting S to V-S is part of SOME MST T'.
  case 1: e is in T. Take T' = T. Done!
  case 2: e is not in T. Adding e to T creates a cycle, and that cycle must cross the cut at some other edge e'. Let T' = T + e - e'. Since cost(e) <= cost(e'), cost(T') <= cost(T), so T' is an MST as well, and it contains e.

The Cut Property is the inductive step in a proof of correctness of Kruskal's algorithm: by induction on the number of edges chosen, the partial solution is always part of some MST.
  Base case: no edges chosen yet (every vertex in its own component); the empty edge set is trivially part of some MST.
  Inductive case: when Kruskal adds edge (u,v), let S be the component containing u. No chosen edge crosses the cut (S, V-S), and (u,v) is a lightest edge crossing it, so the cut property applies.

How do we implement Kruskal's algorithm efficiently? Let's rephrase the algorithm as follows.

Kruskal, Take 2:
  Start with every vertex of G in its own component (no edges).
  Repeat:
    Select the lightest edge that connects two different components.
    Merge the two components.

Thus the state of the algorithm at any point is a collection of DISJOINT SETS of vertices. The operations we need to perform are UNION and FIND.

procedure Kruskal(G, w)
  for all u in V:
    makeset(u)
  S = {}
  sort the edges E by weight
  for all edges (u,v) in E, in increasing order of weight:
    if find(u) =/= find(v) then
      add edge (u,v) to S
      union(u,v)
  return S
end

Have you seen a data structure for disjoint sets before?

UP TREES: trees in which children point to their parents. Examples:

procedure makeset(x)
  parent(x) = x
  rank(x) = 0    // will be the height of the tree rooted at x
end

function find(x)
  while x != parent(x):
    x = parent(x)
  return x
end

union idea: make the root of the shorter tree point to the root of the taller tree.

procedure union(x, y)
  rx = find(x)
  ry = find(y)
  if rx = ry then return
  if rank(rx) > rank(ry) then
    parent(ry) = rx
  else if rank(ry) > rank(rx) then
    parent(rx) = ry
  else    // same rank, so the merged tree's rank increases
    parent(ry) = rx
    rank(rx) = rank(rx) + 1
end

Example:

Properties:
- rank(x) < rank(parent(x)) for every non-root x
- any root of rank k has at LEAST 2^k nodes in its tree
- if there are n elements, there can be at most n/2^k nodes of rank k
These together imply: the maximum rank is log n.
THEREFORE
  find(x) = O(log n)
  union(x,y) = O(log n)

Overall running time of Kruskal:
  makeset(x) is O(1), done |V| times, so O(|V|)
  sorting the edges: O(|E| log|E|) = O(|E| log|V|)
    why? in the worst case |E| = |V|^2, so log|E| = log(|V|^2) = 2 log|V|
  two finds and a union, repeated O(|E|) times: O(|E| log|V|)
So overall: O(|E| log|V|)

Improving FIND in union/find: Path Compression

function find(x)
  if x =/= parent(x) then
    parent(x) = find(parent(x))
  return parent(x)
end

Claim: Although the worst case for ONE find is unchanged, the TOTAL time for any sequence of m find/union operations on n elements is O(m log*(n)), where
  log*(n) = the iterated logarithm = the number of times you need to take the log of the argument to get a value <= 1
  (log*(n) <= 5 in practice; a tighter analysis gives the inverse Ackermann function, which grows even more slowly)
So a find takes essentially O(1) amortized time.
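To make the pseudocode above concrete, here is a minimal Python sketch of union-find with union by rank and path compression, plus Kruskal's algorithm on top of it. The function names mirror the pseudocode; the edge representation (a list of (weight, u, v) tuples) and the dictionary-based parent/rank storage are assumptions made for illustration, not part of the notes.

# Sketch: union-find with union by rank and path compression, and Kruskal's
# algorithm built on it. parent/rank live in dictionaries; edges are
# (weight, u, v) tuples -- these representation choices are assumptions.

def makeset(parent, rank, x):
    parent[x] = x
    rank[x] = 0

def find(parent, x):
    # Path compression: re-point every node on the path directly at the root.
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

def union(parent, rank, x, y):
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return
    # Union by rank: hang the shorter tree under the taller one.
    if rank[rx] > rank[ry]:
        parent[ry] = rx
    elif rank[ry] > rank[rx]:
        parent[rx] = ry
    else:
        parent[ry] = rx
        rank[rx] += 1

def kruskal(vertices, edges):
    """Return a list of MST edges; edges is a list of (weight, u, v)."""
    parent, rank = {}, {}
    for u in vertices:
        makeset(parent, rank, u)
    mst = []
    for w, u, v in sorted(edges):          # process edges by increasing weight
        if find(parent, u) != find(parent, v):
            mst.append((w, u, v))
            union(parent, rank, u, v)
    return mst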
If the edges come pre-sorted, what is the running time? Essentially O(|E|): the sorting step disappears, and with path compression the remaining union/find work is O(|E| log*|V|), which is O(|E|) for all practical purposes.
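As a quick check of the sketch above, here is a tiny made-up graph (not the one from the lecture). If the edge list were already sorted by weight, the sorted(edges) call inside kruskal could simply be dropped, leaving only the near-linear union-find work.

# Hypothetical 4-vertex example, just to exercise the sketch above.
vertices = ["a", "b", "c", "d"]
edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d")]
print(kruskal(vertices, edges))   # [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'c', 'd')], total cost 7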