Randomized Algorithms 4: Karger's Min Cut Algorithm
Lecturer this time: Ioannis Caragiannis
These lecture notes are based on the treatment of Karger's min cut algorithm by Sanjeev Arora and Sanjoy Dasgupta.
Karger's min cut algorithm and its extension give a simple randomized method for finding the minimum cut of a graph.
Clustering via Graph Cuts
The minimum cut of an undirected graph $G = (V, E)$ is a partition of the nodes into two nonempty groups $S$ and $\bar{S} = V \setminus S$ such that the number of edges between $S$ and $\bar{S}$ is minimized.
Variants of the cut problem:
Minimum global cut: the smallest cut whose removal disconnects the graph into two parts.
Minimum s-t cut: the smallest cut whose removal separates the source $s$ from the sink $t$.
size of minimum s-t cut = value of maximum s-t flow (the max-flow min-cut theorem)
Algorithms: the Ford-Fulkerson algorithm (repeatedly compute an augmenting path in the residual network and push flow along it), and the Edmonds-Karp algorithm (always augment along a shortest path, giving running time $O(nm^2)$).
A very natural deterministic algorithm for computing the minimum global cut:
- fix a node $s$
- treat each of the other $n - 1$ nodes in turn as the sink $t$ and run the Edmonds-Karp algorithm $n - 1$ times, computing the maximum $s$-$t$ flow each time; the global min cut must disconnect $s$ from some other node $t$
- return the minimum s-t cut found over all these executions
The total running time is that of $n - 1$ max-flow computations, i.e. $O(n^2 m^2)$ with Edmonds-Karp. This is the algorithm we want to beat today.
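This baseline can be sketched as follows. It is a minimal sketch: `edmonds_karp` here is a plain BFS-based max-flow on a 0/1 adjacency matrix of unit capacities, and `global_min_cut_via_flow` is our own name for the outer loop; neither is an optimized implementation.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Maximum s-t flow via shortest augmenting paths (BFS)."""
    n = len(cap)
    residual = [row[:] for row in cap]  # work on a copy of the capacities
    flow = 0
    while True:
        # BFS for an augmenting path in the residual network
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:  # no augmenting path left: flow is maximum
            return flow
        # find the bottleneck capacity along the path, then push flow
        bottleneck, v = float('inf'), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def global_min_cut_via_flow(adj):
    """Fix s = 0 and return the minimum s-t max flow over all sinks t."""
    return min(edmonds_karp(adj, 0, t) for t in range(1, len(adj)))
```

On a graph made of two triangles joined by one bridge edge, this returns 1, the size of the bridge cut.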
Karger’s Algorithm
Randomized algorithm for finding the minimum cut:
- Repeat until just two nodes remain:
- Pick an edge of $G$ uniformly at random and collapse its two endpoints into a single node.
- For the two remaining nodes $u_1$ and $u_2$, set $S = \{\text{nodes that went into } u_1\}$ and $\bar{S} = \{\text{nodes that went into } u_2\}$.
The algorithm has running time $O(n^2)$:
- linear time to update node/edge information after a contraction
- $n - 2$ contractions in total.
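A single run of the contraction algorithm can be sketched with a union-find structure over an edge list. This is a sketch, not the canonical implementation; note that picking a uniformly random edge of the original list and discarding self-loops is equivalent to picking a uniformly random edge of the current multigraph.

```python
import random

def karger_contract(n, edges, rng=random):
    """One run of Karger's contraction algorithm on a multigraph given as an
    edge list over nodes 0..n-1; returns the size of the cut it finds."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    while components > 2:
        u, v = rng.choice(edges)       # uniformly random edge
        ru, rv = find(u), find(v)
        if ru != rv:                   # a self-loop is discarded, not contracted
            parent[ru] = rv
            components -= 1
    # the cut consists of edges whose endpoints lie in different supernodes
    return sum(1 for u, v in edges if find(u) != find(v))
```

Running it many times and keeping the smallest cut found implements the repetition analyzed in this section.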
Analysis
Some facts help to analyze the algorithm.
Fact 1
If $d(u)$ denotes the number of edges touching node $u$, then $\sum_{u \in V} d(u) = 2|E|$.
Obviously, each edge contributes twice, once for each of the two nodes it connects.
Fact 2
If there are $n$ nodes, then the average degree of a node is $2|E|/n$.
This is a very straightforward consequence of Fact 1.
Equivalently, when we pick a node uniformly at random, its expected degree is
$$\mathbb{E}[d(u)] = \sum_{u \in V} \frac{1}{n} d(u) = \frac{2|E|}{n}.$$
Fact 3
The size of the minimum cut is at most $2|E|/n$.
Consider the partition of $V$ into two pieces, one containing a single node $u$, and the other containing the remaining $n - 1$ nodes. The size of this cut is $d(u)$. Since this is a valid cut, the minimum cut cannot be bigger, i.e.,
$$\text{size of minimum cut} \le \min_{u} d(u) \le \frac{2|E|}{n}.$$
Fact 4
If an edge is picked uniformly at random, the probability that it lies across the minimum cut is at most $2/n$.
This is because there are $|E|$ edges to choose from, and by Fact 3 at most $2|E|/n$ of them are in the minimum cut.
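The four facts can be checked numerically on a small concrete graph, here a 5-cycle with one chord; the brute-force cut enumeration is only for illustration and is our own helper, not part of the algorithm.

```python
from itertools import combinations

def brute_force_min_cut(n, edges):
    """Smallest number of crossing edges over all bipartitions (exponential,
    fine only for tiny graphs)."""
    best = len(edges)
    for k in range(1, n // 2 + 1):
        for side in combinations(range(n), k):
            S = set(side)
            best = min(best, sum((u in S) != (v in S) for u, v in edges))
    return best

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]  # 5-cycle plus chord
n, m = 5, len(edges)
deg = [sum(v in e for e in edges) for v in range(n)]
assert sum(deg) == 2 * m                            # Fact 1: each edge counted twice
assert brute_force_min_cut(n, edges) <= 2 * m / n   # Fact 3: min cut <= average degree
assert brute_force_min_cut(n, edges) / m <= 2 / n   # Fact 4: fraction of min-cut edges
```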
By the facts above, we can easily analyze Karger's algorithm.
Each time an edge is collapsed, the number of nodes decreases by $1$. Therefore, the probability that a fixed minimum cut survives all contractions is at least
$$\left(1 - \frac{2}{n}\right)\left(1 - \frac{2}{n-1}\right)\cdots\left(1 - \frac{2}{3}\right) = \frac{2}{n(n-1)}.$$
Karger's algorithm succeeds with probability $p \ge 2/n^2$. Therefore, it should be run $\Omega(n^2)$ times, after which the smallest cut found should be chosen.
Another way to implement Karger’s algorithm:
- Assign each edge a random weight
- Run Kruskal’s algorithm to get the minimum spanning tree
- Remove the heaviest edge of the resulting tree; the two components obtained are the two clusters.
Why does Kruskal's algorithm work?
Sorting by i.i.d. random weights processes the edges in uniformly random order, and Kruskal's merging of components is exactly the random contraction process. The heaviest tree edge is the last one Kruskal adds, joining the final two supernodes, so removing it yields the same distribution over cuts as the contraction algorithm.
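The Kruskal-based variant can be sketched directly: shuffling the edge list plays the role of sorting by i.i.d. random weights, and stopping the union step when two components remain is the same as removing the heaviest tree edge afterwards. The function name is ours.

```python
import random

def karger_via_kruskal(n, edges, rng=random):
    """Karger's algorithm phrased via Kruskal: process edges in random order
    (equivalent to sorting by i.i.d. random weights) and stop one union short
    of a spanning tree; the two remaining components are the clusters.
    Returns the size of the resulting cut."""
    order = list(edges)
    rng.shuffle(order)                 # random order = sorted random weights
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in order:
        if components == 2:            # stop before the last merge
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return sum(1 for u, v in edges if find(u) != find(v))
```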
Run Multiple Times to Bound Error Probability
In order to boost the probability of success, we simply run the algorithm $N$ times and keep the smallest cut found.
The probability that at least one run succeeds is at least
$$1 - \left(1 - \frac{2}{n^2}\right)^N \ge 1 - e^{-2N/n^2}.$$
Setting $N = n^2 \ln n$, we get error probability at most $e^{-2 \ln n} = 1/n^2$.
It's easy to implement Karger's algorithm so that one run takes $O(n^2)$ time. Therefore, by running the algorithm $O(n^2 \log n)$ times, we get an $O(n^4 \log n)$-time randomized algorithm with error probability at most $1/n^2$. This is still better than the Edmonds-Karp-based algorithm on graphs that are not very sparse.
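The calculation behind the number of repetitions uses $1 - x \le e^{-x}$; the function name below is ours.

```python
import math

def runs_needed(n, delta):
    """Number of independent runs N such that (1 - 2/n^2)^N <= delta.
    Since 1 - x <= e^(-x), it suffices that N >= (n^2 / 2) * ln(1/delta)."""
    return math.ceil(n * n / 2 * math.log(1 / delta))

# e.g. n = 100 nodes, target error probability 10^-6
N = runs_needed(100, 1e-6)
assert (1 - 2 / 100 ** 2) ** N <= 1e-6
```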
The Number of Minimum Cuts
Theorem: Any graph has at most $\frac{n(n-1)}{2}$ distinct global cuts of minimum size.
Recall that we fixed a particular minimum cut and proved that the contraction algorithm returns it with probability at least $\frac{2}{n(n-1)}$.
Denote by $E_i$ the event that the $i$-th minimum cut is returned in an execution of the contraction algorithm, $i = 1, \dots, r$. The events $E_i$ are disjoint, so
$$1 \ge \Pr\Big[\bigcup_{i=1}^{r} E_i\Big] = \sum_{i=1}^{r} \Pr[E_i] \ge r \cdot \frac{2}{n(n-1)},$$
which gives $r \le \frac{n(n-1)}{2}$.
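The bound is tight: the $n$-cycle has exactly $\binom{n}{2}$ minimum cuts, one for each choice of two cycle edges to remove. This can be verified by exhaustive enumeration for small $n$ (illustration only; the helper below is ours).

```python
from itertools import combinations

def min_cuts_of_cycle(n):
    """Enumerate every bipartition of the n-cycle exactly once (by fixing
    node 0's side) and return (min cut size, number of cuts achieving it)."""
    edges = [(i, (i + 1) % n) for i in range(n)]
    sizes = []
    for k in range(1, n):
        for side in combinations(range(n), k):
            S = set(side)
            if 0 in S:  # count each unordered partition once
                sizes.append(sum((u in S) != (v in S) for u, v in edges))
    best = min(sizes)
    return best, sizes.count(best)

best, count = min_cuts_of_cycle(6)
assert best == 2 and count == 6 * 5 // 2  # exactly n(n-1)/2 minimum cuts
```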
Faster Version
Proposed by Karger and Stein.
Key idea
In the initial contractions, it is very unlikely that we merge an edge of the minimum cut. Towards the end of the algorithm, the probability of selecting an edge of the minimum cut grows.
For a fixed minimum cut $C$, the probability that this cut survives down to $\ell$ vertices is at least $\frac{\ell(\ell-1)}{n(n-1)}$. Thus, for $\ell = \lceil 1 + n/\sqrt{2} \rceil$ we have probability at least $1/2$ of succeeding.
Details are in the slides, page 91.
Improved algorithm: given a multigraph $G$ with $n$ vertices, if $G$ has at least $6$ vertices, repeat twice:
- run the original contraction algorithm down to $\lceil 1 + n/\sqrt{2} \rceil$ vertices
- recurse on the resulting graph.
Return the minimum of the cuts found in the two recursive calls.
The choice of the constant $6$ here instead of some other constant only affects the running time by a constant factor.
Since a fixed minimum cut survives down to $\lceil 1 + n/\sqrt{2} \rceil$ vertices with probability at least $1/2$, we have the following recurrence for the probability of success, denoted $P(n)$:
$$P(n) \ge 1 - \left(1 - \frac{1}{2} P\!\left(\left\lceil 1 + \frac{n}{\sqrt{2}} \right\rceil\right)\right)^2$$
- The inner term $\frac{1}{2} P(\cdot)$ is the probability that a sub-instance preserved the minimum cut AND the recursive call on it found this optimum.
- probability of success = 1 - probability of failure
- probability of failure = probability that both subroutine calls fail
- probability that both calls fail = square of (probability that one call fails), by independence
- probability that one call fails = 1 - (1/2) × (success probability of its recursive call)
This solves to $P(n) = \Omega(1/\log n)$. Why?
Set $p_k = P(n)$ for $n = (\sqrt{2})^k$, so that the recursion has depth $k = O(\log n)$.
The inequality becomes
$$p_{k+1} \ge p_k - \frac{p_k^2}{4}.$$
It suffices to show inductively that $p_k \ge \frac{1}{k+1}$ (the base case $p_0 = 1$ holds since the smallest instances are solved exactly).
Observe that the RHS of the inequality, $p_k - p_k^2/4$, is increasing in $p_k$ for $p_k \le 2$.
By the induction hypothesis $p_k \ge \frac{1}{k+1}$, the inequality implies that
$$p_{k+1} \ge \frac{1}{k+1} - \frac{1}{4(k+1)^2} \ge \frac{1}{k+2},$$
as desired, since $4(k+1) \ge k+2$.
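The inductive claim can be checked by simply iterating the recurrence, taking it with equality (the worst case, since the right-hand side is increasing in $p_k$):

```python
# Iterate p_{k+1} = p_k - p_k^2 / 4 from p_0 = 1 and verify p_k >= 1/(k+1),
# the inductive claim behind P(n) = Omega(1/log n).
p = 1.0
for k in range(1, 500):
    p = p - p * p / 4
    assert p >= 1 / (k + 1)
```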
Hence $P(n) = \Omega(1/\log n)$, and, similarly to the earlier argument for the original algorithm, with $O(\log^2 n)$ runs of the improved algorithm the probability of success becomes $1 - 1/\mathrm{poly}(n)$.
One run satisfies the running time recurrence $T(n) = 2\,T(n/\sqrt{2}) + O(n^2) = O(n^2 \log n)$. Therefore, in $O(n^2 \log^3 n)$ total time, we can find the minimum cut with probability $1 - 1/\mathrm{poly}(n)$.