APS 重点复习课程之算法导论
Overview
Mathematical Way to Evaluate Performance of Algorithm
- Time Complexity
- Space Complexity
Brute-force
Dynamic Programming
- Longest Common Subsequence
Greedy
- Hoffman Coding
- Minimum Spanning Tree (MST)
- Prim
- Kruskal
Graph
Shortest Path
Floyd (All Pairs Shortest Paths)
Bellman-Ford (Single Pair)
Dijkstra (Single Pair)
Top Sort
Strong Connected Components (SCC)
- Korsaraju
- Tarjan
Maximum Flow Problem
- Dinic
- Max Flow Min Cost: Linear Programming
Number Theory
- Greatest Common Devisor
- Lucas
String
- Knuth Morris Pratt (KMP)
NP Complete
Mathematical Way to Evaluate Performance of Algorithm
Time Complexity
Time complexity is the amount of computational effort required to execute the algorithm.
The method of analyzing the time complexity
For Loop
3 Loops of matrix multiply:
1 | const int N=/**/; |
For Recursion
Computing factorial of n.
1 | template<typename T> |
According to the recurrence formula:
Space Complexity
Space complexity is the memory space required to execute the algorithm.
Brute-force
aka generate and test, very general
Implementation costs are proportional to the number of candidate solutions – which in many practical problems tends to grow very quickly as the size of the problem increases.
Typically used when the problem size is limited
Dynamic Programming
Simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner.
Longest Common Subsequence
Sequence X and Y.
Let be the length of LCS of X first-i subsequence and Y first-j subsequence.
1 | int LCS(string X, string Y){ |
Time complexity: , space complexity: .
Greedy
Hoffman Coding
See in note of Data Structure.
Minimum Spanning Tree (MST)
A subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.
Prim
(每趟选一个点)
Time Complexity: (depends on how to store graph)
Adjacent matrix:
Adjacent list:
Step:
- Initialize a tree with a single, chosen arbitrarily from the graph.
- Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
- Repeat step 2 until all vertices are in the tree.
Kruskal
Kruskal’s Algorithm = Greedy + disjoint sets
(每趟选一个边)
For a graph with E edges and V vertices, Kruskal’s algorithm can be shown to run in time, or equivalently, time, all with simple data structures.
Step:
- create a forest F (a set of trees), where each vertex in the graph is a separate Tree
- create a sorted set containing all the edges in the graph
- while S is nonempty and F is not yet spanning
- remove an edge with minimum weight from S
- if the removed edge connects two different trees then add it to the forest F, combining two trees into a single tree
1 |
|
Graph
Shortest Path
Finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized.
Floyd (All Pairs Shortest Paths)
1 |
|
The time complexity of the Floyd algorithm is , where V is the number of vertices in the graph.
Bellman-Ford (Single Pair)
Code
1 |
|
Process
At each iteration, the algorithm considers all of the edges in the graph and updates the distances to each vertex if a shorter path is found. The algorithm continues until all of the edges have been relaxed a number of times equal to the number of vertices in the graph.
Here is a more detailed description of the Bellman-Ford algorithm:
Initialize the distances to all vertices to infinity, except for the source vertex, which has a distance of 0.
For each iteration of the outer loop:
For each edge in the graph:
If the distance to the destination vertex of the edge is greater than the distance to the source vertex plus the weight of the edge, update the distance to the destination vertex.
If any of the distances to the vertices have been updated in the previous step, repeat the outer loop.
The algorithm terminates when all of the edges have been relaxed a number of times equal to the number of vertices in the graph, or when none of the distances are updated in the previous step. This ensures that the distances to all of the vertices are correct and that the shortest paths have been found.
Time Complexity
The time complexity of the Bellman-Ford algorithm using an adjacent list to store the graph is , where V is the number of vertices and E is the number of edges in the graph.
Dijkstra (Single Pair)
Code
1 |
|
Process
- Start with the source vertex, and assign it a distance of 0.
- Initialize the priority queue with all the vertices, ordered by their distances from the source.
- Repeat the following steps until the destination is reached or the queue is empty:
- Dequeue the vertex with the smallest distance from the queue.
- Update the distances of its neighbors by considering the edge weights and the current distances of the vertices.
- Re-order the queue according to the updated distances of the vertices.
- If the destination was reached, the algorithm has produced a shortest-path tree. Otherwise, the destination is not reachable from the source.
Time Complexity
The time complexity of Dijkstra’s algorithm using a priority queue is , where E is the number of edges and V is the number of vertices in the graph.
Top Sort
Code
1 |
|
Process
To perform a topological sort on a DAG, the following steps can be followed:
- Start by choosing an arbitrary node
v
in the graph and marking it as visited. - Consider all nodes
u
that are adjacent tov
, and that have not yet been visited. For each such nodeu
, mark it as visited and recursively apply the topological sort to it. - When all nodes that are adjacent to
v
have been visited, addv
to the output list. - Repeat this process until all nodes in the graph have been visited and added to the output list. The resulting output list is the topological sort of the graph.
It is important to note that a topological sort is only possible for a DAG, as it is not possible to produce a linear ordering of the nodes in a graph that contains cycles. If the input graph contains a cycle, the topological sort will fail. So top sort can judge whether a graph contains circles.
Complexity
The time complexity of topological sort is , where V
is the number of nodes in the graph and E
is the number of edges. This is because the algorithm visits each node and each edge in the graph once, so the time complexity is proportional to the sum of the number of nodes and edges.
Strong Connected Components (SCC)
A strong connected component is a group of nodes in which there is a path from any node to any other node in the group.
Korsaraju
- Start by performing a depth-first search (DFS) on the graph to compute the finish times for each node. The finish time of a node is the time at which the DFS algorithm finishes visiting that node.
- Create a new graph that is the transpose of the original graph. The transpose of a graph is a new graph with the same nodes as the original graph, but with the direction of each edge reversed.
- Perform another DFS on the transposed graph, but this time visit the nodes in decreasing order of their finish times from the previous DFS. This will produce a forest of trees, where each tree represents a strong connected component in the original graph.
Tarjan
- Start by performing a depth-first search (DFS) on the graph.
- As the DFS algorithm visits each node, maintain a stack of nodes that are currently in the DFS search tree.
- When a node is first visited, push it onto the stack.
- When a node is finished being visited (i.e., all of its descendants have been visited), pop it off the stack and add it to the current strong connected component.
- If a node
v
is visited while another nodeu
is on the stack, and there is a path fromu
tov
in the search tree, thenv
must also be in the same strong connected component asu
. In this case, remove all nodes from the stack that are on the path fromu
tov
, and add them to the current strong connected component.
Flow Problem
Hungarian Algorithm
Time complexity: , finding best - match of Bipartite graph.
- Begin by subtracting the smallest element from each row and column of the cost matrix. This ensures that all zeros in the final matrix will be part of a minimum-cost assignment.
- Cover each row of the matrix with the fewest possible number of lines (vertical and horizontal) such that all zeros are covered.
- If all rows are covered, then the algorithm terminates and the minimum-cost assignment is found. If not, then proceed to the next step.
- Find the smallest uncovered element in the matrix, and add it to every element that is covered by a line. Subtract this smallest element from every element that is not covered by a line.
- Go back to step 2.
Max Flow Min Cost
One way to solve the minimum-cost flow problem is to use the shortest path algorithm. This involves finding the shortest path from the source to the sink in the network, and then incrementally sending flow along this path until the maximum flow is reached or the path is exhausted.
Another approach is to use linear programming to formulate the problem as a series of linear equations and then solve for the optimal flow using a linear programming solver.
Number Theory
Greatest Common Devisor
1 | int gcd(int a,int b){ |
Time complexity is
Lucas Theorem
p is prime.
Time complexity: .
1 | // C++ Version |
String
Knuth Morris Pratt (KMP)
Time complexity:
A string-matching algorithm that is used to find the occurrences of a pattern in a given string. It does this by preprocessing the pattern to compute a table of failure function values, which are then used to skip comparisons when searching for the pattern in the string. This can make the KMP algorithm faster than other string-matching algorithms in certain cases.
Next array means the maximum length of prefix and suffix which prefix = suffix of current sub-string.
1 | int* getNext(string p) { |
NP Complete
to classify problems according to their computational complexity.
P (polynomial time)
A problem is in P if there exists an algorithm for solving the problem such that the time it takes to solve the problem is at most a polynomial function of the size of the input.
NP (nondeterministic polynomial time)
A problem is in NP if there exists an algorithm for verifying solutions to the problem such that the time it takes to verify a solution is at most a polynomial function of the size of the input. Can’t (or not sure to) be solved in polynomial time.
, but is not solved now.
NP-hard
A problem is said to be NP-hard if it is at least as hard as any problem in NP. In other words, an NP-hard problem is a problem that is at least as difficult to solve as any problem in NP. Note that an NP-hard problem does not necessarily have to be in NP, but any problem in NP is also NP-hard.
NPC(omplete)
A problem is in NPC if it is both in NP and NP-hard.
z.B. TSP (traveling salesperson problem), The subset sum problem, The 3-satisfiability problem (3-SAT): Given a boolean formula in conjunctive normal form with at most three literals per clause, determine whether there exists a truth assignment to the variables that makes the formula evaluate to true.