Dynamic Programming
An optimal solution to a problem is built up from optimal solutions to its subproblems.
Weakly \(NP\)-hard problems sometimes have dynamic programming algorithms which run in pseudo-polynomial time.
Strongly \(NP\)-hard problem: A problem is strongly \(NP\)-hard if it is \(NP\)-hard even when its numeric data is encoded in unary.
Pseudo-polynomial algorithms: An algorithm is pseudo-polynomial if its running time is polynomial in the size of the input when the numeric part is encoded in unary.
- A problem is weakly \(NP\)-hard if it is \(NP\)-hard but admits a pseudo-polynomial algorithm.
- For instance, a dynamic programming algorithm for knapsack runs in \(O(n\cdot B)\) time; this is polynomial if \(B\) is encoded in unary, but \(B\) can be represented with only \(\log_2 B\) bits, so the running time is exponential in the binary encoding size.
Two techniques that use dynamic programming:
- We can devise approximation algorithms by rounding the values of the instance so that there is only a polynomial number of distinct values.
- We can split the input into "large" and "small" parts.
The Knapsack Problem
Dynamic Programming
For each \(j\), we maintain a list \(A(j)\) of pairs \((t,w)\). If \((t,w)\in A(j)\), then there exists \(S\subseteq\{1,\cdots,j\}\) s.t.
- \(\sum_{i\in S}s_i=t\)
- \(\sum_{i\in S}v_i=w\)
A pair \((t,w)\) dominates \((t',w')\) if \(t\leq t'\land w\geq w'\).
We ensure that in \(A(j)\) no pair dominates another one: \(A(j)=\{(t_i,w_i)|i=1,\cdots,k\}\)
- \(t_1<t_2<\cdots<t_k\)
- \(w_1<w_2<\cdots<w_k\)
The sizes and the values are integers. If \(V=\sum_{i=1}^nv_i\), then each list has at most \(\min\{B+1,V+1\}\) pairs.
Pseudocode
\(A(1)=\{(0,0),(s_1,v_1)\}\)
for \(j=2\to n\) do
- \(A(j)=A(j-1)\)
- for each \((t,w)\in A(j-1)\) do
- if \(t+s_j\leq B\) then Add \((t+s_j,w+v_j)\) to \(A(j)\)
- Remove dominated pairs from \(A(j)\)
return \(\max_{(t,w)\in A(n)}w\)
To remove dominated pairs:
- sort \(A(j)\) w.r.t. first component
- Retain the best value for each space total
- Remove any larger space total that does not have a larger value
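The pseudocode above, including the dominated-pair removal, can be sketched in Python as follows (function name and interface are illustrative, not from the source):

```python
def knapsack_dp(sizes, values, B):
    """Exact knapsack DP over lists of non-dominated (size, value) pairs."""
    A = [(0, 0)]  # A(0): only the empty subset
    for s, v in zip(sizes, values):
        # Extend every existing pair by the current item if it still fits.
        candidates = A + [(t + s, w + v) for (t, w) in A if t + s <= B]
        # Remove dominated pairs: sort by size (ties: larger value first)
        # and keep only pairs whose value strictly increases.
        candidates.sort(key=lambda p: (p[0], -p[1]))
        A = []
        for t, w in candidates:
            if not A or w > A[-1][1]:
                A.append((t, w))
    return max(w for _, w in A)
```

After the sort, a pair is dominated exactly when some earlier pair already achieves at least its value, so one linear pass suffices.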
Theorem: The DP correctly computes the optimal value of the knapsack problem.
Running Time
\(O(n\cdot\min\{B,V\})\), \(V=\sum_{i=1}^nv_i\). It is pseudo-polynomial.
We round the values of \(v_i\) in order to have \(V=O(poly(n))\).
Complexity Classes
Polynomial Time Approximation Scheme (PTAS): A PTAS is a family of algorithms \(\mathcal A\), containing an algorithm \(A\) for each \(\epsilon>0\), s.t. \(A\) runs in time polynomial in the input size and is a \((1+\epsilon)\)-approximation algorithm (for a min problem) or a \((1-\epsilon)\)-approximation algorithm (for a max problem).
Fully Polynomial Time Approximation Scheme (FPTAS): an FPTAS is a PTAS s.t. the running time of each algorithm in \(\mathcal A\) is bounded by a polynomial in both the input size and \(1/\epsilon\).
FPTAS for Knapsack
Consider a value \(\mu\) and convert each \(v_i\) by rounding down to the nearest integer multiple of \(\mu\): $$ v_i'=\lfloor\frac{v_i}{\mu}\rfloor. $$ We can use the dynamic programming algorithm on the items with sizes \(s_i\) and values \(v_i'\).
We choose \(\mu\) s.t.
- The optimal solution for the rounded data is a near-optimal solution for the true data.
- The sum of \(v_i'\) is polynomial in \(n\).
If we use \(\tilde v_i=v_i'\mu\), then the error on each \(v_i\) is at most \(\mu\).
Therefore, the value of any solution changes by at most \(n\mu\).
We want this error to be at most an \(\epsilon\) fraction of the optimum.
\(M=\max_{i\in I}v_i\) is a lower bound on the optimum (assuming each item fits, i.e., \(s_i\leq B\) for all \(i\)).
We choose \(\mu\) s.t. \(n\mu=\epsilon M\) i.e. \(\mu=\epsilon M/n\).
Moreover, with the modified values, $$ V'=\sum_{i=1}^n v_i'=\sum_{i=1}^n\left\lfloor\frac{v_in}{\epsilon M}\right\rfloor\leq n\cdot\frac{n}{\epsilon}=O(n^2/\epsilon) $$
Pseudocode
\(M=\max_{i\in I}v_i\)
\(\mu=\epsilon M/n\)
\(v_i'=\lfloor v_i/\mu\rfloor\) for all \(i\in I\)
Run the DP on instance with \(v_i'\).
Running time \(O(n^3/\epsilon)\).
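A Python sketch of this FPTAS, assuming positive values (a value-indexed DP — minimum size to reach each rounded value — is used here in place of the pair-list formulation; names are illustrative):

```python
def knapsack_fptas(sizes, values, B, eps):
    """FPTAS sketch: round values down to multiples of mu = eps*M/n,
    then run an O(n * V') DP.  Returns (true value of the chosen set,
    chosen indices into the filtered item list)."""
    items = [(s, v) for s, v in zip(sizes, values) if s <= B]
    if not items:
        return 0, []
    n = len(items)
    M = max(v for _, v in items)
    mu = eps * M / n
    rounded = [int(v // mu) for _, v in items]
    Vtot = sum(rounded)
    INF = float("inf")
    # dp[i][w]: minimum total size using the first i items to reach
    # rounded value exactly w.
    dp = [[INF] * (Vtot + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(1, n + 1):
        s, vr = items[i - 1][0], rounded[i - 1]
        for w in range(Vtot + 1):
            dp[i][w] = dp[i - 1][w]
            if w >= vr and dp[i - 1][w - vr] + s < dp[i][w]:
                dp[i][w] = dp[i - 1][w - vr] + s
    best_w = max(w for w in range(Vtot + 1) if dp[n][w] <= B)
    # Backtrack to recover the chosen items.
    chosen, w = [], best_w
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            chosen.append(i - 1)
            w -= rounded[i - 1]
    return sum(items[i][1] for i in chosen), chosen
```

The returned true value is at least \((1-\epsilon)\) times the optimum, by the rounding argument above.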
Theorem: this algorithm is an \(FPTAS\) for the knapsack problem.
Parallel Job Scheduling
Minimum makespan scheduling
PTAS
We use arguments similar to those for the longest processing time rule to obtain a \(PTAS\). We partition the jobs into two sets: short jobs and long jobs.
We optimally assign the long jobs by enumeration and assign the short jobs by using list scheduling.
- If the last job to finish is long, then the schedule is optimal.
- If the last job to finish is short, then the list scheduling analysis applies and gives a schedule of length at most the optimum plus the processing time of a short job.
We show that there exists a trade-off between the number of long jobs and the solution found.
Given a fixed integer \(k\) (with \(\epsilon=1/k\)), a job \(i\) is
- short if \(p_i\leq\frac{1}{km}\sum_{j=1}^np_j\)
- long if \(p_i>\frac{1}{km}\sum_{j=1}^np_j\)
There are at most \(km\) long jobs.
PTAS for minimum makespan scheduling on identical parallel machines.
- Enumerate all the possible schedule for the long jobs and choose the one with minimum makespan
- Extend this schedule by using list scheduling for the short jobs
There are \(m\) machines and at most \(km\) long jobs, so we enumerate at most \(m^{km}\) possible schedules. This is polynomial if \(m\) is constant.
Approximation
For the last job \(\ell\) $$ C_\ell=C_\max\leq p_\ell+\frac{1}{m}\sum_{j\neq\ell}p_j. $$ Two cases:
- \(\ell\) is long: the schedule is optimal. (the short jobs fill the "empty spaces")
- \(\ell\) is short: \(p_\ell\leq\frac{1}{km}\sum_{j=1}^np_j\leq\frac{OPT}{k}\), hence \(C_\max\leq\frac{OPT}{k}+OPT=(1+1/k)\cdot OPT\), using \(OPT\geq\frac{1}{m}\sum_{j=1}^np_j\).
Theorem: The family \(\{A_k\}\) of the above algorithms is a PTAS for the minimum makespan scheduling on a constant number of identical parallel machines.
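The algorithm \(A_k\) can be sketched in Python as follows (illustrative names; the enumeration is exponential in \(km\), so this is practical only for constant \(m\)):

```python
import heapq
from itertools import product

def makespan_ptas(p, m, k):
    """A_k sketch: jobs above sum(p)/(k*m) are long; enumerate all
    assignments of long jobs to machines, pick the one with minimum
    makespan, then extend it with list scheduling for the short jobs."""
    total = sum(p)
    threshold = total / (k * m)
    long_jobs = [x for x in p if x > threshold]
    short_jobs = [x for x in p if x <= threshold]
    # Optimal long-job schedule by brute force: at most m^(k*m) cases.
    best_loads, best_mk = [0.0] * m, float("inf")
    for assign in product(range(m), repeat=len(long_jobs)):
        loads = [0.0] * m
        for job, mach in zip(long_jobs, assign):
            loads[mach] += job
        if max(loads) < best_mk:
            best_mk, best_loads = max(loads), loads
    # List scheduling: give each short job to the least-loaded machine.
    heapq.heapify(best_loads)
    for job in short_jobs:
        heapq.heapreplace(best_loads, best_loads[0] + job)
    return max(best_loads)
```

When there are no long jobs the enumeration degenerates to a single empty assignment and the routine reduces to plain list scheduling.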
Extend to More Machines
We don't need the schedule for the long jobs to be optimal.
- We only used the optimality when the last job to finish was a long job.
- It is sufficient to have a schedule for the long jobs that has makespan at most \(1+1/k\) times the optimal value
We can round the processing time of long jobs and then use dynamic programming as in knapsack problem.
Fix a target makespan \(T\) and find a family of algorithms \(B_k\) that
- either finds a schedule of length at most \((1+1/k)T\),
- or proves that a schedule of length \(T\) does not exist.
We assume that \(T\geq\frac{1}{m}\sum_{j=1}^np_j\), since otherwise no feasible schedule exists.
Algorithm
Partition: A job is short if \(p_i<T/k\), long if \(p_i\geq T/k\).
Round: For each long job, let \(p_i'=\lfloor p_i/(T/k^2)\rfloor\cdot(T/k^2)\). (The number of distinct rounded values is limited.)
DP: Determine whether the rounded long jobs can be scheduled on \(m\) machines within time \(T\).
- If no such schedule exists, then no schedule of length \(T\) exists for the original input.
- If yes, convert it to a schedule for original long jobs (each job's true time is at most \(T/k^2\) more than its rounded time).
Extend: Assign short jobs greedily (list scheduling).
Analysis
- Long jobs: On a single machine there are at most \(k\) long jobs (since each rounded long job has processing time at least \(T/k\)). The total true processing time on that machine is at most \(T+k\cdot T/k^2=(1+1/k)T\).
- Short jobs: If the last job \(\ell\) is short, then \(p_\ell<T/k\), and by list scheduling: \(C_\max\leq p_\ell+\frac{1}{m}\sum_{j\neq\ell}p_j\leq T/k+T=(1+1/k)T\).
Thus, if a schedule of length \(T\) exists, \(B_k\) finds one of length \((1+1/k)T\).
DP for rounded long jobs: Since each long job has rounded processing time a multiple of \(T/k^2\), the number of distinct types is \(O(k^2)\). The DP can be done in time \(n^{O(k^2)}\), which is polynomial in \(n\) but exponential in \(k^2\).
From \(B_k\) to a PTAS (Binary Search)
We want to find a schedule with makespan at most \((1+\epsilon)OPT\). Let \(k=\lceil1/\epsilon\rceil\).
We perform a binary search on \(T\) over an interval \([L,U]\) that is guaranteed to contain \(OPT\), initialized as:
- \(L=L_0=\max\{\frac{1}{m}\sum_jp_j,\max_jp_j\}\) (a lower bound on \(OPT\)).
- \(U=U_0=\frac{1}{m}\sum_jp_j+\max_jp_j\) (an upper bound, e.g., from list scheduling).
Search:
While \(L<U\):
- Set \(T=\lfloor(L+U)/2\rfloor\).
- Run \(B_k\) with this \(T\).
- If \(B_k\) produces a schedule, set \(U=T\) (we have a schedule of length \(\leq (1+\epsilon)T\)).
- Else, set \(L=T+1\) (no schedule of length \(T\) exists, so \(OPT>T\)).
Termination: When \(L=U\), we have a schedule of length at most \((1+\epsilon)L\leq(1+\epsilon)OPT\).
Running time: Each \(B_k\) call takes \(n^{O(k^2)}=n^{O(1/\epsilon^2)}\) time, and we make \(O(\log(U_0-L_0))\) calls (polynomial in the input size). The overall complexity is exponential in \(1/\epsilon^2\), so it is a \(PTAS\) but not an \(FPTAS\).
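The binary-search wrapper can be sketched as follows, with the \(B_k\) routine passed in as a parameter (`run_Bk` is a hypothetical stand-in: it is assumed to return a schedule of length at most \((1+1/k)T\) when one of length \(T\) exists, and `None` otherwise; integer bounds assume integral processing times):

```python
import math

def makespan_binary_search(p, m, run_Bk):
    """Binary search for the smallest T that B_k accepts."""
    avg = sum(p) / m
    L = math.ceil(max(avg, max(p)))   # lower bound L_0
    U = math.ceil(avg + max(p))       # upper bound U_0 (list scheduling)
    best = run_Bk(U)                  # guaranteed to succeed at U_0
    while L < U:
        T = (L + U) // 2
        sched = run_Bk(T)
        if sched is not None:
            best, U = sched, T        # schedule of length <= (1+1/k)T
        else:
            L = T + 1                 # no schedule of length T: OPT > T
    return best
```

The invariant is that \(OPT\) always lies in \([L,U]\) and `best` holds a schedule found at some accepted \(T\leq U\).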
Strong NP-Hardness and Impossibility of FPTAS
Theorem: If a strongly \(NP\)-hard optimization problem has an integer-valued objective and the optimum is bounded by a polynomial in the unary input size, then the problem does not admit an \(FPTAS\) unless \(P=NP\).
Proof sketch: Suppose an \(FPTAS\) exists with running time polynomial in \(1/\epsilon\). Set \(\epsilon=1/p(|I_u|)\), where \(p\) is the polynomial bound on \(OPT\) and \(|I_u|\) is the unary input size. Then the \(FPTAS\) returns a solution with value \(<OPT+1\) (since \(\epsilon\cdot OPT<1\)), hence an optimal solution. The running time becomes polynomial in the unary input size, contradicting strong \(NP\)-hardness (unless \(P=NP\)).
Corollary: Minimum makespan scheduling is strongly \(NP\)-hard, so it cannot have an \(FPTAS\) (unless \(P=NP\)).
Approximation Classes
FPTAS (Fully Polynomial Time Approximation Scheme): approximation ratio \(1\pm\epsilon\) in time polynomial in \(n\) and \(1/\epsilon\). Example: Knapsack.
PTAS (Polynomial Time Approximation Scheme): ratio \(1\pm\epsilon\) in time polynomial in \(n\) (may be exponential in \(1/\epsilon\)). Example: Minimum makespan scheduling.
APX: constant factor approximation. Example: \(k\)-center.
log-APX: approximation ratio \(O(\log n)\). Example: Set Cover.
Poly-APX: approximation ratio \(O(n^\alpha)\) for some constant \(\alpha>0\). Example: Coloring.
exp-APX: approximation ratio exponential in \(n\). Example: TSP.
Approximation Classes Hierarchy
The hierarchy collapses if \(P=NP\).