Greedy and Local Search Algorithms
A locally optimal solution need not be globally optimal.
Greedy algorithms
- A solution is constructed step by step. At each step, the next piece of the solution is chosen so as to be locally optimal
- Primal feasible algorithms: the solution is constructed during the algorithm
Local search algorithms
- Start with an arbitrary feasible solution
- Check if a small local change improves the objective function
- If so, it applies the change; otherwise, it stops.
- They are primal feasible algorithms: they always maintain a feasible solution
The Knapsack Problem
Problem Setting
Given
- A set of \(n\) elements: \(I\)
- A value \(v_i\) for each \(i\in I\)
- A size \(s_i\) for each \(i\in I\)
- A maximum size \(B\)
Find a subset \(S\subseteq I\) s.t.
- \(\sum_{i\in S}s_i\leq B\)
- \(\sum_{i\in S}v_i\) is maximum
Assume that \(s_i\leq B\) for all \(i\in I\)
Greedy Knapsack
Sort the elements by non-increasing density \(v_i/s_i\); scan them in this order, adding each element that still fits; finally, return the better of the constructed set (value \(v_G\)) and the single most valuable element (value \(v_1\)).
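The density-greedy rule with a fallback to the best single item can be sketched in Python (function name and structure are mine, not from the source; items are assumed to satisfy \(s_i\leq B\)):

```python
def greedy_knapsack(values, sizes, B):
    """Greedy Knapsack sketch: scan items by non-increasing density
    v_i/s_i, add whatever fits, then return the better of the greedy
    set and the single most valuable item (assumes s_i <= B for all i)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / sizes[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + sizes[i] <= B:
            chosen.append(i)
            used += sizes[i]
    v_greedy = sum(values[i] for i in chosen)
    top = max(range(n), key=lambda i: values[i])  # most valuable single item
    if values[top] > v_greedy:
        return [top], values[top]
    return chosen, v_greedy
```

Without the fallback to the single best item, the density-only rule can be arbitrarily bad: a tiny high-density item can crowd out one huge item carrying almost all the value.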
Theorem: Greedy Knapsack algorithm is an \(1/2\)-approximation algorithm for the knapsack problem.
Proof idea: exhibit an upper bound on the optimum, and show that the value of the greedy solution is at least \(1/2\) of this upper bound.
Let \(\ell\) be the first element not chosen by Greedy Knapsack.
Let \(\bar v=\sum_{j=1}^{\ell-1}v_j\) and \(\bar s=\sum_{j=1}^{\ell-1}s_{j}\).
We show that \(OPT\leq\bar v+v_\ell\)
- If we exchange any subset of \(S=\{1,2,\cdots,\ell-1\}\) with any subset of \(I\backslash S=\{\ell,\cdots,n\}\) that does not increase the occupancy, then we do not increase the value.
- This implies that the greedy finds an optimal solution if the overall occupancy of the greedy solution matches the size of the knapsack.
- If we consider a knapsack with size \(\bar s+s_\ell\), the greedy finds an optimal solution with value \(OPT'=\bar v+v_\ell\).
- Thus, \(OPT\leq OPT'=\bar v+v_\ell\).
\(OPT\leq\bar v+v_\ell\leq v_G+v_1\), where \(v_G\) is the value of the greedy solution and \(v_1\) the largest value of a single element: indeed \(\bar v\leq v_G\) and \(v_\ell\leq v_1\). There are two cases:
- \(v_G\geq v_1\): \(OPT\leq v_G+v_1\leq 2v_G\)
- \(v_G< v_1\): \(OPT\leq v_G+v_1\leq 2v_1\)
In any case, \(OPT\leq2\max\{v_G,v_1\}\), i.e. the returned value \(\max\{v_G,v_1\}\) is at least \(OPT/2\).
The k-center Problem
Problem Setting
Given
- A complete graph \(G=(V,E)\)
- Distance \(d_{ij}\) for each \(i,j\in V\) satisfying symmetry, triangle inequality
- Integer \(k\)
Given a set \(S\subseteq V\), we define
- for a node \(i\in V\): \(d(i,S)=\min_{j\in S}d_{ij}\)
- radius of \(S\): \(\max_{i\in V}d(i,S)\)
Find a set of pivots \(S\subseteq V\) s.t.
- \(|S|=k\)
- The radius of \(S\) is minimum
Greedy k-center Algorithm
Pick arbitrarily \(i\in V\)
\(S=\{i\}\)
while \(|S|<k\) do
- \(\ell=\arg\max_{j\in V}d(j,S)\)
- \(S=S\cup\{\ell\}\)
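The farthest-point greedy above can be sketched in Python (names are mine; `d` is assumed to be a full symmetric distance matrix):

```python
def greedy_k_center(d, k):
    """Farthest-point greedy for k-center (2-approximation sketch).
    d: symmetric distance matrix; k: number of centers to pick."""
    n = len(d)
    S = [0]                                 # arbitrary first center
    dist = [d[0][j] for j in range(n)]      # dist[j] = d(j, S)
    while len(S) < k:
        far = max(range(n), key=lambda j: dist[j])  # argmax_j d(j, S)
        S.append(far)
        dist = [min(dist[j], d[far][j]) for j in range(n)]
    return S, max(dist)                     # centers and radius
```

Maintaining `dist` incrementally makes each iteration \(O(n)\), for \(O(nk)\) overall.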
Theorem: The greedy \(k\)-center algorithm is a \(2\)-approximation algorithm for the \(k\)-center problem.
Let \(S^*=\{j_1,\cdots,j_k\}\) be the optimal solution and \(r^*\) its radius.
\(V_1,\cdots,V_k\) is the partition of \(V\) induced by \(S^*\).
For each \(i=1,\cdots,k\) and each pair \(j,j'\in V_i,d_{jj'}\leq2r^*\).
- \(d_{jj'}\leq d_{jj_i}+d_{j_ij'}\leq2r^*\) (by the triangle inequality)
Two cases.
- Each node in \(S\) is in a different set \(V_i\): each node in \(V\) is at distance \(\leq 2r^*\) from a node in \(S\).
- Two nodes of \(S\) are in the same set \(V_i\): say the algorithm selects \(j'\) first and \(j\) later, so \(d_{jj'}\leq 2r^*\). In the iteration when \(j\) is selected, \(j\) is the node furthest from \(S\), and \(d(j,S)\leq d_{jj'}\leq2r^*\) since \(j'\in S\).
- Hence every node is at distance at most \(2r^*\) from the nodes already selected.
- Since \(d(i,S)\) can only decrease in successive iterations, \(d(i,S)\leq 2r^*\) for each \(i\in V\) also at termination.
Hardness of Approximation
Theorem: There is no \(\alpha\)-approximation algorithm for the \(k\)-center problem for any \(\alpha<2\), unless \(P=NP\).
Show that if there exists an \(\alpha\)-approximation algorithm with \(\alpha<2\) for \(k\)-center problem, we can solve the dominating set problem:
- Given a graph \(G\) and an integer \(k\)
- Question: is there a set \(S\subseteq V\) with \(|S|\leq k\) s.t. each \(v\in V\) is either in \(S\) or has a neighbor in \(S\)?
Given an instance of dominating set, we define an instance of \(k\)-center where \(d_{ij}=1\) if edge \((i,j)\in E\); \(2\) otherwise.
There is a dominating set of size \(k\) iff \(r^*=1\).
If \(r^*=1\), then any \(\alpha\)-approximation algorithm with \(\alpha<2\) produces a solution with radius \(\alpha\cdot r^*<2\), hence \(r=1\) (every distance is \(1\) or \(2\)), deciding the dominating set problem.
Scheduling Jobs
On a Single Machine
Minimum Lateness Scheduling Problem
Given
- A set of \(n\) jobs
- A processing time \(p_j\) for each job \(j=1,\cdots,n\)
- A release date \(r_j\geq 0\) for each job \(j=1,\cdots,n\)
- A due date \(d_j\) for each job \(j=1,\cdots,n\).
Define
- Completion time of a job \(j\), \(C_j\): time when \(j\) completes its processing
- Lateness of a job \(j\), \(L_j=C_j-d_j\).
Goal:
- Schedule all the jobs minimizing the maximum lateness: \(L_\max=\max_jL_j\)
- A machine can schedule at most one job at a time.
- If a machine starts to process a job, it must process it until its completion.
Observation
The problem is \(NP\)-hard. Even deciding if there exists a schedule for which \(L_\max\leq0\) is strongly \(NP\)-hard.
The definition of approximation ratio has the following issue
- Consider a \(\rho\)-approximation algorithm
- For all the input instances with optimal value \(L_\max^*\leq0\), the algorithm finds a solution with \(L_\max\leq\rho\cdot L_\max^*\leq0\).
- As deciding if there exists a schedule for which \(L_\max\leq0\) is strongly \(NP\)-hard, this implies that \(P=NP\).
Workaround: assume that all the due dates are negative (i.e., the maximum due date is less than the minimum release date).
Lower Bound
For a subset \(S\) of jobs: $$ r(S)=\min_{j\in S}r_j,\quad p(S)=\sum_{j\in S}p_j,\quad d(S)=\max_{j\in S}d_j $$ \(L_\max^*\) denotes the optimal value.
Lemma: For each subset \(S\) of jobs, \(L_\max^*\geq r(S)+p(S)-d(S)\).
Consider the last job \(j\) of \(S\) to be processed by an optimal schedule. None of the jobs of \(S\) can be processed before \(r(S)\).
The jobs of \(S\) require \(p(S)\) units of processing in total.
Since \(j\) is the last job of \(S\) to complete, it cannot finish before \(r(S)+p(S)\): \(C_j\geq r(S)+p(S)\).
\(d_j\leq d(S)\)
\(L_\max^*\geq L_j=C_j-d_j\geq r(S)+p(S)-d(S)\)
Algorithm Earliest Due Date
AKA Earliest Deadline First
A job \(j\) is available at time \(t\) if \(r_j\leq t\).
The algorithm: At each moment that the machine is idle, start processing an available job with the earliest due date.
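The EDD rule can be simulated with a heap of released jobs keyed by due date (a sketch under my own naming; jobs are `(release, processing, due)` triples):

```python
import heapq

def edd_schedule(jobs):
    """Earliest Due Date with release dates (sketch).
    jobs: list of (release, processing, due) triples.
    Returns the maximum lateness of the resulting schedule."""
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    avail = []                       # heap of (due, job) for released jobs
    t, i, L_max = 0, 0, float("-inf")
    while i < len(order) or avail:
        while i < len(order) and jobs[order[i]][0] <= t:
            j = order[i]
            heapq.heappush(avail, (jobs[j][2], j))
            i += 1
        if not avail:                # machine idle: jump to next release
            t = jobs[order[i]][0]
            continue
        _, j = heapq.heappop(avail)  # earliest due date among available
        t += jobs[j][1]              # run j non-preemptively
        L_max = max(L_max, t - jobs[j][2])
    return L_max
```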
Theorem: EDD is a \(2\)-approximation algorithm for the problem of minimizing the maximum lateness on a single machine with release dates and negative due dates.
\(j\) is the job with maximum lateness: \(L_\max=C_j-d_j\).
\(t\) is the smallest time s.t. the machine processes without idle time in \([t,C_j]\).
\(S\) is the set of jobs processed in \([t,C_j]\).
No job of \(S\) is released before \(t\) (otherwise the machine would not have been idle just before \(t\)), and at least one job of \(S\) is released exactly at \(t\); hence \(r(S)=t\).
Only jobs of \(S\) are processed in \([t,C_j]\), hence \(p(S)=C_j-t=C_j-r(S)\) and \(C_j=r(S)+p(S)\).
By the lemma, \(L_\max^*\geq r(S)+p(S)-d(S)\geq r(S)+p(S)=C_j\), since \(d(S)<0\) (all due dates are negative).
Applying the lemma to \(S=\{j\}\): \(L_\max^*\geq r_j+p_j-d_j\geq -d_j\).
Summing the two bounds: \(2\cdot L_\max^*\geq C_j-d_j=L_\max\).
On Identical Parallel Machines
Problem Setting
Given
- A set of \(n\) jobs
- A processing time \(p_j\) for each job \(j=1,\cdots,n\)
- \(m\) parallel machines
Assumptions:
- Completion time of a job \(j\): \(C_j\) is the time when \(j\) completes its processing
- All jobs are available at time \(0\)
- A machine can schedule at most one job at a time
- If a machine begins to process a job it must process it until its completion
Goal: Schedule all the jobs minimizing the maximum completion time (makespan) $$ C_\max=\max_jC_j. $$
Local Search Algorithm
Start with any schedule
Repeat
- Let \(\ell\) be the job which finishes last
- If there exists a machine that finishes before \(C_\ell-p_\ell\), then move \(\ell\) to such machine
until the last job cannot be transferred.
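A Python sketch of this local search (a simplification with names of mine: the job finishing last is taken to be the last job in the most loaded machine's list, and ties are broken by machine index):

```python
def local_search_makespan(p, m, assign):
    """Local search for makespan on m identical machines (sketch).
    p: processing times; assign: initial machine index for each job.
    Repeatedly moves the last-finishing job to the machine that
    finishes earliest, while this strictly improves its completion."""
    load = [0] * m
    machine_jobs = [[] for _ in range(m)]
    for j, i in enumerate(assign):
        load[i] += p[j]
        machine_jobs[i].append(j)
    while True:
        i_max = max(range(m), key=lambda i: load[i])
        job = machine_jobs[i_max][-1]           # job finishing last
        i_min = min(range(m), key=lambda i: load[i])
        # move only if job would finish before C_ell = load[i_max]
        if load[i_min] < load[i_max] - p[job]:
            machine_jobs[i_max].pop()
            load[i_max] -= p[job]
            machine_jobs[i_min].append(job)
            load[i_min] += p[job]
        else:
            return max(load)                    # local optimum: makespan
```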
Lower Bounds of Local Search
Each job must be scheduled: $$ C_\max^*\geq\max_jp_j. $$ On average, a machine is assigned \(\frac{1}{m}\sum_{j=1}^np_j\) units of work, so some machine is assigned at least that much: $$ C_\max^*\geq\frac{1}{m}\sum_{j=1}^np_j. $$ Theorem: The local search algorithm is a \(2\)-approximation algorithm for the minimum makespan scheduling on identical parallel machines.
Show that:
- The solution obtained is a factor \(2\) from the optimal
- The running time of the algorithm is polynomial
For approximation factor:
Let \(\ell\) be s.t. \(C_\max=C_\ell\). When the algorithm terminates, every machine is busy during \([0,S_\ell]\), where \(S_\ell=C_\ell-p_\ell\) is the start time of \(\ell\).
We partition the time into \([0,S_\ell]\) and \([S_\ell, C_\ell]\):
- \([S_\ell, C_\ell]\): As \(C_\max^*\geq\max_jp_j\), then interval \([S_\ell, C_\ell]\) has length \(p_\ell\leq C_\max^*\).
- \([0,S_\ell]\): In the interval, all the machines are busy. Total amount of work \(mS_\ell\leq\sum_{j=1}^np_j\), \(S_\ell\leq\frac{1}{m}\sum_{j=1}^np_j\). \(C_\max^*\geq\frac{1}{m}\sum_{j=1}^np_j\geq S_\ell\)
We have \(C_\ell\leq 2C_\max^*\).
Running time:
We assume each transfer moves the job to the machine that currently finishes earliest; let \(C_\min\) denote that machine's completion time.
At each iteration \(C_\max\) never increases, and if it stays the same, the number of machines achieving \(C_\max\) decreases.
Similarly, \(C_\min\) never decreases.
We show that a job is never transferred twice:
- Suppose job \(j\) is transferred from \(i\) to \(i'\), and later from \(i'\) to \(i''\).
- Let \(C_\min\) and \(C_\min'\) be the completion times of the earliest-finishing machine at the two iterations; after the first transfer, \(j\) starts at time \(C_\min\).
- For the second transfer to improve, \(i''\) must finish before \(j\) starts: \(C_\min'<C_\min\), contradicting that \(C_\min\) never decreases.
The algorithm terminates in at most \(n\) iterations.
Improved Analysis of Local Search
Job \(\ell\) is not processed in \([0,S_\ell]\), so we can exclude it from the averaging bound: $$ S_\ell\leq\frac{1}{m}\sum_{j\neq\ell}p_j. $$ The total length is: $$ C_\ell=p_\ell+S_\ell\leq p_\ell+\frac{1}{m}\sum_{j\neq\ell}p_j=\left(1-\frac{1}{m}\right)p_\ell+\frac{1}{m}\sum_{j=1}^np_j\leq\left(1-\frac{1}{m}\right)C_\max^*+C_\max^*=\left(2-\frac{1}{m}\right)C_\max^*. $$
Greedy Algorithm - List Scheduling
Assign a job as soon as there is an available machine to process it.
- Order the jobs in a list arbitrarily
- Schedule the job at the top of the list to an available machine
The analysis is trivial:
- Apply the local search algorithm to the resulting schedule: it immediately declares that the solution cannot be improved.
Theorem: List Scheduling algorithm is a \(2\)-approximation algorithm for the minimum makespan scheduling on identical parallel machines.
Greedy Algorithm - Longest Processing Time
Sort the jobs in non-increasing order of processing time and apply list scheduling.
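The LPT rule is short on top of a min-heap of machine loads (sketch, names mine):

```python
import heapq

def lpt_makespan(p, m):
    """Longest Processing Time rule (sketch): jobs in non-increasing
    order, each assigned to the currently least loaded machine."""
    loads = [(0, i) for i in range(m)]   # min-heap of (load, machine)
    for t in sorted(p, reverse=True):
        load, i = heapq.heappop(loads)   # least loaded machine
        heapq.heappush(loads, (load + t, i))
    return max(load for load, _ in loads)
```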
Theorem: The longest processing time rule is a \(4/3\)-approximation algorithm for the minimum makespan scheduling on identical parallel machines.
Take a minimal counterexample to the theorem.
Assume that \(p_1\geq\cdots\geq p_n\)
The last job \(\ell\) to complete is the smallest one, i.e. \(\ell=n\) (in the minimal counterexample); otherwise we could obtain a smaller counterexample by removing jobs \(\ell+1,\cdots,n\).
Two cases
\(p_\ell\leq1/3C_\max^*\):
- By the analysis of the local search algorithm, \(C_\ell-p_\ell\leq C_\max^*\); therefore \(C_\ell\leq4/3C_\max^*\)
\(p_\ell>1/3C_\max^*\):
- Since \(p_\ell\) is the smallest job, then all the jobs have processing time strictly greater than \(1/3C_\max^*\)
- If \(p_i>1/3C_\max^*\) for each job \(i\), then the longest processing time rule computes an optimal schedule.
Lemma: If \(p_i>1/3C_\max^*\) for each job \(i\), then the longest processing time rule computes an optimal schedule.
In the optimal schedule each machine can process at most two jobs!
Therefore, \(n\leq 2m\).
If the longest processing time rule schedules \(2\) jobs per machine, then it is the optimum.
Otherwise, some machine is assigned a third job; since each of its three jobs is larger than \(1/3C_\max^*\), its total processing time exceeds \(C_\max^*\), deriving a contradiction.
Maximizing Monotone Submodular Functions
The float is the time between a payment by check and the time that the funds for that payment are deducted from the bank account.
During that time, the account holder can continue to accrue interest on the money.
We want to maximize the overall gain obtained by the floats on a set of bank accounts.
It can be modeled as a maximization problem.
Problem Setting
Given
- \(B\) a set of banks.
- \(P\) a set of payees.
- \(v_{ij}\geq0\) the value of the float created by paying payee \(j\in P\) from bank account \(i\in B\).
- an integer \(k\).
Find a subset \(S\subseteq B\) s.t.
- \(|S|\leq k\)
- \(v(S)=\sum_{j\in P}\max_{i\in S}v_{ij}\) is maximum
Greedy Max Floats
At each step, add the bank that most increases the objective function.
\(S=\emptyset\)
while \(|S|<k\) do
- \(i=\arg\max_{i\in B\setminus S}\{v(S\cup\{i\})-v(S)\}\)
- \(S=S\cup\{i\}\)
return \(S\)
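A direct Python sketch of the greedy above (my own naming; \(v(\emptyset)\) is taken to be \(0\), which is safe since \(v_{ij}\geq0\), and \(k\leq|B|\) is assumed):

```python
def greedy_max_float(v, k):
    """Greedy for float maximization (sketch).
    v[i][j] = float gained by paying payee j from bank i; k = #banks.
    Assumes v[i][j] >= 0 and k <= number of banks; v(empty) = 0."""
    n_banks, n_payees = len(v), len(v[0])
    best = [0] * n_payees        # best[j] = max_{i in S} v[i][j]
    S = []
    for _ in range(k):
        # marginal gain of adding bank i to the current set S
        def gain(i):
            return sum(max(v[i][j] - best[j], 0) for j in range(n_payees))
        i = max((i for i in range(n_banks) if i not in S), key=gain)
        S.append(i)
        best = [max(best[j], v[i][j]) for j in range(n_payees)]
    return S, sum(best)
```

Keeping `best[j]` incrementally avoids recomputing \(v(S)\) from scratch at each step.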
Lemma: Let \(O\) be the optimal solution, \(S\) be the set of banks at the start of some iteration of Greedy Max Floats, and \(i\in B\) be the bank chosen in that iteration. Then $$ v(S\cup\{i\})-v(S)\geq\frac{1}{k}(v(O)-v(S)). $$ Theorem: Greedy Max Floats is a \((1-\frac{1}{e})\)-approximation algorithm for the float maximization problem.
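A sketch of how the lemma yields the theorem (standard argument; \(S_t\) denotes the greedy set after \(t\) iterations, with \(v(\emptyset)=0\)):

```latex
% Rearranging the lemma: v(O) - v(S_{t+1}) \le (1 - 1/k)(v(O) - v(S_t)).
% Iterating k times from S_0 = \emptyset:
\begin{aligned}
v(O) - v(S_k) &\le \Bigl(1-\frac{1}{k}\Bigr)^{k}\bigl(v(O)-v(\emptyset)\bigr)
               \le e^{-1}\,v(O) && \text{using } 1-x\le e^{-x},\\
v(S_k) &\ge \Bigl(1-\frac{1}{e}\Bigr)v(O).
\end{aligned}
```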
Sub-modular Functions
A function \(f\) satisfying the following property is called submodular: For any \(X\subseteq Y\), and for any \(\ell\not\in Y\): $$ f(Y\cup\{\ell\})-f(Y)\leq f(X\cup\{\ell\})-f(X) $$ This is the property of decreasing marginal benefits.
Equivalent definition: \(f(S)+f(T)\geq f(S\cup T)+f(S\cap T)\).
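On small instances, submodularity of the float objective can be checked by brute force against the equivalent definition (illustrative sketch, names mine):

```python
from itertools import combinations

def v_float(S, v):
    """v(S) = sum_j max_{i in S} v[i][j], with v(empty set) = 0."""
    if not S:
        return 0
    return sum(max(v[i][j] for i in S) for j in range(len(v[0])))

def is_submodular(v):
    """Brute-force check of f(S) + f(T) >= f(S u T) + f(S n T)
    over all pairs of subsets of banks (feasible only for tiny instances)."""
    banks = range(len(v))
    subsets = [frozenset(c) for r in range(len(v) + 1)
               for c in combinations(banks, r)]
    return all(v_float(S, v) + v_float(T, v)
               >= v_float(S | T, v) + v_float(S & T, v)
               for S in subsets for T in subsets)
```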
A more general problem
- Given a set of items \(E\)
- a monotone submodular objective function \(f(S)\forall S\subseteq E\)
- an integer \(k\)
Find a subset \(S\subseteq E\) s.t.
- \(|S|\leq k\)
- \(f(S)\) is maximum
The greedy algorithm gives a \((1-\frac{1}{e})\)-approximation for this problem.