Randomized Algorithms 9: Prophets, Secretaries and Martingales
The first part of these notes, on the prophet inequality and the secretary problem, follows lecture notes by Anupam Gupta (CMU).
Prophet inequalities and secretary problems are two classes of problems where online decision-making meets stochasticity: in the first, the inputs are random variables with known distributions, whereas in the second, the inputs are worst-case but revealed to the algorithm (the decision maker) in uniformly random order.
Here are some results, proofs and techniques about them.
The Prophet Inequality
I learned this before in an algorithmic game theory course; see my corresponding lecture notes for details.
There are n non-negative independent random variables X_1,X_2,\dots,X_n. We know their distributions up-front, but not their realizations. The realizations are revealed one-by-one.
We want to give a strategy that, upon seeing the value X_i, decides either to choose it, in which case we get reward X_i and the process stops, or to pass, in which case we move on to the next random variable and are not allowed to come back to X_i ever again. We want to maximize our expected reward. If X_\max=\max\{X_1,X_2,\cdots,X_n\}, it is clear that our reward can’t exceed \mathbb E[X_\max].
Theorem 1 (Krengel, Sucheston and Garling). There exists a strategy with expected reward at least \frac{1}{2}\mathbb E[X_\max].
Let \tau be the median of the distribution of X_\max, i.e., \Pr[X_\max\ge\tau]=1/2.
Now the strategy is simple: pick the first X_i which exceeds \tau. Clearly, we pick some item with probability exactly \Pr[X_\max\ge\tau]=1/2. To bound the reward, write x^+:=\max\{x,0\} and notice that
\mathbb E[X_\max]\le\tau+\mathbb E[(X_\max-\tau)^+]\\ \le\tau+\mathbb E[\sum_{i=1}^n(X_i-\tau)^+].
Meanwhile, for the algorithm:
ALG\ge\tau\cdot\Pr[X_\max\ge\tau]+\sum_{i=1}^n\mathbb E[(X_i-\tau)^+]\cdot\Pr[\bigwedge_{j\lt i}(X_j\lt\tau)]\\ \ge\tau\cdot\Pr[X_\max\ge\tau]+\sum_{i=1}^n\mathbb E[(X_i-\tau)^+]\cdot\Pr[X_\max\lt\tau]
Since \Pr[X_\max\lt\tau]=\Pr[X_\max\ge\tau]=1/2 by the choice of \tau, we have
ALG\ge\frac{1}{2}\mathbb E[X_\max]
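To make the threshold rule concrete, here is a minimal Python sketch (the exponential distributions and all parameter choices are illustrative, not from the notes): it estimates the median \tau of X_\max by simulation, runs the "stop at the first value above \tau" strategy, and compares the average reward against \mathbb E[X_\max].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

def sample_instance():
    # one realization of X_1, ..., X_n; exponential distributions chosen purely for illustration
    return rng.exponential(scale=np.arange(1, n + 1) / n)

# Estimate the median tau of X_max by simulation.
samples_of_max = np.array([sample_instance().max() for _ in range(20000)])
tau = np.median(samples_of_max)

def threshold_strategy(xs, tau):
    # pick the first realization exceeding tau; reward 0 if none does
    for x in xs:
        if x >= tau:
            return x
    return 0.0

rewards = np.array([threshold_strategy(sample_instance(), tau) for _ in range(20000)])
print("E[X_max] ~", samples_of_max.mean())
print("ALG      ~", rewards.mean())   # should be at least about half of E[X_max]
```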
The proof is beautiful, but hard to generalize. What if we can choose k variables?
There is an LP-based version of the proof, which generalizes more easily:
Define p_i as the probability that element i takes on the highest value (breaking ties arbitrarily). Hence \sum_ip_i=1. Moreover, suppose \tau_i is s.t. \Pr[X_i\ge\tau_i]=p_i, i.e., \tau_i marks the top p_i-quantile of X_i. Define
v_i(p_i):=\mathbb E[X_i\mid X_i\ge\tau_i]
as the value of X_i conditioned on it lying in the top p_i-quantile. Clearly, \mathbb E[X_\max]\le\sum_iv_i(p_i)\cdot p_i, since \mathbb E[X_\max]=\sum_i\mathbb E[X_i\cdot\mathbf 1\{X_i\text{ is the max}\}] and, among all events of probability p_i, the event \{X_i\ge\tau_i\} maximizes \mathbb E[X_i\cdot\mathbf 1\{\cdot\}].
A simpler weaker bound. Here’s a simple algorithm that gets value \frac{1}{4}\sum_iv_i(p_i)\cdot p_i\ge\frac{1}{4}\mathbb E[X_\max].
If we haven’t chosen an item among 1,\dots,i-1, then when looking at item i, pass with probability half without even looking at X_i, else accept it if X_i\ge\tau_i.
Definition: we “reach” item i if we have not picked an item before i. The expected value of the algorithm is
\sum_i\Pr[\text{reach }i]\cdot\frac12\cdot\Pr[X_i\ge\tau_i]\cdot\mathbb E[X_i\mid X_i\ge\tau_i]=\sum_i\Pr[\text{reach }i]\cdot\frac12\,p_i\,v_i(p_i).
Since we pick each item with probability at most p_i/2, the expected number of items we choose is at most \frac12\sum_ip_i=\frac12. So, by Markov’s inequality, the probability we pick no item at all is at least half. Hence, the probability we reach item i is at least one half too, and the above expression is at least \frac14\sum_ip_iv_i(p_i), as claimed.
The bound of 2. To improve the algorithm, suppose we denote the probability of reaching item i by r_i, and suppose we reject item i outright with probability 1-\delta_i (instead of 1/2). Then the same argument shows that
ALG\ge\sum_ir_i\,\delta_i\,p_i\,v_i(p_i).
Above, we ensured that r_i\delta_i\ge1/4, and hence lost a factor of 4. But if we ensure that r_i\delta_i=1/2 for every i, we would get the desired result of \frac{1}{2}\mathbb E[X_\max]. For the first item r_1=1, and hence we can set \delta_1=1/2.
For future items, since
r_{i+1}=r_i-\Pr[\text{pick item }i]=r_i-r_i\delta_ip_i=r_i-\tfrac12p_i,
we can just set \delta_{i+1}=\frac{1}{2r_{i+1}}. A simple induction (summing up this equation) shows that r_{i+1}=1-\frac12\sum_{j\le i}p_j\ge\frac12, so that \delta_{i+1}\le1 and is well-defined.
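Here is a minimal sketch of this adaptive rule in Python, assuming the p_i and \tau_i are estimated by simulation (the instance below is hypothetical): it maintains r_i and sets \delta_i=1/(2r_i).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
scales = np.linspace(0.5, 2.0, n)          # illustrative instance: X_i ~ Exponential(scale_i)

# Estimate p_i = Pr[X_i is the maximum] and tau_i with Pr[X_i >= tau_i] = p_i.
sims = rng.exponential(scales, size=(200000, n))
p = np.bincount(sims.argmax(axis=1), minlength=n) / sims.shape[0]
tau = np.array([np.quantile(sims[:, i], 1 - p[i]) for i in range(n)])

def adaptive_strategy(xs):
    r = 1.0                                # r_i: probability of reaching the current item
    for i, x in enumerate(xs):
        delta = min(1.0, 1.0 / (2 * r))    # delta_i = 1 / (2 r_i)
        if rng.random() < delta and x >= tau[i]:
            return x                       # inspected item i and it cleared its threshold
        r -= r * delta * p[i]              # r_{i+1} = r_i - r_i * delta_i * p_i
    return 0.0

rewards = np.array([adaptive_strategy(rng.exponential(scales)) for _ in range(50000)])
print("ALG ~", rewards.mean(), " vs  E[X_max] ~", sims.max(axis=1).mean())
```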
The Computational Aspect
If the distributions of the random variable stream are explicitly given, the proofs above immediately give us algorithms. But what if the X_i are given via a black box (we can only access them via samples)?
The first proof merely relies on finding the median: finding an approximate median \hat\tau s.t. \Pr[X_\max\lt\hat\tau]\in(\frac{1}{2}-\epsilon,\frac{1}{2}+\epsilon) gives us expected reward \frac{1}{2+O(\epsilon)}\mathbb E[X_\max]. To do this, sample from X_\max O(\epsilon^{-2}\log\delta^{-1}) times and take the sample median: it is an approximate median with probability at least 1-\delta.
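A sketch of this sampling step (the failure probability \delta and the black-box distribution below are illustrative): draw O(\epsilon^{-2}\log\delta^{-1}) independent copies of X_\max and return their empirical median.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def approx_median_of_max(sample_max, eps=0.05, delta=0.01):
    """Estimate the median of X_max from black-box samples.

    sample_max: a function returning one sample of X_max = max_i X_i.
    With O(eps^-2 * log(1/delta)) samples, the empirical median is an
    eps-approximate median with probability at least 1 - delta.
    """
    m = math.ceil(eps ** -2 * math.log(1 / delta))
    return np.median([sample_max() for _ in range(m)])

# Illustrative black box: n = 30 independent Exp(1) variables.
tau_hat = approx_median_of_max(lambda: rng.exponential(1.0, size=30).max())
print("approximate median of X_max:", tau_hat)
```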
For the second proof, there are two ways of making it algorithmic. Firstly, if the quantities p_i are polynomially bounded (i.e., p_i\ge1/\mathrm{poly}(n)), estimate the p_i and v_i(p_i) by sampling. Alternatively, solve the convex program
\max\Big\{\sum_iq_i\,v_i(q_i)\;:\;\sum_iq_i=1,\ q_i\ge0\Big\}
and use the q_i’s from its solution in lieu of the p_i’s in the algorithm above.
Since getting samples from the distributions may be expensive, there are methods to get a constant fraction of \mathbb E[X_\max] via taking just one sample from each of the X_i’s.
Extensions: Picking Multiple Items
What if we are allowed to choose k variables from among the X_1,\dots,X_n, so as to maximize the expected sum of the chosen values?
If p_i is the probability that X_i is among the top k values, we have \sum_ip_i=k.
The upper bound on our quantity of interest remains essentially unchanged: with \tau_i and v_i(p_i) defined as before, the expected sum of the top k values is at most \sum_ip_i\,v_i(p_i).
What about an algorithm to get \frac14 of the value in this inequality? The same one works: reject each item outright with probability 1/2, else pick it if X_i\ge\tau_i (stopping once k items have been picked). The proof goes through unchanged.
Better
One can get within a factor 1-O(1/\sqrt k) of the value \sum_ip_iv_i(p_i); for k=1, this matches the factor 1/2. One can, however, get a factor of 1-O\big(\sqrt{(\log k)/k}\big) using a simple concentration bound.
Suppose we reduce the rejection probability from 1/2 to \delta. Then the probability that we reach some item i without having picked k items already is lower-bounded by the probability that we pick at most k-1 elements in the entire process. Since we reject items with probability \delta, the expected number of elements we pick is at most (1-\delta)\sum_ip_i=(1-\delta)k.
Hence, the probability that we pick fewer than k items is at least 1-e^{-\Theta(\delta^2k)}, by a Hoeffding bound for sums of independent random variables. Now setting \delta=\Theta\big(\sqrt{(\log k)/k}\big) ensures that the probability of reaching each item is at least 1-1/\mathrm{poly}(k), and an argument similar to that in the proof above shows that
ALG\ge\Big(1-O\big(\sqrt{(\log k)/k}\big)\Big)\sum_ip_i\,v_i(p_i),
which gives the claimed bound of 1-O\big(\sqrt{(\log k)/k}\big).
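The following Python sketch illustrates the k-item rule with a reduced rejection probability (the instance, the constant in \delta, and the estimation-by-simulation step are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 10
scales = np.linspace(0.5, 2.0, n)              # illustrative instance: X_i ~ Exponential(scale_i)

# Estimate p_i = Pr[X_i is among the top k] and tau_i with Pr[X_i >= tau_i] = p_i.
sims = rng.exponential(scales, size=(50000, n))
topk = np.argsort(-sims, axis=1)[:, :k]
p = np.bincount(topk.ravel(), minlength=n) / sims.shape[0]     # sums to k
tau = np.array([np.quantile(sims[:, i], 1 - p[i]) for i in range(n)])

delta = np.sqrt(np.log(k) / k)                 # reduced rejection probability

def pick_up_to_k(xs):
    total, picked = 0.0, 0
    for i, x in enumerate(xs):
        if picked == k:
            break
        if rng.random() >= delta and x >= tau[i]:   # inspect item i with probability 1 - delta
            total += x
            picked += 1
    return total

alg = np.mean([pick_up_to_k(rng.exponential(scales)) for _ in range(20000)])
opt = np.sort(sims, axis=1)[:, -k:].sum(axis=1).mean()
print("ALG ~", alg, " vs  E[sum of top k] ~", opt)
```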
Secretary Problems
There are n items, each of which has some intrinsic non-negative value.
Assume the values are distinct, but we know nothing about their ranges.
We only know n, the number of items.
The items are presented one by one. Upon seeing an item, we can either pick it or we can pass; decisions are irrevocable. The goal is to maximize the probability of picking the item with the highest value.
If an adversary chooses the order in which the items are presented, every deterministic strategy must fail.
❓Secretary problem: what if the items are presented in uniformly random order? For this setting, it seems surprising that one can prove the following theorem:
Theorem 2.1. There is a strategy to pick the best item with probability at least 1/e.
A simple algorithm and proof showing probability 1/4: ignore the first n/2 items, and then pick the next item that is better than all the ones seen so far. Note that this algorithm succeeds whenever the best item is in the second half of the items and the second-best item is in the first half, which happens with probability at least 1/4. Hence \Pr[\text{success}]\ge1/4. Rejecting the first half of the items is not optimal, and there are other cases where the algorithm succeeds that this simple analysis does not account for.
First, the algorithm should never pick an element that is not the best so far, since such an element cannot be the global best. Moreover, if we define
f(i):=\Pr[\text{the item at position }i\text{ is the global best}\mid\text{it is the best so far}]=\frac in,
and let g(i) denote the success probability of the optimal strategy if it passes at position i and continues, then g(i) is non-increasing and f(i) is increasing.
Any optimal strategy should pass at times i where f(i)\lt g(i) and pick at the other times if we see an element that is best so far, i.e., a “wait, then take the first best-so-far item” strategy of the type used in the simple proof of 1/4 is optimal.
So if we reject the first m items and thereafter pick the first best-so-far item, then the probability we succeed in picking the global best is
\sum_{t=m+1}^n\Pr[\text{the item at position }t\text{ is the global best and we pick it}]=\sum_{t=m+1}^n\frac1n\cdot\frac{m}{t-1}=\frac mn\sum_{t=m+1}^n\frac{1}{t-1}.
This is roughly \frac mn\ln\frac nm. So the first position where we should be willing to stop is given by the smallest m s.t. \sum_{j=m+1}^{n-1}\frac1j\le1, i.e., m\approx n/e for large n, hence the success probability tends to 1/e.
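A quick numerical check of this analysis (the value of n is an illustrative choice): compute the success probability of the “wait m, then take the first best-so-far” rule for every m and compare the best m with n/e.

```python
import numpy as np

n = 200

def success_prob(m):
    # probability of picking the global best when rejecting the first m items
    if m == 0:
        return 1.0 / n                      # the rule then just takes the first item
    return (m / n) * sum(1.0 / (t - 1) for t in range(m + 1, n + 1))

probs = [success_prob(m) for m in range(n)]
m_star = int(np.argmax(probs))
print("optimal m =", m_star, " n/e =", n / np.e)          # should be close
print("success probability =", probs[m_star], " 1/e =", 1 / np.e)
```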
There is an alternate proof that uses linear programming. We will write down an LP that captures some properties of any feasible strategy, bound the optimum of this LP, and show a strategy whose success probability is comparable to the objective of this LP.
Fix an optimal strategy. Assume w.l.o.g. (without loss of generality) that it does not pick any item that is not the best so far.
Let p_i be the probability that this strategy picks an item at position i. Let q_i be the probability that we pick the item at position i, conditioned on it being the best so far. Since the item at position i is the best so far with probability 1/i, we have p_i=q_i/i.
Now the probability of picking the best item is
\sum_i\Pr[\text{the item at position }i\text{ is the global best and we pick it}]=\sum_iq_i\cdot\frac1n=\frac1n\sum_iq_i,
since the item at position i is the global best with probability 1/n, in which case it is also the best so far, and the decision to pick it depends only on the relative order of the first i items.
Constraints: p_i=q_i/i\in[0,1]. But also
q_i=\Pr[\text{pick at position }i\mid i\text{ is best so far}]\le\Pr[\text{nothing picked among the first }i-1\mid i\text{ is best so far}].
But not picking the first i-1 items is independent of the item at position i being the best so far, so we get
q_i\le1-\sum_{j\lt i}p_j=1-\sum_{j\lt i}\frac{q_j}{j}.
Hence the success probability of any strategy is upper-bounded by the following LP in variables q_1,\dots,q_n:
\max\ \frac1n\sum_{i=1}^nq_i\quad\text{s.t.}\quad q_i\le1-\sum_{j\lt i}\frac{q_j}{j},\quad 0\le q_i\le1\ \ \forall i.
Now it can be checked that the solution q_i=0 for i\le m and q_i=\frac{m}{i-1} for i\gt m is a feasible solution, where m is, as before, the smallest value s.t. \sum_{j=m+1}^{n-1}\frac1j\le1 (roughly n/e); its objective value is again \frac mn\sum_{i=m+1}^n\frac1{i-1}\approx\frac1e.
Finally we can get a stopping strategy whose success probability matches that of the LP. Indeed, solve the LP to obtain an optimal solution (q_i^\star)_i.
For position i: if we’ve not picked an item already and if this item is the best so far, pick it with probability \frac{q_i^\star}{1-\sum_{j\lt i}q_j^\star/j}. By the LP constraint, this probability is at most 1. Moreover, removing the conditioning shows we pick an item at location i with probability q_i^\star/i, and a calculation similar to the one above shows that the algorithm’s success probability is \frac1n\sum_iq_i^\star, the same as the LP.
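To see the LP in action, here is a small sketch using scipy's linprog (the value of n is an illustrative choice); it solves the LP above and prints the optimum, which approaches 1/e.

```python
import numpy as np
from scipy.optimize import linprog

n = 100
# Variables q_1, ..., q_n.  Maximize (1/n) * sum_i q_i, i.e. minimize the negation.
c = -np.ones(n) / n

# Constraint for each position i (1-indexed): q_i + sum_{j < i} q_j / j <= 1.
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 1.0
    for j in range(i):
        A[i, j] = 1.0 / (j + 1)
b = np.ones(n)

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n, method="highs")
print("LP optimum:", -res.fun, " (1/e =", 1 / np.e, ")")
```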
Extension: Game-Theoretic Issues
In the optimal strategy, we don’t pick any items in the first \approx n/e timesteps, and then we pick items with quite varying probabilities.
Suppose we insist that for each position i, the (unconditional) probability of picking the item at position i is the same.
Let’s fix any such strategy, and write an LP capturing the success probabilities of this strategy with the uniformity condition as a constraint. Suppose p is this uniform probability. Again, let q_i be the probability of picking an item at position i, conditioned on it being the best so far. Note that we may pick items even if they are not the best so far, just to satisfy the uniformity condition; hence instead of p=q_i/i, we only have q_i/i\le p, i.e., q_i\le i\cdot p.
By the same argument as for the constraint in the previous LP, we know that q_i\le1-\sum_{j\lt i}p=1-(i-1)p.
And the strategy’s success probability is again \frac1n\sum_iq_i. So we can now solve the LP
\max\ \frac1n\sum_{i=1}^nq_i\quad\text{s.t.}\quad q_i\le i\cdot p,\quad q_i\le1-(i-1)p,\quad q_i\ge0\ \ \forall i,\ \ p\ge0.
Buchbinder et al. show that the optimal value of this LP is at least 1-\frac{1}{\sqrt2}\approx0.29.
Extension: Multiple Items
Suppose we want to pick k items, maximizing the expected sum of their values.
Suppose the set of k largest values is S^\star, and their total value is V^\star:=\sum_{i\in S^\star}v_i. It is easy to get an algorithm with expected value \Omega(V^\star).
E.g., split the n items into k groups of n/k consecutive positions each, and run the single-item algorithm separately on each of these. Or ignore the first half of the elements, look at the value \hat v of the (k/2)-th highest value item in this set, and pick all items in the second half with values greater than \hat v (up to k of them). And indeed, ignoring half the items must lose a constant factor in expectation.
But here’s an algorithm that gives value (1-o(1))\,V^\star as k\to\infty. We set \epsilon,\delta:=\Theta\big(\sqrt{(\log k)/k}\big). Ignore the first \delta n items.
Now look at the value v' of the \big((1-\epsilon)\delta k\big)-highest valued item in this ignored set, and pick the first (at most k) elements with values greater than v' among the remaining elements.
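Here is a Python sketch of this sample-then-threshold rule (the parameter choices and the heavy-tailed values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def multi_secretary(values, k):
    """One run of the sample-then-threshold rule on a uniformly random arrival order."""
    n = len(values)
    order = rng.permutation(n)
    eps = delta = np.sqrt(np.log(k) / k)        # illustrative parameter choice
    m = int(delta * n)                          # length of the ignored prefix
    r = max(1, int((1 - eps) * delta * k))      # rank of the threshold item within the prefix
    thresh = np.sort(values[order[:m]])[-r]
    picked = []
    for idx in order[m:]:
        if len(picked) == k:
            break
        if values[idx] > thresh:
            picked.append(values[idx])
    return sum(picked)

values = rng.pareto(2.0, size=2000)             # illustrative item values
k = 50
V_star = np.sort(values)[-k:].sum()
alg = np.mean([multi_secretary(values, k) for _ in range(200)])
print("ALG ~", alg, " vs  V* =", V_star)
```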
The second part of these notes, on martingales, follows lecture notes by Elias Koutsoupias (Oxford).
Martingales
The Chinese translation is “鞅”, a term I had never come across before.
Chernoff bounds cannot be used for sums of dependent variables.
If the dependency is of a particular type, the Azuma-Hoeffding inequality provides a general concentration bound.
Definition of martingale: A martingale is a sequence of random variables X_0,X_1,\dots of bounded expectation s.t. for every i\ge0,
\mathbb E[X_{i+1}\mid X_0,\dots,X_i]=X_i.
More generally, a sequence of random variables Z_0,Z_1,\dots is a martingale w.r.t. the sequence X_0,X_1,\dots if for all i\ge0 the following conditions hold:
- Z_i is a function of X_0,X_1,\dots,X_i;
- \mathbb E[|Z_i|]\lt\infty;
- \mathbb E[Z_{i+1}\mid X_0,\dots,X_i]=Z_i.
Here are some examples of martingales.
Gambler’s Fortune
A gambler plays a sequence of fair games. Let Z_i be the gambler’s total winnings immediately after the i-th game. Since the games are fair, \mathbb E[Z_{i+1}\mid Z_1,\dots,Z_i]=Z_i and \mathbb E[|Z_i|]\lt\infty, which shows that every finite sequence Z_1,\dots,Z_n is a martingale. This is a generalization of the simple random walk on a line. The gambler can use the past outcomes and any algorithm to calculate the amount of the next bet.
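As a small illustration (the doubling rule below is just one example of an adaptive betting algorithm), even a history-dependent strategy leaves the expected winnings at zero, as the martingale property predicts:

```python
import numpy as np

rng = np.random.default_rng(5)

def play(n_games):
    winnings, bet = 0.0, 1.0
    for _ in range(n_games):
        if rng.random() < 0.5:     # fair game: win or lose the current bet with prob 1/2 each
            winnings += bet
            bet = 1.0              # reset the bet after a win
        else:
            winnings -= bet
            bet *= 2.0             # double the bet after a loss (adaptive, history-dependent)
    return winnings

final = np.array([play(20) for _ in range(20000)])
print("average final winnings:", final.mean())    # close to 0, since E[Z_n] = E[Z_0] = 0
```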
Balls and Bins
Suppose that we throw m balls into n bins independently and uniformly at random.
Consider the expected number of empty bins. Let X_i be the random variable representing the bin into which the i-th ball falls. Let F be the random variable representing the number of empty bins. Then the sequence of random variables
Z_i=\mathbb E[F\mid X_1,\dots,X_i],\qquad i=0,1,\dots,m,
is a martingale. Clearly Z_i is a function of the X_j’s and has bounded expectation. Furthermore, by the tower property of conditional expectation,
\mathbb E[Z_{i+1}\mid X_1,\dots,X_i]=\mathbb E\big[\mathbb E[F\mid X_1,\dots,X_{i+1}]\mid X_1,\dots,X_i\big]=\mathbb E[F\mid X_1,\dots,X_i]=Z_i.
We can view Z_i as an estimate of F after having observed the outcomes X_1,\dots,X_i. At the beginning Z_0 is a crude estimate, simply the expectation of F. As we add more balls to the bins, the Z_i’s give improved estimates of F, and at the end we get the exact value Z_m=F.
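In this example the conditional expectation has a closed form: after i balls, each currently empty bin stays empty with probability (1-1/n)^{m-i}, so Z_i equals the current number of empty bins times (1-1/n)^{m-i}. A short sketch tracing one run of this Doob martingale (the values of n and m are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 10, 30

bins = np.zeros(n, dtype=int)
Z = [n * (1 - 1 / n) ** m]                 # Z_0 = E[F]
for i in range(1, m + 1):
    bins[rng.integers(n)] += 1             # X_i: the bin hit by the i-th ball
    empty_now = int((bins == 0).sum())
    # each currently empty bin stays empty with probability (1 - 1/n)^(m - i)
    Z.append(empty_now * (1 - 1 / n) ** (m - i))

print("Z_0 =", Z[0], " ... Z_m =", Z[-1], "= actual number of empty bins:", int((bins == 0).sum()))
```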
Doob Martingales
The balls-and-bins example is a typical Doob martingale. Doob martingales are processes in which we obtain a sequence of improved estimates of the value of a random variable as information about it is revealed progressively. More precisely, suppose F is a r.v. that is a function of random variables X_0,X_1,\dots. As we observe the sequence of random variables X_0,X_1,\dots, we improve our estimates of F. The sequence of the mean estimates
Z_i=\mathbb E[F\mid X_0,\dots,X_i]
forms a martingale w.r.t. the sequence X_0,X_1,\dots (provided that the \mathbb E[|Z_i|]’s are bounded). Indeed, when we argued that the balls-and-bins process is a martingale, we used no property of the experiment, therefore the following holds in general:
\mathbb E[Z_{i+1}\mid X_0,\dots,X_i]=\mathbb E\big[\mathbb E[F\mid X_0,\dots,X_{i+1}]\mid X_0,\dots,X_i\big]=\mathbb E[F\mid X_0,\dots,X_i]=Z_i.
Azuma-Hoeffding Inequality
Let X_0,X_1,\dots,X_n be a martingale s.t. |X_i-X_{i-1}|\le c_i for every i.
Then for any \lambda\gt0,
\Pr[X_n-X_0\ge\lambda]\le\exp\Big(-\frac{\lambda^2}{2\sum_{i=1}^nc_i^2}\Big)
and
\Pr[X_n-X_0\le-\lambda]\le\exp\Big(-\frac{\lambda^2}{2\sum_{i=1}^nc_i^2}\Big).
Proof of the first inequality (the proof of the second is similar): apply Markov’s inequality to an appropriate random variable, much as in the proof of the Chernoff bounds.
We use the variables Y_i:=X_i-X_{i-1}. The steps of the proof:
- As with the Chernoff bounds, instead of bounding \Pr[X_n-X_0\ge\lambda] directly, we bound \mathbb E[e^{t(X_n-X_0)}] for a parameter t\gt0:
\Pr[X_n-X_0\ge\lambda]=\Pr[e^{t(X_n-X_0)}\ge e^{t\lambda}]\le\frac{\mathbb E[e^{t(X_n-X_0)}]}{e^{t\lambda}}.
Then focus on \mathbb E[e^{t(X_n-X_0)}], which can be expressed via the Y_i’s:
by telescoping, X_n-X_0=\sum_{i=1}^nY_i, so e^{t(X_n-X_0)}=\prod_{i=1}^ne^{tY_i}.
- We can’t use \mathbb E[\prod_ie^{tY_i}]=\prod_i\mathbb E[e^{tY_i}] since the Y_i’s are not independent. Consider the conditional expectation instead:
\mathbb E\Big[\prod_{i=1}^ne^{tY_i}\,\Big|\,X_0,\dots,X_{n-1}\Big]=\Big(\prod_{i=1}^{n-1}e^{tY_i}\Big)\cdot\mathbb E[e^{tY_n}\mid X_0,\dots,X_{n-1}],
because for fixed X_0,\dots,X_{n-1}, all but the last factor in the product are constant and can be moved out.
Then find an upper bound on \mathbb E[e^{tY_i}\mid X_0,\dots,X_{i-1}].
Observe that \mathbb E[Y_i\mid X_0,\dots,X_{i-1}]=0 by the martingale property.
Then by the condition |X_i-X_{i-1}|\le c_i, we have Y_i\in[-c_i,c_i]. Rewrite Y_i as Y_i=-c_i\alpha_i+c_i(1-\alpha_i), where \alpha_i=\frac{c_i-Y_i}{2c_i}\in[0,1], then use the convexity of e^{ty} to get
e^{tY_i}\le\alpha_ie^{-tc_i}+(1-\alpha_i)e^{tc_i}.
Combine the above (take the conditional expectation and use \mathbb E[Y_i\mid X_0,\dots,X_{i-1}]=0) to get
\mathbb E[e^{tY_i}\mid X_0,\dots,X_{i-1}]\le\frac{e^{tc_i}+e^{-tc_i}}{2}\le e^{t^2c_i^2/2}.
It follows that
\mathbb E\Big[\prod_{i=1}^ne^{tY_i}\,\Big|\,X_0,\dots,X_{n-1}\Big]\le\Big(\prod_{i=1}^{n-1}e^{tY_i}\Big)\cdot e^{t^2c_n^2/2}.
- Then take expectations on both sides to get rid of the conditional expectation:
\mathbb E\Big[\prod_{i=1}^ne^{tY_i}\Big]\le\mathbb E\Big[\prod_{i=1}^{n-1}e^{tY_i}\Big]\cdot e^{t^2c_n^2/2}.
- Then repeat the same argument for the remaining product.
- By induction:
\mathbb E[e^{t(X_n-X_0)}]\le\prod_{i=1}^ne^{t^2c_i^2/2}=e^{t^2\sum_{i=1}^nc_i^2/2}.
- Therefore
\Pr[X_n-X_0\ge\lambda]\le e^{t^2\sum_{i=1}^nc_i^2/2-t\lambda}.
- Set t=\lambda/\sum_{i=1}^nc_i^2 to minimize the above expression and get the bound of the theorem.
Applying the AH Bound to the Gambler’s Fortune
A gambler plays a sequence of fair games. We have seen that if Z_i denotes the gambler’s total winnings immediately after the i-th game, the sequence Z_1,Z_2,\dots is a martingale. Suppose that the gambler has a very sophisticated algorithm to decide the amount that he bets every day, taking into account past bets and outcomes. Since the games are fair, the expected gain is zero: \mathbb E[Z_n]=0.
However, the winnings are not concentrated around this mean value in general.
For example, consider the case where the gambler places higher and higher bets. Suppose now that there is a bound c_i on the size of the i-th bet, so that |Z_i-Z_{i-1}|\le c_i. By the AH inequality, the final winnings are concentrated with high probability in [-\lambda,\lambda], where \lambda=O\big(\sqrt{\sum_{i=1}^nc_i^2}\big); concretely, \Pr[|Z_n|\ge\lambda]\le2e^{-\lambda^2/(2\sum_ic_i^2)}.
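A small numerical sanity check (unit bets c_i = 1 and the parameters below are illustrative): simulate a gambler with bounded bets and compare the empirical tail of Z_n with the Azuma-Hoeffding bound.

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 100, 50000                    # n fair games, each bet bounded by c_i = 1

# With unit bets, Z_n is a sum of n independent +/-1 steps, so |Z_i - Z_{i-1}| <= 1.
Z_n = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)

lam = 3 * np.sqrt(n)                      # a deviation of order sqrt(sum_i c_i^2)
empirical = (np.abs(Z_n) >= lam).mean()
ah_bound = 2 * np.exp(-lam ** 2 / (2 * n))
print(f"Pr[|Z_n| >= {lam:.0f}]: empirical = {empirical:.5f}, AH bound = {ah_bound:.5f}")
```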