This time: Kasper Green Larsen

No slides!

Basic Probabilistic Inequalities

The Dictionary Problem: a set $S$ of $n$ keys $x_1,\cdots,x_n$ from a universe $[U]=\{0,\cdots,U-1\}$. Goal: to store $S$ in a data structure s.t., given a query element $x\in[U]$, we can quickly determine whether $x\in S$.

Hashing with Chaining: a data structure that solves the dictionary problem (see the Python sketch after the list below). Construct an array $A$ with $m$ entries, denoted $A[0],\cdots,A[m-1]$.

  • Each array entry stores an initially empty linked list.

  • Denote the list stored at entry $i$ by $List(A[i])$.

  • Pick a random hash function $h:[U]\rightarrow[m]$.

    Assumption: $h$ is truly random and can be evaluated in constant time, i.e.,

    1. $\forall$ set of distinct elements $\{x_1,\cdots,x_k\}\subseteq[U]$ and $\forall$ set of values $v_1,\cdots,v_k\in[m]$, we have $\Pr_h[h(x_1)=v_1\wedge\cdots\wedge h(x_k)=v_k]=\frac{1}{m^k}$.
    2. We can compute $h(x)$ in $O(1)$ time.
  • After choosing $h$, we process the elements $x_1,\cdots,x_n$ of $S$. For element $x_i$, we compute $h(x_i)$ and append $x_i$ to the list $List(A[h(x_i)])$.

  • To answer whether an element $x$ is in $S$, we simply compute $h(x)$ and scan through the list $List(A[h(x)])$ to see if it is there.

  • Space: $O(m+n)$.
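A minimal runnable sketch of the structure in Python (my own illustration, not code from the lecture; the truly random $h$ is simulated by memoizing an independent uniform draw per key, which is fine for a demo but is not a real hash family):

```python
import random

class ChainedHashTable:
    """Hashing with chaining with a simulated truly random h."""

    def __init__(self, m):
        self.m = m
        self.A = [[] for _ in range(m)]  # A[0..m-1], each an initially empty list
        self._h = {}                     # memoized hash values, one per key seen

    def h(self, x):
        # Simulates a truly random h: an independent uniform value
        # in [m] is drawn for each new key and then fixed.
        if x not in self._h:
            self._h[x] = random.randrange(self.m)
        return self._h[x]

    def insert(self, x):
        self.A[self.h(x)].append(x)      # append x to List(A[h(x)])

    def query(self, x):
        return x in self.A[self.h(x)]    # scan List(A[h(x)])
```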

Linearity of Expectation

We analyze the expected query time of hashing with chaining.

The work spent when answering a query $x$ is proportional to the number of elements from $S$ stored in $List(A[h(x)])$.

Bad idea: compute $\mathbb E[|List(A[h(x)])|]$ directly (too hard):

$$\mathbb E[|List(A[h(x)])|]=\sum_{i=0}^ni\cdot\Pr[|List(A[h(x)])|=i]=\sum_{i=0}^ni\cdot\binom{n}{i}\Big(\frac{1}{m}\Big)^i\Big(1-\frac{1}{m}\Big)^{n-i}$$

Good idea: break the chain length $X=|List(A[h(x)])|$ into a sum of indicators.

Define a random variable $X_i$ for each $x_i$: $X_i=1$ if $h(x_i)=h(x)$, else $X_i=0$.

The expected query time is $O(\mathbb E_h[\sum_iX_i])$.

Definition of expectation: $\mathbb E[X]=\sum_x\Pr[X=x]\cdot x$, where the sum ranges over the values $X$ can take.

Linearity of expectation: $\mathbb E[\sum_iX_i]=\sum_i\mathbb E[X_i]$.

Thus, by linearity of expectation,

$$\mathbb E_h\Big[\sum_iX_i\Big]=\sum_i\mathbb E_h[X_i],\qquad\mathbb E_h[X_i]=\Pr[h(x_i)=h(x)]\cdot1+\Pr[h(x_i)\neq h(x)]\cdot0=\Pr[h(x_i)=h(x)]$$

If $x_i=x$, then $\Pr[h(x_i)=h(x)]=1$. If $x_i\neq x$, then $\Pr[h(x_i)=h(x)]=\frac{1}{m}$.

Thus,

$$\sum_i\mathbb E_h[X_i]\le1+\sum_{i:x_i\neq x}\mathbb E_h[X_i]\le1+\frac{n}{m}$$

Choosing $m=\Theta(n)$, the expected query time is $O(1)$ and the space usage is $O(n)$.
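A quick Monte Carlo check of the $1+\frac{n}{m}$ bound (a sketch with arbitrary parameters; the query is taken to be $x=x_1\in S$):

```python
import random

def avg_chain_hits(n, m, trials=2000):
    """Average of sum_i X_i, i.e. the number of elements landing in
    the query's bucket, for the query x = x_1 (which is in S)."""
    total = 0
    for _ in range(trials):
        h = [random.randrange(m) for _ in range(n)]  # h(x_1), ..., h(x_n)
        total += sum(1 for v in h if v == h[0])      # sum_i X_i with x = x_1
    return total / trials

# Should hover around 1 + (n - 1)/m <= 1 + n/m = 2 for m = n.
print(avg_chain_hits(n=1000, m=1000))
```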

Linearity of expectation is also valid if the random variables are not independent!!!

Markov’s Inequality

Let $X$ be a non-negative discrete random variable. $\forall t\gt0$, we have $\Pr[X\gt t]\lt\frac{\mathbb E[X]}{t}$ and $\Pr[X\ge t]\le\frac{\mathbb E[X]}{t}$.

Proof for the continuous version, where $f$ denotes the density of $X$:

$$\mathbb E[X]=\int_0^\infty xf(x)\,dx\ \Rightarrow\ \forall t\gt0,\ \mathbb E[X]\ge\int_t^\infty xf(x)\,dx\ge\int_t^\infty tf(x)\,dx=t\int_t^\infty f(x)\,dx=t\cdot\Pr[X\ge t]$$

Conclusion: If we insert $n$ elements into the hashing with chaining data structure and then ask one query $x$, then the linked list $List(A[h(x)])$ that we have to scan through contains more than $t(1+\frac{n}{m})$ elements with probability less than $\frac{1}{t}$.

$$\Pr\Big[X\gt t\Big(1+\frac{n}{m}\Big)\Big]\lt\frac{\mathbb E[X]}{t(1+\frac{n}{m})}\le\frac{1+\frac{n}{m}}{t(1+\frac{n}{m})}=\frac{1}{t}$$

Markov’s Inequality gives a loose bound: $\Pr[|List(A[h(x)])|=n]\le(1+n/m)/n=O(1/n)$ for $m=\Theta(n)$.
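How loose, concretely? For a query $x\notin S$, the chain length $X=|List(A[h(x)])|$ is $\mathrm{Binomial}(n,\frac{1}{m})$, and its actual tail sits far below Markov's $\frac{1}{t}$. A small empirical sketch (parameters illustrative):

```python
import random

def markov_vs_reality(n=1000, m=1000, t=4, trials=5000):
    mean_bound = 1 + n / m                 # the bound on E[X] used above
    hits = 0
    for _ in range(trials):
        x = sum(random.randrange(m) == 0 for _ in range(n))  # Binomial(n, 1/m)
        hits += x > t * mean_bound
    print(f"empirical Pr[X > {t}(1+n/m)] = {hits/trials:.5f}, Markov bound = {1/t:.3f}")

markov_vs_reality()  # empirical tail is essentially 0, Markov says only < 0.25
```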

Chernoff Bound

😰😰😰What the hell is this?!

For a fixed entry $i$, introduce a random variable $Y_j$ for each $x_j$: $Y_j=1$ if $h(x_j)=i$, else $Y_j=0$.

We have $|List(A[i])|=\sum_jY_j$.

The $Y_j$s are independent and $\mathbb E[Y_j]=\frac{1}{m}$.

Chernoff Bound: Let $\{X_i\}$ be a set of independent Boolean random variables and let $X=\sum_iX_i$. Then,

$\forall\mu\le\mathbb E[X]$, $\forall\,0\lt\delta\lt1$ we have

$$\Pr[X\lt(1-\delta)\mu]\lt e^{-\frac{\delta^2\mu}{2}}$$

$\forall\mu\ge\mathbb E[X]$, $\forall\,0\lt\delta\lt1$ we have

$$\Pr[X\gt(1+\delta)\mu]\lt e^{-\frac{\delta^2\mu}{3}}$$

Finally, $\forall\delta\ge1$, $\forall\mu\ge\mathbb E[X]$, we have

$$\Pr[X\gt(1+\delta)\mu]\lt\left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu$$

With $Y=|List(A[i])|=\sum_jY_j$, we have $\mathbb E[Y]=\frac{n}{m}$, and $\forall\delta\ge1$ the Chernoff bound gives:

$$\Pr\Big[Y\gt(1+\delta)\frac{n}{m}\Big]\lt\left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^{\frac{n}{m}}$$

Application in our scenario: $m=n$, $t=1+\delta\ge2$, $\mu=\mathbb E[Y]=\frac{n}{m}=1$:

$$\Pr[|List(A[i])|\gt t]=\Pr[Y\gt t]\lt\frac{e^{t-1}}{t^t}\lt\Big(\frac{e}{t}\Big)^t$$

Conclusion: we spend more than $O(t)$ time with probability decreasing as fast as $t^{-t}$.
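Evaluating both tail bounds for $m=n$ makes the gap concrete; the snippet below just evaluates the two formulas (Markov in the form $\Pr[X\gt s]\lt\frac{2}{s}$ from the earlier conclusion with $1+\frac{n}{m}=2$):

```python
import math

# Markov (s = 2t above):  Pr[chain > s] < 2/s
# Chernoff:               Pr[chain > s] < (e/s)**s   for s >= 2
for s in [4, 8, 16, 32]:
    print(f"s = {s:2d}:  Markov < {2/s:.3f},  Chernoff < {(math.e/s)**s:.3e}")
```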

Proof of Chernoff Bound

By Markov’s inequality,

$$\forall t\gt0:\ \Pr[X\gt a]=\Pr[e^{tX}\gt e^{ta}]\lt\frac{\mathbb E[e^{tX}]}{e^{ta}}$$

Denote $M_X(t)=\mathbb E[e^{tX}]$. By independence, $M_X(t)=\prod_{i=1}^nM_{X_i}(t)$, and with $p_i=\Pr[X_i=1]$,

$$M_{X_i}(t)=p_ie^t+(1-p_i)=1+p_i(e^t-1)\le e^{p_i(e^t-1)}$$

$$\therefore\Pr[X\gt(1+\delta)\mu]\lt\frac{\mathbb E[e^{tX}]}{e^{t(1+\delta)\mu}}=\frac{\prod_{i=1}^nM_{X_i}(t)}{e^{t(1+\delta)\mu}}\le\frac{e^{(e^t-1)\sum_ip_i}}{e^{t(1+\delta)\mu}}=\frac{e^{(e^t-1)\mathbb E[X]}}{e^{t(1+\delta)\mu}}\le\left[\frac{e^{e^t-1}}{e^{t(1+\delta)}}\right]^\mu$$

(the last step uses $\mathbb E[X]\le\mu$ and $e^t-1\ge0$)

Let $t=\ln(1+\delta)\gt0$, we have:

$$\Pr[X\gt(1+\delta)\mu]\lt\left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu$$

$$\Rightarrow\ln\left[\left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu\right]=\mu(\delta-(1+\delta)\ln(1+\delta))$$

$$\because\forall x\ge0,\ \ln(1+x)\ge\frac{x}{1+\frac{x}{2}}\qquad\therefore\mu(\delta-(1+\delta)\ln(1+\delta))\le-\frac{\delta^2}{2+\delta}\mu\lt-\frac{\delta^2\mu}{3}\quad(\because0\lt\delta\lt1)$$

$$\Pr[X\gt(1+\delta)\mu]\lt e^{-\frac{\delta^2\mu}{3}}\quad(0\lt\delta\lt1)$$
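The middle step spelled out, using $\frac{x}{1+\frac{x}{2}}=\frac{2x}{2+x}$:

$$\delta-(1+\delta)\ln(1+\delta)\le\delta-(1+\delta)\cdot\frac{2\delta}{2+\delta}=\frac{\delta(2+\delta)-2\delta(1+\delta)}{2+\delta}=-\frac{\delta^2}{2+\delta}$$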

Analogously, we have

$$\forall t\gt0:\ \Pr[X\lt a]=\Pr[e^{-tX}\gt e^{-ta}]\lt\frac{\mathbb E[e^{-tX}]}{e^{-ta}}\Rightarrow\cdots\Rightarrow\Pr[X\lt(1-\delta)\mu]\lt\left(\frac{e^{e^{-t}-1}}{e^{-t(1-\delta)}}\right)^\mu\quad(**)$$

Let $t=-\ln(1-\delta)\gt0$,

$$(**)=\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^\mu\quad(*)$$

$$\because\forall\,0\lt\delta\lt1,\ (1-\delta)\ln(1-\delta)\gt-\delta+\frac{\delta^2}{2}\qquad\therefore(*)\le\left(\frac{e^{-\delta}}{e^{-\delta+\frac{\delta^2}{2}}}\right)^\mu=e^{-\frac{\delta^2\mu}{2}}$$

$$\Rightarrow\Pr[X\lt(1-\delta)\mu]\lt e^{-\frac{\delta^2\mu}{2}}$$
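The logarithm inequality used above follows from the Taylor expansion $\ln(1-\delta)=-\delta-\frac{\delta^2}{2}-\frac{\delta^3}{3}-\cdots$: multiplying by $(1-\delta)$, every coefficient beyond $\frac{\delta^2}{2}$ is positive, so

$$(1-\delta)\ln(1-\delta)=-\delta+\frac{\delta^2}{2}+\frac{\delta^3}{6}+\frac{\delta^4}{12}+\cdots\gt-\delta+\frac{\delta^2}{2}\quad(0\lt\delta\lt1)$$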

Union Bound

Let $\{E_i\}$ be events. Then

$$\Pr\Big[\bigcup_iE_i\Big]\le\sum_i\Pr[E_i]$$

If we define $E_i$ as the event that $|List(A[i])|\gt4\frac{\ln n}{\ln\ln n}$, then by the Chernoff Bound,

$$\Pr[E_i]\lt\Big(\frac{e}{4}\cdot\frac{\ln\ln n}{\ln n}\Big)^{4\ln n/\ln\ln n}\lt\Big(\frac{\ln\ln n}{\ln n}\Big)^{4\ln n/\ln\ln n}$$

($n\gt3\gt e$ is needed so that $\ln\ln n\gt0$!)

😲CRAZY MATH!!!

$$\because\ \ln\ln n\le\sqrt{\ln n}\qquad\therefore\Big(\frac{\ln\ln n}{\ln n}\Big)^{\frac{4\ln n}{\ln\ln n}}\le\Big(\frac{\sqrt{\ln n}}{\ln n}\Big)^{\frac{4\ln n}{\ln\ln n}}=(\ln n)^{-\frac{2\ln n}{\ln\ln n}}=(e^{\ln\ln n})^{-\frac{2\ln n}{\ln\ln n}}=e^{-2\ln n}=n^{-2}$$
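A one-line numeric sanity check of this chain of inequalities (the choice $n=10^6$ is arbitrary):

```python
import math

n = 10**6
ln_n, lnln_n = math.log(n), math.log(math.log(n))
bound = (lnln_n / ln_n) ** (4 * ln_n / lnln_n)
print(bound, n**-2, bound <= n**-2)   # ~6.7e-16 <= 1e-12 -> True
```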

By the union bound, we have

$$\Pr\Big[\bigcup_iE_i\Big]\le\sum_{i=0}^{n-1}n^{-2}=n\cdot n^{-2}=\frac{1}{n}$$

(there are $m=n$ lists, hence $n$ events).

This means that with probability at least $1-\frac{1}{n}$, there is not even a single list $List(A[i])$ with size more than $4\frac{\ln n}{\ln\ln n}$, assuming $n\ge3$.

In this case, the worst case query time of the data structure is $O(\frac{\log n}{\log\log n})$.

Hashing with Chaining with $m=n$: the worst case query time is $O(\frac{\log n}{\log\log n})$ with probability at least $1-\frac{1}{n}$ when $h$ is truly random.

The construction time is $O(n)$ and the space is $O(n)$.

Worst Case

After inserting $x_1,\cdots,x_n$, if one of the lists exceeds size $4\frac{\ln n}{\ln\ln n}$, we discard the current data structure and try again with a freshly chosen $h$.

This performs at least $i$ iterations with probability no more than $(\frac{1}{n})^{i-1}$; thus the expected construction time is upper bounded by

$$\sum_{i=1}^\infty\Big(\frac{1}{n}\Big)^{i-1}\cdot O(n)=O(n)$$

We have thus achieved a worst case query time of $O(\log n/\log\log n)$ with space $O(n)$ and expected construction time $O(n)$.
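A sketch of this rebuild-until-good construction (names illustrative, truly random $h$ simulated as before, $m=n$):

```python
import math
import random

def build(keys):
    """Rebuild with a fresh random h until no chain exceeds
    4*ln(n)/ln(ln(n)). Each attempt costs O(n), and a retry happens
    with probability at most 1/n, so expected construction is O(n)."""
    n = len(keys)
    threshold = 4 * math.log(n) / math.log(math.log(n))
    while True:
        h = {x: random.randrange(n) for x in keys}  # fresh truly random h
        A = [[] for _ in range(n)]
        for x in keys:
            A[h[x]].append(x)
        if max(len(chain) for chain in A) <= threshold:
            return A, h

A, h = build(list(range(1000)))
print(max(len(chain) for chain in A))  # at most 4*ln(1000)/ln(ln(1000)) ≈ 14.3
```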

O-Notation

Chinese has a nice corresponding concept: “数量级” (order of magnitude).

For two functions $f:\mathbb R\rightarrow\mathbb R$ and $g:\mathbb R\rightarrow\mathbb R$, we can write $f(n)=O(g(n))$ if:

  • There exist constants $n_0\in\mathbb R$ and $c\gt0$, s.t. $\forall n\gt n_0$, we have $f(n)\le c\cdot g(n)$.

    e.g. $n^3/6-n^2/2=O(n^3)$

Sums of the same magnitude: if $\forall i=1,\cdots,k$, $f_i(n)=O(g(n))$, then $\sum_{i=1}^kf_i(n)=O(g(n))$ (for a constant $k$).

Constants don’t matter!

Products: if $f(n)=O(h(n))$ and $g(n)=O(l(n))$, then

$$f(n)\cdot g(n)=O(h(n)\cdot l(n))$$

Base of Logs: the base of a logarithm doesn't matter, since $\log_an=\frac{\log_bn}{\log_ba}$ changes it only by a constant factor! We simply write $f(n)=O(\log n)$.

Careful with exponents!!! $2\log_2n=O(\log n)$, but $2^{2\log_2n}=O(n^2)$
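The reason: a constant factor inside an exponent changes the polynomial degree, not just a leading constant:

$$2^{2\log_2n}=\left(2^{\log_2n}\right)^2=n^2,\qquad\text{whereas}\quad2^{\log_2n}=n.$$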