This time: Kasper Green Larsen
No slides!
Basic Probabilistic Inequalities

The Dictionary Problem: a set $S$ of $n$ keys $x_1,\cdots,x_n$ from a universe $[U]=\{0,\cdots,U-1\}$. Goal: store $S$ in a data structure such that, given a query element $x\in[U]$, we can quickly determine whether $x\in S$.
Hashing with Chaining: a data structure that solves the dictionary problem. Construct an array $A$ with $m$ entries, denoted $A[0],\cdots,A[m-1]$.
Each array entry stores an initially empty linked list.
Denote the list stored at entry $i$ by $List(A[i])$.

Pick a random hash function $h:[U]\rightarrow[m]$.
Assumption: $h$ is truly random and can be evaluated in constant time, i.e., for every set of distinct elements $\{x_1,\cdots,x_k\}\subseteq[U]$ and every set of values $v_1,\cdots,v_k\in[m]$, we have $\Pr_h[h(x_1)=v_1\wedge\cdots\wedge h(x_k)=v_k]=\frac{1}{m^k}$, and we can compute $h(x)$ in $O(1)$ time.

After choosing $h$, we insert the elements $x_1,\cdots,x_n$ of $S$: for each element $x_i$, we compute $h(x_i)$ and append $x_i$ to the list $List(A[h(x_i)])$.
To answer whether an element $x$ is in $S$, we simply compute $h(x)$ and scan through the list $List(A[h(x)])$ to see if it is there.

Space: $O(m+n)$.
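As a concrete reference, here is a minimal Python sketch of the construction. Python's built-in `hash` with a random seed is only a stand-in for the truly random $h$ assumed above, so this illustrates the structure, not the probabilistic guarantee:

```python
import random

class ChainedHashTable:
    """Hashing with chaining: an array A of m entries, each holding a list."""

    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]   # m initially empty lists
        self.seed = random.getrandbits(64)    # "picking a random h"

    def _h(self, x):
        # Stand-in for a truly random h: [U] -> [m]; no fixed seeded
        # function is truly random, this is an assumption for illustration.
        return hash((self.seed, x)) % self.m

    def insert(self, x):
        self.table[self._h(x)].append(x)      # append x to List(A[h(x)])

    def query(self, x):
        return x in self.table[self._h(x)]    # scan List(A[h(x)])
```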
Linearity of Expectation

We now analyze the expected query time of hashing with chaining.

The work spent when answering the query $x$ is proportional to the number of elements from $S$ also stored in $List(A[h(x)])$.
Bad idea: compute $\mathbb E[|List(A[h(x)])|]$ directly (too hard):

$$\mathbb E[|List(A[h(x)])|]=\sum_{i=0}^ni\cdot\Pr[|List(A[h(x)])|=i]=\sum_{i=0}^ni\cdot\binom{n}{i}\Big(\frac{1}{m}\Big)^i\Big(1-\frac{1}{m}\Big)^{n-i}$$
Good idea: break $X=|List(A[h(x)])|$ into a sum.

Define a random variable $X_i$ for each $x_i$: if $h(x_i)=h(x)$ then $X_i=1$, else $X_i=0$.

The expected query time is $O(\mathbb E_h[\sum_iX_i])$.
Definition of expectation: $\mathbb E[X]=\sum_{x}\Pr[X=x]\cdot x$, where the sum ranges over the support of $X$.

Linearity of expectation: $\mathbb E[\sum_iX_i]=\sum_i\mathbb E[X_i]$.
Thus, by linearity of expectation,

$$\mathbb E_h\Big[\sum_iX_i\Big]=\sum_i\mathbb E_h[X_i],\qquad\mathbb E_h[X_i]=\Pr[h(x_i)=h(x)]\cdot1+\Pr[h(x_i)\neq h(x)]\cdot0=\Pr[h(x_i)=h(x)]$$

If $x_i=x$, then $\Pr[h(x_i)=h(x)]=1$; if $x_i\neq x$, then $\Pr[h(x_i)=h(x)]=\frac{1}{m}$.

Thus,

$$\sum_i\mathbb E_h[X_i]\le1+\sum_{i:x_i\neq x}\mathbb E_h[X_i]\le1+\frac{n}{m}$$
Choosing $m=\Theta(n)$ gives expected query time $O(1)$ and space usage $O(n)$.

Linearity of expectation also holds when the random variables are not independent!!!
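A quick empirical check of the $1+\frac{n}{m}$ bound, reusing the hypothetical `ChainedHashTable` above (the parameters are illustrative):

```python
import random

n, m, trials = 1000, 1000, 200
total = 0
for _ in range(trials):
    t = ChainedHashTable(m)
    for x in random.sample(range(10**9), n):  # n distinct keys from the universe
        t.insert(x)
    q = random.randrange(10**9)               # query element (almost surely not in S)
    total += len(t.table[t._h(q)])            # length of List(A[h(q)])
print(total / trials, "vs bound", 1 + n / m)  # average chain length, about 1
```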
Markov's Inequality

Let $X$ be a non-negative discrete random variable. For all $t\gt0$, we have $\Pr[X\gt t]\lt\frac{\mathbb E[X]}{t}$ and $\Pr[X\ge t]\le\frac{\mathbb E[X]}{t}$.
Proof for the continuous version:

$$\mathbb E[X]=\int_0^\infty xf(x)\,dx\qquad\Rightarrow\qquad\forall t\gt0:\ \mathbb E[X]\ge\int_t^\infty xf(x)\,dx\ge\int_t^\infty tf(x)\,dx=t\int_t^\infty f(x)\,dx=t\cdot\Pr[X\ge t]$$
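The discrete version stated above follows the same pattern, replacing the integral with a sum over the (non-negative) support of $X$:

$$\mathbb E[X]=\sum_{x}x\cdot\Pr[X=x]\ge\sum_{x\ge t}x\cdot\Pr[X=x]\ge t\sum_{x\ge t}\Pr[X=x]=t\cdot\Pr[X\ge t]$$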
Conclusion: if we insert $n$ elements into the hashing-with-chaining data structure and then ask one query $x$, the linked list $List(A[h(x)])$ that we have to scan through contains more than $t(1+\frac{n}{m})$ elements with probability less than $\frac{1}{t}$:

$$\Pr\Big[X\gt t\Big(1+\frac{n}{m}\Big)\Big]\lt\frac{\mathbb E[X]}{t(1+\frac{n}{m})}\le\frac{1+\frac{n}{m}}{t(1+\frac{n}{m})}=\frac{1}{t}$$

Markov's inequality only gives a loose bound: $\Pr[|List(A[h(x)])|=n]\le(1+n/m)/n=O(1/n)$ when $m=\Theta(n)$.
Chernoff Bound

😰😰😰 What the hell is this?!
Fix an entry $i$ and introduce a random variable $Y_j$ for each $x_j$: $Y_j=1$ if $h(x_j)=i$, else $Y_j=0$.

We have $|List(A[i])|=\sum_jY_j$.

The $Y_j$'s are independent and $\mathbb E[Y_j]=\frac{1}{m}$.
Chernoff Bound: Let $\{X_i\}$ be a set of independent Boolean random variables and let $X=\sum_iX_i$. Then,

for all $\mu\le\mathbb E[X]$ and $0\lt\delta\lt1$, we have

$$\Pr[X\lt(1-\delta)\mu]\lt e^{-\frac{\delta^2\mu}{2}}$$

for all $\mu\ge\mathbb E[X]$ and $0\lt\delta\lt1$, we have

$$\Pr[X\gt(1+\delta)\mu]\lt e^{-\frac{\delta^2\mu}{3}}$$

Finally, for all $\delta\ge1$ and $\mu\ge\mathbb E[X]$, we have

$$\Pr[X\gt(1+\delta)\mu]\lt\Big(\frac{e^\delta}{(1+\delta)^{1+\delta}}\Big)^\mu$$
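A numeric sanity check of the second bound against an exact binomial tail; the fair-coin setup and $n=1000$ are illustrative choices, not from the lecture:

```python
from math import comb, exp

# X ~ Bin(n, 1/2): a sum of n independent fair Boolean variables.
n = 1000
mu = n / 2                                   # E[X]
for delta in (0.05, 0.1, 0.2):
    cut = (1 + delta) * mu
    exact = sum(comb(n, k) for k in range(int(cut) + 1, n + 1)) * 0.5**n
    bound = exp(-delta**2 * mu / 3)          # upper-tail bound, 0 < delta < 1
    print(f"delta={delta}: exact tail {exact:.2e} <= bound {bound:.2e}")
```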
Let $Y=\sum_jY_j=|List(A[i])|$. We have $\mathbb E[Y]=\frac{n}{m}$, so for all $\delta\ge1$ the Chernoff bound gives:

$$\Pr\Big[Y\gt(1+\delta)\frac{n}{m}\Big]\lt\Big(\frac{e^\delta}{(1+\delta)^{1+\delta}}\Big)^{\frac{n}{m}}$$

Application in our scenario: $m=n$, $t=(1+\delta)\ge2$, $\mu=\mathbb E[Y]=\frac{n}{m}=1$:

$$\Pr[|List(A[i])|\gt t]=\Pr[Y\gt t]\lt\frac{e^{t-1}}{t^t}\lt\Big(\frac{e}{t}\Big)^t$$
Conclusion: we spend more than $O(t)$ time on a query with probability decreasing as fast as $t^{-t}$.
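An empirical look at this tail; NumPy's binomial sampler stands in for truly random hashing, and the parameters are illustrative:

```python
import numpy as np
from math import e

n = m = 1000
rng = np.random.default_rng(0)
# |List(A[i])| for one fixed entry i is Binomial(n, 1/m); sample it many times.
loads = rng.binomial(n, 1.0 / m, size=200_000)
for t in range(3, 8):
    emp = (loads > t).mean()
    print(f"t={t}: empirical tail {emp:.1e} <= (e/t)^t = {(e / t)**t:.1e}")
```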
Proof of Chernoff Bound
By Markov's inequality, for all $t\gt0$:

$$\Pr(X\gt a)=\Pr(e^{tX}\gt e^{ta})\lt\frac{\mathbb E[e^{tX}]}{e^{ta}}$$

Denote the moment generating function $M_X(t)=\mathbb E[e^{tX}]$; by independence, $M_X(t)=\prod_{i=1}^nM_{X_i}(t)$. For a Boolean $X_i$ with $\Pr[X_i=1]=p_i$,

$$M_{X_i}(t)=p_ie^t+(1-p_i)=1+p_i(e^t-1)\le e^{p_i(e^t-1)}$$

Therefore, taking $a=(1+\delta)\mu$,

$$\Pr(X\gt(1+\delta)\mu)\lt\frac{\mathbb E[e^{tX}]}{e^{t(1+\delta)\mu}}=\frac{\prod_{i=1}^nM_{X_i}(t)}{e^{t(1+\delta)\mu}}\le\frac{e^{(e^t-1)\sum_ip_i}}{e^{t(1+\delta)\mu}}=\frac{e^{(e^t-1)\mathbb E[X]}}{e^{t(1+\delta)\mu}}\le\Big[\frac{e^{e^t-1}}{e^{t(1+\delta)}}\Big]^\mu$$

(The last step uses $\mathbb E[X]\le\mu$ and $e^t-1\gt0$.)
Let $t=\ln(1+\delta)\gt0$; we have:

$$\Pr[X\gt(1+\delta)\mu]\lt\Big(\frac{e^\delta}{(1+\delta)^{1+\delta}}\Big)^\mu$$

Taking logarithms,

$$\ln\Big[\Big(\frac{e^\delta}{(1+\delta)^{1+\delta}}\Big)^\mu\Big]=\mu(\delta-(1+\delta)\ln(1+\delta))$$

Since $\ln(1+x)\ge\frac{x}{1+\frac{x}{2}}$ for all $x\ge0$,

$$\mu(\delta-(1+\delta)\ln(1+\delta))\le-\frac{\delta^2}{2+\delta}\mu\lt-\frac{\delta^2\mu}{3}\quad(\text{since }0\lt\delta\lt1)$$

and therefore

$$\Pr[X\gt(1+\delta)\mu]\lt e^{-\frac{\delta^2\mu}{3}}\quad(0\lt\delta\lt1)$$
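The logarithm inequality used above has a short calculus proof: let

$$g(x)=\ln(1+x)-\frac{x}{1+\frac{x}{2}},\qquad g(0)=0,\qquad g'(x)=\frac{1}{1+x}-\frac{4}{(2+x)^2}=\frac{x^2}{(1+x)(2+x)^2}\ge0,$$

so $g$ is non-decreasing on $[0,\infty)$ and hence $\ln(1+x)\ge\frac{x}{1+\frac{x}{2}}$ for all $x\ge0$.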
Analogously, for the lower tail we have, for all $t\gt0$:

$$\Pr(X\lt a)=\Pr(e^{-tX}\gt e^{-ta})\lt\frac{\mathbb E[e^{-tX}]}{e^{-ta}}\Rightarrow\cdots\Rightarrow\Pr[X\lt(1-\delta)\mu]\lt\Big(\frac{e^{e^{-t}-1}}{e^{-t(1-\delta)}}\Big)^\mu\quad(**)$$

Let $t=-\ln(1-\delta)\gt0$; then

$$(**)=\Big(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\Big)^\mu\quad(*)$$

Since $(1-\delta)\ln(1-\delta)\gt-\delta+\frac{\delta^2}{2}$ for all $0\lt\delta\lt1$,

$$(*)\le\Big(\frac{e^{-\delta}}{e^{-\delta+\frac{\delta^2}{2}}}\Big)^\mu=e^{-\frac{\delta^2\mu}{2}}\Rightarrow\Pr[X\lt(1-\delta)\mu]\lt e^{-\frac{\delta^2\mu}{2}}$$
Union Bound

Let $\{E_i\}$ be events. Then

$$\Pr\Big[\bigcup_iE_i\Big]\le\sum_i\Pr[E_i]$$

If we define $E_i$ as the event that $|List(A[i])|\gt4\frac{\ln n}{\ln\ln n}$, then by the Chernoff bound,

$$\Pr[E_i]\lt\Big(\frac{e}{4}\cdot\frac{\ln\ln n}{\ln n}\Big)^{4\ln n/\ln\ln n}\lt\Big(\frac{\ln\ln n}{\ln n}\Big)^{4\ln n/\ln\ln n}$$
($n\gt3\gt e$ is needed!)
😲CRAZY MATH!!!
$$\because\ \ln\ln n\le\sqrt{\ln n}\qquad\therefore\ \Big(\frac{\ln\ln n}{\ln n}\Big)^{\frac{4\ln n}{\ln\ln n}}\le\Big(\frac{\sqrt{\ln n}}{\ln n}\Big)^{\frac{4\ln n}{\ln\ln n}}=(\ln n)^{-\frac{2\ln n}{\ln\ln n}}=\big(e^{\ln\ln n}\big)^{-\frac{2\ln n}{\ln\ln n}}=e^{-2\ln n}=n^{-2}$$
By the union bound (there are $m=n$ lists), we have

$$\Pr\Big[\bigcup_iE_i\Big]\le\sum_in^{-2}=n\cdot n^{-2}=n^{-1}$$

This means that with probability $1-\frac{1}{n}$, there is not even a single list $List(A[i])$ with size more than $4\frac{\ln n}{\ln\ln n}$ (assuming $n\ge3$).
In this case, the worst-case query time of the data structure is $O(\frac{\log n}{\log\log n})$.

Summary: hashing with chaining with $m=n$ has worst-case query time $O(\frac{\log n}{\log\log n})$ with probability $1-\frac{1}{n}$ when $h$ is truly random. The construction time is $O(n)$ and the space is $O(n)$.
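The $O(\log n/\log\log n)$ max load is easy to observe empirically; NumPy's integer sampler stands in for truly random hashing, and the sizes are illustrative:

```python
import numpy as np
from math import log

rng = np.random.default_rng(1)
for n in (10**3, 10**4, 10**5, 10**6):
    bins = rng.integers(0, n, size=n)                # h(x_1), ..., h(x_n) with m = n
    max_load = np.bincount(bins, minlength=n).max()  # length of the longest List(A[i])
    print(f"n={n:>7}: max load {max_load}, 4 ln n / ln ln n = {4 * log(n) / log(log(n)):.1f}")
```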
Worst Case

If, after inserting $x_1,\cdots,x_n$, one of the lists exceeds $4\frac{\ln n}{\ln\ln n}$, we discard the current data structure and try again with a freshly chosen $h$.

This performs at least $i$ iterations with probability no more than $(\frac{1}{n})^{i-1}$, thus the expected construction time is upper bounded by

$$\sum_{i=1}^\infty(1/n)^{i-1}\cdot O(n)=O(n)$$

We have thus achieved a worst-case query time of $O(\log n/\log\log n)$ with space $O(n)$ and expected construction time $O(n)$.
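A sketch of this retry loop, reusing the hypothetical `ChainedHashTable` from the first code block:

```python
from math import log

def build(keys):
    """Rebuild with a fresh h until no list exceeds 4 ln n / ln ln n."""
    n = len(keys)
    assert n > 3                               # ensures ln ln n > 0
    cap = 4 * log(n) / log(log(n))
    while True:                                # expected O(1) iterations
        t = ChainedHashTable(n)                # m = n; fresh seed = freshly chosen h
        for x in keys:
            t.insert(x)
        if max(len(lst) for lst in t.table) <= cap:
            return t                           # each attempt succeeds w.p. >= 1 - 1/n
```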
O-Notation

In Chinese there is a good corresponding concept: "数量级" (order of magnitude).

For two functions $f:\mathbb R\rightarrow\mathbb R$ and $g:\mathbb R\rightarrow\mathbb R$, we write $f(n)=O(g(n))$ if: there exist constants $n_0\in\mathbb R$ and $c\gt0$ such that for all $n\gt n_0$, we have $f(n)\le c\cdot g(n)$.

e.g. $n^3/6-n^2/2=O(n^3)$
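Spelling out the witnesses in this example: take $c=\frac{1}{6}$ and $n_0=0$; then for all $n\gt0$, $n^3/6-n^2/2\le n^3/6=c\cdot n^3$, since the subtracted term $n^2/2$ is positive.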
Sums of the same magnitude: if $f_i(n)=O(g(n))$ for all $i=1,\cdots,k$, then $\sum_{i=1}^kf_i(n)=O(g(n))$.
Constants don’t matter!
Products: if $f(n)=O(h(n))$ and $g(n)=O(l(n))$, then $f(n)g(n)=O(h(n)l(n))$.
Base of logs: the base of a logarithm doesn't matter! We simply write $f(n)=O(\log n)$.

Careful with exponents!!! $2\log_2n=O(\log n)$, but $2^{2\log_2n}=n^2=O(n^2)$, not $O(n)$.
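A two-line check of this pitfall (the value of $n$ is illustrative):

```python
from math import log2

n = 1024
print(2 * log2(n))         # 20.0 -> a factor 2 outside the log: still O(log n)
print(2 ** (2 * log2(n)))  # 1048576.0 = n**2 -> a factor 2 in the exponent squares n
```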