Brief Introduction

Invertible Bloom Lookup Table (abbr. IBLT)

a fancy hash-table stores key-value pair (x,y)(x,y)

space budget: tt

Guarantee

It “works” when the number of key-value pairs, nn, in table has ntn\le t.

“temporarily” stops working if n>tn\gt t.

starts working again if nn drops below tt.

Operations

(x,y) is key-value pair.

insert(x,y), delete(x,y), get(x), listentries.

Details

Start with table with k3k\ge3 rows, m=2×tm=2\times t columns.

In each row, we have a hash function h1,,hkh_1,\cdots,h_k.

hih_i is universal.

Each of the entry, called cell, stores values:

  • count
  • key_sum
  • value_sum

Insert Operation

  1. for i=1i=1 to kk
    • c=A[i][hi(x)]c=A[i][h_i(x)]
    • c.count++c.count++
    • c.key_sum+=xc.key\_sum+= x
    • c.value_sum+=yc.value\_sum+=y

Delete Operation

“undo” insert operation

We must know the pair is there before deletion.

  1. for i=1i=1 to kk
    • c=A[i][hi(x)]c=A[i][h_i(x)]
    • c.countc.count--
    • c.key_sum=xc.key\_sum-= x
    • c.value_sum=yc.value\_sum-=y

Get Operation

slightly different from min sketch count

  1. ret=ret=\infty

  2. for i=1i=1 to kk

    • c=A[i][hi(x)]c=A[i][h_i(x)]

    • if c.count==0c.count==0 return “not there”

    • else if c.count==1c.count==1 then

      • if c.key_sum==xc.key\_sum==x

        return c.value_sumc.value\_sum

      • else return “not there”

  3. return “don’t know”

Probability of Success

If ntn\le t, then get(x) succeeds with probability 12k\ge1-2^{-k}.

Assume k>1k\gt1, nn pairs: (xi,yi)(x_i,y_i) is stored in IBLT, ntn\le t.

For get(x)

  • If xx is in IBLT

    1. xx 在 IBLT 的这一行里面,但是 get 函数发现对应的哈希值指向的位置都发生了碰撞的概率。
    2. 明明 xx 不在 IBLT 中,但是 get 函数发现对应的哈希值指向的位置中存在一个值,即 count=1 但 value 不是 xx 对应的。

    Define fail-event Ei:{h(x)=h(xi)}E_i:\{h(x)=h(x_i)\},

Pr[get(x) fails]=Pr[xixEi]xixPr[Ei]xix1m=n1mt1m=t12t12\Pr[get(x)\text{ fails}]=\Pr[\bigcup_{x_i\neq x}E_i]\le\sum_{x_i\neq x}\Pr[E_i]\\ \le\sum_{x_i\neq x}\frac{1}{m}=\frac{n-1}{m}\le\frac{t-1}{m}=\frac{t-1}{2t}\le\frac{1}{2}

  • If xx is not in IBLT.

    明明 xx 不在 IBLT 的这一行里面,但是 get 函数得出该键对应的哈希值指向的位置在 IBLT 这一行中产生碰撞的概率。

    Define fail-event Eij={h(x)=h(xi)=h(xj)}E_{ij}=\{h(x)=h(x_i)=h(x_j)\}

    Pr[ijEij]i<jPr[Eij]=i<j1m2=Cn21m2=n(n1)2m2t22m2t28t2=18\Pr[\bigcup_{i\neq j}E_{ij}]\le\sum_{i\lt j}\Pr[E_{ij}]=\sum_{i\lt j}\frac{1}{m^2}=C_n^2\frac{1}{m^2}\\ =\frac{n(n-1)}{2m^2}\le\frac{t^2}{2m^2}\le\frac{t^2}{8t^2}=\frac{1}{8}

If we choose k=log2t+log21δk=\log_2t+\log_2\frac{1}{\delta}, the probability of failure is at most δ\delta.

How to Avoid False Deletion?

another set of universal hash function g1,,gk:U[R]g_1,\cdots,g_k:U\rarr[R]

while query, check whether c.hash_sum==g(c.key_sum)c.hash\_sum==g(c.key\_sum)

ListEntries Operation

  1. Q=Q=\empty

  2. for cc in IBLT

    • if c.count==1c.count==1
      • Q.push(c.key_sum,c.value_sum)Q.push(c.key\_sum,c.value\_sum)

    // Max size of QQ is mkmk here

    • while Q.size>0Q.size\gt0

      • (x,y)=Q.top()(x,y)=Q.top()

      • Q.pop()Q.pop()

      • if output.continas(x)==falseoutput.continas(x)==false

        output.add(x,y)output.add(x,y)

      • for i=1i=1 to kk

        • c=A[i][hi(x)]c'=A[i][h_i(x)]
        • c.countc'.count--
        • c.key_sum=xc'.key\_sum-=x
        • c.value_sum=yc'.value\_sum-=y
        • if c.count==1c'.count==1
          • Q.push(c.key_sum,c,value_sum)Q.push(c'.key\_sum,c',value\_sum)

Analysis of Success Probability

hyper-graph

node: memory location

hh is truly random

nn edges, e1,,ene_1,\cdots,e_n are independently uniformly random.

List entries fails iff non-empty 2-core exists.

i.e. largest subgraph where every node has degree2degree\ge2

F={ListEntries fails}F=\{ListEntries\text{ fails}\}

Fj={ListEntries fails with j key-value left}F_j=\{ListEntries\text{ fails with j key-value left}\}

Pr[F]=Pr[j=2nFj]j=2nPr[Fj]   (1)\Pr[F]=\Pr[\bigcup_{j=2}^nF_j]\le\sum_{j=2}^n\Pr[F_j]\ \ \ (1)

For every SCxjS\in C_{x}^j, FjS={ListEntries fails, with S as j keys}F_{jS}=\{ListEntries\text{ fails, with S as j keys}\}

(1)=j=2nPr[SCxjFjS]j=2nSCxjPr[FjS]   (2)(1)=\sum_{j=2}^n\Pr[\bigcup_{S\in C_x^j}F_{jS}]\le\sum_{j=2}^n\sum_{S\in C_x^j}\Pr[F_{jS}]\ \ \ (2)

Denote: (T1,,Tk)(C[m]j/2)k(T_1,\cdots,T_k)\in (C_{[m]}^{j/2})^k

Fj,S,T1,,Tk={LE fails for S and i:hi(S)Ti}F_{j,S,T_1,\cdots,T_k}=\{LE\text{ fails for S and }\forall_i:h_i(S)\subseteq T_i\}

(2)=j=2nSCxj(T1,,Tk)(C[m]j/2)kPr[Fj,S,T1,,Tk]   (3)(2)=\sum_{j=2}^n\sum_{S\in C_x^j}\sum_{(T_1,\cdots,T_k)\in (C_{[m]}^{j/2})^k}\Pr[F_{j,S,T_1,\cdots,T_k}]\ \ \ (3)

Pr[h1(x1)T1]=j2mPr[h1(S)T1=(j2m)jPr[i:hi(S)Ti]=(j2m)jkPr[Fj,S,T1,,Tk]Pr[i:hi(S)Ti]=(j2m)jk(3)j=2nCnj(Cmj/2)k(j2m)jkj=2n(enj)j(m2ej)jk/2(j2m)jk=j=2n(enj)j(jem2)jk/2=j=2n(e2n2m)j(je2m)j(k/21)j=2n2j(je2m)j(k/21)214(em)k2(em)k2(1/t)k2\Pr[h_1(x_1)\subseteq T_1]=\frac{j}{2m}\\ \Pr[h_1(S)\subseteq T_1=(\frac{j}{2m})^j\\ \Pr[\forall_i:h_i(S)\subseteq T_i]=(\frac{j}{2m})^{jk}\\ \Pr[F_{j,S,T_1,\cdots,T_k}]\le\Pr[\forall_i:h_i(S)\subseteq T_i]=(\frac{j}{2m})^{jk}\\ (3)\le\sum_{j=2}^n C_n^j (C_m^{j/2})^k(\frac{j}{2m})^{jk}\le\sum_{j=2}^n(\frac{en}{j})^j(\frac{m2e}{j})^{jk/2}(\frac{j}{2m})^{jk}\\ =\sum_{j=2}^n(\frac{en}{j})^j(\frac{je}{m2})^{jk/2}=\sum_{j=2}^n(\frac{e^2n}{2m})^j(\frac{je}{2m})^{j(k/2-1)}\\ \le\sum_{j=2}^n2^{-j}(\frac{je}{2m})^{j(k/2-1)}\le2\cdot\frac{1}{4}\cdot(\frac{e}{m})^{k-2}\le(\frac{e}{m})^{k-2}\le(1/t)^{k-2}

☠️☠️☠️