Kd-Trees

Problem set

given a set PP of nn points in R2\mathbb R^2

store PP in a data structure s.t. given a query rectangle RR, we can find the points in RR efficiently.

idea

  • generalize BST to R2\mathbb R^2
  • every node vv corresponds to a rectangular region SvS_v; the points in the subtree of vv lie in SvS_v.

Build Kd-Tree

BuildKDTree(P,dP,d)

  1. if P={p}P=\{p\} then return a leaf node representing pp.

  2. if dd is even then

    partition PP into P1,P2P_1,P_2 using a vertical line through the point vv with median x-coordinate

  3. else

    partition PP into P1,P2P_1,P_2 using a horizontal line through the point vv with median y-coordinate

  4. ll\leftarrow BuildKDTree(P1,d+1P_1,d+1)

  5. rr\leftarrow BuildKDTree(P2,d+1P_2,d+1)

  6. return a tree with root vv and left subtree ll and right subtree rr.

Analysis

running time: O(nlogn)O(n\log n)

size: O(n)O(n)

Search in Kd-Tree

SearchKDTree(v,Rv,R)

  1. if vv is a leaf then report the point pp stored at vv if pRp\in R.

  2. else

    ​ if Slc(v)RS_{lc(v)}\sube R then ReportSubstree(lc(v)lc(v))

    ​ else if Slc(v)S_{lc(v)} intersects RR then SearchKDTree(lc(v),Rlc(v),R)

    ​ if Src(v)RS_{rc(v)}\sube R then ReportSubstree(rc(v)rc(v))

    ​ else if Src(v)S_{rc(v)} intersects RR then SearchKDTree(rc(v),Rrc(v),R)

Analysis

query time is proportional to number of regions in SvS_v intersected by RR.

rectangles visited by ReportSubtree produce output, so their cost k\le k

We bound number of regions Q(n)Q(n) intersected by a vertical line:

assume vv corresponds to a vertical splitting time, qq in one region of lc(v)lc(v) and rc(v)rc(v), this suggests Q(n)=1+Q(n2)Q(n)=1+Q(\frac{n}{2})

Q(n)={O(1),if n=12+2Q(n4),otherwiseQ(n)=\begin{cases} O(1),\text{if }n=1\\ 2+2Q(\frac{n}{4}),\text{otherwise} \end{cases}

4 regions after two steps. any vertical or horizontal lines, intersects 2 regions

Qt(n)=O(n)Qt(n)=O(\sqrt n)

Theorem

a Kd-Tree uses O(n)O(n) space, can be built in O(nlogn)O(n\log n) time, and can report all points in a query rectangle RR in O(n+k)O(\sqrt n+k) time.

Windowing Queries

Problem set

given a set S\mathcal S of nn disjoint line segments in the plane.

store S\mathcal S in data structure s.t. given a query rectangle RR, we can find the segments in S\mathcal S intersecting RR efficiently.

The segments that intersect RR

  1. have an endpoint in RR

    find them using a range query with RR on the set of end points

  2. intersect the boundary of RR

**先考虑一种退化的情况:**what if the disjoint lines are orthogonal?

store S\mathcal S in a data structure s.t. given a vertical query segment qq, we can find the segments in S\mathcal S interesting qq efficiently.

Interval Stabbing Queries

问题:一些平行的线段和一条直线相交的查询

given a set S\mathcal S of nn intervals in R1\mathbb R^1

store S\mathcal S in a data structure s.t. given a query value qq, we can find the intervals in S\mathcal S intersecting qq efficiently.

we store S\mathcal S in a segment tree (aka interval tree) T\mathcal T

线段树,太典了

T\mathcal T is balanced BST on the end points

the root of the tree vv stores the intervals l(v)l(v) that contain vv.

the left subtree ll of vv stores the intervals that lie completely left of vv.

the right subtree rr of vv stores the intervals that lie completely right of vv.

store these intervals twice:

  1. sorted on increasing left endpoint
  2. sorted on decreasing right end point

Pseudo code

Query(q,T)(q,T)

  • if qq is left of vv then

    report intervals from l(v)l(v) using the list of left-end points, stop at the first interval right of qq.

    Query(q,l)(q,l)

  • else if qq is right of vv then

    report intervals from r(v)r(v) using the list of right-end points, stop at the first interval left of qq.

    Query(q,r)(q,r)

Analysis

space usage: O(n)O(n)

query time: O(logn+k)O(\log n+k), kk is numbers of intervals reported

preprocessing time: O(nlogn)O(n\log n)

Segment stabbing queries

问题:一些平行的线段和一条线段相交的查询

相当于两个维度上的覆盖问题

space usage: O(n)O(n)

query time: O(log2n+k)O(\log^2 n+k), kk is numbers of intervals reported

preprocessing time: O(nlogn)O(n\log n)

Unparalleled stabbing queries

再进一步,如果给出的线段不平行呢?(问题进一步变成:给出一些不平行的线段和一条直线,求和这条直线相交的线段个数)

那么线段树 + 优先搜索树方法就行不通了。

split the problem into elementary intervals in which a vertical line intersects the same segments

storing all segments segments in all elementary intervals uses Θ(n2)\Theta(n^2) space.

再将这些 elementary segments 投影到一个方向上,变成平行的。Project the segments onto the x-axis, yielding intervals. we build a different data structure for interval stabbing.

问题又转成了 interval stabbing

Store the elementary intervals as leaves in a balanced BST T\mathcal T.

Every node vv corresponds to an interval lvl_v, which is the union of the elementary intervals stored in its subtree.

store a canonical subset S(v)SS(v)\sube S of intervals s.t. sSs\in S if and only if lvsl_v\sube s but parent(v)l⊈sparent(v)_l\not\sube s.

这里有点 tricky,为了尽量少的节点存下这些 element intervals,尽量往上面的祖先节点存,aka 节点存储的是包含 element intervals 的最大正规集 maximal canonical set。这样复杂度就是 logn\log n 级别的了(对于每根线段,有 O(logn)O(\log n) 个线段树节点存储)

T\mathcal T is a segment tree.

query: find all nodes vv s.t. qlvq\in l_v, and for each such node report all intervals in S(v)S(v).

query time: O(logn+k)O(\log n+k) where kk is the output size.

space: every interval is stored O(logn)O(\log n) times, at most twice per level. O(nlogn)O(n\log n) in total.

how do we build T\mathcal T?

build a BST on the elementary intervals, insert the intervals in sSs\in S one by one.

to insert ss we visit at most 4 nodes per level.

preprocessing time: O(nlogn)O(n\log n)

把查询的直线换成线段呢?

space O(nlogn)O(n\log n)

query O(log2n+k)O(\log^2n+k)

preprocessing time O(nlogn)O(n\log n)

Come back to window queries

the segment that intersect RR

  1. have an endpoint in RR

    find them using a range query with RR on the set of end points

    (O(log2n+k)O(\log^2n+k) query, O(nlogn)O(n\log n) space)

  2. intersect the boundary of RR

    find them using a segment tree

    (O(log2n+k)O(\log^2n+k) query, nlognn\log n space)

Theorem

we can solve windowing queries in O(log2n+k)O(\log^2n+k) time, using O(nlogn)O(n\log n) space after O(nlogn)O(n\log n) preprocessing time.

这个 log2n\log^2n 可以优化成 logn\log n???