计算几何 W37 Sorting and Searching in Parallel
阅读材料
prefix sums and their application
by Guy E. Blelloch, CMU
parallel tasks: building blocks, and merge.
dynamic programming & divide and conquer
extended from sequential algorithms
all-prefix-sums operation on PRAM
intro
prefix sum operation has many extension: quick-sort, lexical analysis, tree operations…
primitive instruction in some machine(原子操作)
define: scan operation is an array all-prefix-sums operation.
implementation
array length n, EREW PRAM, time complexity: O(n/p+log p)
binary tree, for each depth: each father node first passes its left child value to right child.
if n>p, 采用分块的方式, time complexity: O(n/p+log p)
up sweep: sum[v]=sum[L[v]]+sum[R[v]]
down sweep: prescan[L[v]]=prescan[v], prescan[R[v]]=sum[L[v]]+prescan[v]
after completing down sweep, each vertex of the tree contains the sum of all the leaf value that precede it.
applications
line-of-sight and radix-sort
recurrence equations 递归方程的计算
segmented scans
allocation processors
课堂笔记
More application of prefix sums
- minimum
- broadcast
- partition
Sorting
Quicksort
O(n) work, O(log n) depth, EREW PRAM
randomized, sub-optimal depth, optimal work
Merge Sort
1 | procedure BASICMS(I,S) |
- O(n) depth
- optimal work O(n log n)
- Theoretically bad
- probably lower constants compared to fully parallel
- with p processor we get
对 merge 操作并行化
假设这里有 2 个排好序的数组 A B,合并为一个数组
假设数组 B 中的一个数字 bi,它在合并后的数组的位置为 i+rank(B[i],A),rank 用二分查找找到。
1 | procedure SEGMERGE(A,B,M,p) |
asymmetric merge
- break large trunk into small piece
so in the parallel merge sort we replace sequentially merge into SEGMERGE.
Searching
input: A sorted array with n elements
query: a value v
output: the predecessor of v
The brutal force:
1 | procedure BF(A[1...n],v) |
how to do parallel?
In CREW
parallel based on brutal force method:
1 | procedure PARALLEL-BF(A[1...n],l,h,v) |
In EREW
divide into p subsets of size n/p
Binary search in parallel
- both results optimal
- CREW provably stronger than EREW
- for a natural problem
Lower Bounds of PRAM
basic of lower bounds
goal:
- showing that a problem is difficult
- worst-case lower bound
- existence of at least one difficult input
CREW Lower bounds
OR Problem
n boolean values 0 or 1
output: boolean or of the values
O(1) solution in common CRCW
a lower bound will also separate the weakest CRCW from CREW
- CREW provably weaker than CRCW on a natural problem
seems impossible, even if there is at most one value of 1
Building lower bounds
assuming
- processors have infinite registers
- processors have infinite computational power
- processors can compute arbitrary functions of their register
a computational model for lower bounds
from PRAM to circuits
- we have a depth i circuit
- magic CPU gates
- arbitrary fan-out
- 2 fan-in
Separations
deep connections between complexity theory, circuits and PRAM algorithms
switching lemma from complexity theory
XOR problem: compute the XOR of n bits
- in sum-CRCW PRAM
- Lower bound in priority/common-CRCW PRAM
OR problem
- time in common-CRCW
- in CREW
Broadcast: send v to all processors
- time in CREW
- in EREW
Searching: given n numbers in increasing order find predecessor of v
- in CREW
- in EREW
A Searching Lower bound in EREW
searching: given n numbers in increasing order, find predecessor of 0 in 000…000111…111
proof idea:
- every change should change the output
- every change should change some memory locations
- every change should affect some processor
- at least one processor affected by changes
- lower bound
tight!
- divide into disjoint sub-arrays
- Binary search in each
Complexity and Related Areas
PRAM models have close ties to Complexity theory.
- CRCW PRAM simulated using arbitrary fan out circuits.
- CRCW can compute arbitrary boolean functions in time (potentially exponentially number of processors)