The key to fast multiplication of integers and polynomials is the discrete Fourier transform.
Roots of Unity
en2πi=cosn2π+isinn2πi=−1
A complex number \alpha\in\C is a n-th root of unity if αn=1.
It is a primitive n-th root of unity if αm=1 for all m=1,⋯,n−1.
There are exactly φ(n) primitive n-th roots of unity where φ(n) is the number of positive integers less than or equal to n that are relatively prime to n.
🤔Lemma of cancellation property. Let ω be any primitive n-th root of unity.
j=1∑n−1ωjs={0 if s≡0modnn if s≡0modn
If s≡0modn, ωjs=1. Thus the lemma holds.
If s≡0modn, xn−1=(x−1)∑j=0n−1xj.
Substituting x=ωs makes LHS 0. Thus RHS equals 0, since ωs=1, the lemma holds.
🤔Lemma of reverse multiplication. F(ω−1)⋅F(ω)=F(ω)⋅F(ω−1)=nIn where In is the identity matrix.
Proof by Lemma of cancellation property.
Connection to polynomial evaluation and interpolation. Let a be the coefficient vector of the polynomial P(X)=∑i=1n−1aiXi. Then computing DFT(a) amounts to evaluating the polynomial P(X) at all the n-th roots of unity, at
X=1,ω,ω2,⋯,ωn−1.
Similarly, computing DFT−1(A) amounts to recovering the polynomial P(X) from its values (A0,⋯,An−1) at the same n points.
The Fast Fourier Transform
Naive algorithm for DFT and DFT−1: O(n2).
Fast Fourier Transform: O(nlogn).
FFT algorithm: divide-and-conquer.
Let n be a power of 2. Computing DFT(a) amounts to computing the n values
P(1),P(ω),⋯,P(ωn−1).
Actually, P(X) has even part and odd part.
P(X)=Pe(X2)+X⋅Po(X2).
Where Pe and Po are polynomials of degrees at most n/2 and (n−1)/2 respectively.
Pseudo code of FFT
Input: A polynomial P(X) with coefficients given by an n-vector a, ω, a primitive n-th root of unity.
Output: DFTn(a)
Evaluate Pe(X2) and Po(X2) at X2=1,ω2,ω4,⋯,ω2n−2.
Multiply Po(ω2j) by ωj for j=0,⋯,n−1.
Add Pe(ω2j) to ωjPo(ω2j) for j=0,⋯,n−1.
Analysis of FFT:
In step 1, since ωn=1,ωn+2=ω2,⋯,ω2n−2=ωn−2. So it suffices to evaluate Pe and Po at only n/2 values.
i.e., at all the (n/2)-th roots of unity. This is equivalent to the problem of computing DFTn/2(Pe) and DFTn/2(Po).
Step 2 and 3 take n multiplications and n additions respectively.
Thus, T(n)=2T(n/2)+2n⇒T(n)=2nlogn for n is a power of 2.
🎯Theorem of FFT’s complexity. Assuming the availability of a primitive n-th root of unity, the discrete Fourier transform DFTn and its inverse can be computed in O(nlogn) complex arithmetic operations.
Polynomial Multiplication
Convolution and polynomial multiplication
Assume n≥2. The convolution of two n-vectors a,b is the n-vector
c=a∗b=(c0,⋯,cn−1)⊤
where ci=∑j=0iajbi−j.
Let P(X),Q(X) be polynomials of degrees less than n/2. Then R(X)=P(X)Q(X) is a polynomial of degree less than n−1.
Let a and b denote the coefficient vectors of P and Q. Then it is not hard to see a∗b gives the coefficient vector of R(X).
Thus convolution is essentially polynomial multiplication.
🎯Theorem of convolution. Let a,b be n-vectors whose initial ⌊n/2⌋ entries are zeros. Then
DFT−1(DFT(a)∗DFT(b))=a∗b.
This reduces the problem of convolution to two DFT and DFT−1 computations.
🎯Theorem of polynomial computation’s complexity. Assuming the availability of a primitive n-th root of unity, we can compute the product PQ of two polynomials P,Q\in\C[X] of degree less than n in O(nlogn) complex operations.
Modular FFT
Define ZM where M=2L+1.
An element x∈ZM is a zero-divisor if there exist y=0 s.t. x⋅y=0.
The inverse element of x is y s.t. xy=1.
🤔An element x∈ZM has a multiplicative inverse x−1 iff x is not a zero-divisor.
Representation and basic operations modulo M
Let 2L≡−1modM be denoted with special symbol 1ˉ. Each element of ZM╲{1ˉ} as binary string (bL−1,⋯,b0) of length L; the element 1ˉ is given a special representation.
For example, M=17,L=4. Then 13 is represented by (1,1,0,1).
It’s easy to add and subtract in ZM under this representation in O(L) time.
Multiplying X by 2j amounts to left-shifting the string X by j positions.
If (bL−1,⋯,b0) is multiplied by 2j(0<j<L), the results is given as
(bL−j−1,⋯,b0,0,⋯,0)−(0,⋯,0,bL−1,⋯,bL−j)
Primitive roots of unity modulo M
Let K=2k and K divides L. We define ω=2L/K.
🤔Lemma: In ZM, ω is a primitive (2K)-th root of unity.
🤔The cancellation property holds:
j=0∑2K−1ωjs≡{0modM if s≡0mod2K2KmodM if s≡0mod2k
🎯Theorem. The transforms DFT2K(a) and DFT2K−1(A) for (2K)-vectors a,A∈(ZM)2K can be computed using the FFT method, taking O(KLlogK) bit operations.
T(2K)=2T(K)+O(KL)
Fast Integer Multiplication
🎯Theorem of integer multiplication’s complexity. Given two integers u,v of sizes at most n bits, we can form their product uv in O(nlognloglogn) bit operations.
Here we prove a weaker bound nlog2.6n.
A simplified Schönhage-Strassen algorithm
Goal: Product W=UV. Assume that U,V are N-bit binary numbers where N=2n.
Choose K=2k,L=3⋅2ℓ where
k=⌊2n⌋,ℓ=⌈n−k⌉.
k,ℓ are integers, n is not necessarily be integer.
k+ℓ≥n, we may view U as 2k+ℓ-bit numbers, padding with zeros. Break up U into K pieces, each of bit size 2ℓ.
Similar padding to V, we have (2K)-vector:
Uˉ=(0,⋯,0,UK−1,⋯,U0)Vˉ=(0,⋯,0,VK−1,⋯,V0)
where each component has 2ℓ bits.
We can regard Uˉ,Vˉ as coefficient vectors of the polynomials P(X)=∑j=0K−1UjXj and Q(X)=∑j=0K−1VjXj. Let
Wˉ=(W2K−1,⋯,W0)
be the convolution of Uˉ,Vˉ. Each Wi in Wˉ satisfies the inequality
0≤Wi≤K⋅22⋅2ℓ
since it is the sum of at most K products of the form UjVi−j. Hence
0≤Wi<23⋅2ℓ≤M
where M=2L+1 as usual.
Wˉ is the coefficient vector of the product R(X)=P(X)Q(X). Since P(22ℓ)=U and Q(22ℓ)=V, it follows that R(22ℓ)=UV=W. Hence
W=j=0∑2K−122ℓjWj.
We can easily obtain each summation in this sum from Wˉ by multiplying each Wj with 22ℓj. As each Wj has k+2⋅2ℓ<L non-zero bits.
Each bit of W is obtained by summing at most 3 bits plus at most 2 carry bits.
🤔Lemma: The product W can be obtained from Wˉ in O(N) bit operations.
Wˉ=DFT−1(DFT(Uˉ)⋅DFT(Vˉ)).
These transforms take O(KLlogK)=O(nlogn) bit operations.
The inner product requires 2K multiplications of L-bit numbers, which is recursively accomplished. Let T(N) denote the bit complexity of algorithm:
T(N)=O(NlogN)+2K⋅T(L).
Write t(n)=T(N)/N where N=2n. The recurrence becomes
t(n)=O(n)+2NKT(L)=O(n)+2L3T(L)=O(n)+6t(n/2+c),
for some constant c. Define s(n)=t(n+2c). Then
s(n)=O(n+2c)+6t((n/2)+2c)=O(n)+6s(n/2).
This resolves to s(n)=O(nlg6). Back-substituting, we obtain
T(N)=O(NlogαN),α=lg6<2.5848
Refinements
The choice of L=3⋅2ℓ is sub-optimal.
The method implies
T(N)=O(Nlog2+ϵN)
for any ϵ>0.
Improvement: compute each Wi in two parts:
Let M′=22⋅2ℓ+1 and M′′=K.
Since M′,M′′ are relatively prime and Wi<M′M′′, it follows that if we have computed Wi′=WimodM′ and Wi′′=WimodM′′, then we can recover Wi using the Chinese remainder theorem.
Computing all the Wi′′'s and the reconstruction of Wi from Wi′,Wi′′ can be accomplished in linear time.
Thus the new recurrence is
t(n)=n+4t(n/2),
which has the solution t(n)=O(n2) or T(N)=O(Nlog2N).
In order to get ultimate result, we need improve recurrence t(n)=n+2t(n/2) by using a variant convolution called negative wrapped convolution and DFTK instead of DFT2K.