Algebraic Structure

GF(pm)GF(p^m): Galois field of order pmp^m, pp is prime.

伽罗瓦域是一个包含加法和乘法的代数结构,满足加法和乘法的交换律、结合律、分配律,每个元素都有逆元。

Zn\Z_n: Integers modulo n1n\ge1.

Mm,n(R)M_{m,n}(R): mm by nn matrices over a ring RR.

这个玩意是一个矩阵,只是其中每一个元素都来自环。乘法通常不满足交换律

Mn(R)M_n(R): Mn,n(R)M_{n,n}(R).

The Discrete Fourier Transform

数 字 信 号 处 理

The key to fast multiplication of integers and polynomials is the discrete Fourier transform.

Roots of Unity

e2πni=cos2πn+isin2πni=1e^{\frac{2\pi}{n}i}=\cos\frac{2\pi}{n}+i\sin\frac{2\pi}{n}\\ i=\sqrt{-1}

A complex number \alpha\in\C is a nn-th root of unity if αn=1\alpha^n=1.

It is a primitive nn-th root of unity if αm1\alpha^m\neq1 for all m=1,,n1m=1,\cdots,n-1.

There are exactly φ(n)\varphi(n) primitive nn-th roots of unity where φ(n)\varphi(n) is the number of positive integers less than or equal to nn that are relatively prime to nn.

🤔Lemma of cancellation property. Let ω\omega be any primitive nn-th root of unity.

j=1n1ωjs={0 if s≢0modnn if s0modn\sum_{j=1}^{n-1}\omega^{js}=\begin{cases} 0\text{ if }s\not\equiv0\mod n\\ n\text{ if }s\equiv0\mod n \end{cases}

  • If s0modns\equiv0\mod n, ωjs=1\omega^{js}=1. Thus the lemma holds.

  • If s≢0modns\not\equiv0\mod n, xn1=(x1)j=0n1xjx^n-1=(x-1)\sum_{j=0}^{n-1}x^j.

    Substituting x=ωsx=\omega^s makes LHS 00. Thus RHS equals 00, since ωs1\omega^s\neq1, the lemma holds.

Let F(ω)=Fn(ω)F(\omega)=F_n(\omega) denote the matrix

[11111ωω2ωn11ω2ω4ω2(n1)1ωn1ω2(n1)ω(n1)2]\left[ \begin{array}{ccccc} 1&1&1&\cdots&1\\ 1&\omega&\omega^2&\cdots&\omega^{n-1}\\ 1&\omega^2&\omega^4&\cdots&\omega^{2(n-1)}\\ \vdots&&&&\vdots\\ 1&\omega^{n-1}&\omega^{2(n-1)}&\cdots&\omega^{(n-1)^2} \end{array} \right]

📖Definition of DFT and its inverse. Let \mathbf a=(a_0,\cdots,a_{n-1})^\top\in\C^n. The discret Fourier transform of a\mathbf a is

DFTn(a)=A=(A0,,An1)Ai=j=0n1ajωiji=0,,n1DFT_n(\mathbf a)=\mathbf A=(A_0,\cdots,A_{n-1})^\top\\ A_i=\sum_{j=0}^{n-1}a_j\omega^{ij}\forall i=0,\cdots,n-1

i.e.,

DFTn(a)=1n[11111ω1ω2ωn+11ωn+1ω2(n1)ω(n1)2][A0A1An1].DFT_n(\mathbf a)= \frac{1}{n} \cdot \left[ \begin{array}{ccccc} 1&1&1&\cdots&1\\ 1&\omega^{-1}&\omega^{-2}&\cdots&\omega^{-n+1}\\ \vdots&&&&\vdots\\ 1&\omega^{-n+1}&\omega^{-2(n-1)}&\cdots&\omega^{-(n-1)^2} \end{array} \right] \cdot \left[ \begin{array}{c} A_0\\A_1\\ \vdots\\ A_{n-1} \end{array} \right].

The inverse discrete Fourier transform of A=(A0,,An1)\mathbf A=(A_0,\cdots,A_{n-1})^\top is DFTn1(A)=1nF(ω1)ADFT_n^{-1}(\mathbf A)=\frac{1}{n}F(\omega^{-1})\cdot \mathbf A.

DFTn1(a)=F(ω)[a0a1an1]=[A0A1An1].DFT_n^{-1}(\mathbf a)=F(\omega)\cdot\left[ \begin{array}{c} a_0\\a_1\\ \vdots\\ a_{n-1} \end{array} \right] =\left[ \begin{array}{c} A_0\\A_1\\ \vdots\\ A_{n-1} \end{array} \right].

🤔Lemma of reverse multiplication. F(ω1)F(ω)=F(ω)F(ω1)=nInF(\omega^{-1})\cdot F(\omega)=F(\omega)\cdot F(\omega^{-1})=nI_n where InI_n is the identity matrix.

Proof by Lemma of cancellation property.

Connection to polynomial evaluation and interpolation. Let a\mathbf a be the coefficient vector of the polynomial P(X)=i=1n1aiXiP(X)=\sum_{i=1}^{n-1}a_iX^i. Then computing DFT(a)DFT(\mathbf a) amounts to evaluating the polynomial P(X)P(X) at all the nn-th roots of unity, at

X=1,ω,ω2,,ωn1.X=1,\omega,\omega^2,\cdots,\omega^{n-1}.

Similarly, computing DFT1(A)DFT^{-1}(\mathbf A) amounts to recovering the polynomial P(X)P(X) from its values (A0,,An1)(A_0,\cdots,A_{n-1}) at the same nn points.

The Fast Fourier Transform

Naive algorithm for DFTDFT and DFT1DFT^{-1}: O(n2)O(n^2).

Fast Fourier Transform: O(nlogn)O(n\log n).

FFT algorithm: divide-and-conquer.

Let nn be a power of 22. Computing DFT(a)DFT(\mathbf a) amounts to computing the nn values

P(1),P(ω),,P(ωn1).P(1),P(\omega),\cdots,P(\omega^{n-1}).

Actually, P(X)P(X) has even part and odd part.

P(X)=Pe(X2)+XPo(X2).P(X)=P_e(X^2)+X\cdot P_o(X^2).

Where PeP_e and PoP_o are polynomials of degrees at most n/2n/2 and (n1)/2(n-1)/2 respectively.

Pseudo code of FFT

Input: A polynomial P(X)P(X) with coefficients given by an nn-vector a\mathbf a, ω\omega, a primitive nn-th root of unity.

Output: DFTn(a)DFT_n(\mathbf a)

  1. Evaluate Pe(X2)P_e(X^2) and Po(X2)P_o(X^2) at X2=1,ω2,ω4,,ω2n2X^2=1,\omega^2,\omega^4,\cdots,\omega^{2n-2}.
  2. Multiply Po(ω2j)P_o(\omega^{2j}) by ωj\omega^j for j=0,,n1j=0,\cdots,n-1.
  3. Add Pe(ω2j)P_e(\omega^{2j}) to ωjPo(ω2j)\omega^jP_o(\omega^{2j}) for j=0,,n1j=0,\cdots,n-1.

Analysis of FFT:

  • In step 1, since ωn=1,ωn+2=ω2,,ω2n2=ωn2\omega^n=1,\omega^{n+2}=\omega^2,\cdots,\omega^{2n-2}=\omega^{n-2}. So it suffices to evaluate PeP_e and PoP_o at only n/2n/2 values.

    i.e., at all the (n/2)(n/2)-th roots of unity. This is equivalent to the problem of computing DFTn/2(Pe)DFT_{n/2}(P_e) and DFTn/2(Po)DFT_{n/2}(P_o).

  • Step 2 and 3 take nn multiplications and nn additions respectively.

Thus, T(n)=2T(n/2)+2nT(n)=2nlognT(n)=2T(n/2)+2n\Rarr T(n)=2n\log n for nn is a power of 22.

🎯Theorem of FFT’s complexity. Assuming the availability of a primitive nn-th root of unity, the discrete Fourier transform DFTnDFT_n and its inverse can be computed in O(nlogn)O(n\log n) complex arithmetic operations.

Polynomial Multiplication

Convolution and polynomial multiplication

Assume n2n\ge2. The convolution of two nn-vectors a,b\mathbf a,\mathbf b is the nn-vector

c=ab=(c0,,cn1)\mathbf c=\mathbf a*\mathbf b=(c_0,\cdots,c_{n-1})^\top

where ci=j=0iajbijc_i=\sum_{j=0}^ia_jb_{i-j}.

Let P(X),Q(X)P(X),Q(X) be polynomials of degrees less than n/2n/2. Then R(X)=P(X)Q(X)R(X)=P(X)Q(X) is a polynomial of degree less than n1n-1.

Let a\mathbf a and b\mathbf b denote the coefficient vectors of PP and QQ. Then it is not hard to see ab\mathbf a*\mathbf b gives the coefficient vector of R(X)R(X).

Thus convolution is essentially polynomial multiplication.

🎯Theorem of convolution. Let a,b\mathbf a,\mathbf b be nn-vectors whose initial n/2\lfloor n/2\rfloor entries are zeros. Then

DFT1(DFT(a)DFT(b))=ab.DFT^{-1}(DFT(\mathbf a)*DFT(\mathbf b))=\mathbf a*\mathbf b.

This reduces the problem of convolution to two DFTDFT and DFT1DFT^{-1} computations.

🎯Theorem of polynomial computation’s complexity. Assuming the availability of a primitive nn-th root of unity, we can compute the product PQPQ of two polynomials P,Q\in\C[X] of degree less than nn in O(nlogn)O(n\log n) complex operations.

Modular FFT

Define ZM\Z_M where M=2L+1M=2^L+1.

An element xZMx\in\Z_M is a zero-divisor if there exist y0y\neq0 s.t. xy=0x\cdot y=0.

The inverse element of xx is yy s.t. xy=1xy=1.

🤔An element xZMx\in\Z_M has a multiplicative inverse x1x^{-1} iff xx is not a zero-divisor.

Representation and basic operations modulo M

Let 2L1modM2^L\equiv-1\mod M be denoted with special symbol 1ˉ\bar 1. Each element of ZM{1ˉ}\Z_M\diagdown\{\bar 1\} as binary string (bL1,,b0)(b_{L-1},\cdots,b_0) of length LL; the element 1ˉ\bar1 is given a special representation.

For example, M=17,L=4M=17,L=4. Then 1313 is represented by (1,1,0,1)(1,1,0,1).

It’s easy to add and subtract in ZM\Z_M under this representation in O(L)O(L) time.

Multiplying XX by 2j2^j amounts to left-shifting the string XX by jj positions.

If (bL1,,b0)(b_{L-1},\cdots,b_0) is multiplied by 2j(0<j<L)2^j(0\lt j\lt L), the results is given as

(bLj1,,b0,0,,0)(0,,0,bL1,,bLj)(b_{L-j-1},\cdots,b_0,0,\cdots,0)-(0,\cdots,0,b_{L-1},\cdots,b_{L-j})

Primitive roots of unity modulo M

Let K=2kK=2^k and KK divides LL. We define ω=2L/K\omega=2^{L/K}.

🤔Lemma: In ZM\Z_M, ω\omega is a primitive (2K)(2K)-th root of unity.

🤔The cancellation property holds:

j=02K1ωjs{0modM if s≢0mod2K2KmodM if s0mod2k\sum_{j=0}^{2K-1}\omega^{js}\equiv \begin{cases} 0\mod M\text{ if }s\not\equiv0\mod 2K\\ 2K\mod M\text{ if }s\equiv0\mod 2k \end{cases}

🎯Theorem. The transforms DFT2K(a)DFT_{2K}(\mathbf a) and DFT2K1(A)DFT^{-1}_{2K}(\mathbf A) for (2K)(2K)-vectors a,A(ZM)2K\mathbf a,\mathbf A\in(\Z_M)^{2K} can be computed using the FFT method, taking O(KLlogK)O(KL\log K) bit operations.

T(2K)=2T(K)+O(KL)T(2K)=2T(K)+O(KL)

Fast Integer Multiplication

🎯Theorem of integer multiplication’s complexity. Given two integers u,vu,v of sizes at most nn bits, we can form their product uvuv in O(nlognloglogn)O(n\log n\log\log n) bit operations.

Here we prove a weaker bound nlog2.6nn\log^{2.6}n.

A simplified Schönhage-Strassen algorithm

Goal: Product W=UVW=UV. Assume that U,VU,V are NN-bit binary numbers where N=2nN=2^n.

Choose K=2k,L=32K=2^k,L=3\cdot2^\ell where

k=n2,=nk.k=\lfloor\frac{n}{2}\rfloor,\ell=\lceil n-k\rceil.

k,k,\ell are integers, nn is not necessarily be integer.

k+nk+\ell\ge n, we may view UU as 2k+2^{k+\ell}-bit numbers, padding with zeros. Break up UU into KK pieces, each of bit size 22^\ell.

Similar padding to VV, we have (2K)(2K)-vector:

Uˉ=(0,,0,UK1,,U0)Vˉ=(0,,0,VK1,,V0)\bar U=(0,\cdots,0,U_{K-1},\cdots,U_0)\\ \bar V=(0,\cdots,0,V_{K-1},\cdots,V_0)

where each component has 22^\ell bits.

We can regard Uˉ,Vˉ\bar U,\bar V as coefficient vectors of the polynomials P(X)=j=0K1UjXjP(X)=\sum_{j=0}^{K-1}U_jX^j and Q(X)=j=0K1VjXjQ(X)=\sum_{j=0}^{K-1}V_jX_j. Let

Wˉ=(W2K1,,W0)\bar W=(W_{2K-1},\cdots,W_0)

be the convolution of Uˉ,Vˉ\bar U,\bar V. Each WiW_i in Wˉ\bar W satisfies the inequality

0WiK2220\le W_i\le K\cdot2^{2\cdot2^\ell}

since it is the sum of at most KK products of the form UjVijU_jV_{i-j}. Hence

0Wi<232M0\le W_i\lt 2^{3\cdot2^\ell}\le M

where M=2L+1M=2^L+1 as usual.

Wˉ\bar W is the coefficient vector of the product R(X)=P(X)Q(X)R(X)=P(X)Q(X). Since P(22)=UP(2^{2^\ell})=U and Q(22)=VQ(2^{2^\ell})=V, it follows that R(22)=UV=WR(2^{2^\ell})=UV=W. Hence

W=j=02K122jWj.W=\sum_{j=0}^{2K-1}2^{2^\ell j}W_j.

We can easily obtain each summation in this sum from Wˉ\bar W by multiplying each WjW_j with 22j2^{2^\ell j}. As each WjW_j has k+22<Lk+2\cdot2^\ell\lt L non-zero bits.

Each bit of WW is obtained by summing at most 33 bits plus at most 22 carry bits.

🤔Lemma: The product WW can be obtained from Wˉ\bar W in O(N)O(N) bit operations.

Wˉ=DFT1(DFT(Uˉ)DFT(Vˉ))\bar W=DFT^{-1}(DFT(\bar U)\cdot DFT(\bar V)).

These transforms take O(KLlogK)=O(nlogn)O(KL\log K)=O(n\log n) bit operations.

The inner product requires 2K2K multiplications of LL-bit numbers, which is recursively accomplished. Let T(N)T(N) denote the bit complexity of algorithm:

T(N)=O(NlogN)+2KT(L).T(N)=O(N\log N)+2K\cdot T(L).

Write t(n)=T(N)/Nt(n)=T(N)/N where N=2nN=2^n. The recurrence becomes

t(n)=O(n)+2KNT(L)=O(n)+23LT(L)=O(n)+6t(n/2+c),t(n)=O(n)+2\frac{K}{N}T(L)=O(n)+2\frac{3}{L}T(L)=O(n)+6t(n/2+c),

for some constant cc. Define s(n)=t(n+2c)s(n)=t(n+2c). Then

s(n)=O(n+2c)+6t((n/2)+2c)=O(n)+6s(n/2).s(n)=O(n+2c)+6t((n/2)+2c)=O(n)+6s(n/2).

This resolves to s(n)=O(nlg6)s(n)=O(n^{\lg 6}). Back-substituting, we obtain

T(N)=O(NlogαN),α=lg6<2.5848T(N)=O(N\log^\alpha N),\alpha=\lg6\lt2.5848

Refinements

The choice of L=32L=3\cdot2^\ell is sub-optimal.

The method implies

T(N)=O(Nlog2+ϵN)T(N)=O(N\log^{2+\epsilon}N)

for any ϵ>0\epsilon\gt0.

Improvement: compute each WiW_i in two parts:

  • Let M=222+1M'=2^{2\cdot2^\ell}+1 and M=KM''=K.
  • Since M,MM',M'' are relatively prime and Wi<MMW_i\lt M' M'', it follows that if we have computed Wi=WimodMW_i'=W_i\mod M' and Wi=WimodMW_i''=W_i\mod M'', then we can recover WiW_i using the Chinese remainder theorem.
  • Computing all the WiW_i'''s and the reconstruction of WiW_i from Wi,WiW_i',W_i'' can be accomplished in linear time.

Thus the new recurrence is

t(n)=n+4t(n/2),t(n)=n+4t(n/2),

which has the solution t(n)=O(n2)t(n)=O(n^2) or T(N)=O(Nlog2N)T(N)=O(N\log^2N).

In order to get ultimate result, we need improve recurrence t(n)=n+2t(n/2)t(n)=n+2t(n/2) by using a variant convolution called negative wrapped convolution and DFTKDFT_K instead of DFT2KDFT_{2K}.