Rader's FFT
April 14, 2020
The Cooley-Tukey algorithm [5] is the most common FFT algorithm. It
reduces the computational time of an $N$-point DFT from $\mathcal{O}(N^2)$
to $\mathcal{O}(N \log N)$ for highly composite $N$. However, when
$N$ is prime, it offers no improvement. Rader's FFT algorithm solves
this problem: it converts an $N$-point DFT into an $(N-1)$-point
cyclic convolution, using some very elegant number theory to do the
transformation. The cyclic convolution can then in turn be computed
very efficiently, e.g. using an FFT [1]. For example, Rader's algorithm
converts a $17$-point DFT into a $16$-point cyclic convolution, and the
$16$-point cyclic convolution can be computed very efficiently
with a $16$-point FFT.
In this blog post, I will cover cyclic convolution as well as the number
theory used in this algorithm, but first let's look at an example. Consider a
$5$-point DFT.
\begin{align}
\begin{bmatrix}
y(0) \\
y(1) \\
y(2) \\
y(3) \\
y(4) \\
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 \\
1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 \\
1 & \omega^3 & \omega^6 & \omega^9 & \omega^{12} \\
1 & \omega^4 & \omega^8 & \omega^{12} & \omega^{16} \\
\end{bmatrix}
\begin{bmatrix}
x(0) \\
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
\end{align}
where $\omega = e^{\frac{-2\pi i}{5}}$ is a primitive $5^{\text{th}}$ root of
unity. Since $\omega^5 = 1$, all exponents can be reduced modulo $5$, and we can rewrite the above equation as,
\begin{align}
\begin{bmatrix}
y(0) \\
y(1) \\
y(2) \\
y(3) \\
y(4) \\
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 \\
1 & \omega^2 & \omega^4 & \omega & \omega^3 \\
1 & \omega^3 & \omega & \omega^4 & \omega^2 \\
1 & \omega^4 & \omega^3 & \omega^2 & \omega \\
\end{bmatrix}
\begin{bmatrix}
x(0) \\
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
\end{align}
We can again rewrite the computation as,
\begin{align}
y(0) &= \sum_{i=0}^{4} x(i) \\
\\
\begin{bmatrix}
y(1) \\
y(2) \\
y(3) \\
y(4) \\
\end{bmatrix}
&=
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
+
\begin{bmatrix}
x(0) \\
x(0) \\
x(0) \\
x(0) \\
\end{bmatrix}
\end{align}
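Before going further, here is a quick numerical check of this split. This is a minimal sketch assuming numpy; the variable names are mine and not part of the derivation.

```python
import numpy as np

# Numerically check the split above for a random 5-point input.
N = 5
omega = np.exp(-2j * np.pi / N)
x = np.random.rand(N)

y = np.fft.fft(x)  # reference 5-point DFT (same sign convention as omega)

# y(0) is just the sum of the inputs.
assert np.isclose(y[0], x.sum())

# y(1..4) = W @ x(1..4) + x(0), where W[j-1, k-1] = omega^(j*k) for j, k in 1..4.
W = omega ** np.outer(np.arange(1, N), np.arange(1, N))
assert np.allclose(y[1:], W @ x[1:] + x[0])
```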
In the next two steps, we will transform the matrix on the right into a
circulant matrix [1], i.e. a matrix in which every row is the same as
the previous row, rotated to the right by one. Multiplication of a circulant
matrix by a vector is equivalent to a cyclic convolution [1].
The transformations are extremely simple but ingenious, and they are the
crux of the algorithm. Both involve multiplying by a permutation
matrix [6]. Multiplying a matrix $A$ by a permutation matrix $P$ simply
permutes the rows of $A$ (when pre-multiplying, i.e. $PA$) or its columns
(when post-multiplying, i.e. $AP$). The design of the permutation matrices
requires some number theory, which we will discuss later.
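As a tiny illustration of that permutation property, here is a hedged numpy sketch (my own example, not part of the original derivation):

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

print(P @ A)  # pre-multiplying: rows 3 and 4 of A are swapped
print(A @ P)  # post-multiplying: columns 3 and 4 of A are swapped
```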
Step 1
Let $P_1$ be a permutation matrix given by,
\begin{align}
P_1 &=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\end{align}
Pre-multiplying both sides of the above equation by $P_1$, we get
\begin{align}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
y(1) \\
y(2) \\
y(3) \\
y(4) \\
\end{bmatrix}
&=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
+
\begin{bmatrix}
x(0) \\
x(0) \\
x(0) \\
x(0) \\
\end{bmatrix}
\\
\begin{bmatrix}
y(1) \\
y(2) \\
y(4) \\
y(3) \\
\end{bmatrix}
&=
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
+
\begin{bmatrix}
x(0) \\
x(0) \\
x(0) \\
x(0) \\
\end{bmatrix}
\end{align}
Step 2
Let $P_2$ be a permutation matrix given by,
\begin{align}
P_2 &=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\end{align}
Note that,
\begin{align}
\begin{bmatrix}
x(1) \\
x(2) \\
x(3) \\
x(4) \\
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(3) \\
x(4) \\
x(2) \\
\end{bmatrix}
=
P_2
\begin{bmatrix}
x(1) \\
x(3) \\
x(4) \\
x(2) \\
\end{bmatrix}
\end{align}
By substituting, we get
\begin{align}
\begin{bmatrix}
y(1) \\
y(2) \\
y(4) \\
y(3) \\
\end{bmatrix}
&=
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(3) \\
x(4) \\
x(2) \\
\end{bmatrix}
+
\begin{bmatrix}
x(0) \\
x(0) \\
x(0) \\
x(0) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
\omega & \omega^3 & \omega^4 & \omega^2 \\
\omega^2 & \omega & \omega^3 & \omega^4 \\
\omega^4 & \omega^2 & \omega & \omega^3 \\
\omega^3 & \omega^4 & \omega^2 & \omega \\
\end{bmatrix}
\begin{bmatrix}
x(1) \\
x(3) \\
x(4) \\
x(2) \\
\end{bmatrix}
+
\begin{bmatrix}
x(0) \\
x(0) \\
x(0) \\
x(0) \\
\end{bmatrix}
\end{align}
In summary, we applied the following transformation to convert the DFT sub-matrix (on the right) into a circulant matrix (on the left).
\begin{align}
\begin{bmatrix}
\omega & \omega^3 & \omega^4 & \omega^2 \\
\omega^2 & \omega & \omega^3 & \omega^4 \\
\omega^4 & \omega^2 & \omega & \omega^3 \\
\omega^3 & \omega^4 & \omega^2 & \omega \\
\end{bmatrix}
&=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix} \\
&=
P_1
\begin{bmatrix}
\omega & \omega^2 & \omega^3 & \omega^4 \\
\omega^2 & \omega^4 & \omega & \omega^3 \\
\omega^3 & \omega & \omega^4 & \omega^2 \\
\omega^4 & \omega^3 & \omega^2 & \omega \\
\end{bmatrix}
P_2
\end{align}
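If you want to sanity-check this numerically, here is a small numpy sketch (the names are mine) that builds $P_1$, $P_2$, and the $4 \times 4$ DFT sub-matrix, and verifies that $P_1 W P_2$ is circulant:

```python
import numpy as np

omega = np.exp(-2j * np.pi / 5)
# W[j-1, k-1] = omega^(j*k) for j, k in 1..4 (the DFT sub-matrix above)
W = omega ** np.outer(np.arange(1, 5), np.arange(1, 5))

P1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])
P2 = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 1, 0, 0],
               [0, 0, 1, 0]])

C = P1 @ W @ P2
# Circulant check: row r equals the first row rotated right by r positions.
for r in range(4):
    assert np.allclose(C[r], np.roll(C[0], r))
```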
Now, we will answer the following two questions:
- How did we design the transformations (permutation matrices $P_1$ and $P_2$)?
- How does transforming the computation into a cyclic convolution help?
The answer to the first question comes from number theory (primitive
roots). The answer to the second comes from linear algebra
(eigendecomposition).
Primitive Root [4]
If $p$ is prime and $x$ is a natural number less than $p$, i.e.
$x \in [1, p-1]$, then Fermat's little theorem tells us that
$$
x^{p-1} \equiv 1\ \text{mod}\ p
$$
For example, if $p = 5$, then
\begin{align}
1^4 &= 1 &\equiv 1\ \text{mod}\ 5 \\
2^4 &= 16 &\equiv 1\ \text{mod}\ 5 \\
3^4 &= 81 &\equiv 1\ \text{mod}\ 5 \\
4^4 &= 256 &\equiv 1\ \text{mod}\ 5 \\
\end{align}
Now, it turns out that we can always find an integer $b \in [1,p-1]$ such that
$b^i \not\equiv 1\ \text{mod}\ p$ for all $i \in [1,p-2]$. Such an integer
$b$ is called a primitive root of $p$. The table below lists the powers of
$2$, $3$, and $4$ modulo $5$.
$$
\begin{array}{c|c|c|c}
\hline
\hline
 & b = 2 & b = 3 & b = 4 \\
\hline
i = 0 & 2^0 \equiv 1 & 3^0 \equiv 1 & 4^0 \equiv 1 \\
i = 1 & 2^1 \equiv 2 & 3^1 \equiv 3 & 4^1 \equiv 4 \\
i = 2 & 2^2 \equiv 4 & 3^2 \equiv 4 & 4^2 \equiv 1 \\
i = 3 & 2^3 \equiv 3 & 3^3 \equiv 2 & 4^3 \equiv 4 \\
i = 4 & 2^4 \equiv 1 & 3^4 \equiv 1 & 4^4 \equiv 1 \\
\hline
\hline
\end{array}
$$
Therefore, $2$ and $3$ are primitive roots of $5$, but $4$ is
not. Also note that $3$ is the multiplicative inverse of $2$, because
$2 \times 3 = 6 \equiv 1\ \text{mod}\ 5$. Essentially, the powers of a
primitive root give us a permutation of the numbers in $[1,p-1]$: using $2$
gives the sequence $\{ 1,2,4,3 \}$, and using $3$ gives the sequence
$\{ 1,3,4,2 \}$.
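Here is a small Python sketch (my own helper names, not from any library) that reproduces these sequences and checks whether a candidate is a primitive root:

```python
def power_sequence(p, b):
    """Return b^0, b^1, ..., b^(p-2) modulo p."""
    return [pow(b, i, p) for i in range(p - 1)]

def is_primitive_root(p, b):
    # b is a primitive root of the prime p if its powers hit every
    # residue in [1, p-1] exactly once.
    return len(set(power_sequence(p, b))) == p - 1

print(power_sequence(5, 2))     # [1, 2, 4, 3]
print(power_sequence(5, 3))     # [1, 3, 4, 2]
print(is_primitive_root(5, 4))  # False
```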
Using $1$-indexing (as opposed to $0$-indexing), note that in the
permutation matrix $P_1$ the non-zero elements appear in rows $\{
1,2,4,3 \}$, going column by column from left to right. In the permutation
matrix $P_2$, the non-zero elements appear in rows $\{ 1,3,4,2 \}$.
As a side note, the example on Wikipedia [7] of an $11$-point DFT
transformation is correct, but it would be much easier to verify if the
outer $11\times11$ matrix were $0$-indexed and the inner $10\times10$
matrix were $1$-indexed.
Circulant Matrix & Cyclic Convolution
A circulant matrix is a square matrix in which every row is the same as
the previous row, rotated to the right by 1. Multiplication of a circulant
matrix by a vector is equivalent to a cyclic convolution [1].
Let $C$ be an $N\times N$ circulant matrix and let $F$ be the $N\times N$
DFT matrix. $F$ diagonalizes any circulant matrix [1], i.e.
\begin{align}
C = F D F^{-1}
\end{align}
where $D$ is a diagonal matrix containing the eigenvalues of $C$. Therefore,
\begin{align}
CX = F D F^{-1}X
\end{align}
Using this formulation, the computation $CX$ can be done in
$\mathcal{O}(N \log N)$ steps, as opposed to $\mathcal{O}(N^2)$,
as follows:
- $F^{-1}X$ can be computed by an $\textit{IFFT}$ in $N \log N$ steps.
- The eigenvalues in $D$ can be computed by an $\textit{FFT}$ in $N \log N$ steps; multiplying by the diagonal $D$ then takes only $N$ steps.
- Finally, multiplying by $F$ can be done by another $\textit{FFT}$ in $N \log N$ steps.
Therefore, transforming the computation into a cyclic convolution is useful.
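To tie everything together, here is a minimal numpy sketch of the whole Rader pipeline for a prime $N$: permute the inputs and twiddle factors by powers of a primitive root, compute the $(N-1)$-point cyclic convolution with FFTs, and scatter the results back. All function names are mine and the primitive-root search is brute force; this is an illustration, not an optimized implementation.

```python
import numpy as np

def primitive_root(p):
    # Brute-force search: b is a primitive root of the prime p if its
    # powers generate all residues in [1, p-1].
    for b in range(2, p):
        if len({pow(b, i, p) for i in range(p - 1)}) == p - 1:
            return b
    raise ValueError("no primitive root found")

def rader_dft(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)                      # assumed prime
    g = primitive_root(N)
    ginv = pow(g, N - 2, N)         # modular inverse of g (Fermat)
    omega = np.exp(-2j * np.pi / N)

    gq  = [pow(g,    q, N) for q in range(N - 1)]   # g^q  mod N
    giq = [pow(ginv, q, N) for q in range(N - 1)]   # g^-q mod N

    a = x[gq]                       # permuted inputs x(g^q)
    b = omega ** np.array(giq)      # permuted twiddle factors omega^(g^-q)

    # (N-1)-point cyclic convolution via FFT: IFFT(FFT(a) * FFT(b))
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

    y = np.empty(N, dtype=complex)
    y[0] = x.sum()                  # y(0) is just the sum of the inputs
    y[giq] = x[0] + conv            # y(g^-p) = x(0) + conv(p)
    return y

# Usage: should match numpy's FFT for prime lengths, e.g. N = 5 or 17.
x = np.random.rand(17)
assert np.allclose(rader_dft(x), np.fft.fft(x))
```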
References
1. MIT 18.06, Circulant Matrices
2. The DFT as Convolution or Filtering, C. Sidney Burrus et al.
3. Elementary Number Theory and Rader's FFT, SIAM Review
4. UIC MCS425, Primitive Root
5. Cooley-Tukey FFT
6. Permutation Matrix
7. Wikipedia: Rader's FFT