Rader's FFT

April 14, 2020

The Cooley-Tukey algorithm [5] is the most common FFT algorithm. It reduces the computational cost of an $N$-point DFT from $\mathcal{O}(N^2)$ to $\mathcal{O}(N \log N)$ for highly composite $N$. However, when $N$ is prime, it offers no improvement. Rader's FFT algorithm solves this problem: it converts an $N$-point DFT into an $(N-1)$-point cyclic convolution, using some very interesting number theory. The cyclic convolution can in turn be computed very efficiently, e.g. using the FFT [1]. For example, Rader's algorithm converts a $17$-point DFT into a $16$-point cyclic convolution, which can then be computed very efficiently with a $16$-point FFT.

In this blog post, I will cover cyclic convolution as well as the number theory used in this algorithm, but first let's look at an example. Consider a $5$-point DFT.

\begin{align} \begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \omega^4 \\ 1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 \\ 1 & \omega^3 & \omega^6 & \omega^9 & \omega^{12} \\ 1 & \omega^4 & \omega^8 & \omega^{12} & \omega^{16} \\ \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} \end{align}

where $\omega = e^{\frac{-2{\pi}i}{5}}$ is a primitive $5^{th}$ root of unity. Since $\omega^5 = 1$, we can reduce all exponents modulo $5$ and rewrite the above equation as,

\begin{align} \begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \omega^4 \\ 1 & \omega^2 & \omega^4 & \omega & \omega^3 \\ 1 & \omega^3 & \omega & \omega^4 & \omega^2 \\ 1 & \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} \end{align}
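As a quick sanity check (not part of the original derivation), this DFT matrix can be built with NumPy and compared against `np.fft.fft`; the variable names below are my own.

```python
import numpy as np

N = 5
omega = np.exp(-2j * np.pi / N)                    # primitive 5th root of unity
# W[j, k] = omega^(j*k); since omega^5 = 1, the exponents reduce mod 5 numerically
W = omega ** np.outer(np.arange(N), np.arange(N))

x = np.random.rand(N)
y = W @ x
assert np.allclose(y, np.fft.fft(x))               # matches NumPy's FFT
```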

We can split off $y(0)$ and rewrite the computation as,

\begin{align} y(0) &= \sum_{i=0}^{4} x(i) \\ \\ \begin{bmatrix} y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}
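A short NumPy sketch of this split form, with my own variable names: $y(0)$ is the plain sum of the inputs, and the remaining outputs come from the $4\times 4$ block plus the constant $x(0)$ term.

```python
import numpy as np

N = 5
omega = np.exp(-2j * np.pi / N)
x = np.random.rand(N)

y0 = x.sum()                               # y(0) is just the sum of the inputs
j = np.arange(1, N)
M = omega ** np.outer(j, j)                # 4x4 block; omega^5 = 1 reduces exponents mod 5
y_rest = M @ x[1:] + x[0]                  # add the constant x(0) column

y = np.concatenate(([y0], y_rest))
assert np.allclose(y, np.fft.fft(x))
```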

In the next two steps, we will transform the matrix on the right into a circulant matrix [1], i.e. a matrix where every row is the same as the previous row, rotated to the right by one. Multiplication of a circulant matrix by a vector is equivalent to a cyclic convolution [1].

The transformations are extremely simple but ingenious and are the crux of the algorithm. Both transformations involve multiplying with a permutation matrix [6]. Multiplying a matrix, $A$, with a permutation matrix, $P$, simply permutes the rows (when pre-multiplying i.e. $PA$) or columns (when post-multiplying i.e. $AP$) of the given matrix, $A$. The design of the permutation matrix requires some number theory which we will discuss later.

Step 1

Let $P_1$ be a permutation matrix given by,

\begin{align} P_1 &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \end{align}

Premultiplying both sides of the above equation with $P_1$, we get

\begin{align} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \\ \begin{bmatrix} y(1) \\ y(2) \\ y(4) \\ y(3) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}

Step 2

Let $P_2$ be a permutation matrix given by,

\begin{align} P_2 &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \end{align}

Note that,

\begin{align} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} = P_2 \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} \end{align}

By substituting, we get

\begin{align} \begin{bmatrix} y(1) \\ y(2) \\ y(4) \\ y(3) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \\ &= \begin{bmatrix} \omega & \omega^3 & \omega^4 & \omega^2 \\ \omega^2 & \omega & \omega^3 & \omega^4 \\ \omega^4 & \omega^2 & \omega & \omega^3 \\ \omega^3 & \omega^4 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}

In summary, we did the following transformation to convert the matrix on the right into the circulant matrix on the left.

\begin{align} \begin{bmatrix} \omega & \omega^3 & \omega^4 & \omega^2 \\ \omega^2 & \omega & \omega^3 & \omega^4 \\ \omega^4 & \omega^2 & \omega & \omega^3 \\ \omega^3 & \omega^4 & \omega^2 & \omega \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \\ &= P_1 \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} P_2 \end{align}
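This transformation is easy to verify numerically. The sketch below (my own variable names) builds $P_1$, $P_2$, and the $4\times 4$ block, then checks that $P_1 M P_2$ is circulant, i.e. every row is the previous row rotated right by one.

```python
import numpy as np

N = 5
omega = np.exp(-2j * np.pi / N)
j = np.arange(1, N)
M = omega ** np.outer(j, j)                # the original 4x4 block

P1 = np.eye(4)[[0, 1, 3, 2]]               # rows of P1 as written above
P2 = np.eye(4)[[0, 3, 1, 2]]               # rows of P2 as written above

C = P1 @ M @ P2
assert np.allclose(C[0], omega ** np.array([1, 3, 4, 2]))   # first row matches
for i in range(1, 4):
    assert np.allclose(C[i], np.roll(C[i - 1], 1))          # circulant property
```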

Now, we will answer the following two questions:

  • How did we design the transformations (permutation matrices $P_1$ and $P_2$)?
  • How will transforming the computation to a cyclic convolution help?

The answer to the first question comes from number theory (primitive roots). The answer to the second comes from linear algebra (the eigendecomposition of circulant matrices).

Primitive Root [4]


If $p$ is prime and $x$ is a natural number less than $p$, i.e. $x \in [1,p-1]$, then Fermat's Little Theorem tells us that

$$ x^{p-1} \equiv 1\ \text{mod}\ p $$

For example, if $p = 5$, then

\begin{align} 1^4 &= 1 &\equiv 1\ \text{mod}\ 5 \\ 2^4 &= 16 &\equiv 1\ \text{mod}\ 5 \\ 3^4 &= 81 &\equiv 1\ \text{mod}\ 5 \\ 4^4 &= 256 &\equiv 1\ \text{mod}\ 5 \\ \end{align}
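These residues are easy to check with Python's built-in three-argument `pow`:

```python
# Fermat's Little Theorem for p = 5: x^(p-1) is congruent to 1 mod p
p = 5
for x in range(1, p):
    assert pow(x, p - 1, p) == 1
```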

Now, it turns out that we can find an integer $b \in [1,p-1]$ such that $b^i \not\equiv 1\ \text{mod}\ p$ for all $i \in [1,p-2]$. Such an integer $b$ is called a primitive root of $p$.

\begin{array}{l|l|l|l} \hline \hline & b = 2 & b = 3 & b = 4 \\ \hline i = 0 & 2^0 \equiv 1 & 3^0 \equiv 1 & 4^0 \equiv 1 \\ i = 1 & 2^1 \equiv 2 & 3^1 \equiv 3 & 4^1 \equiv 4 \\ i = 2 & 2^2 \equiv 4 & 3^2 \equiv 4 & 4^2 \equiv 1 \\ i = 3 & 2^3 \equiv 3 & 3^3 \equiv 2 & 4^3 \equiv 4 \\ i = 4 & 2^4 \equiv 1 & 3^4 \equiv 1 & 4^4 \equiv 1 \\ \hline \hline \end{array}

Therefore, $2$ and $3$ are primitive roots of $5$, but $4$ is not. Also note that $3$ is the multiplicative inverse of $2$, because $2 \times 3 = 6 \equiv 1\ \text{mod}\ 5$. Essentially, the powers of a primitive root give us a permutation of the numbers in $[1,p-1]$: using $2$ gives the sequence $\{1,2,4,3\}$, and using $3$ gives the sequence $\{1,3,4,2\}$.
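A brute-force primitive-root finder makes this concrete; the helper name `primitive_roots` is my own.

```python
def primitive_roots(p):
    """Return all b in [1, p-1] whose powers generate every residue in [1, p-1]."""
    return [b for b in range(1, p)
            if len({pow(b, i, p) for i in range(1, p)}) == p - 1]

assert primitive_roots(5) == [2, 3]                      # 4 is not a primitive root
assert [pow(2, i, 5) for i in range(4)] == [1, 2, 4, 3]  # permutation from 2
assert [pow(3, i, 5) for i in range(4)] == [1, 3, 4, 2]  # permutation from 3
```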

Using $1$-indexing (as opposed to $0$-indexing), note that in the permutation matrix $P_1$, the non-zero elements appear on rows $\{ 1,2,4,3 \}$ from left to right. In the permutation matrix $P_2$, the non-zero elements appear on rows $\{ 1,3,4,2 \}$. In effect, the outputs are reordered by powers of $2$ and the inputs by powers of its inverse, $3$, so that the $(i,j)$ entry of the permuted matrix (indexing from $0$) is $\omega^{2^i \cdot 3^j} = \omega^{2^{i-j}}$, which depends only on $(i-j) \bmod 4$. This is exactly the circulant pattern.
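The permutation matrices can therefore be generated directly from the primitive-root sequences. A sketch, assuming the same $1$-indexed row convention as above (the helper name is my own):

```python
import numpy as np

def perm_from_sequence(seq):
    """Permutation matrix whose k-th column has its 1 on row seq[k] (1-indexed)."""
    P = np.zeros((len(seq), len(seq)))
    for col, row in enumerate(seq):
        P[row - 1, col] = 1
    return P

P1 = perm_from_sequence([pow(2, i, 5) for i in range(4)])   # rows {1, 2, 4, 3}
P2 = perm_from_sequence([pow(3, i, 5) for i in range(4)])   # rows {1, 3, 4, 2}

assert np.array_equal(P1, [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
assert np.array_equal(P2, [[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]])
```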

As a side note, the example on Wikipedia [7] of an $11$-point DFT transformation is correct, but it would be much easier to verify if the outer $11\times11$ matrix were $0$-indexed and the inner $10\times10$ matrix were $1$-indexed.

Circulant Matrix & Cyclic Convolution


A circulant matrix is a square matrix in which every row is the same as the previous row, rotated to the right by one. Multiplication of a circulant matrix by a vector is equivalent to a cyclic convolution [1]. Let $C$ be a circulant matrix of size $n$, and let $F$ be the $n\times n$ DFT matrix. $F$ diagonalizes any circulant matrix [1], i.e.

\begin{align} C = F D F^{-1} \end{align}

where $D$ is a diagonal matrix containing the eigenvalues. Therefore,

\begin{align} CX = F D F^{-1}X \end{align}
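A numerical check of this diagonalization, with my own variable names. One convention detail: with $F$ built using NumPy's forward-DFT sign, the identity reads $C = F^{-1} D F$ and $D$ holds the DFT of the first column of $C$; the form $C = F D F^{-1}$ above is the same statement with the roles of $F$ and $F^{-1}$ swapped.

```python
import numpy as np

n = 4
c = np.random.rand(n)                                   # first column of the circulant
C = np.column_stack([np.roll(c, j) for j in range(n)])  # each column rotated down by one

k = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(k, k) / n)            # forward DFT matrix (numpy sign)
D = np.diag(F @ c)                                      # eigenvalues = DFT of first column

assert np.allclose(C, np.linalg.inv(F) @ D @ F)         # F diagonalizes C
```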

Using this formulation, the computation $CX$ can be done in $\mathcal{O}(N \log N)$ steps as opposed to $\mathcal{O}(N^2)$ as follows:

  • $F^{-1}X$ can be computed by an $\textit{IFFT}$ in $N \log N$ steps.
  • The eigenvalues in $D$ (a DFT of the first column or row of $C$, depending on the sign convention) can be computed by an $\textit{FFT}$ in $N \log N$ steps.
  • Finally, multiplying by $F$ can be done by another $\textit{FFT}$ in $N \log N$ steps.
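These three steps are exactly the circular convolution theorem, and reduce to three FFT calls (which call plays the forward vs inverse role depends on the DFT sign convention):

```python
import numpy as np

n = 4
c = np.random.rand(n)                                   # first column of circulant C
x = np.random.rand(n)
C = np.column_stack([np.roll(c, j) for j in range(n)])

# C @ x in O(n log n): transform, scale by the eigenvalues, transform back
y_fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))
assert np.allclose(C @ x, y_fast)
```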

Therefore, transforming the computation to a cyclic convolution is useful.
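Putting the pieces together, here is a hedged end-to-end sketch of Rader's algorithm for the $5$-point DFT (the function and variable names are my own): reindex the outputs and inputs by primitive-root powers, do a $4$-point cyclic convolution via the FFT, and add back the $y(0)$ and $x(0)$ terms.

```python
import numpy as np

def rader_dft_5(x):
    """5-point DFT via a 4-point cyclic convolution (Rader's algorithm)."""
    p = 5
    g, g_inv = 2, 3                        # primitive root of 5 and its inverse mod 5
    omega = np.exp(-2j * np.pi / p)

    out_idx = [pow(g, i, p) for i in range(p - 1)]      # [1, 2, 4, 3]
    in_idx = [pow(g_inv, i, p) for i in range(p - 1)]   # [1, 3, 4, 2]

    a = x[in_idx]                          # permuted inputs x(1), x(3), x(4), x(2)
    c = omega ** np.array(out_idx)         # first column of the circulant block

    conv = np.fft.ifft(np.fft.fft(c) * np.fft.fft(a))   # 4-point cyclic convolution

    y = np.empty(p, dtype=complex)
    y[0] = x.sum()                         # y(0) is the plain sum
    y[out_idx] = conv + x[0]               # add back the constant x(0) column
    return y

x = np.random.rand(5)
assert np.allclose(rader_dft_5(x), np.fft.fft(x))
```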

References

  1. MIT 18.06, Circulant Matrices
  2. The DFT as Convolution or Filtering, C. Sidney Burrus et al.
  3. Elementary Number Theory and Rader's FFT, SIAM Review
  4. UIC MCS425, Primitive Root
  5. Cooley-Tukey FFT
  6. Permutation Matrix
  7. Wikipedia: Rader's FFT