###### April 14, 2020

Cooley-Tukey algorithm [5] is the most common FFT algorithm. It reduces the computational time of a $N$-point DFT from $\mathcal{O}(N^2)$ to $\mathcal{O}(N\ log\ N)$ for highly composite $N$. However, when $N$ is prime, it cannot improve the computational time. Rader's FFT algorithm solves this problem. It converts a $N$-point DFT to a $N-1$ cyclic convolution. It uses some very interesting number theory to do this transformation. The cyclic convolution can then in turn be computed very efficiently e.g. using FFT [1]. For example, Rader's algorithm can convert a $17$-point DFT to a $16$-point cyclic convolution. The $16$-point cyclic convolution can in turn be computed very efficiently by using $16$-point FFT.

In this blog post, I will cover cyclic convolution as well as the number theory used in this algorithm, but first lets look at an example. Consider a $5$-point DFT.

\begin{align} \begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \omega^4 \\ 1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 \\ 1 & \omega^3 & \omega^6 & \omega^9 & \omega^{12} \\ 1 & \omega^4 & \omega^8 & \omega^{12} & \omega^{16} \\ \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} \end{align}

where, $\omega = e^{\frac{-2{\pi}i}{5}}$ is the $5^{th}$ root of unity. Since, $\omega^5 = 1$, we can rewrite the above equation as,

\begin{align} \begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \omega^4 \\ 1 & \omega^2 & \omega^4 & \omega & \omega^3 \\ 1 & \omega^3 & \omega & \omega^4 & \omega^2 \\ 1 & \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} \end{align}

We can again rewrite the computation as,

\begin{align} y(0) &= \sum_{i=0}^{4} x(i) \\ \\ \begin{bmatrix} y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}

In the next two steps, we will transform the matrix on the right into a circulant matrix [1] i.e. a matrix where every row is the same as the previous row, rotated to the right by 1. Multiplication of a circulant matrix by a vector is equivalent to a circular convolution [1].

The transformations are extremely simple but ingenious and are the crux of the algorithm. Both transformations involve multiplying with a permutation matrix [6]. Multiplying a matrix, $A$, with a permutation matrix, $P$, simply permutes the rows (when pre-multiplying i.e. $PA$) or columns (when post-multiplying i.e. $AP$) of the given matrix, $A$. The design of the permutation matrix requires some number theory which we will discuss later.

### Step 1

Let $P_1$ be a permutation matrix given by,

\begin{align} P_1 &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \end{align}

Premultiplying both sides of the above equation with $P_1$, we get

\begin{align} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} y(1) \\ y(2) \\ y(3) \\ y(4) \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \\ \begin{bmatrix} y(1) \\ y(2) \\ y(4) \\ y(3) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}

### Step 2

Let $P_2$ be a permutation matrix given by,

\begin{align} P_2 &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \end{align}

Note that,

\begin{align} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} = P_2 \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} \end{align}

By substituting, we get

\begin{align} \begin{bmatrix} y(1) \\ y(2) \\ y(4) \\ y(3) \\ \end{bmatrix} &= \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \\ &= \begin{bmatrix} \omega & \omega^3 & \omega^4 & \omega^2 \\ \omega^2 & \omega & \omega^3 & \omega^4 \\ \omega^4 & \omega^2 & \omega & \omega^3 \\ \omega^3 & \omega^4 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \\ \end{bmatrix} + \begin{bmatrix} x(0) \\ x(0) \\ x(0) \\ x(0) \\ \end{bmatrix} \end{align}

In summary, we did the following transformation, to convert the matrix on the right into a circulant matrix on the left.

\begin{align} \begin{bmatrix} \omega & \omega^3 & \omega^4 & \omega^2 \\ \omega^2 & \omega & \omega^3 & \omega^4 \\ \omega^4 & \omega^2 & \omega & \omega^3 \\ \omega^3 & \omega^4 & \omega^2 & \omega \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \\ &= P_1 \begin{bmatrix} \omega & \omega^2 & \omega^3 & \omega^4 \\ \omega^2 & \omega^4 & \omega & \omega^3 \\ \omega^3 & \omega & \omega^4 & \omega^2 \\ \omega^4 & \omega^3 & \omega^2 & \omega \\ \end{bmatrix} P_2 \end{align}

Now, we will answer the following two questions:

• How did we design the transformations (permutation matrices $P_1$ and $P_2$)?
• How will transforming the computation to a cyclic convolution help?

The answer to the first question comes from number theory (Primitive Root). The answer to the second question comes from linear algebra (Eigen Decomposition).

### Primitive Root [4]

If $p$ is prime, and $x$ is a natural number less than $p$, i.e. $x \in [1,p-1]$ then Fermat's Little Theorem tells us that

$$x^{p-1} \equiv 1\ \text{mod}\ p$$

For example, if $p = 5$, then

\begin{align} 1^4 &= 1 &\equiv 1\ \text{mod}\ 5 \\ 2^4 &= 16 &\equiv 1\ \text{mod}\ 5 \\ 3^4 &= 81 &\equiv 1\ \text{mod}\ 5 \\ 4^4 &= 256 &\equiv 1\ \text{mod}\ 5 \\ \end{align}

Now, it turns out that we can find an integer $b \in [1,p-1]$ such that $b^i \not\equiv 1\ \text{mod}\ p$, for $i \in [1,p-2]$. Such an integer $b$, is called a primitive root of $p$.

\begin{array}{l|l|l} \hline \hline & b = 2 & b = 3 & b = 4 \\ \hline i = 0 & 2^0 \equiv 1 & 3^0 \equiv 1 & 4^0 \equiv 1 \\ i = 1 & 2^1 \equiv 2 & 3^1 \equiv 3 & 4^1 \equiv 4 \\ i = 2 & 2^2 \equiv 4 & 3^2 \equiv 4 & 4^2 \equiv 1 \\ i = 3 & 2^3 \equiv 3 & 3^3 \equiv 2 & 4^3 \equiv 4 \\ i = 4 & 2^4 \equiv 1 & 3^4 \equiv 1 & 4^4 \equiv 1 \\ \hline \hline \end{array}

Therefore, $2$ and $3$ are the primitive roots of $5$, but $4$ is not. Also note that $3$ is the multiplicative inverse of 2, because $2 \times 3 = 6 \equiv 1\ \text{mod}\ 5$. Essentially, primitive roots give us a permutation of the numbers in $[1,p-1]$. Using $2$, gives us the sequence $\{ 1,2,4,3 \}$. Using $3$, gives the sequence $\{ 1,3,4,2\}$.

Using $1$-indexing (as opposed to $0$-indexing), note that in the permutation matrix $P_1$, the non-zero elements appear on rows $\{ 1,2,4,3 \}$ from left to right. In the permutation matrix $P_2$, the non-zero elements appear on rows $\{ 1,3,4,2 \}$.

As a side note, the example on Wikipedia [7] of an $11$-point DFT transformation is correct but would be much easier to verify if the outer $11\times11$ matrix is $0$-indexed and the inner $10\times10$ matrix is $1$-indexed.

### Circulant Matrix & Cyclic Convolution

A circulant matrix is a square matrix in which every row is the same as the previous row, rotated to the right by 1. Multiplication of a circulant matrix by a vector is equivalent to a cyclic convolution [1]. Let $C$ be a circulant matrix of size $n$. Let $F$ be the $n\times n$ DFT matrix. $F$ diagonalizes any circulant matrix [1] i.e.

\begin{align} C = F D F^{-1} \end{align}

where $D$ is a diagonal matrix containing the eigenvalues. Therefore,

\begin{align} CX = F D F^{-1}X \end{align}

Using this formulation, the computation $CX$ can be done in $\mathcal{O}(N\ log\ N)$ steps as opposed to $\mathcal{O}(N^2)$ as follows:

• $F^{-1}X$ can be done by an $\textit{IFFT}$ in $N\ log\ N$ steps.
• The eigenvalues in $D$ can be computed by a $\textit{FFT}$ in $N\ log\ N$ steps.
• Finally, multiplying by $F$ can be done by another $\textit{FFT}$ in $N\ log\ N$ steps.

Therefore, transforming the computation to a cyclic computation is useful.