# How does FFT reduce the complexity of DFT from $\mathcal{O}(N^2)$ to $\mathcal{O}(N \log N)$?

###### March 26, 2020

In this post, I will use a simple example to explain the magic behind the FFT. The DFT matrix for $N = 4$ is given by,

\begin{align} \textit{DFT}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \\ \end{bmatrix} \end{align}

Given a column vector $X = \begin{bmatrix}x_0 & x_1 & x_2 & x_3\end{bmatrix}^T$, its DFT is given by $Y = \textit{DFT}(4) \times X$. This is a dense matrix-vector product and requires $\mathcal{O}(N^2)$, i.e. $16$, computations.
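As a quick numerical check (a minimal sketch using SciPy's `dft` helper and a hypothetical input vector), the dense matrix-vector product agrees with NumPy's FFT:

```python
import numpy as np
from scipy.linalg import dft

x = np.array([1.0, 2.0, 3.0, 4.0])  # example input vector
F4 = dft(4)                          # the dense 4x4 DFT matrix shown above
y = F4 @ x                           # dense matrix-vector product: ~N^2 operations

assert np.allclose(y, np.fft.fft(x))  # matches NumPy's FFT
```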

The FFT algorithm decomposes the above dense DFT matrix into a product of $\log_2 N$, i.e. 2, sparse matrices and a permutation matrix, as shown below. The first two sparse matrices have exactly 2 nonzero elements per row (independent of $N$). The third (last) matrix is a permutation matrix [1]. It merely swaps the second and third entries of the input vector and involves no arithmetic.

\begin{align} \textit{DFT}(4) &= \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \end{align}

Using this formulation, the DFT of $X$ is given by,

\begin{align} Y &= \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ \end{bmatrix} \begin{bmatrix} x_0 \\ x_2 \\ x_1 \\ x_3 \\ \end{bmatrix} \end{align}
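The staged evaluation above can be sketched directly in NumPy (an illustration only; the sparse matrices are written out densely for clarity, and the input vector is a made-up example):

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=complex)

# permutation step: reorder to (x0, x2, x1, x3) -- pure indexing, no arithmetic
x_perm = x[[0, 2, 1, 3]]

# first sparse stage: each output depends on exactly 2 inputs
stage1 = np.array([[1,  1, 0,  0],
                   [1, -1, 0,  0],
                   [0,  0, 1,  1],
                   [0,  0, 1, -1]]) @ x_perm

# second sparse stage: again 2 nonzeros per row
stage2 = np.array([[1, 0,  1,  0],
                   [0, 1,  0, -1j],
                   [1, 0, -1,  0],
                   [0, 1,  0,  1j]]) @ stage1

assert np.allclose(stage2, np.fft.fft(x))  # same result as the dense DFT
```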

In the next step, the sparse matrix-vector product involves only $2N$ computations instead of $N^2$, since each row has exactly 2 nonzero elements. The same holds in the final step, so each sparse matrix-vector product costs $2N$ computations. Therefore, the total number of computations for a DFT equals the number of sparse matrices times $2N$, i.e. $(\log_2 N) \times 2N = \mathcal{O}(N \log N)$.
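To see how quickly the gap widens, here is a rough count for a few sizes (taking "$2N$ per stage" at face value, not an exact flop count):

```python
import math

for N in [4, 256, 4096, 2**20]:
    dense = N * N                    # dense matrix-vector product
    fft = 2 * N * int(math.log2(N))  # log2(N) sparse stages, 2N each
    print(f"N={N:>8}: dense {dense:>14,} vs FFT {fft:>12,} (~{dense // fft}x fewer)")
```

For $N = 2^{20}$ (about a million samples), the dense product needs roughly a trillion operations while the sparse factorization needs about forty million.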

### FFT Butterfly

The above sequence of $\log_2 N$ sparse matrix-vector products is popularly represented graphically using butterfly diagrams [2].
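The butterfly structure can also be written as a short recursive routine (a textbook radix-2 decimation-in-time sketch, not an optimized implementation):

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    # butterfly: combine the two half-size DFTs
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.arange(8, dtype=complex)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```

Each recursion level performs $\mathcal{O}(N)$ butterfly operations, and there are $\log_2 N$ levels, mirroring the sparse-matrix argument above.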

### DFT Decomposition

The question that now remains to be answered is how to decompose the dense DFT matrix into a product of sparse matrices. As an example, we will decompose the $\textit{DFT}(4)$ matrix, making use of block matrix notation [3]. From above,

\begin{align} \textit{DFT}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \\ \end{bmatrix} \end{align}

If we permute the matrix to move the odd columns (1 and 3) to the left and the even columns (2 and 4) to the right, we get,

\begin{align} \textit{DFT}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -j & j \\ 1 & 1 & -1 & -1 \\ 1 & -1 & j & -j \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \end{align}

Using block matrix notation (for ease), we can write the first matrix as

\begin{align} M = \begin{bmatrix} A & B \\ A & -B \\ \end{bmatrix} \text{where, } A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix}, B = \begin{bmatrix} 1 & 1 \\ -j & j \\ \end{bmatrix} \end{align}

Note that $A = \textit{DFT}(2)$. Furthermore, we can decompose $B$ as,

\begin{align} B = \begin{bmatrix} 1 & 0 \\ 0 & -j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} = CA \text{, where } C = \begin{bmatrix} 1 & 0 \\ 0 & -j \\ \end{bmatrix} \end{align}

Therefore,

\begin{align} M = \begin{bmatrix} A & CA \\ A & -CA \\ \end{bmatrix} = \begin{bmatrix} I_2 & C \\ I_2 & -C \\ \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & A \\ \end{bmatrix} \text{where, } I_2 \text{ is the identity matrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \end{align}

Expanding out all the block matrices, we get

\begin{align} M = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ \end{bmatrix} \end{align}

and

\begin{align} \textit{DFT}(4) = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \end{align}

$Q.E.D.$

Note that $\textit{DFT}(2)$ appears in the decomposition of $\textit{DFT}(4)$. Likewise, $\textit{DFT}(4)$ appears in the decomposition of $\textit{DFT}(8)$, and so on.
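The same block structure holds at every size: $\textit{DFT}(2N)$ factors through two copies of $\textit{DFT}(N)$, with $C$ generalizing to a diagonal of twiddle factors $\omega_{2N}^k$ where $\omega_{2N} = e^{-2\pi j / 2N}$. Here is a sketch verifying this recursion for $N = 4$, i.e. $\textit{DFT}(8)$ (assuming SciPy's `dft` sign convention, which matches the matrices above):

```python
import numpy as np
from scipy.linalg import dft

N = 4
w = np.exp(-2j * np.pi / (2 * N))       # omega_{2N}
D = np.diag(w ** np.arange(N))           # generalizes C to twiddle factors
I = np.eye(N)

top = np.block([[I, D], [I, -D]])        # generalizes [[I_2, C], [I_2, -C]]
Z = np.zeros((N, N))
blk = np.block([[dft(N), Z], [Z, dft(N)]])  # two copies of DFT(N)

# even-odd permutation: even-indexed inputs first, then odd-indexed
P = np.zeros((2 * N, 2 * N))
P[np.arange(N), 2 * np.arange(N)] = 1
P[np.arange(N) + N, 2 * np.arange(N) + 1] = 1

assert np.allclose(top @ blk @ P, dft(2 * N))
```

For $N = 2$, the diagonal $D$ reduces to $C = \mathrm{diag}(1, -j)$ and this reproduces the $\textit{DFT}(4)$ factorization derived above.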

### Code

Here is the code to verify the DFT factorization.

```python
import numpy as np
from scipy.linalg import dft

np.set_printoptions(precision=2, suppress=True)  # for compact output

sparse1 = np.array([
    [1, 0,  1,  0 ],
    [0, 1,  0, -1j],
    [1, 0, -1,  0 ],
    [0, 1,  0,  1j]])

sparse2 = np.array([
    [1,  1, 0,  0],
    [1, -1, 0,  0],
    [0,  0, 1,  1],
    [0,  0, 1, -1]])

permutation = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1]])

# Get the DFT matrix
result1 = dft(4)

# DFT matrix factorization
result2 = sparse1 @ sparse2 @ permutation

# Element-wise magnitude of the difference; all zeros confirms the factorization
print(np.absolute(result1 - result2))
```