Back Propagation in Dilated Convolution Layer
January 21, 2018
Consider a valid dilated convolution [1][2] between an input
feature map, $X$ and a filter (synonymously kernel or weights), $W$
to produce an output feature map, $Y$. Let, $I$ be the horizontal
dilation and $J$ be the vertical dilation. In the following example, let
$I=2$ and $J=2$.
\begin{align}
\begin{bmatrix}
w_1 & w_2 \\
w_3 & w_4 \\
\end{bmatrix}
\circledast
\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16}
\end{bmatrix} &=
\begin{bmatrix}
y_1 & y_2 \\
y_3 & y_4 \\
\end{bmatrix} \\
\\
W \circledast X &= Y \\
\end{align}
During forward propagation, the outputs are given by
\begin{align}
y_1 &= w_1x_1 + w_2x_3 + w_3x_9 + w_4x_{11} \\
y_2 &= w_1x_2 + w_2x_4 + w_3x_{10} + w_4x_{12} \\
y_3 &= w_1x_5 + w_2x_7 + w_3x_{13} + w_4x_{15} \\
y_4 &= w_1x_6 + w_2x_8 + w_3x_{14} + w_4x_{16} \\
\end{align}
Let $L$ be the loss (synonymously error or cost) of the network we
want to minimize. During backward propagation, given $\frac{\partial
L}{\partial Y}$ we calculate $\frac{\partial L}{\partial W}$ i.e.
the weight gradient and $\frac{\partial L}{\partial X}$ i.e. the input
gradient. The weight gradient is used to adjust (learn) the values of
the weight matrix, while the input gradient is propagated backwards
through the network.
Weigth gradient
Using the equations above,
\begin{align}
\frac{\partial L}{\partial W}
&=
\begin{bmatrix}
\frac{\partial L}{\partial w_1} &
\frac{\partial L}{\partial w_2} \\
\frac{\partial L}{\partial w_3} &
\frac{\partial L}{\partial w_4} \\
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_1} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_1} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_1} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_1})
&
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_2} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_2} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_2} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_2})\\
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_3} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_3} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_3} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_3})
&
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_4} +
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_4} +
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_4} +
\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_4})
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}x_1 +
\frac{\partial L}{\partial y_2}x_2 +
\frac{\partial L}{\partial y_3}x_5 +
\frac{\partial L}{\partial y_4}x_6)
&
(\frac{\partial L}{\partial y_1}x_3 +
\frac{\partial L}{\partial y_2}x_4 +
\frac{\partial L}{\partial y_3}x_7 +
\frac{\partial L}{\partial y_4}x_8) \\
(\frac{\partial L}{\partial y_1}x_9 +
\frac{\partial L}{\partial y_2}x_{10} +
\frac{\partial L}{\partial y_3}x_{13} +
\frac{\partial L}{\partial y_4}x_{14})
&
(\frac{\partial L}{\partial y_1}x_{11} +
\frac{\partial L}{\partial y_2}x_{12} +
\frac{\partial L}{\partial y_3}x_{15} +
\frac{\partial L}{\partial y_4}x_{16})
\end{bmatrix} \\
&=
\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16}
\end{bmatrix} \circledast
\begin{bmatrix}
\frac{\partial L}{\partial y_1} &
\frac{\partial L}{\partial y_2} \\
\frac{\partial L}{\partial y_3} &
\frac{\partial L}{\partial y_4} \\
\end{bmatrix} \text{(with horizontal stride = 2 and vertical stride = 2)}
\end{align}
i.e. $\frac{\partial L}{\partial W}$ is equal to a strided valid
convolution between $X$ and $\frac{\partial L}{\partial Y}$ where
the horizontal stride is equal to the horizontal dilation, $I$
and the vertical stride is equal to the vertical dilation, $J$.
Input gradient
Using the equations above,
\begin{align}
\frac{\partial L}{\partial X}
&=
\begin{bmatrix}
\frac{\partial L}{\partial x_1} &
\frac{\partial L}{\partial x_2} &
\frac{\partial L}{\partial x_3} &
\frac{\partial L}{\partial x_4} &
\frac{\partial L}{\partial x_5} \\
\frac{\partial L}{\partial x_6} &
\frac{\partial L}{\partial x_7} &
\frac{\partial L}{\partial x_8} &
\frac{\partial L}{\partial x_9} &
\frac{\partial L}{\partial x_{10}} \\
\frac{\partial L}{\partial x_{11}} &
\frac{\partial L}{\partial x_{12}} &
\frac{\partial L}{\partial x_{13}} &
\frac{\partial L}{\partial x_{14}} &
\frac{\partial L}{\partial x_{15}} \\
\frac{\partial L}{\partial x_{16}} &
\frac{\partial L}{\partial x_{17}} &
\frac{\partial L}{\partial x_{18}} &
\frac{\partial L}{\partial x_{19}} &
\frac{\partial L}{\partial x_{20}}
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_2}) &
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_3}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_4}) \\
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_5}) &
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_6}) &
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_7}) &
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_8}) \\
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_9}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_{10}}) &
(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_{11}}) &
(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_{12}}) \\
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{13}}) &
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{14}}) &
(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{15}}) &
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{16}}) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
(\frac{\partial L}{\partial y_1}w_1) &
(\frac{\partial L}{\partial y_2}w_1) &
(\frac{\partial L}{\partial y_1}w_2) &
(\frac{\partial L}{\partial y_2}w_2) \\
(\frac{\partial L}{\partial y_3}w_1) &
(\frac{\partial L}{\partial y_4}w_1) &
(\frac{\partial L}{\partial y_3}w_2) &
(\frac{\partial L}{\partial y_4}w_2) \\
(\frac{\partial L}{\partial y_1}w_3) &
(\frac{\partial L}{\partial y_2}w_3) &
(\frac{\partial L}{\partial y_1}w_4) &
(\frac{\partial L}{\partial y_2}w_4) \\
(\frac{\partial L}{\partial y_3}w_3) &
(\frac{\partial L}{\partial y_4}w_3) &
(\frac{\partial L}{\partial y_3}w_4) &
(\frac{\partial L}{\partial y_4}w_4) \\
\end{bmatrix} \\
&=
\begin{bmatrix}
w_4 & 0 & w_3 \\
0 & 0 & 0 \\
w_2 & 0 & w_1 \\
\end{bmatrix}
\circledast
\begin{bmatrix}
\frac{\partial L}{\partial y_1} &
\frac{\partial L}{\partial y_2} \\
\frac{\partial L}{\partial y_3} &
\frac{\partial L}{\partial y_4} \\
\end{bmatrix}
\end{align}
Note that,
\begin{align}
\begin{bmatrix}
w_4 & w_3 \\
w_2 & w_1 \\
\end{bmatrix}
&=
\begin{bmatrix}
0 & 1 \\
1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
w_1 & w_2 \\
w_3 & w_4 \\
\end{bmatrix}
\begin{bmatrix}
0 & 1 \\
1 & 0 \\
\end{bmatrix} \\
&= J_2 W J_2 \\
\end{align}
where, $J_2$ is an exchange matrix [3]. The first matrix will
reverse the rows of the matrix and the second will reverse the columns
of the matrix. Also,
\begin{align}
\begin{bmatrix}
w_4 & 0 & w_3 \\
0 & 0 & 0 \\
w_2 & 0 & w_1 \\
\end{bmatrix}
&=
\begin{bmatrix}
1 & 0 \\
0 & 0 \\
0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
w_4 & w_3 \\
w_2 & w_1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
\end{align}
Therefore, $\frac{\partial L}{\partial X}$ is equal to a full
convolution between the dilated (horizontally by $I$ and vertically
by $J$) row and column reversed $W$ and $\frac{\partial L}{\partial Y}$.
References
- Dilated Convolutions.
- Valid, Same and Full Convolution.
- Exchange matrix.