# Back Propagation in Strided Convolution Layer

###### January 10, 2018

Consider a valid strided convolution [1] between an input feature map, $X$, and a filter (synonymously kernel or weights), $W$, to produce an output feature map, $Y$. Let $U$ be the horizontal stride and $V$ be the vertical stride. In the following example, $U=2$ and $V=2$, so there is some horizontal overlap but no vertical overlap as we move the filter across the input feature map. The case $U=V=1$ has been discussed in a different post.

\begin{align} \begin{bmatrix} w_1 & w_2 & w_3 \\ w_4 & w_5 & w_6 \\ \end{bmatrix} \circledast \begin{bmatrix} x_1 & x_2 & x_3 & x_4 & x_5 \\ x_6 & x_7 & x_8 & x_9 & x_{10} \\ x_{11} & x_{12} & x_{13} & x_{14} & x_{15} \\ x_{16} & x_{17} & x_{18} & x_{19} & x_{20} \\ \end{bmatrix} &= \begin{bmatrix} y_1 & y_2 \\ y_3 & y_4 \\ \end{bmatrix} \\ \\ W \circledast X &= Y \\ \end{align}

During forward propagation, the outputs are given by

\begin{align} y_1 &= w_1x_1 + w_2x_2 + w_3x_3 + w_4x_6 + w_5x_7 + w_6x_8 \\ y_2 &= w_1x_3 + w_2x_4 + w_3x_5 + w_4x_8 + w_5x_9 + w_6x_{10} \\ y_3 &= w_1x_{11} + w_2x_{12} + w_3x_{13} + w_4x_{16} + w_5x_{17} + w_6x_{18} \\ y_4 &= w_1x_{13} + w_2x_{14} + w_3x_{15} + w_4x_{18} + w_5x_{19} + w_6x_{20} \\ \end{align}
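The forward pass above can be checked numerically. The sketch below is a minimal NumPy implementation, assuming $\circledast$ denotes cross-correlation without kernel flipping (as the equations indicate), with the illustrative values $w_k = k$ and $x_k = k$:

```python
import numpy as np

def strided_valid_conv(W, X, V, U):
    """Valid cross-correlation of filter W over input X with
    vertical stride V and horizontal stride U (no kernel flip,
    matching the forward equations)."""
    kh, kw = W.shape
    out_h = (X.shape[0] - kh) // V + 1
    out_w = (X.shape[1] - kw) // U + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of W with the current window of X
            Y[i, j] = np.sum(W * X[i*V:i*V+kh, j*U:j*U+kw])
    return Y

W = np.arange(1, 7).reshape(2, 3)    # w1..w6 = 1..6
X = np.arange(1, 21).reshape(4, 5)   # x1..x20 = 1..20
Y = strided_valid_conv(W, X, V=2, U=2)
# y1 = 1*1 + 2*2 + 3*3 + 4*6 + 5*7 + 6*8 = 121
```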
Let $L$ be the loss (synonymously error or cost) of the network we want to minimize. During backward propagation, given $\frac{\partial L}{\partial Y}$ we calculate $\frac{\partial L}{\partial W}$, i.e. the weight gradient, and $\frac{\partial L}{\partial X}$, i.e. the input gradient. The weight gradient is used to adjust (learn) the values of the weight matrix, while the input gradient is propagated backwards through the network.

## Weight gradient

Using the equations above,

\begin{align} \frac{\partial L}{\partial W} &= \begin{bmatrix} \frac{\partial L}{\partial w_1} & \frac{\partial L}{\partial w_2} & \frac{\partial L}{\partial w_3} \\ \frac{\partial L}{\partial w_4} & \frac{\partial L}{\partial w_5} & \frac{\partial L}{\partial w_6} \\ \end{bmatrix} \\ &= \begin{bmatrix} (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_1} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_1} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_1} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_1}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_2} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_2} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_2} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_2}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_3} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_3} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_3} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_3}) \\ (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_4} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_4} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_4} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_4}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_5} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_5} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_5} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_5}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_6} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_6} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_6} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_6}) \\ \end{bmatrix} \\ &= \begin{bmatrix} (\frac{\partial L}{\partial y_1}x_1 
+ \frac{\partial L}{\partial y_2}x_3 + \frac{\partial L}{\partial y_3}x_{11} + \frac{\partial L}{\partial y_4}x_{13}) & (\frac{\partial L}{\partial y_1}x_2 + \frac{\partial L}{\partial y_2}x_4 + \frac{\partial L}{\partial y_3}x_{12} + \frac{\partial L}{\partial y_4}x_{14}) & (\frac{\partial L}{\partial y_1}x_3 + \frac{\partial L}{\partial y_2}x_5 + \frac{\partial L}{\partial y_3}x_{13} + \frac{\partial L}{\partial y_4}x_{15}) \\ (\frac{\partial L}{\partial y_1}x_6 + \frac{\partial L}{\partial y_2}x_8 + \frac{\partial L}{\partial y_3}x_{16} + \frac{\partial L}{\partial y_4}x_{18}) & (\frac{\partial L}{\partial y_1}x_7 + \frac{\partial L}{\partial y_2}x_9 + \frac{\partial L}{\partial y_3}x_{17} + \frac{\partial L}{\partial y_4}x_{19}) & (\frac{\partial L}{\partial y_1}x_8 + \frac{\partial L}{\partial y_2}x_{10} + \frac{\partial L}{\partial y_3}x_{18} + \frac{\partial L}{\partial y_4}x_{20}) \\ \end{bmatrix} \\ &= \begin{bmatrix} x_1 & x_2 & x_3 & x_4 & x_5 \\ x_6 & x_7 & x_8 & x_9 & x_{10} \\ x_{11} & x_{12} & x_{13} & x_{14} & x_{15} \\ x_{16} & x_{17} & x_{18} & x_{19} & x_{20} \\ \end{bmatrix} \circledast \begin{bmatrix} \frac{\partial L}{\partial y_1} & 0 & \frac{\partial L}{\partial y_2} \\ 0 & 0 & 0 \\ \frac{\partial L}{\partial y_3} & 0 & \frac{\partial L}{\partial y_4} \\ \end{bmatrix} \end{align}

Note that,

\begin{align} \begin{bmatrix} \frac{\partial L}{\partial y_1} & 0 & \frac{\partial L}{\partial y_2} \\ 0 & 0 & 0 \\ \frac{\partial L}{\partial y_3} & 0 & \frac{\partial L}{\partial y_4} \\ \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} \\ \frac{\partial L}{\partial y_3} & \frac{\partial L}{\partial y_4} \\ \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}\\ &= D_V \frac{\partial L}{\partial Y} D_U \\ \end{align}

where $D_V$ dilates the matrix vertically and $D_U$ dilates it horizontally: $D_V$ inserts $V-1$ rows of zeros between consecutive rows, and $D_U$ inserts $U-1$ columns of zeros between consecutive columns. Therefore, $\frac{\partial L}{\partial W}$ is equal to a valid convolution between $X$ and the horizontally and vertically dilated $\frac{\partial L}{\partial Y}$, i.e.
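As a quick sketch (gradient values chosen purely for illustration), the dilation can be written either as the two matrix products above or as strided assignment into a zero matrix:

```python
import numpy as np

# D_V (3x2) and D_U (2x3) for U = V = 2 and a 2x2 gradient, as in the text
D_V = np.array([[1, 0], [0, 0], [0, 1]])
D_U = np.array([[1, 0, 0], [0, 0, 1]])
dY = np.array([[1, 2], [3, 4]])      # illustrative values for dL/dY

dilated = D_V @ dY @ D_U             # insert zero rows and columns

# equivalent strided assignment into a zero matrix
direct = np.zeros((3, 3), dtype=int)
direct[::2, ::2] = dY
assert np.array_equal(dilated, direct)
```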

\begin{align} \frac{\partial L}{\partial W} = X \circledast (D_V \frac{\partial L}{\partial Y} D_U) \end{align}
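This identity can be verified numerically. The sketch below (minimal NumPy, with an arbitrary matrix standing in for the upstream gradient $\frac{\partial L}{\partial Y}$) compares the convolutional form against the chain rule applied directly:

```python
import numpy as np

def dilate(G, V, U):
    """Insert V-1 zero rows and U-1 zero columns between the
    entries of G, i.e. the effect of D_V @ G @ D_U."""
    out = np.zeros(((G.shape[0] - 1) * V + 1, (G.shape[1] - 1) * U + 1))
    out[::V, ::U] = G
    return out

def valid_conv(A, K):
    """Unit-stride valid cross-correlation of kernel K over A."""
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(K * A[i:i+kh, j:j+kw])
    return out

X = np.arange(1, 21, dtype=float).reshape(4, 5)  # x1..x20
dY = np.array([[1., 2.], [3., 4.]])              # arbitrary upstream gradient

# dL/dW via the formula: valid convolution of X with the dilated dL/dY
dW = valid_conv(X, dilate(dY, V=2, U=2))

# direct chain rule: dW[m, n] = sum_ij dY[i, j] * X[i*V + m, j*U + n]
direct = np.array([[sum(dY[i, j] * X[2*i + m, 2*j + n]
                        for i in range(2) for j in range(2))
                    for n in range(3)] for m in range(2)])
assert np.allclose(dW, direct)
```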

## Input gradient

Using the equations above,

\begin{align} \frac{\partial L}{\partial X} &= \begin{bmatrix} \frac{\partial L}{\partial x_1} & \frac{\partial L}{\partial x_2} & \frac{\partial L}{\partial x_3} & \frac{\partial L}{\partial x_4} & \frac{\partial L}{\partial x_5} \\ \frac{\partial L}{\partial x_6} & \frac{\partial L}{\partial x_7} & \frac{\partial L}{\partial x_8} & \frac{\partial L}{\partial x_9} & \frac{\partial L}{\partial x_{10}} \\ \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} & \frac{\partial L}{\partial x_{14}} & \frac{\partial L}{\partial x_{15}} \\ \frac{\partial L}{\partial x_{16}} & \frac{\partial L}{\partial x_{17}} & \frac{\partial L}{\partial x_{18}} & \frac{\partial L}{\partial x_{19}} & \frac{\partial L}{\partial x_{20}} \end{bmatrix} \\ &= \begin{bmatrix} (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_2}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_3} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_3}) & (\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_4}) & (\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_5}) \\ (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_6}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_7}) & (\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_8} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_8}) & (\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_9}) & (\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_{10}}) \\ (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{11}}) & (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{12}}) & (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{13}} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{13}}) & (\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{14}}) & 
(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{15}}) \\ (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{16}}) & (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{17}}) & (\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{18}} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{18}}) & (\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{19}}) & (\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{20}}) \\ \end{bmatrix} \\ &= \begin{bmatrix} (\frac{\partial L}{\partial y_1}w_1) & (\frac{\partial L}{\partial y_1}w_2) & (\frac{\partial L}{\partial y_1}w_3 + \frac{\partial L}{\partial y_2}w_1) & (\frac{\partial L}{\partial y_2}w_2) & (\frac{\partial L}{\partial y_2}w_3) \\ (\frac{\partial L}{\partial y_1}w_4) & (\frac{\partial L}{\partial y_1}w_5) & (\frac{\partial L}{\partial y_1}w_6 + \frac{\partial L}{\partial y_2}w_4) & (\frac{\partial L}{\partial y_2}w_5) & (\frac{\partial L}{\partial y_2}w_6) \\ (\frac{\partial L}{\partial y_3}w_1) & (\frac{\partial L}{\partial y_3}w_2) & (\frac{\partial L}{\partial y_3}w_3 + \frac{\partial L}{\partial y_4}w_1) & (\frac{\partial L}{\partial y_4}w_2) & (\frac{\partial L}{\partial y_4}w_3) \\ (\frac{\partial L}{\partial y_3}w_4) & (\frac{\partial L}{\partial y_3}w_5) & (\frac{\partial L}{\partial y_3}w_6 + \frac{\partial L}{\partial y_4}w_4) & (\frac{\partial L}{\partial y_4}w_5) & (\frac{\partial L}{\partial y_4}w_6) \\ \end{bmatrix} \\ &= \begin{bmatrix} w_6 & w_5 & w_4 \\ w_3 & w_2 & w_1 \\ \end{bmatrix} \circledast \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{\partial L}{\partial y_1} & 0 & \frac{\partial L}{\partial y_2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{\partial L}{\partial y_3} & 0 & \frac{\partial L}{\partial y_4} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \\ \end{align}
Note that,

\begin{align} \begin{bmatrix} w_6 & w_5 & w_4 \\ w_3 & w_2 & w_1 \\ \end{bmatrix} &= \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix} \begin{bmatrix} w_1 & w_2 & w_3 \\ w_4 & w_5 & w_6 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ \end{bmatrix} \\ &= J_2 W J_3 \\ \end{align}

where $J_2$ and $J_3$ are exchange matrices [2]: $J_2$ reverses the rows of the matrix and $J_3$ reverses its columns. The second matrix is the dilated $\frac{\partial L}{\partial Y}$, appropriately padded with zeros to prepare it for a full convolution [1]. Therefore, $\frac{\partial L}{\partial X}$ is equal to a full convolution between the row- and column-reversed $W$ and the dilated (horizontally by $U$ and vertically by $V$) $\frac{\partial L}{\partial Y}$, i.e.

\begin{align} \frac{\partial L}{\partial X} = (J_2 W J_3) \circledast (D_V \frac{\partial L}{\partial Y} D_U) \end{align}
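The same style of numeric check works for the input gradient. The sketch below (minimal NumPy; `dilate` and `valid_conv` are the same plain helpers used for the weight gradient, and the upstream gradient values are arbitrary) flips $W$ via the exchange matrices, pads the dilated gradient for a full convolution, and spot-checks entries against the derivation above:

```python
import numpy as np

def dilate(G, V, U):
    """Insert V-1 zero rows and U-1 zero columns between entries of G."""
    out = np.zeros(((G.shape[0] - 1) * V + 1, (G.shape[1] - 1) * U + 1))
    out[::V, ::U] = G
    return out

def valid_conv(A, K):
    """Unit-stride valid cross-correlation of kernel K over A."""
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(K * A[i:i+kh, j:j+kw])
    return out

W = np.arange(1, 7, dtype=float).reshape(2, 3)   # w1..w6 = 1..6
dY = np.array([[1., 2.], [3., 4.]])              # arbitrary upstream gradient

# J_2 W J_3: reversing rows and columns is the same as W[::-1, ::-1]
J2 = np.eye(2)[::-1]
J3 = np.eye(3)[::-1]
assert np.array_equal(J2 @ W @ J3, W[::-1, ::-1])

# pad the dilated gradient by (filter height - 1) rows and
# (filter width - 1) columns of zeros, then take a valid convolution
padded = np.pad(dilate(dY, V=2, U=2), ((1, 1), (2, 2)))
dX = valid_conv(padded, W[::-1, ::-1])

assert dX.shape == (4, 5)                  # same shape as X
assert dX[0, 0] == 1.0 * 1                 # dL/dy1 * w1
assert dX[0, 2] == 1.0 * 3 + 2.0 * 1       # dL/dy1 * w3 + dL/dy2 * w1
```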