Back Propagation in Dilated Convolution Layer
January 21, 2018
Consider a valid dilated convolution [1][2] between an input feature map $X$ and a filter (synonymously kernel or weights) $W$, producing an output feature map $Y$. Let $d_h$ be the horizontal dilation and $d_v$ be the vertical dilation.
During forward propagation, the outputs are given by
$$Y_{i,j} \;=\; \sum_{m}\sum_{n} W_{m,n}\, X_{\,i + m\,d_v,\; j + n\,d_h}.$$
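For concreteness, here is a minimal NumPy sketch of this forward pass. The function name dilated_conv2d_valid, the cross-correlation convention, and the plain loop-based implementation are assumptions for illustration, not code from this post.

```python
import numpy as np

def dilated_conv2d_valid(X, W, dh, dv):
    """Valid 2-D dilated convolution (cross-correlation convention),
    implementing Y[i, j] = sum_{m, n} W[m, n] * X[i + m*dv, j + n*dh]."""
    in_h, in_w = X.shape
    k_h, k_w = W.shape
    # Effective filter extent after dilation.
    eff_h = (k_h - 1) * dv + 1
    eff_w = (k_w - 1) * dh + 1
    Y = np.zeros((in_h - eff_h + 1, in_w - eff_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            for m in range(k_h):
                for n in range(k_w):
                    Y[i, j] += W[m, n] * X[i + m * dv, j + n * dh]
    return Y
```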
Let $L$ be the loss (synonymously error or cost) of the network we want to minimize. During backward propagation, given $\frac{\partial L}{\partial Y}$ we calculate $\frac{\partial L}{\partial W}$, i.e. the weight gradient, and $\frac{\partial L}{\partial X}$, i.e. the input gradient. The weight gradient is used to adjust (learn) the values of the weight matrix, while the input gradient is propagated backwards through the network.
Weight gradient
Using the equations above,
$$\frac{\partial L}{\partial W_{m,n}} \;=\; \sum_{i}\sum_{j} \frac{\partial L}{\partial Y_{i,j}}\,\frac{\partial Y_{i,j}}{\partial W_{m,n}} \;=\; \sum_{i}\sum_{j} \frac{\partial L}{\partial Y_{i,j}}\, X_{\,i + m\,d_v,\; j + n\,d_h},$$
i.e. $\frac{\partial L}{\partial W}$ is equal to a strided valid convolution between $X$ and $\frac{\partial L}{\partial Y}$, where the horizontal stride is equal to the horizontal dilation $d_h$ and the vertical stride is equal to the vertical dilation $d_v$.
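A direct loop over this expression makes the strided structure visible: for a fixed $(m, n)$, the window over $X$ is offset by $(m\,d_v,\, n\,d_h)$. The sketch below is one way to compute it (weight_gradient is an assumed name, and the filter size is passed in explicitly for simplicity).

```python
import numpy as np

def weight_gradient(X, dLdY, dh, dv, k_h, k_w):
    """Weight gradient dL/dW[m, n] = sum_{i, j} dL/dY[i, j] * X[i + m*dv, j + n*dh].
    For fixed (m, n) the window over X is offset by (m*dv, n*dh), i.e. a valid
    convolution of X with dL/dY using strides (dv, dh)."""
    dLdW = np.zeros((k_h, k_w))
    out_h, out_w = dLdY.shape
    for m in range(k_h):
        for n in range(k_w):
            for i in range(out_h):
                for j in range(out_w):
                    dLdW[m, n] += dLdY[i, j] * X[i + m * dv, j + n * dh]
    return dLdW
```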
Input gradient
Using the equations above,
$$\frac{\partial L}{\partial X_{p,q}} \;=\; \sum_{i}\sum_{j} \frac{\partial L}{\partial Y_{i,j}}\,\frac{\partial Y_{i,j}}{\partial X_{p,q}} \;=\; \sum_{m}\sum_{n} \frac{\partial L}{\partial Y_{p - m\,d_v,\; q - n\,d_h}}\, W_{m,n},$$
where any term whose index falls outside $\frac{\partial L}{\partial Y}$ is taken to be zero.
Note that
$$J\,W\,J$$
reverses both the rows and the columns of $W$, where $J$ is an exchange matrix [3]. The first matrix will reverse the rows of the matrix and the second will reverse the columns of the matrix. Also, inserting $d_h - 1$ zeros between the columns of $J\,W\,J$ and $d_v - 1$ zeros between its rows dilates the reversed filter horizontally by $d_h$ and vertically by $d_v$, so the sum above becomes an ordinary (undilated) convolution with this dilated, reversed filter.
Therefore, $\frac{\partial L}{\partial X}$ is equal to a full convolution between the dilated (horizontally by $d_h$ and vertically by $d_v$) row- and column-reversed $W$ and $\frac{\partial L}{\partial Y}$.
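Rather than building the dilated, reversed filter explicitly, the sketch below scatters each entry of $\frac{\partial L}{\partial Y}$ back to the input positions it touched in the forward pass, which yields the same result as the full convolution described above. The function name input_gradient, the explicit shape arguments, and the example sizes are assumptions for illustration.

```python
import numpy as np

def input_gradient(dLdY, W, dh, dv, in_h, in_w):
    """Input gradient dL/dX[p, q] = sum_{m, n} dL/dY[p - m*dv, q - n*dh] * W[m, n].
    Implemented by scattering each dL/dY[i, j] back to the input positions it
    touched in the forward pass; the result equals the full convolution of
    dL/dY with the dilated, row- and column-reversed filter."""
    dLdX = np.zeros((in_h, in_w))
    out_h, out_w = dLdY.shape
    k_h, k_w = W.shape
    for i in range(out_h):
        for j in range(out_w):
            for m in range(k_h):
                for n in range(k_w):
                    dLdX[i + m * dv, j + n * dh] += dLdY[i, j] * W[m, n]
    return dLdX


# Example with assumed shapes: a 3x3 upstream gradient and a 3x3 filter with
# dilation 2 in both directions, giving back a 7x7 input gradient.
dLdY = np.ones((3, 3))
W = np.random.randn(3, 3)
dLdX = input_gradient(dLdY, W, dh=2, dv=2, in_h=7, in_w=7)  # same shape as X
```

Comparing these loop-based gradients against finite differences of the forward pass is a quick way to verify the derivation.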
References
[1] Dilated Convolutions.
[2] Valid, Same and Full Convolution.
[3] Exchange matrix.