Back Propagation in Dilated Convolution Layer

January 21, 2018

Consider a valid dilated convolution [1][2] between an input feature map X and a filter (synonymously, kernel or weights) W to produce an output feature map Y. Let I be the horizontal dilation and J be the vertical dilation. In the following example, let I = 2 and J = 2.

$$
\underbrace{\begin{bmatrix} w_1 & w_2 \\ w_3 & w_4 \end{bmatrix}}_{W}
\ast
\underbrace{\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16}
\end{bmatrix}}_{X}
=
\underbrace{\begin{bmatrix} y_1 & y_2 \\ y_3 & y_4 \end{bmatrix}}_{Y}
$$

During forward propagation, the outputs are given by

$$
\begin{aligned}
y_1 &= w_1 x_1 + w_2 x_3 + w_3 x_9 + w_4 x_{11} \\
y_2 &= w_1 x_2 + w_2 x_4 + w_3 x_{10} + w_4 x_{12} \\
y_3 &= w_1 x_5 + w_2 x_7 + w_3 x_{13} + w_4 x_{15} \\
y_4 &= w_1 x_6 + w_2 x_8 + w_3 x_{14} + w_4 x_{16}
\end{aligned}
$$
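The forward pass can be checked numerically. The sketch below is not from the original post; the function name `dilated_conv2d_valid` and the example values are illustrative, and "convolution" is taken in the cross-correlation sense used throughout this post.

```python
import numpy as np

# Hypothetical example values; any numbers work since we only check the structure.
W = np.arange(1, 5, dtype=float).reshape(2, 2)    # [[w1, w2], [w3, w4]]
X = np.arange(1, 17, dtype=float).reshape(4, 4)   # [[x1, ..., x4], ..., [x13, ..., x16]]
I, J = 2, 2                                       # horizontal and vertical dilation

def dilated_conv2d_valid(X, W, I, J):
    """Valid cross-correlation of X with W dilated by I (horizontal) and J (vertical)."""
    kh = (W.shape[0] - 1) * J + 1                 # effective kernel height
    kw = (W.shape[1] - 1) * I + 1                 # effective kernel width
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    Y = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Dilated receptive field, multiplied element-wise with W and summed.
            Y[r, c] = np.sum(X[r:r + kh:J, c:c + kw:I] * W)
    return Y

Y = dilated_conv2d_valid(X, W, I, J)
print(Y)
# y1 = w1*x1 + w2*x3 + w3*x9 + w4*x11, as in the first equation above.
assert np.isclose(Y[0, 0], W[0, 0]*X[0, 0] + W[0, 1]*X[0, 2] + W[1, 0]*X[2, 0] + W[1, 1]*X[2, 2])
```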
Let L be the loss (synonymously, error or cost) of the network that we want to minimize. During backward propagation, given $\partial L / \partial Y$ we calculate $\partial L / \partial W$, i.e. the weight gradient, and $\partial L / \partial X$, i.e. the input gradient. The weight gradient is used to adjust (learn) the values of the weight matrix, while the input gradient is propagated backwards through the network.

Weight gradient

Using the equations above,

$$
\begin{aligned}
\frac{\partial L}{\partial W}
&= \begin{bmatrix}
\frac{\partial L}{\partial w_1} & \frac{\partial L}{\partial w_2} \\
\frac{\partial L}{\partial w_3} & \frac{\partial L}{\partial w_4}
\end{bmatrix} \\
&= \begin{bmatrix}
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_1} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_1} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_1} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_1}\right) &
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_2} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_2} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_2} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_2}\right) \\
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_3} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_3} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_3} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_3}\right) &
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_4} + \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_4} + \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_4} + \frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial w_4}\right)
\end{bmatrix} \\
&= \begin{bmatrix}
\left(\frac{\partial L}{\partial y_1} x_1 + \frac{\partial L}{\partial y_2} x_2 + \frac{\partial L}{\partial y_3} x_5 + \frac{\partial L}{\partial y_4} x_6\right) &
\left(\frac{\partial L}{\partial y_1} x_3 + \frac{\partial L}{\partial y_2} x_4 + \frac{\partial L}{\partial y_3} x_7 + \frac{\partial L}{\partial y_4} x_8\right) \\
\left(\frac{\partial L}{\partial y_1} x_9 + \frac{\partial L}{\partial y_2} x_{10} + \frac{\partial L}{\partial y_3} x_{13} + \frac{\partial L}{\partial y_4} x_{14}\right) &
\left(\frac{\partial L}{\partial y_1} x_{11} + \frac{\partial L}{\partial y_2} x_{12} + \frac{\partial L}{\partial y_3} x_{15} + \frac{\partial L}{\partial y_4} x_{16}\right)
\end{bmatrix} \\
&= \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
x_5 & x_6 & x_7 & x_8 \\
x_9 & x_{10} & x_{11} & x_{12} \\
x_{13} & x_{14} & x_{15} & x_{16}
\end{bmatrix}
\ast
\begin{bmatrix}
\frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} \\
\frac{\partial L}{\partial y_3} & \frac{\partial L}{\partial y_4}
\end{bmatrix}
\quad \text{(with horizontal stride = 2 and vertical stride = 2)}
\end{aligned}
$$

i.e. $\partial L / \partial W$ is equal to a strided valid convolution between X and $\partial L / \partial Y$, where the horizontal stride is equal to the horizontal dilation I and the vertical stride is equal to the vertical dilation J.
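As a sanity check, here is a small NumPy sketch (illustrative only; `strided_conv2d_valid` and the example values are assumptions, and `dL_dY` stands for the upstream gradient $\partial L / \partial Y$) that computes $\partial L / \partial W$ as a valid convolution of X with $\partial L / \partial Y$ using strides equal to the dilations, and compares one entry against the element-wise derivation above.

```python
import numpy as np

# Hypothetical example values; any numbers work since we only check the structure.
X = np.arange(1, 17, dtype=float).reshape(4, 4)
dL_dY = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
I, J = 2, 2   # dilations of the forward pass, reused here as strides

def strided_conv2d_valid(X, K, stride_h, stride_w):
    """Valid cross-correlation of X with K, stepping by the given strides."""
    kh, kw = K.shape
    oh = (X.shape[0] - kh) // stride_h + 1
    ow = (X.shape[1] - kw) // stride_w + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            rr, cc = r * stride_h, c * stride_w
            out[r, c] = np.sum(X[rr:rr + kh, cc:cc + kw] * K)
    return out

# dL/dW as a strided valid convolution between X and dL/dY ...
dL_dW = strided_conv2d_valid(X, dL_dY, stride_h=J, stride_w=I)

# ... matches the element-wise sums derived above, e.g. for dL/dw1:
dL_dw1 = (dL_dY[0, 0]*X[0, 0] + dL_dY[0, 1]*X[0, 1] +
          dL_dY[1, 0]*X[1, 0] + dL_dY[1, 1]*X[1, 1])
assert np.isclose(dL_dW[0, 0], dL_dw1)
print(dL_dW)
```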

Input gradient

Using the equations above,

$$
\begin{aligned}
\frac{\partial L}{\partial X}
&= \begin{bmatrix}
\frac{\partial L}{\partial x_1} & \frac{\partial L}{\partial x_2} & \frac{\partial L}{\partial x_3} & \frac{\partial L}{\partial x_4} \\
\frac{\partial L}{\partial x_5} & \frac{\partial L}{\partial x_6} & \frac{\partial L}{\partial x_7} & \frac{\partial L}{\partial x_8} \\
\frac{\partial L}{\partial x_9} & \frac{\partial L}{\partial x_{10}} & \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} \\
\frac{\partial L}{\partial x_{13}} & \frac{\partial L}{\partial x_{14}} & \frac{\partial L}{\partial x_{15}} & \frac{\partial L}{\partial x_{16}}
\end{bmatrix} \\
&= \begin{bmatrix}
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}\right) &
\left(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_2}\right) &
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_3}\right) &
\left(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_4}\right) \\
\left(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_5}\right) &
\left(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_6}\right) &
\left(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_7}\right) &
\left(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_8}\right) \\
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_9}\right) &
\left(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_{10}}\right) &
\left(\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_{11}}\right) &
\left(\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_{12}}\right) \\
\left(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{13}}\right) &
\left(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{14}}\right) &
\left(\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_{15}}\right) &
\left(\frac{\partial L}{\partial y_4}\frac{\partial y_4}{\partial x_{16}}\right)
\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial L}{\partial y_1} w_1 & \frac{\partial L}{\partial y_2} w_1 & \frac{\partial L}{\partial y_1} w_2 & \frac{\partial L}{\partial y_2} w_2 \\
\frac{\partial L}{\partial y_3} w_1 & \frac{\partial L}{\partial y_4} w_1 & \frac{\partial L}{\partial y_3} w_2 & \frac{\partial L}{\partial y_4} w_2 \\
\frac{\partial L}{\partial y_1} w_3 & \frac{\partial L}{\partial y_2} w_3 & \frac{\partial L}{\partial y_1} w_4 & \frac{\partial L}{\partial y_2} w_4 \\
\frac{\partial L}{\partial y_3} w_3 & \frac{\partial L}{\partial y_4} w_3 & \frac{\partial L}{\partial y_3} w_4 & \frac{\partial L}{\partial y_4} w_4
\end{bmatrix} \\
&= \begin{bmatrix}
w_4 & 0 & w_3 \\
0 & 0 & 0 \\
w_2 & 0 & w_1
\end{bmatrix}
\ast
\begin{bmatrix}
\frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} \\
\frac{\partial L}{\partial y_3} & \frac{\partial L}{\partial y_4}
\end{bmatrix}
\end{aligned}
$$

Note that,

$$
\begin{bmatrix} w_4 & w_3 \\ w_2 & w_1 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} w_1 & w_2 \\ w_3 & w_4 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
= J_2 \, W \, J_2
$$

where $J_2$ is an exchange matrix [3]. The first exchange matrix reverses the rows of W and the second reverses its columns. Also,

$$
\begin{bmatrix} w_4 & 0 & w_3 \\ 0 & 0 & 0 \\ w_2 & 0 & w_1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} w_4 & w_3 \\ w_2 & w_1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Therefore, $\partial L / \partial X$ is equal to a full convolution between the dilated (horizontally by I and vertically by J) row- and column-reversed W and $\partial L / \partial Y$.
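The following NumPy sketch (again illustrative; `full_conv2d` and the example values are assumptions, and `dL_dY` stands for $\partial L / \partial Y$) builds the row- and column-reversed, dilated filter and computes $\partial L / \partial X$ as a full convolution with $\partial L / \partial Y$, spot-checking entries against the derivation above.

```python
import numpy as np

# Hypothetical example values; any numbers work since we only check the structure.
W = np.arange(1, 5, dtype=float).reshape(2, 2)    # [[w1, w2], [w3, w4]]
dL_dY = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
I, J = 2, 2

# Row- and column-reversed W (J2 @ W @ J2), then dilated by inserting zeros.
W_rev = W[::-1, ::-1]
W_rev_dil = np.zeros(((W.shape[0] - 1) * J + 1, (W.shape[1] - 1) * I + 1))
W_rev_dil[::J, ::I] = W_rev      # [[w4, 0, w3], [0, 0, 0], [w2, 0, w1]]

def full_conv2d(K, G):
    """Full cross-correlation: zero-pad G by (kernel size - 1) on every side, then slide K."""
    kh, kw = K.shape
    Gp = np.pad(G, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    oh, ow = Gp.shape[0] - kh + 1, Gp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(Gp[r:r + kh, c:c + kw] * K)
    return out

dL_dX = full_conv2d(W_rev_dil, dL_dY)   # 4x4, same shape as X
print(dL_dX)
# Spot-check a few entries against the derivation above:
assert np.isclose(dL_dX[0, 0], dL_dY[0, 0] * W[0, 0])   # dL/dx1 = (dL/dy1) * w1
assert np.isclose(dL_dX[0, 2], dL_dY[0, 0] * W[0, 1])   # dL/dx3 = (dL/dy1) * w2
```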

References

  1. Dilated Convolutions.
  2. Valid, Same and Full Convolution.
  3. Exchange matrix.