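There is no need to go over how the convolution operation itself works. Instead, let's walk through the code step by step to see how a convolutional layer is implemented.
First, the basic helper functions, starting with determine_padding(filter_shape, output_shape="same"):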

    import math

    def determine_padding(filter_shape, output_shape="same"):
        # No padding
        if output_shape == "valid":
            return (0, 0), (0, 0)
        # Pad so that the output shape is the same as input shape (given that stride=1)
        elif output_shape == "same":
            filter_height, filter_width = filter_shape
            # Derived from:
            # output_height = (height + pad_h - filter_height) / stride + 1
            # In this case output_height = height and stride = 1. This gives the
            # expression for the padding below.
            pad_h1 = int(math.floor((filter_height - 1)/2))
            pad_h2 = int(math.ceil((filter_height - 1)/2))
            pad_w1 = int(math.floor((filter_width - 1)/2))
            pad_w2 = int(math.ceil((filter_width - 1)/2))
            return (pad_h1, pad_h2), (pad_w1, pad_w2)

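Explanation: given the filter shape and the padding mode, this computes the padding amounts for the top, bottom, left, and right sides. output_shape="valid" means no padding.
Notes:
math.floor(x) returns the largest integer less than or equal to x.
math.ceil(x) returns the smallest integer greater than or equal to x.
Plugging in actual arguments to see the output:
pad_h, pad_w = determine_padding((3, 3), output_shape="same")
Output: (1, 1), (1, 1)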
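To make the floor/ceil split concrete, here is a small sketch that runs a condensed copy of determine_padding on a few filter shapes. The condensed body and the choice of test shapes are my own, and I am assuming Python 3 division. Even-sized filters are the interesting case, because the padding then becomes asymmetric.

```python
import math

def determine_padding(filter_shape, output_shape="same"):
    """Condensed copy of the function above so this snippet runs on its own."""
    if output_shape == "valid":
        return (0, 0), (0, 0)
    filter_height, filter_width = filter_shape
    pad_h = (math.floor((filter_height - 1) / 2), math.ceil((filter_height - 1) / 2))
    pad_w = (math.floor((filter_width - 1) / 2), math.ceil((filter_width - 1) / 2))
    return pad_h, pad_w

print(determine_padding((3, 3), "same"))   # ((1, 1), (1, 1))
print(determine_padding((5, 5), "same"))   # ((2, 2), (2, 2))
print(determine_padding((2, 2), "same"))   # ((0, 1), (0, 1)): one extra row/column at the end
print(determine_padding((4, 3), "same"))   # ((1, 2), (1, 1))
print(determine_padding((3, 3), "valid"))  # ((0, 0), (0, 0))
```

In every "same" case the two values per axis add up to filter size minus 1, which is exactly what keeps the output size equal to the input size at stride 1.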
Next is the image_to_column(images, filter_shape, stride, output_shape='same') function:

    import numpy as np

    def image_to_column(images, filter_shape, stride, output_shape='same'):
        filter_height, filter_width = filter_shape
        pad_h, pad_w = determine_padding(filter_shape, output_shape)
        # Add padding to the image
        images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')
        # Calculate the indices where the dot products are to be applied between weights
        # and the image
        k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)
        # Get content from image at those indices
        cols = images_padded[:, k, i, j]
        channels = images.shape[1]
        # Reshape content into column shape
        cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
        return cols

Explanation: the input images has shape [batch_size, channels, height, width], the same layout PyTorch uses for image input. In other words, images_padded is padded only along the height and width axes. The function calls get_im2col_indices(), so let's look at that next:

    def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
        # First figure out what the size of the output should be
        batch_size, channels, height, width = images_shape
        filter_height, filter_width = filter_shape
        pad_h, pad_w = padding
        out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
        out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)
        # Row offsets inside one filter window, repeated for every channel
        i0 = np.repeat(np.arange(filter_height), filter_width)
        i0 = np.tile(i0, channels)
        # Row of the top-left corner of each window
        i1 = stride * np.repeat(np.arange(out_height), out_width)
        # Column offsets inside one filter window, and column of each window's corner
        j0 = np.tile(np.arange(filter_width), filter_height * channels)
        j1 = stride * np.tile(np.arange(out_width), out_height)
        i = i0.reshape(-1, 1) + i1.reshape(1, -1)
        j = j0.reshape(-1, 1) + j1.reshape(1, -1)
        # Channel index for each row of the column matrix
        k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)
        return (k, i, j)

Explanation: this is hard to follow in isolation, so let's step through it with concrete arguments.
get_im2col_indices((1,3,32,32), (3,3), ((1,1),(1,1)), stride=1)
Look at how each variable evolves. out_height and out_width need little comment: they are the height and width of the feature map produced by the convolution (both 32 here).
i0: np.repeat(np.arange(3),3): [0,0,0,1,1,1,2,2,2], shape (9,)
i0: np.tile([0,0,0,1,1,1,2,2,2],3): [0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2], shape (27,)
i1: 1*np.repeat(np.arange(32),32): [0,0,0,...,31,31,31], shape (1024,)
j0: np.tile(np.arange(3),3*3): [0,1,2,0,1,2,...], shape (27,)
j1: 1*np.tile(np.arange(32),32): [0,1,2,3,...,0,1,2,...,29,30,31], shape (1024,)
i: i0.reshape(-1,1)+i1.reshape(1,-1): shape (27,1024)
j: j0.reshape(-1,1)+j1.reshape(1,-1): shape (27,1024)
k: np.repeat(np.arange(3),3*3).reshape(-1,1): shape (27,1)
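The shapes listed above can be checked directly. The sketch below rebuilds the index arrays for the same arguments outside the function. The short names fh, fw, out_h, and out_w are mine, and I use integer division where the original uses int(... / stride + 1), which gives the same result for these values.

```python
import numpy as np

channels, height, width = 3, 32, 32
fh, fw, stride = 3, 3, 1
pad_h, pad_w = (1, 1), (1, 1)

out_h = (height + sum(pad_h) - fh) // stride + 1   # 32
out_w = (width + sum(pad_w) - fw) // stride + 1    # 32

i0 = np.tile(np.repeat(np.arange(fh), fw), channels)   # row offsets inside a window
i1 = stride * np.repeat(np.arange(out_h), out_w)       # top row of each window
j0 = np.tile(np.arange(fw), fh * channels)             # column offsets inside a window
j1 = stride * np.tile(np.arange(out_w), out_h)         # left column of each window

i = i0.reshape(-1, 1) + i1.reshape(1, -1)
j = j0.reshape(-1, 1) + j1.reshape(1, -1)
k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)

print(i0.shape, i1.shape, j0.shape, j1.shape)  # (27,) (1024,) (27,) (1024,)
print(i.shape, j.shape, k.shape)               # (27, 1024) (27, 1024) (27, 1)
print(i.max(), j.max())                        # 33 33
```

The largest index is 33, which is the last valid row or column of the padded 34*34 image, so every lookup stays in bounds.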
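Notes:
numpy.pad(array, pad_width, mode='constant', **kwargs): array is the data to pad; pad_width gives the number of values to add before and after each axis; mode selects how the padded area is filled. The default mode is 'constant', which fills with constant_values (0 by default).
numpy.arange(start, stop, step, dtype=None): for example, numpy.arange(3) gives [0,1,2]
numpy.repeat(array, repeats, axis=None): for example, numpy.repeat([0,1,2],3) gives [0,0,0,1,1,1,2,2,2]
numpy.tile(array, reps): for example, numpy.tile([0,1,2],3) gives [0,1,2,0,1,2,0,1,2]
For more advanced usage, consult the NumPy documentation. Only what this code relies on is listed here.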
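A quick way to get a feel for these helpers is to run them on tiny inputs. The snippet below is only an illustration. The 2*2 array of ones is my own example, chosen to show np.pad with the same per-axis pad_width layout that image_to_column uses.

```python
import numpy as np

a = np.arange(3)
r = np.repeat([0, 1, 2], 3)
t = np.tile([0, 1, 2], 3)
print(a.tolist())  # [0, 1, 2]
print(r.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(t.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# One (before, after) pair per axis: batch and channel axes get no padding,
# height and width each get one row/column of zeros on both sides.
x = np.ones((1, 1, 2, 2))
padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="constant")
print(padded.shape)           # (1, 1, 4, 4)
print(padded[0, 0].tolist())  # zeros around a 2x2 block of ones
```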
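The shapes alone still don't make things obvious, so let's continue. The key point is that k indexes the channel, i indexes along the height, and j indexes along the width. A 3*3 filter covers 3*3=9 pixels per channel at each position; with 3 channels, each position involves 9*3=27 pixels. The image is 32*32, so with "same" padding and stride 1 the filter is applied at 32*32=1024 positions. Now look at these lines again: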

    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)

images_padded has shape (1,3,34,34), so cols = images_padded[:, k, i, j] has shape (1,27,1024).
channels is 3.
Finally, cols = cols.transpose(1,2,0).reshape(3*3*3,-1) has shape (27,1024).
When the batch size is not 1, say 64, the final cols has shape (27,1024*64) = (27,65536).
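What each column of cols actually contains is easier to see on a tiny input. The following sketch is my own reduced example, not from the original post: a 2-channel 3*3 image, a 2*2 filter, stride 1, no padding. It applies the same indexing and checks that every column is one flattened (channel, row, column) patch.

```python
import numpy as np

x = np.arange(18).reshape(1, 2, 3, 3)   # batch=1, channels=2, 3x3 image
fh, fw, stride = 2, 2, 1
_, c, h, w = x.shape
out_h = (h - fh) // stride + 1          # 2
out_w = (w - fw) // stride + 1          # 2

i0 = np.tile(np.repeat(np.arange(fh), fw), c)
i1 = stride * np.repeat(np.arange(out_h), out_w)
j0 = np.tile(np.arange(fw), fh * c)
j1 = stride * np.tile(np.arange(out_w), out_h)
i = i0.reshape(-1, 1) + i1.reshape(1, -1)
j = j0.reshape(-1, 1) + j1.reshape(1, -1)
k = np.repeat(np.arange(c), fh * fw).reshape(-1, 1)

cols = x[:, k, i, j]                                      # (1, 8, 4)
cols = cols.transpose(1, 2, 0).reshape(fh * fw * c, -1)   # (8, 4)

# Column n should equal the flattened patch under the filter at output position n.
for n in range(out_h * out_w):
    r, s = divmod(n, out_w)
    patch = x[0, :, r * stride:r * stride + fh, s * stride:s * stride + fw]
    assert np.array_equal(cols[:, n], patch.reshape(-1))

print(cols.shape)            # (8, 4)
print(cols[:, 0].tolist())   # [0, 1, 3, 4, 9, 10, 12, 13]
```

The first column holds the top-left 2*2 window of channel 0 followed by the same window of channel 1, which is why a single matrix product with the flattened filters can replace the sliding-window loop.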
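Finally, the convolutional layer itself.
First there is a generic Layer base class. Different layers, such as convolution, pooling, and batch normalization layers, are implemented by inheriting from it: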

    class Layer(object):

        def set_input_shape(self, shape):
            """ Sets the shape that the layer expects of the input in the forward
            pass method """
            self.input_shape = shape

        def layer_name(self):
            """The name of the layer. Used in model summary."""
            return self.__class__.__name__

        def parameters(self):
            """The number of trainable parameters used by the layer"""
            return 0

        def forward_pass(self, X, training):
            """Propagates the signal forward in the network"""
            raise NotImplementedError()

        def backward_pass(self, accum_grad):
            """ Propagates the accumulated gradient backwards in the network.
            If the layer has trainable weights then these weights are also tuned in this method.
            As input (accum_grad) it receives the gradient with respect to the output of the layer and
            returns the gradient with respect to the output of the previous layer. """
            raise NotImplementedError()

        def output_shape(self):
            """The shape of the output produced by forward_pass"""
            raise NotImplementedError()

Methods that subclasses must implement raise NotImplementedError() in the base class, so calling one that has not been overridden raises an exception.
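As a small illustration of this contract, the sketch below defines a hypothetical Identity layer (my own name, used only for this example) on top of a trimmed copy of the base class. It overrides forward_pass but not output_shape, so calling the latter still raises.

```python
class Layer(object):
    """Trimmed copy of the base class above, enough for this example."""
    def layer_name(self):
        return self.__class__.__name__

    def parameters(self):
        return 0

    def forward_pass(self, X, training):
        raise NotImplementedError()

    def output_shape(self):
        raise NotImplementedError()


class Identity(Layer):
    """Hypothetical layer for illustration only: returns its input unchanged."""
    def forward_pass(self, X, training=True):
        return X


layer = Identity()
print(layer.layer_name())                              # Identity
print(layer.parameters())                              # 0 (inherited default)
print(layer.forward_pass([1, 2, 3], training=False))   # [1, 2, 3]

try:
    layer.output_shape()   # not overridden, so the base class raises
    raised = False
except NotImplementedError:
    raised = True
print(raised)                                          # True
```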
With this base class in place, Conv2D can be implemented:

    import copy
    import math
    import numpy as np

    class Conv2D(Layer):
        """A 2D Convolution Layer.
        Parameters:
        -----------
        n_filters: int
            The number of filters that will convolve over the input matrix. The number of channels
            of the output shape.
        filter_shape: tuple
            A tuple (filter_height, filter_width).
        input_shape: tuple
            The shape of the expected input of the layer. (batch_size, channels, height, width)
            Only needs to be specified for first layer in the network.
        padding: string
            Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
            matches the input height and width. For 'valid' no padding is added.
        stride: int
            The stride length of the filters during the convolution over the input.
        """
        def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
            self.n_filters = n_filters
            self.filter_shape = filter_shape
            self.padding = padding
            self.stride = stride
            self.input_shape = input_shape
            self.trainable = True

        def initialize(self, optimizer):
            # Initialize the weights
            filter_height, filter_width = self.filter_shape
            channels = self.input_shape[0]
            limit = 1 / math.sqrt(np.prod(self.filter_shape))
            self.W = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
            self.w0 = np.zeros((self.n_filters, 1))
            # Weight optimizers
            self.W_opt = copy.copy(optimizer)
            self.w0_opt = copy.copy(optimizer)

        def parameters(self):
            return np.prod(self.W.shape) + np.prod(self.w0.shape)

        def forward_pass(self, X, training=True):
            batch_size, channels, height, width = X.shape
            self.layer_input = X
            # Turn image shape into column shape
            # (enables dot product between input and weights)
            self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
            # Turn weights into column shape
            self.W_col = self.W.reshape((self.n_filters, -1))
            # Calculate output
            output = self.W_col.dot(self.X_col) + self.w0
            # Reshape into (n_filters, out_height, out_width, batch_size)
            output = output.reshape(self.output_shape() + (batch_size, ))
            # Redistribute axes so that batch size comes first
            return output.transpose(3,0,1,2)

        def backward_pass(self, accum_grad):
            # Reshape accumulated gradient into column shape
            accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)
            if self.trainable:
                # Take dot product between column shaped accum. gradient and column shape
                # layer input to determine the gradient at the layer with respect to layer weights
                grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
                # The gradient with respect to bias terms is the sum similarly to in Dense layer
                grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)
                # Update the layers weights
                self.W = self.W_opt.update(self.W, grad_w)
                self.w0 = self.w0_opt.update(self.w0, grad_w0)
            # Recalculate the gradient which will be propagated back to prev. layer
            accum_grad = self.W_col.T.dot(accum_grad)
            # Reshape from column shape to image shape
            # (column_to_image is not shown in this post)
            accum_grad = column_to_image(accum_grad,
                                    self.layer_input.shape,
                                    self.filter_shape,
                                    stride=self.stride,
                                    output_shape=self.padding)
            return accum_grad

        def output_shape(self):
            channels, height, width = self.input_shape
            pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
            output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
            output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
            return self.n_filters, int(output_height), int(output_width)

Assume the input again has shape (1,3,32,32) and we convolve with 16 filters of size 3*3. Then self.W has shape (16,3,3,3) and self.w0 has shape (16,1).
self.X_col has shape (27,1024) and self.W_col has shape (16,27), so output = self.W_col.dot(self.X_col) + self.w0 has shape (16,1024).
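To convince yourself that this single matrix product really is a convolution, you can compare it against a direct sliding-window loop. The sketch below is my own check, not the author's code. The helper im2col, the symmetric integer pad argument, and the small 8*8 input are assumptions made to keep it short. Like the layer above, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def im2col(x, fh, fw, stride, pad):
    """Compact im2col with symmetric padding (a simplification of the code above)."""
    n, c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    out_h = (h + 2 * pad - fh) // stride + 1
    out_w = (w + 2 * pad - fw) // stride + 1
    i0 = np.tile(np.repeat(np.arange(fh), fw), c)
    i1 = stride * np.repeat(np.arange(out_h), out_w)
    j0 = np.tile(np.arange(fw), fh * c)
    j1 = stride * np.tile(np.arange(out_w), out_h)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(c), fh * fw).reshape(-1, 1)
    cols = xp[:, k, i, j].transpose(1, 2, 0).reshape(fh * fw * c, -1)
    return cols, xp, out_h, out_w

rng = np.random.default_rng(0)
batch, n_filters = 2, 4
x = rng.normal(size=(batch, 3, 8, 8))
W = rng.normal(size=(n_filters, 3, 3, 3))
b = rng.normal(size=(n_filters, 1))

# im2col route: one matrix product, then reshape back to image layout.
cols, xp, out_h, out_w = im2col(x, 3, 3, stride=1, pad=1)
out = W.reshape(n_filters, -1).dot(cols) + b
out = out.reshape(n_filters, out_h, out_w, batch).transpose(3, 0, 1, 2)

# Reference route: explicit sliding window over the padded input.
ref = np.zeros_like(out)
for n in range(batch):
    for f in range(n_filters):
        for r in range(out_h):
            for s in range(out_w):
                ref[n, f, r, s] = np.sum(xp[n, :, r:r + 3, s:s + 3] * W[f]) + b[f, 0]

print(out.shape)               # (2, 4, 8, 8)
print(np.allclose(out, ref))   # True
```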
It is used like this:

    image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
    input_shape=image.squeeze().shape
    conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='same', stride=1)
    conv2d.initialize(None)
    output=conv2d.forward_pass(image,training=True)
    print(output.shape)

Output: (1,16,32,32)
Counting the parameters:
print(conv2d.parameters())
Output: 448
That is, 448 = 3*3*3*16 + 16.
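The same count can be reproduced without building the layer. The helper name conv2d_param_count below is hypothetical; it simply spells out the arithmetic above.

```python
def conv2d_param_count(n_filters, channels, filter_height, filter_width):
    """Weights (one filter_height x filter_width kernel per input channel per filter) plus one bias per filter."""
    weights = n_filters * channels * filter_height * filter_width
    biases = n_filters
    return weights + biases

print(conv2d_param_count(16, 3, 3, 3))  # 448
```

Note that the count does not depend on the image size, the padding, or the stride.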
Next, one with padding='valid':

    image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
    input_shape=image.squeeze().shape
    conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=1)
    conv2d.initialize(None)
    output=conv2d.forward_pass(image,training=True)
    print(output.shape)
    print(conv2d.parameters())

Note that the shape of cols changes, because the output of the convolution is now (1,16,30,30).
Output:
Shape of cols: (27,900)
(1,16,30,30)
448
Finally, one with a stride:

    image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
    input_shape=image.squeeze().shape
    conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=2)
    conv2d.initialize(None)
    output=conv2d.forward_pass(image,training=True)
    print(output.shape)
    print(conv2d.parameters())

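Shape of cols: (27,225)
(1,16,15,15)
448
A few closing notes:
Parameter count of a convolutional layer: params = filter height * filter width * number of input channels * number of filters + biases (one per filter)
Output size after convolution (padding*2 is the total padding on both sides; the result is truncated to an integer, as int(...) does in the code):
output height = (input height + padding(height)*2 - filter height) / stride + 1
output width = (input width + padding(width)*2 - filter width) / stride + 1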
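As a quick check of the output size formula against the three runs above, here is a short sketch. The function name conv_output_size is hypothetical, and pad_total stands for the sum of the padding on both sides of one axis. Integer division plays the role of the int(...) truncation in the layer code.

```python
def conv_output_size(size, filter_size, pad_total, stride):
    """Output length along one axis; pad_total is the padding summed over both sides."""
    return (size + pad_total - filter_size) // stride + 1

print(conv_output_size(32, 3, 2, 1))  # 32  ('same' padding, stride 1)
print(conv_output_size(32, 3, 0, 1))  # 30  ('valid' padding, stride 1)
print(conv_output_size(32, 3, 0, 2))  # 15  ('valid' padding, stride 2)
```

Squaring these values gives the number of columns of cols for a single image: 1024, 900, and 225.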
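The transformations inside get_im2col_indices() are now clear; why they are arranged this way still deserves careful thought. Backpropagation, the optimizer, and related topics will be covered in a later update once I have studied them.
Source: https://www.cnblogs.com/xiximayou/p/12706576.html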