[Implementing a Convolutional Neural Network in Python] Implementing the Conv2D Layer (with stride and padding)

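How the convolution operation itself works needs no further introduction here. Instead, we go through the code step by step to see how a convolutional layer is implemented.

First, let's look at the basic helper functions, starting with determine_padding(filter_shape, output_shape="same"):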

import math

def determine_padding(filter_shape, output_shape="same"):
    # No padding
    if output_shape == "valid":
        return (0, 0), (0, 0)
    # Pad so that the output shape is the same as input shape (given that stride=1)
    elif output_shape == "same":
        filter_height, filter_width = filter_shape
        # Derived from:
        # output_height = (height + pad_h - filter_height) / stride + 1
        # In this case output_height = height and stride = 1. This gives the
        # expression for the padding below.
        pad_h1 = int(math.floor((filter_height - 1)/2))
        pad_h2 = int(math.ceil((filter_height - 1)/2))
        pad_w1 = int(math.floor((filter_width - 1)/2))
        pad_w2 = int(math.ceil((filter_width - 1)/2))
        return (pad_h1, pad_h2), (pad_w1, pad_w2)

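Explanation: given the shape of the convolution kernel and the padding mode, this function computes the amount of padding on the top, bottom, left, and right. output_shape="valid" means no padding.

Supplementary notes:

math.floor(x) returns the largest integer less than or equal to x.

math.ceil(x) returns the smallest integer greater than or equal to x.

Plugging in actual arguments to see the output: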

pad_h,pad_w=determine_padding((3,3), output_shape="same")

Result: pad_h = (1, 1), pad_w = (1, 1)
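To see why the floor/ceil split matters, the sketch below applies the same padding rule along a single axis and checks that the output length stays equal to the input length for both odd and even kernel sizes (stride 1). The helper names same_padding_1d and output_size are hypothetical and exist only for this illustration.

```python
import math

def same_padding_1d(filter_size):
    """Padding (before, after) along one axis for 'same' output with stride 1."""
    total = filter_size - 1
    return math.floor(total / 2), math.ceil(total / 2)

def output_size(size, filter_size, pad, stride=1):
    """Output length along one axis for the given (before, after) padding."""
    return (size + sum(pad) - filter_size) // stride + 1

for f in (1, 2, 3, 4, 5):
    pad = same_padding_1d(f)
    # For an even kernel the extra pixel goes on the "after" side.
    print(f, pad, output_size(32, f, pad))
```

For an odd kernel size the padding is symmetric; for an even one the two sides differ by one, which is why floor and ceil are both needed.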

Next is the image_to_column(images, filter_shape, stride, output_shape='same') function:

import numpy as np

def image_to_column(images, filter_shape, stride, output_shape='same'):
    filter_height, filter_width = filter_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)
    # Add padding to the image
    images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')
    # Calculate the indices where the dot products are to be applied between weights
    # and the image
    k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)
    # Get content from image at those indices
    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
    return cols

Explanation: the input images has shape [batch_size, channels, height, width], the same layout PyTorch uses for image input. In other words, images_padded is padded only along the height and width axes. The function calls get_im2col_indices(), so let's look at that next:
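As a quick check of the padding step, the following sketch pads a small batch in [batch_size, channels, height, width] layout and confirms that only the last two axes grow. The array sizes are made up for illustration.

```python
import numpy as np

images = np.arange(1 * 3 * 4 * 4, dtype=float).reshape(1, 3, 4, 4)
pad_h, pad_w = (1, 1), (1, 1)   # the padding for a 3*3 kernel in 'same' mode

# (0, 0) for the batch and channel axes means "add nothing" there.
padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')

print(images.shape, '->', padded.shape)
print(padded[0, 0])   # first channel: the original 4*4 values inside a border of zeros
```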

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # First figure out what the size of the output should be
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)
    i0 = np.repeat(np.arange(filter_height), filter_width)
    i0 = np.tile(i0, channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)
    return (k, i, j)

Explanation: this is hard to follow in isolation, so let's walk through it with actual arguments.

get_im2col_indices((1,3,32,32), (3,3), ((1,1),(1,1)), stride=1)

Explanation: let's trace how each variable changes. out_height and out_width need little comment: they are the height and width of the feature map produced by the convolution (both 32 here).

i0: np.repeat(np.arange(3), 3): [0,0,0,1,1,1,2,2,2]

i0: np.tile([0,0,0,1,1,1,2,2,2], 3): [0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2], shape (27,)

i1: 1*np.repeat(np.arange(32), 32): [0,0,0,...,31,31,31], shape (1024,)

j0: np.tile(np.arange(3), 3*3): [0,1,2,0,1,2,...], shape (27,)

j1: 1*np.tile(np.arange(32), 32): [0,1,2,3,...,0,1,2,...,29,30,31], shape (1024,)

i: i0.reshape(-1,1) + i1.reshape(1,-1): shape (27,1024)

j: j0.reshape(-1,1) + j1.reshape(1,-1): shape (27,1024)

k: np.repeat(np.arange(3), 3*3).reshape(-1,1): shape (27,1)
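The 32*32 case is too large to inspect by eye, so here is a minimal sketch that builds the same index arrays for a tiny, made-up case: one channel, a 3*3 image, a 2*2 kernel, no padding, stride 1. The formulas are copied from get_im2col_indices; the short variable names fh, fw, out_h, and out_w are only for this example.

```python
import numpy as np

channels, height, width = 1, 3, 3
fh, fw, stride = 2, 2, 1
out_h = (height - fh) // stride + 1   # 2 (no padding)
out_w = (width - fw) // stride + 1    # 2

i0 = np.tile(np.repeat(np.arange(fh), fw), channels)   # row offset inside a patch
i1 = stride * np.repeat(np.arange(out_h), out_w)       # top row of each patch
j0 = np.tile(np.arange(fw), fh * channels)             # column offset inside a patch
j1 = stride * np.tile(np.arange(out_w), out_h)         # left column of each patch
i = i0.reshape(-1, 1) + i1.reshape(1, -1)
j = j0.reshape(-1, 1) + j1.reshape(1, -1)
k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)

image = np.arange(9).reshape(1, 1, 3, 3)   # pixel values 0..8, row by row
cols = image[:, k, i, j]                   # shape (1, 4, 4)

print(i)
print(j)
print(cols[0])   # each column is one flattened 2*2 patch
```

Reading cols[0] column by column gives the four 2*2 patches of the image, in the order the kernel visits them: each row of i and j corresponds to one position inside the kernel, and each column to one position of the kernel on the image.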

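Supplementary notes:

numpy.pad(array, pad_width, mode, **kwargs): array is the data to be padded; pad_width gives the number of values added before and after each axis; mode selects how the padding is filled. The default mode is 'constant', which fills with the value given by constant_values (0 by default).

numpy.arange(start, stop, step, dtype=None): for example, numpy.arange(3) returns [0,1,2]

numpy.repeat(array, repeats, axis=None): for example, numpy.repeat([0,1,2], 3) returns [0,0,0,1,1,1,2,2,2]

numpy.tile(array, reps): for example, numpy.tile([0,1,2], 3) returns [0,1,2,0,1,2,0,1,2]

For more advanced usage, consult the NumPy documentation. Only what is relevant to this code is listed here.

Even with these shapes in hand, the code is still rather hard to understand, so let's continue. The key point is that k indexes the channel, i indexes the row (height), and j indexes the column (width). Convolving one channel with a 3*3 kernel touches 3*3=9 pixels at each kernel position; with 3 channels, that is 9*3=27 pixels per position. The output feature map is 32*32, so there are 1024 kernel positions. Now look again at these lines: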

cols = images_padded[:, k, i, j]
channels = images.shape[1]
# Reshape content into column shape
cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)

images_padded has shape (1,3,34,34), so cols = images_padded[:, k, i, j] has shape (1,27,1024).

channels equals 3.

Finally, cols = cols.transpose(1,2,0).reshape(3*3*3,-1) has shape (27,1024).

When the batch size is not 1, say 64, the final cols has shape (27,1024*64) = (27,65536).
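The sketch below checks these shapes on a smaller, made-up case (batch of 2, 3 channels, 5*5 images, 3*3 kernel, no padding, stride 1) and also shows how the batch dimension ends up interleaved in the columns after the transpose and reshape. The index arrays are built with the same formulas as get_im2col_indices.

```python
import numpy as np

batch, channels, size, f = 2, 3, 5, 3   # small hypothetical sizes
out = size - f + 1                      # 'valid' padding, stride 1 -> 3

i0 = np.tile(np.repeat(np.arange(f), f), channels)
j0 = np.tile(np.arange(f), f * channels)
i = i0.reshape(-1, 1) + np.repeat(np.arange(out), out).reshape(1, -1)
j = j0.reshape(-1, 1) + np.tile(np.arange(out), out).reshape(1, -1)
k = np.repeat(np.arange(channels), f * f).reshape(-1, 1)

images = np.random.rand(batch, channels, size, size)
cols = images[:, k, i, j]                                      # (2, 27, 9)
flat = cols.transpose(1, 2, 0).reshape(f * f * channels, -1)   # (27, 18)

# Column p * batch + b holds the patch at output position p of image b.
p, b = 4, 1   # output position (row 1, col 1) of the second image
patch = images[b, :, 1:1 + f, 1:1 + f].reshape(-1)
print(cols.shape, flat.shape, np.array_equal(flat[:, p * batch + b], patch))
```

Because the batch axis is moved last before the reshape, consecutive columns of the result belong to the same output position in different images, which is what the later reshape in forward_pass relies on.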

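Finally, we come to the implementation of the convolutional layer itself.

First there is a generic Layer base class. Different layers, such as convolution, pooling, and batch normalization layers, can be implemented by inheriting from it: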

class Layer(object):
    def set_input_shape(self, shape):
        """ Sets the shape that the layer expects of the input in the forward
        pass method """
        self.input_shape = shape
    def layer_name(self):
        """The name of the layer. Used in model summary."""
        return self.__class__.__name__
    def parameters(self):
        """The number of trainable parameters used by the layer"""
        return 0
    def forward_pass(self, X, training):
        """Propagates the signal forward in the network"""
        raise NotImplementedError()
    def backward_pass(self, accum_grad):
        """ Propagates the accumulated gradient backwards in the network.
        If the layer has trainable weights then these weights are also tuned in this method.
        As input (accum_grad) it receives the gradient with respect to the output of the layer and
        returns the gradient with respect to the output of the previous layer. """
        raise NotImplementedError()
    def output_shape(self):
        """The shape of the output produced by forward_pass"""
        raise NotImplementedError()

Methods that every subclass must implement raise NotImplementedError() in the base class, so calling one that has not been overridden throws an exception.
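A minimal sketch of this pattern follows. Layer here is a trimmed copy of the base class above, and the two subclasses, Identity and Incomplete, are hypothetical and exist only to show the behavior.

```python
class Layer(object):
    """Trimmed copy of the base class (only what this sketch needs)."""
    def layer_name(self):
        return self.__class__.__name__
    def parameters(self):
        return 0
    def forward_pass(self, X, training):
        raise NotImplementedError()

class Identity(Layer):
    """Hypothetical layer that returns its input unchanged."""
    def forward_pass(self, X, training=True):
        return X

class Incomplete(Layer):
    """Hypothetical subclass that forgets to override forward_pass."""
    pass

layer = Identity()
print(layer.layer_name(), layer.parameters(), layer.forward_pass([1, 2, 3]))

try:
    Incomplete().forward_pass([1, 2, 3], training=True)
except NotImplementedError:
    print("Incomplete.forward_pass is not implemented")
```

Methods with a sensible default, such as parameters() returning 0, are inherited as they are; only the methods that have no meaningful default raise.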

With this base class in place, Conv2D can be implemented on top of it:

import copy

class Conv2D(Layer):
    """A 2D Convolution Layer.
    Parameters:
    -----------
    n_filters: int
        The number of filters that will convolve over the input matrix. The number of channels
        of the output shape.
    filter_shape: tuple
        A tuple (filter_height, filter_width).
    input_shape: tuple
        The shape of the expected input of the layer. (channels, height, width)
        Only needs to be specified for first layer in the network.
    padding: string
        Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
        matches the input height and width. For 'valid' no padding is added.
    stride: int
        The stride length of the filters during the convolution over the input.
    """
    def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
        self.n_filters = n_filters
        self.filter_shape = filter_shape
        self.padding = padding
        self.stride = stride
        self.input_shape = input_shape
        self.trainable = True
    def initialize(self, optimizer):
        # Initialize the weights
        filter_height, filter_width = self.filter_shape
        channels = self.input_shape[0]
        limit = 1 / math.sqrt(np.prod(self.filter_shape))
        self.W  = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
        self.w0 = np.zeros((self.n_filters, 1))
        # Weight optimizers
        self.W_opt  = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)
    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)
    def forward_pass(self, X, training=True):
        batch_size, channels, height, width = X.shape
        self.layer_input = X
        # Turn image shape into column shape
        # (enables dot product between input and weights)
        self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
        # Turn weights into column shape
        self.W_col = self.W.reshape((self.n_filters, -1))
        # Calculate output
        output = self.W_col.dot(self.X_col) + self.w0
        # Reshape into (n_filters, out_height, out_width, batch_size)
        output = output.reshape(self.output_shape() + (batch_size, ))
        # Redistribute axes so that batch size comes first
        return output.transpose(3,0,1,2)
    def backward_pass(self, accum_grad):
        # Reshape accumulated gradient into column shape
        accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)
        if self.trainable:
            # Take dot product between column shaped accum. gradient and column shape
            # layer input to determine the gradient at the layer with respect to layer weights
            grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
            # The gradient with respect to bias terms is the sum similarly to in Dense layer
            grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)
            # Update the layers weights
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)
        # Recalculate the gradient which will be propagated back to prev. layer
        accum_grad = self.W_col.T.dot(accum_grad)
        # Reshape from column shape to image shape
        # (column_to_image is not defined in this article)
        accum_grad = column_to_image(accum_grad,
                                self.layer_input.shape,
                                self.filter_shape,
                                stride=self.stride,
                                output_shape=self.padding)
        return accum_grad
    def output_shape(self):
        channels, height, width = self.input_shape
        pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
        output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
        output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
        return self.n_filters, int(output_height), int(output_width)

Suppose the input again has shape (1,3,32,32) and we convolve it with 16 kernels of size 3*3. Then self.W has shape (16,3,3,3) and self.w0 has shape (16,1).

self.X_col has shape (27,1024) and self.W_col has shape (16,27), so output = self.W_col.dot(self.X_col) + self.w0 has shape (16,1024).
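To make it plausible that this single matrix product really is a convolution, the sketch below compares it with a direct sliding-window computation on a small, made-up input (one image, 3 channels, 6*6 pixels, 4 kernels of size 3*3, no padding, stride 1). The index arrays follow the formulas of get_im2col_indices; the names fast and slow are only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
channels, size, f, n_filters = 3, 6, 3, 4   # small hypothetical sizes
out = size - f + 1                          # 'valid' padding, stride 1 -> 4

X = rng.random((1, channels, size, size))
W = rng.random((n_filters, channels, f, f))
w0 = rng.random((n_filters, 1))

# im2col indices, built the same way as in get_im2col_indices
i = (np.tile(np.repeat(np.arange(f), f), channels).reshape(-1, 1)
     + np.repeat(np.arange(out), out).reshape(1, -1))
j = (np.tile(np.arange(f), f * channels).reshape(-1, 1)
     + np.tile(np.arange(out), out).reshape(1, -1))
k = np.repeat(np.arange(channels), f * f).reshape(-1, 1)
X_col = X[:, k, i, j].transpose(1, 2, 0).reshape(f * f * channels, -1)

# Convolution as one matrix product: (4, 27) x (27, 16) -> (4, 16)
fast = (W.reshape(n_filters, -1).dot(X_col) + w0).reshape(n_filters, out, out)

# Reference: slide each kernel over the image with explicit loops
slow = np.zeros((n_filters, out, out))
for n in range(n_filters):
    for r in range(out):
        for c in range(out):
            slow[n, r, c] = np.sum(W[n] * X[0, :, r:r + f, c:c + f]) + w0[n, 0]

print(X_col.shape, np.allclose(fast, slow))
```

Each row of W_col is one flattened kernel and each column of X_col is one flattened patch, so every entry of the product is exactly one kernel-patch dot product.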

This is how it is used:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='same', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)

Output: (1, 16, 32, 32)

Now count the parameters:

print(conv2d.parameters())

Output: 448

That is, 448 = 3*3*3*16 + 16.
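The same count can be written out as a small helper. This is only an illustration of the arithmetic; the function name conv2d_param_count is hypothetical and not part of the code above.

```python
def conv2d_param_count(n_filters, channels, filter_height, filter_width):
    """Weights of all kernels plus one bias per kernel."""
    weights = n_filters * channels * filter_height * filter_width
    biases = n_filters
    return weights + biases

print(conv2d_param_count(16, 3, 3, 3))   # 16*3*3*3 + 16 = 448
```

Note that the count does not depend on the image size, the padding mode, or the stride.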

Next, an example with padding='valid':

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Note that the shape of cols changes, because the output after convolution is now (1,16,30,30).

Output:

Shape of cols: (27, 900)

(1,16,30,30)

448

Finally, an example with a stride:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=2)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Shape of cols: (27, 225)

(1,16,15,15)

448

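A final supplement:

Parameter count of a convolutional layer: params = kernel height * kernel width * number of input channels * number of kernels + bias terms (one per kernel)

Output size after convolution, where padding is the amount added on each side and the result is rounded down when the division is not exact (as the int() call in the code does; for example, (32 - 3) / 2 + 1 = 15.5 gives 15):

output height = (input height + padding (height) * 2 - kernel height) / stride + 1

output width = (input width + padding (width) * 2 - kernel width) / stride + 1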
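As a sketch, the formula can be checked against the three examples above. The helper name conv_output_size is hypothetical; it takes the padding before and after separately, so it also covers the asymmetric padding that determine_padding produces for even kernel sizes.

```python
def conv_output_size(size, filter_size, pad_before, pad_after, stride):
    """Output length along one axis; integer division rounds down."""
    return (size + pad_before + pad_after - filter_size) // stride + 1

print(conv_output_size(32, 3, 1, 1, 1))   # padding='same',  stride=1 -> 32
print(conv_output_size(32, 3, 0, 0, 1))   # padding='valid', stride=1 -> 30
print(conv_output_size(32, 3, 0, 0, 2))   # padding='valid', stride=2 -> 15
```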

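The transformations inside get_im2col_indices() are now clear, but why the indices are arranged this way still deserves careful thought. Backpropagation, the optimizer, and related topics will be covered in a later update once they have been studied.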

Source: https://www.cnblogs.com/xiximayou/p/12706576.html
