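Getting Started with Keras (5): Building a ResNet to Classify CIFAR-10 Images

This article shows how to use Keras to build the well-known ResNet neural network model and apply it to image classification on the CIFAR-10 dataset.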

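Dataset Overview

CIFAR-10 is a labeled image dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is available at: https://www.cs.toronto.edu/~kriz/cifar.html .

The CIFAR-10 dataset contains 60000 color images of size 32x32, divided into 10 classes (the classes are mutually exclusive), with 6000 images per class. The dataset is split into a training set of 50000 images and a test set of 10000 images.

The data is organized into 5 training batches and 1 test batch, each with 10000 images. The test batch contains 1000 randomly selected images from each class. The training batches contain the remaining images, but an individual training batch may contain more images from some classes than from others.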

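The figure below shows 10 sample images from each class:

The CIFAR-10 archive used in this article can be downloaded from: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz . The contents of the extracted folder are as follows:

Let's try reading one of the images with a Python program (image visualization). The Python code is as follows: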

# -*- coding: utf-8 -*-
import cv2
import pickle
# Read one batch file
fpath = 'cifar-10-batches-py/data_batch_1'
with open(fpath, 'rb') as f:
    d = pickle.load(f, encoding='bytes')
data = d[b'data']
labels = d[b'labels']
# Each row stores the red, green, and blue planes in turn: reshape to (N, 32, 32, 3)
data = data.reshape(data.shape[0], 3, 32, 32).transpose(0, 2, 3, 1)
# Save image number image_no
strings = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
image_no = 1000
label = strings[labels[image_no]]
image = data[image_no, :, :, :]
# OpenCV writes images in BGR channel order, so convert from RGB first
cv2.imwrite('%s.jpg' % label, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))

The result is as follows:

Although the image is rather blurry, it is still recognizable as a vehicle belonging to the truck class.
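
To make the reshape and transpose step in the script above more concrete, here is a minimal sketch using a synthetic row instead of the real batch file. It relies on the layout described on the CIFAR-10 page: each row holds 1024 red values, then 1024 green values, then 1024 blue values. The constant values 10, 20, and 30 are made up purely for illustration.

```python
import numpy as np

# One synthetic CIFAR-10-style row: red plane, then green plane, then blue plane.
row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),   # red values (illustrative constant)
    np.full(1024, 20, dtype=np.uint8),   # green values (illustrative constant)
    np.full(1024, 30, dtype=np.uint8),   # blue values (illustrative constant)
])
batch = row[np.newaxis, :]               # shape (1, 3072), like d[b'data'] with one image

# Same operation as in the script: (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3)
images = batch.reshape(batch.shape[0], 3, 32, 32).transpose(0, 2, 3, 1)

print(images.shape)      # (1, 32, 32, 3)
print(images[0, 0, 0])   # the (R, G, B) triple of the top-left pixel
```

After the transpose, the last axis holds the three color channels of each pixel, which is the channels-last layout that image libraries and the Keras model later in this article work with.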

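The ResNet Model

The classic model for image classification is the CNN. However, as the number of layers grows, CNNs exhibit a degradation problem: a deeper network performs worse than a somewhat shallower one. This is not caused by overfitting, because the gap already shows up on the training set. ResNet handles this problem well.

ResNet stands for Residual Network, and it won first place in the 2015 ImageNet competition. Its central idea is the residual block: Kaiming He et al. designed a skip connection (also called a shortcut connection) that makes it easier for the network to represent an identity mapping, which allows much deeper networks and also improves performance. The structure of the residual block is as follows: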

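F(x) = H(x) − x, where x is the output of the shallower layers, H(x) is the output of the deeper layers, and F(x) is the transformation represented by the two layers in between. When the features represented by x are already mature enough that any change to them would increase the loss, F(x) tends to learn to be 0, and x is simply passed along the identity path. This achieves the original goal at no extra computational cost: in the forward pass, once the output of the shallower layers is good enough (optimal), the later layers of the deep network can act as an identity mapping.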
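
The relation H(x) = F(x) + x can be illustrated with a small NumPy sketch. This is a simplification and not the Keras model used later: it assumes two plain linear layers with a ReLU in between as the residual branch, and it leaves out convolutions, batch normalization, and the ReLU applied after the addition. The function name residual_block and the weights are hypothetical.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Compute H(x) = F(x) + x, where F is two linear layers with a ReLU in between."""
    f = np.maximum(x @ w1, 0.0) @ w2   # F(x): the residual branch
    return f + x                        # skip connection adds the input back

x = np.array([[1.0, -2.0, 3.0]])

# With all-zero weights the residual branch gives F(x) = 0,
# so the whole block reduces to the identity mapping.
zeros = np.zeros((3, 3))
out = residual_block(x, zeros, zeros)
print(np.array_equal(out, x))   # True
```

The point of the sketch is that the block only has to drive F(x) toward zero to pass x through unchanged, rather than having to learn the identity through a stack of nonlinear layers.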

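Example residual blocks are shown in the figure below:

The block on the left is the one used in the shallower ResNet34; the block on the right is used in the deeper ResNet50/101/152 and is also called a bottleneck. The bottleneck design substantially reduces the number of parameters.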
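
A rough parameter count shows where the saving comes from. The sketch below assumes a 256-channel input and output, and compares two 3x3 convolutions at full width with a bottleneck of 1x1 (256 to 64), 3x3 (64 to 64), and 1x1 (64 to 256) convolutions. These widths are assumptions chosen for illustration, and biases and batch normalization parameters are ignored; conv_params is a hypothetical helper.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights in a square 2D convolution, ignoring biases and batch normalization."""
    return kernel_size * kernel_size * in_channels * out_channels

# Two 3x3 convolutions that keep all 256 channels.
basic = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Bottleneck: squeeze to 64 channels, convolve, then expand back to 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic)       # 1179648
print(bottleneck)  # 69632
```

Under these assumptions the bottleneck uses roughly one seventeenth of the weights, because the expensive 3x3 convolution runs on 64 channels instead of 256.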

That is a brief introduction to ResNet; the finer details are left for further study.

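Model Training

We use the ResNet model provided on the official Keras website to classify the CIFAR-10 images.

The project structure is shown in the figure below: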

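The load_data.py script loads the dataset and splits it into a training set and a test set. The complete code of the training script, which imports load_data from it, is as follows: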

# -*- coding: utf-8 -*-
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras.models import Model
import numpy as np
import os
# Select GPUs; adjust to your own machine. Disabled by default.
# os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7,8"
from load_data import load_data
# Training parameters
batch_size = 32
epochs = 100
num_classes = 10
# Subtracting pixel mean improves accuracy
subtract_pixel_mean = True
n = 3
# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 1
# Computed depth from supplied model parameter n
depth = n * 6 + 2
# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)
# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = load_data()
print('load data successfully!')
# Input image dimensions.
input_shape = x_train.shape[1:]
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('Begin model training...')
# Learning Rate Schedule
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate:', lr)
    return lr
# resnet layer
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))
    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
def resnet_v1(input_shape, depth, num_classes=10):
    # ResNet Version 1 Model builder [a]
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)
    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2
    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)
    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
model = resnet_v1(input_shape=input_shape, depth=depth, num_classes=num_classes)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)
# Prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'garbage_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)
# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             # Keras >= 2.3 reports this metric as 'val_accuracy'; adjust if needed
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)
lr_scheduler = LearningRateScheduler(lr_schedule)
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)
callbacks = [checkpoint, lr_reducer, lr_scheduler]
# Run training, with data augmentation.
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    validation_data=(x_test, y_test),
                    epochs=epochs, verbose=1, workers=4,
                    callbacks=callbacks)
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
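
The training script above imports load_data from load_data.py, whose code is not reproduced in this text. As a hedged sketch only, and not the author's actual implementation, such a loader could look like the following. It assumes the archive has been extracted to cifar-10-batches-py, uses the batch file names from the CIFAR-10 Python distribution (data_batch_1 to data_batch_5 and test_batch), and returns labels as column vectors. The helper load_batch and the data_dir parameter are hypothetical.

```python
import os
import pickle
import numpy as np

def load_batch(fpath):
    """Read one CIFAR-10 batch file and return (images, labels)."""
    with open(fpath, 'rb') as f:
        d = pickle.load(f, encoding='bytes')
    data = np.asarray(d[b'data'])
    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), channels last
    data = data.reshape(data.shape[0], 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.array(d[b'labels']).reshape(-1, 1)
    return data, labels

def load_data(data_dir='cifar-10-batches-py'):
    """Return (x_train, y_train), (x_test, y_test); data_dir is an assumed location."""
    batches = [load_batch(os.path.join(data_dir, 'data_batch_%d' % i))
               for i in range(1, 6)]
    x_train = np.concatenate([x for x, _ in batches])
    y_train = np.concatenate([y for _, y in batches])
    x_test, y_test = load_batch(os.path.join(data_dir, 'test_batch'))
    return (x_train, y_train), (x_test, y_test)
```

With the real archive in place, calling load_data() this way should give x_train of shape (50000, 32, 32, 3) and x_test of shape (10000, 32, 32, 3), which is the form the training script expects before it divides by 255.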

The printed model structure is as follows:
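
Independently of the printed summary, the overall layout implied by resnet_v1 can be worked out by hand. The sketch below mirrors the loop structure of the training script: three stacks, filters doubling from 16, and a stride of 2 at the first block of the second and third stacks. The function resnet_v1_plan is a hypothetical helper written for this illustration, and the depth count follows the 6n+2 convention, which does not count the 1x1 projection shortcuts.

```python
def resnet_v1_plan(n, input_size=32, num_filters=16):
    """List (stack, feature-map size, filters) and the depth for a 6n+2 ResNet v1."""
    plan = []
    size = input_size
    for stack in range(3):
        if stack > 0:
            size //= 2          # first block of stacks 1 and 2 downsamples with strides=2
        plan.append((stack, size, num_filters))
        num_filters *= 2
    # 1 initial conv + 3 stacks * n blocks * 2 convs per block + 1 dense classifier
    depth = 1 + 3 * n * 2 + 1
    return plan, depth

plan, depth = resnet_v1_plan(n=3)
print(plan)    # [(0, 32, 16), (1, 16, 32), (2, 8, 64)]
print(depth)   # 20
```

With n = 3 this gives ResNet20, and the final 8x8 feature map is what AveragePooling2D(pool_size=8) reduces to a single 64-dimensional vector before the softmax layer.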

The model was trained on a GPU, with the following results:

Test loss: 0.4439272038936615
Test accuracy: 0.9128
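
When reading these numbers, it helps to know which learning rates the run actually used. The sketch below copies lr_schedule from the training script (minus the print call) and evaluates it over the 100 epochs configured there. It looks only at the schedule function and does not model any further adjustment by ReduceLROnPlateau.

```python
def lr_schedule(epoch):
    """Same step schedule as in the training script, without the print call."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

epochs = 100
# Keras passes 0-based epoch indices, so a 100-epoch run covers 0..99.
rates = sorted({lr_schedule(e) for e in range(epochs)}, reverse=True)
print(rates)
```

Only two distinct values appear: the base rate of 1e-3 up to epoch index 80, and one tenth of it afterwards. The later steps in the schedule would only take effect if epochs were raised beyond 120.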

Summary

This project is open source. The GitHub repository is: https://github.com/percent4/resnet_4_cifar10 .

Thanks for reading. Corrections and comments are welcome.

Source: https://www.cnblogs.com/jclian91/p/12290906.html
