YOLO, short for You Only Look Once, is an object detection algorithm based on deep convolutional neural networks. YOLO v3 (https://pjreddie.com/media/files/papers/YOLOv3.pdf), dated April 8, 2018, is the third version of YOLO and makes detection faster and more accurate.
Source code for this article: https://github.com/SpikeKing/keras-yolo3-detection
Feel free to follow me on GitHub: https://github.com/SpikeKing
[Figure: YOLO]
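Dataset
YOLO v3 already provides pretrained weights for the COCO (Common Objects in Context, http://cocodataset.org/) dataset, so it can be used for object detection directly. The weights file is about 248 MB. Download: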
wget https://pjreddie.com/media/files/yolov3.weights
Convert the weights to Keras format (the converted file is about 248.6 MB):
python convert.py -w yolov3.cfg model_data/yolov3.weights model_data/yolo_weights.h5
Plot the network structure:
plot_model(model, to_file='./model_data/model.png', show_shapes=True, show_layer_names=True)  # network diagram
COCO contains 80 classes:
- person
- bicycle, car, motorbike, aeroplane, bus, train, truck, boat
- traffic light, fire hydrant, stop sign, parking meter, bench
- bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
- backpack, umbrella, handbag, tie, suitcase
- frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket
- bottle, wine glass, cup, fork, knife
- spoon, bowl
- banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake
- chair, sofa, pottedplant, bed, diningtable, toilet, tvmonitor
- laptop, mouse, remote, keyboard, cell phone
- microwave, oven, toaster, sink, refrigerator
- book, clock, vase, scissors, teddy bear, hair drier, toothbrush
YOLO uses 9 default anchors, spread over three scales with 3 anchors per scale, as follows:
10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
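As a minimal sketch of how these nine anchors might be grouped by scale, the snippet below parses the anchor line and splits it with a mask. The mask `[[6, 7, 8], [3, 4, 5], [0, 1, 2]]` and the strides 32/16/8 (grids of 13, 26, and 52 cells for a 416x416 input) follow the keras-yolo3 convention and are assumptions here, not something stated above. The helper names are hypothetical.

```python
ANCHOR_LINE = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"

# Assumed grouping (keras-yolo3 convention): the coarsest grid gets the largest anchors.
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
STRIDES = [32, 16, 8]  # assumed downsampling factor of each output scale


def parse_anchors(line):
    """Turn 'w,h, w,h, ...' into a list of (width, height) tuples."""
    values = [float(x) for x in line.split(",")]
    return list(zip(values[0::2], values[1::2]))


def anchors_by_scale(anchors, input_size=416):
    """Map each output grid size to the anchors assigned to it."""
    return {input_size // stride: [anchors[i] for i in mask]
            for stride, mask in zip(STRIDES, ANCHOR_MASK)}


anchors = parse_anchors(ANCHOR_LINE)
for grid, group in anchors_by_scale(anchors).items():
    print("{0}x{0} grid: {1}".format(grid, group))
```

Under these assumptions, small anchors such as 10x13 end up on the fine 52x52 grid, which is the one best suited to small objects.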
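Detector
Constructor of the YOLO detector class:
anchors, model, and classes are parameter files; the default anchors can be used as-is, but model and classes must match each other;
score and iou are detection parameters, namely the confidence threshold and the IoU (overlap) threshold; the confidence threshold suppresses false detections, and the IoU threshold removes overlapping boxes on the same object;
self.class_names and self.anchors hold the class names and anchors read from those files;
self.sess is the TensorFlow session;
self.model_image_size is the detection image size; the original image is resized to it with the aspect ratio preserved, and the blank area is padded;
self.generate() builds the parameter pipeline and outputs the boxes, confidence scores (scores), and classes;
Source: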
import colorsys
import os

import numpy as np
from keras import backend as K
from keras.layers import Input
from PIL import Image, ImageDraw, ImageFont

# Module paths below assume the keras-yolo3 repository layout
from yolo3.model import yolo_body, yolo_eval
from yolo3.utils import letterbox_image


class YOLO(object):
    def __init__(self):
        self.anchors_path = 'configs/yolo_anchors.txt'  # anchors
        self.model_path = 'model_data/yolo_weights.h5'  # model file
        self.classes_path = 'configs/coco_classes.txt'  # class file

        self.score = 0.3  # confidence threshold
        # self.iou = 0.45
        self.iou = 0.20  # IoU threshold

        self.class_names = self._get_class()  # load class names
        self.anchors = self._get_anchors()  # load anchors
        self.sess = K.get_session()
        self.model_image_size = (416, 416)  # fixed size or (None, None), hw
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)
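To make the two thresholds concrete, here is a minimal pure-Python sketch of score filtering followed by greedy non-maximum suppression. It is an illustration only: the function names `iou` and `filter_boxes` are hypothetical, boxes are assumed to be `(top, left, bottom, right)`, and the project itself delegates this filtering to `yolo_eval` rather than to code like this.

```python
def iou(a, b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, score_threshold=0.3, iou_threshold=0.2):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    # Indices of boxes that pass the confidence threshold, best score first
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filter_boxes(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box is dropped for its low score. A lower `iou_threshold` such as the 0.20 used above suppresses more aggressively than 0.45.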
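Parameter pipeline: outputs the boxes, confidence scores (scores), and classes
Load the yolo_model weights into the yolo_body network;
Generate a different color for each class of box, then shuffle the colors with a fixed seed;
Filter the boxes in the model output by the confidence and IoU thresholds;
Source: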
    def generate(self):
        model_path = os.path.expanduser(self.model_path)  # expand ~
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

        num_anchors = len(self.anchors)  # number of anchors
        num_classes = len(self.class_names)  # number of classes

        # Load the model weights (3 = anchors per scale)
        self.yolo_model = yolo_body(Input(shape=(None, None, 3)), 3, num_classes)
        self.yolo_model.load_weights(model_path)

        print('{} model, {} anchors, and {} classes loaded.'.format(model_path, num_anchors, num_classes))

        # One color per class
        hsv_tuples = [(float(x) / len(self.class_names), 1., 1.)
                      for x in range(len(self.class_names))]  # evenly spaced hues
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))  # RGB
        np.random.seed(10101)  # fixed seed so colors are stable across runs
        np.random.shuffle(self.colors)
        np.random.seed(None)  # reset the seed

        # Filter boxes according to the detection parameters
        self.input_image_shape = K.placeholder(shape=(2,))
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors, len(self.class_names),
                                           self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou)
        return boxes, scores, classes
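The color step can be tried on its own. The sketch below reproduces the idea with the standard library only: evenly spaced hues are converted to RGB and shuffled with a fixed seed, so neighboring class indices are unlikely to share similar colors. The helper name `class_colors` is hypothetical, and `random.Random` stands in for the `np.random` calls used above, so the resulting order differs from the project's.

```python
import colorsys
import random


def class_colors(num_classes, seed=10101):
    """Return one (R, G, B) tuple in 0..255 per class, in a reproducible shuffled order."""
    hsv = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    rgb = [colorsys.hsv_to_rgb(*c) for c in hsv]
    colors = [(int(r * 255), int(g * 255), int(b * 255)) for r, g, b in rgb]
    random.Random(seed).shuffle(colors)  # local RNG, leaves the global state untouched
    return colors


colors = class_colors(80)
print(colors[:3])
```

Using a local `random.Random` instance avoids having to reseed the global generator afterwards, which is what the `np.random.seed(None)` call takes care of in the original code.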
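Detection method detect_image
Step 1, image preprocessing:
Resize the image to the detection size with the aspect ratio preserved and pad the borders; the detection size must be a multiple of 32;
Add one dimension to the image so that it matches the model's input format;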
if self.model_image_size != (None, None):  # 416x416, 416 = 32 * 13; must be a multiple of 32 because the coarsest scale divides by 32
    assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
    assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
    boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # pad the image
else:
    new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
print('detector size {}'.format(image_data.shape))
image_data /= 255.  # scale to 0~1
image_data = np.expand_dims(image_data, 0)  # add the batch dimension
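`letterbox_image` is not shown in this article. As a rough sketch of the geometry it has to compute, assuming it scales the image by the smaller of the two width/height ratios and centers the result on the padded canvas, the hypothetical helper `letterbox_geometry` below returns the resized size and the paste offset without touching any pixels. `nearest_valid_size` mirrors the rounding in the `(None, None)` branch.

```python
def letterbox_geometry(image_size, target_size):
    """Compute how a (width, height) image fits into a (width, height) canvas.

    Returns the resized (width, height) and the (left, top) paste offset.
    """
    iw, ih = image_size
    tw, th = target_size
    scale = min(tw / iw, th / ih)  # the smaller ratio keeps the whole image visible
    nw, nh = int(iw * scale), int(ih * scale)
    offset = ((tw - nw) // 2, (th - nh) // 2)  # center on the canvas
    return (nw, nh), offset


def nearest_valid_size(width, height, multiple=32):
    """Round a size down to a multiple of 32."""
    return width - width % multiple, height - height % multiple


print(letterbox_geometry((832, 624), (416, 416)))
print(nearest_valid_size(1280, 720))
```

For an 832x624 image the scale is 0.5, so the image becomes 416x312 and is padded with 52-pixel bars above and below.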
Step 2, feed the data: the image and the image size;
out_boxes, out_scores, out_classes = self.sess.run(
    [self.boxes, self.scores, self.classes],
    feed_dict={
        self.yolo_model.input: image_data,
        self.input_image_shape: [image.size[1], image.size[0]],
        K.learning_phase(): 0
    })
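The original image size is fed alongside the pixels, presumably so that boxes predicted on the padded input can be mapped back onto the original image. A minimal sketch of such a mapping, assuming a centered letterbox and boxes given in pixels of the padded input as `(top, left, bottom, right)`. The helper name `unletterbox_box` is hypothetical, and this is not the code that `yolo_eval` runs.

```python
def unletterbox_box(box, image_size, input_size):
    """Map a box from letterboxed input pixels back to original image pixels.

    box: (top, left, bottom, right) in the padded input.
    image_size, input_size: (height, width), the same order as the fed image shape.
    """
    ih, iw = image_size
    th, tw = input_size
    scale = min(th / ih, tw / iw)
    pad_top = (th - ih * scale) / 2.0    # height of the bar above the image
    pad_left = (tw - iw * scale) / 2.0   # width of the bar left of the image
    top, left, bottom, right = box
    return ((top - pad_top) / scale, (left - pad_left) / scale,
            (bottom - pad_top) / scale, (right - pad_left) / scale)


# An 832x624 (width x height) image letterboxed into 416x416:
# scale 0.5, with 52-pixel bars above and below.
mapped = unletterbox_box((52, 0, 364, 416), image_size=(624, 832), input_size=(416, 416))
print(mapped)
```

A box that exactly covers the unpadded region of the input maps back to the full original image, which is a convenient sanity check for this kind of conversion.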
Step 3, draw the boxes with Pillow: the border thickness is set automatically, and each box is drawn together with its class label.
font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                          size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # font
thickness = (image.size[0] + image.size[1]) // 512  # border thickness

for i, c in reversed(list(enumerate(out_classes))):
    predicted_class = self.class_names[c]  # class
    box = out_boxes[i]  # box
    score = out_scores[i]  # confidence

    label = '{} {:.2f}'.format(predicted_class, score)  # label
    draw = ImageDraw.Draw(image)  # drawing context
    # Note: textsize was removed in Pillow 10; newer versions need draw.textbbox instead
    label_size = draw.textsize(label, font)  # label text size

    top, left, bottom, right = box
    top = max(0, np.floor(top + 0.5).astype('int32'))
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
    right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
    print(label, (left, top), (right, bottom))  # box corners

    if top - label_size[1] >= 0:  # label position: above the box if there is room
        text_origin = np.array([left, top - label_size[1]])
    else:
        text_origin = np.array([left, top + 1])

    # My kingdom for a good redistributable image drawing library.
    for t in range(thickness):  # draw the box as nested 1-pixel rectangles
        draw.rectangle(
            [left + t, top + t, right - t, bottom - t],
            outline=self.colors[c])
    draw.rectangle(  # label background
        [tuple(text_origin), tuple(text_origin + label_size)],
        fill=self.colors[c])
    draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # label text
    del draw
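The rounding, clamping, and label placement in the loop can be isolated from Pillow, which makes them easy to check. A small sketch under that assumption; `place_box_and_label` and `box_thickness` are hypothetical helpers, not part of the project.

```python
import math


def place_box_and_label(box, image_size, label_size):
    """Round and clamp a box, then pick where the label goes.

    box: (top, left, bottom, right) floats; image_size and label_size: (width, height).
    Returns ((left, top, right, bottom), (text_x, text_y)).
    """
    width, height = image_size
    top, left, bottom, right = box
    # Round to the nearest pixel and keep the box inside the image
    top = max(0, int(math.floor(top + 0.5)))
    left = max(0, int(math.floor(left + 0.5)))
    bottom = min(height, int(math.floor(bottom + 0.5)))
    right = min(width, int(math.floor(right + 0.5)))
    if top - label_size[1] >= 0:
        text_origin = (left, top - label_size[1])  # label sits above the box
    else:
        text_origin = (left, top + 1)  # no room above: draw just inside the box
    return (left, top, right, bottom), text_origin


def box_thickness(image_size):
    """Border width in pixels, growing with the image size."""
    return (image_size[0] + image_size[1]) // 512


print(place_box_and_label((-3.2, 10.6, 200.4, 700.0), (640, 480), (60, 14)))
print(box_thickness((640, 480)))
```

The first call shows both edge cases at once: the box is clamped to the image bounds, and the label moves inside the box because there is no room above it.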
Object detection
Use the YOLO detector to detect objects in an image:
def detect_img_for_test(yolo):
    img_path = './dataset/a4386X6Te9ajq866zgOtWKLx18XGW.jpg'
    image = Image.open(img_path)
    r_image = yolo.detect_image(image)
    r_image.show()
    yolo.close_session()


if __name__ == '__main__':
    detect_img_for_test(YOLO())
Result:
[Figure: output]
Reference 1 https://arxiv.org/abs/1804.02767 , Reference 2 https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b , Reference 3 https://github.com/qqwweee/keras-yolo3 , thanks to @qqwweee
OK, that's all! Enjoy it!
Source: http://www.jianshu.com/p/44450ff2569a