Object Detection | A Brief Look at One-Stage Detectors (YOLOv1, v2, v3, SSD)

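In my earlier object detection work I always used two-stage detectors and had little exposure to one-stage detectors such as YOLO and SSD. This post briefly reviews how this line of work developed. I should apologize up front: I only skim the surface here, and there are many details I have not worked out. If you need them, please dig into the papers and code. I have also listed many excellent reference articles at the end.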

YOLOv1
You Only Look Once: Unified, Real-Time Object Detection

Core idea

A single CNN is trained end to end, and object detection is treated as a regression problem.

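The input image is divided into an $S\times S$ grid. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. The network therefore learns to predict the objects whose centers fall in each cell.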
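Here is a minimal sketch of that assignment rule. It assumes pixel coordinates and the paper's $S=7$. `responsible_cell` is a hypothetical helper, not code from the YOLO repository.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center (cx, cy).

    Coordinates are in pixels; S is the grid size (S = 7 in the YOLOv1 paper).
    """
    col = min(int(cx * S / img_w), S - 1)  # clamp so a center on the right/bottom edge stays in range
    row = min(int(cy * S / img_h), S - 1)
    return row, col


# On a 448x448 image each cell spans 64 px, so a center at (200, 100) falls in row 1, column 3.
print(responsible_cell(200, 100, 448, 448))  # (1, 3)
```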

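Each grid cell predicts $B$ bounding boxes together with their confidence scores. A confidence score reflects two things:

The probability $Pr(Object)$ that the box contains an object: $Pr(Object)=1$ if it does, otherwise 0.

The accuracy of the box, $IoU^{truth}_{pred}$: the IoU between the predicted box and the ground truth.

The confidence score is therefore defined as $Pr(Object)*IoU^{truth}_{pred}$.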
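The sketch below makes the confidence target concrete. It assumes boxes are given in $(x_1,y_1,x_2,y_2)$ corner format. `iou` and `confidence_target` are illustrative names, not YOLO's own functions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pred_box, gt_box, has_object):
    """Pr(Object) * IoU(truth, pred): zero whenever the box contains no object."""
    p_obj = 1.0 if has_object else 0.0
    return p_obj * iou(pred_box, gt_box)


pred, gt = (0, 0, 2, 2), (1, 1, 3, 3)
print(iou(pred, gt))                       # 1/7, about 0.142857
print(confidence_target(pred, gt, False))  # 0.0
```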

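Each bbox consists of 5 predictions: $x,y,w,h$ and $confidence$. The pair $(x,y)$ is the box center relative to the bounds of the grid cell. $w,h$ are the box width and height relative to the whole image. $confidence$ is the IoU between the predicted box and the gt box.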
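Here is a sketch of that encoding. It assumes a ground-truth box given in pixels and $S=7$. `encode_box` is hypothetical. It shows why $x,y$ live in the cell's local frame while $w,h$ are measured relative to the image.

```python
def encode_box(cx, cy, w, h, img_w, img_h, S=7):
    """Encode a pixel-space box (center cx, cy; size w, h) as YOLOv1-style targets.

    Returns the responsible (row, col) and (x, y, w, h), where x, y are the center's
    offset inside that cell (in [0, 1]) and w, h are fractions of the image size.
    """
    gx, gy = cx * S / img_w, cy * S / img_h   # center in grid units
    col, row = min(int(gx), S - 1), min(int(gy), S - 1)
    return (row, col), (gx - col, gy - row, w / img_w, h / img_h)


print(encode_box(200, 100, 112, 224, 448, 448))
# ((1, 3), (0.125, 0.5625, 0.25, 0.5))
```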

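Each grid cell also predicts $C$ conditional class probabilities $Pr(Class_i|Object)$. These are the probabilities that the object the cell is responsible for belongs to each class. Only one set of class probabilities is predicted per cell, regardless of the number of boxes $B$.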

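At test time, the conditional class probabilities are multiplied by each box's confidence score. This gives a class-specific confidence score for each box:

\[Pr(Class_i|Object)*Pr(Object)*IoU^{truth}_{pred}=Pr(Class_i)*IoU^{truth}_{pred}\]

This score encodes both the probability that the class appears in the box and how well the predicted box fits the object.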
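The product can be computed for all boxes and classes at once with NumPy broadcasting. This is only a sketch. Random arrays stand in for network outputs, and $S=7$, $B=2$, $C=20$ are the paper's PASCAL VOC settings.

```python
import numpy as np

S, B, C = 7, 2, 20
rng = np.random.default_rng(0)
class_probs = rng.random((S, S, C))  # placeholder for Pr(Class_i | Object), one set per cell
box_conf = rng.random((S, S, B))     # placeholder for Pr(Object) * IoU, one value per box

# Broadcast (S, S, B, 1) * (S, S, 1, C) -> (S, S, B, C):
# one class-specific confidence score per box and per class.
class_scores = box_conf[..., :, None] * class_probs[..., None, :]
print(class_scores.shape)  # (7, 7, 2, 20)
```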

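In summary, the image is divided into $S\times S$ grid cells. Each cell predicts $B$ bboxes with their confidences, plus $C$ class probabilities, so the predictions form an $S\times S \times (B*5+C)$ tensor.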
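Here is the arithmetic for the paper's settings ($S=7$, $B=2$, $C=20$). The sketch assumes the last fully connected layer emits a flat vector that is then reshaped into the grid. The zero array is a placeholder.

```python
import numpy as np

S, B, C = 7, 2, 20
depth = B * 5 + C                    # 2*5 + 20 = 30 values per cell
flat = np.zeros(S * S * depth)       # stand-in for the last fully connected layer's output
pred = flat.reshape(S, S, depth)     # the S x S x (B*5 + C) prediction tensor
print(depth, flat.size, pred.shape)  # 30 1470 (7, 7, 30)
```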

Network architecture

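The architecture is inspired by GoogLeNet. It has 24 convolutional layers followed by 2 fully connected layers. Instead of inception modules, it uses 1x1 convolutions to reduce dimensionality, each followed by 3x3 convolutions. The final layer uses a linear activation. All other convolutional and fully connected layers use Leaky ReLU: $max(x,0.1x)$.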

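With $S=7$, $B=2$ and $C=20$ (PASCAL VOC), each cell's output vector has length $2*5+20=30$, and the full output is $7\times7\times30$. Within the 30-dimensional vector, the first 20 elements are class probabilities, the next 2 are box confidences, and the last 8 are the two boxes' $(x,y,w,h)$.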
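Here is a sketch of slicing one cell's vector in the order just described. `split_cell` is hypothetical. Other implementations, including the original Darknet layer, may lay the values out differently.

```python
import numpy as np


def split_cell(vec, B=2, C=20):
    """Split one cell's (B*5 + C)-vector: C class probabilities, then B confidences,
    then B boxes of (x, y, w, h)."""
    class_probs = vec[:C]
    confidences = vec[C:C + B]
    boxes = vec[C + B:].reshape(B, 4)
    return class_probs, confidences, boxes


vec = np.arange(30.0)
probs, conf, boxes = split_cell(vec)
print(probs.shape, conf, boxes[1])  # (20,) [20. 21.] [26. 27. 28. 29.]
```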

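Loss: YOLO casts the whole problem, classification included, as regression and optimizes a sum-squared error over its output.

The first term is the bbox center coordinate error. The second term is the bbox width and height error.

The third term is the confidence error for boxes that contain an object. The fourth term is the confidence error for boxes that do not.

The last term is the classification error for grid cells that contain an object.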
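For reference, here are the five terms written out as in the YOLOv1 paper [1], with $\lambda_{coord}=5$ and $\lambda_{noobj}=0.5$.

- $\mathbb{1}^{obj}_{ij}$ is 1 when the $j$-th box of cell $i$ is responsible for an object, and $\mathbb{1}^{noobj}_{ij}$ marks the boxes that are not.
- $\mathbb{1}^{obj}_{i}$ is 1 when an object appears in cell $i$.
- $C_i$ denotes confidence here, not the class count $C$.

Taking square roots of $w,h$ makes the same absolute error count more for small boxes than for large ones.

\[
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}^{obj}_{ij}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}^{obj}_{ij}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}^{obj}_{ij}\left(C_i-\hat{C}_i\right)^2 \\
&+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}^{noobj}_{ij}\left(C_i-\hat{C}_i\right)^2 \\
&+\sum_{i=0}^{S^2}\mathbb{1}^{obj}_{i}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
\]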

Mapping the loss onto the prediction tensor:

Experimental results

SSD
SSD: Single Shot MultiBox Detector

Core idea

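Faster than YOLO, with accuracy comparable to Faster R-CNN.

Multi-scale feature maps for detection: large feature maps detect small objects, and small feature maps detect large objects.

Convolutional predictors: whereas YOLO ends with fully connected layers, SSD produces detections directly with small (3x3) convolutions on each feature map.

Default boxes: SSD borrows the anchor idea from Faster R-CNN. Each feature-map cell gets a set of anchors with different aspect ratios, and bboxes are predicted as offsets relative to those anchors.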
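The sketch below shows how default boxes can be generated per feature map, following the SSD paper's scheme:

- Scales are spaced linearly between $s_{min}=0.2$ and $s_{max}=0.9$.
- For aspect ratio $a$, a box has width $s\sqrt{a}$ and height $s/\sqrt{a}$.
- One extra box of scale $\sqrt{s_ks_{k+1}}$ is added for $a=1$.

With $k$ boxes and $c$ classes, each location then needs $k(c+4)$ convolution output channels. The function names, the value used for the last map's $s_{k+1}$, and the 21-class VOC setting are assumptions. The released SSD300 configuration also uses only 4 boxes on some layers.

```python
import math


def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Scale of the default boxes on each of m feature maps, spaced linearly."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_box_sizes(k, scales, ratios=(1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)):
    """(w, h) of the default boxes on feature map k, as fractions of the image size."""
    s = scales[k]
    sizes = [(s * math.sqrt(a), s / math.sqrt(a)) for a in ratios]
    # Extra box for aspect ratio 1; using 1.0 as s_{k+1} on the last map is an assumption.
    s_next = scales[k + 1] if k + 1 < len(scales) else 1.0
    extra = math.sqrt(s * s_next)
    sizes.append((extra, extra))
    return sizes


num_classes = 21  # assumed: 20 PASCAL VOC classes + background
scales = ssd_scales()
boxes = default_box_sizes(0, scales)
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(boxes), "boxes per location ->", len(boxes) * (num_classes + 4), "output channels")
# 6 boxes per location -> 150 output channels
```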

Network architecture

SSD adds extra convolutional layers on top of a VGG16 base to obtain more feature maps for detection.

Top: SSD; bottom: YOLO.
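One concrete consequence of the multi-scale design is the number of candidate boxes. The sketch below uses the SSD300 feature-map sizes and per-location box counts commonly cited for the paper (38, 19, 10, 5, 3, 1 and 4, 6, 6, 6, 4, 4). It compares them with YOLOv1's $7\times7\times2$.

```python
# SSD300: side length of each detection feature map and default boxes per location.
feature_maps = [38, 19, 10, 5, 3, 1]
boxes_per_location = [4, 6, 6, 6, 4, 4]

ssd_boxes = sum(f * f * k for f, k in zip(feature_maps, boxes_per_location))
yolo_boxes = 7 * 7 * 2  # YOLOv1: S x S cells, B boxes each

print(ssd_boxes, yolo_boxes)  # 8732 98
```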

Experimental results

YOLOv2
YOLO9000: Better, Faster, Stronger

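Core idea

YOLOv1 is fast, but it is less accurate than R-CNN-based detectors: its localization is imprecise and its recall is low. YOLOv2 therefore introduces several improvements that raise localization accuracy and recall while keeping detection fast.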

Better

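Batch Normalization: added to all convolutional layers. It speeds up convergence and acts as a regularizer against overfitting.

High Resolution Classifier: the classification network is fine-tuned on ImageNet at $448\times448$ input for 10 epochs. This lets it adapt to the higher resolution before detection training.

Convolutional With Anchor Boxes: the fully connected layers are removed, and the anchor box strategy from the RPN in Faster R-CNN is adopted. The network predicts offsets instead of coordinates.

Dimension Clusters: k-means replaces hand-picked anchors. Plain Euclidean distance would let large boxes generate more error than small ones, so the following distance is used instead: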

\[d(box,centroid)=1-IOU(box, centroid)\]
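Here is a minimal sketch of this clustering on box widths and heights. It assumes every box is compared as if centered at the same point, so only $w,h$ matter. `iou_wh` and `kmeans_anchors` are hypothetical helpers. The paper settles on $k=5$. The synthetic two-cluster data is only for illustration.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between box sizes, as if all boxes shared the same center.

    boxes: (N, 2) array of (w, h); centroids: (K, 2) array -> (N, K) IoU matrix.
    """
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means over (w, h) pairs using d(box, centroid) = 1 - IoU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        # Move each centroid to the mean size of its members; keep it if the cluster is empty.
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


rng = np.random.default_rng(1)
small = rng.normal([10, 10], 1, size=(50, 2))     # synthetic small boxes
large = rng.normal([100, 200], 5, size=(50, 2))   # synthetic large boxes
anchors = kmeans_anchors(np.vstack([small, large]), k=2)
print(np.round(anchors[np.argsort(anchors[:, 0])]))  # close to [[10, 10], [100, 200]]
```

Because the distance is $1-IoU$, a 10-pixel error on a tiny box counts for much more than the same error on a large box. That is the scale-invariance the Euclidean metric lacks.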

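Direct location prediction: changes how bbox coordinates are computed. The predicted center offsets pass through a sigmoid, so each center stays inside its own cell. The width and height rescale the anchor prior exponentially. This constraint makes training more stable than predicting unconstrained RPN-style offsets.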
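For reference, the paper's decoding works as follows. The inputs are the raw outputs $t_x,t_y,t_w,t_h$, the cell's top-left offset $(c_x,c_y)$, and the anchor prior's size $(p_w,p_h)$. From these it computes $b_x=\sigma(t_x)+c_x$, $b_y=\sigma(t_y)+c_y$, $b_w=p_we^{t_w}$ and $b_h=p_he^{t_h}$.

Below is a minimal sketch in grid-cell units. `decode_yolov2` and its argument layout are hypothetical, not Darknet's API.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolov2(t, cell, prior):
    """Decode raw outputs t = (tx, ty, tw, th) into (bx, by, bw, bh) in grid units.

    cell = (cx, cy) is the top-left offset of the cell; prior = (pw, ph) is the anchor size.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = _sigmoid(tx) + cx    # sigmoid keeps the center strictly between cx and cx + 1
    by = _sigmoid(ty) + cy
    bw = pw * math.exp(tw)    # width/height rescale the anchor prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_yolov2((0.0, 0.0, 0.0, 0.0), cell=(3, 5), prior=(1.5, 2.0)))
# (3.5, 5.5, 1.5, 2.0)
```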

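Fine-Grained Features: small objects need finer feature maps. A passthrough layer stacks adjacent positions of the higher-resolution ($26\times26$) features into channels and concatenates the result with the low-resolution ($13\times13$) features. This is similar to the identity mappings in ResNet.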
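Here is a sketch of the passthrough (space-to-depth) reshape. It assumes an $H\times W\times C$ NumPy array. `passthrough` is hypothetical, and its channel ordering is one valid choice that may differ from Darknet's `reorg` layer.

The $26\times26\times512 \to 13\times13\times2048$ shapes follow the paper. The $13\times13\times1024$ coarse map is an assumed stand-in. Some released configs also shrink the channels with a 1x1 convolution before reorganizing.

```python
import numpy as np


def passthrough(x, stride=2):
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C * stride**2).

    Each stride x stride block of neighbouring positions is stacked along channels,
    so no spatial information is discarded.
    """
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/s, W/s, s, s, C): gather each block's pixels
    return x.reshape(h // stride, w // stride, stride * stride * c)


fine = np.zeros((26, 26, 512))      # higher-resolution features
coarse = np.zeros((13, 13, 1024))   # assumed low-resolution features
merged = np.concatenate([passthrough(fine), coarse], axis=-1)
print(passthrough(fine).shape, merged.shape)  # (13, 13, 2048) (13, 13, 3072)
```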

Multi-Scale Training: every 10 batches, the network randomly picks a new input size from the multiples of 32 between 320 and 608.
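Here is a sketch of the size schedule. It assumes the paper's candidate set of multiples of 32 from 320 to 608 (32 being the network's downsampling factor) and a redraw every 10 batches. `input_sizes` is a hypothetical generator, not part of Darknet.

```python
import random

# Candidate input sizes: multiples of 32 from 320 to 608.
SIZES = list(range(320, 608 + 1, 32))


def input_sizes(num_batches, every=10, seed=0):
    """Yield the input resolution for each batch, redrawn every `every` batches."""
    rng = random.Random(seed)
    size = rng.choice(SIZES)
    for i in range(num_batches):
        if i > 0 and i % every == 0:
            size = rng.choice(SIZES)
        yield size


print(SIZES)  # [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
print([s // 32 for s in (320, 608)])  # output grid sides: [10, 19]
```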

Faster
Darknet-19
Training for classification
Training for detection
Stronger
Hierarchical classification
Dataset combination with WordTree
Joint classification and detection

Network architecture

Experimental results

YOLOv3
YOLOv3: An Incremental Improvement

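Core idea

Bounding Box Prediction: as in v2, anchors are obtained by clustering, and bbox coordinates are predicted relative to them. The objectness score of each box is predicted with logistic regression.

Class Prediction: softmax is dropped. Each class gets an independent logistic classifier trained with binary cross-entropy. This handles multi-label data with overlapping labels, such as Woman and Person.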
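Here is a small sketch contrasting the two choices, with made-up logits for three hypothetical labels. Sigmoid scores each class independently, so two overlapping labels can both be near 1. Softmax forces the scores to compete. `bce` is an illustrative mean binary cross-entropy, not YOLOv3's exact loss code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over independent per-class logistic outputs."""
    p = sigmoid(logits)
    return float(-np.mean(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)))


logits = np.array([3.0, 3.0, -3.0])   # hypothetical scores for "Person", "Woman", "Car"
targets = np.array([1.0, 1.0, 0.0])   # multi-label: Person and Woman are both correct

softmax_probs = np.exp(logits) / np.exp(logits).sum()
print(np.round(sigmoid(logits), 3))   # [0.953 0.953 0.047]  independent scores
print(np.round(softmax_probs, 3))     # [0.499 0.499 0.001]  scores forced to sum to 1
print(bce(logits, targets))           # roughly 0.049: both true labels are satisfied
```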

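Predictions Across Scales: similar to FPN, predictions are made at 3 scales. At each scale the output is $N\times N\times[3*(4+1+80)]$. For each of the 3 boxes, that is 4 box offsets, 1 objectness prediction and 80 class predictions (COCO).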
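Here are the channel and box counts for a 416x416 input. This assumes strides of 32, 16 and 8 for the three scales, which is the common YOLOv3 setup. The constants are illustrative.

```python
NUM_ANCHORS, NUM_CLASSES = 3, 80       # 3 anchors per scale, 80 COCO classes
channels = NUM_ANCHORS * (4 + 1 + NUM_CLASSES)

input_size = 416                       # assumed input resolution
strides = (32, 16, 8)                  # one stride per detection scale
grids = [input_size // s for s in strides]
total_boxes = sum(g * g * NUM_ANCHORS for g in grids)

print(channels)     # 255
print(grids)        # [13, 26, 52]
print(total_boxes)  # 10647
```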

Feature Extractor: Darknet-53, which adds residual connections.

Network architecture

Experimental results

References

paper
[1] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.
[2] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
[3] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7263-7271.
[4] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
blog

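Object Detection | YOLO: Principles and Implementation https://zhuanlan.zhihu.com/p/32525231

Object Detection | YOLOv2: Principles and Implementation (with YOLOv3) https://zhuanlan.zhihu.com/p/35325884

Object Detection | SSD: Principles and Implementation https://zhuanlan.zhihu.com/p/33544892

Have You Really Understood YOLO? https://zhuanlan.zhihu.com/p/37850811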

artifical-intelligence

[YOLO] From YOLO v1 to YOLO v3 https://zhuanlan.zhihu.com/p/37668951

What do we learn from single shot object detectors (SSD, YOLOv3), FPN & Focal loss (RetinaNet)?

The Development of YOLO https://zhuanlan.zhihu.com/p/41438057

Source: https://www.cnblogs.com/vincent1997/p/10945551.html
