Applying k-means clustering in YOLOv3

yolov3 kmeans

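When YOLOv3 predicts bounding boxes it relies on anchor boxes. Each anchor is a (width, height) pair describing a typical object shape, obtained beforehand by clustering. For any given grid cell there are infinitely many box shapes that could be predicted, so the prediction is not made from scratch: it is made relative to the anchor boxes, i.e. the most likely object shapes found by clustering the labeled data.

The corresponding section of the .cfg file looks like this: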

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
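
To make the layout of these two lines concrete, here is a minimal sketch that reads them the way the text describes: anchors is a flat list of width,height pairs, and mask holds the indices of the pairs this [yolo] layer uses. The helper name parse_anchors is hypothetical and only for illustration.

```python
# Values copied from the [yolo] section above.
anchors_line = "10,14,  23,27,  37,58,  81,82,  135,169,  344,319"
mask_line = "3,4,5"

def parse_anchors(text):
    """Turn a flat 'w,h, w,h, ...' string into a list of (w, h) tuples."""
    values = [int(v) for v in text.split(",")]  # int() ignores surrounding spaces
    return list(zip(values[0::2], values[1::2]))

anchors = parse_anchors(anchors_line)
mask = [int(i) for i in mask_line.split(",")]
selected = [anchors[i] for i in mask]  # the anchors this layer predicts with

print(anchors)
print(selected)
```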

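Before training on your own data, replace these anchors with values that match your dataset. The anchor sizes are obtained by clustering.

Informally, clustering groups together data points that lie close to each other.

The idea behind k-means is simple:

Pick k cluster centers arbitrarily.

Assign each point to its nearest cluster center.

The clusters obtained this way are almost certainly poor, because the initial centers were picked at random.

Update each cluster center to the mean of the points currently assigned to it.

This usually makes the centers more accurate. Why? Suppose a cluster holds 3 points, 2 of them close together and 1 a little farther away. Taking the mean effectively gives the 2 nearby points more votes, so the new center moves toward them. You might object: what if the randomly chosen initial center happened to be very accurate, so that averaging makes it worse? This points at a real weakness of k-means: the result depends on the initial positions of the k centers, and the algorithm can settle in a poor local optimum. Separately, a badly chosen k can also lead to poor clusters, which is why it helps to inspect the data before deciding how many clusters to use.

Then repeat the process:

Assign each point to its nearest cluster center.

Update each cluster center to the mean of the points currently assigned to it.

Keep repeating until the cluster centers barely change.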
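
The steps above can be sketched in a few lines of NumPy. This is a generic illustration using plain Euclidean distance, not the YOLOv3 variant discussed later; the function name kmeans_euclidean, its parameters, and the toy data are all made up for this example.

```python
import numpy as np

def kmeans_euclidean(points, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Pick k distinct samples as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (an empty cluster keeps its center).
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        # Stop once the centers barely move.
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, labels

# Two well-separated groups of three points each.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans_euclidean(points, k=2)
print(centers)
print(labels)
```

On data this cleanly separated the two centers end up at the means of the two groups; on real data the outcome can vary with the seed, which is the initialization sensitivity mentioned above.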

Label file format required by YOLOv3

<object-class> <x_center> <y_center> <width> <height>

Where:

<object-class> - integer object number from 0 to (classes-1)

<x_center> <y_center> <width> <height> - float values relative to width and height of image, it can be equal from (0.0 to 1.0]

for example: <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height>

attention: <x_center> <y_center> - are center of rectangle (are not top-left corner)

Example:

1 0.716797 0.395833 0.216406 0.147222

Apart from the leading class id, every value is a ratio of the image size: (center x, center y, object width, object height).
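
As a quick check of what these ratios mean, the sketch below converts the example line back to a pixel box. The function name label_to_pixel_box is hypothetical, and the 1280x720 image size is an assumption chosen only for illustration; the post does not state the real image size.

```python
def label_to_pixel_box(label_line, image_width, image_height):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    fields = label_line.split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    box_w = width * image_width
    box_h = height * image_height
    # The label stores the box center, so shift by half the size to get the corner.
    left = x_center * image_width - box_w / 2
    top = y_center * image_height - box_h / 2
    return class_id, left, top, left + box_w, top + box_h

# Assumed image size: 1280x720.
class_id, left, top, right, bottom = label_to_pixel_box(
    "1 0.716797 0.395833 0.216406 0.147222", 1280, 720)
print(class_id, left, top, right, bottom)
```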

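k-means implementation

Ordinarily, the distance from a sample to a centroid is simply the distance between the two points, and each sample is assigned to the nearest centroid.

In YOLOv3 the samples have a concrete meaning: what we ultimately want to know is which bounding box shapes are most likely. So instead of the point-to-point distance we compute the IoU of two boxes, which measures how similar they are, and define d = 1 - iou(box1, box_cluster). The smaller d is, the more box1 resembles box_cluster, and each box is assigned to the box_cluster with the smallest d.

Loading the data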

import numpy as np

# args.filelist (from the script's command-line arguments) lists one image path per line
with open(args.filelist) as f:
    paths = [p.rstrip('\n') for p in f.readlines()]
annotation_dims = []
for path in paths:
    # derive the label file path from the image path
    path = path.replace('JPEGImages','labels')
    path = path.replace('.jpg','.txt')
    path = path.replace('.png','.txt')
    print(path)
    with open(path) as f2:
        for line in f2.readlines():
            line = line.rstrip('\n')
            # the last two fields are the normalized width and height
            w,h = line.split(' ')[3:]
            annotation_dims.append(tuple(map(float,(w,h))))
annotation_dims = np.array(annotation_dims)  # shape (N,2)

It looks like a lot, but the key part is just two lines:

w,h = line.split(' ')[3:]
annotation_dims.append(tuple(map(float,(w,h))))

This uses Python's built-in map; for its usage see https://www.runoob.com/python/python-func-map.html

The result is an N×2 matrix, where N is the number of labeled boxes (samples).
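
To see the shape of that matrix without touching the file system, the sketch below applies the same two key statements to label lines held in a string. The first line is the example from earlier; the second is made up for illustration.

```python
import numpy as np

label_text = """1 0.716797 0.395833 0.216406 0.147222
0 0.500000 0.500000 0.100000 0.200000"""  # second line is invented

annotation_dims = []
for line in label_text.splitlines():
    w, h = line.split(' ')[3:]  # the last two fields: width and height
    annotation_dims.append(tuple(map(float, (w, h))))
annotation_dims = np.array(annotation_dims)

print(annotation_dims.shape)
print(annotation_dims)
```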

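Defining the distance from a sample to a centroid

Compute the IoU between the box represented by sample x and each of the k centroid boxes (that is, compare how similar the box shapes are).

IoU means intersection over union: the area of the overlap divided by the area of the union. Only widths and heights are compared here, so the two boxes are treated as if they shared the same corner.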

def IOU(x,centroids):
    similarities = []
    w,h = x
    for centroid in centroids:
        c_w,c_h = centroid
        if c_w>=w and c_h>=h:     # centroid box fully contains box (w,h)
            similarity = w*h/(c_w*c_h)
        elif c_w>=w and c_h<=h:   # centroid box is wider but shorter
            similarity = w*c_h/(w*h + (c_w-w)*c_h)
        elif c_w<=w and c_h>=h:   # centroid box is narrower but taller
            similarity = c_w*h/(w*h + c_w*(c_h-h))
        else:                     # box (w,h) is bigger than the centroid box in both dimensions
            similarity = (c_w*c_h)/(w*h)
        similarities.append(similarity) # one value per centroid
    return np.array(similarities)       # shape (k,)
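
The four branches above all reduce to one expression: with both boxes anchored at the same corner, the intersection is min(w, c_w) * min(h, c_h) and the union is the two areas minus that intersection. A minimal vectorized sketch of the same computation follows; the name iou_wh and the sample values are made up for illustration.

```python
import numpy as np

def iou_wh(box, centroids):
    """IoU between one (w, h) box and k (w, h) centroids sharing the same corner."""
    box = np.asarray(box, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

box = (0.2, 0.4)
centroids = [(0.2, 0.4), (0.4, 0.4), (0.1, 0.8)]
similarities = iou_wh(box, centroids)
print(similarities)  # identical shape, twice as wide, narrower but taller
```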

The k-means loop

def kmeans(X,centroids,eps,anchor_file):
    N = X.shape[0]
    k,dim = centroids.shape
    prev_assignments = np.ones(N)*(-1)
    iteration = 0
    old_D = np.zeros((N,k)) # distance matrix: N samples, each with a distance to k centroids, N*k in total
    while True:
        D = []
        iteration+=1
        for i in range(N):
            d = 1 - IOU(X[i],centroids)  # d has k entries, one per centroid
            D.append(d)
        D = np.array(D) # D.shape = (N,k)
        print("iter {}: dists = {}".format(iteration,np.sum(np.abs(old_D-D))))
        # assign samples to centroids
        assignments = np.argmin(D,axis=1) # index of the smallest value in each row, i.e. the nearest of the k centroids
        if (assignments == prev_assignments).all() :  # assignments, and therefore centroids, no longer change
            print("Centroids =",centroids)
            write_anchors_to_file(centroids,X,anchor_file)
            return
        # calculate new centroids
        centroid_sums=np.zeros((k,dim),float)  # (k,2); np.float was removed in NumPy 1.24
        for i in range(N):
            centroid_sums[assignments[i]]+=X[i]        # add each sample to its cluster's running sum
        for j in range(k):
            count = np.sum(assignments==j)
            if count > 0:                              # an empty cluster keeps its old centroid
                centroids[j] = centroid_sums[j]/count  # update the centroid to the cluster mean
        prev_assignments = assignments.copy()
        old_D = D.copy()

For every sample, d = 1 - IOU(X[i],centroids) gives its distance to each cluster centroid.

np.argmin(D,axis=1) returns, for each sample, the index of the nearest cluster centroid.

For argmin usage see https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html

Sum the samples in each cluster, take the mean, and update the cluster centroid:

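for i in range(N):
    centroid_sums[assignments[i]]+=X[i]        # add each sample to its cluster's running sum
for j in range(k):
    count = np.sum(assignments==j)
    if count > 0:                              # an empty cluster keeps its old centroid
        centroids[j] = centroid_sums[j]/count  # update the centroid to the cluster mean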

Repeat this process until the assignments, and therefore the centroids, stop changing; the clustering is then complete.
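
For experimenting with the loop in isolation, here is a compact, self-contained sketch of the same idea: 1 - IoU as the distance, the mean as the update, and a stop when assignments no longer change. It is an illustration under assumptions, not the script's code: the names iou_matrix and kmeans_iou, the random initialization from the samples, the max_iter cap, and the toy boxes are all made up here.

```python
import numpy as np

def iou_matrix(boxes, centroids):
    """Pairwise IoU between (N, 2) boxes and (k, 2) centroids sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_boxes = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_centroids = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_boxes + area_centroids - inter)

def kmeans_iou(boxes, k, seed=0, max_iter=1000):
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct samples chosen at random.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    prev = None
    for _ in range(max_iter):
        assignments = np.argmin(1.0 - iou_matrix(boxes, centroids), axis=1)
        if prev is not None and np.array_equal(assignments, prev):
            break  # assignments are stable, so the centroids are too
        for j in range(k):
            members = boxes[assignments == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
        prev = assignments
    return centroids, assignments

# Toy normalized (w, h) boxes: three small and three large.
boxes = np.array([[0.10, 0.10], [0.12, 0.10], [0.10, 0.12],
                  [0.50, 0.60], [0.60, 0.50], [0.55, 0.55]])
centroids, assignments = kmeans_iou(boxes, k=2)
print(centroids)
print(assignments)
```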

Saving the clustered anchor box sizes

def write_anchors_to_file(centroids,X,anchor_file):
    anchors = centroids.copy()
    print(anchors.shape)
    # scale the normalized sizes; width_in_cfg_file and height_in_cfg_file are module-level globals
    for i in range(anchors.shape[0]):
        anchors[i][0]*=width_in_cfg_file/32.
        anchors[i][1]*=height_in_cfg_file/32.
    widths = anchors[:,0]
    sorted_indices = np.argsort(widths)   # write anchors in order of increasing width
    print('Anchors =', anchors[sorted_indices])
    with open(anchor_file,'w') as f:
        for i in sorted_indices[:-1]:
            f.write('%0.2f,%0.2f,'%(anchors[i,0],anchors[i,1]))
        # there should not be comma after last anchor, that's why
        last = sorted_indices[-1]         # a scalar index, so %f receives plain numbers
        f.write('%0.2f,%0.2f\n'%(anchors[last,0],anchors[last,1]))
        f.write('%f\n'%(avg_IOU(X,centroids)))
    print()
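
The function above calls avg_IOU, which this post does not show; the complete script linked at the end defines it. As a plausible sketch only, assuming it reports the mean, over all samples, of each sample's IoU with its best-matching centroid, it could look like this. The names avg_iou and iou_wh and the sample values are made up for illustration.

```python
import numpy as np

def iou_wh(box, centroids):
    """IoU between one (w, h) box and k (w, h) centroids sharing the same corner."""
    inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def avg_iou(boxes, centroids):
    """Assumed metric: mean of each box's IoU with its best-matching centroid."""
    boxes = np.asarray(boxes, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return float(np.mean([iou_wh(box, centroids).max() for box in boxes]))

boxes = [(0.2, 0.4), (0.4, 0.4)]
centroids = [(0.2, 0.4)]
score = avg_iou(boxes, centroids)
print(score)
```

A higher value means the anchors fit the labeled boxes better, so it is a convenient number to compare runs with different k or different seeds.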

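Because the YOLO label files store widths and heights as ratios of the image size, the clustered anchor box sizes must be multiplied by the model's input image size.

In write_anchors_to_file above: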

anchors[i][0]*=width_in_cfg_file/32.
anchors[i][1]*=height_in_cfg_file/32.

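The division by 32 is a YOLOv2 requirement: YOLOv2 expresses anchors in units of its 32-pixel output grid cells. YOLOv3 does not need it, because its anchors are given in pixels of the network input. See https://github.com/pjreddie/darknet/issues/911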

for Yolo v2: width=704 height=576 in cfg-file
./darknet detector calc_anchors data/hand.data -num_of_clusters 5 -width 22 -height 18 -show
for Yolo v3: width=704 height=576 in cfg-file
./darknet detector calc_anchors data/hand.data -num_of_clusters 9 -width 704 -height 576 -show

And you can use any images with any sizes.
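
The two commands quoted above differ only in the units of -width and -height: 704/32 = 22 and 576/32 = 18 for YOLOv2, full pixels for YOLOv3. A minimal sketch of that scaling, assuming normalized (w, h) centroids as input; anchors_to_cfg and its stride parameter are hypothetical names for illustration and not part of darknet, and the centroid values are made up.

```python
import numpy as np

def anchors_to_cfg(centroids, input_width, input_height, stride=1):
    """Scale normalized (w, h) centroids and format them like a cfg 'anchors' value."""
    scale = np.array([input_width / stride, input_height / stride], dtype=float)
    scaled = np.asarray(centroids, dtype=float) * scale
    scaled = scaled[np.argsort(scaled[:, 0])]  # increasing width
    return ", ".join(f"{w:.2f},{h:.2f}" for w, h in scaled)

centroids = [(0.5, 0.25), (0.125, 0.25)]  # invented normalized sizes
v3_anchors = anchors_to_cfg(centroids, 704, 576)             # pixels, YOLOv3 style
v2_anchors = anchors_to_cfg(centroids, 704, 576, stride=32)  # grid cells, YOLOv2 style
print("anchors =", v3_anchors)
print("anchors =", v2_anchors)
```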

The complete code is at https://github.com/AlexeyAB/darknet/tree/master/scripts

Usage: python3 gen_anchors.py -filelist ../build/darknet/x64/data/park_train.txt

Source: https://www.cnblogs.com/sdu20112013/p/10937717.html
