SSD MobileNet model fails to detect objects at longer distances

This article describes how to handle an SSD MobileNet model that fails to detect objects at longer distances. It should be a useful reference for anyone facing the same problem.

Problem Description


I have trained an SSD MobileNet model on a custom dataset (batteries). A sample image of a battery is given below, and I have also attached the config file I used to train the model.

When the object is close to the camera (tested with a webcam), the model detects it accurately with a probability above 0.95, but when I move the object farther away it is no longer detected. Upon debugging, I found that the object is still detected, but with a lower probability of 0.35. The minimum threshold is set to 0.5. If I lower the threshold from 0.5 to 0.2, the object is detected, but there are more false detections.
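To make that trade-off concrete, here is a minimal sketch (my own illustration, not from the question) of score-threshold filtering, assuming boxes and scores arrays shaped like the TF Object Detection API's output:

import numpy as np

def filter_detections(boxes, scores, threshold=0.5):
    # Keep only detections whose confidence meets the threshold.
    keep = scores >= threshold
    return boxes[keep], scores[keep]

scores = np.array([0.95, 0.35, 0.22])  # close battery, far battery, noise
boxes = np.zeros((3, 4))               # placeholder box coordinates
print(filter_detections(boxes, scores, 0.5)[1])  # [0.95] -> far battery lost
print(filter_detections(boxes, scores, 0.2)[1])  # [0.95 0.35 0.22] -> noise kept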

Referring to this link, SSD does not perform very well on small objects, and an alternative is to use Faster R-CNN, but that model is too slow for real-time use. I would like the battery to be detected from a longer distance using SSD as well.

Please help me with the following:

  1. If we want to detect objects at longer distances with higher probability, do we need to change the aspect ratio and scale parameters in the config?
  2. If we change the aspect ratios, how do we choose those values with respect to the object?
Solution

Changing aspect ratios and scales won't help improve the detection accuracy of small objects, since the original scale is already small enough (e.g. min_scale = 0.2). The most important parameter to change is feature_map_layout, which determines the number of feature maps (and their sizes) and their corresponding depths (channel counts). Sadly, this parameter cannot be configured in the pipeline config file; you will have to modify it directly in the feature extractor.
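For intuition on why the scales are already small, here is a quick sketch (mine, not from the answer) of the per-layer anchor scale formula from the SSD paper, s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), with the default min_scale = 0.2 and max_scale = 0.95:

def anchor_scales(num_layers=6, min_scale=0.2, max_scale=0.95):
    # Evenly spaced anchor scales, one per feature map (SSD paper).
    step = (max_scale - min_scale) / (num_layers - 1)
    return [round(min_scale + step * k, 2) for k in range(num_layers)]

print(anchor_scales())  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
# On a 300x300 input the smallest anchors already span ~0.2 * 300 = 60 px;
# the limiting factor for tiny objects is feature map resolution, not scale.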

Here is why feature_map_layout is important for detecting small objects.

In the above figure (from the SSD paper), (b) and (c) are two feature maps with different layouts. The dog in the ground-truth image matches the red anchor box on the 4x4 feature map, while the cat matches the blue one on the 8x8 feature map. Now, if the object you want to detect is the cat's ear, there would be no anchor box that matches it. So the intuition is: if no anchor box matches an object, the object simply won't be detected. To successfully detect the cat's ear, what you probably need is a 16x16 feature map.
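A toy illustration of that matching intuition (my own, with made-up box coordinates): SSD assigns a ground-truth object only to anchors whose IoU with it exceeds a threshold (0.5 in the paper), so a tiny object can fail to match every anchor on a coarse feature map:

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

anchor = (0, 0, 60, 60)        # smallest anchor: 0.2 * 300 = 60 px
tiny_object = (10, 10, 30, 30) # a 20x20 px object (the "cat's ear")
print(iou(anchor, tiny_object))  # ~0.11 < 0.5, so no match -> missed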

Here is how you can make the change to feature_map_layout. This parameter is configured in each specific feature extractor implementation. Suppose you use ssd_mobilenet_v1_feature_extractor; then you can find it in this file.

feature_map_layout = {
    # The first two feature maps are taken directly from MobileNet
    # layers; the four '' entries are extra convolutions added on top.
    'from_layer': ['Conv2d_11_pointwise', 'Conv2d_13_pointwise', '', '',
                   '', ''],
    # -1 keeps the source layer's native depth; the remaining values
    # set the channel count of each extra convolution.
    'layer_depth': [-1, -1, 512, 256, 256, 128],
    'use_explicit_padding': self._use_explicit_padding,
    'use_depthwise': self._use_depthwise,
}

Here there are 6 feature maps of different scales. The first two are taken directly from MobileNet layers (hence their depths are both -1), while the remaining four result from extra convolutional operations. As you can see, the lowest-level feature map comes from MobileNet's Conv2d_11_pointwise layer. Generally, the lower the layer, the finer the feature map's features, and the better it is at detecting small objects. So you can change Conv2d_11_pointwise to Conv2d_5_pointwise (why? As the TensorFlow graph shows, that layer has a bigger feature map than Conv2d_11_pointwise), and it should help detect smaller objects.
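Concretely, the swap described above would look like this (a sketch of the one-line change, not an officially supported configuration):

feature_map_layout = {
    # Conv2d_5_pointwise replaces Conv2d_11_pointwise: it sits earlier
    # in MobileNet, so its feature map is larger and finer-grained.
    'from_layer': ['Conv2d_5_pointwise', 'Conv2d_13_pointwise', '', '',
                   '', ''],
    'layer_depth': [-1, -1, 512, 256, 256, 128],
    'use_explicit_padding': self._use_explicit_padding,
    'use_depthwise': self._use_depthwise,
}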

But better accuracy comes at an extra cost: detection speed will drop a little because there are more anchor boxes to process (bigger feature maps). Also, since we chose Conv2d_5_pointwise over Conv2d_11_pointwise, we lose the detection power of Conv2d_11_pointwise.
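A back-of-envelope count (my own numbers, assuming a 300x300 input, where Conv2d_11_pointwise yields a 19x19 map and Conv2d_5_pointwise a 38x38 map, with the usual 3 boxes per cell on the first layer and 6 on the rest) shows why the swap costs speed:

def anchor_count(grids, boxes_per_cell):
    # Total anchors = sum over feature maps of cells * boxes per cell.
    return sum(g * g * b for g, b in zip(grids, boxes_per_cell))

boxes = [3, 6, 6, 6, 6, 6]
print(anchor_count([19, 10, 5, 3, 2, 1], boxes))  # 1917 anchors (default)
print(anchor_count([38, 10, 5, 3, 2, 1], boxes))  # 5166 anchors after the swap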

If you don't want to change the layer but simply want to add an extra feature map, e.g. making it 7 feature maps in total, you will also have to change num_layers in the config file to 7. You can think of this parameter as the resolution of the detection network: the more low-level layers there are, the finer the resolution.
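In the pipeline config this lives in the SSD anchor generator; a minimal sketch (the scale and aspect-ratio values shown are the usual defaults, included only for context):

anchor_generator {
  ssd_anchor_generator {
    num_layers: 7      # must match the number of feature maps
    min_scale: 0.2
    max_scale: 0.95
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.3333
  }
}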

Now, if you have performed the above operations, one more thing that can help is to add more images containing small objects. If that is not feasible, you can at least try adding data augmentation operations such as random_image_scale.
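In the pipeline config, that augmentation goes under train_config; a minimal sketch (the ratio values are illustrative, not prescribed by the answer):

train_config {
  data_augmentation_options {
    random_image_scale {
      min_scale_ratio: 0.5   # shrinking images synthesizes smaller objects
      max_scale_ratio: 2.0
    }
  }
}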

That's all for this article on the SSD MobileNet model failing to detect objects at longer distances. We hope the answer above is helpful.
