Problem description
We know that object detection frameworks like Faster R-CNN and Mask R-CNN have an RoI pooling layer or an RoI Align layer. But why do the SSD and YOLO frameworks have no such layer?
First of all, we should understand the purpose of RoI pooling: to obtain a fixed-size feature representation from proposal regions on the feature maps. Because the proposed regions can come in various sizes, features taken directly from those regions have different shapes and therefore cannot be fed to fully-connected layers for prediction (fully-connected layers require fixed-shape inputs).
So RoI pooling essentially requires two inputs: proposed regions and feature maps.
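To make the two inputs concrete, here is a minimal pure-Python sketch of RoI max pooling on a toy single-channel feature map. The function name `roi_max_pool`, the end-exclusive region format, and the integer bin edges are assumptions made for illustration; real implementations work on multi-channel tensors and quantize region coordinates in their own way. The point is only that regions of different sizes come out with the same shape.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W) of numbers.
    roi: (x0, y0, x1, y1) in feature-map cells, end-exclusive. Assumed to be
         at least out_h cells tall and out_w cells wide (hypothetical format).
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges; neighbouring bins may differ in size by one cell.
        ys, ye = y0 + (i * h) // out_h, y0 + ((i + 1) * h) // out_h
        row = []
        for j in range(out_w):
            xs, xe = x0 + (j * w) // out_w, x0 + ((j + 1) * w) // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 6x8 feature map holding the distinct values 0..47.
feature_map = [[r * 8 + c for c in range(8)] for r in range(6)]

small = roi_max_pool(feature_map, (0, 0, 3, 2))  # region 3 wide, 2 tall
large = roi_max_pool(feature_map, (1, 1, 8, 6))  # region 7 wide, 5 tall

print(small)  # [[0, 2], [8, 10]]
print(large)  # [[19, 23], [43, 47]]
```

Both regions end up as 2x2 grids, which is what lets a fully-connected layer consume them.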
So why don't YOLO and SSD use RoI pooling? Simply because they don't use region proposals! They are designed in an inherently different way from models like R-CNN, Fast R-CNN, and Faster R-CNN. In fact, YOLO and SSD are categorized as one-stage detectors, while the R-CNN series (R-CNN, Fast R-CNN, Faster R-CNN) are called two-stage detectors, simply because they propose regions first and then perform classification and regression.
One-stage detectors perform predictions (classification and regression) directly from feature maps. Their method is to divide the image into a grid, and each grid cell predicts a fixed number of bounding boxes with confidence scores and class scores. The original YOLO used a single-scale feature map, while SSD used multi-scale feature maps.
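As a rough illustration of the grid idea, the sketch below computes the output shape of a YOLO-style head. The default values (a 7x7 grid, 2 boxes per cell, 20 classes) are taken from the original YOLO paper rather than from this answer. The function names and the ordering of values inside a cell's vector (boxes first, then class scores) are assumptions for illustration only.

```python
def yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor for a YOLO-style grid head."""
    # Each box carries 5 numbers: x, y, w, h and a confidence score.
    per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, per_cell)


def split_cell_prediction(cell_vector, boxes_per_cell=2, num_classes=20):
    """Split one cell's flat vector into boxes and class scores.

    Assumed layout (hypothetical): all boxes first, then the class scores.
    """
    expected = boxes_per_cell * 5 + num_classes
    if len(cell_vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(cell_vector)}")
    boxes = [tuple(cell_vector[b * 5:(b + 1) * 5])
             for b in range(boxes_per_cell)]
    class_scores = list(cell_vector[boxes_per_cell * 5:])
    return boxes, class_scores


print(yolo_v1_output_shape())  # (7, 7, 30)

# Dummy cell vector 0..29, just to show how it is sliced.
boxes, class_scores = split_cell_prediction(list(range(30)))
print(len(boxes), len(class_scores))  # 2 20
```

The shape depends only on the grid and head configuration, never on how many objects the image contains.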
With YOLO and SSD, the final output is a fixed-shape tensor. They therefore behave much like a plain regression problem such as linear regression, mapping the input to the output in a single pass, hence the name one-stage detectors.
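The same fixed-shape property holds for SSD's multi-scale design. The sketch below counts the default boxes for the SSD300 configuration as reported in the SSD paper (feature map sizes and boxes per location are taken from that paper, not from this answer). The function names and the "classes plus 4 box offsets" output layout are illustrative assumptions.

```python
# SSD300 configuration as reported in the SSD paper (an assumption here).
SSD300_FEATURE_MAPS = [38, 19, 10, 5, 3, 1]      # side length of each map
SSD300_BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]   # default boxes per cell


def count_default_boxes(map_sizes, boxes_per_location):
    """Total number of default boxes over all square feature maps."""
    return sum(s * s * k for s, k in zip(map_sizes, boxes_per_location))


def ssd_output_shape(map_sizes, boxes_per_location, num_classes):
    """One row per default box: class scores plus 4 box offsets."""
    n = count_default_boxes(map_sizes, boxes_per_location)
    return (n, num_classes + 4)


total = count_default_boxes(SSD300_FEATURE_MAPS, SSD300_BOXES_PER_LOCATION)
print(total)  # 8732

# 21 classes = 20 object classes plus background (Pascal VOC setting).
shape = ssd_output_shape(SSD300_FEATURE_MAPS, SSD300_BOXES_PER_LOCATION, 21)
print(shape)  # (8732, 25)
```

Every image yields the same number of rows, so no pooling step is needed to normalize region sizes before prediction.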