2. Changes to the Training Stage
The changes in this post follow the faster_rcnn_end_to_end pipeline; I have not tried the alt_opt mode, but it should work similarly. Training is driven by train.py, which parses train.prototxt directly; its first step is loading data, and that data is packaged by the roi_data_layer mentioned above.
So first open layer.py under the roi_data_layer folder and look for the places that need changing. Since every layer is initialized first, start with the setup function, whose main job is to set the blob dimensions.
The main change is to take the line around line 95 mentioned above,
top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
and change it to
top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE)
This fixes the height and width of the future data blob to a single constant value. You could define a new parameter for this in fast_rcnn/config.py, e.g. train_target_size, but to save effort I simply reused the existing cfg.TRAIN.MAX_SIZE.
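If you do want a dedicated parameter, a minimal sketch of what could go into lib/fast_rcnn/config.py (TARGET_SIZE is a hypothetical name, not part of the stock config):

# lib/fast_rcnn/config.py (sketch); __C is the module-level edict behind cfg
__C.TRAIN.TARGET_SIZE = 672   # hypothetical fixed side length for the square input

The rest of this post just reuses cfg.TRAIN.MAX_SIZE, so no config change is strictly required.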
Of course, changing this alone does not take effect by itself. Next, the layer's forward function runs; it calls _get_next_minibatch, which in turn calls get_minibatch from minibatch.py in the same folder, so open that file.
First, the get_minibatch() function that layer.py ultimately reaches:
Here, change line 20,
random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES), size=num_images)
to
random_scale_inds = npr.randint(0, high=1, size=num_images)
Since this was an experiment, I did not use the image-pyramid feature (the paper's experiments did not use it either), so setting high to 1 is enough. npr.randint draws num_images random integers from the half-open range [0, high); here num_images (RPN allows only one image per batch) and high are both 1, so the result can only ever be 0.
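A quick check of that claim:

import numpy.random as npr
# [0, high) with high=1 contains only 0, so cfg.TRAIN.SCALES[0] is always chosen
print npr.randint(0, high=1, size=1)   # -> [0]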
As the code below shows, only one image can be read per training step. Next comes line 29:
_get_image_blob(roidb, random_scale_inds)
which calls the _get_image_blob function at line 129:
def _get_image_blob(roidb, scale_inds):
    # Builds an input blob from the images in the roidb at the specified scales.
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, im_scales
As you can see, this function iterates over num_images images (in practice just one) and, for each, obtains a rescaled image and its scale factor. So target_size must be set to the parameter we want, which here is cfg.TRAIN.MAX_SIZE. The key piece is the image-rescaling function prep_im_for_blob(); it lives in utils/blob.py and was shown earlier:
def prep_im_for_blob(im, pixel_means, target_size, max_size):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    return im, im_scale
Replace it entirely with:
def prep_im_for_blob(im, pixel_means, im_info):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape[0:2]
    # Per-axis scale factors: im_info holds (target_h, target_w) and
    # im_shape is (h, w), so the division yields (fy, fx).
    fy_scale, fx_scale = im_info / im_shape
    im = cv2.resize(im, None, None, fx=fx_scale, fy=fy_scale,
                    interpolation=cv2.INTER_LINEAR)
    im_scales = np.array([fx_scale, fy_scale])
    return im, im_scales
The idea is simple: to make the image square, the width and height are rescaled by two different scale factors. The call site then changes to:
im_info = np.array([target_size, target_size], dtype=np.float32)
im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, im_info)
Here the im_info array replaces the original two parameters, making the height and width targets identical. Keep in mind that im_scale is no longer a scalar but a two-element array [fx, fy].
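A minimal sanity check of the new function (the 375×500 input is a hypothetical example; assumes the modified prep_im_for_blob above is in place):

import numpy as np
from fast_rcnn.config import cfg
from utils.blob import prep_im_for_blob   # the modified version above

im = np.zeros((375, 500, 3), dtype=np.uint8)      # hypothetical H x W x 3 image
im_info = np.array([672, 672], dtype=np.float32)  # cfg.TRAIN.MAX_SIZE = 672
im_sq, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, im_info)
print im_sq.shape   # (672, 672, 3) -- square regardless of the input aspect ratio
print im_scale      # [ 1.344  1.792] -- [fx, fy] = [672/500, 672/375]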
Returning to get_minibatch, the code that follows is:
if cfg.TRAIN.HAS_RPN:
    assert len(im_scales) == 1, "Single batch only"
    assert len(roidb) == 1, "Single batch only"
    # gt boxes: (x1, y1, x2, y2, cls)
    gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
    gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
    gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
    gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
    blobs['gt_boxes'] = gt_boxes
    blobs['im_info'] = np.array(
        [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
        dtype=np.float32)
else: # not using RPN
    # Now, build the region of interest and label blobs
Change it to:
if cfg.TRAIN.HAS_RPN:
    assert len(im_scales) == 1, "Single batch only"
    assert len(roidb) == 1, "Single batch only"
    # gt boxes: (x1, y1, x2, y2, cls)
    gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
    gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
    # Scale x and y coordinates by their own factors; im_scales[0] = [fx, fy]
    gt_boxes[:, [0, 2]] = roidb[0]['boxes'][gt_inds][:, [0, 2]] * im_scales[0][0]
    gt_boxes[:, [1, 3]] = roidb[0]['boxes'][gt_inds][:, [1, 3]] * im_scales[0][1]
    gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
    blobs['gt_boxes'] = gt_boxes
    blobs['im_info'] = np.array([[cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE]],
                                dtype=np.float32)
else: # not using RPN
    # Now, build the region of interest and label blobs
Since this is Faster R-CNN, cfg.TRAIN.HAS_RPN is always true (it is set in the yml file). The modification replaces the single uniform scaling of gt_boxes with separate scaling along width and height, and also resets the im_info layout: since all images now share one size, there is no need to record as many parameters. The code after the else branch also touches scales (mainly rescaling rois), but that is the selective-search path used when RPN is disabled; I assume the author kept it when upgrading Fast R-CNN to Faster R-CNN to preserve the old functionality. So we can leave that part unchanged; if you want to change it too, just add code following the same idea as above.
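For example, with the hypothetical scales [fx, fy] = [1.344, 1.792] from the sanity check above, a ground-truth box is stretched by a different factor per axis:

import numpy as np
box = np.array([100., 50., 300., 200.])   # hypothetical (x1, y1, x2, y2)
fx, fy = 1.344, 1.792
box[[0, 2]] *= fx   # x coordinates -> [134.4, 403.2]
box[[1, 3]] *= fy   # y coordinates -> [ 89.6, 358.4]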
Now return to layer.py. Since we changed the im_info format, around line 89 change top[idx].reshape(1, 3) to top[idx].reshape(1, 2). The network can then continue training.
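In my copy of layer.py the relevant lines in setup then read (line numbers may differ slightly across versions):

# roi_data_layer/layer.py, setup()
if cfg.TRAIN.HAS_RPN:
    self._name_to_top_map['im_info'] = idx
    top[idx].reshape(1, 2)   # was top[idx].reshape(1, 3): (height, width, scale)
    idx += 1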
At the RPN stage, both anchor_target_layer and proposal_layer read im_info with: im_info = bottom[2].data[0, :]
Note that only the first row is taken. We know that in the original Faster R-CNN code different images have different im_info values, so operating on just the first one would clearly be wrong; that is why the code forces IMS_PER_BATCH to 1, leaving exactly one image. And as seen in minibatch.py, im_info used to be a 1×3 array and is now 1×2. So in proposal_layer.py, around line 126, the original operation keep = _filter_boxes(proposals, min_size * im_info[2]) (drop proposals whose sides are shorter than the set length; im_info[2] was the scale factor, which no longer exists) can simply become
keep = _filter_boxes(proposals, min_size) (this introduces some bias, but it keeps the parameters simple).
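For reference, _filter_boxes in proposal_layer.py simply drops proposals whose width or height falls below the threshold, which is why dropping the scale factor only shifts what "minimum size" means from original-image pixels to network-input pixels:

def _filter_boxes(boxes, min_size):
    """Remove all boxes with any side smaller than min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    return keep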
With that, the training-stage changes work; a screenshot for the record:
As the screenshot shows, every image is resized to 672×672, and after the network's 16× downsampling the feature map is 42×42 (672 / 16 = 42).
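For concreteness, assuming the VGG16 backbone (feature stride 16), the blob shapes become:

# data blob:      (1, 3, 672, 672)   -- IMS_PER_BATCH = 1, fixed square input
# conv5_3 output: (1, 512, 42, 42)   -- 672 / 16 = 42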
For the testing stage, only one file, test.py, needs to change; the full file is attached below.
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Test a Fast R-CNN network on an imdb (image database)."""
from fast_rcnn.config import cfg, get_output_dir
from fast_rcnn.bbox_transform import clip_boxes, bbox_transform_inv
import argparse
from utils.timer import Timer
import numpy as np
import cv2
import caffe
from fast_rcnn.nms_wrapper import nms
import cPickle
from utils.blob import im_list_to_blob
import os
def _get_image_blob(im, target_size):
    """Converts an image into a network input.
    Arguments:
        im (ndarray): a color image in BGR order
    Returns:
        blob (ndarray): a data blob holding an image pyramid
        im_scale_factors (list): list of image scales (relative to im) used
            in the image pyramid
    """
    '''
    im_orig = im.astype(np.float32, copy=True)
    im_orig -= cfg.PIXEL_MEANS
    im_shape = im_orig.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    processed_ims = []
    im_scale_factors = []
    for target_size in cfg.TEST.SCALES:
        im_scale = float(target_size) / float(im_size_min)
        # Prevent the biggest axis from being more than MAX_SIZE
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, np.array(im_scale_factors)
    '''
    # The original multi-scale code above is kept commented out for reference.
    processed_ims = []
    im_scale_factors = []
    im = im.astype(np.float32, copy=False)
    im = im - cfg.PIXEL_MEANS
    im_shape = im.shape[0:2]
    # Per-axis scales: stretch width by fx and height by fy to a fixed square
    im_scale = np.hstack([float(target_size) / im_shape[1],
                          float(target_size) / im_shape[0]])
    im = cv2.resize(im, None, None, fx=float(target_size) / im_shape[1],
                    fy=float(target_size) / im_shape[0],
                    interpolation=cv2.INTER_LINEAR)
    processed_ims.append(im)
    im_scale_factors.append(im_scale)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, np.array(im_scale_factors)
def _get_rois_blob(im_rois, im_scale_factors):
    """Converts RoIs into network inputs.
    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        im_scale_factors (list): scale factors as returned by _get_image_blob
    Returns:
        blob (ndarray): R x 5 matrix of RoIs in the image pyramid
    """
    rois, levels = _project_im_rois(im_rois, im_scale_factors)
    rois_blob = np.hstack((levels, rois))
    return rois_blob.astype(np.float32, copy=False)
def _project_im_rois(im_rois, scales):
    """Project image RoIs into the image pyramid built by _get_image_blob.
    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        scales (list): scale factors as returned by _get_image_blob
    Returns:
        rois (ndarray): R x 4 matrix of projected RoI coordinates
        levels (list): image pyramid levels used by each projected RoI
    """
    im_rois = im_rois.astype(np.float, copy=False)
    if len(scales) > 1:
        widths = im_rois[:, 2] - im_rois[:, 0] + 1
        heights = im_rois[:, 3] - im_rois[:, 1] + 1
        areas = widths * heights
        scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :] ** 2)
        diff_areas = np.abs(scaled_areas - 224 * 224)
        levels = diff_areas.argmin(axis=1)[:, np.newaxis]
    else:
        levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
    # Only the single-scale path is supported after this change:
    # scales[0] is the [fx, fy] pair for the one scale in use.
    rois = np.zeros_like(im_rois)
    rois[:, [0, 2]] = im_rois[:, [0, 2]] * scales[0][0]
    rois[:, [1, 3]] = im_rois[:, [1, 3]] * scales[0][1]
    return rois, levels
def _get_blobs(im, rois, target_size):
    """Convert an image and RoIs within that image into network inputs."""
    blobs = {'data' : None, 'rois' : None}
    blobs['data'], im_scale_factors = _get_image_blob(im, target_size)
    if not cfg.TEST.HAS_RPN:
        blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
    return blobs, im_scale_factors
def im_detect(net, im, boxes=None):
    """Detect object classes in an image given object proposals.
    Arguments:
        net (caffe.Net): Fast R-CNN network to use
        im (ndarray): color image to test (in BGR order)
        boxes (ndarray): R x 4 array of object proposals or None (for RPN)
    Returns:
        scores (ndarray): R x K array of object class scores (K includes
            background as object category 0)
        boxes (ndarray): R x (4*K) array of predicted bounding boxes
    """
    blobs, im_scales = _get_blobs(im, boxes, target_size=cfg.TEST.SCALES[0])
    # When mapping from image ROIs to feature map ROIs, there's some aliasing
    # (some distinct image ROIs get mapped to the same feature ROI).
    # Here, we identify duplicate feature ROIs, so we only compute features
    # on the unique subset.
    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        v = np.array([1, 1e3, 1e6, 1e9, 1e12])
        hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
        _, index, inv_index = np.unique(hashes, return_index=True,
                                        return_inverse=True)
        blobs['rois'] = blobs['rois'][index, :]
        boxes = boxes[index, :]
    if cfg.TEST.HAS_RPN:
        im_blob = blobs['data']
        # im_info is now just the fixed (height, width) of the square input
        blobs['im_info'] = np.array([[cfg.TEST.SCALES[0], cfg.TEST.SCALES[0]]],
                                    dtype=np.float32)
    # reshape network inputs
    net.blobs['data'].reshape(*(blobs['data'].shape))
    if cfg.TEST.HAS_RPN:
        net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    else:
        net.blobs['rois'].reshape(*(blobs['rois'].shape))
    # do forward
    forward_kwargs = {'data': blobs['data'].astype(np.float32, copy=False)}
    if cfg.TEST.HAS_RPN:
        forward_kwargs['im_info'] = blobs['im_info'].astype(np.float32, copy=False)
    else:
        forward_kwargs['rois'] = blobs['rois'].astype(np.float32, copy=False)
    blobs_out = net.forward(**forward_kwargs)
    if cfg.TEST.HAS_RPN:
        assert len(im_scales) == 1, "Only single-image batch implemented"
        rois = net.blobs['rois'].data.copy()
        # unscale back to raw image space: rois columns are
        # (batch_ind, x1, y1, x2, y2); divide x by fx and y by fy
        a = rois[:, [1, 3]] / im_scales[0][0]
        b = rois[:, [2, 4]] / im_scales[0][1]
        boxes = np.hstack([a[:, [0]], b[:, [0]], a[:, [1]], b[:, [1]]])
    if cfg.TEST.SVM:
        # use the raw scores before softmax under the assumption they
        # were trained as linear SVMs
        scores = net.blobs['cls_score'].data
    else:
        # use softmax estimated probabilities
        scores = blobs_out['cls_prob']
    if cfg.TEST.BBOX_REG:
        # Apply bounding-box regression deltas
        box_deltas = blobs_out['bbox_pred']
        pred_boxes = bbox_transform_inv(boxes, box_deltas)
        pred_boxes = clip_boxes(pred_boxes, im.shape)
    else:
        # Simply repeat the boxes, once for each class
        pred_boxes = np.tile(boxes, (1, scores.shape[1]))
    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        # Map scores and predictions back to the original set of boxes
        scores = scores[inv_index, :]
        pred_boxes = pred_boxes[inv_index, :]
    return scores, pred_boxes
def vis_detections(im, class_name, dets, thresh=0.3):
    """Visual debugging of detections."""
    import matplotlib.pyplot as plt
    im = im[:, :, (2, 1, 0)]
    for i in xrange(np.minimum(10, dets.shape[0])):
        bbox = dets[i, :4]
        score = dets[i, -1]
        if score > thresh:
            plt.cla()
            plt.imshow(im)
            plt.gca().add_patch(
                plt.Rectangle((bbox[0], bbox[1]),
                              bbox[2] - bbox[0],
                              bbox[3] - bbox[1], fill=False,
                              edgecolor='g', linewidth=3)
                )
            plt.title('{} {:.3f}'.format(class_name, score))
            plt.show()
def apply_nms(all_boxes, thresh):
    """Apply non-maximum suppression to all predicted boxes output by the
    test_net method.
    """
    num_classes = len(all_boxes)
    num_images = len(all_boxes[0])
    nms_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(num_classes)]
    for cls_ind in xrange(num_classes):
        for im_ind in xrange(num_images):
            dets = all_boxes[cls_ind][im_ind]
            if dets == []:
                continue
            # CPU NMS is much faster than GPU NMS when the number of boxes
            # is relative small (e.g., < 10k)
            # TODO(rbg): autotune NMS dispatch
            keep = nms(dets, thresh, force_cpu=True)
            if len(keep) == 0:
                continue
            nms_boxes[cls_ind][im_ind] = dets[keep, :].copy()
    return nms_boxes
def test_net(net, imdb, max_per_image=100, thresh=0.05, vis=False):
    """Test a Fast R-CNN network on an image database."""
    num_images = len(imdb.image_index)
    # all detections are collected into:
    #    all_boxes[cls][image] = N x 5 array of detections in
    #    (x1, y1, x2, y2, score)
    all_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(imdb.num_classes)]
    output_dir = get_output_dir(imdb, net)
    # timers
    _t = {'im_detect' : Timer(), 'misc' : Timer()}
    if not cfg.TEST.HAS_RPN:
        roidb = imdb.roidb
    for i in xrange(num_images):
        # filter out any ground truth boxes
        if cfg.TEST.HAS_RPN:
            box_proposals = None
        else:
            # The roidb may contain ground-truth rois (for example, if the roidb
            # comes from the training or val split). We only want to evaluate
            # detection on the *non*-ground-truth rois. We select those the rois
            # that have the gt_classes field set to 0, which means there's no
            # ground truth.
            box_proposals = roidb[i]['boxes'][roidb[i]['gt_classes'] == 0]
        im = cv2.imread(imdb.image_path_at(i))
        _t['im_detect'].tic()
        scores, boxes = im_detect(net, im, box_proposals)
        _t['im_detect'].toc()
        _t['misc'].tic()
        # skip j = 0, because it's the background class
        for j in xrange(1, imdb.num_classes):
            inds = np.where(scores[:, j] > thresh)[0]
            cls_scores = scores[inds, j]
            cls_boxes = boxes[inds, j*4:(j+1)*4]
            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
                .astype(np.float32, copy=False)
            keep = nms(cls_dets, cfg.TEST.NMS)
            cls_dets = cls_dets[keep, :]
            if vis:
                vis_detections(im, imdb.classes[j], cls_dets)
            all_boxes[j][i] = cls_dets
        # Limit to max_per_image detections *over all classes*
        if max_per_image > 0:
            image_scores = np.hstack([all_boxes[j][i][:, -1]
                                      for j in xrange(1, imdb.num_classes)])
            if len(image_scores) > max_per_image:
                image_thresh = np.sort(image_scores)[-max_per_image]
                for j in xrange(1, imdb.num_classes):
                    keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
                    all_boxes[j][i] = all_boxes[j][i][keep, :]
        _t['misc'].toc()
        print 'im_detect: {:d}/{:d} {:.3f}s {:.3f}s' \
            .format(i + 1, num_images, _t['im_detect'].average_time,
                    _t['misc'].average_time)
    det_file = os.path.join(output_dir, 'detections.pkl')
    with open(det_file, 'wb') as f:
        cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL)
    print 'Evaluating detections'
    imdb.evaluate_detections(all_boxes, output_dir)
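One assumption worth making explicit: the modified test path takes its square side length from cfg.TEST.SCALES[0] (both in _get_image_blob and when building im_info), so the test config must carry a single scale equal to the training size. A minimal sketch of that override (672 matches the training value used above):

from fast_rcnn.config import cfg
cfg.TEST.SCALES = (672,)   # single entry; _get_image_blob and im_info read SCALES[0]
cfg.TEST.MAX_SIZE = 672    # no longer used by the new resize path, kept consistent anyway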