2. Changes to the Training Stage

The changes in this post follow the faster_rcnn_end_to_end pipeline; I have not tried the alt_opt mode, but it should be similar. During training we invoke train.py, which parses train.prototxt; its first step is loading the data, and that is exactly the data wrapping done by the roi_data_layer mentioned above.
So first open layer.py under the roi_data_layer folder and look for the places that need changing. Every layer is initialized first, so start with the setup function, whose main job here is to set the blob dimensions.
The main change is to take the line around line 95 mentioned above,

top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)

and change it to

top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE)

This pins the height and width of the future data blob to a fixed value. One could define a new parameter such as train_target_size in fast_rcnn/config.py for this, but to save effort I simply reused the existing cfg.TRAIN.MAX_SIZE.
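If you would rather not overload cfg.TRAIN.MAX_SIZE, a dedicated option would also work. A minimal sketch, assuming a made-up name TRAIN.TARGET_SIZE (not part of the original config):

# In fast_rcnn/config.py, next to the other TRAIN options
# (TRAIN.TARGET_SIZE is a hypothetical name):
__C.TRAIN.TARGET_SIZE = 672

# Then in roi_data_layer/layer.py, setup():
top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
                 cfg.TRAIN.TARGET_SIZE, cfg.TRAIN.TARGET_SIZE)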
Of course, changing this alone does not actually take effect. Next, the forward function runs and calls _get_next_minibatch to fetch the data, so open minibatch.py in the same folder.
First up is the get_minibatch() function that layer.py ultimately calls:
Here, change line 20 from

random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES), size=num_images)

to

random_scale_inds = npr.randint(0, high=1, size=num_images)
Since I was just experimenting, I did not use the image-pyramid feature (the paper's experiments do not use it either), so setting high to 1 is enough. This call draws num_images random integers from [0, high); here num_images is 1 (with an RPN only one image per batch is allowed) and high is 1, so the result can only ever be 0.
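A quick sanity check of that claim (a throwaway snippet, not from the repo):

import numpy.random as npr
# randint draws from [0, high), so with high=1 the only possible value is 0
npr.randint(0, high=1, size=1)   # -> array([0])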
As the following code shows, this means only one image is read per training step. Next comes line 29:
_get_image_blob(roidb, random_scale_inds)
which calls the _get_image_blob function at line 129:

def _get_image_blob(roidb, scale_inds):
    """Builds an input blob from the images in the roidb at the specified
    scales."""
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, im_scales

As you can see, this function loops over num_images images (in practice just one) and returns each rescaled image together with its scale factor. So target_size must be set to the parameter we want; for me that is cfg.TRAIN.MAX_SIZE. The key piece is the image-rescaling function prep_im_for_blob(), which lives in utils/blob.py and was shown earlier:

def prep_im_for_blob(im, pixel_means, target_size, max_size):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    return im, im_scale
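For intuition, a quick trace of the original logic on a hypothetical 375×500 image with target_size=600 and max_size=1000:

im_size_min, im_size_max = 375, 500
im_scale = 600.0 / 375           # 1.6
# 500 * 1.6 = 800 <= 1000, so the cap is not hit and both axes are
# scaled by the same 1.6, producing a 600 x 800 image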

Replace it entirely with:

def prep_im_for_blob(im, pixel_means, im_info):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape[0:2]
    # Scale height and width independently so the output is exactly
    # im_info[0] x im_info[1] pixels
    fy_scale, fx_scale = im_info / im_shape
    im = cv2.resize(im, None, None, fx=fx_scale, fy=fy_scale,
                    interpolation=cv2.INTER_LINEAR)
    im_scales = np.array([fx_scale, fy_scale])
    return im, im_scales

The idea is simple: to make the height and width equal, I divide the original height and width by different scale values. The corresponding call then becomes:

        im_info = np.array([target_size, target_size], dtype=np.float32)
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, im_info)

An im_info array replaces the original two parameters, making the height and width targets identical. Keep in mind that im_scale is now no longer a scalar but a two-element array.
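Concretely, with a hypothetical 375×500 image and cfg.TRAIN.MAX_SIZE = 672:

import numpy as np
im_info = np.array([672., 672.])
im_shape = (375, 500)                  # (height, width)
fy_scale, fx_scale = im_info / im_shape
# fy_scale = 672/375 = 1.792, fx_scale = 672/500 = 1.344, so
# prep_im_for_blob returns im_scales = np.array([1.344, 1.792])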
Back in the get_minibatch() function, the code that follows is:

    if cfg.TRAIN.HAS_RPN:
        assert len(im_scales) == 1, "Single batch only"
        assert len(roidb) == 1, "Single batch only"
        # gt boxes: (x1, y1, x2, y2, cls)
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
    else: # not using RPN
        # Now, build the region of interest and label blobs

Change it to:

    if cfg.TRAIN.HAS_RPN:
        assert len(im_scales) == 1, "Single batch only"
        assert len(roidb) == 1, "Single batch only"
        # gt boxes: (x1, y1, x2, y2, cls)
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        gt_boxes[:, [0, 2]] = roidb[0]['boxes'][gt_inds][:, [0, 2]] * im_scales[0][0]
        gt_boxes[:, [1, 3]] = roidb[0]['boxes'][gt_inds][:, [1, 3]] * im_scales[0][1]
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        blobs['im_info'] = np.array([[cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE]],
                                    dtype=np.float32)
    else: # not using RPN
        # Now, build the region of interest and label blobs

Since I am running Faster R-CNN, cfg.TRAIN.HAS_RPN is always true here (it is set in the yml file). The modification replaces the single uniform scale on gt_boxes with separate scaling along x and y, and redefines the im_info layout: now that every image has the same size, there is no need to record as many parameters. The code after the else branch also touches scales, mainly rescaling the rois, but that is the non-RPN path that feeds ROIs straight from selective search. My guess is that the author kept it when upgrading Fast R-CNN to Faster R-CNN to preserve the old functionality, so we can leave it alone, or adapt it along the same lines as above if desired.
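To make the per-axis scaling concrete, here is how a single ground-truth box transforms under the hypothetical scales from the example above (fx = 1.344, fy = 1.792):

import numpy as np
box = np.array([100., 50., 300., 200.])   # (x1, y1, x2, y2)
im_scale = np.array([1.344, 1.792])       # (fx, fy)
box[[0, 2]] *= im_scale[0]                # x1, x2 -> 134.4, 403.2
box[[1, 3]] *= im_scale[1]                # y1, y2 ->  89.6, 358.4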
Now return to layer.py. Since we changed the im_info format, the top[idx].reshape(1, 3) at line 89 must become top[idx].reshape(1, 2). With that, the network can keep training.
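For orientation, the relevant part of setup() in roi_data_layer/layer.py ends up looking roughly like this (a sketch from memory; exact line numbers vary between revisions):

if cfg.TRAIN.HAS_RPN:
    top[idx].reshape(1, 2)   # was (1, 3); im_info now holds only (h, w)
    self._name_to_top_map['im_info'] = idx
    idx += 1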
When we reach the RPN stage's anchor_target_layer and proposal_layer, both layers read im_info with: im_info = bottom[2].data[0, :]
Note that this takes only the first row. In the original Faster R-CNN code, different images have different im_info, so operating on just the first one would clearly be wrong in general; that is why the code forcibly sets ims_per_batch to 1, so there is only ever one image. As we saw in minibatch.py, im_info used to be a 1×3 array and is now 1×2. So in proposal_layer.py at line 126, the original keep = _filter_boxes(proposals, min_size * im_info[2]), which discards proposals whose sides are shorter than the configured minimum, can simply become
keep = _filter_boxes(proposals, min_size) (this introduces some deviation, but keeps the parameters simple).
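For reference, _filter_boxes in proposal_layer.py simply drops proposals whose width or height falls below the threshold (quoted from the upstream code as I remember it, so verify against your copy):

def _filter_boxes(boxes, min_size):
    """Remove all boxes with any side smaller than min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    return keep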
With that, the training stage is successfully modified. A screenshot for the record:

[Screenshot: training log after the change]

All images are resized to 672×672, and the feature map, downsampled by a factor of 16, is 42×42.
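As a quick sanity check on those numbers (9 is the default number of anchors per feature-map location in Faster R-CNN):

feat_size = 672 // 16                    # 42
num_anchors = feat_size * feat_size * 9  # 42 * 42 * 9 = 15876 RPN anchors per image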
For the test stage, only test.py needs to change; the full file is attached below.

# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

"""Test a Fast R-CNN network on an imdb (image database)."""

from fast_rcnn.config import cfg, get_output_dir
from fast_rcnn.bbox_transform import clip_boxes, bbox_transform_inv
import argparse
from utils.timer import Timer
import numpy as np
import cv2
import caffe
from fast_rcnn.nms_wrapper import nms
import cPickle
from utils.blob import im_list_to_blob
import os

def _get_image_blob(im, target_size):
    """Converts an image into a network input.

    Arguments:
        im (ndarray): a color image in BGR order
        target_size (int): fixed side length of the network input

    Returns:
        blob (ndarray): a data blob holding the single resized image
        im_scale_factors (ndarray): the (fx, fy) scale factors applied to im
    """
    '''
    im_orig = im.astype(np.float32, copy=True)
    im_orig -= cfg.PIXEL_MEANS

    im_shape = im_orig.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])

    processed_ims = []
    im_scale_factors = []

    for target_size in cfg.TEST.SCALES:
        im_scale = float(target_size) / float(im_size_min)
        # Prevent the biggest axis from being more than MAX_SIZE
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)
        processed_ims.append(im)

    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)

    return blob, np.array(im_scale_factors)
    '''
    processed_ims = []
    im_scale_factors = []
    im = im.astype(np.float32, copy=False)
    im = im - cfg.PIXEL_MEANS
    im_shape = im.shape[0:2]
    # Scale x and y independently so the output is target_size x target_size;
    # im_scale holds (fx, fy)
    im_scale = np.hstack([float(target_size) / im_shape[1],
                          float(target_size) / im_shape[0]])
    im = cv2.resize(im, None, None, fx=im_scale[0], fy=im_scale[1],
                    interpolation=cv2.INTER_LINEAR)
    processed_ims.append(im)
    im_scale_factors.append(im_scale)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)

    return blob, np.array(im_scale_factors)

def _get_rois_blob(im_rois, im_scale_factors):
    """Converts RoIs into network inputs.

    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        im_scale_factors (list): scale factors as returned by _get_image_blob

    Returns:
        blob (ndarray): R x 5 matrix of RoIs in the image pyramid
    """
    rois, levels = _project_im_rois(im_rois, im_scale_factors)
    rois_blob = np.hstack((levels, rois))
    return rois_blob.astype(np.float32, copy=False)

def _project_im_rois(im_rois, scales):
    """Project image RoIs into the image pyramid built by _get_image_blob.

    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        scales (list): scale factors as returned by _get_image_blob

    Returns:
        rois (ndarray): R x 4 matrix of projected RoI coordinates
        levels (list): image pyramid levels used by each projected RoI
    """
    im_rois = im_rois.astype(np.float, copy=False)
    if len(scales) > 1:
        widths = im_rois[:, 2] - im_rois[:, 0] + 1
        heights = im_rois[:, 3] - im_rois[:, 1] + 1

        areas = widths * heights
        scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :] ** 2)
        diff_areas = np.abs(scaled_areas - 224 * 224)
        levels = diff_areas.argmin(axis=1)[:, np.newaxis]
    else:
        levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
    # With the fixed-size input there is exactly one scale entry, so all
    # levels are zero; scale x coordinates by fx and y coordinates by fy
    rois = np.zeros_like(im_rois)
    rois[:, [0, 2]] = im_rois[:, [0, 2]] * scales[0][0]
    rois[:, [1, 3]] = im_rois[:, [1, 3]] * scales[0][1]
    return rois, levels

def _get_blobs(im, rois, target_size):
    """Convert an image and RoIs within that image into network inputs."""
    blobs = {'data' : None, 'rois' : None}
    blobs['data'], im_scale_factors = _get_image_blob(im, target_size)
    if not cfg.TEST.HAS_RPN:
        blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
    return blobs, im_scale_factors

def im_detect(net, im, boxes=None):
    """Detect object classes in an image given object proposals.

    Arguments:
        net (caffe.Net): Fast R-CNN network to use
        im (ndarray): color image to test (in BGR order)
        boxes (ndarray): R x 4 array of object proposals or None (for RPN)

    Returns:
        scores (ndarray): R x K array of object class scores (K includes
            background as object category 0)
        boxes (ndarray): R x (4*K) array of predicted bounding boxes
    """
    blobs, im_scales = _get_blobs(im, boxes, target_size=cfg.TEST.SCALES[0])

    # When mapping from image ROIs to feature map ROIs, there's some aliasing
    # (some distinct image ROIs get mapped to the same feature ROI).
    # Here, we identify duplicate feature ROIs, so we only compute features
    # on the unique subset.
    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        v = np.array([1, 1e3, 1e6, 1e9, 1e12])
        hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
        _, index, inv_index = np.unique(hashes, return_index=True,
                                        return_inverse=True)
        blobs['rois'] = blobs['rois'][index, :]
        boxes = boxes[index, :]

    if cfg.TEST.HAS_RPN:
        im_blob = blobs['data']
        blobs['im_info'] = np.array([[cfg.TEST.SCALES[0], cfg.TEST.SCALES[0]]],
            dtype=np.float32)

    # reshape network inputs
    net.blobs['data'].reshape(*(blobs['data'].shape))
    if cfg.TEST.HAS_RPN:
        net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    else:
        net.blobs['rois'].reshape(*(blobs['rois'].shape))

    # do forward
    forward_kwargs = {'data': blobs['data'].astype(np.float32, copy=False)}
    if cfg.TEST.HAS_RPN:
        forward_kwargs['im_info'] = blobs['im_info'].astype(np.float32, copy=False)
    else:
        forward_kwargs['rois'] = blobs['rois'].astype(np.float32, copy=False)
    blobs_out = net.forward(**forward_kwargs)

    if cfg.TEST.HAS_RPN:
        assert len(im_scales) == 1, "Only single-image batch implemented"
        rois = net.blobs['rois'].data.copy()
        # unscale back to raw image space; rois columns are
        # (batch_ind, x1, y1, x2, y2) and im_scales[0] is (fx, fy)
        a = rois[:, [1, 3]] / im_scales[0][0]
        b = rois[:, [2, 4]] / im_scales[0][1]
        boxes = np.hstack([a[:, [0]], b[:, [0]], a[:, [1]], b[:, [1]]])
    if cfg.TEST.SVM:
        # use the raw scores before softmax under the assumption they
        # were trained as linear SVMs
        scores = net.blobs['cls_score'].data
    else:
        # use softmax estimated probabilities
        scores = blobs_out['cls_prob']

    if cfg.TEST.BBOX_REG:
        # Apply bounding-box regression deltas
        box_deltas = blobs_out['bbox_pred']
        pred_boxes = bbox_transform_inv(boxes, box_deltas)
        pred_boxes = clip_boxes(pred_boxes, im.shape)
    else:
        # Simply repeat the boxes, once for each class
        pred_boxes = np.tile(boxes, (1, scores.shape[1]))

    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        # Map scores and predictions back to the original set of boxes
        scores = scores[inv_index, :]
        pred_boxes = pred_boxes[inv_index, :]

    return scores, pred_boxes

def vis_detections(im, class_name, dets, thresh=0.3):
    """Visual debugging of detections."""
    import matplotlib.pyplot as plt
    im = im[:, :, (2, 1, 0)]
    for i in xrange(np.minimum(10, dets.shape[0])):
        bbox = dets[i, :4]
        score = dets[i, -1]
        if score > thresh:
            plt.cla()
            plt.imshow(im)
            plt.gca().add_patch(
                plt.Rectangle((bbox[0], bbox[1]),
                              bbox[2] - bbox[0],
                              bbox[3] - bbox[1], fill=False,
                              edgecolor='g', linewidth=3)
                )
            plt.title('{}  {:.3f}'.format(class_name, score))
            plt.show()

def apply_nms(all_boxes, thresh):
    """Apply non-maximum suppression to all predicted boxes output by the
    test_net method.
    """
    num_classes = len(all_boxes)
    num_images = len(all_boxes[0])
    nms_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(num_classes)]
    for cls_ind in xrange(num_classes):
        for im_ind in xrange(num_images):
            dets = all_boxes[cls_ind][im_ind]
            if dets == []:
                continue
            # CPU NMS is much faster than GPU NMS when the number of boxes
            # is relative small (e.g., < 10k)
            # TODO(rbg): autotune NMS dispatch
            keep = nms(dets, thresh, force_cpu=True)
            if len(keep) == 0:
                continue
            nms_boxes[cls_ind][im_ind] = dets[keep, :].copy()
    return nms_boxes

def test_net(net, imdb, max_per_image=100, thresh=0.05, vis=False):
    """Test a Fast R-CNN network on an image database."""
    num_images = len(imdb.image_index)
    # all detections are collected into:
    #    all_boxes[cls][image] = N x 5 array of detections in
    #    (x1, y1, x2, y2, score)
    all_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(imdb.num_classes)]
    output_dir = get_output_dir(imdb, net)
    # timers
    _t = {'im_detect' : Timer(), 'misc' : Timer()}
    if not cfg.TEST.HAS_RPN:
        roidb = imdb.roidb
    for i in xrange(num_images):
        # filter out any ground truth boxes
        if cfg.TEST.HAS_RPN:
            box_proposals = None
        else:
            # The roidb may contain ground-truth rois (for example, if the roidb
            # comes from the training or val split). We only want to evaluate
            # detection on the *non*-ground-truth rois. We select those the rois
            # that have the gt_classes field set to 0, which means there's no
            # ground truth.
            box_proposals = roidb[i]['boxes'][roidb[i]['gt_classes'] == 0]
        im = cv2.imread(imdb.image_path_at(i))
        _t['im_detect'].tic()
        scores, boxes = im_detect(net, im, box_proposals)
        _t['im_detect'].toc()
        _t['misc'].tic()
        # skip j = 0, because it's the background class
        for j in xrange(1, imdb.num_classes):
            inds = np.where(scores[:, j] > thresh)[0]
            cls_scores = scores[inds, j]
            cls_boxes = boxes[inds, j*4:(j+1)*4]
            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
                .astype(np.float32, copy=False)
            keep = nms(cls_dets, cfg.TEST.NMS)
            cls_dets = cls_dets[keep, :]
            if vis:
                vis_detections(im, imdb.classes[j], cls_dets)
            all_boxes[j][i] = cls_dets
        # Limit to max_per_image detections *over all classes*
        if max_per_image > 0:
            image_scores = np.hstack([all_boxes[j][i][:, -1]
                                      for j in xrange(1, imdb.num_classes)])
            if len(image_scores) > max_per_image:
                image_thresh = np.sort(image_scores)[-max_per_image]
                for j in xrange(1, imdb.num_classes):
                    keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
                    all_boxes[j][i] = all_boxes[j][i][keep, :]
        _t['misc'].toc()
        print 'im_detect: {:d}/{:d} {:.3f}s {:.3f}s' \
              .format(i + 1, num_images, _t['im_detect'].average_time,
                      _t['misc'].average_time)
    det_file = os.path.join(output_dir, 'detections.pkl')
    with open(det_file, 'wb') as f:
        cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL)
    print 'Evaluating detections'
    imdb.evaluate_detections(all_boxes, output_dir)
