SiamRPN: High Performance Visual Tracking with Siamese Region Proposal Network

2018-11-26 18:32:02

Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf

PyTorch Code: https://github.com/songdejia/siamese-RPN-pytorch

Train Code: https://github.com/MathsXDC/DaSiamRPNWithOfflineTraining

TensorFlow Code: https://github.com/makalo/Siamese-RPN-tensorflow

Reference Code: https://github.com/zkisthebest/Siamese-RPN

Another PyTorch implementation with a deeper and wider backbone network (SiamDW, CVPR 2019), including all training and test code: https://github.com/researchmm/SiamDW

1. Background and Motivation:

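Existing tracking methods fall into two main groups:

1). Correlation filter (CF) trackers. Some variants plug deep features into the CF framework, but they are not fast enough;

2). Trackers built entirely on deep networks. Because they do not use domain-specific information, their results are not particularly strong.

This paper brings the region proposal network (RPN) into tracking, which greatly improves tracking performance. The framework has two main branches:

1). Template branch;

2). Detection branch.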

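At test time, the authors treat the tracker as a local one-shot detection framework in which the BBox in the first frame only provides the exemplar. The template branch is reinterpreted as parameters that predict the detection kernels, similar to a meta-learner. The meta-learner and the detection branch are both trained end to end using only the RPN supervision. During online tracking, the template branch is pruned to speed up inference. According to the authors, this is also the first work to formulate online tracking as a one-shot detection task.

The pipeline of the proposed Siamese RPN is shown below:

[Figure: pipeline of the proposed Siamese-RPN framework]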

2. Siamese-RPN framework:

2.1 Siamese feature extraction subnetwork

In the Siamese network, the authors use a fully convolutional network without padding. The backbone is a modified AlexNet. A schematic of the Siamese tracker:

[Figure: schematic of the Siamese tracker]
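Because the backbone uses no padding, every convolution and pooling layer shrinks the feature map, so the output size follows directly from the kernel sizes and strides. The sketch below illustrates that arithmetic. The layer list and the 127 and 255 pixel crop sizes are assumptions borrowed from the common SiamFC-style AlexNet setup; this post does not list them.

```python
def valid_out(size, kernel, stride=1):
    """Spatial output size of a conv or pooling layer with no padding."""
    return (size - kernel) // stride + 1


# Assumed (kernel, stride) pairs: conv11/2, pool3/2, conv5, pool3/2, conv3 x 3.
# Hypothetical layer list for illustration; not taken from this post.
ASSUMED_LAYERS = [(11, 2), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1)]


def feature_size(input_size, layers=ASSUMED_LAYERS):
    """Propagate a square input through the unpadded layer stack."""
    size = input_size
    for kernel, stride in layers:
        size = valid_out(size, kernel, stride)
    return size


template_size = feature_size(127)  # assumed template crop size
search_size = feature_size(255)    # assumed search crop size
print(template_size, search_size)  # 6 22
```

Under these assumptions the template yields a 6 x 6 feature map and the search region a 22 x 22 one. The search map is larger, so the template features can later slide over it as a kernel.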

2.2 Region Proposal Subnetwork

The RPN subnetwork consists of two parts: a pair-wise correlation section and a supervision section.

The supervision section has two branches: one for foreground/background classification and one for proposal regression.

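With k anchors, the network has to output 2k channels for classification and 4k channels for regression. The pair-wise correlation section therefore first expands the channels of the template feature into two parts, one for classification and one for regression. The feature of the other (detection) branch is likewise split into two paths, reg and cls. The features output by the template branch can be viewed as "kernels" that are convolved over the features of the search region. The correlation is computed on both the classification and the regression branch:

$$A^{cls}_{w \times h \times 2k} = [\varphi(x)]_{cls} \star [\varphi(z)]_{cls}$$

$$A^{reg}_{w \times h \times 4k} = [\varphi(x)]_{reg} \star [\varphi(z)]_{reg}$$

Here $\varphi(z)$ and $\varphi(x)$ are the features of the template and the search region, and $\star$ denotes the convolution (correlation) operation.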
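The "template features as kernels" idea can be sketched with a plain valid cross-correlation. This is a minimal pure-Python illustration, not the authors' implementation: the function name `xcorr`, the nested-list layout, and the toy sizes are all assumptions chosen for readability.

```python
def xcorr(search, kernels):
    """Valid cross-correlation of a search feature map with template kernels.

    search:  nested list [C][H][W], features of the detection branch
    kernels: nested list [O][C][h][w], kernels derived from the template branch
    returns: nested list [O][H-h+1][W-w+1], one response map per kernel
    """
    channels = len(search)
    height, width = len(search[0]), len(search[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for kernel in kernels:
        response = []
        for i in range(height - kh + 1):
            row = []
            for j in range(width - kw + 1):
                total = 0.0
                for c in range(channels):
                    for di in range(kh):
                        for dj in range(kw):
                            total += search[c][i + di][j + dj] * kernel[c][di][dj]
                row.append(total)
            response.append(row)
        out.append(response)
    return out


k = 1  # toy number of anchors
# Toy search feature: 1 channel, 4 x 4, value 4*r + c at row r, column c.
search = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
# 2k classification kernels, each 1 x 2 x 2 (toy values).
cls_kernels = [[[[1.0, 0.0], [0.0, 1.0]]] for _ in range(2 * k)]
cls_map = xcorr(search, cls_kernels)
print(len(cls_map), len(cls_map[0]), len(cls_map[0][0]))  # 2 3 3
```

The classification response has 2k output maps; a regression response would be obtained the same way with 4k kernels.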

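For training, the authors adopt the loss function of Faster R-CNN: the classification branch is trained with the cross-entropy loss, and the regression branch with the smooth L1 loss on normalized coordinates.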

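Let Ax, Ay, Aw, Ah denote the center point and shape of an anchor box, and Tx, Ty, Tw, Th those of the ground-truth (GT) box. The normalized distances are then:

$$\delta[0] = \frac{T_x - A_x}{A_w}, \quad \delta[1] = \frac{T_y - A_y}{A_h}, \quad \delta[2] = \ln\frac{T_w}{A_w}, \quad \delta[3] = \ln\frac{T_h}{A_h}$$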
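As a worked example of the normalized distances, the sketch below encodes a GT box relative to an anchor using the usual Faster R-CNN parameterization: center offsets divided by the anchor size, and log ratios of width and height. The function names `encode` and `decode` and the sample boxes are hypothetical.

```python
import math


def encode(anchor, target):
    """Normalized distance between an anchor and a GT box, both (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = target
    return [
        (tx - ax) / aw,
        (ty - ay) / ah,
        math.log(tw / aw),
        math.log(th / ah),
    ]


def decode(anchor, delta):
    """Inverse of encode: recover a (cx, cy, w, h) box from an anchor and offsets."""
    ax, ay, aw, ah = anchor
    return [
        ax + delta[0] * aw,
        ay + delta[1] * ah,
        aw * math.exp(delta[2]),
        ah * math.exp(delta[3]),
    ]


anchor = (0.0, 0.0, 10.0, 20.0)   # made-up anchor
target = (5.0, -4.0, 20.0, 20.0)  # made-up GT box
delta = encode(anchor, target)
print(delta)
```

Decoding the offsets against the same anchor recovers the GT box, which is how regression outputs are turned back into proposals at test time.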

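These distances are then passed through the smooth L1 loss, which is defined as:

$$\mathrm{smooth}_{L_1}(x, \sigma) = \begin{cases} 0.5\,\sigma^2 x^2, & |x| < \frac{1}{\sigma^2} \\ |x| - \frac{1}{2\sigma^2}, & |x| \ge \frac{1}{\sigma^2} \end{cases}$$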

Finally, the loss function that the authors optimize is:

$$loss = L_{cls} + \lambda L_{reg}$$

where $L_{cls}$ is the cross-entropy loss, $\lambda$ is a hyperparameter that balances the two terms, and $L_{reg}$ is:

$$L_{reg} = \sum_{i=0}^{3} \mathrm{smooth}_{L_1}(\delta[i], \sigma)$$
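The combined objective can be sketched for a single anchor as follows. This is an illustrative reading rather than the authors' code: the smooth L1 term is applied to the difference between predicted and target offsets, the defaults `lam=1.0` and `sigma=1.0` are placeholder assumptions, and the function names are hypothetical.

```python
import math


def smooth_l1(x, sigma=1.0):
    """Smooth L1: quadratic for |x| < 1/sigma^2, linear otherwise."""
    s2 = sigma * sigma
    ax = abs(x)
    if ax < 1.0 / s2:
        return 0.5 * s2 * x * x
    return ax - 0.5 / s2


def cross_entropy(logits, label):
    """Softmax cross-entropy for one anchor; logits = [background, foreground]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def total_loss(logits, label, delta_pred, delta_gt, lam=1.0, sigma=1.0):
    """loss = L_cls + lam * L_reg for one anchor (lam and sigma are assumed values)."""
    l_cls = cross_entropy(logits, label)
    l_reg = sum(smooth_l1(p - g, sigma) for p, g in zip(delta_pred, delta_gt))
    return l_cls + lam * l_reg


print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```

In a full training loop the regression term would typically be computed only for positive anchors and the result averaged over the sampled anchors; that detail is not covered in this post.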

2.3 Training Phase

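In the training phase, sample pairs are drawn from ImageNet VID and Youtube-BB to train the similarity matching.

Anchors are labeled by their IoU with the ground truth: an anchor whose IoU exceeds the upper threshold (0.6 in the paper) is a positive sample, and an anchor whose IoU is below 0.3 is a negative sample.

For each training pair, the authors use at most 16 positive samples and 64 samples in total.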
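The labeling and sampling rule can be sketched as below. The thresholds 0.6 and 0.3 and the limits 16 and 64 come from the text; ignoring anchors with IoU between the two thresholds and filling the remaining slots with randomly chosen negatives are assumptions, as is the function name `label_anchors`.

```python
import random


def label_anchors(ious, hi=0.6, lo=0.3, max_pos=16, total=64, seed=0):
    """Label anchors for one training pair: 1 positive, 0 negative, -1 ignored."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(ious) if v > hi]
    neg = [i for i, v in enumerate(ious) if v < lo]
    # Keep at most max_pos positives.
    if len(pos) > max_pos:
        pos = rng.sample(pos, max_pos)
    # Assumption: negatives fill the remaining budget up to `total` samples.
    max_neg = total - len(pos)
    if len(neg) > max_neg:
        neg = rng.sample(neg, max_neg)
    labels = [-1] * len(ious)
    for i in pos:
        labels[i] = 1
    for i in neg:
        labels[i] = 0
    return labels


# Made-up IoU values: 20 high, 10 in between, 100 low.
ious = [0.7] * 20 + [0.5] * 10 + [0.1] * 100
labels = label_anchors(ious)
print(labels.count(1), labels.count(0), labels.count(-1))  # 16 48 66
```

Only anchors labeled 0 or 1 would contribute to the classification loss, and only those labeled 1 to the regression loss, under the assumptions stated above.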

3. Tracking as one-shot detection:
