Problem description
In the paper Girshick, R., "Fast R-CNN" (ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using an SVD trick to reduce the size and computation time of a fully connected layer.
Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?
Recommended answer
Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices:
W = U S V*
where U and V are orthonormal matrices, and S is diagonal with elements in decreasing magnitude on the diagonal. One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading elements (instead of all elements on the diagonal); then
W_app = U S_trunc V*
is a rank-k approximation of W.
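To make the rank-k approximation concrete, here is a minimal NumPy sketch on a small random matrix. The matrix sizes, the seed, and the names `W_app`, `err`, and `expected_err` are illustrative assumptions, not part of the original answer. It relies on the fact that `np.linalg.svd` returns the singular values sorted in decreasing order and returns V already transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))  # toy stand-in for a weight matrix

# Thin SVD: U is (6, 6), s holds 6 singular values, Vh is (6, 8) and equals V*
U, s, Vh = np.linalg.svd(W, full_matrices=False)

k = 2
# Keep the k leading singular values with the matching columns of U and rows of Vh
W_app = (U[:, :k] * s[:k]) @ Vh[:k, :]

rank = np.linalg.matrix_rank(W_app)
# Frobenius error of the truncation equals the energy of the discarded singular values
err = np.linalg.norm(W - W_app)
expected_err = np.sqrt(np.sum(s[k:] ** 2))
print(rank, err, expected_err)
```

The approximation has the same shape as W but only rank k, and its error is governed entirely by the singular values that were dropped.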
Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer:
# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...
Furthermore, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.
Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:
layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in" # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20 # approximate with k = 20 rank matrix
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out" # same output
  inner_product_param {
    num_output: 1000 # original number of outputs
    # more params...
  }
  # some more...
}
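Why there must be no activation between the two layers can be checked without Caffe. The NumPy sketch below assumes Caffe's default InnerProduct convention, in which the weight blob has shape (num_output, input_dim) and the layer computes W x + b. The toy sizes and the names `W_first` and `W_second` are hypothetical. Two stacked linear layers then collapse into a single multiplication by the rank-k matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, k = 50, 30, 5  # toy sizes, not taken from any real model

W = rng.standard_normal((n_out, n_in))  # weights stored as (num_output, input_dim)
b = rng.standard_normal(n_out)
x = rng.standard_normal(n_in)

U, s, Vh = np.linalg.svd(W, full_matrices=False)

W_first = Vh[:k, :]          # (k, n_in): layer that sees the input, no bias
W_second = U[:, :k] * s[:k]  # (n_out, k): layer that produces the output, keeps the bias

# Forward pass through the two stacked linear layers
interim = W_first @ x        # plays the role of the "svd_interim" blob
out_two_layers = W_second @ interim + b

# Same result as one layer whose weight matrix is the rank-k approximation
W_app = W_second @ W_first
out_rank_k = W_app @ x + b
print(np.allclose(out_two_layers, out_rank_k))
```

Any nonlinearity inserted between the two layers would break this equivalence, which is why the bias is dropped from the first layer and kept only on the second.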
In Python, a little net surgery:
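import caffe
import numpy as np
orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# weights are copied by layer name, so fc_svd_U / fc_svd_V are not loaded from the file
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, input_dim)
W = np.array(orig_net.params['fc_orig'][0].data)
# SVD decomposition; the thin SVD avoids building a full input_dim x input_dim matrix
k = 20  # same as num_output of fc_svd_U
U, s, V = np.linalg.svd(W, full_matrices=False)  # V returned here is already V*
# assign weights to the svd net
# the first layer (named fc_svd_U in the prototxt) maps the input to k values, so it takes the V factor
svd_net.params['fc_svd_U'][0].data[...] = V[:k, :]  # shape (k, input_dim)
# the second layer (fc_svd_V) maps k values to the outputs, so it takes U scaled by the k leading singular values
svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]  # shape (num_output, k)
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')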
Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
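As a rough illustration of the savings, the sketch below counts weights (equivalently, multiply-adds per input, ignoring the bias) before and after the factorization. The input size of 4096 is an assumption, since the answer does not state the size of the "in" blob, and the helper names `fc_cost` and `svd_cost` are hypothetical.

```python
def fc_cost(n_in, n_out):
    """Weights (and multiply-adds per input) of one dense layer, bias excluded."""
    return n_in * n_out


def svd_cost(n_in, n_out, k):
    """Combined cost of the two stacked layers of a rank-k factorization."""
    return k * n_in + n_out * k


n_in = 4096   # assumed input size; not given in the answer
n_out = 1000  # num_output of fc_orig
k = 20        # num_output of the first factored layer

full = fc_cost(n_in, n_out)
trunc = svd_cost(n_in, n_out, k)
ratio = full / trunc
print(full, trunc, round(ratio, 1))
```

Under these assumed sizes the factored pair needs roughly one fortieth of the weights, at the price of the approximation error introduced by dropping the smaller singular values.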