Any matrix W has a singular value decomposition (SVD) W = U S V*, where U and V are orthonormal matrices and S is diagonal, with its elements in decreasing magnitude along the diagonal. One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading elements (instead of all the elements on the diagonal); then W_app = U S_trunc V* is a rank-k approximation of W.

**Using SVD to approximate a fully connected layer**

Suppose we have a model deploy_full.prototxt with a fully connected layer:

```
# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...
```

Furthermore, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.

Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

```
layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in" # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20 # approximate with a rank k = 20 matrix
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out" # same output
  inner_product_param {
    num_output: 1000 # original number of outputs
    # more params...
  }
  # some more...
}
```

In Python, do the net surgery:

```python
import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)

# get the original weight matrix, shape (1000, D) for a D-dimensional input
W = np.array(orig_net.params['fc_orig'][0].data)

# SVD decomposition: W = U @ diag(s) @ Vt, singular values s in decreasing order
k = 20  # same as num_output of fc_svd_U
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# assign weights to the svd net (note: despite the layer names, the first
# layer must hold the truncated V factor and the second U S_trunc, so that
# their product U S_trunc Vt approximates W)
svd_net.params['fc_svd_U'][0].data[...] = Vt[:k, :]                          # (k, D)
svd_net.params['fc_svd_V'][0].data[...] = np.dot(U[:, :k], np.diag(s[:k]))   # (1000, k)
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data # same bias

# save the new weights
svd_net.save('trained_weights_svd.caffemodel')
```

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights: if the layer's input has dimension D, the original layer performs D×1000 multiplications, while the SVD pair performs only D×20 + 20×1000.
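A note on choosing k (this sketch is an addition, not part of the recipe above): by the Eckart–Young theorem, the Frobenius-norm error of the best rank-k approximation is the root sum of squares of the discarded singular values, so you can read how aggressively W can be truncated directly off its spectrum. Assuming W has been extracted as in the surgery script:

```python
import numpy as np

# singular-value spectrum of the extracted weight matrix W (see above)
U, s, Vt = np.linalg.svd(W, full_matrices=False)

for k in (10, 20, 50, 100):
    # Eckart-Young: ||W - W_app||_F = sqrt(sum of the discarded s[k:]**2)
    rel_err = np.sqrt(np.sum(s[k:] ** 2)) / np.linalg.norm(W)
    energy = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    print('k={:4d}  captured energy={:.3f}  relative error={:.3f}'.format(k, energy, rel_err))
```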
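Finally, as a sanity check (again my addition, assuming the nets have a single input blob and that "out" is their only output, as in the snippets above), you can run both nets on the same random input and compare their outputs:

```python
import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_svd.caffemodel', caffe.TEST)

# feed the same random input to both nets
in_name = orig_net.inputs[0]
x = np.random.randn(*orig_net.blobs[in_name].data.shape).astype('f4')
orig_net.blobs[in_name].data[...] = x
svd_net.blobs[in_name].data[...] = x

# forward both nets and compare their (shared) output blob
out_name = orig_net.outputs[0]
y_full = orig_net.forward()[out_name].copy()
y_svd = svd_net.forward()[out_name].copy()
print('relative output difference: {:.4f}'.format(
    np.linalg.norm(y_full - y_svd) / np.linalg.norm(y_full)))
```

A small difference confirms the surgery wired the factors correctly; the residual comes from the rank-k truncation propagated through the remaining layers.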