Problem description
I need to find the gradient with respect to the input layer for a single convolutional filter in a convolutional neural network (CNN) as a way to visualize the filters.
Given a trained network in the Python interface of Caffe such as the one in this example, how can I then find the gradient of a conv-filter with respect to the data in the input layer?
Based on the answer by cesans, I added the code below. The dimensions of my input layer are [8, 8, 7, 96]. My first conv-layer, conv1, has 11 filters of size 1x5, resulting in the dimensions [8, 11, 7, 92].
net = solver.net
diffs = net.backward(diffs=['data', 'conv1'])
print diffs.keys() # >> ['conv1', 'data']
print diffs['data'].shape # >> (8, 8, 7, 96)
print diffs['conv1'].shape # >> (8, 11, 7, 92)
As you can see from the output, the dimensions of the arrays returned by net.backward() are equal to the dimensions of my layers in Caffe. After some testing I've found that this output is the gradient of the loss with respect to the data layer and the conv1 layer, respectively.
However, my question was how to find the gradient of a single conv-filter with respect to the data in the input layer, which is something else. How can I achieve this?
Recommended answer
A Caffe net juggles two "streams" of numbers.
The first is the data "stream": images and labels pushed through the net. As these inputs progress through the net they are converted into high-level representations and eventually into class-probability vectors (in classification tasks).
The second "stream" holds the parameters of the different layers: the weights of the convolutions, the biases, etc. These numbers/weights are changed and learned during the training phase of the net.
Despite the fundamentally different roles these two "streams" play, Caffe nonetheless uses the same data structure, blob, to store and manage them.
However, for each layer there are two different blob vectors, one for each stream.
Here's an example that I hope will clarify this:
import caffe
solver = caffe.SGDSolver( PATH_TO_SOLVER_PROTOTXT )
net = solver.net
If you now look at net.blobs, you will see a dictionary storing a "caffe blob" object for each layer in the net. Each blob has room to store both the data and the gradient:
net.blobs['data'].data.shape # >> (32, 3, 224, 224)
net.blobs['data'].diff.shape # >> (32, 3, 224, 224)
And for a convolutional layer:
net.blobs['conv1/7x7_s2'].data.shape # >> (32, 64, 112, 112)
net.blobs['conv1/7x7_s2'].diff.shape # >> (32, 64, 112, 112)
net.blobs holds the first data stream; its shape matches that of the input images up to the resulting class-probability vector.
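For a quick overview of this data stream you can iterate over net.blobs and print each blob's shape; a minimal sketch (the blob names and shapes depend on your particular net):

for name, blob in net.blobs.items():
    # each entry carries a data array and a matching diff (gradient) array
    print name, blob.data.shape, blob.diff.shape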
On the other hand, you can look at net.layers. This is a caffe vector storing the parameters of the different layers.
Looking at the first layer (the 'data' layer):
len(net.layers[0].blobs) # >> 0
There are no parameters to store for an input layer.
On the other hand, for the first convolutional layer:
len(net.layers[1].blobs) # >> 2
The net stores one blob for the filter weights and another for the constant bias. Here they are:
net.layers[1].blobs[0].data.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].data.shape # >> (64,)
As you can see, this layer performs 7x7 convolutions on a 3-channel input image and has 64 such filters.
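As a small sanity check, the weight blob is laid out as (num_output, channels, kernel_h, kernel_w), so you can count this layer's learnable parameters directly; a short sketch using the shapes printed above:

num_out, channels, kh, kw = net.layers[1].blobs[0].data.shape
print num_out, channels, kh, kw                # >> 64 3 7 7
print num_out * channels * kh * kw + num_out   # >> 9472 (weights plus biases)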
Now, how do you get the gradients? Well, as you noted,
diffs = net.backward(diffs=['data','conv1/7x7_s2'])
returns the gradients of the data stream. We can verify this by:
np.all( diffs['data'] == net.blobs['data'].diff ) # >> True
np.all( diffs['conv1/7x7_s2'] == net.blobs['conv1/7x7_s2'].diff ) # >> True
(TL;DR) You want the gradients of the parameters; these are stored in net.layers along with the parameters:
net.layers[1].blobs[0].diff.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].diff.shape # >> (64,)
To help you map between the names of the layers and their indices into the net.layers vector, you can use net._layer_names.
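For example, a short sketch (assuming the conv layer is named 'conv1/7x7_s2' as above):

layer_idx = list(net._layer_names).index('conv1/7x7_s2')
print net.layers[layer_idx].blobs[0].diff.shape  # >> (64, 3, 7, 7)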
Update regarding the use of gradients to visualize filter responses:
A gradient is normally defined for a scalar function. The loss is a scalar, and therefore you can speak of the gradient of a pixel/filter weight with respect to the scalar loss. This gradient is a single number per pixel/filter weight.
If you want to get the input that results in maximal activation of a specific internal hidden node, you need an "auxiliary" net whose loss is exactly a measure of the activation of the specific hidden node you want to visualize. Once you have this auxiliary net, you can start from an arbitrary input and change this input based on the gradients of the auxiliary loss with respect to the input layer:
update = prev_in + lr * net.blobs['data'].diff
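If you would rather not build a separate auxiliary net, a common shortcut (not part of the original answer) is to seed the diff of the conv blob for the single filter you care about and back-propagate from that layer. The following is only a sketch under that assumption: the blob names 'data' and 'conv1' come from the question, filter_idx and lr are arbitrary, and the net prototxt may need force_backward: true for the data diff to be filled in.

filter_idx = 0                                   # which conv1 filter to visualize (arbitrary choice)
lr = 0.1                                         # gradient-ascent step size (arbitrary choice)

net.forward()                                    # fill the blobs with the current input
net.blobs['conv1'].diff[...] = 0                 # clear any stale gradients
# treat the sum of this filter's outputs as the "loss": its gradient w.r.t. conv1 is 1 there
net.blobs['conv1'].diff[:, filter_idx, :, :] = 1.0
net.backward(start='conv1')                      # back-propagate from conv1 down to 'data'

grad_wrt_input = net.blobs['data'].diff          # gradient of the filter's activation w.r.t. the input
net.blobs['data'].data[...] += lr * grad_wrt_input  # one gradient-ascent step on the input

Repeating the last few steps in a loop performs the gradient-ascent visualization described above, with the data blob itself playing the role of prev_in. Note that if your net has an input layer that reloads data on every forward pass, you would do this on a deploy-style net where you set net.blobs['data'].data yourself.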