


I'm wondering what happens if I have a layer producing a bottom blob that is consumed by two subsequent layers, both of which generate gradients to fill bottom.diff in the backpropagation stage. Will the two gradients be added up to form the final gradient, or will only one of them survive? In my understanding, Caffe layers need to memset bottom.diff to all zeros before filling it with the computed gradients, right? Won't that memset flush out the gradients already computed by the other layer? Thank you!



Using more than one loss layer is not out of the ordinary. See GoogLeNet, for example, which has three loss layers "pushing" gradients at different depths of the net.
In Caffe, each loss layer has an associated loss_weight that determines how much this particular component contributes to the loss function of the net. Thus, if your net has two loss layers, Loss1 and Loss2, the overall loss of your net is

Loss = loss_weight1*Loss1 + loss_weight2*Loss2
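To make the weighting concrete, here is a minimal pure-Python sketch of that formula. The function names and the numbers are hypothetical, chosen only for illustration. It also shows the consequence for backprop: since dLoss/dLoss_i = loss_weight_i, each loss layer's backward pass starts from its own weight as the incoming gradient.

```python
# Minimal sketch (hypothetical helpers): Loss = sum_i loss_weight_i * Loss_i

def total_loss(losses, loss_weights):
    """Weighted sum of the per-layer losses."""
    if len(losses) != len(loss_weights):
        raise ValueError("need exactly one loss_weight per loss layer")
    return sum(w * l for w, l in zip(loss_weights, losses))


def loss_top_diffs(loss_weights):
    """dLoss/dLoss_i = loss_weight_i: the gradient each loss layer starts from."""
    return list(loss_weights)


# Two loss layers, the second one down-weighted (illustrative values).
print(total_loss([2.0, 3.0], [1.0, 0.5]))  # 3.5
print(loss_top_diffs([1.0, 0.5]))          # [1.0, 0.5]
```

A loss layer with weight 0 contributes nothing to the total, so no gradient flows back from it.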


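Backpropagation uses the chain rule to propagate the gradient of Loss (the overall loss) through all the layers in the net. The chain rule breaks the derivative of Loss down into partial derivatives, i.e., the derivative of each layer, and the overall effect is obtained by propagating the gradients through these partial derivatives. That is, when a layer uses top.diff and its Backward() function to compute bottom.diff, the result accounts not only for the layer's own derivative but also for the effect of ALL higher layers, as expressed in top.diff. When a blob feeds more than one layer, the chain rule says its gradient is the sum of the gradients coming back from each consumer. Caffe handles this by automatically inserting a Split layer after such a blob: each consumer zeroes and fills its own diff, and the Split layer's backward pass adds these diffs into the original bottom.diff, so no consumer's memset overwrites another's gradient.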
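The fan-out case from the question can be checked numerically. The following is a simplified pure-Python sketch, not Caffe code: a blob `x` feeds two hypothetical toy layers, A computing `a*x` and B computing `x**2`. Each consumer "memsets" and fills its own diff buffer, in the spirit of a Split layer's separate tops, and the buffers are then summed. For contrast, it also shows what would happen if both consumers wrote into one shared buffer. A finite-difference check confirms that the summed version is the true gradient.

```python
# Simplified sketch of gradient accumulation for a blob with two consumers.
# Hypothetical toy layers: A computes a*x, B computes x**2 (elementwise).
# Loss = w1*sum(A(x)) + w2*sum(B(x)).

def loss(x, a, w1, w2):
    return w1 * sum(a * xi for xi in x) + w2 * sum(xi * xi for xi in x)


def backward_split(x, a, w1, w2):
    # Each consumer zeroes ("memsets") and fills its OWN diff buffer.
    diff_a = [0.0] * len(x)
    diff_b = [0.0] * len(x)
    for i, xi in enumerate(x):
        diff_a[i] = w1 * a          # dLoss/dx through A
        diff_b[i] = w2 * 2.0 * xi   # dLoss/dx through B
    # Split-style backward: sum the consumers' diffs into the shared blob.
    return [da + db for da, db in zip(diff_a, diff_b)]


def backward_overwrite(x, a, w1, w2):
    # Wrong: both consumers write into the SAME buffer.
    diff = [w1 * a for _ in x]          # A fills the buffer
    diff = [0.0] * len(x)               # B's memset wipes A's contribution
    for i, xi in enumerate(x):
        diff[i] = w2 * 2.0 * xi         # only B's gradient survives
    return diff


def numeric_grad(x, a, w1, w2, eps=1e-6):
    # Central finite differences as an independent check.
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((loss(xp, a, w1, w2) - loss(xm, a, w1, w2)) / (2 * eps))
    return grad


x, a, w1, w2 = [1.0, -2.0], 3.0, 1.0, 0.5
print(backward_split(x, a, w1, w2))      # [4.0, 1.0]
print(backward_overwrite(x, a, w1, w2))  # [1.0, -2.0]  (A's gradient lost)
print(numeric_grad(x, a, w1, w2))        # approximately [4.0, 1.0]
```

Keeping one buffer per consumer and summing them at the end means each layer can safely zero its own diff without knowing how many other layers read the same blob.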


You can have multiple loss layers. Caffe (as well as any other decent deep learning framework) handles it seamlessly for you.

