Question
I was looking through the code of Caffe's SigmoidCrossEntropyLoss layer and the docs, and I'm a bit confused. The docs list the loss function as the logit loss (I'd replicate it here, but without LaTeX the formula would be difficult to read; check out the docs link, it's at the very top).

However, the code itself (Forward_cpu(...)) shows a different formula:
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
top[0]->mutable_cpu_data()[0] = loss / num;
Is it because this is accounting for the sigmoid function having already been applied to the input?

However, even so, the (input_data[i] >= 0) snippets are confusing me as well. Those appear to be in place of the p_hat from the loss formula in the docs, which is supposed to be the prediction squashed by the sigmoid function. So why are they just taking a binary threshold? It's made even more confusing because this loss predicts [0,1] outputs, so (input_data[i] >= 0) will be 1 unless it's 100% sure it's not.

Can someone please explain this to me?
Answer
The SigmoidCrossEntropy layer in Caffe combines the two steps that would otherwise be performed on input_data (Sigmoid followed by CrossEntropy) into one piece of code:
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
top[0]->mutable_cpu_data()[0] = loss / num;
In fact, regardless of whether input_data >= 0 or not, the code above is always mathematically equivalent to the following:
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - 1) -
      log(1 + exp(-input_data[i]));
}
top[0]->mutable_cpu_data()[0] = loss / num;
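As a quick check of that equivalence (a sketch, writing x for input_data[i] and t for target[i]): for x >= 0 the two snippets are literally identical, since (input_data[i] >= 0) evaluates to 1 and x - 2x becomes -x. For x < 0 the first snippet accumulates x t - log(1 + e^x), which matches the second because

\begin{aligned}
x(t-1) - \log\left(1 + e^{-x}\right)
  &= xt - \log e^{x} - \log\left(1 + e^{-x}\right) \\
  &= xt - \log\left(1 + e^{x}\right)
\end{aligned}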
This code is based on the straightforward formula you obtain after applying Sigmoid and then CrossEntropy to input_data and simplifying the result algebraically.
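For reference, here is that simplification written out (a sketch, with x = input_data[i], t = target[i], and \hat{p} = \sigma(x) the sigmoid output):

\begin{aligned}
\ell &= -\bigl[t\log\hat{p} + (1-t)\log(1-\hat{p})\bigr],
  \qquad \hat{p} = \sigma(x) = \frac{1}{1+e^{-x}} \\
\log\hat{p} &= -\log\left(1+e^{-x}\right),
  \qquad \log(1-\hat{p}) = -x - \log\left(1+e^{-x}\right) \\
\ell &= (1-t)\,x + \log\left(1+e^{-x}\right)
     = -\bigl[x(t-1) - \log\left(1+e^{-x}\right)\bigr]
\end{aligned}

which is exactly the per-sample term accumulated by loss -= ... in the simplified code.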
But the first piece of code (the one Caffe uses) has better numerical stability and a lower risk of overflow, because it avoids computing a large exp(input_data) (or exp(-input_data)) when the absolute value of input_data is large. That's why you see that code in Caffe.
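To make the overflow point concrete, here is a minimal standalone sketch (not Caffe code; softplus_naive and softplus_stable are hypothetical names) comparing the naive log(1 + exp(x)) with the rewritten form used in the loop above:

#include <cmath>
#include <cstdio>

// Naive softplus: exp(x) overflows a double once x exceeds ~709, so the
// result becomes inf even though log(1 + exp(x)) is approximately x.
double softplus_naive(double x) {
  return std::log(1.0 + std::exp(x));
}

// Stable softplus, same trick as Caffe's loop: the argument of exp() is
// -|x|, which never overflows (for large |x| it just underflows to 0).
double softplus_stable(double x) {
  return std::fmax(x, 0.0) + std::log(1.0 + std::exp(-std::fabs(x)));
}

int main() {
  const double x = 1000.0;
  std::printf("naive:  %f\n", softplus_naive(x));   // prints inf
  std::printf("stable: %f\n", softplus_stable(x));  // prints ~1000.0
  return 0;
}

Each per-sample term in the loop is exactly softplus(input_data[i]) - input_data[i] * target[i], so keeping the softplus finite keeps the whole loss finite.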