Problem description
When I use Caffe for image classification, it often computes the image mean. Why is that the case?
Someone said that it can improve the accuracy, but I don't understand why this should be the case.
Recommended answer
Neural networks (including CNNs) are models with thousands of parameters which we try to optimize with gradient descent. These models can fit many different functions because they apply a non-linearity φ at their nodes. Without a non-linear activation function, the whole network collapses to a single linear function. This means we need the non-linearity for most interesting problems.
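The collapse to a linear function can be checked numerically. The following is a minimal sketch using NumPy; the layer sizes and the names `W1`, `b1`, `W2`, `b2` are made up for illustration and are not taken from any particular network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers WITHOUT an activation function (hypothetical sizes).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# The same map written as one linear layer.
W, b = W2 @ W1, W2 @ b1 + b2

def one_linear_layer(x):
    return W @ x + b

x = rng.normal(size=3)
collapsed = np.allclose(two_linear_layers(x), one_linear_layer(x))
print(collapsed)
```

However many such layers are stacked, the result can still be rewritten as a single matrix and a single bias, which is why the non-linearity is needed.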
Common choices for φ are the logistic function, tanh or ReLU. All of them have their most interesting region around 0. This is where the gradient is either big enough to learn quickly or, in the case of ReLU, where the non-linearity is located at all. Weight initialization schemes like Glorot initialization try to make the network start at a good point for the optimization. Other techniques like Batch Normalization also keep the mean of each node's input around 0.
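To make "the interesting region is around 0" concrete, here is a small sketch that evaluates the derivative of each activation at 0 and far away from 0. The helper names are hypothetical, and the ReLU derivative at exactly 0 is set to 0 here by convention.

```python
import numpy as np

def logistic_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    # Derivative is 0 for x <= 0 and 1 for x > 0 (value at 0 is a convention).
    return (x > 0).astype(float)

xs = np.array([-10.0, 0.0, 10.0])
logistic_grads = logistic_grad(xs)
tanh_grads = tanh_grad(xs)
relu_grads = relu_grad(xs)

print(logistic_grads)
print(tanh_grads)
print(relu_grads)
```

For the logistic function and tanh, the gradient peaks at 0 and is close to zero at ±10, so inputs far from 0 learn slowly. For ReLU, the only kink is at 0, so inputs that are all on one side of it see a purely linear (or constant) function.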
So you compute (and subtract) the mean of the images so that the first computing nodes get data which "behaves well". The data then has a mean of 0, and the intuition is that this helps the optimization process.
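A minimal sketch of mean subtraction with NumPy follows. The random array stands in for a real training set, and `preprocess` is a hypothetical helper; this is not Caffe's own code, only an illustration of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a training set: 100 RGB images of 8x8 pixels, values in [0, 255].
train = rng.integers(0, 256, size=(100, 8, 8, 3)).astype(np.float64)

# Per-pixel mean image, computed once over the training set.
mean_image = train.mean(axis=0)

def preprocess(images):
    # Subtract the training-set mean; broadcasting applies it to every image.
    return images - mean_image

centered = preprocess(train)
print(abs(centered.mean()) < 1e-9)
```

The same `mean_image` would be reused for validation and test images, so that all inputs are shifted in the same way. A common variant uses one mean value per color channel instead of a full mean image.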
In theory, a network is able to "subtract" the mean by itself. So if you train long enough, this should not matter too much. However, depending on the activation function, how long "long enough" is can matter a lot.
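One way to see why the network can "subtract" the mean itself: for a linear first layer, W(x - μ) + b equals Wx + (b - Wμ), so a shifted bias has the same effect as centering the input. The sketch below checks this numerically; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

W, b = rng.normal(size=(4, 3)), rng.normal(size=4)  # hypothetical first layer
mu = rng.normal(size=3)                             # stand-in for the data mean
x = rng.normal(size=3)                              # one raw input

# Layer applied to the mean-subtracted input.
on_centered = W @ (x - mu) + b

# Same layer applied to the raw input, with the mean folded into the bias.
b_shifted = b - W @ mu
on_raw = W @ x + b_shifted

absorbed = np.allclose(on_centered, on_raw)
print(absorbed)
```

The catch is that gradient descent has to find that shifted bias first, which is the "long enough" part of the argument.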