How does binary cross-entropy loss work on autoencoders?

Problem Description

I wrote a vanilla autoencoder using only Dense layers. Below is my code:

from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model

# encoder: 784 -> 128 -> 64 -> 28, decoder: 28 -> 64 -> 128 -> 784
iLayer = Input((784,))
layer1 = Dense(128, activation='relu')(iLayer)
layer2 = Dense(64, activation='relu')(layer1)
layer3 = Dense(28, activation='relu')(layer2)
layer4 = Dense(64, activation='relu')(layer3)
layer5 = Dense(128, activation='relu')(layer4)
layer6 = Dense(784, activation='softmax')(layer5)
model = Model(iLayer, layer6)
model.compile(loss='binary_crossentropy', optimizer='adam')

(trainX, trainY), (testX, testY) = mnist.load_data()
print("shape of the trainX", trainX.shape)
# flatten the 28x28 images into 784-dimensional vectors
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1] * trainX.shape[2])
print("shape of the trainX", trainX.shape)
model.fit(trainX, trainX, epochs=5, batch_size=100)

Questions:

1) softmax provides a probability distribution. Understood. This means I would have a vector of 784 values, each between 0 and 1, e.g. [0.02, 0.03, ... up to 784 items], and all 784 elements sum to 1.

2) I don't understand how binary cross-entropy works with these values. Binary cross-entropy is for two output values, right?

Answer

In the context of autoencoders, the input and output of the model are the same. So, if the input values are in the range [0, 1], it is acceptable to use sigmoid as the activation function of the last layer. Otherwise, you need an appropriate activation function for the last layer (e.g. linear, which is the default).

As for the loss function, it again comes back to the values of the input data. If the input data are between zero and one, then binary_crossentropy is acceptable as the loss function. Otherwise, you need to use another loss function such as 'mse' (i.e. mean squared error) or 'mae' (i.e. mean absolute error). Note that when the input values are in the range [0, 1] you can use binary_crossentropy, as is commonly done (e.g. the Keras autoencoder tutorial and this paper). However, don't expect the loss value to reach zero, since binary_crossentropy does not return zero when both the prediction and the label are not exactly zero or one (no matter whether they are equal or not). Here is a video from Hugo Larochelle where he explains the loss functions used in autoencoders (the part about using binary_crossentropy with inputs in range [0, 1] starts at 5:30).
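
To see this concretely, here is a minimal numerical check (a sketch assuming NumPy; the bce helper below is only for illustration, not a Keras API):

import numpy as np

def bce(y, p, eps=1e-7):
    # element-wise binary cross-entropy, averaged over the vector
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

# even a perfect reconstruction (p == y) gives a non-zero loss
# when the targets themselves are not exactly 0 or 1:
y = np.array([0.3, 0.7, 0.5])
print(bce(y, y))                                    # ~0.638, the minimum achievable here
print(bce(np.array([0., 1.]), np.array([0., 1.])))  # ~0.0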

Concretely, in your example you are using the MNIST dataset. By default, the MNIST values are integers in the range [0, 255], so you usually need to normalize them first:

trainX = trainX.astype('float32')
trainX /= 255.

Now the values are in the range [0, 1], so sigmoid can be used as the activation function of the last layer, and either binary_crossentropy or mse as the loss function.
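
Putting this together, a minimal sketch of the adjusted model could look like the following (layer sizes copied from your code; the sigmoid output and the normalization are the changes suggested above):

from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model

(trainX, _), _ = mnist.load_data()
trainX = trainX.reshape(-1, 784).astype('float32') / 255.  # values now in [0, 1]

iLayer = Input((784,))
h = Dense(128, activation='relu')(iLayer)
h = Dense(64, activation='relu')(h)
h = Dense(28, activation='relu')(h)        # bottleneck
h = Dense(64, activation='relu')(h)
h = Dense(128, activation='relu')(h)
out = Dense(784, activation='sigmoid')(h)  # sigmoid instead of softmax

model = Model(iLayer, out)
model.compile(loss='binary_crossentropy', optimizer='adam')  # 'mse' would also work
model.fit(trainX, trainX, epochs=5, batch_size=100)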

Why is it valid to use binary_crossentropy even when the true label values (i.e. ground truth) are in the range [0, 1] rather than exactly zero or one?

Note that in training we are trying to minimize the loss function. So if the loss function we use reaches its minimum value (which is not necessarily zero) when the prediction equals the true label, then it is an acceptable choice. Let's verify that this is the case for binary cross-entropy, which is defined as follows:

bce_loss = -y*log(p) - (1-y)*log(1-p)

where y is the true label and p is the predicted value. Let's treat y as fixed and see which value of p minimizes this function: we take the derivative with respect to p and set it to zero (assuming log is the natural logarithm, for simplicity of calculation):

bce_loss_derivative = -y*(1/p) - (1-y)*(-1/(1-p)) = 0 =>
                      -y/p + (1-y)/(1-p) = 0 =>
                      -y*(1-p) + (1-y)*p = 0 =>
                      -y + y*p + p - y*p = 0 =>
                       p - y = 0 => y = p

As you can see, binary cross-entropy attains its minimum when y = p, i.e. when the true label equals the prediction, which is exactly what we are looking for.
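
As a quick sanity check (a small sketch assuming NumPy, not part of the original derivation), you can scan p for a fixed y and confirm that the minimum sits at p = y:

import numpy as np

y = 0.3                               # fixed true value in (0, 1)
p = np.linspace(0.001, 0.999, 999)    # candidate predictions
loss = -y * np.log(p) - (1 - y) * np.log(1 - p)

print(p[np.argmin(loss)])             # ~0.3 -> the minimizer equals y
print(loss.min())                     # ~0.611 -> minimal, but not zero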
