Problem description
I am learning neural networks, and I built a simple one in Keras to classify the iris dataset from the UCI Machine Learning Repository. I used a network with one hidden layer of 8 nodes. The Adam optimizer is used with a learning rate of 0.0005 and run for 200 epochs. Softmax is used at the output, with categorical cross-entropy as the loss. I am getting the following learning curves.
As you can see, the learning curve for the accuracy has a lot of flat regions, and I don't understand why. The error seems to be decreasing constantly, but the accuracy doesn't seem to be increasing in the same manner. What do the flat regions in the accuracy learning curve imply? Why is the accuracy not increasing in those regions, even though the error seems to be decreasing?
Is this normal in training, or is it more likely that I am doing something wrong here?
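import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

dataframe = pd.read_csv("iris.csv", header=None)
# Assuming iris.csv is the UCI file, its rows are sorted by class; validation_split
# takes the LAST 20% of rows (before shuffling), so shuffle first or the
# validation set will contain a single class
dataframe = dataframe.sample(frac=1, random_state=0)
dataset = dataframe.values
X = dataset[:, 0:4].astype(float)
y = dataset[:, 4]
# standardize the four features
scaler = StandardScaler()
X = scaler.fit_transform(X)
# class names -> integers -> one-hot vectors
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
encoder = OneHotEncoder()
y = encoder.fit_transform(y.reshape(-1, 1)).toarray()
# create model
model = Sequential()
model.add(keras.Input(shape=(4,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(3, activation='softmax'))
# Compile model (`learning_rate` replaces the old `lr`; recent optimizers reject `decay`)
adam = optimizers.Adam(learning_rate=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
# Fit the model
log = model.fit(X, y, epochs=200, batch_size=5, validation_split=0.2)
fig = plt.figure()
fig.suptitle("Adam, lr=0.0005, one hidden layer")
ax = fig.add_subplot(1, 2, 1)
ax.set_title('Cost')
ax.plot(log.history['loss'], label='Training')
ax.plot(log.history['val_loss'], label='Validation')
ax.legend()
ax = fig.add_subplot(1, 2, 2)
ax.set_title('Accuracy')
# recent Keras versions log 'accuracy'/'val_accuracy' (older ones used 'acc'/'val_acc')
ax.plot(log.history['accuracy'], label='Training')
ax.plot(log.history['val_accuracy'], label='Validation')
ax.legend()
plt.show()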
A little understanding of the actual meanings (and mechanics) of both loss and accuracy will be of much help here (refer also to this answer of mine, although I will reuse some parts)...
For the sake of simplicity, I will limit the discussion to the case of binary classification, but the idea is generally applicable. Here is the equation of the (logistic) loss:

$$\text{loss}[i] = -\Big(y[i]\,\log p[i] + \big(1 - y[i]\big)\log\big(1 - p[i]\big)\Big), \qquad \text{loss} = \frac{1}{n}\sum_{i=1}^{n} \text{loss}[i]$$

where:
- y[i] are the true labels (0 or 1)
- p[i] are the predictions (real numbers in [0,1]), usually interpreted as probabilities
- output[i] (not shown in the equation) is the rounding of p[i], in order to convert them also to 0 or 1; it is this quantity that enters the calculation of accuracy, implicitly involving a threshold (normally at 0.5 for binary classification), so that if p[i] > 0.5, then output[i] = 1, otherwise if p[i] <= 0.5, output[i] = 0.
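A minimal plain-Python sketch of the two quantities just defined may make the contrast concrete. The function names `log_loss` and `accuracy` are hypothetical helpers, not a library API, and the code assumes every p[i] lies strictly between 0 and 1 (real implementations usually clip p away from 0 and 1 to avoid log(0)). Note how the loss uses the raw p[i], while the accuracy only ever sees the thresholded output[i].

```python
import math


def log_loss(y_true, p_pred):
    """Mean binary (logistic) loss; assumes 0 < p < 1 for every prediction."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / n


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose rounded prediction output[i] equals y[i]."""
    outputs = [1 if p > threshold else 0 for p in p_pred]  # p <= threshold -> 0
    return sum(int(o == y) for o, y in zip(outputs, y_true)) / len(y_true)


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]
print(round(log_loss(y, p), 4), accuracy(y, p))
```

Both functions consume the same predictions, but only `log_loss` is sensitive to how far each p[i] sits from its label.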
Now, let's suppose that we have a true label y[k] = 1, for which, at an early point during training, we make a rather poor prediction of p[k] = 0.1; then, plugging the numbers into the loss equation above:
- the contribution of this sample to the loss is loss[k] = -log(0.1) = 2.3
- since p[k] < 0.5, we'll have output[k] = 0, hence its contribution to the accuracy will be 0 (wrong classification)
Suppose now that, at the next training step, we are indeed getting better, and we get p[k] = 0.22; now we have:
- loss[k] = -log(0.22) = 1.51
- since it is still p[k] < 0.5, we again have a wrong classification (output[k] = 0) with zero contribution to the accuracy
Hopefully you are starting to get the idea, but let's look at one more, later snapshot, where we get, say, p[k] = 0.49; then:
- loss[k] = -log(0.49) = 0.71
- still output[k] = 0, i.e. a wrong classification with zero contribution to the accuracy
As you can see, our classifier indeed got better on this particular sample, i.e. it went from a loss of 2.3 to 1.5 to 0.71, but this improvement has still not shown up in the accuracy, which cares only about correct classifications: from an accuracy viewpoint, it doesn't matter that we get better estimates for our p[k], as long as these estimates remain below the threshold of 0.5.
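The three snapshots above can be reproduced in a few lines. This is a sketch that simply hard-codes the hypothetical values p[k] = 0.1, 0.22, 0.49 for a sample with y[k] = 1; it is not taken from any training run.

```python
import math

# Hypothetical snapshots of p[k] for one sample whose true label is y[k] = 1
snapshots = [0.1, 0.22, 0.49]
rows = []
for p in snapshots:
    loss_k = -math.log(p)           # this sample's term of the loss (y = 1)
    output_k = 1 if p > 0.5 else 0  # rounded prediction that accuracy looks at
    rows.append((p, loss_k, output_k))
    print(f"p[k]={p:.2f}  loss[k]={loss_k:.2f}  output[k]={output_k}")
```

The printed losses are 2.30, 1.51 and 0.71, while output[k] stays 0 throughout, so the accuracy contribution never moves.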
The moment our p[k] exceeds the threshold of 0.5, the loss continues to decrease smoothly, as it has done so far, but now the accuracy contribution of this sample jumps from 0 to 1/n, where n is the total number of samples.
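To see the size of that jump, here is a sketch with a hypothetical set of n = 10 samples, all with true label 1, where nine are already confidently correct (p = 0.9) and sample k moves from just below to just above the threshold. The helper `mean_loss_and_accuracy` is made up for illustration.

```python
import math


def mean_loss_and_accuracy(y_true, p_pred, threshold=0.5):
    """Mean logistic loss and thresholded accuracy over a set of samples."""
    n = len(y_true)
    loss = sum(-math.log(p) if y == 1 else -math.log(1 - p)
               for y, p in zip(y_true, p_pred)) / n
    acc = sum(int((p > threshold) == (y == 1))
              for y, p in zip(y_true, p_pred)) / n
    return loss, acc


n = 10
y_true = [1] * n
before = [0.49] + [0.9] * (n - 1)  # sample k just below the threshold
after = [0.51] + [0.9] * (n - 1)   # sample k just above it, one step later

loss_b, acc_b = mean_loss_and_accuracy(y_true, before)
loss_a, acc_a = mean_loss_and_accuracy(y_true, after)
print(f"loss:     {loss_b:.4f} -> {loss_a:.4f}")
print(f"accuracy: {acc_b:.2f} -> {acc_a:.2f}")
```

The mean loss drops by only about 0.004, while the accuracy jumps by exactly 1/n = 0.1.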
Similarly, you can confirm for yourself that, once our p[k] has exceeded 0.5, hence giving a correct classification (and now contributing positively to the accuracy), further improvements of it (i.e. getting closer to 1.0) still continue to decrease the loss, but have no further impact on the accuracy.
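A quick check of this claim, using a hypothetical trajectory of p[k] that starts just above 0.5 and approaches 1.0 (true label y[k] = 1):

```python
import math

# Hypothetical trajectory of an already correctly classified sample (y[k] = 1)
trajectory = [0.51, 0.6, 0.8, 0.95, 0.99]
losses = [-math.log(p) for p in trajectory]        # keeps shrinking
contributions = [int(p > 0.5) for p in trajectory]  # stuck at 1 (correct)
for p, l, c in zip(trajectory, losses, contributions):
    print(f"p[k]={p:.2f}  loss[k]={l:.3f}  correct={c}")
```

Every step lowers the loss, but the accuracy contribution is 1 from the first step and never changes.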
Similar arguments hold for cases where the true label is y[m] = 0 and the corresponding estimates for p[m] start somewhere above the 0.5 threshold; and even if the initial estimates of p[m] are below 0.5 (hence providing correct classifications and already contributing positively to the accuracy), their convergence towards 0.0 will decrease the loss without improving the accuracy any further.
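For the y[m] = 0 case, the relevant loss term is -log(1 - p[m]), and the sample counts as correct once p[m] <= 0.5. A sketch with a hypothetical trajectory that starts above the threshold and converges towards 0.0:

```python
import math

# Hypothetical trajectory of p[m] for one sample whose true label is y[m] = 0
trajectory = [0.8, 0.6, 0.45, 0.3, 0.05]
losses = [-math.log(1 - p) for p in trajectory]  # y = 0 term of the loss
correct = [int(p <= 0.5) for p in trajectory]    # output[m] = 0 iff p[m] <= 0.5
for p, l, c in zip(trajectory, losses, correct):
    print(f"p[m]={p:.2f}  loss[m]={l:.3f}  correct={c}")
```

The loss falls at every step, but the accuracy contribution changes only once, when p[m] crosses 0.5 (between 0.6 and 0.45 here).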
Putting the pieces together, hopefully you can now convince yourself that a smoothly decreasing loss and a more "stepwise" increasing accuracy are not only compatible, but make perfect sense indeed.
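The whole picture can be simulated with a toy model, sketched below under loudly stated assumptions: there is no real network or optimizer, just 30 samples (chosen to mirror a 20% validation split of the 150 iris samples) with random binary labels, each with a "logit" that moves linearly towards its true class at a random, sample-specific speed. Because each sample crosses the 0.5 threshold at most once, the mean loss decreases at every step, while the accuracy only takes values k/30 and stays flat between crossings, which is the same shape as the flat regions in the question's plot.

```python
import math
import random

random.seed(0)
n = 30  # mirrors a 20% validation split of the 150 iris samples
y_true = [random.randint(0, 1) for _ in range(n)]
# Toy stand-in for training (NOT an optimizer): each sample's logit starts at a
# random value and moves linearly towards its true class at its own speed.
start = [random.uniform(-2.0, 2.0) for _ in range(n)]
speed = [random.uniform(0.02, 0.1) for _ in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


losses, accuracies = [], []
for step in range(100):
    total_loss, correct = 0.0, 0
    for y, z0, s in zip(y_true, start, speed):
        z = z0 + s * step if y == 1 else z0 - s * step
        p = sigmoid(z)
        total_loss += -math.log(p) if y == 1 else -math.log(1 - p)
        correct += int((p > 0.5) == (y == 1))
    losses.append(total_loss / n)
    accuracies.append(correct / n)

# Loss falls at every step; accuracy moves in jumps of 1/30 and is flat in between.
print("loss:    ", [round(v, 3) for v in losses[::10]])
print("accuracy:", [round(v, 3) for v in accuracies[::10]])
```

Plotting `losses` and `accuracies` against `step` gives a smooth curve next to a staircase; with real mini-batch training the loss is noisier, but the same thresholding mechanism applies.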
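On a more general level: from the strict perspective of mathematical optimization, there is no such thing as "accuracy"; there is only the loss. Accuracy enters the discussion only from a business perspective (and a different business logic might even call for a threshold different from the default 0.5). Quoting from my own linked answer:

"Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) into the mathematical domain, a translation that is necessary in classification problems (in regression problems, the loss and the business objective are usually the same, or at least can be the same in principle, e.g. the RMSE)..."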
This concludes the discussion of "Loss & accuracy - Are these reasonable learning curves?".