I need some help understanding how accuracy is calculated when fitting a model in Keras. This is a sample history of training the model:

    Train on 340 samples, validate on 60 samples
    Epoch 1/100
    340/340 [==============================] - 5s 13ms/step - loss: 0.8081 - acc: 0.7559 - val_loss: 0.1393 - val_acc: 1.0000
    Epoch 2/100
    340/340 [==============================] - 3s 9ms/step - loss: 0.7815 - acc: 0.7647 - val_loss: 0.1367 - val_acc: 1.0000
    Epoch 3/100
    340/340 [==============================] - 3s 10ms/step - loss: 0.8042 - acc: 0.7706 - val_loss: 0.1370 - val_acc: 1.0000
    ...
    Epoch 25/100
    340/340 [==============================] - 3s 9ms/step - loss: 0.6006 - acc: 0.8029 - val_loss: 0.2418 - val_acc: 0.9333
    Epoch 26/100
    340/340 [==============================] - 3s 9ms/step - loss: 0.5799 - acc: 0.8235 - val_loss: 0.3004 - val_acc: 0.8833

So, validation accuracy is 1 in the first epochs?
How can the validation accuracy be better than the training accuracy?

These are figures that show all values of accuracy and loss: [figures not included]

Then I use sklearn metrics to evaluate the final results:

    from sklearn import metrics

    def evaluate(predicted_outcome, expected_outcome):
        # Weighted F1: per-class F1 scores averaged by class support
        f1_score = metrics.f1_score(expected_outcome, predicted_outcome, average='weighted')
        # Balanced accuracy: unweighted mean of the per-class recalls
        balanced_accuracy_score = metrics.balanced_accuracy_score(expected_outcome, predicted_outcome)
        print('****************************')
        print('| MODEL PERFORMANCE REPORT |')
        print('****************************')
        print('Average F1 score = {:0.2f}.'.format(f1_score))
        print('Balanced accuracy score = {:0.2f}.'.format(balanced_accuracy_score))
        print('Confusion matrix')
        print(metrics.confusion_matrix(expected_outcome, predicted_outcome))
        print('Other metrics')
        print(metrics.classification_report(expected_outcome, predicted_outcome))

I get this output (as you can see, the results are terrible):

    ****************************
    | MODEL PERFORMANCE REPORT |
    ****************************
    Average F1 score = 0.25.
    Balanced accuracy score = 0.32.
    Confusion matrix
    [[  7  24   2  40]
     [ 11  70   4 269]
     [  0   0   0  48]
     [  0   0   0   6]]
    Other metrics
                  precision    recall  f1-score   support

               0       0.39      0.10      0.15        73
               1       0.74      0.20      0.31       354
               2       0.00      0.00      0.00        48
               3       0.02      1.00      0.03         6

       micro avg       0.17      0.17      0.17       481
       macro avg       0.29      0.32      0.12       481
    weighted avg       0.61      0.17      0.25       481

Why are the accuracy and loss values of the Keras fit function so different from the values of the sklearn metrics?

This is my model, in case it helps:

    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Flatten, Dense

    model = Sequential()
    model.add(LSTM(
        units=100,  # the number of hidden states
        return_sequences=True,
        input_shape=(timestamps, nb_features),
        dropout=0.2,
        recurrent_dropout=0.2
    ))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(units=nb_classes, activation='softmax'))
    model.compile(loss="categorical_crossentropy",
                  metrics=['accuracy'],
                  optimizer='adadelta')

Input data dimensions:

    400 train sequences
    481 test sequences
    X_train shape: (400, 20, 17)
    X_test shape: (481, 20, 17)
    y_train shape: (400, 4)
    y_test shape: (481, 4)

This is how I apply sklearn metrics:

    import numpy as np

    testPredict = model.predict(np.array(X_test))
    # Convert one-hot labels and softmax outputs to class indices
    y_test = np.argmax(y_test.values, axis=1)
    y_pred = np.argmax(testPredict, axis=1)
    evaluate(y_pred, y_test)

It looks like I am missing something.

Solution

You sound a little confused.

To start with, you are comparing apples to oranges, i.e. the validation accuracy reported by Keras on a 60-sample set (notice the first informative message printed by Keras, "Train on 340 samples, validate on 60 samples") with the test accuracy reported by scikit-learn on your 481-sample test set.

Second, your validation set of only 60 samples is way too small; in such small samples, wild fluctuations of the calculated metrics such as the ones you report are certainly not unexpected (there is a reason why we need datasets of sufficient size, and not only training ones).

Third, your training/validation/test set division is quite unusual, to say the least; standard practice calls for allocations of roughly 70/15/15 per cent or similar, while you are using an allocation of 38/7/55 per cent (i.e. 340/60/481 samples).

Lastly, and without knowing the details of your data, it may very well be the case that 340 samples are simply not enough to fit an LSTM model such as yours well on a 4-class classification task.

For starters, begin with a more appropriate allocation of your data into training/validation/test sets, and be sure you compare apples to apples.

PS: In similar questions, you should also include your model.fit() part.
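To put rough numbers on the second and third points, here is a minimal sketch using only the Python standard library. It assumes the measured accuracy behaves like a binomial proportion (independent samples), which is only an approximation, and it uses an illustrative accuracy of 0.80 rather than a measured one. The helper names `accuracy_std_error` and `split_sizes` are hypothetical and not part of Keras or scikit-learn.

```python
import math

def accuracy_std_error(p, n):
    """Binomial standard error of an accuracy estimate p measured on n samples."""
    return math.sqrt(p * (1.0 - p) / n)

def split_sizes(n_total, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) sizes; the test set takes whatever is left."""
    n_train = int(n_total * train_frac)
    n_val = int(n_total * val_frac)
    return n_train, n_val, n_total - n_train - n_val

# All samples mentioned in the question: 340 train + 60 validation + 481 test
n_total = 340 + 60 + 481
n_train, n_val, n_test = split_sizes(n_total)
print("70/15/15 split of", n_total, "samples:", n_train, n_val, n_test)

assumed_accuracy = 0.80  # illustrative value, not a measured result
for n in (60, n_val):
    se = accuracy_std_error(assumed_accuracy, n)
    print("n = {:3d}: accuracy {:.2f} +/- {:.3f} (one standard error)".format(
        n, assumed_accuracy, se))
```

With 60 validation samples, a single sample is worth about 1.7 percentage points of accuracy and one standard error is roughly 5 points under these assumptions, so swings like the ones between 0.88 and 1.00 in the training log are not surprising.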
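Comparing like with like also means comparing the same metric. The `acc` value printed by Keras is plain accuracy (the fraction of correctly classified samples), whereas the report in the question prints a weighted F1 score and a balanced accuracy. The following minimal sketch recomputes both plain and balanced accuracy from the confusion matrix reported in the question, using only the standard library; `plain_accuracy` and `balanced_accuracy` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
# Confusion matrix copied from the question
# (rows: true class, columns: predicted class, as printed by scikit-learn).
confusion = [
    [7, 24, 2, 40],
    [11, 70, 4, 269],
    [0, 0, 0, 48],
    [0, 0, 0, 6],
]

def plain_accuracy(cm):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def balanced_accuracy(cm):
    """Unweighted mean of the per-class recalls."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)

print("Plain accuracy    = {:.2f}".format(plain_accuracy(confusion)))     # 0.17
print("Balanced accuracy = {:.2f}".format(balanced_accuracy(confusion)))  # 0.32
```

The figure comparable to the Keras `acc` column is therefore the 0.17 shown in the "micro avg" row, measured on the 481-sample test set, not on the 60-sample validation set that Keras reports during fitting.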