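I removed the text of the question: I have not found a solution yet, and I realized I do not want other people to copy the first part, which works.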

Best answer

The inputs to confusion_matrix must be arrays of integer class labels, not one-hot encodings.

# Predicting the Test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)
matrix = metrics.confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
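As a minimal sketch of the argmax conversion used in the snippet above, the following builds two small one-hot arrays (made-up values, not the asker's data) and converts them to integer labels before calling confusion_matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical one-hot encoded labels for a 2-class problem (illustrative only)
y_true_onehot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
y_pred_onehot = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])

# argmax along axis 1 recovers the integer class index of each row
y_true_labels = y_true_onehot.argmax(axis=1)  # [0 1 1 0]
y_pred_labels = y_pred_onehot.argmax(axis=1)  # [0 1 0 0]

# rows = true class, columns = predicted class
cm = confusion_matrix(y_true_labels, y_pred_labels)
print(cm)
```

Note that argmax(axis=1) only makes sense when each row has one column per class; for a single sigmoid output column, thresholding is the appropriate conversion.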


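The raw output of model.predict looks like the sample below, so applying a probability threshold of 0.5 converts it to binary labels.


  Output (y_pred):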


[0.87812372 0.77490434 0.30319547 0.84999743]
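For illustration, thresholding the four sample probabilities above at 0.5 could look like this (the variable names are assumptions, not from the original code):

```python
import numpy as np

# The sample probabilities shown above
y_prob = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])

# Values above the 0.5 threshold become class 1, the rest class 0
y_label = (y_prob > 0.5).astype(int)
print(y_label)  # [1 1 0 1]
```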


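The documentation of sklearn.metrics.accuracy_score(y_true, y_pred) defines y_pred as:

y_pred : 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.

This means y_pred must be an array of 1s and 0s (predicted labels). It must not contain probabilities.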

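The root cause of the error is theoretical rather than computational: you are applying a classification metric (accuracy) to the output of what is, as it stands, a regression (i.e. numeric prediction) model (the neural network with a logistic output), which is meaningless.

Like most performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 against predictions that are again 0/1). So when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, and its message tells you exactly what the problem is from a computational point of view: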

Classification metrics can't handle a mix of binary and continuous target
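The mismatch can be reproduced in a few lines. This is a sketch with made-up true labels; the probabilities are the sample values quoted earlier:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 0, 1])  # hypothetical ground-truth labels
y_prob = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])

# Passing raw probabilities: scikit-learn refuses to mix binary and continuous targets
try:
    accuracy_score(y_true, y_prob)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Passing thresholded labels: the metric is well defined
y_label = (y_prob > 0.5).astype(int)
acc = accuracy_score(y_true, y_label)
print(acc)  # 0.75
```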


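Although the message does not tell you directly that you are trying to compute a metric that is invalid for your problem (and we should not really expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are attempting something wrong. This is not necessarily the case with other frameworks: see, for example, the behavior of Keras in a very similar situation, where you get no warning at all and simply end up complaining about low "accuracy" in a regression setting...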

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn import metrics
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


# read the csv file and convert into arrays for the machine to process
df = pd.read_csv('dataset_ori.csv')
dataset = df.values

# split the dataset into input features and the feature to predict
X = dataset[:,0:7]
Y = dataset[:,7]

# Splitting into Train and Test Set (use the X and Y arrays defined above)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    Y,
                                                    test_size = 0.2,
                                                    random_state = 0)

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu', input_dim = 7))
classifier.add(Dropout(0.5))
# Adding the second hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dropout(0.5))
# Adding the output layer (a single sigmoid unit outputs the probability of class 1)
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN (binary_crossentropy is the loss that matches a single sigmoid output)
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Scale the features, then fit the ANN to the training set (once, on the scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
classifier.fit(X_train_scaled, y_train, batch_size = 10, epochs = 20)

# Summary of neural network
classifier.summary()

# Predicting the Test set results & giving a threshold probability
# (predict_classes is not available in recent Keras releases, so threshold predict() instead)
y_prob = classifier.predict(scaler.transform(X_test)).ravel()
y_prediction = (y_prob > 0.5).astype(int)
print("\n\naccuracy", np.mean(y_prediction == y_test))




## EXTRA: Confusion Matrix Visualize
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, accuracy_score
y_prediction = np.ravel(y_prediction)  # make sure the predicted labels are 1-D
cm = confusion_matrix(y_test, y_prediction) # rows = truth, cols = prediction
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sn.set(font_scale=1.4)
sn.heatmap(df_cm, annot=True, fmt='g')
plt.show()
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_prediction))

# Let's see how our model performed
from sklearn.metrics import classification_report
print(classification_report(y_test, y_prediction))
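Once the predictions are 0/1 labels, the false positive and false negative rates can be read off the confusion matrix. The sketch below uses small made-up label arrays rather than the model output above; for binary labels, confusion_matrix(...).ravel() returns the counts in the order tn, fp, fn, tp:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted binary labels (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0])

# labels=[0, 1] fixes the row/column order, so ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

fpr = fp / (fp + tn)  # false positive rate: negatives wrongly predicted as positive
fnr = fn / (fn + tp)  # false negative rate: positives wrongly predicted as negative

print("FPR:", fpr)  # 0.25
print("FNR:", fnr)  # 0.5
```

If the test set contains no negatives (or no positives), the corresponding denominator is zero, so that case would need to be guarded separately.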

Regarding "python - How to find the false positive rate and false negative rate of a neural network?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57687527/

10-12 21:16