python - KNN prediction is 100% on y_test

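I am trying to implement K-nearest neighbors on the iris dataset, but after making predictions, yhat comes out 100% correct with no errors. Something must be wrong, and I can't figure out what it is...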

I created a column named class_id that encodes the species as follows:

setosa = 1.0
versicolor = 2.0
virginica = 3.0

The column is of type float.
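For readers following along, one way such a column could be built is sketched below. The question does not show this step, so the source column name `class`, the label spellings, and the `class_to_id` dictionary are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's df; the 'class' column name
# and the label spellings are assumptions, not taken from the question.
df = pd.DataFrame({'class': ['setosa', 'versicolor', 'virginica', 'setosa']})

# Map each species name to the float code described in the question
class_to_id = {'setosa': 1.0, 'versicolor': 2.0, 'virginica': 3.0}
df['class_id'] = df['class'].map(class_to_id)

print(df['class_id'].dtype)  # float64
```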

Getting x and y


    x = df[['sepal length', 'sepal width', 'petal length', 'petal width']].values

type(x) shows numpy.ndarray


    y = df['class_id'].values

type(y) shows numpy.ndarray

Normalizing the data


    x = preprocessing.StandardScaler().fit(x).transform(x.astype(float))

Creating the training and test sets


    x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2, random_state = 42)

Checking for the best K value:


    Ks = 12
    # Try every k from 1 to Ks - 1 and report the accuracy on the test set
    for k in range(1, Ks):
        neigh = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
        yhat = neigh.predict(x_test)
        score = metrics.accuracy_score(y_test, yhat)
        print('K: ', k, ' score: ', score, '\n')

Result:

    K:  1  score:  0.9666666666666667
    K:  2  score:  1.0
    K:  3  score:  1.0
    K:  4  score:  1.0
    K:  5  score:  1.0
    K:  6  score:  1.0
    K:  7  score:  1.0
    K:  8  score:  1.0
    K:  9  score:  1.0
    K:  10  score:  1.0
    K:  11  score:  1.0

Printing y_test and yhat with K = 5


    print(yhat)
    print(y_test)

Result:

    yhat: [2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. 1. 2. 3. 2. 2. 3. 1. 1. 3. 1. 3.
     3. 3. 3. 3. 1. 1.]

    y_test: [2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. 1. 2. 3. 2. 2. 3. 1. 1. 3. 1. 3.
     3. 3. 3. 3. 1. 1.]

It shouldn't be possible for all of these to be 100% correct; there must be a mistake somewhere.

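Best answer

Try building a confusion matrix. Run every example in your test data through the model and check the specificity, sensitivity, accuracy, and precision metrics.

Where: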

TN = True Negative
FN = False Negative
FP = False Positive
TP = True Positive
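The answer does not spell out the formulas, so here is a minimal sketch of the standard definitions in terms of these four counts. The helper name `binary_metrics` and the example counts are made up for illustration.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return common metrics computed from confusion-matrix counts."""
    return {
        'sensitivity': tp / (tp + fn),  # share of actual positives that were found
        'specificity': tn / (tn + fp),  # share of actual negatives that were found
        'precision': tp / (tp + fp),    # share of predicted positives that are correct
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
    }

# Made-up counts, purely for illustration
m = binary_metrics(tp=8, tn=20, fp=2, fn=2)
print(m)
```

For a three-class problem like iris, these counts are usually taken per class in a one-vs-rest fashion.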

Here you can read about the difference between specificity and sensitivity:
https://dzone.com/articles/ml-metrics-sensitivity-vs-specificity-difference

Here is an example of how to get a confusion matrix in Python with sklearn.
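A minimal sketch of that idea, under some assumptions: scikit-learn's bundled iris data stands in for the question's `df` (its labels are 0, 1, 2 rather than 1.0, 2.0, 3.0), K is fixed at 5, and the scaler is fit on the training split only, which differs slightly from the question's code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data replaces the question's df
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
yhat = knn.predict(scaler.transform(X_test))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, yhat)
print(cm)

# One-vs-rest counts for each class, read off the matrix
tp = np.diag(cm)
fn = cm.sum(axis=1) - tp
fp = cm.sum(axis=0) - tp
tn = cm.sum() - tp - fn - fp
print('sensitivity per class:', tp / (tp + fn))
print('specificity per class:', tn / (tn + fp))
```

Any off-diagonal entry in the printed matrix is a misclassified test example, so the matrix shows at a glance which classes, if any, get confused with each other.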

You can also try plotting a ROC curve (optional):
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
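As a rough sketch of how that could look for a three-class KNN model, the code below computes one ROC curve per class in a one-vs-rest fashion and reports the area under each. It assumes scikit-learn's bundled iris data in place of the question's `df` and K = 5; plotting is left out to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, label_binarize

# Assumption: the bundled iris data replaces the question's df
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)

# Class membership scores: the fraction of the 5 neighbours voting for each class
proba = knn.predict_proba(scaler.transform(X_test))

# One indicator column per class, so each class can be treated as "positive" in turn
y_bin = label_binarize(y_test, classes=knn.classes_)

roc_auc = {}
for idx, cls in enumerate(knn.classes_):
    fpr, tpr, _ = roc_curve(y_bin[:, idx], proba[:, idx])
    roc_auc[int(cls)] = auc(fpr, tpr)
    print('class', cls, 'AUC:', roc_auc[int(cls)])
```

Because the scores are vote fractions, each curve has only a handful of distinct thresholds, so it will look coarse when plotted from `fpr` and `tpr`.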
