python - Statistical metrics of KNN models with different k values on the Iris dataset?

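I wrote some Python code to fit a KNN model to the famous Iris dataset and tried different values of k, such as k = 2, k = 3, and k = 5. My understanding is that the confusion matrix, accuracy score, and classification report should differ across these values of k. However, whatever k I use, the metrics come out identical, and precision, recall, and f1-score are all 1.00, as shown in the screenshot of the code and output. Am I missing something here? Thanks!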

from sklearn.model_selection import train_test_split

# first split the dataset into its attributes and labels
X = data.iloc[:, :-1].values
y = data.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

from sklearn.neighbors import KNeighborsClassifier

# Instantiate learning model (k = 5)
clf = KNeighborsClassifier(n_neighbors=5)
# Fitting the model
clf.fit(X_train, y_train)
# Predicting the Test set results
y_pred = clf.predict(X_test)
print(y_pred)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print("classification report:---------------------------\n")
print(classification_report(y_test, y_pred, labels=iris.target))

Best answer

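I think your output is correct: whichever value of k you choose, you get a perfect classification of the test set. The Iris dataset is relatively easy. The only real overlap is between the versicolor and virginica species, and even then only for a handful of specimens (perhaps 5-6 or so). Take a look at some of the plots on this website. Because you test on only 30% of the data, it is quite likely that those specimens are not in the test set. If you run the prediction on the whole dataset, you should see some variation depending on k.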

Try changing these lines to see it:

y_pred = clf.predict(X)
print(confusion_matrix(y, y_pred))
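
To compare several values of k side by side, the same idea can be wrapped in a loop. This is only a minimal sketch under some assumptions: scikit-learn's bundled `load_iris` stands in for the `data` frame from the question, and `compare_k` is a hypothetical helper name, not part of any library. Note that `X` also contains the training samples, so the full-data numbers show where the hard specimens are rather than giving a clean estimate of generalization.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris data replaces the `data` frame from the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)


def compare_k(k_values=(2, 3, 5)):
    """Hypothetical helper: fit one KNN model per k on the training split.

    Returns {k: (test accuracy, full-data accuracy, full-data confusion matrix)}.
    """
    results = {}
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train, y_train)
        # Accuracy on the held-out 30% only, as in the question.
        test_acc = accuracy_score(y_test, clf.predict(X_test))
        # Predictions for all 150 samples, as suggested in the answer.
        y_pred_all = clf.predict(X)
        full_acc = accuracy_score(y, y_pred_all)
        results[k] = (test_acc, full_acc, confusion_matrix(y, y_pred_all))
    return results


if __name__ == "__main__":
    for k, (test_acc, full_acc, cm) in compare_k().items():
        print(f"k={k}: test accuracy={test_acc:.3f}, full-data accuracy={full_acc:.3f}")
        print(cm)
```

Any misclassifications should show up as off-diagonal entries in the printed confusion matrices, most plausibly between versicolor and virginica. Changing `random_state` is another way to check how much the result depends on which specimens land in the test set.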

Regarding "python - Statistical metrics of KNN models with different k values on the Iris dataset?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60133508/