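I'm learning some machine learning basics with Python (scikit-learn). When I tried to implement the k-nearest neighbors algorithm, I got this error: ValueError: Found input variables with inconsistent numbers of samples: [426, 143]. I don't know how to deal with it.
This is my code: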

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
cancer = load_breast_cancer()
X_train, y_train, X_test, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target,
                                                    random_state=0)
clf = KNeighborsClassifier(n_neighbors = 6)
clf.fit(X_train, y_train)

Best answer

train_test_split returns a tuple in the order X_train, X_test, y_train, y_test.

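You assigned the return values to the wrong variables, so the variable you call y_train actually holds the test features. As a result, you are fitting on the training data and the test data instead of the training data and the training labels, which is why the two sample counts (426 and 143) do not match.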
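To see where the two numbers in the error message come from, it may help to print the shape of each returned array in order. This is a minimal sketch using the same dataset and arguments as the question; the names `parts` and `shapes` are only illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# Keep the four return values in a list so their order is explicit:
# [X_train, X_test, y_train, y_test]
parts = train_test_split(cancer.data, cancer.target,
                         stratify=cancer.target,
                         random_state=0)

shapes = [p.shape for p in parts]
for name, shape in zip(["X_train", "X_test", "y_train", "y_test"], shapes):
    print(name, shape)
# X_train (426, 30)
# X_test (143, 30)
# y_train (426,)
# y_test (143,)
```

The second element has 143 rows, so unpacking it into a variable named y_train and passing it to fit alongside the 426-row X_train produces the [426, 143] mismatch.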

It should be

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target,
                                                    random_state=0)
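Putting it together, a corrected version of the question's script might look like the sketch below. The final scoring step and the `accuracy` variable are additions for illustration and were not part of the original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Unpack in the order train_test_split returns:
# features (train, test) first, then labels (train, test).
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target,
                                                    random_state=0)

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, y_train)  # features and labels now have the same number of rows

# Illustrative addition: mean accuracy on the held-out test set.
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

With the unpacking order fixed, X_train and y_train both have 426 rows, so fit no longer raises the ValueError.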

Regarding python - "inconsistent numbers of samples" - scikit-learn, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45400034/
