Problem description
Here is my code:
import pandas as pa
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def get_accuracy(X_train, y_train, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    result = accuracy_score(y_train, y_test)
    return result

test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]
accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
print(accuracy)
I don't understand why I get this error:
Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 35, in <module>
    accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 22, in get_accuracy
    result = accuracy_score(y_train, y_test)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
    check_consistent_length(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [199 299]
I want to compute the accuracy with accuracy_score, but I get this error. I Googled it and could not find anything that helps. Can anyone explain what is happening?
Recommended answer
sklearn.metrics.accuracy_score() takes y_true and y_pred arguments. That is, for the same data set (presumably the test set), it needs both the ground truth and the values predicted by your model. This lets it evaluate how well your model has performed compared to a hypothetical perfect model.
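As a minimal sketch of that contract, the snippet below scores a handful of made-up labels. The values in y_true and y_pred are purely illustrative and do not come from the perceptron data in the question.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for ONE data set: the ground truth and a model's guesses.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Both sequences describe the same five observations, in the same order.
score = accuracy_score(y_true, y_pred)
print(score)  # 4 of the 5 predictions match, so the fraction correct is 0.8
```

The two arguments must have the same length, because they are compared position by position.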
In your code, you are passing the true outcome variables of two different data sets. Both are ground truth, so they in no way reflect your model's ability to correctly classify observations! Because the two sets also differ in size, accuracy_score raises the ValueError before it compares anything.
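The failure can be reproduced without any model at all. The sketch below uses placeholder label lists with the two lengths from the error message (199 and 299). Which length belongs to the training set and which to the test set is an assumption here, and the exact wording of the message depends on the scikit-learn version.

```python
from sklearn.metrics import accuracy_score

# Placeholder labels; only the lengths matter for this demonstration.
y_train = [1] * 299  # assumed size of the training labels
y_test = [1] * 199   # assumed size of the test labels

try:
    # Same mistake as in the question: two sets of true labels, no predictions.
    accuracy_score(y_train, y_test)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # reports the inconsistent numbers of samples
```

Even if the two sets happened to have the same size, the call would run but the resulting number would say nothing about the model.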
If you update your get_accuracy() function to also take X_test as a parameter, I think it is more in line with what you intended to do:
def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)
    result = accuracy_score(y_test, pred_test)
    return result
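To show how the updated function might be called, here is a self-contained sketch. It uses synthetic data generated with NumPy instead of the CSV files from the question, so the array names, sizes and the labelling rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def get_accuracy(X_train, y_train, X_test, y_test):
    # Fit on the training set, then score predictions for the test set.
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)
    return accuracy_score(y_test, pred_test)

# Synthetic stand-in for the two CSV files: two features, labels -1 / 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# The train and test sets may differ in size; that is no longer a problem,
# because y_test is only compared with predictions made for X_test.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

accuracy = get_accuracy(X_train, y_train, X_test, y_test)
print(accuracy)
```

With the DataFrames from the question, the call would pass the feature columns and the class column of test_data as the last two arguments.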