找到的数组具有不一致的样本数

找到的数组具有不一致的样本数

本文介绍了ValueError:找到的数组具有不一致的样本数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的代码:

import pandas as pa
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def get_accuracy(X_train, y_train, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    result = accuracy_score(y_train, y_test)
    return result

test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]

accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
print(accuracy)

我不明白为什么会出现此错误:

I don't understand why I get this error:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 35, in <module>
    accuracy = get_accuracy(train_data[train_data.columns[1:]],
train_data[train_data.columns[0]], test_data[test_data.columns[0]])
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 22, in get_accuracy
    result = accuracy_score(y_train, y_test)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
    check_consistent_length(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [199 299]

我想通过得到这种类型的错误的方法precision_score来获得准确性.我用Google搜索时找不到任何可以帮助我的东西.谁能解释我发生了什么事?

I want to get accuracy by method accuracy_score by get this type of error. I Googled by I cannot find anything that can help me. Who can explain me what happens?

推荐答案

sklearn.metrics.accuracy_score()采用y_truey_pred自变量.也就是说,对于相同的数据集(可能是测试集),它想知道基本事实和模型预测的值.这样,它就可以评估您的模型与假设的完美模型相比的效果.

sklearn.metrics.accuracy_score() takes y_true and y_pred arguments. That is, for the same data set (presumably the test set), it wants to know the ground truth and the values predicted by your model. This will allow it to evaluate how well your model has performed compared to a hypothetical perfect model.

在您的代码中,您正在传递两个不同数据集的真实结果变量.这些结果都是真实的,绝不会反映模型对观察结果进行正确分类的能力!

In your code, you are passing the true outcome variables for two different data sets. These outcomes are both truth and in no way reflect your model's ability to correctly classify observations!

更新您的get_accuracy()函数以也将X_test作为参数,我认为这更符合您的意图:

Updating your get_accuracy() function to also take X_test as a parameter, I think this is more in line with what you intended to do:

def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)
    result = accuracy_score(y_test, pred_test)
    return result

这篇关于ValueError:找到的数组具有不一致的样本数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-13 18:55