Problem description
I'm a data science noob and am working on the Kaggle Titanic dataset. I'm running a Logistic Regression on it to predict whether passengers in the test data set survived or died.
I clean both the training and test data and fit the Logistic Regression on the training data. All good.
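import pandas as pd
from sklearn.linear_model import LogisticRegression

# (cleaning steps omitted; assumes every remaining column is numeric)
train = pd.read_csv('train.csv')
X_train = train.drop('Survived', axis=1)   # features
y_train = train['Survived']                # 0/1 labels
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)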
Then I use the fitted model to predict on the test data like this:
test = pd.read_csv('test.csv')
predictions = logmodel.predict(test)
I then try to print the Confusion Matrix:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(test,predictions))
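I get an error message:
ValueError: Classification metrics can't handle a mix of continuous-multioutput and binary targets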
What does this mean and how do I fix it?
Some potential issues I see are:
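- I'm doing something super stupid and wrong with the prediction model on the test data.
- The features Age and Fare (the price of the passenger's ticket) have float values, while the rest are integers.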
Where am I going wrong? Thanks for your help!
Recommended answer
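As m-dz has commented, confusion_matrix expects 2 arrays, the true labels and the predicted labels, while in your code you pass the whole test dataframe.
That dataframe has several columns, some holding non-integer floats (Age, Fare), so scikit-learn infers it as a "continuous-multioutput" target, whereas predictions is a 1-D array of 0/1 values, i.e. a "binary" target. The two target types cannot be compared, hence the error.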
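To see where the message comes from, you can ask scikit-learn how it classifies each argument. The sketch below calls sklearn.utils.multiclass.type_of_target, the helper scikit-learn's classification metrics use to infer target types, on a small made-up frame standing in for the cleaned test data (the column values are invented for illustration), and then reproduces the error.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Made-up stand-in for a cleaned test frame: several feature columns, Age and Fare are floats
test_like = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": [0, 1, 1],
    "Age": [22.0, 38.5, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
})
# A 1-D array of 0/1 labels, like the output of logmodel.predict(...)
predictions = np.array([0, 1, 1])

print(type_of_target(test_like))             # the whole frame: continuous-multioutput
print(type_of_target(predictions))           # the predictions: binary
print(type_of_target(pd.Series([0, 1, 0])))  # a single 0/1 label column: binary

# Passing the whole frame as y_true reproduces the mismatch
try:
    confusion_matrix(test_like, predictions)
except ValueError as err:
    message = str(err)
    print(message)
```

A single label column such as test['Survived'] is classified as "binary", the same type as the predictions, which is why passing just that column works.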
Moreover, another common mistake is passing the arguments in the wrong order, which matters: confusion_matrix expects the true labels (y_true) first and the predictions (y_pred) second.
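A small illustration of why the order matters, using made-up labels: in scikit-learn's confusion_matrix, rows are true classes and columns are predicted classes, so swapping the arguments transposes the matrix and silently swaps false positives with false negatives.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1])

right = confusion_matrix(y_true, y_pred)    # rows = true class, columns = predicted class
swapped = confusion_matrix(y_pred, y_true)  # arguments reversed

print(right)    # [[1 2]
                #  [0 2]]
print(swapped)  # [[1 0]
                #  [2 2]]

# For binary labels the matrix unpacks as tn, fp, fn, tp
tn, fp, fn, tp = right.ravel()
print(tn, fp, fn, tp)  # 1 2 0 2
```

With the arguments reversed, the same data would read as 0 false positives and 2 false negatives, which is wrong.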
To sum up, you should call:
confusion_matrix(test['Survived'], predictions)
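The call above assumes test has a Survived column (and if it does, that column should be dropped before calling predict, so the features match X_train). The test.csv distributed for the Kaggle Titanic competition ships without survival labels, so if yours has none, one option is to hold back part of the labelled training data and build the confusion matrix on that. The sketch below does this with train_test_split on a small synthetic frame standing in for the cleaned train.csv; the column values, the labelling rule, and max_iter=1000 (to avoid convergence warnings on unscaled features) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned train.csv (made-up values, numeric columns only)
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n).round(1),
    "Fare": rng.uniform(5, 100, n).round(2),
})
# Made-up rule so the label depends on the features
train["Survived"] = ((train["Sex"] == 1) | (train["Pclass"] == 1)).astype(int)

X = train.drop("Survived", axis=1)
y = train["Survived"]

# Hold back 25% of the labelled rows for evaluation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

logmodel = LogisticRegression(max_iter=1000)
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_val)

# y_true first, then y_pred: both are 1-D label arrays of the same length
cm = confusion_matrix(y_val, predictions)
print(cm)
```

The held-out rows play the role of a labelled test set; predictions for the real test.csv can still be produced afterwards with logmodel.predict(test).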