Cross-validation in Sklearn 0.20+?

Problem description

I am trying to do cross-validation and I am running into an error that says: 'Found input variables with inconsistent numbers of samples: [18, 1]'

I am using different columns of a pandas DataFrame (df) as the features, with the last column as the label. The data comes from the UC Irvine Machine Learning Repository. When I import the cross-validation package I have used in the past, I get an error suggesting it may have been deprecated. I am going to run a decision tree, an SVM, and K-NN.

My code is as follows:

feature = [df['age'], df['job'], df['marital'], df['education'], df['default'], df['housing'], df['loan'], df['contact'],
       df['month'], df['day_of_week'], df['campaign'], df['pdays'], df['previous'], df['emp.var.rate'], df['cons.price.idx'],
       df['cons.conf.idx'], df['euribor3m'], df['nr.employed']]
label = [df['y']]

from sklearn.cross_validation import train_test_split
from sklearn.model_selection import cross_val_score
# Model Training
x = feature[:]
y = label
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5)

Any help would be great!

Recommended answer

The cross_validation module is deprecated. The new module model_selection has taken its place, so everything you did with cross_validation is now available in model_selection. Your code above then becomes:

feature = [df['age'], df['job'], df['marital'], df['education'], df['default'], df['housing'], df['loan'], df['contact'],
       df['month'], df['day_of_week'], df['campaign'], df['pdays'], df['previous'], df['emp.var.rate'], df['cons.price.idx'],
       df['cons.conf.idx'], df['euribor3m'], df['nr.employed']]
label = [df['y']]

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

As for declaring X and y, there is no need to wrap them in a list. A list of 18 Series is read as 18 samples and a list holding one Series as 1 sample, which is where the [18, 1] in the error comes from. Just use them like this:

feature = df[['age', 'job', 'marital', 'education', 'default', 'housing',
              'loan', 'contact', 'month', 'day_of_week', 'campaign',
              'pdays', 'previous', 'emp.var.rate', 'cons.price.idx',
              'cons.conf.idx', 'euribor3m', 'nr.employed']]
label = df['y']

Then you can use the rest of your code without changing anything:

# Model Training
x = feature[:]
y = label
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5)
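
To make the difference concrete, here is a minimal sketch on a made-up DataFrame. The column values are hypothetical stand-ins, not the UCI data, and only two feature columns are used to keep it short. It shows the list-wrapped inputs failing and the DataFrame/Series inputs splitting as expected:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "campaign": [1, 2, 1, 3, 2, 1],
    "y": [0, 1, 0, 1, 0, 1],
})

# Wrapping columns in plain lists: the list length is taken as the sample count,
# so x looks like 2 samples and y looks like 1 sample.
bad_x = [df["age"], df["campaign"]]
bad_y = [df["y"]]
try:
    train_test_split(bad_x, bad_y, test_size=0.5)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Selecting columns directly keeps one row per sample in both x and y.
x = df[["age", "campaign"]]
y = df["y"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)
print(x_train.shape, x_test.shape)
```

With the full 18-column feature list from the question, the same mismatch would be reported as [18, 1].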

As for your last question about folds in cross-validation, sklearn has several classes that do this, depending on the task. Please have a look at the fold iterators, and remember that all of them live in the model_selection package.
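
As one possible illustration, the sketch below passes a fold iterator to cross_val_score for the three model types mentioned in the question. It assumes synthetic numeric data from make_classification rather than the real dataset (whose categorical columns such as job or marital would need encoding first), and the choice of StratifiedKFold with five folds is only an example, not a recommendation from the original answer:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 200 samples, 8 numeric features
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A fold iterator from model_selection; it keeps class proportions in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(),
}

# cross_val_score returns one score per fold for each model
scores = {
    name: cross_val_score(model, X, y, cv=cv)
    for name, model in models.items()
}
for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f}")
```

Other iterators in the same package, such as KFold, can be swapped in through the cv argument in the same way.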
