r caret predict returns fewer outputs than inputs

Problem description

I used caret to train the rpart model below.

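library(caret)  # provides createDataPartition(), trainControl(), train()

# 80/20 split that preserves the distribution of Happiness
trainIndex <- createDataPartition(d$Happiness, p = .8, list = FALSE)
dtrain <- d[trainIndex, ]
dtest  <- d[-trainIndex, ]

# 10-fold cross-validation, repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
fitRpart <- train(Happiness ~ ., data = dtrain, method = "rpart",
                  trControl = fitControl)
testRpart <- predict(fitRpart, newdata = dtest)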

dtest contains 1296 observations, so I expected testRpart to produce a vector of length 1296. Instead it's 1077 long, i.e. 219 short.

When I ran the prediction on only the first 220 rows of dtest, I got a single predicted value, so the output is consistently 219 short.
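
A quick way to check whether missing values explain the gap is to count how many rows of the test set are incomplete. The sketch below uses a small hypothetical data frame (dtest_toy, with made-up columns age and income) in place of the real dtest. It counts rows that are complete across all columns and rows that are complete in the predictor columns only, since a formula-based predict method typically checks only the predictors.

```r
# Hypothetical stand-in for dtest; the column names and values are made up
dtest_toy <- data.frame(
  Happiness = c(3, 4, NA, 5, 2),
  age       = c(25, NA, 40, 31, 58),
  income    = c(50, 62, 48, NA, 70)
)

n_total    <- nrow(dtest_toy)                  # rows you expect predictions for
n_complete <- sum(complete.cases(dtest_toy))   # rows kept by na.omit(dtest_toy)

# Rows complete in the predictor columns only (outcome excluded)
predictors <- setdiff(names(dtest_toy), "Happiness")
n_pred_complete <- sum(complete.cases(dtest_toy[, predictors]))

na_per_col <- colSums(is.na(dtest_toy))        # which columns carry the NAs

cat("total:", n_total, "| complete:", n_complete,
    "| complete in predictors:", n_pred_complete, "\n")
print(na_per_col)
```

If the number of incomplete rows lines up with the shortfall in the prediction vector, missing values are the likely cause.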

Can anyone explain why this happens, and what I can do to get one prediction per input row?

d can be loaded from here to reproduce the above.

Recommended answer

I downloaded your data and found what explains the discrepancy.

If you simply remove the rows with missing values from your dataset, the lengths of the input and the output match:

testRpart <- predict(fitRpart, newdata = na.omit(dtest))
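
If you take the na.omit() route, the predictions no longer line up row for row with dtest. A minimal sketch of mapping them back is shown here, assuming simulated data (train_df, test_df and the columns x1, x2, y are hypothetical) and calling rpart directly rather than through caret so that it runs without caret installed. Selecting rows with complete.cases() over all columns keeps the same rows as na.omit().

```r
library(rpart)  # recommended package bundled with standard R installs

set.seed(42)
# Hypothetical training/test data standing in for dtrain/dtest
train_df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train_df$y <- 2 * train_df$x1 - train_df$x2 + rnorm(100, sd = 0.2)
test_df <- data.frame(x1 = rnorm(6), x2 = rnorm(6), y = rnorm(6))
test_df$x2[c(2, 5)] <- NA   # inject missing predictor values

fit <- rpart(y ~ x1 + x2, data = train_df)

# Predict only on complete rows, as in the answer
keep <- complete.cases(test_df)
pred_complete <- predict(fit, newdata = test_df[keep, ])

# Scatter the predictions back into a vector aligned with the original rows;
# rows that were dropped get NA
pred_full <- rep(NA_real_, nrow(test_df))
pred_full[keep] <- pred_complete
print(pred_full)
```

This keeps the output the same length as the input while making it explicit which rows could not be scored.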

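Note that nrow(na.omit(dtest)) is 1103 and length(testRpart) is also 1103, which points to missing values as the cause. So you need a strategy for handling missing values. See ?predict.rpart, and ?predict.train (predict() on a caret train object dispatches to caret's predict.train, which also takes an na.action argument), and choose the na.action option that gives the behavior you want.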
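
The effect of na.action can be seen in a small simulated example. This sketch again calls rpart directly on hypothetical data (train_df, test_df, x1 and x2 are made up). With na.omit, rows that have a missing predictor are dropped, which reproduces the shorter output. With na.pass, rpart keeps every row and, under its default control settings, routes incomplete rows through surrogate splits or the majority direction, so the output length matches the input. Because caret's predict() for train objects also accepts na.action, passing na.action = na.pass there may give the same behavior; treat that as an assumption and check ?predict.train for your caret version.

```r
library(rpart)

set.seed(1)
# Hypothetical data: two numeric predictors and a numeric outcome
train_df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train_df$y <- train_df$x1 + 0.5 * train_df$x2 + rnorm(200, sd = 0.1)

test_df <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
test_df$x1[c(2, 5)] <- NA   # two rows with a missing predictor

fit <- rpart(y ~ x1 + x2, data = train_df)

# na.omit: rows with a missing predictor are dropped before predicting
pred_omit <- predict(fit, newdata = test_df, na.action = na.omit)
# na.pass: every row is kept; rpart handles the NAs itself
pred_pass <- predict(fit, newdata = test_df, na.action = na.pass)

length(pred_omit)  # 8
length(pred_pass)  # 10

# With caret (assumed, not run here):
# predict(fitRpart, newdata = dtest, na.action = na.pass)
```

Whether na.pass is appropriate depends on the model: rpart can handle missing predictors internally, but many other methods cannot, in which case imputation before predicting is the usual alternative.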

08-20 09:13