Problem description
I used caret to train the rpart model below.
library(caret)
trainIndex <- createDataPartition(d$Happiness, p = .8, list = FALSE)
dtrain <- d[trainIndex, ]
dtest <- d[-trainIndex, ]
fitControl <- trainControl(## 10-fold CV, repeated 10 times
  method = "repeatedcv", number = 10, repeats = 10)
fitRpart <- train(Happiness ~ ., data = dtrain, method = "rpart",
  trControl = fitControl)
testRpart <- predict(fitRpart, newdata = dtest)
dtest contains 1296 observations, so I expected testRpart to produce a vector of length 1296. Instead it is 1077 long, i.e. 219 short.
When I ran the prediction on only the first 220 rows of dtest, I got a single predicted value back, so the output is consistently 219 rows short.
Is there an explanation for why this happens, and what can I do to get one prediction per input row?
d can be loaded from here to reproduce the above.
Recommended answer
I downloaded your data and found what explains the discrepancy.
If you simply remove the rows with missing values from your test set, the output length matches the input:
testRpart <- predict(fitRpart, newdata = na.omit(dtest))
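To check that missing values account for the whole shortfall, you can count the incomplete rows directly with base R. The sketch below uses a small hypothetical data frame (dtest_toy) in place of the real dtest, which is not reproduced here; on the real data, sum(!complete.cases(dtest)) should equal the number of missing predictions, assuming the NAs sit in predictor columns.

```r
# Hypothetical stand-in for dtest; the real data set is not reproduced here.
dtest_toy <- data.frame(
  Happiness = c(1, 2, 1, 3, 2),
  income    = c(10, NA, 30, 40, NA),
  age       = c(25, 35, NA, 45, 55)
)

# TRUE for every row containing at least one NA; these are the rows na.omit() drops.
incomplete <- !complete.cases(dtest_toy)
n_dropped  <- sum(incomplete)

cat("rows:", nrow(dtest_toy), "\n")
cat("complete rows:", nrow(na.omit(dtest_toy)), "\n")
cat("rows with NAs:", n_dropped, "\n")

# Indices of the rows that would silently disappear from the predictions.
which(incomplete)
```

Here 3 of the 5 rows contain an NA, so a prediction step that omits incomplete rows would return only 2 values.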
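Note that nrow(na.omit(dtest)) is 1103 and length(testRpart) is also 1103. (These counts differ from the 1077 in the question, likely because createDataPartition draws a random split and no seed was set.) So you need a strategy for handling missing values. Because fitRpart is a caret train object, predict() dispatches to predict.train, whose na.action argument defaults to na.omit; that is why incomplete rows are silently dropped. See ?predict.train and ?predict.rpart, and the options for the na.action parameter, to choose the behaviour you want.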
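A minimal sketch of how na.action changes the output length, using rpart directly on the built-in iris data rather than the asker's data (the test rows and the injected NAs are hypothetical). With na.omit, incomplete rows are dropped before prediction; with na.pass, which is predict.rpart's own default, every row is kept and rpart routes missing values down the tree using surrogate splits.

```r
library(rpart)

# Fit a small classification tree on complete data.
fit <- rpart(Species ~ ., data = iris)

# Hypothetical test set: five rows with two predictor values blanked out.
newdata <- iris[c(1, 51, 101, 2, 52), ]
newdata$Petal.Length[2] <- NA
newdata$Petal.Width[4]  <- NA

# na.omit removes rows 2 and 4 before predicting (the behaviour seen in the question).
pred_omit <- predict(fit, newdata = newdata, type = "class", na.action = na.omit)

# na.pass keeps every row; rpart handles the NAs itself via surrogate splits.
pred_pass <- predict(fit, newdata = newdata, type = "class", na.action = na.pass)

length(pred_omit)  # 3
length(pred_pass)  # 5
```

For the caret model in the question, the analogous call would presumably be predict(fitRpart, newdata = dtest, na.action = na.pass), since predict.train also takes an na.action argument; it is worth confirming on your data that the underlying model accepts rows with missing predictors before relying on it. Alternatively, impute the missing values before both training and prediction.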