Question
Let's say I have some generic dataset for which an OLS regression is the best choice. So I build a model with some first-order terms and decide to use caret in R to get my regression coefficient estimates and error estimates.
In caret, this ends up being:
k10_cv = trainControl(method="cv", number=10)
ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm")
From there, I can pull out regression information using summary(ols_model), and I can get some more information by just printing ols_model.
When I just look at ols_model, are the RMSE/R-squared/MAE being calculated via the typical k-fold CV approach? Also, when the model I see in summary(ols_model) is generated, is it trained on the entire dataset, or is it an average of the models fitted on each of the folds?
If not, in the interest of trading variance for bias, is there a way to get an OLS model within caret that is trained on one fold at a time?
Recommended answer
Here's reproducible data for your example.
library("caret")
my_data <- iris
k10_cv <- trainControl(method="cv", number=10)
set.seed(100)
ols_model <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
data = my_data, trControl = k10_cv, method = "lm")
> ols_model$results
  intercept      RMSE  Rsquared       MAE     RMSESD RsquaredSD      MAESD
1      TRUE 0.3173942 0.8610242 0.2582343 0.03881222 0.04784331 0.02960042
1) The ols_model$results above is the mean of the per-fold metrics stored in ols_model$resample, shown below:
> (ols_model$resample)
        RMSE  Rsquared       MAE Resample
1  0.3386472 0.8954600 0.2503482   Fold01
2  0.3154519 0.8831588 0.2815940   Fold02
3  0.3167943 0.8904550 0.2441537   Fold03
4  0.2644717 0.9085548 0.2145686   Fold04
5  0.3769947 0.8269794 0.3070733   Fold05
6  0.3720051 0.7792611 0.2746565   Fold06
7  0.3258501 0.8095141 0.2647466   Fold07
8  0.2962375 0.8530810 0.2731445   Fold08
9  0.3059100 0.8351535 0.2611982   Fold09
10 0.2615792 0.9286246 0.2108592   Fold10
That is:
> mean(ols_model$resample$RMSE)==ols_model$results$RMSE
[1] TRUE
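To make the "mean over folds" idea concrete, here is a rough base-R sketch of the same kind of 10-fold procedure done by hand. It is not caret's internal code: the random fold assignment and the names fold_id, per_fold and cv_summary are assumptions made for illustration, so the numbers will not match the output above exactly. The squared correlation is used for R-squared because that is what caret's default regression summary reports, as far as I know.

```r
# Hand-rolled 10-fold CV for the same lm formula, using base R only.
# Assumption: folds are assigned purely at random, which is not how caret
# builds its folds, so the metrics will differ slightly from ols_model's.
set.seed(100)
my_data <- iris
k <- 10

# Give every row a fold label 1..k, with folds of (nearly) equal size
fold_id <- sample(rep(seq_len(k), length.out = nrow(my_data)))

per_fold <- do.call(rbind, lapply(seq_len(k), function(i) {
  train_rows <- my_data[fold_id != i, ]  # fit on the other k - 1 folds
  test_rows  <- my_data[fold_id == i, ]  # evaluate on the held-out fold
  fit  <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
             data = train_rows)
  pred <- predict(fit, newdata = test_rows)
  obs  <- test_rows$Sepal.Length
  data.frame(
    RMSE     = sqrt(mean((obs - pred)^2)),
    Rsquared = cor(obs, pred)^2,
    MAE      = mean(abs(obs - pred)),
    Resample = sprintf("Fold%02d", i)
  )
}))

# Analogue of ols_model$results: mean and SD of the held-out metrics
cv_summary <- c(
  RMSE       = mean(per_fold$RMSE),
  Rsquared   = mean(per_fold$Rsquared),
  MAE        = mean(per_fold$MAE),
  RMSESD     = sd(per_fold$RMSE),
  RsquaredSD = sd(per_fold$Rsquared),
  MAESD      = sd(per_fold$MAE)
)

print(per_fold)
print(cv_summary)
```

Each of the ten fits inside the loop is thrown away after scoring; only the held-out metrics are kept, which mirrors how the resampling results are used purely for performance estimation.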
2) The model is trained on the whole training set. You can check this either by fitting the same formula with lm or by specifying method = "none" for the trainControl.
> coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = my_data))
 (Intercept)  Sepal.Width Petal.Length  Petal.Width
   1.8559975    0.6508372    0.7091320   -0.5564827
This is the same as ols_model$finalModel.
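If you want to check this programmatically rather than by eye, something like the following sketch should work, assuming caret is installed. The names fit_cv, fit_none and fit_lm are just illustrative. It uses all.equal() instead of ==, since all.equal() compares doubles with a tolerance; both comparisons should come back TRUE if the final model is indeed the full-data fit.

```r
library(caret)

my_data <- iris

# Fit with 10-fold CV (as in the example above)
set.seed(100)
fit_cv <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                data = my_data, method = "lm",
                trControl = trainControl(method = "cv", number = 10))

# Fit with no resampling at all: only the full-data model is built
fit_none <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                  data = my_data, method = "lm",
                  trControl = trainControl(method = "none"))

# Plain lm on the full data for reference
fit_lm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
             data = my_data)

# Compare coefficients with a numeric tolerance rather than exact equality
all.equal(coef(fit_cv$finalModel), coef(fit_lm))
all.equal(coef(fit_none$finalModel), coef(fit_lm))
```

Note that with method = "none" there are no resampling results, so fit_none will not have the per-fold RMSE/R-squared/MAE that fit_cv carries.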