Problem description
I want to run a random forest in parallel using the caret package, and I want to set the seeds so that the results are reproducible, as in "Fully reproducible parallel models using caret". However, I don't understand line 9 (marked `# Why 22?`) in the following code, taken from the caret help: why do we sample 22 integers for each resample (plus one for the last model in line 12), when only 12 values of the parameter k are evaluated? For information, I want to run 5-fold CV to evaluate 584 values of the RF parameter 'mtry'. Any help is much appreciated. Thank you.
## Not run:
## Do 5 repeats of 10-Fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.
set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22) # Why 22?
## For the last model:
seeds[[51]] <- sample.int(1000, 1)
ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)
Recommended answer
I'd say it is a mistake, and should be 12 instead of 22.
From what I understand, the model is fitted 10 folds * 5 repeats = 50 times for each value of k. Hence, for each i in 1:50, you need 12 seeds (one for every value of k). Once the best k has been found, the final model is fitted. That fit needs only one seed, because there is no more resampling at that point.
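Applying the same counting to the setup in the question (5-fold CV, 584 values of mtry) gives 5 resamples with 584 seeds each, plus one seed for the final model. Below is a minimal sketch under those assumptions. `make_seeds` is a hypothetical helper written for this illustration, not a caret function, and the pool size of 100000 is an arbitrary choice. The `trainControl()` call only runs if caret is installed.

```r
# Hypothetical helper (not part of caret): build a seeds list for trainControl().
#   n_resamples: number of resampled fits (folds * repeats)
#   n_models:    number of tuning-parameter values evaluated per resample
make_seeds <- function(n_resamples, n_models, master_seed = 123) {
  set.seed(master_seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  # One integer seed per candidate model, for every resample
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(100000, n_models)
  }
  # A single seed for the final model fitted on the full training set
  seeds[[n_resamples + 1]] <- sample.int(100000, 1)
  seeds
}

# 5-fold CV (no repeats) over 584 candidate values of mtry
seeds <- make_seeds(n_resamples = 5, n_models = 584)

# Pass the list to caret if the package is available
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", number = 5, seeds = seeds)
}
```

For the KNN example from the caret help, the equivalent call would be `make_seeds(n_resamples = 50, n_models = 12)`.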