parRF in caret not working with more than one core

Problem description

parRF from the caret R package is not working for me with more than one core, which is quite ironic, given that the "par" in parRF stands for parallel. I'm on a Windows machine, if that is relevant. I checked that I'm using the latest and greatest versions of caret and doParallel.

I made a minimal example and give the results below. Any ideas?

Source code

library(caret)
library(doParallel)

trCtrl <- trainControl(
  method = "repeatedcv"
  , number = 2
  , repeats = 5
  , allowParallel = TRUE
)

# WORKS
registerDoParallel(1)
train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
closeAllConnections()

# FAILS
registerDoParallel(2)
train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
closeAllConnections()

Output

> library(caret)
> library(doParallel)
>
> trCtrl <- trainControl(
+   method = "repeatedcv"
+   , number = 2
+   , repeats = 5
+   , allowParallel = TRUE
+ )
>
>
> # WORKS
> registerDoParallel(1)
> train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
Parallel Random Forest

150 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica'

... some more model output, works fine!
> closeAllConnections()
>
> # FAILS
> registerDoParallel(2)
> train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
Error in train.default(x, y, weights = w, ...) :
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
  missing values found in aggregated results
> closeAllConnections()

Session info

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doParallel_1.0.8   iterators_1.0.7    foreach_1.4.2      e1071_1.6-3        randomForest_4.6-7 caret_6.0-30       ggplot2_1.0.0
[8] lattice_0.20-29

loaded via a namespace (and not attached):
 [1] BradleyTerry2_1.0-4 brglm_0.5-9         car_2.0-20          class_7.3-10        codetools_0.2-8     colorspace_1.2-4
 [7] compiler_3.1.0      digest_0.6.4        gnm_1.0-7           grid_3.1.0          gtable_0.1.2        gtools_3.4.1
[13] lme4_1.1-6          MASS_7.3-31         Matrix_1.1-3        minqa_1.2.3         munsell_0.4.2       nlme_3.1-117
[19] nnet_7.3-8          plyr_1.8.1          proto_0.3-10        qvcalc_0.8-8        Rcpp_0.11.2         RcppEigen_0.3.2.1.2
[25] relimp_1.0-3        reshape2_1.4        scales_0.2.4        splines_3.1.0       stringr_0.6.2       tcltk_3.1.0
[31] tools_3.1.0

Update

  • Tried it with R 3.1.1 (same package versions): same result.
  • Tried it with R 3.0.2 and some older versions of caret and doParallel: works fine (see the second session info).

Session info 2:

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] e1071_1.6-1        class_7.3-9        randomForest_4.6-7 doParallel_1.0.6   iterators_1.0.6
 [6] caret_5.17-7       reshape2_1.2.2     plyr_1.8           lattice_0.20-24    foreach_1.4.1
[11] cluster_1.14.4

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.2  grid_3.0.2      stringr_0.6.2   tools_3.0.2

Recommended answer

This is clearly a bug in caret 6.0-30 that was introduced sometime after version 5.17-7. It's also another problem that is more likely to hit Windows users, since the doParallel "mclapply mode" works, while the "clusterApplyLB mode" fails.
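To see which of the two modes a session is actually using, a small check along these lines may help. It is only a sketch: it assumes the foreach helpers getDoParName() and getDoParWorkers() are available (doParallel attaches foreach), and that doParallel reports "doParallelSNOW" for a cluster backend and "doParallelMC" for the fork-based mclapply backend. The variable names are made up for illustration.

```r
library(doParallel)

# Register an explicit PSOCK cluster, i.e. the cluster-based backend
# that is also used implicitly on Windows.
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)

# Ask foreach which backend is registered and how many workers it has.
backend_name <- getDoParName()     # expected "doParallelSNOW" for a cluster
n_workers    <- getDoParWorkers()

# Clean up: stop the workers and fall back to sequential execution.
parallel::stopCluster(cl)
registerDoSEQ()
```

On Linux or macOS, calling registerDoParallel(cores = 2) instead would normally select the fork-based mode, which is the one reported to work here.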

I've run some tests, and it appears that the problem is due to the cluster workers not being properly initialized to perform nested parallel computations, so you can work around the bug by loading the foreach package in the cluster workers before calling "train". To do this, you need to explicitly create the cluster object, rather than letting the "registerDoParallel" function create it for you (which it does on Windows). For example:

cl <- makePSOCKcluster(2)
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)
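Put together with the example from the question, the workaround might look like the following sketch. It assumes caret, doParallel, randomForest and e1071 are installed; the tryCatch cleanup and the name fit are additions for illustration and not part of the original answer.

```r
library(caret)
library(doParallel)

trCtrl <- trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 5,
  allowParallel = TRUE
)

# Create the cluster explicitly so the workers can be initialized by hand.
cl <- makePSOCKcluster(2)
# Load foreach on every worker before any nested parallel code runs there.
invisible(clusterEvalQ(cl, library(foreach)))
registerDoParallel(cl)

fit <- tryCatch(
  train(form = Species ~ ., data = iris, trControl = trCtrl, method = "parRF"),
  finally = {
    # Always shut the workers down, even if train() fails.
    stopCluster(cl)
    registerDoSEQ()
  }
)
print(fit)
```

Calling stopCluster(cl) on the explicit cluster object replaces the closeAllConnections() calls from the question, which close every open connection rather than just the worker sockets.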

I'll contact the author of caret to discuss a solution to the problem.

