Problem description
I am using the R package foreach() with %dopar% to do long (~days) calculations in parallel. I would like the ability to stop the entire set of calculations in the event that one of them produces an error. However, I have not found a way to achieve this, and from the documentation and various forums I have found no indication that this is possible. In particular, break() does not work and stop() only stops the current calculation, not the whole foreach loop.
Note that I cannot use a simple for loop, because ultimately I want to parallelize this using the doRNG package.
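For context, here is a minimal sketch (not part of the original question) of how such a loop would eventually be parallelized with doRNG; the doParallel backend and the worker count of 4 are assumptions for illustration only:

library(foreach)
library(doParallel)
library(doRNG)

registerDoParallel(cores = 4)  # hypothetical backend and worker count
set.seed(123)                  # doRNG makes the parallel RNG streams reproducible
x <- foreach(k = 1:10, .combine = "cbind") %dorng% {
  Sys.sleep(0.5)  # placeholder for the long-running calculation
  k
}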
Here is a simplified, reproducible version of what I am attempting (shown here in serial with %do%, but I have the same problem when using doRNG and %dopar%). Note that in reality I want to run all of the elements of this loop (here 10) in parallel.
library(foreach)
myfunc <- function() {
  x <- foreach(k = 1:10, .combine = "cbind", .errorhandling = "stop") %do% {
    cat("Element ", k, "\n")
    Sys.sleep(0.5) # just to show that stop does not cause exit from foreach
    if (is.element(k, 2:6)) {
      cat("Should stop\n")
      stop("Has stopped")
    }
    k
  }
  return(x)
}
x <- myfunc()
# stop() halts the processing of k=2:6, but it does not stop the foreach loop itself.
# x is not returned. The execution produces the error message
# Error in { : task 2 failed - "Has stopped"
What I would like to achieve is that the entire foreach loop can be exited immediately upon some condition (here, when the stop() is encountered).
I have found no way to achieve this with foreach. It seems that I would need a way to send a message to all the other processes to make them stop too.
If not possible with foreach, does anyone know of alternatives? I have also tried to achieve this with parallel::mclapply, but that does not work either.
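For reference, a minimal sketch (my reconstruction, not the poster's code) of the kind of mclapply attempt described; stop() only marks the failing elements, it does not abort the jobs running on the other workers:

library(parallel)

res <- mclapply(1:10, function(k) {
  Sys.sleep(0.5)
  if (k %in% 2:6) stop("Has stopped")
  k
}, mc.cores = 4, mc.preschedule = FALSE)

# The failing elements come back as "try-error" objects (with a warning);
# the remaining elements are still computed.
sapply(res, function(r) inherits(r, "try-error"))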
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreach_1.4.0
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.0 iterators_1.0.6
Recommended answer
It sounds like you want an impatient version of the "stop" error handling. You could implement that by writing a custom combine function, and arranging for foreach to call it as soon as each result is returned. To do that you need to:
- Use a backend that supports calling combine on-the-fly, like doMPI or doRedis
- Don't enable .multicombine
- Set .inorder to FALSE
- Set .init to something (like NULL)
Here is an example:
library(foreach)
parfun <- function(errval, n) {
  abortable <- function(errfun) {
    comb <- function(x, y) {
      if (inherits(y, 'error')) {
        warning('This will leave your parallel backend in an inconsistent state')
        errfun(y)
      }
      c(x, y)
    }
    foreach(i = seq_len(n), .errorhandling = 'pass', .export = 'errval',
            .combine = 'comb', .inorder = FALSE, .init = NULL) %dopar% {
      if (i == errval)
        stop('testing abort')
      Sys.sleep(10)
      i
    }
  }
  callCC(abortable)
}
Note that I also set the error handling to "pass" so foreach will call the combine function with an error object. The callCC function is used to return from the foreach loop regardless of the error handling used within foreach and the backend. In this case, callCC will call the abortable function, passing it a function object that is used to force callCC to return immediately. By calling that function from the combine function, we can escape from the foreach loop when we detect an error object, and have callCC return that object. See ?callCC for more information.
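As a standalone illustration (not part of the answer's code), the sketch below shows the escape behaviour of callCC on its own: calling the function it supplies returns that value from callCC immediately, skipping the rest of the body:

early <- callCC(function(escape) {
  for (i in 1:10) {
    if (i == 3) escape(i)  # non-local exit: callCC returns 3 right here
  }
  "never reached"
})
early
# [1] 3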
You can actually use parfun without a parallel backend registered and verify that the foreach loop "breaks" as soon as it executes a task that throws an error, but that could take a while since the tasks are executed sequentially. For example, this takes 20 seconds to execute if no backend is registered:
print(system.time(parfun(3, 4)))
When executing parfun in parallel, we need to do more than simply break out of the foreach loop: we also need to stop the workers, otherwise they will continue to compute their assigned tasks. With doMPI, the workers can be stopped using mpi.abort:
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
r <- parfun(getDoParWorkers(), getDoParWorkers())
if (inherits(r, 'error')) {
  cat(sprintf('Caught error: %s\n', conditionMessage(r)))
  mpi.abort(cl$comm)
}
Note that the cluster object can't be used after the loop aborts, because things weren't properly cleaned up, which is why the normal "stop" error handling doesn't work this way.