How to parallelize do() calls with dplyr

Problem description

I'm trying to figure out how to run the dplyr::do function in parallel. After reading some of the docs, it seems that dplyr::init_cluster() should be sufficient to tell do() to run in parallel. Unfortunately, this doesn't seem to be the case when I test it:

library(dplyr)
test <- data_frame(a=1:3, b=letters[c(1:2, 1)])

init_cluster()
system.time({
  test %>%
    group_by(b) %>%
    do({
      Sys.sleep(3)
      data_frame(c = rep(max(.$a), times = max(.$a)))
    })
})
stop_cluster()

This gives the following output:

Initialising 2 core cluster.
|==========================================================================|100% ~0 s remaining
   user  system elapsed
   0.03    0.00    6.03

I would expect the elapsed time to be about 3 seconds if the do() call were split between the two cores, since each of the two groups sleeps for 3 seconds. I can also confirm that it isn't split by adding a print() inside do(): the output appears in the main R terminal. What am I missing here?

I'm using dplyr 0.4.2 with R 3.2.1.
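
For comparison, here is a minimal sketch of the timing I'm after, written with base R's parallel package rather than dplyr. It is not a dplyr feature and says nothing about what init_cluster() is meant to do. It assumes two local worker processes can be started and rebuilds the toy data with data.frame() so the workers need no extra packages. The names groups, pieces, and result are only illustrative.

```r
library(parallel)

# Same toy data as above, built with base R so the workers need no packages
test <- data.frame(a = 1:3, b = letters[c(1:2, 1)], stringsAsFactors = FALSE)

# One data frame per value of b (two groups: "a" and "b")
groups <- split(test, test$b)

cl <- makeCluster(2)  # assumption: two local worker processes
timing <- system.time({
  pieces <- parLapply(cl, groups, function(g) {
    Sys.sleep(3)  # stand-in for slow per-group work
    data.frame(b = g$b[1], c = rep(max(g$a), times = max(g$a)))
  })
})
stopCluster(cl)

# Combine the per-group results into one data frame
result <- do.call(rbind, pieces)
print(timing)
```

With one group per worker, the two 3-second sleeps can overlap, so the elapsed time should come out close to 3 seconds instead of 6. That is the behaviour I expected from do() after init_cluster().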

cboettig

Recommended answer