Problem description
I'm trying to figure out how to run the dplyr::do function in parallel. After reading some of the docs, it seems that dplyr::init_cluster() should be sufficient to tell do() to run in parallel. Unfortunately, this doesn't seem to be the case when I test it:
library(dplyr)
test <- data_frame(a = 1:3, b = letters[c(1:2, 1)])
init_cluster()
system.time({
  test %>%
    group_by(b) %>%
    do({
      Sys.sleep(3)
      data_frame(c = rep(max(.$a), times = max(.$a)))
    })
})
stop_cluster()
This gives the following output:
Initialising 2 core cluster.
|==========================================================================|100% ~0 s remaining
user system elapsed
0.03 0.00 6.03
I would expect the elapsed time to be about 3 seconds if the do() call were split between the two cores. I can also confirm that it is not split by adding a print to the do() block: the output appears in the main R terminal. What am I missing here?
I'm using dplyr 0.4.2 with R 3.2.1
Recommended answer
According to https://twitter.com/cboettig/status/588068454239830017 this feature does not seem to be currently supported.
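As a possible workaround, the per-group work can be distributed by hand with the base parallel package instead of relying on do(). The following is only a minimal sketch under stated assumptions, not something taken from the dplyr documentation: it uses a plain data.frame rather than data_frame so the workers do not need dplyr loaded, it assumes at least two cores are available, and the names groups, pieces, and result are illustrative.

```r
library(parallel)

# Same toy data as in the question, as a base data.frame.
test <- data.frame(a = 1:3, b = letters[c(1:2, 1)], stringsAsFactors = FALSE)

# One data frame per value of b (the equivalent of group_by(b)).
groups <- split(test, test$b)

# Start two worker processes (assumption: two cores are available).
cl <- makeCluster(2)

timing <- system.time({
  # Each group is sent to a worker; this replaces the body of do().
  pieces <- parLapply(cl, groups, function(g) {
    Sys.sleep(3)
    data.frame(b = g$b[1],
               c = rep(max(g$a), times = max(g$a)),
               stringsAsFactors = FALSE)
  })
})

stopCluster(cl)

# Combine the per-group results into a single data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(timing)
print(result)
```

With two groups and two workers, the elapsed time should be closer to 3 seconds than 6 if both workers really run concurrently. Note that anything printed inside the worker function will not appear in the main R terminal, because it runs in a separate process.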