R: Parallelizing multi-model learning (with dplyr and purrr)

Problem description
This is a follow-up to a previous question about learning multiple models.
The use case is that I have multiple observations for each subject, and I want to train a model for each subject. See Hadley's excellent presentation on how to do this.
In short, this can be done with dplyr and purrr like so:
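library(purrr)
library(dplyr)
library(fitdistrplus)
# dt: one row per observation, with columns subject_id and observation
dt %>%
  split(dt$subject_id) %>%                # one data frame per subject
  map(~ fitdist(.$observation, "norm"))  # fit a normal distribution to each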
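If fitdistrplus is not installed, the same split-then-fit pattern can be sketched in base R. The toy data and the helper `fit_normal()` are hypothetical. `fit_normal()` stands in for `fitdist(x, "norm")` by returning the closed-form normal maximum-likelihood estimates: the sample mean, and the standard deviation with denominator n. The sketch also makes clear that each fit uses only its own subject's rows, which is what makes the per-subject fits independent of one another.

```r
# Hypothetical toy data: column names follow the question
# ('subject_id', 'observation'); the values are simulated.
set.seed(42)
dt <- data.frame(
  subject_id  = rep(c("s1", "s2", "s3"), each = 100),
  observation = rnorm(300, mean = rep(c(0, 10, 20), each = 100), sd = 2)
)

# Hypothetical stand-in for fitdist(x, "norm"): the normal MLE is the
# sample mean and the standard deviation with denominator n (not n - 1).
fit_normal <- function(x) {
  mu <- mean(x)
  c(mean = mu, sd = sqrt(mean((x - mu)^2)))
}

# split() yields one data frame per subject. Each fit reads only its own
# subject's rows, so the fits are independent tasks.
by_subject <- split(dt, dt$subject_id)
fits <- lapply(by_subject, function(d) fit_normal(d$observation))

str(fits)
```

Replacing `lapply()` with a parallel map is then the only change needed to spread these fits across cores.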
Since model building is an embarrassingly parallel task, I was wondering whether dplyr or purrr has an easy-to-use parallelization mechanism for such tasks (like a parallel map).
If these libraries don't provide easy parallelization, could it be done using the classic R parallelization libraries (parallel, foreach, etc.)?
Solution
Just adding an answer here for completeness. To run this, you will need to install multidplyr from Hadley's repo; more info is in the vignette (https://github.com/hadley/multidplyr/blob/master/vignettes/multidplyr.md):
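# install first, e.g. devtools::install_github("hadley/multidplyr")
library(dplyr)
library(multidplyr)
library(purrr)
cluster <- create_cluster(4)                # start 4 worker processes
set_default_cluster(cluster)
cluster_library(cluster, "fitdistrplus")    # load fitdistrplus on every worker
# dt is a data frame; subject_id identifies the observations from each subject
by_subject <- partition(dt, subject_id)     # all rows of a subject go to the same worker
fits <- by_subject %>%
  do(fit = fitdist(.$observation, "norm"))  # fit one model per subject on the workers
collected_fits <- collect(fits)$fit         # bring the fitted models back to the main session
collected_summaries <- collected_fits %>% map(summary)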
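The question also asked about the classic parallel package. Here is a minimal sketch with `parallel::makeCluster()` and `parLapply()`, which ship with base R and need no GitHub install. The toy data and the `fit_normal()` helper are hypothetical stand-ins: `fit_normal()` returns the closed-form normal MLE in place of `fitdist(x, "norm")`, so the sketch runs with base R alone. The sketch follows the same steps as the multidplyr answer: start workers, split the data by subject, fit on the workers, and gather the results.

```r
library(parallel)

# Hypothetical toy data with the column names used in the question.
set.seed(1)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c", "d"), each = 50),
  observation = rnorm(200, mean = rep(c(0, 5, 10, 15), each = 50))
)

# Hypothetical stand-in for fitdist(x, "norm"): closed-form normal MLE.
fit_normal <- function(x) {
  mu <- mean(x)
  c(mean = mu, sd = sqrt(mean((x - mu)^2)))
}

# One numeric vector of observations per subject.
obs_by_subject <- split(dt$observation, dt$subject_id)

# PSOCK workers are separate R processes, so this also runs on Windows.
# To call the real fitdist() instead, load the package on every worker
# with clusterEvalQ(cl, library(fitdistrplus)).
cl <- makeCluster(2)
fits <- tryCatch(
  parLapply(cl, obs_by_subject, fit_normal),
  finally = stopCluster(cl)  # always shut the workers down
)

# Gather the per-subject estimates into one matrix (one row per subject).
fit_table <- do.call(rbind, fits)
print(fit_table)
```

On Unix-alikes, `parallel::mclapply(obs_by_subject, fit_normal, mc.cores = 2)` is a fork-based drop-in for `lapply()` that needs no explicit cluster. On Windows, `mclapply()` does not support `mc.cores` greater than 1.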