R - Parallelizing multiple model learning (with dplyr and purrr)


Problem description

This is a follow-up to a previous question about learning multiple models.

The use case is that I have multiple observations for each subject, and I want to train a model for each subject. See Hadley's excellent presentation on how to do this.

In short, this can be done using dplyr and purrr like so:

library(purrr)
library(dplyr)
library(fitdistrplus)
dt %>%
    split(dt$subject_id) %>%
    map( ~ fitdist(.$observation, "norm"))

Since the model building is an embarrassingly parallel task, I was wondering whether dplyr or purrr has an easy-to-use parallelization mechanism for such tasks (like a parallel map).

If these libraries don't provide easy parallelization, could it be done using the classic R parallelization libraries (parallel, foreach, etc.)?

Solution

Just adding an answer for completeness here. You will need to install multidplyr from Hadley's repo to run this; more information is in the vignette (https://github.com/hadley/multidplyr/blob/master/vignettes/multidplyr.md):

library(dplyr)
library(multidplyr)
library(purrr)

cluster <- create_cluster(4)
set_default_cluster(cluster)
cluster_library(cluster, "fitdistrplus")

# dt is a dataframe, subject_id identifies observations from each subject
by_subject <- partition(dt, subject_id)

fits <- by_subject %>%
    do(fit = fitdist(.$observation, "norm"))

collected_fits <- collect(fits)$fit
collected_summaries <- collected_fits %>% map(summary)
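
The question also asks whether the classic parallel package could do the job. Below is a minimal sketch of that route, not a definitive implementation. It assumes a simulated data frame dt with the same subject_id and observation columns as above; the data, the worker count of 2, and the names obs_by_subject and estimates are illustrative choices that do not come from the original post.

```r
library(parallel)
library(fitdistrplus)

# Hypothetical example data: 4 subjects with 50 observations each
set.seed(1)
dt <- data.frame(
  subject_id  = rep(1:4, each = 50),
  observation = rnorm(200, mean = 10, sd = 2)
)

# One numeric vector of observations per subject
obs_by_subject <- split(dt$observation, dt$subject_id)

# Start 2 worker processes and load fitdistrplus on each of them
cl <- makeCluster(2)
clusterEvalQ(cl, library(fitdistrplus))

# Fit a normal distribution to each subject's observations in parallel
fits <- parLapply(cl, obs_by_subject, function(x) fitdist(x, "norm"))

# Always release the workers when done
stopCluster(cl)

# Collect the estimated parameters (mean, sd), one row per subject
estimates <- t(sapply(fits, function(f) f$estimate))
```

On Unix-alikes, the fork-based mclapply(obs_by_subject, function(x) fitdist(x, "norm"), mc.cores = 2) is an alternative that needs no explicit cluster setup. For small per-subject fits like these, the cost of starting workers may outweigh the gain, so it is worth timing against the sequential map() version.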
