Problem description
I'd like to use MC-Stan on Spark, but Google doesn't seem to turn up any related pages.
I wonder whether this approach is even possible on Spark, so I would appreciate it if someone could let me know.
I also wonder what the widely used approach is for running MCMC on Spark. I've heard Scala is widely used, but I need a language that has a decent MCMC library such as MC-Stan.
Recommended answer
Yes, it's certainly possible, but it requires a bit more work. Stan, like the other popular MCMC tools I know of, is not designed to run in a distributed setting, whether via Spark or otherwise. In general, distributed MCMC is an area of active research; for a recent review, I'd recommend section 4 of Patterns of Scalable Bayesian Inference (PoFSBI). There are several ways you might split up a big MCMC computation, but I think one of the more straightforward ones is to partition the data and run an off-the-shelf tool like Stan, with the same model, on each partition. Each run produces a subposterior, and the subposteriors can then be combined in a reduce step to form an approximation of the full posterior. PoFSBI discusses several ways of combining such subposteriors.
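To make the splitting idea concrete, here is a short sketch of why it works, assuming (this is not stated above) that the observations are conditionally independent given the parameters θ and that the data x are split into S disjoint shards x_1, ..., x_S. Giving each shard a fractional prior p(θ)^{1/S} makes the product of the subposteriors proportional to the full posterior. The second line is the standard weighted-average consensus rule, where θ_{s,g} is the g-th draw from shard s and \hat{\Sigma}_s is the sample covariance of that shard's draws; it is exact when the subposteriors are Gaussian and only an approximation otherwise.

```latex
p(\theta \mid x) \;\propto\; p(\theta) \prod_{s=1}^{S} p(x_s \mid \theta)
  \;=\; \prod_{s=1}^{S} \underbrace{p(\theta)^{1/S}\, p(x_s \mid \theta)}_{\propto\; p_s(\theta \mid x_s)}

\theta_g \;=\; \Big( \sum_{s=1}^{S} W_s \Big)^{-1} \sum_{s=1}^{S} W_s\, \theta_{s,g},
  \qquad W_s = \hat{\Sigma}_s^{-1}, \quad g = 1, \dots, G
```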
I've put together a very rough proof of concept using pyspark and pystan (Python is the common language with the best Stan and Spark support). It's a limited implementation of the weighted-average consensus algorithm from PoFSBI, running on the tiny 8-schools dataset. I don't think this example is very useful in practice, but it should give some idea of what running Stan as a Spark program involves: partition the data, run Stan on each partition, and combine the subposteriors.
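The three steps can be sketched without a cluster. The following is a minimal, hypothetical pure-Python stand-in, not the pyspark/pystan proof of concept itself: instead of Stan it assumes a conjugate Normal model (x_i ~ Normal(θ, σ²), θ ~ Normal(0, 100)) so each subposterior can be sampled exactly, and it processes the shards with a plain list comprehension where a Spark version would use an RDD map. The function names `partition`, `sample_subposterior`, `consensus_combine` and `consensus_pipeline` are illustrative, and the combine step is the scalar form of the weighted-average consensus rule (inverse-variance weights).

```python
import random
import statistics
from typing import List, Sequence


def partition(data: Sequence[float], num_shards: int) -> List[List[float]]:
    """Split the data into num_shards disjoint shards (round-robin assignment)."""
    return [list(data[i::num_shards]) for i in range(num_shards)]


def sample_subposterior(shard: List[float], num_shards: int, num_draws: int,
                        sigma: float = 1.0, prior_mean: float = 0.0,
                        prior_var: float = 100.0, seed: int = 0) -> List[float]:
    """Hypothetical stand-in for running Stan on one shard.

    Assumed model: x_i ~ Normal(theta, sigma^2), theta ~ Normal(prior_mean, prior_var).
    The prior is raised to the power 1/num_shards (equivalently, its variance is
    multiplied by num_shards) so that the product of all subposteriors is
    proportional to the full posterior. The model is conjugate, so we draw
    exactly from the Normal subposterior instead of running MCMC.
    """
    rng = random.Random(seed)
    prior_prec = 1.0 / (prior_var * num_shards)
    data_prec = len(shard) / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(shard) / sigma ** 2)
    return [rng.gauss(post_mean, post_var ** 0.5) for _ in range(num_draws)]


def consensus_combine(subposterior_draws: List[List[float]]) -> List[float]:
    """Weighted-average consensus for a scalar parameter.

    Each shard's weight is the inverse of the sample variance of its draws;
    the g-th combined draw is the weighted average of the g-th draws of all
    shards. Assumes every shard returned the same number of draws.
    """
    weights = [1.0 / statistics.variance(draws) for draws in subposterior_draws]
    total = sum(weights)
    num_draws = len(subposterior_draws[0])
    return [
        sum(w * draws[g] for w, draws in zip(weights, subposterior_draws)) / total
        for g in range(num_draws)
    ]


def consensus_pipeline(data: Sequence[float], num_shards: int = 4,
                       num_draws: int = 2000) -> List[float]:
    """Partition the data, sample each subposterior, combine the draws."""
    shards = partition(data, num_shards)
    # In a pyspark version, this per-shard step is what would run on the
    # executors, e.g. sc.parallelize(...).map(...).collect(), with a real
    # Stan fit inside the mapped function.
    subposteriors = [
        sample_subposterior(shard, num_shards, num_draws, seed=shard_id)
        for shard_id, shard in enumerate(shards)
    ]
    return consensus_combine(subposteriors)
```

Because this toy model is Gaussian, the consensus draws should match the full-data posterior (mean near the sample mean, standard deviation near σ/√n). For non-Gaussian subposteriors, such as those from the hierarchical 8-schools model, the weighted average is only an approximation.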