PyMC3: does the way observations are passed to a model change the results?


Question

I'm trying to understand whether there is any meaningful difference between the ways of passing data into a model: either aggregated or as single trials. (Note that this is only a sensible question for certain distributions, e.g. Binomial.)

The task is to estimate p for a yes/no trial, using a simple model with a Binomial distribution.

What is the difference, if any, in the computation or results of the following models?

To illustrate what I mean I chose the two extremes: passing in one trial at a time (which reduces to Bernoulli), or passing in the sum of the entire series of trials. I am also interested in the cases between these extremes.

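import numpy as np
import pymc3 as pm
import scipy.stats

# set up constants
p_true = 0.1  # true success probability
N = 3000      # number of yes/no trials
observed = scipy.stats.bernoulli.rvs(p_true, size=N)  # array of 0/1 outcomes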

Model 1: combining all observations into a single data point

with pm.Model() as binomial_model1:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', N, p, observed=np.sum(observed))
    trace1 = pm.sample(40000)

Model 2: using each observation individually

with pm.Model() as binomial_model2:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', 1, p, observed=observed)
    trace2 = pm.sample(40000)

There isn't any noticeable difference in the traces or posteriors in this case. I tried digging into the pymc3 source code to see how the observations are processed, but couldn't find the right part.

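Possible answers I would expect:

  • pymc3 aggregates the observations under the hood for Binomial anyway, so there is no difference
  • the resulting posterior surface (which is explored during sampling) is identical in each case -> there is no meaningful/statistical difference between the two models
  • there are differences in the resulting statistics because of this and that...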

Recommended answer

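This is an interesting example! Your second suggestion is correct. The likelihood of the individual trials is p^k (1 - p)^(N - k) with k = sum(observed), and the likelihood of the aggregated count is the same expression multiplied by the binomial coefficient C(N, k), which does not depend on p. You can therefore work out the posterior analytically: with the Uniform(0, 1) prior, which is the same as Beta(1, 1), it is

Beta(sum(observed) + 1, N - sum(observed) + 1)

in either case.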
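The claim that both ways of passing the data lead to the same posterior can be checked numerically without PyMC3. The sketch below is an illustration under stated assumptions, not code from the question: it uses only the Python standard library, the helper names `log_binom_coeff` and `grouped_loglik` are hypothetical, and the data are simulated with a fixed seed. It first shows that grouping the trials (in batches of 100, or all 3000 at once) shifts the log-likelihood by an amount that does not depend on p, and then compares the posterior mean under a flat prior, computed on a grid, with the mean of Beta(k + 1, N - k + 1).

```python
import math
import random

def log_binom_coeff(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def grouped_loglik(p, data, batch_size):
    """Binomial log-likelihood with trials summed in batches of `batch_size`.

    batch_size=1 corresponds to Model 2, batch_size=len(data) to Model 1.
    """
    total = 0.0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        n, k = len(batch), sum(batch)
        total += log_binom_coeff(n, k) + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

# Simulated 0/1 data (assumption: same N and p_true as in the question).
random.seed(0)
N, p_true = 3000, 0.1
data = [1 if random.random() < p_true else 0 for _ in range(N)]
k = sum(data)

# 1) Relative to per-trial passing, each grouping shifts the log-likelihood
#    by an offset that is the same for every value of p.
p_grid = [0.02, 0.05, 0.1, 0.2, 0.5]
spreads = {}
for batch_size in (100, N):
    offsets = [grouped_loglik(p, data, batch_size) - grouped_loglik(p, data, 1)
               for p in p_grid]
    spreads[batch_size] = max(offsets) - min(offsets)
    print(f"batch_size={batch_size}: spread of offset across p = {spreads[batch_size]:.1e}")

# 2) Flat prior times the shared kernel p^k (1-p)^(N-k), normalised on a
#    midpoint grid, compared with the mean of Beta(k + 1, N - k + 1).
M = 20000
mids = [(i + 0.5) / M for i in range(M)]
log_post = [k * math.log(p) + (N - k) * math.log(1.0 - p) for p in mids]
peak = max(log_post)
weights = [math.exp(v - peak) for v in log_post]
grid_mean = sum(w * p for w, p in zip(weights, mids)) / sum(weights)
beta_mean = (k + 1) / (N + 2)
print(f"grid posterior mean = {grid_mean:.6f}, Beta mean = {beta_mean:.6f}")
```

Because a sampler only needs the posterior up to a constant factor, a p-independent offset in the log-likelihood leaves the sampled distribution unchanged, which is consistent with the indistinguishable traces reported in the question.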

The difference in modelling approach would show up if you used, for example, pm.sample_ppc (named pm.sample_posterior_predictive in later PyMC3 releases): for the first model each posterior predictive draw is a single count distributed according to Binomial(N, p), whereas for the second it is N separate draws from Binomial(1, p).
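The shape difference in the posterior predictive can be mimicked in plain Python. This is a rough sketch under assumptions, not PyMC3's own output: the observed total `k = 300` is a hypothetical value, the posterior draws come from the closed-form Beta posterior under a flat prior rather than from a trace, and `draw_binomial` is a helper defined here for illustration.

```python
import random

random.seed(1)
N = 3000       # number of trials, as in the question
k = 300        # hypothetical observed number of successes
n_draws = 200  # number of posterior draws used for the predictive

def draw_binomial(n, p):
    """One Binomial(n, p) draw, built from n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if random.random() < p)

# Posterior draws of p under a flat prior: Beta(k + 1, N - k + 1).
posterior_p = [random.betavariate(k + 1, N - k + 1) for _ in range(n_draws)]

# Model 1 style: one aggregated count per posterior draw.
ppc_model1 = [draw_binomial(N, p) for p in posterior_p]

# Model 2 style: N separate 0/1 outcomes per posterior draw.
ppc_model2 = [[1 if random.random() < p else 0 for _ in range(N)]
              for p in posterior_p]

print("model 1 predictive:", len(ppc_model1), "counts")
print("model 2 predictive:", len(ppc_model2), "x", len(ppc_model2[0]), "outcomes")

# Summing each row of the Model 2 predictive gives counts comparable to Model 1.
mean1 = sum(ppc_model1) / n_draws
mean2 = sum(sum(row) for row in ppc_model2) / n_draws
print(f"mean count, model 1: {mean1:.1f}; mean row sum, model 2: {mean2:.1f}")
```

The two predictives carry the same information about the total count; the per-trial form simply keeps the individual outcomes, which only matters if something downstream looks at them one by one.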

