Extracting sigma values from a Stan distributional linear model and adding them to a data frame

Problem description

Given the sample data sampleDT and the brms models brm.fit and brm.fit.distr below, I would like to extract the estimated sigma values from each model and add them to sampleDT as a new column.

I can do this using brm.fit, but my approach fails when I use brm.fit.distr.

Sample data

sampleDT<-structure(list(id = 1:10, N = c(10L, 10L, 10L, 10L, 10L, 10L, 
    10L, 10L, 10L, 10L), A = c(62L, 96L, 17L, 41L, 212L, 143L, 143L, 
    143L, 73L, 73L), B = c(3L, 1L, 0L, 2L, 170L, 21L, 0L, 33L, 62L, 
    17L), C = c(0.05, 0.01, 0, 0.05, 0.8, 0.15, 0, 0.23, 0.85, 0.23
    ), employer = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L), F = c(0L, 
    0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), G = c(1.94, 1.19, 1.16, 
    1.16, 1.13, 1.13, 1.13, 1.13, 1.12, 1.12), H = c(0.14, 0.24, 
    0.28, 0.28, 0.21, 0.12, 0.17, 0.07, 0.14, 0.12), dollar.wage_1 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_2 = c(1.93, 
    1.18, 3.15, 3.15, 1.12, 1.12, 2.12, 1.12, 1.11, 1.11), dollar.wage_3 = c(1.95, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.13, 1.13), dollar.wage_4 = c(1.94, 
    1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_5 = c(1.94, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_6 = c(1.94, 
    1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_7 = c(1.94, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_8 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_9 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_10 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12)), row.names = c(NA, 
    -10L), class = "data.frame")

My models

library(brms)

brm.fit <-brm(dollar.wage_1 ~ A + B + C + employer + F + G + H,
            data=sampleDT, iter = 4000, family = gaussian())

brm.fit.distr <-brm(bf(dollar.wage_1 ~ A + B + C + employer + F + G + H, 
                      sigma ~ A + B + C + employer + F + G + H),
                      data=sampleDT, iter = 4000, family = gaussian())

My approach for brm.fit and my attempt for brm.fit.distr

sampleDT$sd_brm_fit <- summary(brm.fit)$spec_pars[1]  # this works
sampleDT$sd_brm_fit_distr <- summary(brm.fit.distr)$spec_pars[1]  # this does not work

Thanks in advance for any help.

Recommended answer

As expected with Bayesian models, there are several ways to look at the extent of uncertainty. First, we no longer have a single parameter sigma; instead, sigma has its own regression (on the log scale by default), and its coefficients appear in

summary(brm.fit.distr)$fixed

in particular:

exp(summary(brm.fit.distr)$fixed[, 1])[grep("sigma", rownames(summary(brm.fit.distr)$fixed))]
# sigma_Intercept         sigma_A         sigma_B         sigma_C  sigma_employer 
#      1.17043390      0.99913160      1.01382623      0.28655150      1.06713923 
#         sigma_F         sigma_G         sigma_H 
#      0.50428952      0.87669186      0.01203015 

where I use exp because the sigma coefficients are on the log scale; exponentiating maps them to positive, multiplicative effects on sigma.
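To make the role of exp concrete, here is a minimal base R sketch that turns the exponentiated coefficients printed above into one sigma value per row. It assumes the default log link for sigma, uses the rounded numbers from the output rather than the fitted object, and copies only rows 3 and 5 of sampleDT by hand. The column name sigma_plugin is made up for illustration. Note that plugging posterior means into exp() is only an approximation and is not the posterior mean of sigma itself.

```r
# Exponentiated coefficients as printed above; log() recovers the
# log-scale posterior means of the sigma sub-model (rounded values).
exp_coef <- c(Intercept = 1.17043390, A = 0.99913160, B = 1.01382623,
              C = 0.28655150, employer = 1.06713923, F = 0.50428952,
              G = 0.87669186, H = 0.01203015)
beta <- log(exp_coef)

# Predictor values of rows 3 and 5 of sampleDT, copied by hand.
rows <- data.frame(A = c(17, 212), B = c(0, 170), C = c(0, 0.8),
                   employer = c(0, 0), F = c(0, 0), G = c(1.16, 1.13),
                   H = c(0.28, 0.21), row.names = c("3", "5"))

# Linear predictor for log(sigma), then back-transform with exp().
X <- cbind(Intercept = 1, as.matrix(rows))
log_sigma <- drop(X %*% beta[colnames(X)])

# Hypothetical column name; a plug-in approximation only.
rows$sigma_plugin <- exp(log_sigma)
print(rows[, c("A", "B", "sigma_plugin")])
```

Under these assumptions the row with the large A and B values gets the larger sigma, because the positive coefficient on B dominates the linear predictor.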

Now, as an aggregate measure of uncertainty, we may look at

predict(brm.fit.distr)[, 2]

Note that these values are random (!): they are computed from posterior predictive draws, so they change from run to run. In some cases they are quite large:

predict(brm.fit.distr)[, 2]
#  [1]  34.620936   4.456770   2.837869   1.727396 107.116980   2.238100   2.350523   3.037880
#  [9]   6.266055   2.517457

But we have, for example,

sampleDT[5, 1:5]
#   id  N   A   B   C
# 5  5 10 212 170 0.8

so the values of A and B are very large for that row. Similarly, you could look at

predict(brm.fit)[, 2]
# [1] 5.203937 4.846928 4.960600 4.827138 4.937323 4.625976 5.122794 4.767257 4.862458 4.219394

which is also random.
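If the goal is one sigma value per row in the data frame, one option is to summarise the sigma sub-model directly. To my knowledge, fitted() in brms accepts a dpar argument, so fitted(brm.fit.distr, dpar = "sigma") should return a matrix with columns such as Estimate and Est.Error, one row per observation. Unlike the Est.Error from predict(), this does not involve fresh predictive draws. The helper below, attach_sigma, is a hypothetical name and not part of brms. It only copies such a matrix into the data frame, and the demo uses made-up numbers so the sketch runs without Stan.

```r
# Hypothetical helper (not part of brms): copy per-row sigma summaries
# into a data frame as two new columns.
attach_sigma <- function(data, sigma_summary, prefix = "sigma") {
  stopifnot(nrow(sigma_summary) == nrow(data),
            all(c("Estimate", "Est.Error") %in% colnames(sigma_summary)))
  data[[paste0(prefix, "_est")]] <- sigma_summary[, "Estimate"]
  data[[paste0(prefix, "_err")]] <- sigma_summary[, "Est.Error"]
  data
}

# Stand-in for the matrix assumed to come from
# fitted(brm.fit.distr, dpar = "sigma"); the numbers are made up.
fake_summary <- cbind(Estimate = c(0.3, 1.2), Est.Error = c(0.1, 0.4),
                      Q2.5 = c(0.15, 0.6), Q97.5 = c(0.5, 2.1))
demo <- attach_sigma(data.frame(id = 1:2), fake_summary)
print(demo)

# With the fitted models from the question (requires brms and Stan):
# sampleDT <- attach_sigma(sampleDT,
#                          fitted(brm.fit.distr, dpar = "sigma"),
#                          prefix = "sd_brm_fit_distr")
```

Keeping the helper separate from the brms call means the same function can attach any per-row summary matrix with those column names.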

