I have a question about working with data in a data.frame.
Essentially, I have a large dataset; here is a simplified version:
dat <- structure(list(nm_mean = c(194213914.326, 194213914.326, 194213914.326,
194213914.326, 194213914.326, 217947112.739), nm_se = c(9984735.05918367,
9984735.05918367, 9984735.05918367, 9984735.05918367, 9984735.05918367,
11010386.0760204), alpha = c(193.197697846336, 214.592588477741,
240.246557258741, 258.116959355425, 282.560024775668, 306.610038660465
), beta = c(61526.2664158025, 57950.9563448233, 56085.1512614369,
52919.4794239927, 51483.4591654126, 50405.8186695088)), .Names = c("nm_mean",
"nm_se", "alpha", "beta"), row.names = c(NA, 6L), class = "data.frame")
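I want to use rbeta to generate probabilities from a beta distribution, with alpha and beta as the shape parameters.
Likewise, I want to use rnorm to generate random numbers from a normal distribution, with nm_mean as the mean and nm_se as the sd.
Then I want to multiply the rbeta values by the rnorm values and write the 50th, 25th, and 97.5th percentiles of the product back into the data frame (the upper limit uses 0.975 in the code).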
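The three steps for a single row can be wrapped in a small function. This is a sketch: the name `sim_product_quantiles` and its arguments are hypothetical (not from the question or answer), the upper quantile defaults to 0.975 to match the code in this thread, and the seed is arbitrary.

```r
# Hypothetical helper: simulate n draws of Beta(alpha, beta) * Normal(mean, sd)
# for one row and return the requested quantiles of the product.
sim_product_quantiles <- function(alpha, beta, mean, sd, n = 1000,
                                  probs = c(0.5, 0.25, 0.975)) {
  x <- rbeta(n, shape1 = alpha, shape2 = beta)  # probabilities in (0, 1)
  y <- rnorm(n, mean = mean, sd = sd)           # normal draws
  quantile(x * y, probs = probs, names = FALSE) # quantiles of the product
}

set.seed(1)  # arbitrary seed so the draws are reproducible
q1 <- sim_product_quantiles(alpha = 193.1977, beta = 61526.27,
                            mean = 194213914, sd = 9984735)
q1  # median, 25th percentile, 97.5th percentile for row 1
```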
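Taking row 1 as an example:
x <- rbeta(1000, 193.1977, 61526.27)   # beta draws (alpha, beta of row 1)
y <- rnorm(1000, 194213914, 9984735)   # normal draws (nm_mean, nm_se of row 1)
z <- x * y                             # product of the two samples
dat$ce[1] <- quantile(z, 0.5)          # median
dat$ll[1] <- quantile(z, 0.25)         # lower limit
dat$ul[1] <- quantile(z, 0.975)        # upper limit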
In short, I want to append the ce, ll, and ul of the product of the rbeta and rnorm draws to each row of the data frame.
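An analytic target can help sanity-check whichever simulation is used. Assuming the beta and normal draws are independent, the mean of the product factors as E[XY] = E[X]E[Y] = alpha / (alpha + beta) * nm_mean. A minimal sketch for the six example rows, with the values copied from the data above:

```r
# Parameters copied from the example data frame
nm_mean <- c(194213914.326, 194213914.326, 194213914.326,
             194213914.326, 194213914.326, 217947112.739)
alpha <- c(193.197697846336, 214.592588477741, 240.246557258741,
           258.116959355425, 282.560024775668, 306.610038660465)
beta  <- c(61526.2664158025, 57950.9563448233, 56085.1512614369,
           52919.4794239927, 51483.4591654126, 50405.8186695088)

# Mean of Beta(alpha, beta) is alpha / (alpha + beta);
# with independent draws, E[X * Y] = E[X] * E[Y]
analytic_mean <- alpha / (alpha + beta) * nm_mean
round(analytic_mean)
```

The simulated medians (ce) in the answer's output table land within about 1% of these means. The median and the mean of the product need not coincide, so a small gap is expected.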
Best answer
Here is a vectorized solution based on my conversation with @thelatemail:
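n <- 1000          # draws per row
grp <- nrow(dat)   # number of rows (parameter sets)
# rep(..., each = n) gives each row's parameters one contiguous block of n draws;
# without it, R recycles the columns cyclically and each block mixes rows
z <- with(dat, rnorm(grp * n, rep(nm_mean, each = n), rep(nm_se, each = n)) *
               rbeta(grp * n, rep(alpha, each = n), rep(beta, each = n)))
for (i in seq_len(grp)) {
  idx <- ((i - 1) * n + 1):(i * n)   # positions of row i's draws in z
  dat$ce[i] <- quantile(z[idx], 0.5)
  dat$ll[i] <- quantile(z[idx], 0.25)
  dat$ul[i] <- quantile(z[idx], 0.975)
}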
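The draw layout matters because R's random generators recycle vector-valued parameters cyclically. `rnorm(grp * n, nm_mean, nm_se)` with one mean per row therefore yields consecutive draws that alternate between rows, so cutting `z` into contiguous blocks of `n` would mix rows. The toy sketch below uses three hypothetical rows and `sd = 0` so that each draw equals its mean and the layout is visible:

```r
means <- c(10, 20, 30)  # hypothetical per-row means
n <- 4                  # draws per row

# Cyclic recycling: draw k uses means[(k - 1) %% 3 + 1]
cyclic <- rnorm(length(means) * n, mean = means, sd = 0)

# Blocked layout: draws 1..n use row 1, n+1..2n use row 2, and so on
blocked <- rnorm(length(means) * n, mean = rep(means, each = n), sd = 0)

cyclic[1:n]   # 10 20 30 10  (first block mixes rows 1, 2, 3)
blocked[1:n]  # 10 10 10 10  (first block belongs to row 1 only)
```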
A less vectorized alternative, looping over the rows:
for (i in seq_len(nrow(dat))) {
  # 1000 draws for row i, using that row's parameters
  x <- rbeta(1000, shape1 = dat$alpha[i], shape2 = dat$beta[i])
  y <- rnorm(1000, mean = dat$nm_mean[i], sd = dat$nm_se[i])
  z <- x * y
  dat$ce[i] <- quantile(z, 0.5)
  dat$ll[i] <- quantile(z, 0.25)
  dat$ul[i] <- quantile(z, 0.975)
}
dat
nm_mean nm_se alpha beta ce ll ul
1 194213914 9984735 193.1977 61526.27 607563.9 573229.9 713057.2
2 194213914 9984735 214.5926 57950.96 712268.5 674826.3 836950.8
3 194213914 9984735 240.2466 56085.15 823322.9 777482.8 981156.7
4 194213914 9984735 258.1170 52919.48 937331.2 884945.0 1095876.3
5 194213914 9984735 282.5600 51483.46 1059980.4 1003596.4 1225615.6
6 217947113 11010386 306.6100 50405.82 1316733.1 1250190.1 1515185.0
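The per-row loop can also be written with `mapply`, which walks the four parameter columns in parallel and returns one column of quantiles per row. This is a minimal sketch that assumes the same data as the question (rebuilt with `data.frame()` so the block runs on its own) and an arbitrary seed. Because the draws are random, its numbers will differ slightly from the table above.

```r
# Example data from the question
dat <- data.frame(
  nm_mean = c(194213914.326, 194213914.326, 194213914.326,
              194213914.326, 194213914.326, 217947112.739),
  nm_se   = c(9984735.05918367, 9984735.05918367, 9984735.05918367,
              9984735.05918367, 9984735.05918367, 11010386.0760204),
  alpha   = c(193.197697846336, 214.592588477741, 240.246557258741,
              258.116959355425, 282.560024775668, 306.610038660465),
  beta    = c(61526.2664158025, 57950.9563448233, 56085.1512614369,
              52919.4794239927, 51483.4591654126, 50405.8186695088)
)

set.seed(42)  # arbitrary seed for reproducibility
n <- 1000
probs <- c(ce = 0.5, ll = 0.25, ul = 0.975)

# mapply calls the function once per row; each call returns 3 quantiles,
# so the result is simplified to a 3 x nrow(dat) matrix
q <- mapply(function(m, s, a, b) {
  z <- rbeta(n, a, b) * rnorm(n, m, s)
  quantile(z, probs = probs, names = FALSE)
}, dat$nm_mean, dat$nm_se, dat$alpha, dat$beta)

# Transpose so each row of quantiles lines up with a row of dat
dat[names(probs)] <- as.data.frame(t(q))
dat
```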