Drawing a huge sample (1e09) from a multinomial distribution with sample
Problem description
I would like to sample from a multinomial distribution. I would do this by using sample and specifying some probabilities. E.g.: I have 3 categories, and I want to sample 100 times.
> my_prob <- c(0.2, 0.3, 0.5)
> x <- sample(0:2, 100, replace = TRUE, prob = my_prob)
> head(x)
[1] 2 0 2 1 1 2
My setting differs in only one respect: I want to draw a very large number of samples (e.g. 1e09), and I am actually only interested in the frequency of each category. So in the example above, this would mean:
> table(x)
x
0 1 2
27 29 44
Does anybody have an idea how to compute this as efficiently as possible?
Thanks, Steffi
Recommended answer
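You need rmultinom, which returns the count for each category directly instead of generating every individual observation.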
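Applied to the setting in the question, a single experiment (n = 1) with size = 1e9 yields the three category counts in one call. This is a minimal sketch reusing the question's my_prob; the variable name counts is my own choice, and the actual counts will differ from run to run.

```r
my_prob <- c(0.2, 0.3, 0.5)

# One experiment (n = 1) of 1e9 draws: the result is a 3 x 1 integer matrix
# whose single column holds the counts for categories 0, 1 and 2.
counts <- rmultinom(n = 1, size = 1e9, prob = my_prob)

counts[, 1]          # category counts; they always sum to exactly 1e9
counts[, 1] / 1e9    # empirical frequencies, close to my_prob
```

Memory use stays at three integers regardless of size, whereas table(sample(...)) would first have to materialise all 1e9 individual draws.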
my_prob <- c(0.2, 0.3, 0.5)
number_of_experiments <- 10
number_of_samples <- 100
# Each column of the result is one experiment: the counts of the 3 categories in 100 draws
experiments <- rmultinom(n = number_of_experiments, size = number_of_samples, prob = my_prob)
experiments
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   14   18   15   19   14   17   23   18   24    15
[2,]   33   34   36   30   40   30   27   38   24    30
[3,]   53   48   49   51   46   53   50   44   52    55
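This is cheap because the counts can be generated with one binomial draw per category rather than one draw per observation. As I understand the ?rmultinom help page, R draws the components sequentially from conditional binomials. The function below is a hypothetical re-implementation of that idea (rmultinom_conditional is my own name, not part of R), meant only to illustrate the mechanism; in practice you would still call rmultinom.

```r
# Hypothetical helper: one vector of multinomial counts built from K - 1
# conditional binomial draws, so the work does not grow with size.
rmultinom_conditional <- function(size, prob) {
  p <- prob / sum(prob)            # normalise the probabilities
  k <- length(p)
  counts <- integer(k)
  remaining_n <- size              # draws not yet assigned to a category
  remaining_p <- 1                 # probability mass not yet used
  for (j in seq_len(k - 1)) {
    if (remaining_n == 0) break    # everything already assigned
    # Given the earlier categories, category j is Bin(remaining_n, p[j] / remaining_p)
    pj <- if (remaining_p > 0) min(1, p[j] / remaining_p) else 0
    counts[j] <- rbinom(1, remaining_n, pj)
    remaining_n <- remaining_n - counts[j]
    remaining_p <- remaining_p - p[j]
  }
  counts[k] <- remaining_n         # the last category receives whatever is left
  counts
}

set.seed(1)
counts <- rmultinom_conditional(1e9, c(0.2, 0.3, 0.5))
counts / sum(counts)               # close to 0.2, 0.3, 0.5
```

The loop runs K - 1 times, so the cost depends on the number of categories and not on the sample size.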