Drawing a huge sample (1e09) from a multinomial distribution with sample

Problem description

I would like to sample from a multinomial distribution. I would do this by using sample and specifying some probabilities. For example, I have 3 categories and I want to sample 100 times:

> my_prob <- c(0.2, 0.3, 0.5)
> x <- sample(0:2, 100, replace = TRUE, prob = my_prob)
> head(x)
[1] 2 0 2 1 1 2

My actual setting differs in only one respect: I want to sample a very large number of values (e.g. 1e09), and I am only interested in the frequency of each category. For the example above, this would mean:

> table(x)
x
 0  1  2
27 29 44

Does anybody have an idea how to compute this as efficiently as possible?

Thanks, Steffi

Recommended answer

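You want rmultinom, which returns the count for each category directly instead of the individual draws. Each column of the result is one experiment: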

my_prob <- c(0.2,0.3,0.5)
number_of_experiments <- 10
number_of_samples <- 100
experiments <- rmultinom(n=number_of_experiments, size=number_of_samples, prob=my_prob)
experiments

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   14   18   15   19   14   17   23   18   24    15
[2,]   33   34   36   30   40   30   27   38   24    30
[3,]   53   48   49   51   46   53   50   44   52    55
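
For the 1e09 case from the question, the same call can be used with a single experiment. The following is a minimal sketch, assuming only the per-category counts are needed; the name n_draws and the seed are illustrative and not part of the original answer. Because rmultinom generates the counts themselves, no vector of 1e09 individual draws should need to be held in memory.

```r
# Sketch: one multinomial experiment with 1e9 draws, assuming only the
# per-category counts are needed (as in the question).
set.seed(42)                      # arbitrary seed, only for reproducibility
my_prob <- c(0.2, 0.3, 0.5)
n_draws <- 1e9                    # illustrative name; must not exceed .Machine$integer.max

# n = 1 gives a 3 x 1 matrix; take its only column as a plain vector
counts <- rmultinom(n = 1, size = n_draws, prob = my_prob)[, 1]
names(counts) <- 0:2              # label the categories as in the question

counts                            # frequency of each category
counts / n_draws                  # empirical proportions, expected to be close to my_prob
```

The named vector counts plays the role of table(x) in the question. If more draws than .Machine$integer.max are needed, one option would be to run several such experiments (n > 1) and add up the columns as doubles.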

