Problem description
I am asked to "simulate x as an independent identically distributed (iid) normal variable with mean=0, std=1.5 with sample length 500".
I am doing the sampling in the following two ways:
set.seed(8402)
X <- rnorm(500, 0, 1.5)
head(X)
I get
-1.8297969 -0.1862884 1.4219400 -1.0841421 -1.5276701 1.6159368
However, if I do this instead:
X <- replicate(500, rnorm(1,0,1.5))
head(X)
I get
-0.04032755 0.92002552 -2.28001943 -1.36840869 1.49820718 0.06205003
My question is: what is the right way to generate an iid normal variable? What is the difference between these two ways?
Thanks a lot!
Recommended answer
R internals
Internally in R, the C function double rnorm(double mean, double sd) from <Rmath.h> generates one random number at a time. When you call its R wrapper function rnorm(n, mean, sd), it calls the C-level function n times.
This is the same as calling the R-level function with n = 1 and repeating that call n times using replicate.
The first method is much faster (the difference becomes noticeable when n is really large), as everything is done at the C level. replicate, however, is a wrapper of sapply, so it is not really a vectorized function (see the question Is the "*apply" family really not vectorized?).
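The speed claim above can be checked with a rough timing sketch. The sample size n = 1e5 and the variable names are assumptions picked for illustration, and the measured times will vary by machine, so no particular numbers are shown.

```r
# Hypothetical sample size, chosen only to make the gap visible
n <- 1e5

# One vectorized call: the loop over n happens at the C level
t_vectorized <- system.time(x_vec <- rnorm(n, mean = 0, sd = 1.5))

# n separate R-level calls, each returning a single number
t_replicated <- system.time(x_rep <- replicate(n, rnorm(1, mean = 0, sd = 1.5)))

# Elapsed times in seconds; exact values depend on the machine
print(t_vectorized["elapsed"])
print(t_replicated["elapsed"])
```

Both calls return a numeric vector of length n; only the cost of producing it differs.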
In addition, if you set the same random seed before each of the two methods, you are going to get the same set of random numbers. The outputs in the question differ only because the seed was not reset before the replicate call.
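A minimal sketch of this point, reusing the seed and parameters from the question and assuming R's default random number generator settings. The names x_vec and x_rep are introduced here for illustration.

```r
# Method 1: one vectorized call
set.seed(8402)
x_vec <- rnorm(500, mean = 0, sd = 1.5)

# Method 2: reset to the same seed, then draw one value at a time
set.seed(8402)
x_rep <- replicate(500, rnorm(1, mean = 0, sd = 1.5))

# With the seed reset, both methods should yield the same 500 numbers
print(identical(x_vec, x_rep))
print(head(x_vec))
```

So in terms of the values produced, either method is a valid way to draw the iid sample; the difference is in speed, not in the distribution.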
A more illustrative experiment
In my comment below, I say that the random seed is only set once on entry. To help people understand this, I provide this example. There is no need to use a large n; n = 4 is sufficient.
First, let's set the seed to 0 and generate 4 standard normal samples:
set.seed(0); rnorm(4, 0, 1)
## we get
[1] 1.2629543 -0.3262334 1.3297993 1.2724293
Note that in this case, all 4 numbers are obtained from the entry seed 0.
Now, let's do this:
set.seed(0)
rnorm(2, 0, 1)
## we get
[1] 1.2629543 -0.3262334
## do not reset seed, but continue with the previous seed
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.329799 1.272429
See? Taken together, the two calls reproduce the same 4 numbers as the single call with n = 4.
But if we reset the seed in the middle, for example, set it back to 0:
set.seed(0)
rnorm(2, 0, 1)
## we get
[1] 1.2629543 -0.3262334
## reset seed
set.seed(0)
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.2629543 -0.3262334
This is what I mean by "entry".
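The same idea can be seen by looking at the generator state directly. The sketch below assumes the default generator and uses .Random.seed, the variable in the global environment where R stores that state; the other names are made up for illustration.

```r
set.seed(0)
# Snapshot of the generator state right after seeding ("on entry")
state_on_entry <- get(".Random.seed", envir = globalenv())

first_two <- rnorm(2, 0, 1)
state_after <- get(".Random.seed", envir = globalenv())

# Drawing numbers advances the state, so the two snapshots differ
print(identical(state_on_entry, state_after))

# Without a reset, later draws simply continue from the advanced state
next_two <- replicate(2, rnorm(1, 0, 1))

# Restoring the entry state has the same effect as calling set.seed(0) again
assign(".Random.seed", state_on_entry, envir = globalenv())
again <- rnorm(2, 0, 1)
print(identical(first_two, again))
```

Seeding only fixes the starting state; every draw afterwards, whether made by one rnorm call or by many, moves along the same stream.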