Problem description
I came across a strange result while doing my homework in R. Can anyone explain what's going on?
The instructions told me to set the seed to 1 for consistency.
At first, I called set.seed(1) twice:
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
set.seed(1)
epsilon <- rnorm(100, mean = 0, sd = 0.25)
y <- 0.5 * x + epsilon -1
plot(x,y,main = "Scatter plot between X and Y", xlab = "X", ylab = "Y")
I get a scatter plot like this: [Figure: scatter plot produced with two set.seed(1) calls]
After I call set.seed(1) only once, the code is:
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
epsilon <- rnorm(100, mean = 0, sd = 0.25)
y <- 0.5 * x + epsilon -1
plot(x,y,main = "Scatter plot between X and Y", xlab = "X", ylab = "Y")
The plot now looks reasonable: [Figure: scatter plot produced with one set.seed(1) call]
Can anyone explain why adding an extra set.seed(1) makes the two results different?
Recommended answer
set.seed() determines the random numbers that will be generated afterwards. It is generally used to create reproducible examples, so that if we both run the same code, we get the same results. To illustrate:
set.seed(1234)
runif(3)
[1] 0.1137034 0.6222994 0.6092747
set.seed(1234)
runif(3)
[1] 0.1137034 0.6222994 0.6092747
set.seed(12345)
runif(3)
[1] 0.7209039 0.8757732 0.7609823
So, as you can see, when you call set.seed(x) twice with the same number, you generate the same random numbers from that point on (for variables with the same distribution; for others, see the elaboration below). So the reason you get a straight line in the first plot is that
y <- 0.5 * x + epsilon -1
effectively becomes
y <- 0.5 * x + x -1
because you are using the same sequence of random numbers twice. That reduces to
y <- 1.5 * x -1
And that is a simple linear equation.
So, in general, you should call set.seed(x) only once, at the beginning of your script.
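To see the difference concretely, the following sketch runs the question's setup both ways and compares epsilon with x. It uses only base R (`set.seed`, `rnorm`, `identical`, `all.equal`, `cor`). The suffixed variable names (`x_two`, `eps_one`, and so on) are chosen for illustration only.

```r
# Two calls to set.seed(1): epsilon replays the same stream that produced x
set.seed(1)
x_two   <- rnorm(100, mean = 0, sd = 1)
set.seed(1)
eps_two <- rnorm(100, mean = 0, sd = 0.25)

# One call to set.seed(1): epsilon continues the stream after x
set.seed(1)
x_one   <- rnorm(100, mean = 0, sd = 1)
eps_one <- rnorm(100, mean = 0, sd = 0.25)

identical(x_two, x_one)                    # TRUE: x is the same either way
isTRUE(all.equal(eps_two, 0.25 * x_two))   # TRUE: the "noise" is just a rescaled x
isTRUE(all.equal(eps_one, 0.25 * x_one))   # FALSE: the noise consists of fresh draws
cor(x_one, eps_one)                        # near 0, as expected for independent draws
```

With a single seed at the top, x and epsilon come from consecutive parts of one reproducible stream. The whole script is therefore still reproducible, and the two variables are not tied to each other.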
Elaboration on a comment: "But I generated epsilon with a different sd. Although the plot seems to match the explanation, why is it still the same x?"
That's actually a really interesting question. Random numbers with distribution ~N(mean, sd) are usually generated as follows:
- Random uniform numbers are generated.
- A transformation is applied to these numbers to turn them into standard normal numbers; let's call these numbers X. (R's default method is inversion, i.e. applying the standard normal quantile function to the uniforms; the Box-Muller transformation is another common method.)
- These numbers are transformed once more by applying the transformation sd * X + mean.
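The three steps can be sketched in R with a hypothetical helper, `my_rnorm()`. It uses inversion, via `qnorm()`, for the second step. This is a simplified illustration and not R's internal implementation. R's built-in generator differs in detail, so the values will not match `rnorm()` output. The sketch only shows the structure: the seed fixes the uniforms, and mean and sd enter only at the end.

```r
# Hypothetical helper: a simplified three-step normal generator (inversion method).
# NOT R's internal rnorm(); its values differ from rnorm() for the same seed.
my_rnorm <- function(n, mean = 0, sd = 1) {
  u <- runif(n)    # step 1: uniform random numbers in (0, 1)
  z <- qnorm(u)    # step 2: standard normal numbers X via the quantile function
  sd * z + mean    # step 3: scale by sd and shift by mean
}

set.seed(1)
a <- my_rnorm(4, mean = 0, sd = 1)
set.seed(1)
b <- my_rnorm(4, mean = 0, sd = 0.25)

print(b / a)   # every ratio is 0.25: same uniforms, same X, different sd
```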
When you run this twice with the same seed but a different mean and sd, the first two steps produce exactly the same results, since the uniform random numbers are the same and the mean and sd are not used yet. Only in the third step do the mean and sd come into play. We can easily verify this:
set.seed(1)
rnorm(4, mean = 0, sd = 1)
[1] -0.6264538 0.1836433 -0.8356286 1.5952808
set.seed(1)
rnorm(4, mean = 0, sd = 0.25)
[1] -0.15661345 0.04591083 -0.20890715 0.39882020
Indeed, the random numbers generated the second time are exactly 0.25 times the numbers generated the first time.
So, in my explanation above, epsilon is actually 0.25 * x, and your resulting function is y <- 0.75 * x - 1, which is still just a linear function.
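This conclusion can be checked numerically with base R's `all.equal()` and `lm()`. The sketch reproduces the question's two-seed code. It then confirms that y coincides with 0.75 * x - 1 up to floating-point tolerance, and that a linear fit recovers intercept -1 and slope 0.75 with essentially zero residuals.

```r
# The question's first version: set.seed(1) called twice
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
set.seed(1)
epsilon <- rnorm(100, mean = 0, sd = 0.25)
y <- 0.5 * x + epsilon - 1

# y should lie on the line 0.75 * x - 1
print(all.equal(y, 0.75 * x - 1))   # TRUE

# A linear fit recovers intercept -1 and slope 0.75
fit <- lm(y ~ x)
print(coef(fit))
print(max(abs(residuals(fit))))     # essentially zero: a perfect line
```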