Avoiding basic rand() bias in a Monte Carlo simulation?


Problem description


I am rewriting a Monte Carlo simulation from Objective-C in C, to use in a DLL called from VBA/Excel. The "engine" of the calculation is generating a random number between 0 and 10000 and comparing it to a variable in the 5000-7000 neighbourhood. This is done 400-800 times per iteration, and I use 100,000 iterations, so that is about 50,000,000 random numbers generated per run.


In Objective-C the tests showed no bias, but I have huge problems with the C code. Objective-C is a superset of C, so 95% of the code was copied and pasted and hard to get wrong. I have gone through the rest many times, all day yesterday and today, and I have found no problems.


That leaves the difference between arc4random_uniform() and rand() seeded with srand(), especially because I see a bias towards the lower numbers in the range 0 to 10000. The tests I have run are consistent with a bias of 0.5 to 2% towards numbers below roughly 5000. The only other explanation would be that my code avoids repeats, which I don't think it does.


The code is really simple ("spiller1evnea" and "spiller2evnea" being numbers between 5500 and 6500):

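#include <stdlib.h>  /* rand, srand */
#include <time.h>    /* time */

srand((unsigned)time(NULL));        /* seed once per run */
for (j = 0; j < antala; ++j) {
    /* [..] */
    for (i = 1; i < 450; i++) {
        chance = (rand() % 10001);  /* maps rand() onto 0..10000 with % */

        /* [..] */

        if (grey == 1) {
            /* player 1 wins the draw if chance falls below their threshold */
            if (chance < spiller1evnea) vinder = 1;
            else vinder = 2;
        }
        else {
            /* player 2 wins the draw if chance falls below their threshold */
            if (chance < spiller2evnea) vinder = 2;
            else vinder = 1;
        }
        /* [..] */
    }
}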


Now, I don't need true randomness; pseudorandomness is quite fine. I only need the numbers to be approximately evenly distributed on a cumulative basis. It doesn't matter much if 5555 is twice as likely to come out as 5556, but it does matter if 5500-5599 is 5% more likely than 5600-5699, or if there is a clear 0.5-2% bias towards 0-4000 compared with 6000-9999.


First, does it sound plausible that rand() is my problem, and is there an easy implementation that meets my modest needs?

Edit: If my suspicion is plausible, could I use the following? Would I be able to just copy and paste it in as a replacement (I am writing in C and using Visual Studio, and I am really a novice)?

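#include <stdlib.h>

/* 1 / (RAND_MAX + 1), computed in floating point to avoid int overflow */
#define RS_SCALE (1.0 / (1.0 + RAND_MAX))

/* Returns a double in [0, 1) built from three rand() calls, which gives
   a much finer resolution than a single rand() call. */
double drand (void) {
    double d;
    do {
       d = (((rand () * RS_SCALE) + rand ()) * RS_SCALE + rand ()) * RS_SCALE;
    } while (d >= 1); /* retry if rounding produced 1.0 */
    return d;
}

/* Pseudo-random unsigned int in [0, x), e.g. irand(10001) instead of rand() % 10001 */
#define irand(x) ((unsigned int) ((x) * drand ()))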

Edit 2: The code above clearly works without the same bias, so I would recommend it to anyone who has the same "middle-of-the-road" need described above. It does come with a penalty, as it calls rand() three times, so I am still looking for a faster solution.

Recommended answer


The rand() function generates an int in the range [0, RAND_MAX]. If you convert this to a different range via the modulus operator (%), as your original code does, you introduce non-uniformity unless the size of your target range happens to divide RAND_MAX + 1 evenly. That sounds like exactly what you are seeing.


You have multiple options, but if you want to stick with something based on rand(), I suggest this variation on your original approach:

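#include <stdlib.h>

/*
 * Returns a pseudo-random int selected from the uniform distribution
 * over the half-open interval [0, limit), provided that limit is
 * positive and does not exceed RAND_MAX.
 */
int range_rand(int limit) {
    /* Largest multiple of limit not exceeding RAND_MAX. Results at or
       above it are discarded, so every residue is equally likely. */
    int rand_bound = (RAND_MAX / limit) * limit;
    int r;
    while ((r = rand()) >= rand_bound) { /* empty: draw again */ }
    return r % limit;
}

/* In the simulation loop: chance = range_rand(10001); */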


Although in principle the number of rand() calls made by a single call to range_rand() is unbounded, in practice the average is only slightly greater than 1 for relatively small limit values, and it never exceeds 2 for any limit up to RAND_MAX. The function removes the non-uniformity described earlier by drawing the initial random number from a subset of [0, RAND_MAX] whose size is an exact multiple of limit.
