Why do people say there is modulo bias when using a random number generator?


Question

I have seen this question asked a lot, but I have never seen a truly concrete answer to it. So I am posting one here, which will hopefully help people understand exactly why there is "modulo bias" when using a random number generator such as rand() in C++.

Recommended answer

rand() is a pseudo-random number generator that returns an integer between 0 and RAND_MAX inclusive, where RAND_MAX is a constant defined in cstdlib (see this article for a general overview of rand()).

Now what happens if you want to generate a random number between, say, 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers 0 to 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11.

When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11.

When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11.

This does not generate the numbers 0 to 2 with equal probability. Of course, when the target range is small compared with RAND_MAX this may not be a big issue, but for larger ranges it can noticeably skew the distribution toward the smaller numbers.
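
To make the counting above concrete, here is a minimal sketch (not part of the original answer) that tallies how many raw values in [0, max_value] land in each residue class. The name residue_counts and its parameters are hypothetical, chosen only for illustration; max_value stands in for RAND_MAX.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts[k] is the number of raw values r in
// [0, max_value] for which r % n == k.
std::vector<int> residue_counts(int max_value, int n) {
    assert(max_value >= 0 && n > 0);
    std::vector<int> counts(n, 0);
    for (int r = 0; r <= max_value; ++r) {
        ++counts[r % n];
    }
    return counts;
}
```

With max_value = 10 and n = 3 this yields counts of 4, 4, and 3, matching the probabilities 4/11, 4/11, and 3/11 worked out above.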

So when does rand()%n return the numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1, that is, when the RAND_MAX + 1 possible values of rand() divide evenly into n groups. In this case, together with our earlier assumption that rand() returns each number between 0 and RAND_MAX with equal probability, the residue classes modulo n are also equally likely.
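
The condition can be written as a one-line check. This is only an illustrative sketch: modulo_is_unbiased is a hypothetical name, and max_value again stands in for RAND_MAX so the toy value 10 can be tried.

```cpp
#include <cassert>

// Hypothetical helper: true when every residue class mod n receives the same
// number of raw values from [0, max_value], i.e. when max_value + 1 is a
// multiple of n.
bool modulo_is_unbiased(long long max_value, long long n) {
    assert(max_value >= 0 && n > 0);
    return max_value % n == n - 1;
}
```

For example, modulo_is_unbiased(10, 3) is false while modulo_is_unbiased(11, 3) is true. If RAND_MAX happens to be 32767, every power of two up to 32768 passes the check, which is why power-of-two ranges show no modulo bias on such a platform.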

So how do we solve this problem? A crude way is to keep generating random numbers until you get one in your desired range:

int x;
do {
    x = rand();        // draw a raw value in [0, RAND_MAX]
} while (x >= n);      // discard anything outside [0, n)

But that is inefficient for small values of n: each call has only an n/(RAND_MAX + 1) chance of landing in your range, so you will need (RAND_MAX + 1)/n calls to rand() on average.
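
To put a number on that cost, the following sketch computes the expected number of draws for the crude loop. It assumes the generator is uniform on [0, max_value], which has max_value + 1 possible values; expected_draws_crude is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Hypothetical helper: expected number of raw draws for the crude loop
// "redraw until x < n", given a generator uniform on [0, max_value].
double expected_draws_crude(long long max_value, long long n) {
    assert(n > 0 && n <= max_value + 1);
    // Each draw succeeds with probability p = n / (max_value + 1); the number
    // of draws until the first success is geometric with mean 1 / p.
    return static_cast<double>(max_value + 1) / static_cast<double>(n);
}
```

In the toy example (max_value = 10, n = 3) this is 11/3, a little under four draws per result. With RAND_MAX = 32767, the smallest value the C standard permits, and n = 6, it is roughly 5461 draws per result.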

A more efficient approach is to take some large range whose length is divisible by n, such as RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in that range, and then take the modulus:

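int x;

// Reject raw values at or above the largest multiple of n that is <= RAND_MAX
do {
    x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));

x %= n;  // x is now in [0, n)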

For small values of n, this will rarely require more than one call to rand().
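
Wrapped up as a function, the loop above might look like the sketch below. The name unbiased_rand and the precondition on n are assumptions added for illustration, and the result is only as uniform as std::rand() itself.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical wrapper: returns a value in [0, n) with no modulo bias,
// assuming std::rand() is uniform on [0, RAND_MAX].
// Requires 0 < n <= RAND_MAX; a larger n would make the loop never terminate.
int unbiased_rand(int n) {
    assert(n > 0 && n <= RAND_MAX);
    // Largest multiple of n that does not exceed RAND_MAX. Raw values at or
    // above it are rejected so that [0, limit) splits into equal blocks of n.
    const int limit = RAND_MAX - RAND_MAX % n;
    int x;
    do {
        x = std::rand();
    } while (x >= limit);
    return x % n;
}
```

A call such as unbiased_rand(6) then gives a fair die roll in 0 to 5. If RAND_MAX is 32767, only 2 of the 32768 possible raw values are rejected for n = 6, so a second call to std::rand() is almost never needed.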

Works cited and further reading:

CPlusPlus Reference

Eternally Confuzzled
