Generating non-repeating random numbers in Python

Question

OK, this is one of those trickier-than-it-sounds questions, so I'm turning to Stack Overflow because I can't think of a good answer. Here is what I want: I need Python to generate a simple list of numbers from 0 to 1,000,000,000 in random order to be used for serial numbers (using random numbers so that you can't tell how many have been assigned or do timing attacks as easily, i.e. guessing the next one that will come up). These numbers are stored in a database table (indexed) along with the information linked to them. The program generating them doesn't run forever, so it can't rely on internal state.

No big deal, right? Just generate a list of numbers, shove them into an array, use Python's "random.shuffle(big_number_array)", and we're done. The problem is that I'd like to avoid having to store a list of numbers (and thus read the file, pop one off the top, save the file and close it). I'd rather generate them on the fly. The trouble is that every solution I can think of has problems:
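
For reference, the shuffle-everything approach mentioned above can be sketched as follows. This is a minimal illustration on a small pool, not something from the original question: the function name `shuffled_pool` and its `rng` parameter are hypothetical. It defaults to `random.SystemRandom`, on the assumption that serial numbers should be hard to predict.

```python
import random

def shuffled_pool(pool_size, rng=None):
    """Return every integer in [0, pool_size) exactly once, in random order."""
    # SystemRandom draws from the OS entropy source; the default Mersenne
    # Twister generator is not intended for security-sensitive use.
    rng = rng or random.SystemRandom()
    numbers = list(range(pool_size))
    rng.shuffle(numbers)  # in-place Fisher-Yates shuffle
    return numbers

# Small demonstration pool; the question's pool would be 1_000_000_000.
serials = shuffled_pool(10)
print(serials)
```

At the size the question asks for, the list's pointers alone would need about 8 GB on 64-bit CPython, before counting the integer objects themselves, which is why storing the whole shuffled list is unattractive.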

1) Generate a random number and then check if it has already been used. If it has been used, generate a new number, check, and repeat as needed until I find an unused one. The problem here is that I may get unlucky and generate a lot of used numbers before getting one that is unused. Possible fix: use a very large pool of numbers to reduce the chances of this (but then I end up with silly long numbers).

2) Generate a random number and then check if it has already been used. If it has been used, add or subtract one from the number and check again, repeating until I hit an unused number. The problem is that this is no longer a random number, as I have introduced bias (eventually I will get clumps of numbers and you'd be able to predict the next number with a better chance of success).

3) Generate a random number and then check if it has already been used. If it has been used, add or subtract another randomly generated number and check again. The problem is that we're back to simply generating random numbers and checking, as in solution 1.

4) Suck it up, generate the random list and save it; have a daemon put the numbers into a Queue so there are always some available (and avoid constantly opening and closing a file, batching it instead).

5) Generate much larger random numbers and hash them (e.g. using MD5) to get a smaller numeric value. We should rarely get collisions, but I end up with larger-than-needed numbers again.

6) Prepend or append time-based information (e.g. a Unix timestamp) to the random number to reduce the chances of a collision. Again, I get larger numbers than I need.

Does anyone have any clever ideas that will reduce the chances of a "collision" (i.e. generating a random number that is already taken) but will also allow me to keep the number "small" (i.e. less than a billion, or a thousand million for you Europeans =))?

Answer and why I accepted it:

So I will simply go with 1 and hope it's not an issue. If it is, I will go with the deterministic solution of generating all the numbers and storing them, so that there is a guarantee of getting a new random number and I can use "small" numbers (i.e. 9 digits instead of an MD5/etc.).

Recommended answer

This is a neat problem, and I've been thinking about it for a while (with solutions similar to Sjoerd's), but in the end, here's what I think:

Use your point 1) and stop worrying.

Assuming real randomness, the probability that a random number has already been chosen before is the count of previously chosen numbers divided by the size of your pool, i.e. the maximal number.

If you say you only need a billion numbers, i.e. nine digits: Treat yourself to 3 more digits, so you have 12-digit serial numbers (that's three groups of four digits – nice and readable).
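
A small sketch of how such a 12-digit serial might be rendered in three groups of four. The function name `format_serial` and the hyphen separator are assumptions for illustration; the answer does not specify either.

```python
def format_serial(n):
    """Zero-pad n to 12 digits and split it into three groups of four."""
    if not 0 <= n < 10**12:
        raise ValueError("serial must fit in 12 decimal digits")
    digits = f"{n:012d}"
    # Slice the 12-character string at offsets 0, 4 and 8.
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))

print(format_serial(42))            # 0000-0000-0042
print(format_serial(123456789012))  # 1234-5678-9012
```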

Even when you're close to having chosen a billion numbers previously, the probability that your new number is already taken is still only 0.1% (10^9 taken numbers out of a pool of 10^12).
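
The arithmetic can be checked in a few lines. This sketch assumes uniformly random draws, and the function names are illustrative rather than taken from the answer. It also computes the expected number of draws until an unused number turns up, which under that assumption follows a geometric distribution with mean 1 / (1 - p).

```python
def collision_probability(used, pool_size):
    """Chance that one uniformly random draw hits an already-used number."""
    return used / pool_size

def expected_draws(used, pool_size):
    """Mean number of draws until an unused number is found (geometric)."""
    return 1 / (1 - collision_probability(used, pool_size))

# A billion serials already issued out of a 12-digit pool.
p = collision_probability(10**9, 10**12)
print(f"{p:.1%}")                        # 0.1%
print(expected_draws(10**9, 10**12))     # very slightly more than 1

# For contrast: a 9-digit pool that is already half full.
print(expected_draws(5 * 10**8, 10**9))  # 2.0
```

With only nine digits the retry cost grows as the pool fills up; with twelve digits it stays negligible for the first billion serials.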

Do step 1 and draw again on a collision. You can still guard against an "infinite" loop: say, don't try more than 1000 times or so, and then fall back to adding 1 (or something else).

You'll win the lottery before that fallback ever gets used.
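
The draw, check, and retry loop with a capped number of attempts might look roughly like this. It is a sketch under stated assumptions, not the answerer's code: a Python `set` named `used` stands in for the indexed database table, `new_serial` is a hypothetical name, and `secrets.randbelow` is chosen on the assumption that unpredictability matters.

```python
import secrets

POOL_SIZE = 10**12  # 12-digit serials, as suggested in the answer
MAX_TRIES = 1000    # cap on random draws before falling back

def new_serial(used, pool_size=POOL_SIZE, max_tries=MAX_TRIES):
    """Return an unused serial and record it in `used`.

    `used` is a set standing in for the database table (an assumption).
    """
    candidate = secrets.randbelow(pool_size)
    for _ in range(max_tries):
        if candidate not in used:
            used.add(candidate)
            return candidate
        candidate = secrets.randbelow(pool_size)
    # Fallback: step upward by 1 (wrapping around) from the last candidate.
    for offset in range(pool_size):
        probe = (candidate + offset) % pool_size
        if probe not in used:
            used.add(probe)
            return probe
    raise RuntimeError("every number in the pool has been used")

issued = set()
print(new_serial(issued))
```

Against a real database, the membership test and the insert would need to happen atomically, for instance by relying on a unique index and retrying when the insert is rejected; that part is left out here.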
