Problem description
OK, this is one of those trickier-than-it-sounds questions, so I'm turning to Stack Overflow because I can't think of a good answer. Here is what I want: I need Python to generate a simple list of numbers from 0 to 1,000,000,000 in random order, to be used as serial numbers (random so that you can't tell how many have been assigned, and so that timing attacks, i.e. guessing the next one that will come up, are harder). These numbers are stored in a database table (indexed) along with the information linked to them. The program generating them doesn't run forever, so it can't rely on internal state.
No big deal, right? Just generate a list of numbers, shove them into an array, use Python's "random.shuffle(big_number_array)", and we're done. The problem is that I'd like to avoid having to store a list of numbers (and thus read the file, pop one off the top, save the file and close it). I'd rather generate them on the fly. But every solution I can think of has problems:
1) Generate a random number and then check whether it has already been used. If it has, generate a new number, check again, and repeat as needed until I find an unused one. The problem here is that I may get unlucky and generate a lot of used numbers before getting an unused one. Possible fix: use a very large pool of numbers to reduce the chances of this (but then I end up with silly long numbers).
2) Generate a random number and then check whether it has already been used. If it has, add or subtract one and check again, repeating until I hit an unused number. The problem is that this is no longer a random number, because I have introduced bias (eventually I will get clumps of numbers, and you'd be able to predict the next number with a better chance of success).
3) Generate a random number and then check whether it has already been used. If it has, add or subtract another randomly generated number and check again. The problem is that we're back to simply generating random numbers and checking, as in solution 1.
4) Suck it up: generate the random list and save it, and have a daemon put the numbers into a Queue so that some are always available (this avoids constantly opening and closing a file by batching instead).
5) Generate much larger random numbers and hash them (e.g. using MD5) to get a smaller numeric value. Collisions should be rare, but I again end up with larger numbers than I need.
6) Prepend or append time-based information (e.g. a Unix timestamp) to the random number to reduce the chances of a collision. Again, I get larger numbers than I need.
Does anyone have any clever ideas that will reduce the chances of a "collision" (i.e. generating a random number that is already taken) but will also let me keep the numbers "small" (i.e. less than a billion, or a thousand million for you Europeans =))?
Answer and why I accepted it:
So I will simply go with 1 and hope it's not an issue. If it is, I will go with the deterministic solution of generating all the numbers and storing them, so that there is a guarantee of getting a new random number and I can use "small" numbers (i.e. 9 digits instead of an MD5 hash, etc.).
Recommended answer
This is a neat problem, and I've been thinking about it for a while (with solutions similar to Sjoerd's), but in the end, here's what I think:
Use your point 1) and stop worrying.
Assuming real (uniform) randomness, the probability that a newly drawn number has already been chosen is the count of previously chosen numbers divided by the size of your pool, i.e. the maximal number.
If you say you only need a billion numbers, i.e. nine digits: treat yourself to 3 more digits, so you have 12-digit serial numbers (that's three groups of four digits: nice and readable).
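As a rough illustration of the "three groups of four digits" idea, here is a minimal sketch. The helper name `format_serial`, the hyphen separator, and the use of the `secrets` module for drawing are my own assumptions and not something specified above.

```python
import secrets

POOL_SIZE = 10**12  # 12-digit pool (assumption: serials run from 0 to 10**12 - 1)


def format_serial(n: int) -> str:
    """Zero-pad n to 12 digits and split it into three groups of four.

    The hyphen separator is an arbitrary choice for readability.
    """
    if not 0 <= n < POOL_SIZE:
        raise ValueError("serial must fit in 12 digits")
    digits = f"{n:012d}"
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))


# Draw one candidate serial and show it in grouped form.
print(format_serial(secrets.randbelow(POOL_SIZE)))
```

Storing the plain integer and formatting only for display would keep the database column small.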
Even when you're close to having chosen a billion numbers, the probability that your new number is already taken is still only 0.1% (10^9 / 10^12).
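To make the arithmetic concrete, here is a small sketch that evaluates the used-count-over-pool-size ratio for a 12-digit pool. The function names are hypothetical, and the calculation assumes uniform, independent draws.

```python
import math


def collision_probability(used: int, pool_size: int) -> float:
    """Chance that one uniform draw hits an already-used number."""
    return used / pool_size


def expected_draws(used: int, pool_size: int) -> float:
    """Average number of draws until an unused number turns up
    (geometric distribution with success probability 1 - p)."""
    return 1.0 / (1.0 - collision_probability(used, pool_size))


used, pool = 10**9, 10**12
p = collision_probability(used, pool)
print(f"collision probability: {p:.4%}")
print(f"expected draws: {expected_draws(used, pool):.6f}")
# Probability of 1000 collisions in a row, as a power of ten,
# because p**1000 is too small to represent as a float.
print(f"log10 of P(1000 collisions in a row): {1000 * math.log10(p):.0f}")
```

Under these assumptions, even with a billion serials already issued, the average number of draws per new serial stays very close to one.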
If you do hit a collision, just repeat step 1 and draw again. You can still guard against an "infinite" loop: say, don't try more than 1000 times or so, and then fall back to adding 1 (or something else).
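A minimal sketch of that draw-and-retry loop with a capped number of attempts might look like this. The `is_taken` callback stands in for whatever lookup your database table provides; it, the constant names, and the choice of `secrets.randbelow` are assumptions made for illustration.

```python
import secrets
from typing import Callable

POOL_SIZE = 10**12   # 12-digit serials (assumption)
MAX_ATTEMPTS = 1000  # retry cap, as suggested above


def next_serial(is_taken: Callable[[int], bool],
                pool_size: int = POOL_SIZE,
                max_attempts: int = MAX_ATTEMPTS) -> int:
    """Draw random candidates until one is unused, then return it."""
    candidate = 0
    for _ in range(max_attempts):
        candidate = secrets.randbelow(pool_size)
        if not is_taken(candidate):
            return candidate
    # Fallback: step upward by 1 (wrapping around) from the last candidate.
    for offset in range(1, pool_size):
        probe = (candidate + offset) % pool_size
        if not is_taken(probe):
            return probe
    raise RuntimeError("no unused serial numbers left in the pool")


# Usage with an in-memory set standing in for the database table.
used = set()
serial = next_serial(used.__contains__)
used.add(serial)
print(serial)
```

With a real database, checking and then inserting in two separate steps can race with another process. If the serial column has a unique constraint, one alternative is to attempt the insert directly and redraw when the constraint is violated.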
You'll win the lottery before that fallback ever gets used.